[HBASE-4532] [89-fb] Avoid top row seek by dedicated bloom filter for delete family bloom filter

Review Request #2393 - Created Oct. 15, 2011 and updated

Submitter: Liyin Tang
Branch: 89-fb, trunk
Bugs: HBASE-4532
Reviewers
Groups: hbase
People: amitanand, blackpearl, dhruba, gqchen, kannanm, karthik.ranga, khemani, Liyin, mbautin, nspiegelberg, stack
Repository: hbase-git
The previous JIRA, HBASE-4469, avoids the top row seek operation when the row-col bloom filter is enabled.
This JIRA avoids the top row seek in all cases by creating a dedicated bloom filter that covers only delete family markers.

The only subtle case is when we are interested in the top row with an empty column.

For example, suppose we are interested in row1/cf1:/1/put.
We seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter says there is NO delete family.
The scanner then skips the top row seek and returns a fake kv, which is the last kv for this row (createLastOnRowCol).
By doing so, it has already skipped past the real kv we are interested in.


The solution to this problem is to disable the optimization whenever we GET/SCAN a row with an empty column.
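
The rule described above can be sketched as a small decision function. This is only an illustration, not the code in the patch: the class TopRowSeekDecision, the method canSkipTopRowSeek, and its parameters are hypothetical names chosen for this example. It relies on the usual bloom filter property that a negative answer is definitive while a positive answer may be a false positive.

```java
class TopRowSeekDecision {
    /**
     * Hypothetical helper (not part of HBase): decides whether the seek to
     * the top of the row can be skipped.
     *
     * @param qualifier            column qualifier of the requested key
     * @param bloomMayContainDelete true if the delete family bloom filter
     *                             reports that the row MAY have a delete
     *                             family marker (false positives possible)
     */
    static boolean canSkipTopRowSeek(byte[] qualifier, boolean bloomMayContainDelete) {
        boolean emptyColumn = (qualifier == null || qualifier.length == 0);
        if (emptyColumn) {
            // The requested kv itself lives at the top of the row, so
            // skipping the seek would jump past it. Optimization disabled.
            return false;
        }
        // A negative bloom answer is definitive: no delete family marker
        // exists for this row, so the top row seek is unnecessary.
        return !bloomMayContainDelete;
    }

    public static void main(String[] args) {
        byte[] q = new byte[] {'q'};
        System.out.println(canSkipTopRowSeek(q, false));           // non-empty column, no marker
        System.out.println(canSkipTopRowSeek(q, true));            // marker may exist
        System.out.println(canSkipTopRowSeek(new byte[0], false)); // empty column
    }
}
```

Only the first call may skip the seek; the other two must still seek to the top row.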


Evaluation from TestSeekOptimization:
Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So HBASE-4469 gains about 10 percentage points of seek savings (31.60% to 41.82%), but ONLY when the ROWCOL bloom filter is enabled.

================================================

After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So the additional ~10 percentage points of seek savings now apply to ALL bloom filter settings, not just ROWCOL.
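
The percentages reported by TestSeekOptimization can be re-derived from the raw seek counts. The sketch below is only an arithmetic check; the class SeekSavings and its methods are hypothetical names for this example, and the inputs are the seek counts quoted above.

```java
import java.util.Locale;

class SeekSavings {
    /** Savings in percent: the share of seeks that the optimization removed. */
    static double savingsPercent(int seeksWithout, int seeksWith) {
        return 100.0 * (seeksWithout - seeksWith) / seeksWithout;
    }

    /** Formats with two decimals, independent of the default locale. */
    static String fmt(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        double before = savingsPercent(2506, 1714); // NONE/ROW bloom, before this change
        double after = savingsPercent(2506, 1458);  // all bloom types, after this change
        System.out.println("before: " + fmt(before) + "%");               // before: 31.60%
        System.out.println("after:  " + fmt(after) + "%");                // after:  41.82%
        System.out.println("gain:   " + fmt(after - before) + " points"); // gain:   10.22 points
    }
}
```

The gain is 256 fewer seeks out of 2506, which is where the "about 10%" figure comes from.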
Passed all the unit tests
src/main/java/org/apache/hadoop/hbase/KeyValue.java
Revision 93538bb New Change
[1645 lines not shown]
   public static KeyValue createFirstOnRow(final byte [] row, final byte [] family,
       final byte [] qualifier) {
     return new KeyValue(row, family, qualifier, HConstants.LATEST_TIMESTAMP, Type.Maximum);
   }
 
   /**
+   * Create a Delete Family KeyValue for the specified row and family that would
+   * be smaller than all other possible Delete Family KeyValues that have the
+   * same row and family.
+   * Used for seeking.
+   * @param row - row key (arbitrary byte array)
+   * @param family - family name
+   * @return First Delete Family possible key on passed <code>row</code>.
+   */
+  public static KeyValue createFirstDeleteFamilyOnRow(final byte [] row,
+      final byte [] family) {
+    return new KeyValue(row, family, null, HConstants.LATEST_TIMESTAMP,
+        Type.DeleteFamily);
+  }
+
+  /**
    * @param row - row key (arbitrary byte array)
    * @param f - family name
    * @param q - column qualifier
    * @param ts - timestamp
    * @return First possible key on passed <code>row</code>, column and timestamp
[402 lines not shown]
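
To see why a key built by createFirstDeleteFamilyOnRow is usable for seeking, the following simplified model orders keys within a single row and family. It is a sketch under stated assumptions, not HBase's comparator: the class KeyOrderSketch and everything in it are hypothetical, the ordering rule (qualifier ascending, then timestamp descending, then type code descending) is a simplification, and the type codes and LATEST_TIMESTAMP value are assumed to mirror KeyValue.Type and HConstants.

```java
import java.util.Comparator;

class KeyOrderSketch {
    // Assumed to mirror HConstants.LATEST_TIMESTAMP and KeyValue.Type codes.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;
    static final int PUT = 4;
    static final int DELETE_FAMILY = 14;

    /** Simplified key: row and family are fixed, so only three fields remain. */
    static final class Key {
        final byte[] qualifier;
        final long timestamp;
        final int type;

        Key(byte[] qualifier, long timestamp, int type) {
            // A null qualifier is treated as the empty qualifier.
            this.qualifier = (qualifier == null) ? new byte[0] : qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }
    }

    /** Lexicographic comparison of unsigned bytes; shorter prefix sorts first. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Qualifier ascending, then newer timestamps first, then higher type codes first. */
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = compareBytes(a.qualifier, b.qualifier);
        if (c != 0) {
            return c;
        }
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) {
            return c;
        }
        return Integer.compare(b.type, a.type);
    };

    /** Model of the seek key: empty qualifier, latest timestamp, DeleteFamily type. */
    static Key firstDeleteFamilyOnRow() {
        return new Key(null, LATEST_TIMESTAMP, DELETE_FAMILY);
    }

    public static void main(String[] args) {
        Key seekKey = firstDeleteFamilyOnRow();
        Key realMarker = new Key(null, 5L, DELETE_FAMILY);
        Key put = new Key(new byte[] {'q'}, 1L, PUT);
        System.out.println(ORDER.compare(seekKey, realMarker) < 0); // seek key sorts first
        System.out.println(ORDER.compare(seekKey, put) < 0);        // and before column data
    }
}
```

In this model the seek key sorts ahead of every stored delete family marker for the row and family, which matches the Javadoc's "smaller than all other possible Delete Family KeyValues".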
src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
Revision 9a79a74 New Change
 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
Revision 5d9b518 New Change
 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
Revision 6cf7cce New Change
 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
Revision 1f78dd4 New Change
 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
Revision 3c34f86 New Change
 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
Revision 2e1d23a New Change
 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
Revision c4b60e9 New Change
 
src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
Revision 92070b3 New Change
 
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
Revision e4dfc2e New Change
 
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
Revision ebb360c New Change
 
src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
Revision 8814812 New Change
 
src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
Revision fb4f2df New Change
 
src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
Revision b8bcc65 New Change
 
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
Revision 48e9163 New Change
 
src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
Revision 0eca9b8 New Change
 