[HBASE-2794] Optimize multi-column scans using Bloom filters

Review Request #2084 - Created Sept. 28, 2011 and updated

Mikhail Bautin
HBASE-2794
Reviewers
hbase
hbase-git
Previously, we used row-column Bloom filters only for scans that requested a single column. We have seen production queries that request up to 200 columns; with, say, ~6 store files per store (region / column family combination), this could result in 200 x 6 = 1200 block read operations in the worst case. With this diff, when an ExplicitColumnTracker is in use, we avoid seeks on store files that we know do not contain the row/column of interest. Performance should remain the same for column range queries.
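To make the arithmetic in the description concrete, here is a minimal, self-contained sketch of the idea. It is not the patch itself: `BloomSeekSketch`, `RowColBloom`, and `countSeeks` are hypothetical names invented for illustration, the toy Bloom filter uses two ad hoc hash functions, and a "seek" is simply counted rather than performed. The assumption is that each store file carries a row/column Bloom filter that can answer "definitely absent" or "possibly present".

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; these are not HBase classes. */
class BloomSeekSketch {

    /** Toy row/column Bloom filter: no false negatives, false positives possible. */
    static final class RowColBloom {
        private final BitSet bits;
        private final int size;

        RowColBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Two simple hash functions (an assumption made for brevity).
        private int h1(String key) { return Math.floorMod(key.hashCode(), size); }
        private int h2(String key) { return Math.floorMod(key.hashCode() * 31 + 17, size); }

        void add(String row, String col) {
            String key = row + "/" + col;
            bits.set(h1(key));
            bits.set(h2(key));
        }

        boolean mightContain(String row, String col) {
            String key = row + "/" + col;
            return bits.get(h1(key)) && bits.get(h2(key));
        }
    }

    /**
     * Counts the seeks issued for one row and an explicit column list.
     * With useBloom set, a store file is skipped whenever its filter
     * reports that the row/column pair is definitely absent.
     */
    static int countSeeks(List<RowColBloom> storeFiles, String row,
                          List<String> columns, boolean useBloom) {
        int seeks = 0;
        for (String col : columns) {
            for (RowColBloom file : storeFiles) {
                if (useBloom && !file.mightContain(row, col)) {
                    continue; // definitely absent: no seek needed
                }
                seeks++;
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        List<RowColBloom> files = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            files.add(new RowColBloom(4096));
        }
        files.get(0).add("row1", "col0"); // only one file holds any requested cell

        List<String> columns = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            columns.add("col" + i);
        }

        System.out.println("without Bloom check: " + countSeeks(files, "row1", columns, false));
        System.out.println("with Bloom check:    " + countSeeks(files, "row1", columns, true));
    }
}
```

Without the Bloom check, the count is always columns times store files (200 x 6 = 1200 here). With the check, it drops to roughly the number of store files that actually hold a requested cell, plus any false positives.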
Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.
Review request changed
Updated (Sept. 29, 2011, 9:05 p.m.)
Addressing Jonathan's comments.
Ship it!
Posted (Sept. 30, 2011, 7:09 p.m.)
I'm +0 on committing this. I tried reviewing it, but I don't know this code well. The added unit test is nicely intrusive and the asserts look right. What about Nicolas's performance concerns? How are they addressed by this patch? I'm running a build of the patch, and if that passes I'm +1 on commit.
Interesting method name.  We should use this pattern everywhere we have to do this.
Should we get rid of this javadoc if it's an override? (Let us know; can do on commit.)