FLUME-1649. Improve HBaseSink performance.

Review Request #7657 - Created Oct. 18, 2012 and updated

Submitter: Hari Shreedharan
Bugs: FLUME-1649
Reviewers: Flume
Repository: flume-git
Modify the HBase sink to use multiple threads to wait on RPC calls. Since these threads spend most of their time blocked on the RPCs, this should not leave too many threads running at once.
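As a rough illustration of the approach described above (not the code from the patch), the following sketch hands each blocking call to a small thread pool and then waits on the resulting futures. The class name `RpcFanOutSketch`, the helper `fakeRpc`, and the pool size are all hypothetical; the "RPC" is simulated with a sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; names and structure are assumptions.
class RpcFanOutSketch {

    // Submits every call to the pool, then blocks until all have finished.
    // Returns the number of calls that completed without throwing.
    static int runAll(List<Callable<Void>> rpcCalls, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (Callable<Void> call : rpcCalls) {
                pending.add(pool.submit(call));
            }
            int completed = 0;
            for (Future<Void> f : pending) {
                f.get(); // a task failure surfaces here as ExecutionException
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    // Stand-in for a blocking RPC: the worker thread sleeps instead of using CPU.
    static Callable<Void> fakeRpc(long millis) {
        return () -> {
            Thread.sleep(millis);
            return null;
        };
    }
}
```

Because each worker is parked in a blocking call for most of its lifetime, the pool mainly overlaps waiting time rather than adding CPU load.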
All current unit tests pass. Did functional testing.
Issues: 4 total (3 open, 0 resolved, 1 dropped)
Review request changed
Updated (Oct. 19, 2012, 12:14 a.m.)
Add sink counters to support monitoring.
Ship it!
Posted (Nov. 4, 2012, 12:54 p.m.)
Ship It!
Posted (Dec. 6, 2012, 10:41 p.m.)
Hari, looks like a good patch! I have a few comments and questions below.
Why do we batch the puts into one runnable but not the increments?
  1. This is because Increment did not implement the Row interface in the HBase 0.92.1 release (which we use in the hadoop-1 profile), so increments cannot be batched.
  2. OK, perfect, I was not aware of this. Since I am not familiar with the HBase API, I still have one question: is there a reason we don't send a list of Increments in a single runnable, the way we send a list of Rows in a single PutRunnable?
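To make the question above concrete, here is a toy model of the two options. Every type in it (`Row`, `Put`, `Increment`, `FakeTable`) is a hypothetical stand-in written for this sketch, not the HBase class of the same name; the only property borrowed from the discussion is that `Increment` does not implement `Row`.

```java
import java.util.List;

// Toy model only: these are NOT the HBase classes, just stand-ins.
class BatchingSketch {

    interface Row { String rowKey(); }

    static final class Put implements Row {
        private final String key;
        Put(String key) { this.key = key; }
        public String rowKey() { return key; }
    }

    // Deliberately does not implement Row, mirroring the 0.92.1 situation
    // described in the reply above.
    static final class Increment {
        final String key;
        Increment(String key) { this.key = key; }
    }

    // Fake client that counts simulated round trips.
    static final class FakeTable {
        int calls;
        void batch(List<? extends Row> rows) { calls++; }
        void increment(Increment inc) { calls++; }
    }

    // All puts travel in one task and one batch call.
    static Runnable putTask(FakeTable table, List<Put> puts) {
        return () -> table.batch(puts);
    }

    // All increments travel in one task, but still one call per increment;
    // table.batch(incs) would not compile because Increment is not a Row.
    static Runnable incrementTask(FakeTable table, List<Increment> incs) {
        return () -> {
            for (Increment inc : incs) {
                table.increment(inc);
            }
        };
    }
}
```

In this model, grouping increments into one runnable reduces the number of tasks to schedule but not the number of calls, which is the trade-off the question is probing.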
The formatting of the diff is kind of off because of the number of changes here, so maybe I am misreading it, but we also call tnx.commit() and tnx.close() below. Are we sure that future.get() will throw an exception every time a task fails, so that we won't hit those lines?
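For reference on the question above, `Future.get()` in `java.util.concurrent` rethrows any exception thrown inside the submitted task, wrapped in an `ExecutionException`. The sketch below shows the control flow that follows from this; `FakeTxn` and `process` are hypothetical names invented here, and the structure is an assumption, not the layout of the patch.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureGetSketch {

    // Hypothetical stand-in for a channel transaction; records what was called.
    static class FakeTxn {
        boolean committed;
        boolean rolledBack;
        boolean closed;
        void commit() { committed = true; }
        void rollback() { rolledBack = true; }
        void close() { closed = true; }
    }

    // Runs the task on a worker thread and commits only if get() returns normally.
    static FakeTxn process(Runnable rpcTask) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FakeTxn txn = new FakeTxn();
        try {
            Future<?> future = pool.submit(rpcTask);
            future.get();   // throws ExecutionException if the task threw
            txn.commit();   // skipped whenever get() throws
        } catch (ExecutionException e) {
            txn.rollback(); // e.getCause() holds the task's original exception
        } finally {
            txn.close();    // runs on both the success and failure paths
            pool.shutdown();
        }
        return txn;
    }
}
```

One caveat: this only holds if the task lets its exception propagate. A task that catches and swallows the failure internally makes `get()` return normally, and the commit would then run.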
Curious as to why we removed this behavior?
I wonder what the performance impact will be of flushing every increment?