

Improve RCFile::sync(long) by 10x

Review Request #10795 - Created April 26, 2013 and updated

Submitter: Gopal V
Branch: trunk
Bugs: HIVE-4423
Reviewers
Groups: hive
People: ashutoshc, haglein
Repository: hive-git
Speed up RCFile::sync() by reading large blocks of data from HDFS rather than using readByte() on the input stream. 

This improves the loop behaviour and reduces the number of calls on the synchronized read() methods within HDFS, resulting in a 10x performance boost to this function.

In wall-clock terms, a call that used to take up to a second now completes in under 100 ms, because data is read in 512-byte chunks instead of 1 byte at a time.
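
As a rough illustration of the chunked-read idea described above, here is a minimal sketch. It is not the HIVE-4423 patch: the class name ChunkedSyncScan and the method findMarker are hypothetical, and it scans a plain java.io.InputStream rather than the HDFS stream that RCFile actually uses. It only shows how one bulk read() per chunk can replace one read call per byte while still finding a marker that straddles a chunk boundary.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of Hive or the patch.
final class ChunkedSyncScan {

    /**
     * Returns the stream offset of the first occurrence of marker,
     * or -1 if the stream ends before a full match is seen.
     * Data is pulled in with bulk reads of up to chunkSize bytes.
     */
    static long findMarker(InputStream in, byte[] marker, int chunkSize)
            throws IOException {
        if (marker.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("empty marker or bad chunk size");
        }
        // Keep the last (marker.length - 1) bytes between reads so that a
        // marker split across two chunks is still detected.
        int keep = marker.length - 1;
        byte[] buf = new byte[keep + chunkSize];
        int filled = 0;   // number of valid bytes in buf
        long base = 0;    // stream offset of buf[0]

        while (true) {
            // One bulk read instead of many single-byte reads.
            int n = in.read(buf, filled, buf.length - filled);
            if (n < 0) {
                return -1;
            }
            filled += n;

            // Check every start position that has a full marker's worth of bytes.
            int limit = filled - marker.length;
            for (int i = 0; i <= limit; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }

            // Slide the unscanned tail to the front of the buffer.
            int tail = Math.min(keep, filled);
            int drop = filled - tail;
            System.arraycopy(buf, drop, buf, 0, tail);
            base += drop;
            filled = tail;
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }
}
```

With chunkSize set to 512, as in the description, the number of read calls on the underlying stream drops from one per byte to roughly one per 512 bytes, which is where the saving on synchronized read() methods would come from.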
ant test -Dtestcase=TestRCFile -Dmodule=ql
ant test -Dtestcase=TestCliDriver -Dqfile_regex=.*rcfile.* -Dmodule=ql

Also benchmarked with count(1) on the store_sales RCFile table at scale=10:

before: 43.8, after: 39.5 
Ship it!
Posted (April 26, 2013, 3:13 p.m.)
Ship It!