Review Board 1.7.22


ExecSource doesn't flush the cache if there are no input entries

Review Request #8854 - Created Jan. 7, 2013 and updated

Fengdong Yu
1.4.0
Reviewers
Flume
flume-git
ExecSource has a default batchSize of 20. The exec source reads data from the source, puts it into a cache, and pushes it to the channel once the cache is full.

But if the exec source's cache is not full and there is no input for a long time, the cached entries stay in the cache. They have no chance to reach the channel until the source's cache fills up.

So the patch adds a new configuration property for ExecSource, batchTimeout, which defaults to 3 seconds. If batchTimeout is exceeded, all cached data is pushed to the channel even if the cache is not full.
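A minimal Java sketch of the batching rule described above, for illustration only. The names `LineTriggeredBatcher`, `onLine` and `pending` are hypothetical, and a plain `Consumer` stands in for the channel; this is not the patch's code or Flume's API. As in the proposal, the timeout is checked only when a line arrives. A fake clock keeps the demo deterministic: five lines followed by ten silent seconds stay buffered until a sixth line shows up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: buffer lines and flush when the batch is full OR when
// batchTimeoutMillis has passed since the last flush. The timeout is checked
// only inside onLine(), i.e. only when a new line arrives.
class LineTriggeredBatcher {
    private final int batchSize;
    private final long batchTimeoutMillis;
    private final Consumer<List<String>> channel; // stand-in for the channel
    private final LongSupplier clock;             // injectable clock (ms)
    private final List<String> buffer = new ArrayList<>();
    private long lastFlushMillis;

    LineTriggeredBatcher(int batchSize, long batchTimeoutMillis,
                         Consumer<List<String>> channel, LongSupplier clock) {
        this.batchSize = batchSize;
        this.batchTimeoutMillis = batchTimeoutMillis;
        this.channel = channel;
        this.clock = clock;
        this.lastFlushMillis = clock.getAsLong();
    }

    // Called once per line read from the command's stdout.
    void onLine(String line) {
        buffer.add(line);
        boolean full = buffer.size() >= batchSize;
        boolean timedOut = clock.getAsLong() - lastFlushMillis >= batchTimeoutMillis;
        if (full || timedOut) {
            flush();
        }
    }

    // Push everything buffered (if anything) and restart the timeout window.
    void flush() {
        if (!buffer.isEmpty()) {
            channel.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlushMillis = clock.getAsLong();
    }

    int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock, in milliseconds
        List<List<String>> batches = new ArrayList<>();
        LineTriggeredBatcher b =
                new LineTriggeredBatcher(20, 3000, batches::add, () -> now[0]);
        for (int i = 0; i < 5; i++) b.onLine("line " + i);
        now[0] = 10_000; // 10 s pass with no new output
        System.out.println(batches.size() + " " + b.pending()); // prints "0 5"
        b.onLine("line 5"); // only the next line triggers the timeout check
        System.out.println(batches.size() + " " + b.pending()); // prints "1 0"
    }
}
```

In a real source, `onLine` would be driven by the `readLine()` loop. The buffered lines therefore wait for as long as the command stays silent.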

 
Total: 1, Open: 1, Resolved: 0, Dropped: 0
Ship it!
Posted (Jan. 7, 2013, 9:42 a.m.)
Ship It!
Posted (Jan. 7, 2013, 10:20 p.m.)
Thanks for the patch!

I like the idea, but this approach does not look sufficient, since the timeout is checked only when a new line is written out. If a few lines are written and then no more follow, the flush never happens.

Also, please add a unit test for the feature.
flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java (Diff revision 1)
How does this help? The readLine() method blocks until the next line is read from the process's stdout, right? So if the process writes only batchSize - 1 events before the timeout and then never writes again, the source would still not flush, right? You probably need to add another thread to make sure the flush happens.

Also, when you do this, be careful about synchronization: you will probably need to put the batching code inside a synchronized block or lock, and put the timeout flush code in the same lock or synchronized block.
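One way to act on this suggestion, sketched in Java under stated assumptions. A `ScheduledExecutorService` from `java.util.concurrent` flushes the buffer every `batchTimeout` milliseconds, and both the reader path and the timer path synchronize on the same buffer object. The names `TimedFlushBatcher`, `onLine` and `flush`, and the `Consumer` standing in for the channel, are hypothetical. They are not Flume's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch (not Flume's API): a timer thread flushes buffered events
// every batchTimeoutMillis, so they reach the channel even while the reader
// thread is blocked in readLine(). Both paths lock the same buffer object.
class TimedFlushBatcher implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<String>> channel; // stand-in for the channel
    private final List<String> buffer = new ArrayList<>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    TimedFlushBatcher(int batchSize, long batchTimeoutMillis,
                      Consumer<List<String>> channel) {
        this.batchSize = batchSize;
        this.channel = channel;
        flusher.scheduleWithFixedDelay(() -> {
            try {
                flush();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs; a real
                // source would log it. The events stay buffered for the next tick.
            }
        }, batchTimeoutMillis, batchTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Reader thread: called once per line of process output.
    void onLine(String line) {
        synchronized (buffer) {
            buffer.add(line);
            if (buffer.size() >= batchSize) {
                flushLocked();
            }
        }
    }

    // Timer thread (and close()): push whatever is buffered, full or not.
    void flush() {
        synchronized (buffer) {
            flushLocked();
        }
    }

    // Caller must hold the buffer lock. The buffer is cleared only after the
    // channel accepts the batch, so a failed put keeps the events.
    private void flushLocked() {
        if (!buffer.isEmpty()) {
            channel.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    @Override
    public void close() throws InterruptedException {
        flusher.shutdown(); // stops future timer runs
        flusher.awaitTermination(1, TimeUnit.SECONDS);
        flush(); // drain whatever is left
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> batches = new CopyOnWriteArrayList<>();
        try (TimedFlushBatcher b = new TimedFlushBatcher(20, 100, batches::add)) {
            for (int i = 0; i < 5; i++) b.onLine("line " + i);
            Thread.sleep(500); // the reader would be blocked in readLine() here
            System.out.println("batches flushed by the timer: " + batches.size());
        }
    }
}
```

Calling the channel while holding the lock keeps batches in order and stops the timer from flushing a half-updated list. The cost is that the reader thread blocks during the put. Because the buffer is cleared only after `accept` returns, a failed put leaves the events in place for the next attempt.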