Patch to support custom FlumeFormatter implementations for writing HDFS SequenceFiles
Review Request #6918 - Created Sept. 5, 2012 and updated
This patch allows users to customise the format of HDFS SequenceFiles by providing a custom FlumeFormatter implementation.

Currently, the user can set hdfs.writeFormat to either "Text" or "Writable", corresponding to HDFSTextFormatter and HDFSWritableFormatter respectively. With this patch, hdfs.writeFormat can also be set to the full class name of a Builder implementation, e.g.:

    agent_foo.sinks.hdfs-sink.writeFormat=com.mycompany.flume.MyCustomFormatter$Builder

Users can also pass custom configuration params to the builder, e.g.:

    agent_foo.sinks.hdfs-sink.writeFormat.ignoreHeaders=foo,bar

These params will be passed to the Builder's build() method as a Context object.

I've tried to be as consistent as possible with the design of EventSerializerFactory:
* Use an enum for the different formatter types, rather than static strings.
* Use a Builder, rather than constructing a FlumeFormatter directly.
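For reviewers who want a picture of the lookup described above, here is a minimal, self-contained sketch. It is not the code in the patch: Formatter, Builder, FormatterType and getFormatter are hypothetical stand-ins, a plain Map replaces Flume's Context, and the real FlumeFormatter interface is not reproduced. It only illustrates the idea of trying the built-in enum names first and falling back to loading a Builder by class name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FormatterFactorySketch {

  /** Hypothetical stand-in for FlumeFormatter (the real interface differs). */
  interface Formatter {
    String describe();
  }

  /** Hypothetical stand-in for the Builder; Map replaces Flume's Context. */
  interface Builder {
    Formatter build(Map<String, String> context);
  }

  static class TextBuilder implements Builder {
    public Formatter build(Map<String, String> context) {
      return () -> "text";
    }
  }

  static class WritableBuilder implements Builder {
    public Formatter build(Map<String, String> context) {
      return () -> "writable";
    }
  }

  /** Example of a user-supplied Builder that reads a custom param. */
  static class CustomBuilder implements Builder {
    public Formatter build(Map<String, String> context) {
      String ignored = context.getOrDefault("ignoreHeaders", "");
      return () -> "custom(ignoreHeaders=" + ignored + ")";
    }
  }

  /** Built-in types; OTHER marks a value that is a Builder class name. */
  enum FormatterType {
    TEXT(TextBuilder.class),
    WRITABLE(WritableBuilder.class),
    OTHER(null);

    final Class<? extends Builder> builderClass;

    FormatterType(Class<? extends Builder> builderClass) {
      this.builderClass = builderClass;
    }
  }

  /** Resolve the writeFormat value to a Builder and build the formatter. */
  static Formatter getFormatter(String writeFormat, Map<String, String> context)
      throws ReflectiveOperationException {
    FormatterType type;
    try {
      type = FormatterType.valueOf(writeFormat.toUpperCase(Locale.ENGLISH));
    } catch (IllegalArgumentException e) {
      // Not a built-in name, so treat the value as a class name.
      type = FormatterType.OTHER;
    }
    Class<? extends Builder> builderClass = type.builderClass;
    if (builderClass == null) {
      // Throws ClassNotFoundException or ClassCastException on a bad value.
      builderClass = Class.forName(writeFormat).asSubclass(Builder.class);
    }
    return builderClass.getDeclaredConstructor().newInstance().build(context);
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> context = new HashMap<>();
    context.put("ignoreHeaders", "foo,bar");
    System.out.println(getFormatter("Text", context).describe());
    System.out.println(
        getFormatter(CustomBuilder.class.getName(), context).describe());
  }
}
```

Note that a nested Builder is referenced by its binary name (Outer$Builder), which is why the configuration example uses MyCustomFormatter$Builder rather than MyCustomFormatter.Builder.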
Unit tests are included in the patch. I am also using a patched build of Flume in an internal project (not in production).