FLUME-1117: Implement support for Avro container file output in Flume
Review Request #4708 - Created April 12, 2012 and submitted
Patch with support for Avro Container File format.
Unit tests pass. New unit tests added for new functionality.
Posted (April 12, 2012, 9:51 p.m.)
I think we should log a separate JIRA to address cleanly integrating the Avro serializer with the HDFS sink. I have a few minor comments, mainly that the flush/sync calls on the serializer and the stream are a bit inconsistent. The rest looks fine to me.
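To illustrate the flush/sync point, here is a minimal sketch of one way to keep the order consistent: route every flush through a single helper that flushes the serializer first and the stream second. The names `BufferingSerializer` and `flushAll` are hypothetical and are not taken from the patch. The sketch assumes only that the serializer buffers records internally before handing them to the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushOrderSketch {

    /** Hypothetical serializer that buffers records internally until flushed. */
    static class BufferingSerializer {
        private final OutputStream out;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

        BufferingSerializer(OutputStream out) {
            this.out = out;
        }

        /** Buffers the record; nothing reaches the underlying stream yet. */
        void write(byte[] record) {
            pending.write(record, 0, record.length);
        }

        /** Pushes buffered records into the stream; does not flush the stream itself. */
        void flush() throws IOException {
            pending.writeTo(out);
            pending.reset();
        }
    }

    /** One place that fixes the order: serializer first, then the stream. */
    static void flushAll(BufferingSerializer serializer, OutputStream out) throws IOException {
        serializer.flush();
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        BufferingSerializer serializer = new BufferingSerializer(stream);

        serializer.write(new byte[] {1, 2, 3});
        System.out.println("bytes in stream before flushAll: " + stream.size());

        flushAll(serializer, stream);
        System.out.println("bytes in stream after flushAll: " + stream.size());
    }
}
```

If the stream were flushed before the serializer, any records still held in the serializer's buffer would miss that flush, which is the kind of inconsistency a single helper avoids.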
Posted (April 16, 2012, 1:49 a.m.)
I have one general question about this. I am not entirely sure that the serializer should handle the stream directly. I think it would be better if the EventSerializer simply returned the serialized event as a byte array, which the class that owns the output stream (the sink, etc.) can then write. This would remove the need for the serializer to carry a bunch of methods that have nothing to do with serializing and everything to do with writing to and flushing the output stream. In my opinion the event serializer should be simple: it knows the schema and returns a byte array that the sink writes to the stream. It should not be concerned with flushing the stream.
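A rough sketch of the split proposed above, for discussion only. `SimpleEventSerializer`, `NewlineSerializer`, and `StreamOwningSink` are hypothetical names, not Flume's actual interfaces, and the toy serializer just appends a newline where a schema-aware Avro encoder would go.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ByteArraySerializerSketch {

    /** Hypothetical contract: turn one event body into bytes, nothing else. */
    interface SimpleEventSerializer {
        byte[] serialize(byte[] eventBody);
    }

    /** Toy serializer that appends a newline; stands in for a schema-aware encoder. */
    static class NewlineSerializer implements SimpleEventSerializer {
        @Override
        public byte[] serialize(byte[] eventBody) {
            byte[] out = new byte[eventBody.length + 1];
            System.arraycopy(eventBody, 0, out, 0, eventBody.length);
            out[eventBody.length] = '\n';
            return out;
        }
    }

    /** Hypothetical sink: owns the stream and decides when to write and flush. */
    static class StreamOwningSink {
        private final OutputStream out;
        private final SimpleEventSerializer serializer;

        StreamOwningSink(OutputStream out, SimpleEventSerializer serializer) {
            this.out = out;
            this.serializer = serializer;
        }

        /** Serializes the event and writes the resulting bytes to the stream. */
        void append(byte[] eventBody) throws IOException {
            out.write(serializer.serialize(eventBody));
        }

        /** Flushing is the sink's job; the serializer never sees the stream. */
        void flush() throws IOException {
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        StreamOwningSink sink = new StreamOwningSink(buffer, new NewlineSerializer());

        sink.append("event-1".getBytes(StandardCharsets.UTF_8));
        sink.append("event-2".getBytes(StandardCharsets.UTF_8));
        sink.flush();

        System.out.print(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

One trade-off to weigh: the Avro container file format writes a file header and sync markers between blocks, so it carries state across events. A purely per-event byte array contract would need some other place to handle that framing.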
Review request changed
Updated (April 16, 2012, 4:09 a.m.)