Review Board 1.7.22


SQOOP-428: Support compression for Avro import

Review Request #3600 - Created Jan. 24, 2012 and updated

Lars Francke
SQOOP-428
Reviewers
Sqoop
sqoop-trunk
This basically just ports the code of Avro's (1.5.4) AvroOutputFormat to the new MapReduce API.
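For this issue, the relevant part of AvroOutputFormat is how it picks a codec. The sketch below is a dependency-free approximation of that logic, not the ported Sqoop class. It rests on several assumptions: a plain Map stands in for Hadoop's Configuration, and the key names (mapred.output.compress, avro.output.codec, avro.mapred.deflate.level) and the default deflate level of 1 are modelled on Avro 1.5's mapred package.

```java
import java.util.Map;

// Dependency-free sketch of codec selection for Avro output.
// ASSUMPTION: a Map stands in for Hadoop's Configuration, and the key names
// below are modelled on Avro 1.5's mapred AvroOutputFormat/AvroJob.
final class AvroCodecSelection {
  static final String COMPRESS_KEY = "mapred.output.compress";
  static final String CODEC_KEY = "avro.output.codec";
  static final String DEFLATE_LEVEL_KEY = "avro.mapred.deflate.level";
  static final String DEFAULT_CODEC = "deflate";
  static final int DEFAULT_DEFLATE_LEVEL = 1;

  private AvroCodecSelection() {}

  /** Returns the codec name to write with, or null if output compression is off. */
  static String codecName(Map<String, String> conf) {
    boolean compress = Boolean.parseBoolean(conf.getOrDefault(COMPRESS_KEY, "false"));
    if (!compress) {
      return null; // leave the writer's default (uncompressed) codec in place
    }
    return conf.getOrDefault(CODEC_KEY, DEFAULT_CODEC);
  }

  /** Deflate level, only meaningful when the chosen codec is "deflate". */
  static int deflateLevel(Map<String, String> conf) {
    String level = conf.get(DEFLATE_LEVEL_KEY);
    return level == null ? DEFAULT_DEFLATE_LEVEL : Integer.parseInt(level.trim());
  }
}
```

Returning null when compression is off mirrors a flow in which the writer's codec is only set if output compression was requested; deflate is used when compression is on but no codec name is given.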

I've refactored the tests to move their common functionality into a helper method, because the two tests are identical apart from two command-line arguments.
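The refactoring can be pictured as one argument builder shared by both tests, with each test passing only its own extra arguments. This is a hypothetical sketch: the class and method names are illustrative and not taken from TestAvroImport, and the connect string and table are placeholders. Only --connect, --table, --as-avrodatafile and --compression-codec are real Sqoop options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// HYPOTHETICAL helper illustrating the test refactoring: every Avro import
// test builds the same base argv and only appends its own extra arguments.
final class AvroImportArgs {
  private AvroImportArgs() {}

  static String[] buildArgv(String connectString, String table, String... extraArgs) {
    List<String> args = new ArrayList<>();
    args.add("--connect");
    args.add(connectString);
    args.add("--table");
    args.add(table);
    args.add("--as-avrodatafile");          // Sqoop option selecting Avro output
    args.addAll(Arrays.asList(extraArgs));  // e.g. the compression arguments
    return args.toArray(new String[0]);
  }
}
```

An uncompressed test would call buildArgv(url, table), and a compressed one buildArgv(url, table, "--compression-codec", "deflate"); both then run the same import-and-verify steps.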

I could have deleted AvroJob completely, but since I was told last time that binary compatibility needs to be maintained, I left it in. As far as I can tell it isn't needed anymore, because all the necessary functionality is available in Avro's own version of that class. So if it's okay to delete the redundant files (there are two, one in the cloudera package and one in the apache package), let me know and I'll provide a new patch.

All tests pass with hadoopversion=20, but TestColumnTypes fails for me on 23. I can't see how that's related, though.
src/docs/user/import.txt
Revision deddb1a New Change
@@ -385,10 +385,13 @@
 using the deflate (gzip) algorithm with the +-z+ or +\--compress+
 argument, or specify any Hadoop compression codec using the
 +\--compression-codec+ argument. This applies to SequenceFile, text,
 and Avro files.
 
+With Avro the argument to +\--compression-codec+ must not be a fully
+qualified class name but one of +deflate+ or +snappy+.
+
 Large Objects
 ^^^^^^^^^^^^^
 
 Sqoop handles large objects (+BLOB+ and +CLOB+ columns) in particular
 ways. If this data is truly large, then these columns should not be
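The new documentation rule for Avro can be expressed as a small argument check. This is a hypothetical sketch (the class and method names are not from the patch) that accepts only the two names the guide lists and rejects a fully qualified Hadoop codec class name such as org.apache.hadoop.io.compress.GzipCodec.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// HYPOTHETICAL validation of the documented rule: for Avro imports,
// --compression-codec takes a short codec name, not a Hadoop codec class name.
final class AvroCodecArgument {
  // The two names the user guide lists for Avro output.
  static final List<String> AVRO_CODECS =
      Collections.unmodifiableList(Arrays.asList("deflate", "snappy"));

  private AvroCodecArgument() {}

  /** Returns the codec name unchanged if it is valid for Avro, otherwise throws. */
  static String validate(String codecArg) {
    if (codecArg == null || !AVRO_CODECS.contains(codecArg)) {
      throw new IllegalArgumentException(
          "With Avro, --compression-codec must be one of " + AVRO_CODECS
              + " (not a class name such as org.apache.hadoop.io.compress.GzipCodec); got: "
              + codecArg);
    }
    return codecArg;
  }
}
```

Rejecting the class-name form up front would give the user a clear error instead of a failure inside the MapReduce job; this review does not show whether Sqoop itself validates the argument this early.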
src/java/org/apache/sqoop/mapreduce/AvroJob.java
Revision a57aaf1 New Change
 
src/java/org/apache/sqoop/mapreduce/AvroOutputFormat.java
Revision 96befd7 New Change
 
src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
Revision ed6954a New Change
 
src/test/com/cloudera/sqoop/TestAvroImport.java
Revision 1b8b046 New Change
 