Export dir to support subdirectories

Review Request #10646 - Created April 19, 2013 and updated

Vasanth kumar RJ
SQOOP-951
Reviewers
Sqoop
sqoop-trunk
Posted (May 20, 2013, 1:48 p.m.)
Hi Vasanth,
Thank you very much for working on this patch, it is greatly appreciated! Would you mind adding a test case that covers the newly introduced functionality?
src/java/org/apache/sqoop/mapreduce/ExportJobBase.java (Diff revision 1)
I don't feel entirely comfortable with this, as it will change the behavior of the default input format, which skips certain names. For example, files and directories whose names start with a dot or an underscore are normally skipped.

Perhaps we could introduce a new, properly documented parameter such as --recursive-export?
  1. Hi Jarcec,
    
    Sqoop currently skips files whose names start with a dot or an underscore. This patch still skips those files; it only adds input paths.
    So do you want files starting with a dot or an underscore to be included?
    
    Please advise.
  2. I have not tried it myself yet, but I believe that with this patch Sqoop will try to add the contents of directories starting with a dot or an underscore, such as "_logs" or others that a MapReduce job might generate automatically. If that is indeed the case, then I'm afraid this patch might break current customer deployments, hence my concern about backward compatibility.
Jarcec
  1. With the introduction of HCat support, we will be able to move entire Hive tables with their partitions (one of the use cases for this), and I would assume that data in HDFS subdirectories destined for a single table would typically be a Hive table. Still, this can help in some scenarios. I agree with Jarcec's comment that a --recursive-export option would be needed to explicitly request the changed behavior.
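
To make the behavior under discussion concrete, here is a minimal sketch of an opt-in recursive collection of input directories that skips any directory whose name starts with a dot or an underscore (for example "_logs"). It is an illustration only, not the code from the patch: it uses java.nio.file on a local file system rather than the Hadoop FileSystem API, the class and method names are hypothetical, and the boolean flag merely stands in for the proposed --recursive-export option, which does not exist in Sqoop at this point.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sqoop.
public class RecursiveInputPaths {

    // Convention from the discussion: names starting with '.' or '_' are skipped.
    static boolean isHidden(String name) {
        return name.startsWith(".") || name.startsWith("_");
    }

    // Returns the directories to register as job input paths.
    // With recursive == false only the export dir itself is returned,
    // which keeps the existing behavior as the default.
    static List<Path> collectInputDirs(Path exportDir, boolean recursive)
            throws IOException {
        List<Path> inputDirs = new ArrayList<>();
        inputDirs.add(exportDir);
        if (recursive) {
            addSubdirectories(exportDir, inputDirs);
        }
        return inputDirs;
    }

    // Depth-first walk that never descends into hidden directories.
    private static void addSubdirectories(Path dir, List<Path> out)
            throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (Files.isDirectory(child) && !isHidden(name)) {
                    out.add(child);
                    addSubdirectories(child, out);
                }
            }
        }
    }
}
```

Because a hidden directory is never entered, nothing beneath "_logs" can end up among the input paths, and leaving the flag off preserves the current non-recursive behavior for existing deployments.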