Review Board 1.7.22


SQOOP-931 - Integration of Sqoop and HCatalog

Review Request #10688 - Created April 21, 2013 and updated

Venkat Ranganathan
Reviewers
Sqoop
jarcec
sqoop-trunk
This patch implements the new feature of integrating HCatalog and Sqoop.   With this feature, it is possible to import and export data between Sqoop and HCatalog tables.   The document attached to SQOOP-931 JIRA issue discusses the high level appraches.  

With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and hadoop ecosystem.
Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration.  A unit test to test the option management is also added.   All tests pass
Total:
28
Open:
28
Resolved:
0
Dropped:
0
Status:
From:
Description From Last Updated Status
I'm not feeling entirely comfortable about depending on SNAPSHOTS. Is there a particular feature that we're taking advantage of in ... Jarek Cecho May 20, 2013, 1:02 p.m. Open
Shouldn't those two dependencies be transitively propagated from HCatalog/Hive? Jarek Cecho May 20, 2013, 1:02 p.m. Open
Does the new property make sense when it's valid only on Hadoop2 that actually do not have any JobTracker address ... Jarek Cecho May 20, 2013, 1:02 p.m. Open
Is the timestamp mapped to String from similar reason as mentioned above with SMALLINT? Jarek Cecho May 20, 2013, 1:02 p.m. Open
I'm thinking if having subclass of the DataDrivenImportJob for HCat specific things that would override this and couple of other ... Jarek Cecho May 20, 2013, 1:02 p.m. Open
Similarly as in the import. Would having dedicated classes for HCatalog make sense/would be cleaner that having one class for ... Jarek Cecho May 20, 2013, 1:02 p.m. Open
This comment seems to be artifact from development. I would suggest to improve the message and move it into "debug" ... Jarek Cecho May 20, 2013, 1:02 p.m. Open
I think that we also want to skip invoking the output committer in case of hadoop 2. Jarek Cecho May 20, 2013, 1:02 p.m. Open
Nit: Incorrect indentation. Jarek Cecho May 20, 2013, 1:02 p.m. Open
Is this method necessary? It seems to be only calling the parent method without any additional logic. Jarek Cecho May 28, 2013, 9:32 a.m. Open
The "home" variable seems to be unused after the assignment, so I'm assuming that assignements are not necessary? Jarek Cecho May 28, 2013, 9:32 a.m. Open
It seems to me that the --as-avrodatafile and --as-sequencefile are also not compatible with the HCatalog import/export so we might ... Jarek Cecho May 28, 2013, 9:32 a.m. Open
Can we add here information what will happen if the table already exists and this parameter is specified? Jarek Cecho June 4, 2013, 11:14 p.m. Open
This seem unnecessary, can we tweak the bash scripts to do this automatically if the hcat command is present? Jarek Cecho June 4, 2013, 11:14 p.m. Open
This method seems to be required only for the debug message. Is it the only reason or did I miss ... Jarek Cecho June 4, 2013, 11:14 p.m. Open
Nit: Shouldn't be default Hive home in SqoopOptions.getDefaultHiveHome()? Jarek Cecho June 4, 2013, 11:14 p.m. Open
Shouldn't be default Hive home in SqoopOptions.getDefaultHcatHome()? Jarek Cecho June 4, 2013, 11:14 p.m. Open
Both Hive and HBase are idempotent when creating tables, so It might make sense to add "IF NOT EXISTS" in ... Jarek Cecho June 4, 2013, 11:14 p.m. Open
It seems that at this point we are not reading the hive configuration files but yet executing the in-process Hive ... Jarek Cecho June 4, 2013, 11:14 p.m. Open
Shouldn't we use here the SqoopOptions.getDefaultHiveHome()? Jarek Cecho June 4, 2013, 11:14 p.m. Open
Shouldn't we use here the SqoopOptions.getDefaultHCatHome()? Jarek Cecho June 4, 2013, 11:14 p.m. Open
Nit: Considering that there might be Hadoop3 in the future, would it be simple to change the condition to (isLocalMode ... Jarek Cecho June 4, 2013, 11:14 p.m. Open
Nit: Those lines seems to be unused. Jarek Cecho June 4, 2013, 11:14 p.m. Open
Can we write the file in temporary directory rather than in current working directory? (that might not be writable). Jarek Cecho June 4, 2013, 11:14 p.m. Open
I would suggest to alter this to single line: LOG.error("Error writing HCatalog load-in script: ", ioe); That will also print ... Jarek Cecho June 4, 2013, 11:14 p.m. Open
I would suggest to change this line to : LOG.warn("IOException closing stream to HCatalog script: ", ioe); That will also ... Jarek Cecho June 4, 2013, 11:14 p.m. Open
Rest of the Sqoop is expecting variable HADOOP_COMMON_HOME whereas the underlying hcat script is expecting HADOOP_HOME, so on BigTop this ... Jarek Cecho June 6, 2013, 6:34 p.m. Open
Nit: there seems to be extra "`" that is breaking the script. Jarek Cecho June 7, 2013, 1:03 a.m. Open
Review request changed
Updated (June 7, 2013, 2:03 a.m.)
Latest changes with the issue identified fixed.   

Thanks Jarek for a thorough review - Very much appreciated.   Will upload to JIRA also
Ship it!
Posted (June 7, 2013, 2:29 p.m.)
Ship It!