Partitions are created even when Jobs are aborted
Review Request #6219 - Created July 31, 2012 and updated
If an MR job using HCatOutputFormat fails and FileOutputCommitterContainer::abortJob() is called, one would expect that no partitions are created or registered with HCatalog. With dynamic partitions, this behaves correctly. With static partitions, however, partitions are created regardless of whether the job succeeded or failed. (This manifested as a failure when the job was repeated: the retry job fails to launch because the partitions already exist from the last failed run.)

This is a result of bad code in FileOutputCommitter::cleanupJob(), which seems to do an unconditional partition-add. It can be fixed by checking that the output directory exists before adding partitions (in the !dynamicPartitioning case), since the directory is removed in abortJob(). We'll have a patch for this shortly.

As an aside, we ought to move the partition creation into commitJob(), where it logically belongs. cleanupJob() is deprecated and is common to both the success and failure code paths.
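The proposed guard can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: the class StaticPartitionGuard, the in-memory registered list (standing in for the metastore), and the method signatures are all hypothetical, and a local directory via java.nio.file stands in for the job's output location on HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names do not come from the HCatalog code base.
class StaticPartitionGuard {
    // Stand-in for the metastore: partition specs that were "registered".
    static final List<String> registered = new ArrayList<>();

    // Mimics abortJob(): removes the output directory of the failed job.
    static void abortJob(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return;
        }
        List<Path> all;
        try (Stream<Path> paths = Files.walk(outputDir)) {
            // Reverse order so children are deleted before their parents.
            all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
    }

    // Mimics cleanupJob(), which runs on both success and failure paths.
    static void cleanupJob(Path outputDir, boolean dynamicPartitioning, String partitionSpec) {
        // The guard: for static partitions, a missing output directory means
        // abortJob() already ran, so the partition must not be added.
        if (!dynamicPartitioning && !Files.exists(outputDir)) {
            return;
        }
        registered.add(partitionSpec);
    }

    public static void main(String[] args) throws IOException {
        Path failedOutput = Files.createTempDirectory("failed-job");
        abortJob(failedOutput);                       // job failed, directory removed
        cleanupJob(failedOutput, false, "ds=20120731");
        System.out.println("registered after failed job: " + registered);
    }
}
```

With the guard in place, a retry of the failed job would not find a stale partition left over from the earlier run. Moving the partition-add into commitJob() would make the check unnecessary, since commitJob() is only reached on success.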
Unit tests and e2e tests pass.
|Nitpick: print the new partitions before they are added. Prolly soon after you create ptnInfos. This way we know what ...||Francis Liu||Aug. 3, 2012, 6:16 p.m.||Open|
|We are not really backward compatible with Pig 0.8. Just clarify.||Francis Liu||Aug. 3, 2012, 6:16 p.m.||Open|
|this should be a log message.||Francis Liu||Aug. 3, 2012, 6:16 p.m.||Open|
|indentation||Francis Liu||Aug. 3, 2012, 6:16 p.m.||Open|