Partitions are created even when Jobs are aborted

Review Request #6219 - Created July 31, 2012 and updated

Vandana Ayyalasomayajula
0.4
hcatalog-451
Reviewers
hcatalog
hcatalog
If an MR job using HCatOutputFormat fails, and FileOutputCommitterContainer::abortJob() is called, one would expect that partitions aren't created/registered with HCatalog.

When using dynamic-partitions, one sees that this behaves correctly. But when static-partitions are used, partitions are created regardless of whether the Job succeeded or failed.
(This manifested as a failure when the job is repeated. The retry-job fails to launch since the partitions already exist from the last failed run.)

This is a result of bad code in FileOutputCommitterContainer::cleanupJob(), which seems to do an unconditional partition-add. This can be fixed by adding a check for the output directory before adding partitions (in the !dynamicPartitioning case), since the directory is removed in abortJob().
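A minimal sketch of the proposed guard for the static-partition case only. This is not the HCatalog code: CleanupGuardSketch, its in-memory "registered" list (standing in for the metastore), and the partition name are all hypothetical, and a local temp directory stands in for the job's output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for the committer; models static partitions only.
class CleanupGuardSketch {
    // Stand-in for the metastore: names of partitions registered so far.
    final List<String> registered = new ArrayList<>();

    // Mirrors abortJob(): the job's output directory is removed.
    void abortJob(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(outputDir)) {
            // Delete children before parents.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }

    // Mirrors cleanupJob(): reached on both the success and failure paths.
    void cleanupJob(Path outputDir, String partition) {
        // The guard: a missing output directory means abortJob() already ran,
        // so there is nothing to register.
        if (!Files.exists(outputDir)) {
            return;
        }
        registered.add(partition);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        CleanupGuardSketch committer = new CleanupGuardSketch();
        committer.abortJob(out);                  // simulate a failed job
        committer.cleanupJob(out, "ds=20120731"); // still called afterwards
        System.out.println("registered after abort: " + committer.registered);
        committer.abortJob(out);                  // no-op: already removed
    }
}
```

With the guard in place, a retry of the failed job would not find a leftover partition from the aborted run.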

We'll have a patch for this shortly. As an aside, we ought to move the partition-creation into commitJob(), where it logically belongs. cleanupJob() is deprecated and common to both success and failure code paths.
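To illustrate the aside about commitJob(), here is a rough sketch of the intended split, assuming the framework calls commitJob() only on success and abortJob() only on failure. CommitPathSketch, its finish() dispatcher, and the in-memory lists are hypothetical illustrations, not the patch itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a committer that registers partitions in commitJob().
class CommitPathSketch {
    // Stand-in for the metastore: names of partitions registered so far.
    final List<String> registered = new ArrayList<>();
    // Partitions the job intends to publish if it succeeds.
    private final List<String> pending;

    CommitPathSketch(List<String> pending) {
        this.pending = pending;
    }

    // Success-only path: the one place where partitions are registered.
    void commitJob() {
        registered.addAll(pending);
    }

    // Failure-only path: output is discarded and nothing is registered.
    void abortJob() {
        // Output removal is elided in this sketch.
    }

    // Simulates the framework's end-of-job dispatch (an assumption of this sketch).
    void finish(boolean jobSucceeded) {
        if (jobSucceeded) {
            commitJob();
        } else {
            abortJob();
        }
    }

    public static void main(String[] args) {
        CommitPathSketch failed = new CommitPathSketch(Arrays.asList("ds=20120731"));
        failed.finish(false);
        System.out.println("after failure: " + failed.registered);

        CommitPathSketch succeeded = new CommitPathSketch(Arrays.asList("ds=20120731"));
        succeeded.finish(true);
        System.out.println("after success: " + succeeded.registered);
    }
}
```

Because registration lives only on the success path, no directory-existence check is needed to tell the two outcomes apart.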
Unit tests and e2e tests pass.
Issue summary: Total 4, Open 4, Resolved 0, Dropped 0
Review request changed
Updated (Aug. 1, 2012, 11:35 p.m.)
Ship it!
Posted (Aug. 2, 2012, 6:09 p.m.)
Ship It!
Ship it!
Posted (Aug. 3, 2012, 6:16 p.m.)
Some minor changes, which can be made on commit.
nitpick: internalAbortJob() sounds better :-).
Nitpick: print the new partitions before they are added, probably right after you create ptnInfos. That way we know what was being added if the job fails, and if the job passes we know it was added. We don't need to print them after they are added.
We are not really backward compatible with Pig 0.8. Just clarify.
this should be a log message.
indentation