Review Board 1.7.22


ProgressReporter should work with both old and new MR API

Review Request #4971 - Created May 2, 2012 and updated

Travis Crawford
HCATALOG-373
Reviewers
hcatalog
francisliu
hcatalog-git
Update ProgressReporter to work with both old and new mapreduce API. Delay creating the base record reader so we have a StatusReporter and can use counters.
"ant clean test" passes

I can run pig+hcatalog queries using Elephant-Bird deprecated API wrappers, which is why this issue originally came up.
Review request changed
Updated (May 5, 2012, 1:23 a.m.)
Patch updated to include a test that counters exist when input has been read. Initially I went to add a test specific to this functionality but there was significant overlap with what already exists, so I updated an existing test to check that the bytes read counter is set when data has actually been read. This seems like the best approach as the ongoing maintenance will be low. The check includes a comment about why the check exists to help future developers.

Suggestions from the previous review have also been incorporated. Specifically, we reuse hcatSplit instead of recasting. As the whole setting properties loop is not used I removed that section.
Posted (May 11, 2012, 11:42 p.m.)
Looks good to me. Just a concern with the change in the test output directory. If you agree with any of my proposals, I can change that on commit. Will run e2e tests.
  1. Sure, either approach sounds fine. The main thing is being able to run in the IDE. Do you want to make the change at commit time or for me to make it?
  2. Cool. I don't mind either way.
  3. Hey Francis,
    
    What do you think about checking this in as-is, and we can address the logs directory in https://issues.apache.org/jira/browse/HCATALOG-374 ? There's a lot of inconsistent setting up of a test metastore and I'd like to consolidate that setup code in a base test class so updates like these are available to all tests. I posted a patch in that issue you can look at as an example of what I'm proposing.
  4. Hi Travis, if you're addressing this issue in a separate Jira. I'd prefer not checking in this portion of the patch as it might affect the automated builds.
  5. The tests pass locally so CI should not be affected. In general I'm not a fan of builds producing things outside the build directory, because it makes locating test data and logs more difficult, and the clean target needs to do more than simply deleting the build directory, so there are opportunities for clean to miss things. Putting files in directories with the test class name ensures state does not get shared between tests.
    
    Please let me know what you'd like me to do here so we can wrap this up. This patch has been going for over a month now and I'd really like to reach closure and move on to new issues.
  6. I see, I'm no fan either and since you're addressing it in a separate jira. I'd rather we make the changes there rather than making it point to potentially another external directory. I'll check this in without this section. Let me know if you're ok with that.
  7. That would be great. Thanks!
The warehouse directory is set in the ant test target. If you're doing this for runs in your IDE then it'd be better to have this check if the sysproperty hive.metastore.warehouse.dir is set to support both cases.

It'll probably better if you put the dir in /tmp similar to ant the config "/tmp/hcat_junit_warehouse". Or add ant into the name {user.home}/hive_build/...