PIG-3536 Implement DISTINCT for Pig-on-Tez
Review Request #15219 - Created Nov. 5, 2013
Implement DISTINCT for Pig-on-Tez by providing a (very straightforward) implementation in TezCompiler.java. For now, this does NOT use two optimizations that the MRCompiler performs. We will create a separate JIRA for these optimizations:
1. A distinct combiner
2. A combiner optimizer that replaces certain uses of DISTINCT with an algebraic UDF
[Little code note: I renamed getPlainForEach to getForEachPlain so that getForEachHelper1, getForEachHelper2, etc. all sort next to each other alphabetically. Sorry if that's a little too OCD.]
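To illustrate what the first deferred optimization (a distinct combiner) buys, here is a small stand-alone sketch. It is not Pig, Tez, or MRCompiler code: plain Java collections stand in for map and reduce tasks, and the class and method names (`DistinctCombinerSketch`, `combine`, `reduce`, `shuffleSize`) are hypothetical. The idea shown is that each map-side partition drops its local duplicates before the shuffle, and the reducer then removes duplicates that span partitions, so the final result matches a plain DISTINCT while fewer records are shuffled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not Pig's actual implementation) of a "distinct
 * combiner": local de-duplication before the shuffle, global
 * de-duplication after it.
 */
public class DistinctCombinerSketch {

    // Combiner step: remove duplicates within one map-side partition,
    // keeping first-seen order.
    static List<String> combine(List<String> partition) {
        return new ArrayList<>(new LinkedHashSet<>(partition));
    }

    // Reduce step: merge all partitions and remove duplicates that occur
    // across partitions. TreeSet mimics the sorted key order a reducer
    // sees after the shuffle.
    static List<String> reduce(List<List<String>> shuffled) {
        Set<String> keys = new TreeSet<>();
        for (List<String> part : shuffled) {
            keys.addAll(part);
        }
        return new ArrayList<>(keys);
    }

    // Total number of records that would cross the shuffle.
    static int shuffleSize(List<List<String>> parts) {
        int n = 0;
        for (List<String> p : parts) {
            n += p.size();
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<String>> mapOutputs = List.of(
                List.of("alice", "bob", "alice", "alice"),
                List.of("bob", "carol", "carol"));

        // Apply the combiner to each partition independently.
        List<List<String>> combined = new ArrayList<>();
        for (List<String> p : mapOutputs) {
            combined.add(combine(p));
        }

        System.out.println("shuffled without combiner: " + shuffleSize(mapOutputs));
        System.out.println("shuffled with combiner: " + shuffleSize(combined));
        System.out.println("distinct without combiner: " + reduce(mapOutputs));
        System.out.println("distinct with combiner: " + reduce(combined));
    }
}
```

Running `main` shows 7 records shuffled without the combiner versus 4 with it, while both paths produce `[alice, bob, carol]`. The saving grows with the number of duplicates per partition, which is why it is worth a follow-up JIRA even though the straightforward implementation is already correct.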
This patch includes:
- A unit test in TestTezCompiler.java
- An e2e test
DANIEL: Can you check that my e2e test looks appropriate? I wasn't sure which test data set to choose, so I just picked studenttab20m.