PIG-3536 Implement DISTINCT for Pig-on-Tez

Review Request #15219 - Created Nov. 5, 2013 and updated

Submitter: Alex Bain
Branch: tez
Bugs: PIG-3536
Reviewers
Groups: pig
People: cheolsoo, daijy, mwagner, rohini
Repository: pig-git
Implement DISTINCT for Pig-on-Tez by providing a (very straightforward) implementation in TezCompiler.java.
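
For readers unfamiliar with how DISTINCT maps onto a shuffle, here is a minimal standalone sketch of the general idea: partition on the whole tuple so equal tuples meet in the same place, then emit each tuple once. This is not the TezCompiler code from the patch. The class and method names (DistinctShuffleSketch, shuffle, dedupe, distinct) are hypothetical, and tuples are modeled as plain lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: models DISTINCT as "shuffle on the whole tuple,
// then emit each tuple once". No name here comes from Pig or Tez.
class DistinctShuffleSketch {

    // Route each tuple to a partition by hashing the entire tuple, so
    // equal tuples always land in the same partition.
    static Map<Integer, List<List<String>>> shuffle(List<List<String>> tuples, int partitions) {
        Map<Integer, List<List<String>>> out = new TreeMap<>();
        for (List<String> t : tuples) {
            int p = Math.floorMod(t.hashCode(), partitions);
            out.computeIfAbsent(p, k -> new ArrayList<>()).add(t);
        }
        return out;
    }

    // Within one partition, keep only the first occurrence of each tuple.
    static List<List<String>> dedupe(List<List<String>> partition) {
        return new ArrayList<>(new LinkedHashSet<>(partition));
    }

    // Full pipeline: shuffle, then dedupe every partition independently.
    static List<List<String>> distinct(List<List<String>> tuples, int partitions) {
        List<List<String>> result = new ArrayList<>();
        for (List<List<String>> part : shuffle(tuples, partitions).values()) {
            result.addAll(dedupe(part));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> input = Arrays.asList(
                Arrays.asList("alice", "20"),
                Arrays.asList("bob", "21"),
                Arrays.asList("alice", "20"));
        System.out.println(distinct(input, 2));
    }
}
```

Because equal tuples hash to the same partition, deduplicating each partition on its own is enough to make the overall result duplicate-free. The output order depends on the hash values, so the sketch makes no ordering guarantee.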

For the moment, this does NOT use two optimizations done in the MRCompiler. We will create a separate JIRA for these optimizations:
1. A distinct combiner
2. A combiner optimizer that replaces certain uses of DISTINCT with an algebraic UDF
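
To illustrate why the first optimization matters, here is a rough sketch of what a distinct combiner buys in general: each task drops its own duplicates before the shuffle, so fewer records cross the network, and the final step only has to remove duplicates that span tasks. This is an assumption-laden illustration, not the MRCompiler's actual combiner. The names DistinctCombinerSketch, combine, and reduce are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Illustrative only: shows the effect of deduplicating per task before
// the shuffle. No name here comes from Pig, MapReduce, or Tez.
class DistinctCombinerSketch {

    // Combiner step: drop duplicates within a single task's output.
    static List<String> combine(List<String> taskOutput) {
        return new ArrayList<>(new LinkedHashSet<>(taskOutput));
    }

    // Final step: merge the combined outputs and drop duplicates that
    // appear in more than one task.
    static List<String> reduce(List<List<String>> shuffled) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (List<String> part : shuffled) {
            seen.addAll(part);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> task1 = Arrays.asList("a", "a", "b");
        List<String> task2 = Arrays.asList("b", "b", "c");

        List<List<String>> shuffled = Arrays.asList(combine(task1), combine(task2));
        int before = task1.size() + task2.size();
        int after = shuffled.get(0).size() + shuffled.get(1).size();

        System.out.println("records shuffled without combiner: " + before);
        System.out.println("records shuffled with combiner: " + after);
        System.out.println("distinct result: " + reduce(shuffled));
    }
}
```

The result is the same with or without the combine step; only the number of shuffled records changes, which is why it can safely be deferred to a separate JIRA.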

[Little code note: I changed the name of getPlainForEach to getForEachPlain. That way we can have getForEachHelper1, getForEachHelper2, etc. all follow alphabetically. Sorry if that's a little too OCD.]

This patch includes:
- A unit test in TestTezCompiler.java
- An e2e test

DANIEL: Can you check that my e2e test looks appropriate? I wasn't sure which test data set to choose, so I just picked studenttab20m.
Issues: 1 total (0 open, 1 resolved, 0 dropped)
Ship it!
Posted (Nov. 5, 2013, 5:19 a.m.)
Looks great! Thank you Alex!

Please let me fix the typo below when I commit it.
test/e2e/pig/tests/tez.conf (Diff revision 2)
 
 
This should be 2.