PIG-3555 Initial implementation of Tez combiner optimization

Review Request #15261 - Created Nov. 6, 2013 and submitted

Cheolsoo Park
tez
PIG-3555
Reviewers
pig
abain, daijy, mwagner, rohini
pig-git
Initial implementation of the Tez combiner optimizer. The patch includes the following changes:
* Factored out the CombinerOptimizer code into a utility class called CombinerOptimizerUtil, so that both the MR and Tez CombinerOptimizer use this utility class instead of duplicating code.
* Introduced a new class called TezEdgeDescriptor that holds combine plans as well as various edge properties.
* Added TezEdgeDescriptors to TezOperator. Note that I added multiple descriptors for inbound edges but a single descriptor for all the outbound edges. This is because TezDagBuilder always creates an edge by connecting predecessors to the current vertex. Please let me know if you think we should allow multiple descriptors for outbound edges too.
* Refactored some code in TezDagBuilder while touching it.
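The descriptor layout described in the list above can be pictured with a small standalone sketch. This is not the code from the patch: the classes `EdgeDescriptor` and `Operator`, the fields `combinePlan` and `dataMovement`, and the method `buildEdge` are hypothetical stand-ins for TezEdgeDescriptor, TezOperator, and the edge-building step in TezDagBuilder, and the combine plan is reduced to a plain string. It only illustrates why one descriptor per inbound edge is sufficient when edges are always created from a predecessor into the current vertex.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EdgeDescriptorSketch {

    /** Hypothetical stand-in for TezEdgeDescriptor: a combine plan plus edge properties. */
    public static class EdgeDescriptor {
        // Placeholder for a combine plan; null means no combiner on this edge.
        public String combinePlan;
        // Illustrative edge property; the real class may hold different ones.
        public String dataMovement = "SCATTER_GATHER";
    }

    /** Hypothetical stand-in for TezOperator. */
    public static class Operator {
        public final String name;
        // One descriptor per inbound edge, keyed by predecessor name.
        public final Map<String, EdgeDescriptor> inEdges = new LinkedHashMap<>();
        // A single descriptor shared by all outbound edges.
        public final EdgeDescriptor outEdge = new EdgeDescriptor();

        public Operator(String name) {
            this.name = name;
        }
    }

    /** Describes an edge built from a predecessor into the current vertex. */
    public static String buildEdge(Operator pred, Operator current) {
        // The edge is configured from the current vertex's inbound descriptor.
        EdgeDescriptor d = current.inEdges.get(pred.name);
        if (d == null) {
            throw new IllegalArgumentException(
                pred.name + " is not a predecessor of " + current.name);
        }
        String combiner = (d.combinePlan == null) ? "none" : d.combinePlan;
        return pred.name + " -> " + current.name
            + " [" + d.dataMovement + ", combiner=" + combiner + "]";
    }

    public static void main(String[] args) {
        Operator load = new Operator("load");
        Operator group = new Operator("group");
        EdgeDescriptor edge = new EdgeDescriptor();
        edge.combinePlan = "partial COUNT";
        group.inEdges.put(load.name, edge);
        System.out.println(buildEdge(load, group));
    }
}
```

In this sketch the builder never reads `outEdge` when wiring a vertex to its predecessors, which mirrors the reasoning given above for keeping a single outbound descriptor.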
ant test-tez passes.
ant test-e2e-tez passes.

I didn't add new test cases, but an e2e test case (Checkin_3) includes an algebraic UDF (COUNT) following a group-by. I also manually tested it on a live cluster.
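For readers unfamiliar with why an algebraic UDF after a group-by exercises the combiner, the idea can be sketched in plain Java. This is only an illustration under simplifying assumptions: the method names `initial`, `intermediate`, `finalStep`, and `countWithCombiner` are hypothetical, records are plain strings, and nothing here uses Pig's actual UDF classes. A count splits into a per-record step, a merge step that can run on partial results before the shuffle, and a final merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AlgebraicCountSketch {

    // Initial step: each input record contributes a partial count of 1.
    static long initial(String record) {
        return 1L;
    }

    // Intermediate step: merges partial counts. Because it maps partial
    // counts to a partial count, it can run as a combiner any number of times.
    static long intermediate(List<Long> partials) {
        long sum = 0L;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final step: merges whatever partial counts reach the consumer side.
    static long finalStep(List<Long> partials) {
        return intermediate(partials);
    }

    // Simulates one combiner pass per input split, then a final merge.
    public static long countWithCombiner(List<List<String>> splits) {
        List<Long> combined = new ArrayList<>();
        for (List<String> split : splits) {
            List<Long> partials = new ArrayList<>();
            for (String record : split) {
                partials.add(initial(record));
            }
            combined.add(intermediate(partials));
        }
        return finalStep(combined);
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("d", "e"));
        System.out.println(countWithCombiner(splits)); // prints 5
    }
}
```

Only one partial count per split crosses to the final step here, which is the data reduction a combiner is meant to provide.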
Issues: 16 total, 0 open, 11 resolved, 5 dropped
Review request changed
Updated (Nov. 12, 2013, 12:34 a.m.)
  • changed from pending to submitted