PIG-3555 Initial implementation of Tez combiner optimization

Review Request #15261 - Created Nov. 6, 2013 and submitted

Submitter: Cheolsoo Park
Branch: tez
Bugs: PIG-3555
Reviewers
Groups: pig
People: abain, daijy, mwagner, rohini
Repository: pig-git
Initial implementation of the Tez combiner optimizer. The patch includes the following changes:
* Factored the CombinerOptimizer code out into a utility class called CombinerOptimizerUtil, so that both the MR and Tez CombinerOptimizer use this utility class instead of duplicating code.
* Introduced a new class called TezEdgeDescriptor that holds combine plans as well as various edge properties.
* Added TezEdgeDescriptors to TezOperator. Note that I added multiple descriptors for inbound edges but a single descriptor for all the outbound edges. This is because TezDagBuilder always creates an edge by connecting predecessors to the current vertex. Please let me know if you think we should allow multiple descriptors for outbound edges too.
* Refactored some code in TezDagBuilder while touching it.
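To make the inbound/outbound asymmetry described above easier to discuss, here is a minimal sketch of the layout. It is not the code from the patch: EdgeDescriptorSketch, OperatorSketch, DagBuilderSketch, the dataMovement field, and the use of a plain String as the combine plan are all hypothetical stand-ins for TezEdgeDescriptor, TezOperator, and TezDagBuilder. It only illustrates why per-predecessor inbound descriptors are enough when edges are always built from the predecessors into the current vertex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TezEdgeDescriptor: a combine plan plus edge properties.
class EdgeDescriptorSketch {
    // Placeholder for a combine plan; null means "no combiner on this edge".
    String combinePlan;
    // Illustrative edge property only; the real descriptor's fields are not shown in this review.
    String dataMovement = "SCATTER_GATHER";

    EdgeDescriptorSketch(String combinePlan) {
        this.combinePlan = combinePlan;
    }
}

// Hypothetical stand-in for TezOperator.
class OperatorSketch {
    final String name;
    // Multiple inbound descriptors, keyed here by predecessor name (an assumption).
    final Map<String, EdgeDescriptorSketch> inEdges = new LinkedHashMap<>();
    // A single descriptor shared by all outbound edges.
    EdgeDescriptorSketch outEdge = new EdgeDescriptorSketch(null);

    OperatorSketch(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the edge-building step of TezDagBuilder.
class DagBuilderSketch {
    // Creates one edge per predecessor, always pointing into the current vertex,
    // and reads the combine plan from the current vertex's inbound descriptor.
    static List<String> buildInboundEdges(OperatorSketch current, List<OperatorSketch> predecessors) {
        List<String> edges = new ArrayList<>();
        for (OperatorSketch pred : predecessors) {
            EdgeDescriptorSketch d = current.inEdges.get(pred.name);
            String combiner = (d == null || d.combinePlan == null) ? "none" : d.combinePlan;
            edges.add(pred.name + " -> " + current.name + " [combiner=" + combiner + "]");
        }
        return edges;
    }

    public static void main(String[] args) {
        OperatorSketch load1 = new OperatorSketch("load1");
        OperatorSketch load2 = new OperatorSketch("load2");
        OperatorSketch group = new OperatorSketch("group");
        // Only the edge from load1 carries a combine plan in this example.
        group.inEdges.put("load1", new EdgeDescriptorSketch("partial-count"));

        for (String edge : buildInboundEdges(group, Arrays.asList(load1, load2))) {
            System.out.println(edge);
        }
    }
}
```

In this sketch the outbound descriptor is never consulted while building edges, which is the situation the note above describes. Allowing multiple outbound descriptors would only matter if edges were also built from the source side.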
ant test-tez passes.
ant test-e2e-tez passes.

I didn't add new test cases, but an existing e2e test case (Checkin_3) includes an algebraic UDF (COUNT) following a group-by. I also tested the patch manually on a live cluster.
src/org/apache/pig/PigServer.java
Revision c0826ea New Change
@@ -180,10 +180,14 @@
      */
     public PigServer(String execTypeString) throws ExecException, IOException {
         this(addExecTypeProperty(PropertiesUtil.loadDefaultProperties(), execTypeString));
     }
 
+    public PigServer(String execTypeString, Properties properties) throws ExecException, IOException {
+        this(addExecTypeProperty(properties, execTypeString));
+    }
+
     public PigServer(Properties properties) throws ExecException, IOException {
         this(new PigContext(properties));
     }
 
     private static Properties addExecTypeProperty(Properties properties, String execType) {
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java
Revision 18a382b New Change
 
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRExecType.java
Revision 07d737d New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/CombinerOptimizer.java
Revision e69de29 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
Revision 0b1f3c9 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
Revision 45e47b0 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/TezEdgeDescriptor.java
Revision e69de29 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java
Revision 3f14644 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java
Revision e612d88 New Change
 
src/org/apache/pig/backend/hadoop/executionengine/tez/TezPrinter.java
Revision 5a42ded New Change
 
src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java
Revision e69de29 New Change
 
test/tez-tests
Revision c22a448 New Change
 
test/org/apache/pig/test/TestCombiner.java
Revision 6252b51 New Change
 
test/org/apache/pig/test/data/GoldenFiles/TEZC1.gld
Revision 925f07e New Change
 
test/org/apache/pig/test/data/GoldenFiles/TEZC2.gld
Revision a3974fe New Change
 
test/org/apache/pig/test/data/GoldenFiles/TEZC3.gld
Revision a8c942b New Change
 
test/org/apache/pig/test/data/GoldenFiles/TEZC4.gld
Revision fb7c903 New Change
 
test/org/apache/pig/test/data/GoldenFiles/TEZC5.gld
Revision e6cd25e New Change
 