

SchemaTuple in Pig

Review Request #4651 - Created April 5, 2012 and updated

Jonathan Coveney
PIG-2632
Reviewers
pig
julien
pig
This work builds on Dmitriy's PrimitiveTuple work. The idea is that, because the Schema is known on the frontend, we can code-generate Tuples specialized to that Schema. In rudimentary tests, memory efficiency is 2-4x better and the serialized form is ~15% smaller (though this depends heavily on the data). I still need to run get/set tests, but assuming performance is on par with (or even faster than) Tuple, the memory gain is huge.
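To make the idea concrete, here is a minimal sketch of how a generic tuple differs from a class generated for one known schema. It is not taken from the patch: the class names, the example schema (id:int, count:long), and the null-bitmask layout are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaTupleSketch {

    /** Generic tuple: every field is a boxed Object held in a list. */
    static final class GenericTuple {
        private final List<Object> fields = new ArrayList<>();

        GenericTuple(int size) {
            for (int i = 0; i < size; i++) {
                fields.add(null);
            }
        }

        Object get(int i) { return fields.get(i); }

        void set(int i, Object value) { fields.set(i, value); }
    }

    /**
     * Hypothetical output of code generation for the schema (id:int, count:long).
     * Values live in primitive fields; one bit per field records nullness.
     */
    static final class IntLongTuple {
        private int id;
        private long count;
        private byte nullBits = 0b11; // bit i set => field i is null

        Object get(int i) {
            if (i < 0 || i > 1) {
                throw new IndexOutOfBoundsException("field " + i);
            }
            if ((nullBits & (1 << i)) != 0) {
                return null;
            }
            return i == 0 ? (Object) id : (Object) count;
        }

        void set(int i, Object value) {
            if (i < 0 || i > 1) {
                throw new IndexOutOfBoundsException("field " + i);
            }
            if (value == null) {
                nullBits |= (1 << i);
                return;
            }
            if (i == 0) {
                id = (Integer) value;
            } else {
                count = (Long) value;
            }
            nullBits &= ~(1 << i);
        }
    }

    public static void main(String[] args) {
        GenericTuple generic = new GenericTuple(2);
        generic.set(0, 7);   // stored as a boxed Integer
        generic.set(1, 9L);  // stored as a boxed Long

        IntLongTuple generated = new IntLongTuple();
        generated.set(0, 7);  // unboxed into an int field
        generated.set(1, 9L); // unboxed into a long field

        System.out.println(generic.get(0) + " " + generic.get(1));
        System.out.println(generated.get(0) + " " + generated.get(1));
    }
}
```

In this sketch the generated class stores an int, a long, and one byte of null flags directly in the object, whereas the generic tuple holds a list plus a separately allocated boxed object per field. That difference is one plausible source of the memory savings described above, though the real numbers depend on the JVM and the data.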

Need to clean up the code and add tests.

Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
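As a rough illustration of why a known schema can shrink the serialized form, the sketch below compares a self-describing encoding (a field count plus a type tag before each value) with one that writes only the values. The tag constants and both layouts are hypothetical and are not Pig's actual wire format; null handling is left out to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SchemaSerializationSketch {

    // Hypothetical type tags, invented for this example.
    static final byte TAG_INT = 1;
    static final byte TAG_LONG = 2;

    /** Self-describing encoding: field count, then a tag byte before each value. */
    static byte[] writeGeneric(int id, long count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(2);          // number of fields
        out.writeByte(TAG_INT);
        out.writeInt(id);
        out.writeByte(TAG_LONG);
        out.writeLong(count);
        out.flush();
        return bytes.toByteArray();
    }

    /** Schema-aware encoding: reader and writer agree on the layout, so only values are written. */
    static byte[] writeWithSchema(int id, long count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(id);
        out.writeLong(count);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("generic bytes:      " + writeGeneric(7, 9L).length);
        System.out.println("schema-aware bytes: " + writeWithSchema(7, 9L).length);
    }
}
```

For this toy layout the two encodings come to 15 and 12 bytes per tuple. The per-tuple overhead repeats for every tuple in a bag, which is consistent with the expectation that a SchemaBag would save more, but the sketch says nothing about the actual savings in Pig.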

Needs tests and comments, but I want the code to settle a bit.

 

Diff revision 10 (Latest)


  1. trunk/.gitignore
  2. trunk/conf/pig.properties
  3. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
  4. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
  5. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
  6. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java
  7. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
  8. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
  9. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
  10. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
  11. trunk/src/org/apache/pig/builtin/mock/Storage.java
  12. trunk/src/org/apache/pig/data/AppendableSchemaTuple.java
  13. trunk/src/org/apache/pig/data/BinInterSedes.java
  14. trunk/src/org/apache/pig/data/BinSedesTupleFactory.java
  15. trunk/src/org/apache/pig/data/DataByteArray.java
  16. trunk/src/org/apache/pig/data/FieldIsNullException.java
  17. trunk/src/org/apache/pig/data/PBooleanTuple.java
  18. trunk/src/org/apache/pig/data/PDoubleTuple.java
  19. trunk/src/org/apache/pig/data/PFloatTuple.java
  20. trunk/src/org/apache/pig/data/PIntTuple.java
This diff has been split across 3 pages.
trunk/.gitignore
Revision 1355561 New Change
 *~
 build/
 src-gen/
 test/org/apache/pig/test/utils/dotGraph/parser/
 target/
 ivy/*.jar
 pig.jar
 pig-withouthadoop.jar
 *.iml
 *.ipr
 *.iws
 *.patch
 *.log
 *.orig
 *.rej
 *.class
+*.classpath
trunk/conf/pig.properties
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapReduce.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigTupleDefaultRawComparator.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/builtin/mock/Storage.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/AppendableSchemaTuple.java
New File
 
trunk/src/org/apache/pig/data/BinInterSedes.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/BinSedesTupleFactory.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/DataByteArray.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/FieldIsNullException.java
New File
 
trunk/src/org/apache/pig/data/PBooleanTuple.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/PDoubleTuple.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/PFloatTuple.java
Revision 1355561 New Change
 
trunk/src/org/apache/pig/data/PIntTuple.java
Revision 1355561 New Change
 