PIG-3585 Implement union in Tez
Review Request #15931 - Created Dec. 1, 2013 and submitted
This patch implements union as follows: load vertices -> broadcast edges -> union vertex. The changes include:
* In the front-end, TezCompiler converts POUnion into a new vertex and connects it to its predecessors with broadcast edges.
* In the back-end, a new POPackage class called POBroadcastTezLoad is added. This class implements the TezLoad interface; it pulls every record from the ShuffledUnorderedKVInputs in order and unions them.
* A new e2e test case is added.
* ant test-tez passes.
* All e2e tests pass.
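The back-end behavior described in this request (draining every record from each input in order and emitting them as one stream) could be sketched roughly as below. This is only an illustration under stated assumptions: it uses plain java.util iterators in place of the Tez inputs, and the names UnionSketch, ConcatIterator and unionAll are hypothetical, not taken from the patch or from the POBroadcastTezLoad class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the actual POBroadcastTezLoad implementation.
public class UnionSketch {

    // Walks several inputs one after another, draining each input fully
    // before moving on to the next one.
    static final class ConcatIterator<T> implements Iterator<T> {
        private final Iterator<Iterator<T>> inputs;
        private Iterator<T> current;

        ConcatIterator(List<Iterator<T>> inputs) {
            this.inputs = inputs.iterator();
        }

        @Override
        public boolean hasNext() {
            // Skip over exhausted (or empty) inputs.
            while ((current == null || !current.hasNext()) && inputs.hasNext()) {
                current = inputs.next();
            }
            return current != null && current.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    // Collects the union of all inputs, preserving input order.
    static <T> List<T> unionAll(List<Iterator<T>> inputs) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = new ConcatIterator<>(inputs);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two stand-in "load" inputs, as in: c = union a, b
        List<Iterator<String>> inputs = Arrays.asList(
                Arrays.asList("a1", "a2").iterator(),
                Arrays.asList("b1").iterator());
        System.out.println(unionAll(inputs)); // prints [a1, a2, b1]
    }
}
```

The sketch keeps no per-record state beyond the current input, which matches the idea that a union only needs to forward records rather than group or sort them.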
Posted (Dec. 1, 2013, 5:06 p.m.)
The code is fine if we have a union after some processing. But for the simple load-and-union case below, this will create 3 vertices: 2 load vertices and one union vertex.

    a = load 'a'
    b = load 'b'
    c = union a, b

In MR, this is handled in a simple map:

    C: Store(/tmp/tezout:PigStorage) - scope-23
    |
    |---C: Union[bag] - scope-22
        |
        |---A: New For Each(false,false,false)[bag] - scope-10
        |   |   |
        |   |   ..........
        |
        |---B: New For Each(false,false,false)[bag] - scope-21
            |   |
            |   .........
            |
            |---B: Load(/tmp/data:org.apache.pig.builtin.PigStorage) - scope-11--------

We should also try to do that in a single vertex to be more optimal. We can handle that in a separate jira, though.
Review request changed
Updated (Dec. 1, 2013, 11:14 p.m.)
- changed from pending to submitted