Join And CoGroup
Chris Lu edited this page Oct 13, 2016
·
2 revisions
Join() and CoGroup() both combine rows from two datasets that share the same key.
Suppose we are joining a left dataset and a right dataset.
left:
key1, value1, value2
key1, value3, value4
key2, value5, value6
right:
key1, value7, value8
key2, value9, value10
operation: left.Join(right)
The final join results:
key1, value1, value2, value7, value8
key1, value3, value4, value7, value8
key2, value5, value6, value9, value10
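The join semantics above can be sketched in plain Python. This is only an illustration of the result shape, not the library's implementation: the `join` helper is hypothetical, it treats the first field of each row as the key, and it assumes that rows without a match on the other side are dropped, which the example above does not cover.

```python
from collections import defaultdict


def join(left, right):
    """Hypothetical helper: join two lists of rows on their first field."""
    # Index the right-hand rows by key.
    right_by_key = defaultdict(list)
    for key, *values in right:
        right_by_key[key].append(values)

    joined = []
    for key, *values in left:
        # Emit one output row per matching right-hand row.
        # Assumption: left rows with no match produce no output.
        for right_values in right_by_key.get(key, []):
            joined.append([key, *values, *right_values])
    return joined


left = [
    ["key1", "value1", "value2"],
    ["key1", "value3", "value4"],
    ["key2", "value5", "value6"],
]
right = [
    ["key1", "value7", "value8"],
    ["key2", "value9", "value10"],
]

result = join(left, right)
for row in result:
    print(", ".join(row))
```

Each left row is paired with every right row that has the same key, so the two key1 rows on the left each pick up value7 and value8.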
If we CoGroup() the same left and right datasets:
left:
key1, value1, value2
key1, value3, value4
key2, value5, value6
right:
key1, value7, value8
key2, value9, value10
operation: left.CoGroup(right)
The final cogroup results:
key1, {{value1, value2}, {value3, value4}}, {{value7, value8}}
key2, {{value5, value6}}, {{value9, value10}}
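A minimal Python sketch of the cogroup result shape follows. The `cogroup` helper is hypothetical and is not the library's code. It assumes the first field is the key, that output is ordered by key, and that a key present on only one side gets an empty list for the other side; none of these details are stated above.

```python
def cogroup(left, right):
    """Hypothetical helper: collect the values of both datasets per key."""
    groups = {}
    for side, rows in enumerate((left, right)):
        for key, *values in rows:
            # Each key maps to a pair: (left value lists, right value lists).
            groups.setdefault(key, ([], []))[side].append(values)
    # Assumption: results are emitted in key order.
    return [[key, lefts, rights] for key, (lefts, rights) in sorted(groups.items())]


left = [
    ["key1", "value1", "value2"],
    ["key1", "value3", "value4"],
    ["key2", "value5", "value6"],
]
right = [
    ["key1", "value7", "value8"],
    ["key2", "value9", "value10"],
]

result = cogroup(left, right)
for key, lefts, rights in result:
    print(key, lefts, rights)
```

Unlike the join, each key appears exactly once in the output, with all of its left rows and all of its right rows kept as separate lists.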
GroupBy() can group on one or more selected fields, identified by their 1-based positions. The key fields are moved to the front of each output row.
before:
key1, value1, key2, value2
key1, value3, key2, value4
key3, value5, key2, value6
operation:
GroupBy(1,3)
after:
key1, key2, [[value1, value2],[value3, value4]]
key3, key2, [[value5, value6]]
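The grouping above can be illustrated with a short Python sketch. The `group_by` helper is hypothetical and only mirrors the before and after rows shown here. It assumes 1-based field positions, as the GroupBy(1,3) call suggests, and it emits groups in first-seen order, which is an assumption.

```python
def group_by(rows, *indexes):
    """Hypothetical helper: group rows on the given 1-based field positions."""
    key_positions = [i - 1 for i in indexes]  # convert to 0-based
    groups = {}
    for row in rows:
        # The key fields move to the front; the remaining fields are grouped.
        key = tuple(row[p] for p in key_positions)
        rest = [v for p, v in enumerate(row) if p not in key_positions]
        groups.setdefault(key, []).append(rest)
    return [[*key, values] for key, values in groups.items()]


rows = [
    ["key1", "value1", "key2", "value2"],
    ["key1", "value3", "key2", "value4"],
    ["key3", "value5", "key2", "value6"],
]

result = group_by(rows, 1, 3)
for row in result:
    print(row)
```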
As a special case, if a row has only one non-key field, the grouped values are flattened by one level.
before:
key1, value1, key2
key1, value3, key2
key3, value5, key2
operation:
GroupBy(1,3)
after:
key1, key2, [value1, value3]
key3, key2, [value5]
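The flattening rule can be sketched the same way. The `group_by_flat` helper below is hypothetical. It assumes 1-based field positions and first-seen group order, and it stores a lone non-key field directly instead of wrapping it in a one-item list.

```python
def group_by_flat(rows, *indexes):
    """Hypothetical helper: group rows, flattening a single non-key field."""
    key_positions = [i - 1 for i in indexes]  # convert to 0-based
    groups = {}
    for row in rows:
        key = tuple(row[p] for p in key_positions)
        rest = [v for p, v in enumerate(row) if p not in key_positions]
        # A single non-key field is stored as-is, not as a one-item list.
        groups.setdefault(key, []).append(rest[0] if len(rest) == 1 else rest)
    return [[*key, values] for key, values in groups.items()]


rows = [
    ["key1", "value1", "key2"],
    ["key1", "value3", "key2"],
    ["key3", "value5", "key2"],
]

result = group_by_flat(rows, 1, 3)
for row in result:
    print(row)
```

With two or more non-key fields the helper falls back to nested lists, so the flattening only applies in the single-field case.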