Implementation of database relation join operators - Hash Join and Sort Merge Join
Given M memory blocks and two large relations R(X,Y)and S(Y,Z). There are iterators for the following operations.
- SortMerge Join
- open()- Create sorted sublists for R and S, each of size M blocks.
- getnext()- Use 1 block for each sublist and get minimum of R & S. Joins this minimum Y value with the other table and return. Checks for B(R)+B(S)<M^2
- close()- close all files
- Hash Join
- open()- Create M1 hashed sublists for R and S
- getnext()- For each Ri and Si thus created, loads the smaller of the two in the main memory and creates a search structure over it. Then recursively loads the other file in the remaining blocks and for each record of this file, search corresponding records (with same join attribute value) from the other file. Checks for min(B(R),B(S))<M^2
- close()- close all files
Join condition is (R.Y==S.Y). One block is used for output which is filled by row returned by getnext() and when it gets full, appends it to the output file and continues.
- Path to file containing relation R
- Path to file containing relation S
- Type of join sort/hash
- Number of blocks
Note that all attributes, X, Y and Z must be strings and Y may be a non-key attribute.
Assumed that each block can store 100 tuples for both relations, R and S.
The bash script named run.sh
is used to compile and run your code. The command for execution would be of the form:
run.sh <path of R file> <path of S file><sort/hash> <M>
<R filename>_<S filename>_join.txt
(Kindly note it contains only R & S base filename and not their path).
Use the generator folder to generate sample inputs.