
shigatan/FindDuplicatesInCSV


The app finds rows in a CSV file that have duplicate values in a specified column.

Algorithm

Concerns

  1. File size
    • If the file is small => stream through the data once, counting occurrences of the target column's values in a hash map, and output each row whose value we have met before (see the sketch after this list). Time complexity O(n), space complexity O(n).
    • If the file is large => its contents may not fit in memory => external merge sort can be used to solve this problem.
  2. Nature of the input data
    • If the values of the key column are unique with high probability (like GUIDs) => the hashmap-based algorithm stores a key for every value, so its memory consumption grows toward the size of the whole input file.
    • If the contents of the key column are known to be limited to some small set of possible values => then even for large files that do not fit in memory, we can use the hash map.
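
For illustration, a minimal sketch of the small-file path. The repository's language is not shown on this page, so Java, the class name, and the method signature are all assumptions; the naive comma split also ignores quoted fields.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SmallFileDuplicates {
    // Single pass: count occurrences of the key-column value and print every
    // row whose value has been seen before. O(n) time, O(n) space.
    public static void printDuplicates(String csvPath, int keyColumn) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(",", -1);    // naive split: no quoted-comma handling
                if (keyColumn >= cells.length) continue; // row is missing the key column
                if (counts.merge(cells[keyColumn], 1, Integer::sum) > 1) {
                    System.out.println(line);            // value met before => duplicate row
                }
            }
        }
    }
}
```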

External merge sort

  • Step 1. Divide the file into chunks that fit into memory (the chunk size can be adjusted).
  • Step 2. Sort each chunk and save the result into a temp file. Sorting can be implemented with any well-known algorithm; I used a sorted hash map, where inserting an entry takes O(log n) time, with O(n) extra space. (Steps 1-2 are sketched first after this list.)
  • Step 3. Read a limited piece of data from each chunk file. In my implementation each chunk has an associated queue, and I read the data into these queues.
  • Step 4. Compare the first values in the queues and find the queue with the minimum value - the target value.
  • Step 5. Starting from this queue, push elements to the output while their values equal the target value. When a queue's elements are exhausted, grab a new piece of data from its file or move to the next queue.
  • Step 6. When no queue has values equal to the target value left to output, go back to step 4. (The merge phase is sketched second after this list.)
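
A minimal Java sketch of steps 1 and 2 (again, the language and every name below are assumptions, not taken from the repository; a plain list sort stands in for the sorted-hash-map insertions described above, with the same O(n log n) total cost per chunk):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChunkSorter {
    // Steps 1-2: split the CSV into chunks of at most maxRows rows, sort each
    // chunk by the key column, and write it to its own temp file.
    static List<Path> splitAndSort(Path csv, int keyColumn, int maxRows) throws IOException {
        List<Path> chunkFiles = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csv)) {
            List<String> chunk = new ArrayList<>(maxRows);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == maxRows) {               // chunk is full: sort and flush
                    chunkFiles.add(writeSorted(chunk, keyColumn));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {                          // flush the final partial chunk
                chunkFiles.add(writeSorted(chunk, keyColumn));
            }
        }
        return chunkFiles;
    }

    private static Path writeSorted(List<String> rows, int keyColumn) throws IOException {
        // In-memory sort by the key column's string value; assumes every row
        // actually has that column.
        rows.sort(Comparator.comparing(r -> r.split(",", -1)[keyColumn]));
        Path tmp = Files.createTempFile("chunk", ".csv");
        Files.write(tmp, rows);
        return tmp;
    }
}
```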
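
And a sketch of the merge phase (steps 3-6). Instead of queues holding batches of rows, it keeps a one-row lookahead per chunk reader, which plays the same role; rows with the smallest key are drained from every chunk into a group, and the group is printed only when it contains more than one row:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChunkMerger {
    // Steps 3-6: stream the sorted chunks back, always draining the smallest
    // key next, and print every group of rows that shares a key.
    static void mergeAndPrintDuplicates(List<Path> chunks, int keyColumn) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        List<String> heads = new ArrayList<>();              // front row of each chunk
        for (Path p : chunks) {
            BufferedReader r = Files.newBufferedReader(p);
            readers.add(r);
            heads.add(r.readLine());                         // may be null for an empty chunk
        }
        while (true) {
            int min = -1;                                    // step 4: chunk with the smallest key
            for (int i = 0; i < heads.size(); i++) {
                if (heads.get(i) == null) continue;
                if (min < 0 || key(heads.get(i), keyColumn)
                        .compareTo(key(heads.get(min), keyColumn)) < 0) {
                    min = i;
                }
            }
            if (min < 0) break;                              // every chunk is drained
            String target = key(heads.get(min), keyColumn);
            List<String> group = new ArrayList<>();
            for (int i = 0; i < heads.size(); i++) {         // step 5: drain all rows == target
                while (heads.get(i) != null && key(heads.get(i), keyColumn).equals(target)) {
                    group.add(heads.get(i));
                    heads.set(i, readers.get(i).readLine()); // refill lookahead from the file
                }
            }
            if (group.size() > 1) {                          // more than one row => duplicates
                group.forEach(System.out::println);
            }
        }                                                    // step 6: loop back to step 4
        for (BufferedReader r : readers) r.close();
    }

    private static String key(String row, int col) {
        return row.split(",", -1)[col];
    }
}
```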

Tests:

  • pass a valid file
  • pass an empty file
  • pass a non-existent file
  • pass a file without the target column
  • pass large files
  • pass a file with only unique values in the column
  • pass a file with the same value repeated in the column
