This project aims to build a Python package for cleaning Hindi datasets. It will be a command-line tool for pre-processing Hindi text data. The package will support:
- pre-processing a given file so that only Hindi characters remain
- splitting paragraphs into sentences
- optionally removing punctuation from the dataset
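The three abilities listed above could be sketched roughly as follows. This is only an illustrative outline, not the package's actual design: the function names, the command-line flag names, and the choice to treat "Hindi characters" as the Unicode Devanagari block (U+0900 to U+097F) are all assumptions made for the example.

```python
import argparse
import re
import string

# Assumption: "Hindi characters" means the Devanagari Unicode block
# (U+0900 to U+097F). Whitespace is kept so words stay separated.
NON_HINDI = re.compile(r"[^\u0900-\u097F\s]")

# Sentences are assumed to end at a danda (U+0964), a double danda (U+0965),
# a question mark, or an exclamation mark.
SENTENCE_END = re.compile(r"[\u0964\u0965?!]+")

# ASCII punctuation plus the danda and double danda.
PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "\u0964\u0965]")


def keep_hindi_only(text):
    """Drop characters outside the Devanagari block, then tidy spaces."""
    cleaned = NON_HINDI.sub("", text)
    return re.sub(r"[ \t]+", " ", cleaned).strip()


def split_sentences(paragraph):
    """Split a paragraph into sentences, discarding empty pieces."""
    parts = SENTENCE_END.split(paragraph)
    return [part.strip() for part in parts if part.strip()]


def remove_punctuation(text):
    """Remove ASCII punctuation and dandas from the text."""
    return PUNCTUATION.sub("", text)


def build_parser():
    """Build a hypothetical command-line interface (flag names are made up)."""
    parser = argparse.ArgumentParser(description="Clean a Hindi text dataset.")
    parser.add_argument("input_file", help="path to the raw text file")
    parser.add_argument("--split-sentences", action="store_true",
                        help="write one sentence per line")
    parser.add_argument("--remove-punctuation", action="store_true",
                        help="strip punctuation from the output")
    return parser
```

Note that `keep_hindi_only` removes `?` and `!` because they fall outside the Devanagari block, so in this sketch sentence splitting would need to run before the Hindi-only filter if those marks are to count as sentence boundaries.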
Technologies In Use
- Python
- Data Science
Number of members required: 2
Start Date: 11-08-2020
Expected Deadline: 20-09-2020