Scalable data preprocessing and curation toolkit for LLMs
python
data
data-processing
data-preparation
deduplication
data-quality
data-curation
data-prep
fine-tuning
fast-data-processing
data-processing-pipelines
datacuration
large-language-models
llm
llmapps
large-scale-data-processing
datarecipes
semantic-deduplication
llm-data-quality
Updated Nov 22, 2024 - Jupyter Notebook