⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
Updated Jun 5, 2024 - Shell
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Intuitive find & replace CLI (sed alternative)
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Python library for creating PEG parsers
Text Classification Algorithms: A Survey
Program to convert lines of text into a tree structure.
Persian NLP Toolkit
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
A fast implementation of Aho-Corasick in Rust.
A fast and convenient fuzzy matcher library for Rust
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).
A simple Python module for parsing human names into their individual components
Open Korean Text Processor - An Open-source Korean Text Processor
All-in-one text de-duplication
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. There are also more complex data types and algorithms. Mor…
Text Normalization & Inverse Text Normalization