Fine-tuning pre-trained transformer models in TensorFlow and PyTorch for question answering
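As a minimal sketch of what such fine-tuning looks like, the snippet below runs a single PyTorch training step for extractive QA with 🤗 Transformers; the checkpoint name and the hand-picked answer span are illustrative assumptions, not this repository's code.

```python
# Minimal sketch of one fine-tuning step for extractive QA in PyTorch.
# The checkpoint and the hand-made example are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

question = "What does a tokenizer do?"
context = "A tokenizer splits raw text into subword units."
inputs = tokenizer(question, context, return_tensors="pt")

# Gold answer span as token indices (chosen by hand for this toy example).
start_positions = torch.tensor([12])
end_positions = torch.tensor([18])

outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
outputs.loss.backward()  # cross-entropy over start/end logits
optimizer.step()
optimizer.zero_grad()
```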
Use Hugging Face Transformers and Tokenizers as TensorFlow Reusable SavedModels
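A minimal sketch of the export step, assuming a DistilBERT checkpoint and plain `tf.saved_model`; the repository's actual Reusable-SavedModel wiring may differ.

```python
# Sketch: export a Hugging Face TF model as a SavedModel and reload it.
# Checkpoint name and export path are illustrative assumptions.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

name = "distilbert-base-uncased"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = TFAutoModel.from_pretrained(name)

# Call the model once so a concrete function can be traced and exported.
inputs = tokenizer("tokenizers as SavedModels", return_tensors="tf")
model(inputs)

tf.saved_model.save(model, "export/distilbert")
reloaded = tf.saved_model.load("export/distilbert")
```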
Use custom tokenizers in spacy-transformers
This project shows how to compute the total number of training tokens in a large 🤗 Datasets text dataset using Apache Beam and Dataflow.
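A toy sketch of that counting pipeline on Beam's local DirectRunner; the two-line in-memory corpus and whitespace tokenizer stand in for a real 🤗 dataset and subword tokenizer, and running on Dataflow would only change the pipeline options.

```python
# Sketch: total token count over a text corpus with Apache Beam.
import apache_beam as beam

corpus = [
    "tokenizers split text into subwords",
    "beam pipelines scale this count to large corpora",
]

def count_tokens(line: str) -> int:
    # Whitespace split as a stand-in for a real subword tokenizer.
    return len(line.split())

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.Create(corpus)
        | "CountTokens" >> beam.Map(count_tokens)
        | "Sum" >> beam.CombineGlobally(sum)
        | "Print" >> beam.Map(print)
    )
```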
Bachelor's thesis repository. The wsm-tokenizer (word shape mapping) uses vocabulary comparisons to find probable morphemes in lexemic tokens.
Small library that provides functions to tokenize a string into an array of words with or without punctuation
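A generic regex sketch of that behavior (not this library's actual API):

```python
# Sketch: split a string into words, keeping or dropping punctuation.
import re

def tokenize(text: str, keep_punctuation: bool = False) -> list[str]:
    if keep_punctuation:
        # Words plus standalone punctuation marks as separate tokens.
        return re.findall(r"\w+|[^\w\s]", text)
    return re.findall(r"\w+", text)

print(tokenize("Hello, world!"))                         # ['Hello', 'world']
print(tokenize("Hello, world!", keep_punctuation=True))  # ['Hello', ',', 'world', '!']
```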
Visualize some important concepts related to LLM architectures.
A graphical user interface for the Elasticsearch Analyze API
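Such a GUI wraps the `_analyze` endpoint; below is a sketch of calling it directly, assuming a local cluster and the standard analyzer.

```python
# Sketch: call the Elasticsearch Analyze API directly with `requests`.
import requests

resp = requests.post(
    "http://localhost:9200/_analyze",  # assumed local cluster
    json={"analyzer": "standard", "text": "Tokenizers split text."},
)
for token in resp.json()["tokens"]:
    print(token["token"], token["start_offset"], token["end_offset"])
```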
Question-and-answer web application using fine-tuned and pre-trained T5 models. The application runs on Streamlit.
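A minimal sketch of that kind of app, assuming a Flan-T5 checkpoint and a simple question/context prompt format; not the application's actual code.

```python
# Sketch of a Streamlit front end over a T5 text2text pipeline.
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once per server process
def load_qa():
    return pipeline("text2text-generation", model="google/flan-t5-small")

st.title("T5 Question Answering")
context = st.text_area("Context")
question = st.text_input("Question")

if st.button("Answer") and context and question:
    qa = load_qa()
    prompt = f"question: {question} context: {context}"
    st.write(qa(prompt)[0]["generated_text"])
```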
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer.
NLP Dataset Creation and Semantic Search Demonstration
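A small sketch of the semantic-search part, assuming a sentence-transformers model and a toy corpus rather than this demo's data:

```python
# Sketch: semantic search by embedding similarity with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model
docs = [
    "Tokenizers split text into subword units.",
    "Semantic search ranks documents by embedding similarity.",
    "Streamlit builds small data apps quickly.",
]
doc_emb = model.encode(docs, convert_to_tensor=True)

query_emb = model.encode("How does embedding-based retrieval work?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)[0]
best = int(scores.argmax())
print(docs[best], float(scores[best]))
```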
Package to align tokens from different tokenizations.
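A simplified alignment sketch based on exact character offsets; real aligners also handle normalization mismatches and fuzzy matches.

```python
# Sketch: align tokens from two tokenizations of the same string by
# overlapping character spans.
def spans(text: str, tokens: list[str]) -> list[tuple[int, int]]:
    out, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        out.append((start, start + len(tok)))
        pos = start + len(tok)
    return out

def align(text: str, a: list[str], b: list[str]) -> list[list[int]]:
    sa, sb = spans(text, a), spans(text, b)
    # For each token in `a`, list the indices of overlapping tokens in `b`.
    return [
        [j for j, (bs, be) in enumerate(sb) if bs < ae and be > as_]
        for (as_, ae) in sa
    ]

print(align("tokenizers", ["token", "izers"], ["tok", "eni", "zers"]))
# [[0, 1], [1, 2]]
```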
Create prompts with a given token length for testing LLMs and other transformer-based text models.
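A sketch of one way to do this with a 🤗 tokenizer, assuming a GPT-2 vocabulary where the filler word maps to a single token:

```python
# Sketch: build a prompt with an exact token count for load-testing a model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed example tokenizer

def prompt_with_length(n_tokens: int, filler: str = " hello") -> str:
    filler_ids = tokenizer.encode(filler)  # " hello" is one GPT-2 token
    ids = (filler_ids * n_tokens)[:n_tokens]
    return tokenizer.decode(ids)

p = prompt_with_length(128)
assert len(tokenizer.encode(p)) == 128
```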
ML Model designed to learn compositional structure of LEGO assemblies
Develop DL models using Pytorch and Hugging Face
the small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly
Kingchop ⚔️ is an English-focused JavaScript library for tokenizing text (chopping text). It uses an extensive rule set for tokenization, and the rules are easy to adjust.