This repository contains a fine-tuned T5-small model that performs Skill Prediction: given a piece of text, such as a job description (JD) or resume, it extracts the hard technical skills mentioned in it. The model has been fine-tuned on a dataset of texts annotated with their skills.
The model uses the T5-small architecture, a Transformer-based encoder-decoder. It has been fine-tuned as a text-to-text transformation, where the input is the raw text and the output is a list of hard technical skills. A minimal inference sketch follows the list below.
- T5 (Text-to-Text Transfer Transformer) treats every NLP task as a text-to-text problem, making it versatile for tasks such as text generation and extraction.
- The small variant is computationally efficient, making it suitable for fine-tuning with limited resources.
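As a rough illustration of inference with the fine-tuned model, here is a minimal sketch using the Transformers library. The repository ID `your-username/t5-small-skill-prediction`, the example text, and the absence of a task prefix are assumptions; substitute the actual model path and any prefix used during fine-tuning.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical repository ID -- replace with this repo's actual model path.
model_name = "your-username/t5-small-skill-prediction"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

text = (
    "We are looking for a backend engineer with experience in Python, "
    "Docker, and PostgreSQL."
)

# Tokenize the input and generate the skill list as text.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected output (illustrative): "Python, Docker, PostgreSQL"
```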
The training dataset consists of text samples paired with annotated technical skills.
The dataset is stored in the following format:
- `resume_text`: The input text (e.g., JD or resume).
- `skills`: The annotated technical skills.
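For concreteness, the snippet below shows what a record might look like and how the data could be loaded with the `datasets` library. The JSON Lines file name `skills_dataset.jsonl` and the field values are illustrative assumptions, not files shipped with this repository.

```python
from datasets import load_dataset

# Hypothetical JSON Lines file; each record pairs input text with its skills:
# {"resume_text": "Built REST APIs in Python with Flask, deployed on AWS.",
#  "skills": "Python, Flask, REST, AWS"}
dataset = load_dataset("json", data_files="skills_dataset.jsonl")

example = dataset["train"][0]
print(example["resume_text"])
print(example["skills"])
```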
The fine-tuning process trains the T5-small model using the Hugging Face Transformers library. It involves the following steps (a code sketch appears after the list):
- Load the Dataset: Use the provided dataset with `resume_text` and `skills`.
- Tokenization: Use the T5 tokenizer to tokenize the input text and the target text.
- Fine-tuning: Train the model using standard training arguments for text generation.
- Evaluation: Evaluate the model's performance on the test set.
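The following is a condensed sketch of these four steps, assuming the hypothetical JSON Lines file from the earlier example; the hyperparameters (learning rate, batch size, epoch count, test split size) are placeholders rather than the values used to train this model.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Load the dataset and hold out a test split (file name is an assumption).
raw = load_dataset("json", data_files="skills_dataset.jsonl")["train"]
splits = raw.train_test_split(test_size=0.1)

def preprocess(batch):
    # Tokenize the input text; tokenize the skill list as the target labels.
    model_inputs = tokenizer(batch["resume_text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["skills"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = splits.map(preprocess, batched=True, remove_columns=["resume_text", "skills"])

# Hyperparameters below are placeholders, not the values used for this model.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-skill-prediction",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)

trainer.train()
print(trainer.evaluate())  # evaluates on the held-out test split
```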