We're open sourcing a fine-tuned GPT-4o model that's deeply versed in the NEAR ecosystem, designed specifically for developers like you.
This GPT-4o model is fine-tuned for the NEAR ecosystem. Using the GitHub API, we extracted data directly from NEAR codebases and fine-tuned the model on 50 million tokens and 5,000 example prompts over 4 epochs. This training produced a highly performant, specialized model adept at understanding the intricacies of NEAR.
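For a sense of what that extraction step looks like, here is a minimal sketch of pulling markdown files from a repository through the GitHub contents API (the repository name and file filter below are illustrative, not the project's actual configuration):

```python
import os
import requests

GITHUB_API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_API_KEY']}",
    "Accept": "application/vnd.github+json",
}

def fetch_markdown_files(owner: str, repo: str, path: str = "") -> list[dict]:
    """Recursively collect markdown files from a repository via the contents API."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/contents/{path}"
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    files = []
    for entry in resp.json():
        if entry["type"] == "dir":
            files.extend(fetch_markdown_files(owner, repo, entry["path"]))
        elif entry["name"].endswith(".md"):
            content = requests.get(entry["download_url"]).text
            files.append({"path": entry["path"], "content": content})
    return files

# Illustrative call; the repositories actually used are listed in config.yaml.
docs = fetch_markdown_files("near", "docs")
print(f"Fetched {len(docs)} markdown files")
```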
The model excels in generating accurate and efficient code in Rust, TypeScript, and JavaScript. It adheres to NEAR's coding standards and best practices, making it a valuable tool for developers seeking to produce high-quality code.
This model is optimized to enhance the memory and performance of onchain agents. It aids agents in comprehending and navigating the complexities of the NEAR ecosystem.
A comprehensive test suite is included in the repository to benchmark the model's performance. It verifies the reliability of each fine-tuning component and measures how well the model performs in real-world scenarios.
The model is designed for a wide range of applications, from building AI and Web3 applications to generating code and enhancing onchain agents.
To get started:

1. Clone the repository:

   ```bash
   git clone https://github.com/jbarnes850/near-fine-tuned-model.git
   cd near-fine-tuned-model
   ```

2. Set up a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up your environment variables:

   - Create a `.env` file in the project root.
   - Add your GitHub and OpenAI API keys:

     ```
     GITHUB_API_KEY=your_github_api_key
     OPENAI_API_KEY=your_openai_api_key
     ```
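To sanity-check that the keys in `.env` are actually visible to the scripts, a quick check like the one below can help (this assumes `python-dotenv` is available; the project's own code may load environment variables differently):

```python
# Quick check that the .env values are loaded (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("GITHUB_API_KEY", "OPENAI_API_KEY"):
    print(key, "is set" if os.getenv(key) else "is MISSING")
```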
To use the NEAR Ecosystem Fine-Tuned Model, follow these steps:
1. Ensure you have completed the installation steps above.

2. Run the fine-tuning script:

   ```bash
   python -m fine_tuning.main
   ```
This script will:
- Fetch data from specified NEAR repositories and articles.
- Process and refine the data using GPT-4o.
- Create a JSONL file with the training data.
- Upload the training file to OpenAI.
- Start a fine-tuning job.
- Monitor the job until completion.
Once the fine-tuning is complete, you will receive a fine-tuned model ID. You can use this ID to make API requests to your specialized NEAR ecosystem model.
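For reference, the upload, job creation, and monitoring steps performed by the script map roughly onto the following OpenAI API calls (a minimal sketch, not the repository's actual implementation; the training file name is illustrative):

```python
import time
from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file produced by the data pipeline
# ("near_training_data.jsonl" is an illustrative name).
training_file = client.files.create(
    file=open("near_training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job on the GPT-4o base model (4 epochs, as described above).
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
    hyperparameters={"n_epochs": 4},
)

# Poll until the job reaches a terminal state; the resulting model ID is what
# you pass to chat.completions.create afterwards.
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

print("Status:", job.status)
print("Fine-tuned model ID:", job.fine_tuned_model)
```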
To use the fine-tuned model in your applications, use the OpenAI API with the provided model ID:
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-4o-2024-08-06:personal:near-ecosystem:AB2aZGVL",
    messages=[
        {"role": "system", "content": "You are a NEAR Protocol expert."},
        {"role": "user", "content": "Explain NEAR's sharding mechanism."}
    ],
    temperature=1,
    max_tokens=2048,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    response_format={"type": "text"}
)

print(response.choices[0].message.content)
```
We have included a comprehensive test suite to verify each component of the fine-tuning process and to evaluate the performance of the fine-tuned model against the base model.
To run the tests:

1. Install the testing dependencies (if not already installed):

   ```bash
   pip install -r requirements.txt
   ```

2. Navigate to the `tests` directory:

   ```bash
   cd tests
   ```

3. Run all tests:

   ```bash
   python -m unittest discover
   ```

   This command will discover and run all test cases in the `tests` directory.
The test suite covers the following components:

- File Upload and Processing Tests (`test_file_upload.py`):
  - Verifies that the training file is uploaded successfully to OpenAI and processed correctly.
  - Tests handling of upload failures and exceptions.
- Fine-Tuning Job Creation Tests (`test_fine_tune_creation.py`):
  - Ensures that fine-tuning jobs are created correctly with validated training files.
  - Tests behavior with unprocessed or invalid training files.
- Job Monitoring Tests (`test_job_monitoring.py`):
  - Checks that the fine-tuning job status is monitored accurately until completion.
  - Tests handling of different job outcomes (success, failure, cancellation).
- Model Evaluation Test (`test_model_evaluation.py`):
  - Compares responses from the fine-tuned model and the base model using a set of evaluation prompts.
  - Helps assess the quality and improvements of the fine-tuned model.
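To illustrate the style of these component tests, here is a rough sketch of a file-upload test that mocks the OpenAI client (the helper function and assertions are hypothetical, not the repository's actual test code):

```python
import tempfile
import unittest
from unittest.mock import MagicMock


def upload_training_file(client, path):
    """Hypothetical helper standing in for the project's upload logic."""
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    if uploaded.status == "error":
        raise RuntimeError("training file failed to process")
    return uploaded.id


class TestFileUpload(unittest.TestCase):
    def setUp(self):
        # A throwaway JSONL file plus a mocked OpenAI client.
        self.tmp = tempfile.NamedTemporaryFile(suffix=".jsonl", delete=False)
        self.tmp.write(b'{"messages": []}\n')
        self.tmp.close()
        self.client = MagicMock()

    def test_successful_upload_returns_file_id(self):
        self.client.files.create.return_value = MagicMock(id="file-123", status="processed")
        self.assertEqual(upload_training_file(self.client, self.tmp.name), "file-123")
        self.client.files.create.assert_called_once()

    def test_failed_processing_raises(self):
        self.client.files.create.return_value = MagicMock(id="file-123", status="error")
        with self.assertRaises(RuntimeError):
            upload_training_file(self.client, self.tmp.name)


if __name__ == "__main__":
    unittest.main()
```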
To evaluate the fine-tuned model against the base model:

1. Set your OpenAI API key. Make sure it is available in your environment:

   ```bash
   export OPENAI_API_KEY='your_openai_api_key'
   ```

2. Replace the fine-tuned model ID: in `tests/test_model_evaluation.py`, replace `'your_fine_tuned_model_id'` with the actual model ID obtained after fine-tuning.

3. Run the model evaluation test:

   ```bash
   python test_model_evaluation.py
   ```
The script will output the prompts and the corresponding responses from both the base model and the fine-tuned model for manual comparison.
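Conceptually, the comparison the script performs boils down to something like this (a sketch only; the prompts and model IDs below are placeholders):

```python
from openai import OpenAI

client = OpenAI()

BASE_MODEL = "gpt-4o-2024-08-06"
FINE_TUNED_MODEL = "your_fine_tuned_model_id"  # replace with your model ID

# Placeholder prompts; the test file ships its own evaluation set.
EVAL_PROMPTS = [
    "Explain NEAR's sharding mechanism.",
    "Write a minimal NEAR smart contract in Rust that stores a greeting.",
]

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a NEAR Protocol expert."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

for prompt in EVAL_PROMPTS:
    print("PROMPT:", prompt)
    print("BASE:", ask(BASE_MODEL, prompt))
    print("FINE-TUNED:", ask(FINE_TUNED_MODEL, prompt))
    print("-" * 80)
```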
The NEAR Ecosystem Fine-Tuned Model uses a variety of data sources to ensure comprehensive coverage of the NEAR Protocol ecosystem:
- GitHub Repositories:
  - NEAR Protocol documentation
  - NEAR Enhancement Proposals (NEPs)
  - NEAR node documentation
  - NEAR core implementation
  - NEAR examples and SDKs
  - NEAR tools and utilities
- Web Articles:
  - Official NEAR blog posts
  - Technical updates and announcements
  - Ecosystem news and developments

The full list of repositories and articles can be found in the `config.yaml` file.
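If you want to inspect those sources programmatically, something like the following works, assuming top-level `repositories` and `articles` keys (the actual key names in `config.yaml` may differ):

```python
# Sketch of reading the source list; key names are assumptions, check config.yaml.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

print("Repositories:", config.get("repositories", []))
print("Articles:", config.get("articles", []))
```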
The fine-tuning process consists of several steps:
1. Data Collection:
   - Fetches markdown and code files from specified GitHub repositories.
   - Retrieves content from selected web articles.
2. Data Processing:
   - Splits content into manageable chunks.
   - Generates diverse prompts for both repository files and articles.
   - Uses GPT-4o-mini to refine prompts and completions for each data point.
3. Training Data Creation:
   - Generates a JSONL file with the processed data in the chat format required for fine-tuning (see the sketch after this list).
4. Fine-Tuning:
   - Uploads the training data to OpenAI.
   - Initiates a fine-tuning job on the GPT-4o model.
   - Monitors the progress of the fine-tuning job.
5. Model Deployment:
   - Upon successful completion, provides a fine-tuned model ID for use in applications.
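To make the training-data format concrete, here is a rough sketch of turning chunked documents into the chat-format JSONL records that OpenAI fine-tuning expects (the chunk size, prompt wording, and file names are illustrative, not the project's actual values):

```python
import json

def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split a document into roughly fixed-size chunks (illustrative size)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def to_training_record(prompt: str, completion: str) -> dict:
    """Chat-format record expected by OpenAI fine-tuning for GPT-4o."""
    return {
        "messages": [
            {"role": "system", "content": "You are a NEAR Protocol expert."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }

documents = {"docs/sharding.md": "NEAR uses Nightshade sharding..."}  # placeholder content

with open("near_training_data.jsonl", "w") as out:
    for path, text in documents.items():
        for chunk in chunk_text(text):
            # In the real pipeline, GPT-4o-mini refines the prompt/completion pair;
            # here we simply pair a generic question with the chunk itself.
            record = to_training_record(f"Explain the content of {path}.", chunk)
            out.write(json.dumps(record) + "\n")
```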
The fine-tuned model is hosted on OpenAI's servers and can be accessed through their API. In the future, we plan to make this model available on Hugging Face for easier access and integration.
To use the hosted model, follow the usage instructions provided above.
This project is licensed under the MIT License - see the LICENSE file for details.
Jarrod Barnes - jarrod.barnes@near.foundation
Project Link: https://github.com/jbarnes850/near-fine-tuned-model
For any questions, suggestions, or concerns, please open an issue in the GitHub repository or contact the maintainers directly.