This is a port of tiktoken for PHP 7.4.

guttedgarden/tiktoken-php-7.4

 
 

tiktoken-php

A port of Yethee's tiktoken library for PHP 7.4 that does not require the Symfony package.

Installation

$ composer require guttedgarden/tiktoken

Usage

use guttedgarden\Tiktoken\EncoderProvider;

$provider = new EncoderProvider();

$encoder = $provider->getForModel('gpt-3.5-turbo-0301');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [9906, 1917, 0]

$encoder = $provider->get('p50k_base');
$tokens = $encoder->encode('Hello world!');
print_r($tokens);
// OUT: [15496, 995, 0]
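
Since encode() returns a list of token IDs, checking a text against a token limit comes down to counting that list. The following is a minimal sketch, not part of the library: fitsTokenBudget() is a hypothetical helper, and the word-splitting closure is a stand-in encoder used only so that the example runs without the package installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): reports whether $text
 * encodes to at most $maxTokens tokens.
 *
 * @param callable $encode any function that turns a string into a list of tokens
 */
function fitsTokenBudget(callable $encode, string $text, int $maxTokens): bool
{
    return count($encode($text)) <= $maxTokens;
}

// Stand-in encoder for demonstration only: one "token" per whitespace-separated word.
$fakeEncode = static function (string $text): array {
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
};

var_dump(fitsTokenBudget($fakeEncode, 'Hello world!', 2)); // bool(true)
```

With a real encoder from the provider, the call would presumably look like fitsTokenBudget([$encoder, 'encode'], $text, 4096), where 4096 is an arbitrary example limit.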

Cache

The encoder uses external vocabularies, so caching is enabled by default to avoid performance issues.

By default, the cache is stored in the directory for temporary files. You can override the cache directory with the TIKTOKEN_CACHE_DIR environment variable or with EncoderProvider::setVocabCache():

use guttedgarden\Tiktoken\EncoderProvider;

$encProvider = new EncoderProvider();
$encProvider->setVocabCache('/path/to/cache');

// Using the provider
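
This README does not say whether the library creates a missing cache directory itself. As a precaution, a custom path can be checked before it is passed to setVocabCache(). The sketch below assumes nothing about the library: ensureCacheDir() is a hypothetical helper built only on standard PHP filesystem functions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): makes sure $dir exists
 * and is writable, creating it if needed, and returns the path.
 *
 * @throws RuntimeException when the directory cannot be created or written to
 */
function ensureCacheDir(string $dir): string
{
    // The second is_dir() check covers another process creating the directory first.
    if (!is_dir($dir) && !@mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException(sprintf('Cannot create cache directory "%s".', $dir));
    }

    if (!is_writable($dir)) {
        throw new RuntimeException(sprintf('Cache directory "%s" is not writable.', $dir));
    }

    return $dir;
}

// Demonstration against a throwaway directory under the system temp dir.
$demoDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'tiktoken-demo-' . uniqid();
var_dump(is_dir(ensureCacheDir($demoDir))); // bool(true)
rmdir($demoDir);
```

The result can then be handed to the provider, for example $encProvider->setVocabCache(ensureCacheDir('/path/to/cache')).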

Disable cache

If necessary, you can disable the cache in one of the following ways:

  • Set an empty string for the environment variable TIKTOKEN_CACHE_DIR.
  • Programmatically:
use guttedgarden\Tiktoken\EncoderProvider;

$encProvider = new EncoderProvider();
$encProvider->setVocabCache(null); // disable the cache
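
The three cases for TIKTOKEN_CACHE_DIR described above (unset, empty, set to a path) can be summarised in a small function. This is only an illustration of the documented behaviour: resolveCacheDir() is a hypothetical name, the library's actual lookup may differ, and it may use a subdirectory of the temporary directory rather than the directory itself.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of the documented rules, not the library's code.
 *
 * @param string|false $envValue the result of getenv('TIKTOKEN_CACHE_DIR')
 * @return string|null the cache directory, or null when caching is disabled
 */
function resolveCacheDir($envValue): ?string
{
    // Variable not set: fall back to the directory for temporary files.
    if ($envValue === false) {
        return sys_get_temp_dir();
    }

    // Empty string: caching is disabled. Anything else is used as the directory.
    return $envValue === '' ? null : $envValue;
}

var_dump(resolveCacheDir('') === null);                                           // bool(true)
var_dump(resolveCacheDir('/var/cache/tiktoken') === '/var/cache/tiktoken'); // bool(true)
```

The path /var/cache/tiktoken is an arbitrary example value.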

Limitations

  • Encoding for GPT-2 is not supported.
  • Special tokens (like <|endofprompt|>) are not supported.

License

MIT
