This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

openai tokenizer to treat special tokens as natural text #73

Merged
merged 2 commits into dev from special-tokens-as-natural-text on Oct 17, 2023

Conversation

acatav
Contributor

@acatav acatav commented Oct 16, 2023

Problem

The OpenAI tokenizer raises errors on special tokens.

Solution

Encode special tokens as natural text.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Added a unit test to cover this case.

Contributor

@igiloh-pinecone igiloh-pinecone left a comment


LGTM

@acatav acatav enabled auto-merge October 17, 2023 10:20
@acatav acatav merged commit 9b15cde into dev Oct 17, 2023
9 checks passed
@acatav acatav deleted the special-tokens-as-natural-text branch October 17, 2023 12:42
@acatav acatav mentioned this pull request Oct 23, 2023
2 tasks

2 participants