
FEA: add split token and generate related resource #59

Open
txy77 wants to merge 39 commits into main
Conversation

@txy77 (Collaborator) commented Oct 6, 2022

  1. Update the split-token step and generate the related resources: word2vec embeddings, copy_mask, and token2id; load the pretrained model (see the sketch below)
  2. Fix some bugs in the redial, inspired, and tgredial models
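
A minimal sketch of how the word2vec matrix and the token2id map could be generated from the split tokens with gensim; the file names, the 300-dimensional vectors, the special-token set, and the gensim >= 4.0 API (`vector_size`) are assumptions for illustration, not necessarily what this PR implements.

```python
# Hypothetical sketch: build token2id and a word2vec matrix from the tokenized corpus.
# Assumes gensim >= 4.0 (vector_size argument) and 300-dim vectors.
import json
import numpy as np
from gensim.models import Word2Vec

def build_word2vec(tokenized_corpus, dim=300):
    """tokenized_corpus: list of token lists produced by the split-token step."""
    model = Word2Vec(sentences=tokenized_corpus, vector_size=dim, min_count=1)
    # Reserve ids for special tokens first; the exact special tokens are a guess.
    token2id = {'__pad__': 0, '__start__': 1, '__end__': 2, '__unk__': 3}
    for token in model.wv.index_to_key:
        token2id.setdefault(token, len(token2id))
    # Embedding matrix aligned with token2id; special tokens keep zero vectors.
    word2vec = np.zeros((len(token2id), dim), dtype=np.float32)
    for token, idx in token2id.items():
        if token in model.wv:
            word2vec[idx] = model.wv[token]
    return token2id, word2vec

if __name__ == '__main__':
    corpus = [['i', 'like', 'horror', 'movies'], ['any', 'comedy', 'to', 'recommend', '?']]
    token2id, word2vec = build_word2vec(corpus)
    json.dump(token2id, open('token2id.json', 'w'))
    np.save('word2vec.npy', word2vec)
```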

@txy77 (Collaborator, Author) left a comment

  1. Fix the bugs
  2. Reformat the code

@txy77 (Collaborator, Author) left a comment

Change the variable names:

  1. processing -> processed_
  2. split_token -> split_text


@txy77 (Collaborator, Author) left a comment

  1. Add the version number of the Python package gensim

@txy77 (Collaborator, Author) left a comment

  1. Change the variable name:
    crslabtokenizer -> Tokenizer

@txy77 (Collaborator, Author) left a comment

Change the way the config is loaded
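
For context, a minimal sketch of one way to load a YAML config file; the path and key names are hypothetical and not necessarily how the loading is done after this change.

```python
# Hypothetical sketch of config loading; the example path and key are made up.
import yaml

def load_config(config_file):
    with open(config_file, 'r', encoding='utf-8') as f:
        return yaml.safe_load(f)

# opt = load_config('config/crs/kgsf/redial.yaml')
# tokenize_method = opt.get('tokenize')
```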

@txy77 (Collaborator, Author) left a comment

Fix the problem with building copy_mask.npy
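
Roughly, copy_mask.npy marks which vocabulary entries the copy mechanism is allowed to produce. A hedged sketch under assumed file names (token2id.json, word_list.txt), not the code in this PR:

```python
# Hypothetical sketch: build a boolean copy mask over the vocabulary and save it as .npy.
import json
import numpy as np

def build_copy_mask(token2id, word_list):
    """copy_mask[i] is True if vocab token i appears in word_list and may be copied."""
    copy_mask = np.zeros(len(token2id), dtype=bool)
    copyable = set(word_list)
    for token, idx in token2id.items():
        if token in copyable:
            copy_mask[idx] = True
    return copy_mask

token2id = json.load(open('token2id.json', 'r'))
word_list = [line.strip() for line in open('word_list.txt', encoding='utf-8')]
np.save('copy_mask.npy', build_copy_mask(token2id, word_list))
```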

@txy77 (Collaborator, Author) left a comment

FIX: Removed unnecessary word2vec

@txy77 (Collaborator, Author) left a comment

FIX: Complete the integration of tokenizer classes
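
One plausible shape for the integrated tokenizer classes, an abstract BaseTokenizer with nltk- and BERT-backed subclasses; the class and method names are guesses based on the renames in this thread, not the PR's actual code.

```python
# Hypothetical sketch of the tokenizer hierarchy; names and structure are assumptions.
from abc import ABC, abstractmethod

class BaseTokenizer(ABC):
    @abstractmethod
    def tokenize(self, text):
        """Split raw text into a list of tokens."""

class NltkTokenizer(BaseTokenizer):
    def tokenize(self, text):
        from nltk import word_tokenize  # requires nltk's 'punkt' data to be downloaded
        return word_tokenize(text)

class BertTokenizer(BaseTokenizer):
    def __init__(self, pretrained='bert-base-uncased'):
        from transformers import AutoTokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(pretrained)

    def tokenize(self, text):
        return self.tokenizer.tokenize(text)

crs_tokenizer = NltkTokenizer()
print(crs_tokenizer.tokenize('I like horror movies.'))
```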

@txy77 (Collaborator, Author) left a comment

FIX: problem of data type

@txy77 (Collaborator, Author) left a comment

Fix: add special_token_idx to tokenizer
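
A rough illustration of carrying special_token_idx inside a tokenizer so unknown tokens fall back to unk; the key names ('pad', 'start', 'end', 'unk') and the class are assumptions.

```python
# Hypothetical sketch: a tokenizer that stores special_token_idx and maps tokens to ids.
class IdTokenizer:
    def __init__(self, token2id, special_token_idx):
        self.token2id = token2id
        self.special_token_idx = special_token_idx  # e.g. {'pad': 0, 'start': 1, 'end': 2, 'unk': 3}

    def convert_tokens_to_ids(self, tokens):
        unk = self.special_token_idx['unk']
        ids = [self.token2id.get(token, unk) for token in tokens]
        # Wrap each utterance with start/end markers.
        return [self.special_token_idx['start']] + ids + [self.special_token_idx['end']]
```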

@txy77 (Collaborator, Author) left a comment

FIX: conv special_token_idx

@txy77 (Collaborator, Author) left a comment

FIX: variable name : CRS_Tokenizer -> crs_tokenizer

@txy77 (Collaborator, Author) left a comment

FIX: variable name : wordembedding -> word_embedding

@txy77 (Collaborator, Author) left a comment

FIX: delete redundant variable: crstokennizer

@txy77 (Collaborator, Author) left a comment

change variable name: BaseCrsTokenize -> BaseTokenizer

@txy77 (Collaborator, Author) left a comment

FIX: change as_tensor function
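
For context on the as_tensor change: torch.as_tensor reuses the underlying data when it can, instead of always copying like torch.tensor. A hedged sketch of padding a batch and converting it this way; the helper name and padding logic are assumptions.

```python
# Hypothetical sketch: pad a batch of id lists and convert it with torch.as_tensor.
import torch

def batch_as_tensor(batch, pad_idx=0):
    max_len = max(len(seq) for seq in batch)
    padded = [seq + [pad_idx] * (max_len - len(seq)) for seq in batch]
    return torch.as_tensor(padded, dtype=torch.long)

# batch_as_tensor([[5, 8, 2], [7, 2]]) -> tensor([[5, 8, 2], [7, 2, 0]])
```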

@txy77 (Collaborator, Author) left a comment

FIX: separate the word2vec & copy_mask from the dictionary
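
Illustrating the separation: the numpy arrays can be loaded and returned on their own instead of being packed into the vocabulary dictionary. The file names mirror the earlier sketches and are assumptions.

```python
# Hypothetical sketch: keep word2vec and copy_mask as standalone arrays rather than
# entries of the vocab dict.
import json
import numpy as np

def load_vocab(path='token2id.json'):
    return json.load(open(path, 'r'))

def load_word2vec(path='word2vec.npy'):
    return np.load(path)

def load_copy_mask(path='copy_mask.npy'):
    return np.load(path)

vocab, word2vec, copy_mask = load_vocab(), load_word2vec(), load_copy_mask()
```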

@txy77 (Collaborator, Author) left a comment

FIX: delete npy_dict


@txy77 (Collaborator, Author) left a comment

FIX: bert_tokenize -> BertTokenizer

@txy77 (Collaborator, Author) left a comment

FIX: variable name
self.Tokenizer -> self.tokenizer

@txy77 (Collaborator, Author) left a comment

FIX: add copy_mask = None

@txy77 (Collaborator, Author) left a comment

Rename list_word -> word_list and add copy_mask to the return value
