In Chinese word segmentation, only a single word is separated #176

Open
xiaominger opened this issue Aug 15, 2023 · 2 comments


Run the following code (`tabooSegmentCustomDicList` contains more than 2000 words):
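```go
// Load every custom dictionary entry, lowercased, into the segmenter.
for _, tabooSegmentCustomDic := range tabooSegmentCustomDicList {
	lowerCaseWord := strings.ToLower(tabooSegmentCustomDic.Word)
	segmentutil.AddWord(lowerCaseWord)
}

// AddWord adds a custom word to the segmenter; it returns false on failure.
func AddWord(word string) bool {
	defer recoverPanic(word)
	err := seg.AddToken(word, 100)
	if err != nil {
		logger.Errorf("Error when AddWord, word=%s: %v", word, err)
		return false
	}
	return true
}

// TextSegment cuts text into tokens with the shared segmenter.
func TextSegment(text string) []string {
	defer recoverPanic(text)
	return seg.Cut(text)
}
```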

Calling `TextSegment("api发送文本loumès 𝘾𝘼𝙍𝙏𝙄𝙀𝙍")` returns:

`["api","发","送","文","本","lou","mès"," ","𝘾𝘼𝙍𝙏𝙄𝙀𝙍"]`

The Chinese text is split into single characters instead of words.
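
One way to check for this symptom in a test is to count the tokens that are a single Han character. The helper below is only a sketch: `singleHanTokens` is a hypothetical name, not part of the segmenter, and the code uses nothing beyond the Go standard library and the result reported above.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// singleHanTokens returns the tokens that consist of exactly one Han
// character. Many such tokens in a row suggest the segmenter split the
// text character by character instead of into words.
// (Hypothetical helper for illustration; not part of any library.)
func singleHanTokens(tokens []string) []string {
	var out []string
	for _, tok := range tokens {
		// Skip anything that is not exactly one rune long.
		if utf8.RuneCountInString(tok) != 1 {
			continue
		}
		r, _ := utf8.DecodeRuneInString(tok)
		if unicode.Is(unicode.Han, r) {
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	// The result reported in this issue, copied verbatim.
	reported := []string{"api", "发", "送", "文", "本", "lou", "mès", " ", "𝘾𝘼𝙍𝙏𝙄𝙀𝙍"}
	fmt.Println(singleHanTokens(reported)) // prints [发 送 文 本]
}
```

If the segmenter were producing multi-character words for the Chinese part, this helper would return an empty slice for that part of the input.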


zwj186 commented Nov 9, 2023

Please set `DefaultAnalyzer` to `cjk.AnalyzerName`; that will resolve the issue.


kms9 commented Jun 15, 2024

How do I set `DefaultAnalyzer`? I searched all the files in the repo and could not find this keyword or setting.
