In Chinese word segmentation, only a single word is separated #176

xiaominger · 2023-08-15T09:35:12Z

Run the following code (`tabooSegmentCustomDicList` contains more than 2000 words):
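```go
// Register every custom dictionary word (lower-cased) with the segmenter.
for _, tabooSegmentCustomDic := range tabooSegmentCustomDicList {
	lowerCaseWord := strings.ToLower(tabooSegmentCustomDic.Word)
	segmentutil.AddWord(lowerCaseWord)
}

// AddWord adds a single word to the segmenter and reports whether it succeeded.
func AddWord(word string) bool {
	defer recoverPanic(word)
	err := seg.AddToken(word, 100)
	if err != nil {
		// The original format string had one verb for two arguments.
		logger.Errorf("Error when AddWord, word=%s, err=%v", word, err)
		return false
	}
	return true
}

// TextSegment cuts text into tokens.
func TextSegment(text string) []string {
	defer recoverPanic(text)
	return seg.Cut(text)
}
```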

TextSegment("api发送文本loumès 𝘾𝘼𝙍𝙏𝙄𝙀𝙍")

The result is `["api","发","送","文","本","lou","mès"," ","𝘾𝘼𝙍𝙏𝙄𝙀𝙍"]`: the Chinese text 发送文本 is split into single characters.
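To spot this symptom programmatically, for example in a regression test, here is a minimal sketch that uses only the Go standard library. The helper names `isSingleHan` and `longestSingleHanRun` are hypothetical and not part of the segmenter's API. The sketch only inspects a token slice like the one returned above.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isSingleHan reports whether tok consists of exactly one Han character.
// (Hypothetical helper, for illustration only.)
func isSingleHan(tok string) bool {
	r, size := utf8.DecodeRuneInString(tok)
	return size == len(tok) && unicode.Is(unicode.Han, r)
}

// longestSingleHanRun returns the length of the longest run of consecutive
// tokens that are each a single Han character. A long run suggests that the
// segmenter did not match any multi-character dictionary word.
func longestSingleHanRun(tokens []string) int {
	longest, current := 0, 0
	for _, tok := range tokens {
		if isSingleHan(tok) {
			current++
			if current > longest {
				longest = current
			}
		} else {
			current = 0
		}
	}
	return longest
}

func main() {
	// Token slice copied from the result reported in this issue.
	tokens := []string{"api", "发", "送", "文", "本", "lou", "mès", " ", "𝘾𝘼𝙍𝙏𝙄𝙀𝙍"}
	fmt.Println(longestSingleHanRun(tokens)) // prints 4
}
```

A run of 4 here corresponds to 发, 送, 文, 本 each coming back as a separate token. What threshold counts as "too long" is a judgment call that depends on the text being segmented.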

zwj186 · 2023-11-09T10:42:43Z

Setting `DefaultAnalyzer` to `cjk.AnalyzerName` will resolve the issue.

kms9 · 2024-06-15T18:04:20Z

How do I set `DefaultAnalyzer`? I searched all the files in the repo and could not find this keyword or setting.
