[mecab-dict-index] error #36

Open
coleea opened this issue Jul 10, 2017 · 2 comments
Comments

coleea commented Jul 10, 2017

Hi.
When I run 'mecab-dict-index', an error occurs.
The log output is as follows.

==============================================================================

reading ./ETN.csv ... 14
reading ./LISTEN_NER.csv ... 2081
reading ./Preanalysis.csv ... 5
reading ./TV_fullKorean_dict.csv ... 1687814
reading ./NP.csv ... 342
reading ./EF.csv ... 1820
reading ./XSA.csv ... 20
reading ./MM.csv ... 453
reading ./keyword.csv ... 276
reading ./XPN.csv ... 83
reading ./unk_word 1 1 0 (2nd).csv ... 276
reading ./Inflect.csv ... 44850
reading ./VA.csv ... 2360
reading ./XSV.csv ... 24
reading ./keyword_etc.csv ... 222
reading ./Place.csv ... 30300
reading ./LISTEN_unk_word 1 1 9.csv ... 254
reading ./LISTEN_KEYWORD.csv ... 2
reading ./sejong21_word.csv ... 846637
reading ./NNP.csv ... 2371
reading ./Hanja.csv ... 124570
reading ./EP.csv ... 51
reading ./KOR_ENG_csv.csv ... 60365
reading ./sejong21_verbal2.csv ... 15160
reading ./Foreign.csv ... 11599
reading ./NR.csv ... 482
reading ./NNB.csv ... 140
reading ./LISTEN_unk_word.csv ... 254
reading ./Wikipedia.csv ... 36763
reading ./sejong21_fusion.csv ... 1321382
reading ./VCN.csv ... 7
reading ./NNG.csv ... 205269
reading ./MAG.csv ... 14244
reading ./Person-actor.csv ... 99237
reading ./Symbol.csv ... 16
reading ./VCP.csv ... 9
reading ./VX.csv ... 125
reading ./Person.csv ... 196461
reading ./Group.csv ... 3176
reading ./XSN.csv ... 124
reading ./ETM.csv ... 133
reading ./NorthKorea.csv ... 3
dictionary.cpp(472) [da.build(str.size(), const_cast<char **>(&str[0]), &len[0], &val[0], &progress_bar_darts) == 0] unkown error in building double-array

==============================================================================

Lines 472-476 of dictionary.cpp are as follows:

for (size_t i = 0; i < dic.size(); ++i) {
  tbuf.append(reinterpret_cast<const char*>(dic[i].second),
              sizeof(Token));
  delete dic[i].second;
}


==============================================================================

This error occurred when I added 'TV_fullKorean_dict.csv', which contains 1,687,814 entries.
The file size is 165.8 MB.
Is there a limit on the CSV file size?
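
To put the numbers in context, the per-file counts in the log can be summed to see how many entries the whole build is given. The following is only a minimal sketch: it assumes each log line has the form `reading ./NAME.csv ... N` as shown above, and the names `LINE_RE` and `entry_counts` are made up for this illustration. It does not say anything about where an actual limit lies.

```python
import re

# Assumed line shape, taken from the log above: "reading <path> ... <count>"
LINE_RE = re.compile(r"^reading\s+(.+?)\s+\.\.\.\s+(\d+)\s*$")

def entry_counts(log_text):
    """Map each CSV path mentioned in the log to its reported entry count."""
    counts = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

# Three lines copied from the log; paste the full log to get the real total.
sample = """reading ./ETN.csv ... 14
reading ./unk_word 1 1 0 (2nd).csv ... 276
reading ./TV_fullKorean_dict.csv ... 1687814
"""

counts = entry_counts(sample)
total = sum(counts.values())
largest = max(counts, key=counts.get)
print("total entries:", total)
print("largest file:", largest)
```

Running it on the full log shows at a glance which files dominate the build and how much the new file adds to the total.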

Thank you

yosato commented Apr 2, 2018

I ran into exactly the same problem when I tried to increase the size of the dictionary. A reply would be appreciated.

schizokids commented

I have exactly the same problem. The build runs when I split the dictionary into separate files, but it fails when I run it on the original file.
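
For anyone who wants to try the same thing, here is a minimal sketch of one way to cut a large CSV into smaller files. It assumes the dictionary has one entry per line with no line breaks inside quoted fields and is encoded as UTF-8; the function name `split_csv`, the `.partNNN.csv` naming, and the chunk size are all hypothetical choices for this example, not anything MeCab requires. Whether the pieces then build successfully depends on how they are compiled, which this thread does not settle.

```python
import os

def split_csv(path, max_lines, out_dir):
    """Split a one-entry-per-line CSV into files of at most max_lines lines.

    Returns the list of part files written, in order.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = []
    out = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(path, "r", encoding="utf-8", newline="") as src:
            for i, line in enumerate(src):
                if i % max_lines == 0:
                    # Start a new part file every max_lines lines.
                    if out is not None:
                        out.close()
                    part_path = os.path.join(
                        out_dir, f"{stem}.part{len(parts):03d}.csv"
                    )
                    out = open(part_path, "w", encoding="utf-8", newline="")
                    parts.append(part_path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

For example, `split_csv("TV_fullKorean_dict.csv", 500000, "parts")` would write the entries to numbered files under `parts/`, each holding at most 500,000 lines, without changing or reordering any entry.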
