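Issue type

1. Memory error.
2. In `[dynet] random seed: 254078971`, why is the seed different on every run? Shouldn't it be fixed?

    labeller = SementicRoleLabeller()
    labeller.load(srl_model_path)
    >>>
    [dynet] random seed: 254078971
    [dynet] allocating memory: 2000MB
    [dynet] memory allocation done.

Failure scenario

Case 1: When the segmentor, postagger, parser and SementicRoleLabeller are applied one after another to run SRL on sentences (each under 500 characters), memory grows from a little over 4 GB to a little over 6 GB, then levels off at around 10 GB. After running for a while longer it suddenly jumps to 13 GB and the process is killed.

Case 2: Memory climbs to around 13 to 16 GB almost immediately after startup, so the process is killed before SRL succeeds on even one sentence.

Case 3: Same as case 1, except that instead of being killed the process reports

    CPU memory allocation failed n =11173625856 align=32
    Exception CPU memory allocation failed

and then hangs. It is not killed and keeps holding its memory, roughly 13 GB.

Cases 2 and 3 happen occasionally; case 1 happens every time. The run works at first and can process more than 30,000 sentences (each under 500 characters) in a row, but being killed is only a matter of time.

I have already read issue #141 and suspect a memory leak. Please look into fixing this.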
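To back up the memory figures above with numbers from inside the process, peak resident memory could be logged from the processing loop. The following is a minimal sketch, not part of the original report: the helper names `peak_rss_mb` and `log_memory` are hypothetical, and it relies only on the standard-library `resource` module (Unix only).

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(count, every=1000):
    """Print peak RSS once every `every` sentences; return True if a line was printed."""
    if count % every == 0:
        print("count={} peak_rss={:.1f} MB".format(count, peak_rss_mb()))
        return True
    return False
```

Calling `log_memory(count)` once per processed line would show whether the peak keeps climbing as more sentences are labelled, which is what a leak in the native code would look like.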
Code snippet

    import io
    import json
    import os

    from pyltp import Segmentor, Postagger, Parser, SementicRoleLabeller

    # load ltp =============================================
    LTP_DATA_DIR = './ltp_data_v3.4.0'  # path to the LTP model directory
    cws_model_path = os.path.join(LTP_DATA_DIR, 'cws.model')  # word segmentation model path; the model file is `cws.model`
    pos_model_path = os.path.join(LTP_DATA_DIR, 'pos.model')  # POS tagging model path; the model file is `pos.model`
    par_model_path = os.path.join(LTP_DATA_DIR, 'parser.model')  # dependency parsing model path; the model file is `parser.model`
    srl_model_path = os.path.join(LTP_DATA_DIR, 'pisrl.model')  # SRL model directory path; the model directory is `srl`. Note that this path is a directory, not a file.

    segmentor = Segmentor()
    #segmentor.load(cws_model_path)
    # load the model; the second argument is the path to the external lexicon file
    segmentor.load_with_lexicon(cws_model_path, './dict_for_ltp/ltp_customer.txt')
    postagger = Postagger()
    #postagger.load(pos_model_path)
    postagger.load_with_lexicon(pos_model_path, './dict_for_ltp/ltp_customer.txt')
    parser = Parser()
    parser.load(par_model_path)
    labeller = SementicRoleLabeller()
    labeller.load(srl_model_path)
    # ======================================================

    read_file = './xxxx.txt'
    write_file = './xxxx.txt'
    #articles = load_json_line_data(read_file)
    killed_count = 32832  # number of input lines to skip when resuming
    count = 0
    print('read file ... ')
    with io.open(read_file, "r", encoding='utf-8') as f:
        while True:
            line = f.readline()
            #print('read line ... ')
            if len(line) > 0:
                count += 1
                if count <= killed_count:
                    continue
                try:
                    article = json.loads(line.strip())
                    temp_dic = {}
                    title = article['title']
                    print('srl title ...')
                    title_srl_result = get_event_triples_srl(title)
                    print('srl title success! ')
                    temp_dic['title'] = title
                    temp_dic['title_srl_result'] = title_srl_result
                    temp_dic['url'] = article['url']
                    temp_dic['publishAt'] = article['publishAt']
                    sentences_srl_result = []
                    p = article['event_discription']
                    for s in p.split('。'):
                        for s1 in s.split(';'):
                            if len(s1) > 500:
                                continue
                            if len(s1.strip()) > 0:
                                print('srl sentence ...')
                                s1_srl_result = get_event_triples_srl(s1)
                                print('srl sentence success! ')
                                sentences_srl_result.append({'sentence': s1, 'sentence_srl_result': s1_srl_result})
                    temp_dic['sentences_srl_result'] = sentences_srl_result
                    with io.open(write_file, 'a', encoding='utf-8') as f1:
                        f1.write(json.dumps(temp_dic, ensure_ascii=False) + "\n")
                    if count % 1000 == 1:
                        print('count: ', count)
                    print('count: ', count)
                except Exception as e:
                    print('ltp error')
                    print("Exception: {}".format(e))
            else:
                break
    # -------------------------------------------------------
    segmentor.release()
    postagger.release()
    parser.release()
    labeller.release()
where `get_event_triples_srl` is defined as:

    def get_event_triples_srl(sentence):
        sentence_srl_result = {}
        words = segmentor.segment(sentence)  # word segmentation
        words = '\t'.join(words)
        words = words.split('\t')
        postags = postagger.postag(words)  # POS tagging
        postags = '\t'.join(postags)
        postags = postags.split('\t')
        #print('words: ', words)
        #print('postags: ', postags)
        sentence_srl_result['words'] = words
        sentence_srl_result['postags'] = postags
        arcs = parser.parse(words, postags)  # dependency parsing
        roles = labeller.label(words, postags, arcs)  # semantic role labelling
        # followed by further processing of roles
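The script resumes after a kill by hard-coding `killed_count`. As a minimal sketch of an alternative, the number of lines already handled could be kept in a small progress file and read back on startup. The helper names and the progress file are hypothetical and not part of the original script.

```python
import os


def load_progress(path):
    """Return the number of input lines already processed, or 0 if nothing is recorded."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        text = f.read().strip()
    return int(text) if text else 0


def save_progress(path, count):
    """Record `count` by writing a temporary file and renaming it over `path`."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(count))
    # os.replace swaps the file in one step, so a kill cannot leave it half-written.
    os.replace(tmp, path)
```

With this in place, `killed_count = load_progress(progress_path)` at startup and `save_progress(progress_path, count)` after each written record would avoid editing the script after every kill.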
Environment

- Linux
- python 3.6
- pyltp==0.2.1
- Model: ltp_data_v3.4.0
@liu946 Is this project no longer being updated? Are there any plans to update it in the future?
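@wenfeixiang1991 The open-source version is no longer being updated. Our latest work will be released on the iFLYTEK Open Platform, and you are welcome to use it: https://www.xfyun.cn/services/lexicalAnalysis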
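Oh, I see. In that case, could I trouble you to fix this memory problem in pyltp? Right now I can't run experiments on even moderately large datasets, which is a real headache.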
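@liu946 I was thinking that if this problem could be fixed, the library would still be usable even without further updates. That would be greatly appreciated! :)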
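Since no fix is planned, one common way to live with a leak in native code is to run the labelling in a worker process that is replaced after a fixed number of sentences, so the operating system reclaims its memory. The following is a minimal sketch under stated assumptions, not a fix for the leak itself: `init_worker`, `label_sentence` and `run_in_recycled_workers` are hypothetical names, `label_sentence` is a stand-in for `get_event_triples_srl`, and the `fork` start method is assumed because the reported environment is Linux.

```python
import multiprocessing as mp
import os


def init_worker():
    # Hypothetical placeholder: a real worker would create and load the
    # Segmentor, Postagger, Parser and SementicRoleLabeller here, once per process.
    pass


def label_sentence(sentence):
    # Stand-in for get_event_triples_srl(sentence). It returns the worker PID
    # so that recycling is visible, plus a dummy result (the sentence length).
    return os.getpid(), len(sentence)


def run_in_recycled_workers(sentences, tasks_per_worker=1000):
    """Process sentences in one worker that is replaced every `tasks_per_worker` tasks."""
    # "fork" is assumed here because the reported environment is Linux.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=1, initializer=init_worker,
                  maxtasksperchild=tasks_per_worker) as pool:
        # chunksize=1 makes each sentence count as one task toward the limit.
        return pool.map(label_sentence, sentences, chunksize=1)
```

The cost is that the models are reloaded each time a worker is replaced, so `tasks_per_worker` trades load time against peak memory.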