About using own text data for SynGCN and SemGCN #37

Open
40347015S opened this issue Apr 3, 2021 · 1 comment

40347015S commented Apr 3, 2021

Your WordGCN paper is very exciting and very well written, so I would like to use your code in my current work, and I have a question.
I want to train SynGCN and SemGCN on other text data, such as the transcripts of a speech recognition benchmark corpus (AMI), instead of the Wikipedia corpus, so that I obtain AMI-based SynGCN and SemGCN word embeddings. What is the first step I need to take, and how should I process my own text data?
Thanks!

Shih-Hsuan

Member

svjan5 commented Apr 6, 2021

Hi Shih-Hsuan,
The corpus needs to be arranged in the data.txt format described in the README. You'll have to run a dependency parser on your corpus to get a dependency parse tree for each sentence.
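
A rough sketch of the serialization step, once each sentence has been parsed, might look like the following. It is not taken from the repository: the function `encode_sentence`, the line layout (`<num_words> <num_deps> token ids ... head|dependent|relation id ...`), and the two lookup tables are all assumptions made for illustration, so the README remains the authority on the exact field order and indexing. The parse itself is hard-coded here to keep the example self-contained; in practice it would come from whichever dependency parser you run over the AMI transcripts.

```python
from typing import Dict, List, Tuple

# One dependency edge: (head position, dependent position, relation label).
# Positions are 0-based token indices within the sentence (an assumption).
Edge = Tuple[int, int, str]


def encode_sentence(
    tokens: List[str],
    edges: List[Edge],
    voc2id: Dict[str, int],
    de2id: Dict[str, int],
) -> str:
    """Serialize one parsed sentence as a single line.

    Assumed layout (verify against the README before using):
        <num_words> <num_deps> tok_1 ... tok_n dep_1 ... dep_m
    where every tok_i is a vocabulary id and every dep_j is
    "head|dependent|relation_id".
    """
    tok_ids = [str(voc2id[tok]) for tok in tokens]
    dep_strs = [f"{head}|{dep}|{de2id[rel]}" for head, dep, rel in edges]
    counts = [str(len(tok_ids)), str(len(dep_strs))]
    return " ".join(counts + tok_ids + dep_strs)


# Hypothetical lookup tables and a hand-written parse of "the meeting starts".
voc2id = {"the": 0, "meeting": 1, "starts": 2}
de2id = {"det": 0, "nsubj": 1}
tokens = ["the", "meeting", "starts"]
edges = [(1, 0, "det"), (2, 1, "nsubj")]

line = encode_sentence(tokens, edges, voc2id, de2id)
print(line)  # 3 2 0 1 2 1|0|0 2|1|1
```

Words or relation labels missing from the lookup tables raise a `KeyError` in this sketch; a real preprocessing script would need to decide whether to skip such sentences or map them to an unknown-token id.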
