LexisNexis Docx Parser is a tool that parses the Word (docx) format of LexisNexis's World Major Publication.
Clone this repository to get started:
$git clone https://github.com/chanhee-kang/LexisNexis-Docx-Parser.git
Then download a docx file (World Major Publication) from LexisNexis.
The docx file looks like the following picture:
Also, 'docx2txt' is not included in Anaconda, so you need to install it:
$pip install docx2txt
If you don't use Anaconda for Python, you also need to install the following ('os' is part of the Python standard library and does not need to be installed):
$pip install docx2txt
$pip install pandas
Put the path to your docx file in place of "*.docx":
text = docx2txt.process("*.docx")
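As far as I know, docx2txt.process takes the path of one file and does not expand wildcards, so a small helper can be used to find the docx files in a folder first. The following is only a minimal sketch: find_docx_files and load_docx_text are hypothetical helper names that are not part of this repository.

```python
import glob
import os


def find_docx_files(directory):
    """Return the .docx paths found in `directory`, sorted by name.

    Hypothetical helper, not part of this repository.
    """
    return sorted(glob.glob(os.path.join(directory, "*.docx")))


def load_docx_text(path):
    """Extract the plain text of a single docx file with docx2txt."""
    # Imported here so that find_docx_files works even before
    # docx2txt has been installed.
    import docx2txt

    return docx2txt.process(path)
```

For example, text = load_docx_text(find_docx_files("downloads")[0]) would load the first docx file in a folder named "downloads" (a placeholder name).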
The following line loads the csv file that is used to match each article to a country:
country = pd.read_csv('mwp_list.csv')
- The country is shown as unknown if there is no matching country in mwp_list
- Only a single file can be loaded at a time
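The matching with the "unknown" fallback can be sketched roughly as below. This is an illustration under assumptions, not the parser's actual code: the layout of mwp_list.csv is not described here, so the column names "publication" and "country" and the sample rows are made up, and the standard csv module is used instead of pandas to keep the sketch short.

```python
import csv
import io


def load_country_lookup(csv_file):
    """Build a {publication name: country} dict from an open csv file.

    The column names "publication" and "country" are assumptions.
    """
    reader = csv.DictReader(csv_file)
    return {
        row["publication"].strip().lower(): row["country"].strip()
        for row in reader
    }


def match_country(publication, lookup):
    """Return the country for a publication, or "unknown" if it is not listed."""
    return lookup.get(publication.strip().lower(), "unknown")


# Made-up sample data standing in for mwp_list.csv
sample = io.StringIO(
    "publication,country\n"
    "The New York Times,United States\n"
    "The Guardian,United Kingdom\n"
)
lookup = load_country_lookup(sample)

print(match_country("The Guardian", lookup))      # United Kingdom
print(match_country("Some Local Paper", lookup))  # unknown
```

Lowercasing and stripping the names before the lookup is a design choice of this sketch, so that small differences in spacing or capitalization do not end up as unknown.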
If you have any requests, please contact: https://ck992.github.io/.