Python module to parse xml wikidump of type pages-meta-history and return page which is edited maximum time within any give time range
- python
- xml file should follow http://www.mediawiki.org/xml/export-0.10.xsd schema format
- Clone this repository
- Edit the config file
- First line should be the complete path of wikidump xml file
- Second and third line should be start and end date respectively to define the range in which revisions should be considered
- All dates should be in YYYY-mm-dd fomat
- Run the module
- To parse the first 2 Kb of the dump file type:
python wikiparse.py config 0 2048
- To parse the entire dump file type:
python wikiparse.py config 0 -1
- To parse the first 2 Kb of the dump file type:
The module prints following information on the console
Page ID Page Title Total number of revisions