wikiassignment

Description

Package wikiassignment is a Go package that provides utility functions for automatically assigning Wikipedia pages to topics.

Documentation

API documentation can be found in the associated godoc reference.

Topics data can be found in overpedia.

Installation

This package can be installed with the go get command:

go get github.com/negapedia/wikiassignment/...

Requirements

You will need a machine with an internet connection, 16GB of RAM (for the English version) and the docker storage base directory properly set.

This package depends on PETSc. The associated dockerfile provides a complete environment in which to use this package. Otherwise, PETSc can be installed by following the same steps as in the dockerfile or in the PETSc installation page.

Export options

  1. lang: Wikipedia nationalization (language edition) to parse, or a custom JSON; defaults to it (Italian).
  2. date: Wikipedia dump date in the format YYYYMMDD; defaults to latest.

Examples of use

  1. docker run negapedia/wikiassignment export -lang en -date 20060102: basic usage. Run the image on the English nationalization dump dated 2 January 2006 and store the result in the in-container /data folder, which will contain:
     1. semanticgraph.json maps each source page ID to the array of its target page IDs.
     2. partition.json maps each node type (article, category or topic) to the array of page IDs belonging to it.
     3. absorptionprobabilities.csv represents each page as a row with its ID and the weight assigned to each topic.
     4. pages.csv represents pages in the form required by wiki2overpediadb.
  2. docker run -d -v /path/2/out/dir:/data negapedia/wikiassignment export -lang en:
     1. run the image as before.
     2. mount the guest /data folder as a volume on the host folder /path/2/out/dir, the output folder, so that at the end of the operations /path/2/out/dir will contain the result. This folder can be changed to an arbitrary folder of your choice.
     3. run the image in detached mode. For further explanations please refer to the docker run reference.

Useful commands

  1. docker pull negapedia/wikiassignment: update the image to the latest revision.
  2. docker kill --signal=SIGQUIT $(docker ps -ql): quit the last container and log a trace dump.
  3. docker logs -f $(docker ps -ql): fetch and follow the logs of the last container.
  4. docker system prune -fa --volumes: remove all unused images and volumes without asking for confirmation.