
UCLA-BD2K/trainingsetbuilder


trainingsetbuilder

This project helps build a training set for the Aztec tool classification project. Users manually classify each publication as either describing a new software tool or not.

#setup
pip install virtualenv
virtualenv --python=`which python` ~/.virtualenvs/django
source ~/.virtualenvs/django/bin/activate
pip install django

#launch
source ~/.virtualenvs/django/bin/activate
python manage.py runserver

Find a publication to classify at http://localhost:8000/classify/next

View the database at http://localhost:8000/admin

How to add a batch of new publications to the database (example):

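# Run from the Django shell (python manage.py shell) so the models can be imported.
from classify.models import Publication
import json
from urllib.parse import urlencode
from urllib.request import urlopen

journal = 'Bioinformatics (Oxford, England)'
count = 5000
# urlencode escapes the spaces, quotes, and brackets in the search term.
params = urlencode({
    'db': 'pubmed',
    'term': '"' + journal + '"[Journal]',
    'retmode': 'json',
    'retmax': count,
})
query = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?' + params

response = urlopen(query).read()
data = json.loads(response)
idlist = data["esearchresult"]["idlist"]
for pm_id in idlist:
    # -1 is presumably the value for a publication that has not been classified yet
    p = Publication(pmid=pm_id, classification=-1)
    p.save()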
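
The query and parsing steps in the example can be tried without touching the database. The following is a minimal sketch, not part of the project: the helper names `build_esearch_url` and `extract_pmids` are hypothetical, and the sample response is made up to match the shape the example reads (`esearchresult` -> `idlist`).

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'


def build_esearch_url(journal, retmax):
    """Return an esearch URL that lists PubMed IDs for one journal."""
    params = {
        'db': 'pubmed',
        'term': '"' + journal + '"[Journal]',
        'retmode': 'json',
        'retmax': retmax,
    }
    # urlencode percent-escapes quotes, brackets, and spaces in the term.
    return ESEARCH_URL + '?' + urlencode(params)


def extract_pmids(raw_json):
    """Pull the list of PubMed IDs out of an esearch JSON response body."""
    data = json.loads(raw_json)
    return data['esearchresult']['idlist']


# Made-up response body (assumption), used only to exercise the parser.
sample = '{"esearchresult": {"idlist": ["111", "222"]}}'
url = build_esearch_url('Bioinformatics (Oxford, England)', 5000)
pmids = extract_pmids(sample)
```

Keeping the URL building and the JSON parsing in small functions like these makes it possible to check the request before sending it and to reuse the same code for other journals.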
