ogrisel edited this page Jan 2, 2011 · 12 revisions
pignlproc usage tips
Here are some tips for using the pignlproc tools to mine Wikipedia and DBpedia dumps:
Splitting a Wikipedia XML dump into small chunks with Mahout lets Pig process the chunks in parallel, for instance when the input is stored in an S3 bucket. Small chunks are also handy for testing a script locally before launching it as a job on a Hadoop cluster.
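To illustrate what the splitting step produces, here is a minimal Python sketch that groups the `<page>` elements of a dump into self-contained chunks. It does not call Mahout and is not Mahout's implementation; `split_pages` and `pages_per_chunk` are hypothetical names, the sketch assumes each `<page>` and `</page>` tag sits on its own line (as in MediaWiki XML dumps), and it drops the dump's `<siteinfo>` header.

```python
import io
from typing import Iterable, Iterator, List

# Hypothetical helper, NOT Mahout's splitter: an illustration of the idea only.
# Assumes the <page> and </page> tags each sit on their own line.
def split_pages(lines: Iterable[str], pages_per_chunk: int) -> Iterator[str]:
    """Yield well-formed XML chunks, each holding up to pages_per_chunk pages."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be >= 1")
    header, footer = "<mediawiki>\n", "</mediawiki>\n"
    chunk: List[str] = []  # lines of the complete pages collected so far
    page: List[str] = []   # lines of the page currently being read
    in_page = False
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped == "<page>":
            in_page = True
            page = [line]
        elif in_page:
            page.append(line)
            if stripped == "</page>":
                # Page is complete: add it to the current chunk.
                in_page = False
                chunk.extend(page)
                count += 1
                if count == pages_per_chunk:
                    yield header + "".join(chunk) + footer
                    chunk, count = [], 0
        # Lines outside <page> elements (root tag, <siteinfo>) are skipped.
    if chunk:
        # Emit the last, possibly smaller, chunk.
        yield header + "".join(chunk) + footer


# Tiny in-memory dump with three pages, split two pages per chunk.
demo = io.StringIO(
    "<mediawiki>\n<siteinfo></siteinfo>\n"
    "<page>\n<title>A</title>\n</page>\n"
    "<page>\n<title>B</title>\n</page>\n"
    "<page>\n<title>C</title>\n</page>\n"
    "</mediawiki>\n"
)
for i, part in enumerate(split_pages(demo, pages_per_chunk=2)):
    # prints "chunk 0: 2 page(s)" then "chunk 1: 1 page(s)"
    print(f"chunk {i}: {part.count('<page>')} page(s)")
```

Writing each yielded chunk to its own file gives Pig several independent inputs to work on in parallel, and any single chunk can serve as a quick local test input.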