Sisyphe is a simple NodeJS application that recursively analyses folders, maintained as a (lerna) git monorepo.
It can provide some information about the files it analyses; check here for more information.
Tested with NodeJS@8.X, Redis@3.2.6
Works on Linux/OSX/Windows
Example of running a quick local Redis (using Docker):
docker run --name sisyphe-redis -p 6379:6379 redis:3.2.6
- Download the latest Sisyphe version
- Just run:
npm install
(this will execute an npm postinstall)
- ... that's it.
npm run test
will test Sisyphe and its workers
./app.js --help
Will output help
-V, --version output the version number
-n, --corpusname <name> Corpus name (session name)
-s, --select <name> Choose modules for the analyse
-c, --config-dir <path> Configuration folder path
-t, --thread <number> The number of process which sisyphe will take
-b, --bundle <number> Regroup jobs in bundle of jobs
-r, --remove-module <name> Remove module name from the workflow
-q, --quiet Silence output
-l, --list List all available workers
-h, --help output usage information
Just start Sisyphe on a folder with any files in it.
node app -n sessionName ~/Documents/customfolder/corpus
node app -n sessionName -c ~/Documents/customfolder/folderResources ~/Documents/customfolder/session
Sisyphe is now working in the background using all your computer's threads. Just grab a coffee and wait; it will notify you when it's done :)
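For scripted runs, the invocations above can also be assembled programmatically. The following is only a minimal sketch, not part of Sisyphe: `buildSisypheArgs` is a hypothetical helper, it uses only flags listed in the help output, and it assumes the target folder is passed as the last argument, as in the examples.

```javascript
// Hypothetical helper (not part of Sisyphe): builds the argument list for
// `node app ...` from the flags documented in the help output.
function buildSisypheArgs(options) {
  const { corpusName, configDir, threads, quiet = false, target } = options;
  if (!target) {
    throw new Error('target folder is required');
  }
  const args = ['app'];
  if (corpusName) args.push('-n', corpusName);                 // corpus (session) name
  if (configDir) args.push('-c', configDir);                   // configuration folder
  if (threads !== undefined) args.push('-t', String(threads)); // number of processes
  if (quiet) args.push('-q');                                  // silence output
  args.push(target);                                           // folder to analyse goes last
  return args;
}

// Example: print the equivalent command line.
const exampleArgs = buildSisypheArgs({ corpusName: 'sessionName', target: '/data/corpus' });
console.log(['node'].concat(exampleArgs).join(' '));
// prints "node app -n sessionName /data/corpus"
```

The resulting array could be passed to `child_process.spawn('node', args, { stdio: 'inherit' })` from the Sisyphe folder. Note that `spawn` does not expand `~`, so absolute paths are safer there.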
The results of Sisyphe are written to sisyphe/out/{timestamp}-corpusname/
(errors, info, duration...)
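To pick up the most recent results for a corpus from a script, something like the sketch below may help. It is not part of Sisyphe: `latestOutputDir` is a hypothetical helper, and it assumes that `{timestamp}` is a purely numeric prefix, which should be checked against the folders actually produced.

```javascript
// Hypothetical helper (not part of Sisyphe): given the folder names found in
// sisyphe/out/, return the most recent one for a corpus, or null if none match.
// Assumption: folders are named "<digits>-<corpusname>".
function latestOutputDir(dirNames, corpusName) {
  const suffix = '-' + corpusName;
  const candidates = [];
  for (const name of dirNames) {
    if (!name.endsWith(suffix)) continue;
    const stampText = name.slice(0, -suffix.length);
    if (!/^\d+$/.test(stampText)) continue; // skip names without a numeric prefix
    candidates.push({ name, stamp: Number(stampText) });
  }
  if (candidates.length === 0) return null;
  candidates.sort((a, b) => b.stamp - a.stamp); // newest first
  return candidates[0].name;
}
```

The list of names could come from `fs.readdirSync('out')` run inside the Sisyphe folder.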
For a control panel and a fully integrated app, go to Sisyphe-monitor
Sisyphe has a server that lets you control it and obtain more information about its execution.
Simply run the server with npm run server
to access these features.
These are the default modules (focused on xml & pdf).
- FILETYPE Will detect MIME type, extension, corrupted files...
- PDF Will get info from PDF (version, author, meta...)
- XML Will check whether it is well-formed and valid against its DTD, get elements from tags...
- LANG Will detect the language of files (xml/text files...)
- XPATH Will generate a complete list of xpaths from submitted folder
- OUT Will export data to json file & ElasticSearch database
- NB Tries to assign some categories to an XML document by using its abstract
- MULTICAT Tries to assign some categories to an XML document by using its identifiers
- TEEFT Tries to extract keywords from a full text
- SKEEFT Tries to extract keywords from a structured full text by using the teeft algorithm and the text structure
When you work on a worker, just:
- Commit your changes as usual
- Run
npm run updated
(to check which workers have changed)
- Run
npm run publish
(it will ask you to change the version of the worker module and publish it to GitHub)
Some bugs may occur with certain files when using the 'skeeft' module on Windows; please disable it until we fix this.
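When Sisyphe is launched from a script, the workaround can be applied only on Windows. This is a minimal sketch under assumptions: `skeeftWorkaroundArgs` is a hypothetical helper, and it assumes the documented `-r, --remove-module <name>` option accepts the module name written as `skeeft`.

```javascript
// Hypothetical helper (not part of Sisyphe): returns extra command-line
// arguments that remove the skeeft module from the workflow on Windows only.
// Assumption: `-r` accepts the module name spelled "skeeft".
function skeeftWorkaroundArgs(platform = process.platform) {
  // Node.js reports Windows as 'win32', including on 64-bit systems.
  return platform === 'win32' ? ['-r', 'skeeft'] : [];
}

console.log(skeeftWorkaroundArgs('win32').join(' ')); // prints "-r skeeft"
```

The returned arguments would be placed before the target folder in the `node app` command line.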