Portable, scalable and reliable distributed machine learning.
Wormhole is a place where DMLC projects work together to provide scalable and reliable machine learning toolkits that can run on various platforms.
Since Wormhole has been deprecated, we retain some useful tools here and keep updating them.
- Portable:
  - Supported platforms: local machine, Apache YARN, MPI and Sun Grid Engine
- Rich support of data sources:
  - All projects can read data from HDFS, S3 or the local filesystem
- Scalable and reliable
- Requires a C++11 compiler (e.g. `g++ >= 4.8`) and `git`; install them first on Ubuntu >= 13.10
- `cd dmlc-core; make` to build dmlc-core
- `cd ps-lite; make` to build ps-lite
- `cd src/linear; make` or `cd src/difacto; make` to build the linear or difacto toolkit (a full command sequence is sketched after this list)
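A minimal end-to-end build sketch, run from the repository root; the `apt-get` package names are assumptions for an Ubuntu machine, everything else follows the steps above.

```bash
# Install the build prerequisites on Ubuntu >= 13.10 (package names assumed).
sudo apt-get update
sudo apt-get install -y g++ git

# Build the two dependencies, then the toolkit you need.
cd dmlc-core && make && cd ..
cd ps-lite && make && cd ..
cd src/linear && make        # or: cd src/difacto && make
```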
- How to set multiple data paths?
  If you have data paths like `./data/train1` and `./data/train2`, please set `train_data = "./data/train1;./data/train2"` or `train_data = "./data/train.*"`.
  For HDFS files, set `train_data = "hdfs://data/train1;hdfs://data/train2"` or `train_data = "hdfs://data/train.*"` (a config sketch follows below).
- How to use HDFS?
  Set `USE_HDFS = 1` in `dmlc-core/make/config.mk` and `ps-lite/make/config.mk` (see the sketch below).
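A sketch of enabling HDFS support from the shell, assuming each `config.mk` ships with a `USE_HDFS = 0` line; editing the two files by hand and rebuilding has the same effect.

```bash
# Turn on HDFS support in both dependencies, then rebuild them.
sed -i 's/^USE_HDFS *= *0/USE_HDFS = 1/' dmlc-core/make/config.mk ps-lite/make/config.mk
cd dmlc-core && make && cd ..
cd ps-lite && make && cd ..
```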
- How to get readable weights?
  Run `./build/dump.dmlc model_in=your_model_path dump_out=dump_file need_inverse=1` (where `model_in` should be a local file and `need_inverse` is 0 or 1); `dump_file` then contains the readable weights. An example invocation is shown below.
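An illustrative invocation; the model and output paths are hypothetical.

```bash
# Dump a locally stored model to a human-readable text file
# (model_in must be a local path, not an HDFS one).
./build/dump.dmlc model_in=./model_out/part-0 dump_out=./weights.txt need_inverse=1
```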
- Dump error when using HDFS: `./build/dump.dmlc: error while loading shared libraries: libhdfs.so.0.0.0: cannot open shared object file: No such file or directory`?
  Please add the Hadoop native library path to `LD_LIBRARY_PATH` before dumping.
  In my case: `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/clusterserver/hadoop/lib/native/` (see the snippet below).
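Put together, a sketch of a dump session when the binary was built with HDFS support; the Hadoop path is the one from the example above and the model path is hypothetical, so both will differ on your machine.

```bash
# Make libhdfs visible to the dump tool, then dump as usual.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/clusterserver/hadoop/lib/native/
./build/dump.dmlc model_in=./model_out/part-0 dump_out=./weights.txt need_inverse=1
```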
- Why are the ids in the dumped file large numbers like `-2305843009213693952`, and how can I get the original ids?
  See issue8 and issue10.