A library for factorization machines in Nim.
Factorization machines (FMs) [1,2] are machine learning models that use second-order feature combinations (e.g., x1 * x2, x2 * x5) efficiently.
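As background, the standard second-order FM model [1] predicts, for a feature vector x with d features,

```math
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j ,
```

where each feature i has a k-dimensional factor vector v_i, so all pairwise interactions are parameterized with O(dk) parameters instead of one weight per feature pair; this is what makes the second-order combinations efficient.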
nimfm provides:
- Not only second-order but also higher-order factorization machines [3].
- Coordinate descent (a.k.a. alternating least squares) solver.
- Stochastic gradient descent solver with some step-size scheduling methods [4] and AdaGrad solver [5].
- Greedy coordinate descent [6] and Hazan's algorithm (Frank-Wolfe) [7] solvers (with some heuristics) for convex factorization machines.
- Some sparse regularizers [8,9,10,11] for feature selection and feature interaction selection, and various optimizers for such regularizers [11,12,13,14].
- Field-aware factorization machines [15] and AdaGrad/SGD solvers for them.
- Various loss functions: Squared, Huber, SquaredHinge, and Logistic.
- Command-line binaries for end users.
nimfm uses its own data types for datasets, `CSRDataset` and `CSCDataset`, and provides procs for loading libsvm/svmlight format files as such datasets. `CSRDataset` is for the `SGD`, `AdaGrad`, `PSGD`, `Katyusha`, `PGD`, `MBPSGD`, `FISTA`, and `NMAPGD` solvers. `CSCDataset` is for the `CD`, `GreedyCD`, `Hazan`, `PCD`, and `PBCD` solvers.
In addition, nimfm provides binary data formats, `StreamCSRDataset` and `StreamCSCDataset`. They keep only part of a dataset in main memory and are therefore useful for very large-scale datasets. `convertSVMLightFile` converts an svmlight format file to a `StreamCSRDataset` format file, and `transposeFile` converts a `StreamCSRDataset` format file to a `StreamCSCDataset` format file.
The two-dimensional sequence `seq[seq[float64]]` can be easily transformed to such datasets by `toCSR` and `toCSC`.
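A minimal sketch of preparing datasets is shown below. The top-level import and the loading proc name (`loadSVMLightFile`) as well as the exact argument order of `convertSVMLightFile` and `transposeFile` are assumptions here; only the names mentioned above are taken from nimfm, so check the module documentation for the actual interface.

```nim
import nimfm   # assumed top-level module name

# Load an svmlight/libsvm file into an in-memory dataset (loader proc name assumed).
var X: CSRDataset
var y: seq[float64]
loadSVMLightFile("train.svm", X, y)

# A two-dimensional seq[seq[float64]] can be converted directly with toCSR/toCSC.
let dense = @[@[1.0, 0.0, 2.0],
              @[0.0, 3.0, 0.0]]
let Xcsr = toCSR(dense)
let Xcsc = toCSC(dense)

# For data that does not fit in memory, convert files on disk to the streaming
# binary formats (argument order assumed: input path, output path).
convertSVMLightFile("train.svm", "train.stream.csr")   # svmlight -> StreamCSRDataset file
transposeFile("train.stream.csr", "train.stream.csc")  # StreamCSRDataset -> StreamCSCDataset file
```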
For field-aware factorization machines, nimfm provides `CSRFieldDataset`, `CSCFieldDataset`, and procs for loading libffm format files as such datasets. Binary formats for field-aware datasets, `StreamCSRFieldDataset` and `StreamCSCFieldDataset`, are also supported. Note that currently (version 0.3.0) nimfm provides no solver that uses `CSCFieldDataset`.
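Loading libffm-format data might look roughly like the sketch below; only `CSRFieldDataset` comes from the text above, while the import and the loader proc name (`loadFFMFile`) are assumptions for illustration.

```nim
import nimfm   # assumed top-level module name

# libffm files carry a field id for each feature; the loader proc name is assumed.
var X: CSRFieldDataset
var y: seq[float64]
loadFFMFile("train.ffm", X, y)
```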
nimfm provides some sparse regularizers [8,9,10,11] and various optimizers for such regularizers [11,12,13,14]. The following table shows whether each optimizer can be used with each regularizer.
| Optimizer \ Regularizer | L1 [8] | L21 [9,10] | SquaredL12 [11] | SquaredL21 [11] | OmegaTI [11] | OmegaCS [11] |
|---|---|---|---|---|---|---|
| PCD [11] | Yes | No | Yes | No | Yes | No |
| PBCD [11] | Yes | Yes | No | Yes | No | Yes |
| PGD | Yes | Yes | Yes | Yes | No | No |
| FISTA [12] | Yes | Yes | Yes | Yes | No | No |
| NMAPGD [13] | Yes | Yes | Yes | Yes | No | No |
| PSGD | Yes | Yes | Yes | Yes | No | No |
| MBPSGD (MB = MiniBatch) | Yes | Yes | Yes | Yes | No | No |
| Katyusha [14] | Yes | Yes | Yes | Yes | No | No |
`L21`, `SquaredL21`, and `OmegaCS` are for feature selection; the others are for feature interaction selection.
Note that `PSGD` for `SquaredL12` and `SquaredL21` might be slow when a dataset is sparse.
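As a rough illustration of how to read the table, the sketch below pairs `SquaredL12` (interaction selection) with `PCD`, a combination the table marks as supported. The constructor and proc names (`newFactorizationMachine`, `newSquaredL12`, `newPCD`, `fit`) and their parameters are assumptions for illustration only; see the API documentation or `./bin/nimfm_sparsefm --help` for the actual interface.

```nim
import nimfm   # assumed top-level module name

# PCD is a CD-family solver, so it reads a CSCDataset (loader proc name assumed).
var X: CSCDataset
var y: seq[float64]
loadSVMLightFile("train.svm", X, y)

# Hypothetical constructor names/parameters; the table above marks
# the SquaredL12 + PCD combination as supported.
var fm = newFactorizationMachine(task = regression)   # assumed constructor and task enum
let reg = newSquaredL12(1e-3)                         # assumed: regularization strength
var optim = newPCD(reg, maxIter = 100)                # assumed signature
optim.fit(X, y, fm)                                   # assumed: fits fm in place
```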
Currently (version 0.3.0), the `SGD` and `AdaGrad` solvers for `FactorizationMachine` and `FieldAwareFactorizationMachine` can run with multi-threading. They update model parameters asynchronously, so their results differ from those of the single-threaded version.
Install by nimble:

`nimble install nimfm`

To build the command-line binaries from source:

- Download the source code:
  `git clone https://github.com/neonnnnn/nimfm.git`
- Make:
  `nimble make`

Then, the `nimfm`, `nimfm_cfm`, and `nimfm_sparsefm` binaries will be created in the `./bin` directory.
`nimfm` is for conventional factorization machines, `nimfm_cfm` is for convex factorization machines, and `nimfm_sparsefm` is for factorization machines with sparse regularization. `./bin/[binary-name] --help` shows the usage.
Please see the examples in the `benchmarks` directory. If you are familiar with scikit-learn, you will probably find nimfm easy to use.
1. S. Rendle. Factorization machines. In ICDM, pp. 995--1000, 2010.
2. S. Rendle. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3):57--78, 2012.
3. M. Blondel, A. Fujino, N. Ueda, and M. Ishihata. Higher-order factorization machines. In NeurIPS, pp. 3351--3359, 2016.
4. L. Bottou. Stochastic gradient descent tricks. Neural Networks, Tricks of the Trade, Reloaded, pp. 430--445, Lecture Notes in Computer Science (LNCS 7700), Springer, 2012.
5. J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121--2159, 2011.
6. M. Blondel, A. Fujino, and N. Ueda. Convex factorization machines. In ECML-PKDD, pp. 19--35, 2015.
7. M. Yamada, W. Lian, A. Goyal, J. Chen, K. Wimalawarne, S. A. Khan, S. Kaski, H. Mamitsuka, and Y. Chang. Convex factorization machine for toxicogenomics prediction. In KDD, pp. 1215--1224, 2017.
8. Z. Pan, E. Chen, Q. Liu, T. Xu, H. Ma, and H. Lin. Sparse factorization machines for click-through rate prediction. In ICDM, pp. 400--409, 2016.
9. J. Xu, K. Lin, P. Tan, and J. Zhou. Synergies that matter: Efficient interaction selection via sparse factorization machine. In SDM, pp. 1008--1016, 2016.
10. H. Zhao, Q. Yao, J. Li, Y. Song, and D. L. Lee. Meta-graph based recommendation fusion over heterogeneous information networks. In KDD, pp. 635--644, 2017.
11. K. Atarashi, S. Oyama, and M. Kurihara. Factorization machines with regularization for sparse feature interactions. https://arxiv.org/abs/2010.09225
12. A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183--202, 2009.
13. H. Li and Z. Lin. Accelerated proximal gradient methods for nonconvex programming. In NeurIPS, pp. 379--387, 2015.
14. Z. Allen-Zhu. Katyusha: The first direct acceleration of stochastic gradient methods. Journal of Machine Learning Research, 18(1):8194--8244, 2017.
15. Y. Juan, Y. Zhuang, W.-S. Chin, and C.-J. Lin. Field-aware factorization machines for CTR prediction. In RecSys, pp. 43--50, 2016.
- Kyohei Atarashi, 2020-present