# project-template - A template for scikit-learn extensions


project-template is a template project for scikit-learn compatible extensions.

It aids development of estimators that can be used in scikit-learn pipelines and (hyper)parameter search, while facilitating testing (including some API compliance), documentation, open source development, packaging, and continuous integration.

Important Links

HTML Documentation - http://contrib.scikit-learn.org/project-template/

Installation and Usage

The package by itself comes with a single module and an estimator. Before installing the module you will need numpy and scipy. To install the module execute:

$ python setup.py install

or

pip install sklearn-template

If the installation is successful, and scikit-learn is correctly installed, you should be able to execute the following in Python:

>>> import numpy as np
>>> from skltemplate import TemplateEstimator
>>> estimator = TemplateEstimator()
>>> estimator.fit(np.arange(10).reshape(10, 1), np.arange(10))

TemplateEstimator by itself does nothing useful, but it serves as an example of how other Estimators should be written. It also comes with its own unit tests under template/tests which can be run using nosetests.

Creating your own library

1. Cloning

Clone the project onto your computer by executing:

$ git clone https://github.com/scikit-learn-contrib/project-template.git

You should rename the project-template folder to the name of your project. To host the project on GitHub, visit https://github.com/new and create a new repository. To upload your project to GitHub, execute:

$ git remote set-url origin https://github.com/username/project-name.git
$ git push origin master

2. Modifying the Source

You are free to modify the source as you want, but at the very least, all your estimators should pass the check_estimator test to be scikit-learn compatible. (If there are valid reasons your estimator cannot pass check_estimator, please raise an issue at scikit-learn so we can make check_estimator more flexible.)
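As a rough illustration of the conventions that check_estimator exercises, the sketch below defines a toy regressor. `MeanRegressor` and its `shrink` parameter are hypothetical names invented for this example and are not part of the template. Whether every check in check_estimator passes depends on the installed scikit-learn version, so the sketch only demonstrates fit and predict.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy regressor: always predicts the (scaled) training mean."""

    def __init__(self, shrink=1.0):
        # __init__ only stores hyperparameters; no validation or computation.
        self.shrink = shrink

    def fit(self, X, y):
        # Validate the inputs and convert them to arrays.
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]
        # Learned attributes end with an underscore.
        self.mean_ = self.shrink * np.mean(y)
        # fit returns self so that calls can be chained.
        return self

    def predict(self, X):
        # Refuse to predict before fit has been called.
        check_is_fitted(self, "mean_")
        X = check_array(X)
        return np.full(X.shape[0], self.mean_)


X = np.arange(10).reshape(10, 1)
y = np.arange(10)
model = MeanRegressor().fit(X, y)
pred = model.predict(X[:3])
print(pred)
```

With an estimator like this in place, calling `sklearn.utils.estimator_checks.check_estimator` on an instance is the next step for finding any remaining API violations.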

This template is particularly useful for publishing open-source versions of algorithms that do not meet the criteria for inclusion in the core scikit-learn package (see FAQ), such as recent and unpopular developments in machine learning. However, developing using this template may also be a stepping stone to eventual inclusion in the core package.

In any case, developers should endeavor to adhere to scikit-learn's Contributor's Guide, which promotes the use of:

  • algorithm-specific unit tests, in addition to check_estimator's common tests
  • PEP8-compliant code
  • a clearly documented API using NumpyDoc and PEP257-compliant docstrings
  • references to relevant scientific literature in standard citation formats
  • doctests to provide succinct usage examples
  • standalone examples to illustrate the usage, model visualisation, and benefits/benchmarks of particular algorithms
  • efficient code when the need for optimization is supported by benchmarks
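Two of the points above, NumpyDoc docstrings and doctests, can be sketched with the standard library alone. The function `scale_to_unit_range` is a hypothetical example, not part of the template; the docstring sections follow the NumpyDoc layout, and the example in the docstring is run with the `doctest` module.

```python
import doctest


def scale_to_unit_range(values):
    """Rescale numbers linearly so that they span the interval [0, 1].

    Parameters
    ----------
    values : list of float
        Input numbers; must contain at least two distinct values.

    Returns
    -------
    list of float
        Rescaled numbers, in the same order as ``values``.

    Examples
    --------
    >>> scale_to_unit_range([2, 4, 6])
    [0.0, 0.5, 1.0]
    """
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must contain at least two distinct numbers")
    return [(v - low) / (high - low) for v in values]


# Collect and run the doctests embedded in the docstring above.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(
    scale_to_unit_range,
    module=False,
    globs={"scale_to_unit_range": scale_to_unit_range},
):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

In a real package the same docstring examples would normally be collected by the test runner rather than run by hand.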

3. Modifying the Documentation

The documentation is built using sphinx. It incorporates narrative documentation from the doc/ directory, standalone examples from the examples/ directory, and API reference compiled from estimator docstrings.

To build the documentation locally, ensure that you have sphinx, sphinx-gallery and matplotlib by executing:

$ pip install sphinx matplotlib sphinx-gallery

The documentation contains a home page (doc/index.rst), an API documentation page (doc/api.rst) and a page documenting the template module (doc/template.rst). Sphinx allows you to automatically document your modules and classes by using the autodoc directive (see template.rst). To change the aesthetics of the docs and other parameters, edit the doc/conf.py file. For more information visit the Sphinx Documentation.

You can also add code examples in the examples folder. All files inside the folder of the form plot_*.py will be executed and their generated plots will be available for viewing in the /auto_examples URL.
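The naming rule above can be checked with a few lines of Python. This is only a sketch of the plot_*.py convention as stated here; the file names are made up, and the pattern that sphinx-gallery actually applies is set in its own configuration.

```python
from fnmatch import fnmatchcase


def gallery_scripts(filenames):
    """Return the names that follow the plot_*.py convention, sorted."""
    return sorted(name for name in filenames if fnmatchcase(name, "plot_*.py"))


# Hypothetical contents of an examples folder.
candidates = [
    "plot_template.py",
    "helper.py",
    "plot_notes.txt",
    "README.txt",
    "plot_classifier.py",
]
selected = gallery_scripts(candidates)
print(selected)
```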

Then build the HTML documentation by executing:

$ cd doc
$ make html

4. Setting up Travis CI

TravisCI allows you to continuously build and test your code from GitHub to ensure that no code-breaking changes are pushed. After you sign up and authorize TravisCI, add your new repository to TravisCI so that it can start building it. The .travis.yml file contains the configuration required for Travis to build the project. You will have to update the variable MODULE with the name of your module for Travis to test it. Once you add the project on TravisCI, all subsequent pushes on the master branch will trigger a Travis build. By default, the project is tested on Python 2.7 and Python 3.5.

5. Setting up Coveralls

Coveralls reports code coverage statistics of your tests on each push. Sign up on Coveralls and add your repository so that Coveralls can start monitoring it. The project already contains the required configuration for Coveralls to work. All subsequent builds after adding your project will generate a coverage report.

6. Setting up Circle CI

The project uses CircleCI to build its documentation from the master branch and host it using GitHub Pages. Again, you will need to sign up and authorize CircleCI. CircleCI is configured by the circle.yml file, which needs to be modified if you want to set up the docs on your own website. The values to be changed are:

| Variable | Value |
|----------|-------|
| USERNAME | The name of the user or organization of the repository where the project and documentation are hosted |
| DOC_REPO | The repository where the documentation will be hosted. This can be the same as the project repository |
| DOC_URL | The relative URL where the documentation will be hosted |
| EMAIL | The email address to use when pushing the documentation; this can be any valid email address |

In addition to this, you will need to grant the CircleCI machines access to push to your documentation repository. To do this, visit the Project Settings page of your project in CircleCI. Select the Checkout SSH keys option and then choose the Create and add user key option. This should grant CircleCI privileges to push to the repository https://github.com/USERNAME/DOC_REPO/.

If all goes well, you should be able to visit the documentation of your project on

https://USERNAME.github.io/DOC_REPO/DOC_URL

7. Adding Badges

Follow the instructions to add a Travis Badge, Coveralls Badge and CircleCI Badge to your repository's README.

8. Advertising your package

Once your work is mature enough for the general public to use it, you should submit a Pull Request to modify scikit-learn's related projects listing. Please insert a brief description of your project and a link to its code repository or PyPI page. You may also wish to announce your work on the scikit-learn-general mailing list.

9. Uploading your package to PyPI

Uploading your package to PyPI allows users to install your package through pip. Python provides two repositories for uploading packages: the PyPI Test repository, which is used for testing packages before their release, and the PyPI repository, where you make your releases. You need to register a username and password with both sites; the credentials need not be the same on both. To upload your package from the command line, store your username and password in a file called .pypirc in your $HOME directory, in the following format:

[distutils]
index-servers =
  pypi
  pypitest

[pypi]
repository=https://pypi.python.org/pypi
username=<your-pypi-username>
password=<your-pypi-password>

[pypitest]
repository=https://testpypi.python.org/pypi
username=<your-pypitest-username>
password=<your-pypitest-password>
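Because .pypirc uses INI syntax, its layout can be sanity-checked with Python's standard `configparser` before uploading. The sketch below parses an in-memory copy of the format shown above; the usernames and passwords are placeholders, and `list_index_servers` is a hypothetical helper, not part of any packaging tool.

```python
import configparser

# In-memory stand-in for ~/.pypirc; credentials are placeholders.
PYPIRC_TEXT = """\
[distutils]
index-servers =
  pypi
  pypitest

[pypi]
repository=https://pypi.python.org/pypi
username=alice
password=not-a-real-password

[pypitest]
repository=https://testpypi.python.org/pypi
username=alice-test
password=not-a-real-password
"""


def list_index_servers(text):
    """Map each name listed under index-servers to its repository URL."""
    # RawConfigParser avoids '%' interpolation, which passwords may contain.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    return {name: parser.get(name, "repository") for name in names}


servers = list_index_servers(PYPIRC_TEXT)
print(servers)
```

A missing section or a misspelled server name surfaces here as a `configparser` error, before any upload is attempted.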

Make sure that all details in setup.py are up to date. To upload your package to the Test server, execute:

python setup.py register -r pypitest
python setup.py sdist upload -r pypitest

Your package should now be visible on: https://testpypi.python.org/pypi

To install a package from the test server, execute:

pip install -i https://testpypi.python.org/pypi <package-name>

Similarly, to upload your package to the PyPI server, execute:

python setup.py register -r pypi
python setup.py sdist upload -r pypi

To install your package, execute:

pip install <package-name>

Thank you for cleanly contributing to the scikit-learn ecosystem!

Sik

Recommendations

Use a virtual environment (Virtualenv + VirtualenvWrapper)

Virtual environments are not virtual machines. They are used to avoid clashes between a project's libraries and those installed on the system. For more information, see this virtual environment post, which describes how to use a virtual environment for Mozilla Marketplace testing.

Use the following to create a simblefaron environment based on the ./requirements.txt associated with the source directory ./src:

mkvirtualenv simblefaron -a . -r ./requirements.txt

Note that mkvirtualenv also activates the new environment. The command deactivate exits the virtual environment. Once the virtual environment exists on the system, the command workon simblefaron is convenient because it changes into the working directory and activates the virtual environment.

Remember to keep requirements.txt up to date. For more details on using the virtual environment, see the command reference.
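Where virtualenvwrapper is not available, a comparable isolated environment can be created from Python itself. The sketch below uses the standard library `venv` module, which is a different tool from the mkvirtualenv command described above: it does not activate the environment or install requirements.txt. The helper name `create_environment` and the temporary location are assumptions made for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir, name="simblefaron"):
    """Create a bare virtual environment called `name` inside `parent_dir`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the sketch offline and fast; a real setup would
    # usually want pip in order to install requirements.txt afterwards.
    venv.create(env_dir, with_pip=False)
    return env_dir


# A temporary directory is used so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(tmp)
    # Every environment created by venv contains a pyvenv.cfg marker file.
    created = (env_dir / "pyvenv.cfg").is_file()

print(created)
```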

Initial data encryption (an encrypted tar.gz, taken from)

Tarring and compression are a job for tar and gzip or bzip2; encryption is a job for either gpg or openssl:

Encrypt

 % tar cz folder_to_encrypt | \
      openssl enc -aes-256-cbc -e > out.tar.gz.enc

Decrypt

 % openssl aes-256-cbc -d -in out.tar.gz.enc | tar xz

Or using gpg

 % gpg --encrypt out.tar.gz

The openssl variant uses symmetric encryption, so you have to tell the receiving party the password (the key) that was used. The gpg variant combines symmetric and asymmetric encryption: you use the public key of the receiving party to create a session key and encrypt the content with that key, so no password has to be shared with anyone.
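The packing step of the pipeline (the `tar cz` part) can also be sketched in Python with the standard `tarfile` module. This covers archiving and compression only; encryption is still left to openssl or gpg as described above. The helper names and the sample file are hypothetical.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack_folder(folder):
    """Return the bytes of a gzip-compressed tar archive of `folder`."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # arcname stores paths relative to the folder name, not the full path.
        archive.add(str(folder), arcname=Path(folder).name)
    return buffer.getvalue()


def list_members(archive_bytes):
    """Return the sorted member names of a tar.gz archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return sorted(archive.getnames())


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "folder_to_encrypt"
    folder.mkdir()
    (folder / "data.txt").write_text("example\n")
    payload = pack_folder(folder)
    members = list_members(payload)

print(members)
```

The resulting bytes correspond to what `tar cz` writes to the pipe, and could be handed to an encryption tool in the same way.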