Skip to content

Latest commit

 

History

History
234 lines (166 loc) · 9.47 KB

1.environment.md

File metadata and controls

234 lines (166 loc) · 9.47 KB

Installing a Python dev environment

git and github

Once you have a github account and installed the git command on your system, open a new terminal session (use Git Bash under Windows) type the following commands.

  • Fork scikit-learn in the github web interface: go to https://github.com/scikit-learn/scikit-learn and click the "fork" button. You should be automatically redirected to your personal fork at: https://github.com/myusername/scikit-learn in your web browser.

  • Then, in the terminal clone your fork with git:

    $ git clone git@github.com:myusername/scikit-learn.git
    

    Note that the $ sign is a generic prompt indicator for terminal commands. Please do not copy it when you copy-paste commands from this document.

  • Many open source projects from the Python ecosystem share similar development practices. For instance, you can also (optionally) clone numpy with git if you want to use the development version of numpy instead of a released package:

    $ git clone git@github.com:myusername/numpy/numpy.git
    
  • After cloning those repo, you should see a new local folders with your clones in the output of the ls command:

    $ ls
    
  • To locate those folder, use the pwd (path to working directory) command:

    $ pwd
    
  • Configure some aliases for the remote repositories:

    List existing remotes in your scikit-learn clone:

    $ cd scikit-learn
    $ git remote -v
    

    You should a similar output to:

    origin          git@github.com:myusername/scikit-learn.git (fetch)
    origin          git@github.com:myusername/scikit-learn.git (push)
    

    Add a new remote for the reference scikit-learn repository on GitHub (i.e scikit-learn/scikit-learn) which is conventionally called upstream:

    $ git remote add upstream https://github.com/scikit-learn/scikit-learn.git
    

    Check that your the remote has been properly configured:

    $ git remote -v
    

    You should now get a similar output to:

    origin          git@github.com:myusername/scikit-learn.git (fetch)
    origin          git@github.com:myusername/scikit-learn.git (push)
    upstream        git@github.com:scikit-learn/scikit-learn.git (fetch)
    upstream        git@github.com:scikit-learn/scikit-learn.git (push)
    

conda

Conda is a command line tool to download software packages and work in isolated environements for different projects.

The fastest way to install the conda tool is to use a miniforge installer.

Install Miniforge

  • Install Miniforge from the official installation page (choose the latest Miniforge installer links for your Operating System version)

  • Initialize the conda command in git bash

    • Windows: open "Git Bash" and type
    $ cd Downloads/
    $ ./Miniforge3-Windows-x86_64.exe
    
    • Linux & macOS:
    $ cd Downloads/
    $ bash Miniforge3-*.sh
    

    And follow the instructions.

  • Make sure your initialized your shell environment:

    • Windows (with Git Bash) and Linux
    $ conda init bash
    
    • macOS uses zsh instead of bash by default:
    $ conda init zsh
    
  • then close your shell and start a new one to type:

    $ conda info
    

    and look for the location of the "base environment".

    or

    $ where conda
    

    to check that the conda command is in your PATH and useable from your shell.

conda environments

conda environments make it possible to have specific versions of your packages to work on a specific project independently of the dependencies used for other projects. Once your are done with a project it's very easy to delete a conda environment to avoid accumulating packages you no longer need on your system.

conda environments also make it easy to make sure that the versions of the packages you use on your developer environment matchs those used by your team members or those required by the production environment for instance.

  • create an environment named sklworkshop:
$ conda create --name sklworkshop -c conda-forge numpy scipy cython joblib threadpoolctl pytest matplotlib pandas

if your are on macOS you should add the compilers packages to that list:

$ conda create --name sklworkshop -c conda-forge numpy scipy cython joblib threadpoolctl pytest matplotlib pandas compilers

Note: the -c conda-forge flag is not necessary if you installed conda with the miniforge installer, but it is necessary if you use a conda command installed from the Miniconda or Anaconda installers.

  • activate and deactivate environments
$ conda activate sklworkshop
(sklworkshop)$ conda deactivate
  • version, environment and package listing
$ conda --version
$ conda env list
$ conda list

VS Code

VS Code is a very popular open source code editor with a rich set of extensions to turn it into a full fledged Integrated Development Environment (with fast code navigation, auto completion, pytest execution, debugger, jupyter notebook editing and execution...).

Here we show the main tips and tricks to get productive when using VS Code to work on Python projects such as scikit-learn (including the most useful keyboard shortcuts).

Install VSCode following the instructions for your Operating System.

Launch VS Code. At the first start you might get a popup to ask you to configure Telemetry settings. Feel free to disable telemetry if you don't want VS Code to report any data to its developers.

Install the Python extension:

  • Ctrl+Shift+X to open the extension manager
  • search for the python extension: install

Note to macOS users: replace Ctrl by Command on most of the keyboard short-cuts presented in this document.

Open project folder for scikit-learn: Ctrl+Shift+P then type: "File: Open Folder..." and open the scikit-learn folder your create when running the git clone command above.

Open a Python file from the project by clicking on setup.py in the left panel named "EXPLORER".

In order to work with VSCode in your Python environment

  • Ctrl+Shift+P then "python select interpreter" and choose "sklworkshop"

Optionally, you can open a new project folder for NumPy similarly: Ctrl+Shift+P then type: "File: Open Folder..." and select the numpy folder.

Activate the "sklworkshop" Python interpreter for the numpy project as well.

Switch between projects: Ctrl-r

Browse the code:

  • by files Ctrl-p
  • by symbols Ctrl-t

At some point VSCode will complain about not finding a linter: scikit-learn uses flake8

  • Install flake8 in your conda environment
$ conda activate sklworkshop
(sklworkshop)$ conda install flake8
  • Select flake8 as a linter in VS Code: Ctrl-Shift-P "select linter"

Practical code navigation

Find example files that mention the word "importance" in different ways:

Navigate to the RandomForestClassifier class from the plot_permutation_importance.py example:

  • VS Code: ctrl-clicking on the class name
  • GitHub: clicking on the class name

Find the class KMeans in scikit-learn in two different ways:

  • from the command line, in a bash or zsh terminal: use git grep "class KMeans" (note that using the "class" prefix makes the search more specific to only find the line of the class definition. Otherwise you will find all occurrences of the KMeans class, including in documentation, tests, examples...). When your are not sure about the casing, use the git grep -i "keyword" for case insensitive search instead.
  • VS Code: Ctrl-t and type "KMeans". If nothing happens, press Enter, and select the KMeans class from the list of suggestions.

Installing C/C++ compilers to be able to build native extensions

Building scikit-learn from source requires a C/C++ compiler (to build native extensions typically written in Cython for instance).

If you have never installed a C/C++ compiler for your system you need to do it now.

macOS users: feel free to install the compilers package from conda-forge in your environment if your did not do it already. After installation, you need to deactivate and reactivate your environment for this installation to be effective.

$ conda install -n sklworkshop compilers
$ conda deactivate
$ conda activate sklworkshop

The scikit-learn build instructions link below gives more details

See instructions for your OS in the installation guide.