This repository contains Jupyter notebooks introducing beginners to the Python packages Numpy and Pandas. The material has been designed for people already familiar with Python but not with its "scientific stack".
This material was created by Guillaume Witz (Science IT Support, Microscopy Imaging Center, Bern University) as part of the courses offered by ScITS.
The course has the following content:
- Numpy arrays: what they are and how to create, import and save them
- Maths with Numpy arrays: applying functions to arrays, doing basic statistics with arrays
- Numpy and Matplotlib: Basics of plotting Numpy arrays with Matplotlib
- Recovering parts of arrays: Using array coordinates to extract information (indexing, slicing)
- Combining arrays: Assembling arrays by concatenation, stacking etc. Combining arrays of different sizes (broadcasting)
- Introduction to Pandas: What does Pandas offer?
- Pandas data structures: Series and dataframes
- Importing data to Pandas: Importing data tables into Pandas (from Excel, CSV) and plotting them
- Pandas operations: Applying functions to the contents of Pandas dataframes (classical statistics, `apply` function etc.)
- Combining Pandas dataframes: Using concatenation or join operations to combine dataframes
- Analyzing Pandas dataframes: Split dataframes into groups (`groupby`) for category-based analysis
- A real-world example: Complete pipeline including data import, cleaning, analysis and plotting, showing the nitty-gritty issues one often faces with real data
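To give a flavour of the topics listed above, here is a minimal sketch touching on broadcasting, indexing, basic statistics, `apply` and `groupby`. It is not taken from the course notebooks: the array values, the dataframe and all variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) array
column = np.arange(3).reshape(3, 1)
row = np.arange(4)
table = column * 10 + row

# Indexing and slicing: the second row, then the last two columns of every row
second_row = table[1]
last_columns = table[:, -2:]

# Basic statistics along an axis (here: the mean of each column)
column_means = table.mean(axis=0)

# A small made-up dataframe (hypothetical data, not from the course)
df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# apply: run a function on every element of a column
df["value_squared"] = df["value"].apply(lambda x: x ** 2)

# groupby: split the rows by category, then aggregate each group
group_means = df.groupby("group")["value"].mean()

print(table)
print(group_means)
```

The notebooks cover each of these steps in much more detail, including how to import real data and plot it.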
During live sessions of the course, you are given access to a private Jupyter session and don't need to install anything on your computer.
Outside live sessions, this entire course can still be run interactively without any local installation thanks to the mybinder service. For that, just click on the mybinder badge at the top of this Readme. This will open a Jupyter session for you with all packages, notebooks and data available to run.
Alternatively, you can run the course on Google Colab. For that, just click on the Colab badge at the top of this file.
For a local installation, we recommend using conda to create a specific environment to run the code. If you don't yet have conda, you can e.g. install miniconda, see here for instructions. Then:
- Clone the repository to your computer using this link and unzip it
- Open a terminal and move to the `NumpyPandas_course-master/binder` folder
- Here you find an `environment.yml` file that you can use to create a conda environment. Choose an environment name, e.g. `numpypandas`, and type: `conda env create -n numpypandas -f environment.yml`
- When you want to run the material, activate the environment with `conda activate numpypandas` and start Jupyter with `jupyter lab`

Note that the top folder of your directory in Jupyter is the folder from where you started Jupyter. So if you are e.g. in the `binder` folder, move one level up to have access to the notebooks.
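Once the environment is activated, a quick check like the one below can confirm that the main packages are visible to Python. This is only a sketch: the list of package names is an assumption (the exact contents of `environment.yml` are not reproduced here), and `missing_packages` is a hypothetical helper, not part of the course material.

```python
import importlib.util

# Packages the course is expected to rely on. numpy and pandas are the course
# topics; matplotlib and jupyterlab are assumed here and may need adjusting.
REQUIRED = ["numpy", "pandas", "matplotlib", "jupyterlab"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

If a package is reported as missing, the most likely cause is that the conda environment has not been activated in the current terminal.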
In the Pandas part, we use data made publicly available by the Swiss National Science Foundation (SNSF) at this link: http://p3.snf.ch/Pages/DataAndDocumentation.aspx#DataDownload. The example analyses of these data are in no way confirmed or validated by the SNSF and are entirely the work of Guillaume Witz, Science IT Support, Bern University.