This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Measuring performance and using numpy #185 #201

Open
wants to merge 8 commits into
base: week10
from


@nuttamas nuttamas commented Dec 24, 2020

Running the given function without NumPy:

python -m timeit -n 100 -r 5 -s "from calc_pi import calculate_pi_timeit" "calculate_pi_timeit(10_000)()"

Result:

100 loops, best of 5: 7.66 msec per loop

After switching to NumPy functions, run:

python -m timeit -n 100 -r 5 -s "from calc_pi_np import calculate_pi_timeit" "calculate_pi_timeit(10_000)()"

Result:

100 loops, best of 5: 2.23 msec per loop

NumPy makes the code about 3.4 times faster (2.23 msec vs. 7.66 msec per loop).
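The two modules are not shown in this PR, so as a rough sketch of where the speed-up comes from, assume both estimate π by Monte Carlo sampling of the unit square. The function names below are hypothetical stand-ins, not the ones in `calc_pi.py` or `calc_pi_np.py`.

```python
import random
import timeit

import numpy as np


def estimate_pi_python(n, seed=0):
    """Estimate pi with a pure-Python loop (hypothetical stand-in for calc_pi)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:  # point falls inside the quarter circle
            hits += 1
    return 4 * hits / n


def estimate_pi_numpy(n, seed=0):
    """Same estimate, with the loop replaced by whole-array operations."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    hits = np.count_nonzero(x * x + y * y < 1.0)
    return 4 * hits / n


if __name__ == "__main__":
    for func in (estimate_pi_python, estimate_pi_numpy):
        # Best of 5 repeats of 100 calls, mirroring -n 100 -r 5 above.
        best = min(timeit.repeat(lambda: func(10_000), number=100, repeat=5))
        print(f"{func.__name__}: {best / 100 * 1e3:.2f} msec per loop")
```

The NumPy variant does the same arithmetic, but the per-point work runs in compiled array routines instead of the Python interpreter loop.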


nuttamas commented Dec 24, 2020

Profiling code #186
Tried the given code in a Jupyter notebook:

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        # j >> i == j // 2 ** i (shift j bits i places to the right)
        # j ^ i -> bitwise exclusive or; j's bit doesn't change if i's = 0, changes to complement if i's = 1
        total += sum(L)
    return total

Then ran %prun sum_of_lists(10_000_000) and got the result below:

14 function calls in 13.066 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5   10.708    2.142   10.708    2.142 :4()
     5    1.422    0.284    1.422    0.284 {built-in method builtins.sum}
     1    0.761    0.761   12.891   12.891 :1(sum_of_lists)
     1    0.175    0.175   13.066   13.066 :1()
     1    0.000    0.000   13.066   13.066 {built-in method builtins.exec}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The lines that take the most time are the list comprehension L = [j ^ (j >> i) for j in range(N)] (10.708 s of 13.066 s, about 82%) and total += sum(L) (1.422 s, about 11%).
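Outside a notebook, a similar report can be produced with the standard-library `cProfile` and `pstats` modules. This is a minimal sketch; the helper name `profile_sum_of_lists` is made up for illustration, and N is reduced so it runs quickly.

```python
import cProfile
import io
import pstats


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def profile_sum_of_lists(N):
    """Profile one call and return (result, report text sorted by internal time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(sum_of_lists, N)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats()
    return result, buffer.getvalue()


if __name__ == "__main__":
    # N is smaller than the 10_000_000 used above so the sketch runs quickly.
    _, report = profile_sum_of_lists(100_000)
    print(report)
```

On Python 3.12 and later, comprehensions are inlined, so the list comprehension's time is reported under `sum_of_lists` itself rather than on a separate row.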


nuttamas commented Dec 24, 2020

Then tried:
%load_ext line_profiler
%lprun -f sum_of_lists sum_of_lists(10_000_000)

Got the result:

Timer unit: 1e-06 s

Total time: 19.2983 s
File:
Function: sum_of_lists at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents

1                                           def sum_of_lists(N):
2         1          2.0      2.0      0.0      total = 0
3         6         22.0      3.7      0.0      for i in range(5):
4         5   18459236.0 3691847.2     95.7          L = [j ^ (j >> i) for j in range(N)]
5                                                   # j >> i == j // 2 ** i (shift j bits i places to the right)
6                                                   # j ^ i -> bitwise exclusive or; j's bit doesn't change if i's = 0, changes to complement if i's = 1
7         5     839024.0 167804.8      4.3          total += sum(L)
8         1          1.0      1.0      0.0      return total

The line that takes the most time is the list comprehension, L = [j ^ (j >> i) for j in range(N)], which accounts for 95.7% of the run time; total += sum(L) takes the remaining 4.3%.
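Since the list comprehension dominates, one possible follow-up is to replace it with NumPy array operations. The `sum_of_lists_np` function below is an assumption for illustration, not part of the given exercise code, and it has not been timed here.

```python
import numpy as np


def sum_of_lists(N):
    """Original version from the exercise, kept for comparison."""
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def sum_of_lists_np(N):
    """Vectorised variant (hypothetical): same arithmetic on a whole array."""
    j = np.arange(N, dtype=np.int64)
    total = 0
    for i in range(5):
        # Same expression as the list comprehension, applied element-wise.
        total += int(np.sum(j ^ (j >> i)))
    return total


if __name__ == "__main__":
    n = 100_000
    print(sum_of_lists(n) == sum_of_lists_np(n))
```

Re-running `%lprun` on the variant would show whether the hotspot actually shrinks on a given machine.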


nuttamas commented Dec 24, 2020

Approximating π using Numba/Cython #195

Ran calc_pi_numba.py and got the result:

Elapsed (with compilation) = 900 msec
pi = 3.1272 (with 10000)
Elapsed (after compilation) = 180 μsec
pi = 3.142 (with 10000)

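After compilation, the code takes much less time than the original pure-Python version (180 μsec vs. 7.66 msec for 10,000 points). The first call takes about 900 msec because it includes compilation.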
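The gap between the first and second call can be reproduced with a sketch like the one below. `calc_pi_numba.py` is not shown in this PR, so the function here is a hypothetical Monte Carlo stand-in; the `try`/`except` fallback is only there so the sketch still runs (without any speed-up) when Numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without Numba (no speed-up in that case).
    def njit(func):
        return func


@njit
def estimate_pi(n):
    """Monte Carlo estimate of pi (hypothetical stand-in for calc_pi_numba.py)."""
    hits = 0
    for _ in range(n):
        x = np.random.random()
        y = np.random.random()
        if x * x + y * y < 1.0:
            hits += 1
    return 4.0 * hits / n


def timed(n):
    """Return (estimate, elapsed seconds) for one call."""
    start = time.perf_counter()
    value = estimate_pi(n)
    return value, time.perf_counter() - start


if __name__ == "__main__":
    first = timed(10_000)   # includes JIT compilation when Numba is installed
    second = timed(10_000)  # reuses the already compiled machine code
    print(f"Elapsed (with compilation)  = {first[1] * 1e3:.3f} msec, pi = {first[0]}")
    print(f"Elapsed (after compilation) = {second[1] * 1e3:.3f} msec, pi = {second[0]}")
```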

Using Cython, got the result:

5.48 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Cython takes longer than Numba after compilation (5.48 ms vs. 180 μsec).


nuttamas commented Dec 24, 2020

Improving performance using MPI #194
The code uses the Message Passing Interface (MPI) to accomplish this approximation of π.

Ran python calc_pi.py -np 10_000_000 -n 1 -r 1 and got:

pi = 3.1411436 (with 10000000)
1 loops, best of 1: 7.97 sec per loop

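Per point, the NumPy version is faster than this MPI run (2.23 msec for 10,000 points vs. 7.97 sec for 10,000,000 points). MPI could work faster when the work is split across parallel processes.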
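As a sketch of how the work might be split across processes, the example below uses `mpi4py` with a reduction onto rank 0. The course's `calc_pi.py` is not shown here, so the function names and structure are assumptions, and the fallback only lets the sketch run as a single process when `mpi4py` is missing.

```python
import numpy as np

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Fallback so the sketch runs as a single process without mpi4py.
    MPI, comm, rank, size = None, None, 0, 1


def count_hits(n, seed):
    """Count random points of the unit square that fall inside the quarter circle."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    return int(np.count_nonzero(x * x + y * y < 1.0))


def parallel_pi(n_total):
    """Return the pi estimate on rank 0 and None on every other rank."""
    # Split the points as evenly as possible across the ranks.
    n_local = n_total // size + (1 if rank < n_total % size else 0)
    hits = count_hits(n_local, seed=rank)  # a different random stream per rank
    if comm is not None:
        hits = comm.reduce(hits, op=MPI.SUM, root=0)
    return None if rank != 0 else 4 * hits / n_total


if __name__ == "__main__":
    estimate = parallel_pi(1_000_000)
    if rank == 0:
        print(f"pi = {estimate} (with 1000000) on {size} process(es)")
```

Saved under a hypothetical name such as `pi_mpi_sketch.py`, it could be launched with `mpiexec -n 4 python pi_mpi_sketch.py`; each rank then handles roughly a quarter of the points.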
