Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelization refactor adding joblib prefer="processes" #100

Merged
merged 2 commits into from
Jun 24, 2024

Conversation

vitorandreazza
Copy link
Contributor

This PR is a refactor to higher parallelization by utilizing the prefer="processes" option of the joblib.Parallel class. This change permits the full usage of the CPU cores and processing power, significantly reducing the total runtime of the .fit.
For example, in this simple test:

series = pd.Series([random.randint(1, 1000) for _ in range(10000)])
f = Fitter(series, timeout=20)
f.fit(progress=True)

Current running time output:

Fitting 113 distributions:   0%|          | 0/113 [00:00<?, ?it/s]
Fitting 113 distributions:   2%|▏         | 2/113 [00:00<00:10, 10.80it/s]
Fitting 113 distributions:   5%|▌         | 6/113 [00:00<00:08, 12.40it/s]
...
Fitting 113 distributions:  96%|█████████▋| 109/113 [01:17<00:03,  1.23it/s]
Fitting 113 distributions:  97%|█████████▋| 110/113 [01:17<00:01,  1.55it/s]
Fitting 113 distributions:  98%|█████████▊| 111/113 [01:19<00:01,  1.16it/s]
Fitting 113 distributions:  99%|█████████▉| 112/113 [01:20<00:01,  1.10s/it]
Fitting 113 distributions: 100%|██████████| 113/113 [01:21<00:00,  1.39it/s]

PR running time output:

Fitting 113 distributions:   0%|          | 0/113 [00:00<?, ?it/s]
Fitting 113 distributions:   1%|          | 1/113 [00:01<02:48,  1.51s/it]
Fitting 113 distributions:   4%|▍         | 5/113 [00:01<00:26,  4.07it/s]
...
Fitting 113 distributions:  93%|█████████▎| 105/113 [00:07<00:01,  5.51it/s]
Fitting 113 distributions:  96%|█████████▋| 109/113 [00:09<00:01,  3.71it/s]
Fitting 113 distributions:  98%|█████████▊| 111/113 [00:20<00:00,  3.71it/s]
Fitting 113 distributions:  99%|█████████▉| 112/113 [00:22<00:00,  1.02it/s]
Fitting 113 distributions: 100%|██████████| 113/113 [00:22<00:00,  4.97it/s]

@coveralls
Copy link

Coverage Status

coverage: 92.727% (-7.3%) from 100.0%
when pulling b1418d8 on vitorandreazza:main
into 6a8c238 on cokelaer:main.

@cokelaer
Copy link
Owner

@vitorandreazza looks great to me. multiprocessing is always a bit tricky to test though. I will merge your work since it looks really good. All tests are good locally not in the continuous integration but i believe this is just not related. I'll update the version, readme and add your contribution straightway in the main branch. again thanks a lot.

@cokelaer cokelaer merged commit 652cfd2 into cokelaer:main Jun 24, 2024
2 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants