Easily Speed up Pandas with Modin #5646
Replies: 4 comments 2 replies
-
Hi @shoukewei, that is a great article. I think it will definitely be useful for plenty of Modin users. Thanks for writing it. I would also suggest taking a look at Modin on unidist with the MPI backend. Though it may be slower than Ray or Dask in certain cases, finding the operations that are slower with MPI would help us a lot to focus on concrete cases.
-
Hi @YarShev, thanks for the positive feedback. I did try unidist with the MPI backend, but it raised an error that I could not resolve, so I left it out of the article. The error is as follows:

    UserWarning: unidist execution environment not yet initialized. Initializing...
    Exception Traceback (most recent call last)
    ~\anaconda3\lib\site-packages\modin\logging\logger_decorator.py in run_and_log(*args, **kwargs)
    ~\anaconda3\lib\site-packages\modin\pandas\general.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    ~\anaconda3\lib\site-packages\modin\logging\logger_decorator.py in run_and_log(*args, **kwargs)
    ~\anaconda3\lib\site-packages\modin\pandas\io.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    ~\anaconda3\lib\site-packages\modin\pandas\io.py in _read(**kwargs)
    ~\anaconda3\lib\site-packages\modin\config\pubsub.py in subscribe(cls, callback)
    ~\anaconda3\lib\site-packages\modin\pandas\__init__.py in _update_engine(publisher)
    ~\anaconda3\lib\site-packages\modin\core\execution\unidist\common\utils.py in initialize_unidist()
    ~\anaconda3\lib\site-packages\unidist\api.py in init()
    ~\anaconda3\lib\site-packages\unidist\core\base\utils.py in init_backend()
    ~\anaconda3\lib\site-packages\unidist\core\backends\mpi\utils.py in initialize_mpi()
    ~\anaconda3\lib\site-packages\unidist\core\backends\mpi\core\controller\api.py in init()
    mpi4py\MPI\Comm.pyx in mpi4py.MPI.Intracomm.Spawn()
    Exception: Internal MPI error!, error stack:
-
Hi Iaroslav,
thanks for your solution. It works when I run `mpiexec -n 1 python script.py` in
the terminal. It takes only 13.239 seconds, which is faster than Dask and Ray.
But originally I was working in a Jupyter notebook, and the error came from
running the code there. Do you have any idea what causes the error in
a Jupyter notebook?
Thanks,
Shouke
…On Mon, Feb 13, 2023 at 12:34 PM Iaroslav Igoshev ***@***.***> wrote:
@shoukewei <https://github.com/shoukewei>, most likely you ran your code
with just `python script.py`, but if you look closer at the instructions
<https://modin.readthedocs.io/en/latest/development/using_pandas_on_unidist.html>,
you will notice that the script should be run as follows:
`mpiexec -n 1 python script.py`
Please let me know if this is not obvious from the instructions.
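
For readers following along, here is a minimal sketch of what such a `script.py` might look like. It is not the code from the article: the `data/*.csv` pattern and the helper names `read_all` and `timed` are assumptions made for illustration. The engine is selected through `modin.config.Engine` and `unidist.config.Backend`, as described in the Modin documentation linked above.

```python
import glob
import time


def read_all(read_csv, concat, paths):
    """Read every path with `read_csv` and combine the results with `concat`."""
    frames = [read_csv(path) for path in paths]
    return concat(frames, ignore_index=True)


def timed(fn, *args, **kwargs):
    """Call `fn` and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def main(pattern="data/*.csv"):  # hypothetical location of the input files
    paths = sorted(glob.glob(pattern))
    if not paths:
        print(f"No files match {pattern!r}; nothing to read.")
        return

    # Select the engine before the first Modin operation runs.
    import modin.config as modin_cfg
    import unidist.config as unidist_cfg

    modin_cfg.Engine.put("unidist")
    unidist_cfg.Backend.put("mpi")

    import modin.pandas as pd

    df, seconds = timed(read_all, pd.read_csv, pd.concat, paths)
    print(f"Read {len(paths)} files, {len(df)} rows, in {seconds:.3f} s")


if __name__ == "__main__":
    main()
```

Saved as `script.py`, it would be launched from a terminal with `mpiexec -n 1 python script.py`, as in the instructions above. Because `read_all` takes the reader and the concatenation function as arguments, the same helper can be timed with plain pandas by passing `pandas.read_csv` and `pandas.concat` instead.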
-
Thanks for all the information.
-
I just wrote a blog article about speeding up Pandas with Modin. It compares the speed of reading multiple data files with Pandas, Modin on Ray, and Modin on Dask. I hope the article will be helpful to some of you.