Releases: Bears-R-Us/arkouda

Release Notes v2024.10.02

03 Oct 00:12
a44dd0f

Bug Fixes

  • Issue #3762 - Fix dataframe groupby aggregations when keys contain NaNs
  • Issues #3658, #3650, #3654, #3783, #3784, #3788 and PR #3386 - Fix IO bugs including:
    • reading segarrays containing NaNs and empty segments with hdf5 and parquet
    • reading dataframes containing uint and int segarray columns
    • CSV address sanitizer "use after free" memory issues
  • Issues #3648, #3676, #3682, #3679, #3687, #3666 - Fix multidimensional bugs in sorting, nonzero, repeat, flatten, and unflatten
  • Issue #3367 - Fixes race condition in SegHead function
  • Issue #3468 - Fixes round trip discrepancies for Index with Categorical values
  • Issue #3649 - Fixes bitshift failures
  • Issue #3467 - Fixes indexing error in DataFrame instantiation

Major Updates

Minor Updates

Auto-Generated Release Notes

Release Notes v2024.06.21

21 Jun 19:30
cf6eeac

Bug Fixes

  • Issues #3074, #3234 - Fix bug reading Segarrays from parquet files
  • Issues #3001, #3185 - Fix broadcast bugs involving nans and Strings
  • Issue #3156 - Fixes Categorical.sort_values bug
  • Issues #3311, #3112 - Fix Parquet multi column byte writing and Parquet string column free
  • Issue #3115 - Fixes non-deterministic sparse_sum failure
  • Issue #3089 - Avoids out-of-memory crashes caused by `in` intents on makeDistArray
  • Issue #3009 and PRs #3232, #3316 - Improve performance of indexof1d and fix handling of null values
  • Issues #3158, #3222 - Fix print bugs involving Dataframe or Series containing a Segarray

Major Updates

  • PR #3303 - Drops support for Chapel 1.31
  • Issues #3343, #3346 - Pin numpy < 2.0 and python < 3.12.4
  • Issue #3148 - Updates IO functions to always return a dictionary
  • PRs #3238, #3314 and Issue #3347 - Reimplements CSV read to increase performance
  • Issue #3108 - Adds groupby.sample and dataframe.groupby.sample
  • Issue #2893 - Changes the behavior of dataframe.GroupBy.count to align with pandas
  • Issues #3086, #3118, #3245, #3322, #3167 and PRs #3110, #3280 - Add updates to Random module:
    • Adds choice, poisson, normal to random number generators
  • PRs #3242, #3305, #3160, #3223, #3237, #3142 - Improvements to Array API:
    • Add documentation for Array API functions
    • Add implementations of vstack, clip, diff, pad, and missing stats, search, and sort functions to the Array API module
    • Compatibility improvements for Xarray chunk-manager
  • Issues #3213, #3206, #3202, #3208, #3217, #3188 - Add Index and MultiIndex properties:
    • Including levels, equals, names, ndim, etc.
  • Issues #3050, #3192, #3128, #3196, #3198, #3200, #3130, #3123, #3194 - Work on proto tests:
    • Improvements to tests for dataframe, dtypes, groupby, io, numeric, symbol_table
    • Adds make-proto-tests command and updates our CI to run it

Minor Updates

  • Issues #3006, #3007 - Add median and count_nonzero
  • Issues #3079, #3080 - Add sum and += for boolean pdarrays
  • PRs #3221, #3211 - Add NYC taxi tutorial from CUG 2024
Auto-Generated Release Notes

Release Notes v2024.04.19

19 Apr 20:37
8ac2645

Bug Fixes

  • PR #3091 - Fixes Parquet double reads to properly account for null values
  • Issue #3087 - Fixes bug when reading non-float parquet columns with null values
  • Issue #3088 and PR #3090 - Fix an off-by-one bug in sparse_sum_helper

Major Updates

  • Issue #3083 - Optimizes Parquet Strings read
  • Issues #3033, #3054 - Optimize CSV write
  • Issues #3020, #3040 - Adds nan functions to DataFrame and Series
    • isna, notna, dropna, ...
  • Issues #3071, #3084 - Add permutation and shuffle to random number generators
  • Issue #3030 - Creates numpy subdirectory as part of the alignment effort
  • PRs #3056, #3093, #3070, #3072 - Improves and adds Array API functionality including manipulation and set functions
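Issues #3071 and #3084 add permutation and shuffle to the random number generators. These notes do not describe how arkouda implements them on the server, so as a point of reference the following is a minimal single-process sketch of the textbook Fisher-Yates shuffle; `shuffle_in_place` and `permutation` are hypothetical helper names, not arkouda's API. It follows the NumPy convention that a shuffle works in place while a permutation returns a shuffled copy.

```python
import random
from typing import Iterable, List, MutableSequence, Optional, TypeVar

T = TypeVar("T")


def shuffle_in_place(items: MutableSequence[T], rng: random.Random) -> None:
    """Fisher-Yates: from the last slot down, swap slot i with a uniform slot j in [0, i]."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]


def permutation(items: Iterable[T], seed: Optional[int] = None) -> List[T]:
    """Return a shuffled copy and leave the input untouched."""
    out = list(items)
    shuffle_in_place(out, random.Random(seed))
    return out


data = [1, 2, 3, 4, 5]
print(permutation(data, seed=42), data)
```

Seeding a dedicated `random.Random` instance keeps results reproducible without touching global random state, loosely analogous to the stateful Generator objects mentioned in the v2024.03.18 notes.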

Minor Updates

  • PR #3076 - Adds support for large string Parquet type
  • Issue #3092 - Adds support for TLS token authentication
  • Issue #3045 - Adds map method to Index
  • Issue #3065 - Adds count to DataFrame
  • Issue #2913 - Adds isdecimal to Strings
  • Issue #3002 - Adds clip to pdarray
  • Issue #3062 - Enhances arkouda metrics capability
Auto-Generated Release Notes

New Contributors

Full Changelog: v2024.03.18...v2024.04.19

Release Notes v2024.03.18

18 Mar 22:51
e07f70e

Bug Fixes

  • Issue #3035 - Fixes inconsistent results when broadcasting with empty segments
  • Issue #2939 - Fixes TypeError in DataFrame.reset_index
  • Issue #2966 - Fixes error when pip installing from a tar file
  • Issue #2897 - Fixes bug where DataFrame.corr returns DataFrame without index
  • PR #3021 - Adds SegArray optimization and benchmark bug fix
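Several fixes in these notes, such as Issue #3035 above, involve segments that contain no elements. As an illustration only, the sketch below assumes segments are described by sorted start offsets into a flat array of length `size` (as in a SegArray) with one value per segment; `broadcast_segments` is a hypothetical name, not arkouda's API. An empty segment appears as a repeated offset, or as a final offset equal to `size`, and must contribute zero elements rather than shifting later values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def broadcast_segments(starts: Sequence[int], values: Sequence[T], size: int) -> List[T]:
    """Repeat each segment's value once per element of that segment.

    `starts` holds the sorted start offset of each segment in a flat array of
    length `size`; segment i ends where segment i + 1 starts (or at `size`).
    """
    if len(starts) != len(values):
        raise ValueError("expected exactly one value per segment")
    out: List[T] = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else size
        out.extend([values[i]] * (end - start))  # zero copies for an empty segment
    return out


# Segment 1 (offset 2 repeated) and the last segment (offset == size) are empty.
print(broadcast_segments([0, 2, 2, 5], ["a", "b", "c", "d"], 5))
```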

Major Updates

  • Issue #2958 - Renames akstats to akscipy
  • Issue #2942 - Removes DataFrame.sorted
  • Issue #3024 and PR #2976 - Add sparse sum helper to util with merge-based and sort-based workflows
  • Issues #2993, #3008, #3017 - Add a random subfolder and stateful Generator objects
  • Issue #2974 - Adds Series.map
  • Issue #3019 - Adds outer join option to DataFrame merge
  • PRs #2936, #2967, #3014, #3027 - Improve Array API functionality specifically adding stats and manipulation functions
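Issue #3024 and PR #2976 describe a sparse sum helper with merge-based and sort-based workflows. The notes do not give its signature, so the following is a pure-Python sketch of the two general strategies for adding two sparse vectors stored as (indices, values) pairs; both function names are hypothetical. The sort-based version makes no assumptions about input order, while the merge-based version assumes each input's indices are already sorted and unique and walks them with two pointers.

```python
from typing import List, Sequence, Tuple

SparseVec = Tuple[List[int], List[float]]


def sparse_sum_sort(idx1: Sequence[int], vals1: Sequence[float],
                    idx2: Sequence[int], vals2: Sequence[float]) -> SparseVec:
    """Sort-based: concatenate both inputs, sort by index, then sum runs of equal indices."""
    pairs = sorted(zip(list(idx1) + list(idx2), list(vals1) + list(vals2)), key=lambda p: p[0])
    out_idx: List[int] = []
    out_vals: List[float] = []
    for i, v in pairs:
        if out_idx and out_idx[-1] == i:
            out_vals[-1] += v  # same index as the previous entry: accumulate
        else:
            out_idx.append(i)
            out_vals.append(v)
    return out_idx, out_vals


def sparse_sum_merge(idx1: Sequence[int], vals1: Sequence[float],
                     idx2: Sequence[int], vals2: Sequence[float]) -> SparseVec:
    """Merge-based: assumes each input's indices are already sorted and unique."""
    i = j = 0
    out_idx: List[int] = []
    out_vals: List[float] = []
    while i < len(idx1) and j < len(idx2):
        if idx1[i] == idx2[j]:
            out_idx.append(idx1[i])
            out_vals.append(vals1[i] + vals2[j])
            i += 1
            j += 1
        elif idx1[i] < idx2[j]:
            out_idx.append(idx1[i])
            out_vals.append(vals1[i])
            i += 1
        else:
            out_idx.append(idx2[j])
            out_vals.append(vals2[j])
            j += 1
    # At most one input still has entries left; copy its tail unchanged.
    out_idx.extend(idx1[i:])
    out_vals.extend(vals1[i:])
    out_idx.extend(idx2[j:])
    out_vals.extend(vals2[j:])
    return out_idx, out_vals


print(sparse_sum_merge([1, 3, 5], [1, 2, 3], [0, 3, 6], [10, 20, 30]))
```

The merge-based approach avoids a full sort when inputs are already ordered, at the cost of requiring that ordering; the sort-based approach is the more general fallback.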

Minor Updates

  • Issue #2929 - Updates DataFrame.size to match pandas
  • Issues #2906, #2945 - Add shift operators between two bool pdarrays and between a combination of bool and int64 pdarrays
  • Issues #2916, #2919 - Add isspace and capitalize to Strings
  • Issue #3023 - Adds to_markdown to DataFrame and Series
  • Issue #2957 - Adds Dot Function
  • Issue #2960 - Adds memory_usage functions
  • Issue #2924 - Updates DataFrame documentation
  • Issue #2896 - Updates DataFrame columns to return an Index
  • Issue #2952 - Makes Chapel 1.33 release default for CI testing
  • Issue #2985 - Updates libzmq version in Makefile
  • Issue #2981 - Adds LICENSES folder including the licenses for numpy, pandas, and scipy
  • Issues #2969, #2971, #2977, #2989 - Update failing proto_tests
Auto-Generated Release Notes

Full Changelog: v2024.02.02...v2024.03.18

Release Notes v2024.02.02

02 Feb 22:33
3613d76

Bug Fixes

  • Issues #2647, #2650, #2661, #2666 - Fix bugs in filter, remove_repeats, append_single and get_jth for segarrays with empty segments
  • Issue #2937 - Fixes string read bug for large Parquet files

Major Updates

  • Issue #2853 - Adds df.merge on a mix of String and integer columns
  • Issues #2862, #2863, #2871 - Implement histogram2d and histogramdd
  • Issue #2927 - Adds power divergence statistic, chisquare, and xlogy
  • Issues #2905, #2912, #2914, #2918 - Add isalpha, isalnum, isdigit, and isempty for Strings
  • Issues #2888, #2878 - Add float support to in1d and groupby
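Issue #2927 adds the power divergence statistic, chisquare, and xlogy. The sketch below is a pure-Python rendering of the Cressie-Read power divergence family as documented for SciPy's `scipy.stats.power_divergence`, not arkouda's implementation; the function names and the `lambda_` parameter follow SciPy's naming for readability. With `lambda_ = 1` the statistic reduces to Pearson's chi-square (when observed and expected totals match), and `lambda_ = 0` gives the log-likelihood ratio (G) statistic, which is where `xlogy` handles zero counts.

```python
import math
from typing import Sequence


def xlogy(x: float, y: float) -> float:
    """Compute x * log(y), defined as 0 when x == 0 so that zero counts do not raise."""
    return 0.0 if x == 0 else x * math.log(y)


def chisquare(f_obs: Sequence[float], f_exp: Sequence[float]) -> float:
    """Pearson's chi-square statistic: sum((obs - exp)**2 / exp)."""
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


def power_divergence(f_obs: Sequence[float], f_exp: Sequence[float], lambda_: float = 1.0) -> float:
    """Cressie-Read power divergence statistic for observed vs expected counts."""
    if lambda_ == 0:
        # Limit as lambda -> 0: the G statistic, 2 * sum(obs * log(obs / exp)).
        return 2.0 * sum(xlogy(o, o / e) for o, e in zip(f_obs, f_exp))
    if lambda_ == -1:
        raise ValueError("lambda_ = -1 (modified G-test) is not handled in this sketch")
    total = sum(o * ((o / e) ** lambda_ - 1) for o, e in zip(f_obs, f_exp))
    return 2.0 * total / (lambda_ * (lambda_ + 1))


obs, exp = [10, 20, 30], [20, 20, 20]
print(chisquare(obs, exp), power_divergence(obs, exp, 1.0), power_divergence(obs, exp, 0.0))
```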

Minor Updates

  • Issues #2873, #2886 - Enable short strings optimization for multi-column groupby
  • Issue #2894 - Adds dropna option to dataframe groupby
  • Issue #2461 - Adds where argument to trig functions
  • Issue #2430 - Adds shift equals for pdarray
  • Issue #2831 - Aligns dataframe.groupby().size() and dataframe.groupby().sum() with pandas
  • PRs #2829, #2865, #2876 - Add infrastructure and partial implementation of array API
  • PR #2870 - Expands DataFrame initializer
  • PR #2931 - Aligns Series indexing methods with pandas
  • Issue #2909 - Renames is_upper/lower/title and to_upper/lower/title to match numpy
Auto-Generated Release Notes

New Contributors

Full Changelog: v2023.11.15...v2024.02.02

Release Notes v2023.11.15

15 Nov 20:10
b7a0c22

Bug Fixes

  • Issue #2816 - Resolves nil config file bug
  • Issue #2804 - Fixes missing values in CSV bug
  • Issue #2825 - Fixes inconsistent Categorical print
  • Issue #2849 - Fixes bug in getEnv

Major Updates

  • PR #2844 - Drops support for Chapel 1.30
  • Issue #2838 - Expands dataframe merge functions to accept multiple columns
  • Issue #2810 - Expands inner_join to accept a list of pdarrays
  • Issues #1882, #2833, #2843 - Update dependencies to support python 3.x/3.12

Minor Updates

  • Issue #2823 - Adds casting between Strings and Categorical
  • Issue #2830 - Implements division and floor division for int64 and uint64 dtypes
  • PR #2821 - Adds support for reading Decimal128 Parquet columns
  • Issue #2819 - Adds error when reading a Parquet type that isn't supported
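Issue #2830 implements division and floor division for int64 and uint64 dtypes. The notes do not spell out the rounding rules; assuming arkouda follows NumPy and Python here (true division `/` returns a float, floor division `//` rounds toward negative infinity), the sketch below contrasts floor division with C-style truncating division. `floor_div` and `trunc_div` are illustrative names only. For uint64 operands both values are non-negative, so the two conventions agree; they differ only when the operand signs differ.

```python
def floor_div(a: int, b: int) -> int:
    """Floor division (Python/NumPy `//`): rounds toward negative infinity."""
    return a // b


def trunc_div(a: int, b: int) -> int:
    """C-style integer division: rounds toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(f"{a} / {b} = {a / b}, floor: {floor_div(a, b)}, trunc: {trunc_div(a, b)}")
```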
Auto-Generated Release Notes
  • Fixes #2816: Resolve nil config file bug by @pierce314159 in #2817
  • Fixes #2804: Missing values in CSV bug by @pierce314159 in #2813
  • fixed typo in README.md by @daulatojha17 in #2818
  • Closes #2819 - Throw error when reading a Parquet type that isn't supported by @bmcdonald3 in #2820
  • Closes #2815: Update install docs by @pierce314159 in #2822
  • Fixes #2825: Inconsistent Categorical print by @pierce314159 in #2826
  • Closes #2823: Casting between Strings and Categorical by @pierce314159 in #2827
  • Add support for reading Decimal128 Parquet columns by @bmcdonald3 in #2821
  • Closes #1882, #2833: CI failures due to python 3.x/3.12 by @pierce314159 in #2834
  • Add directory with files for Parquet C++ comparison by @bmcdonald3 in #2832
  • Closes #2810: Expand inner_join to accept a list of pdarrays by @pierce314159 in #2837
  • Fix a Makefile conditional to test for empty string instead of 'none' by @bradcray in #2841
  • Closes #2843: Add python 3.12/3.x check to CI by @pierce314159 in #2840
  • Drop support for Chapel 1.30 by @brandon-neth in #2844
  • Updates to the macOS build instructions + additional contributor guide by @brandon-neth in #2845
  • Fixes #2849: bug in getEnv by @pierce314159 in #2850
  • Closes #2830 Implement division and floor division for int64 and uint64 dtypes. by @jaketrookman in #2847
  • Closes #2838: Expand dataframe merge functions to accept multiple columns by @pierce314159 in #2848

Full Changelog: v2023.10.06...v2023.11.15

Release Notes v2023.10.06

06 Oct 21:23
f3f1de8

Bug Fixes

  • Issue #2802 - Fixes concatenate and arange mishandling max_bits
  • Issues #2472, #2800 - Fix bug in bigint indexing with max_bits set
  • Issues #2675, #2773 - Fix argsort and coargsort on boolean arrays
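Issues #2802, #2472, and #2800 concern `max_bits` on bigint arrays. Assuming that a bigint array with `max_bits` set keeps only the low `max_bits` bits of each value, so results wrap around modulo 2**max_bits, the following pure-Python sketch shows what a correct `arange`-style result looks like under that cap. `wrap` and `arange_wrapped` are hypothetical helpers, not arkouda functions.

```python
from typing import List


def wrap(value: int, max_bits: int) -> int:
    """Keep only the low `max_bits` bits, i.e. reduce modulo 2**max_bits."""
    if max_bits <= 0:
        raise ValueError("max_bits must be positive")
    # Python ints are arbitrary precision; masking also maps negatives to their two's-complement residue.
    return value & ((1 << max_bits) - 1)


def arange_wrapped(start: int, stop: int, step: int, max_bits: int) -> List[int]:
    """Like range(), but with every element wrapped to max_bits."""
    return [wrap(v, max_bits) for v in range(start, stop, step)]


print(arange_wrapped(2**64 - 2, 2**64 + 2, 1, 64))
```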

Major Updates

  • Issues #2724, #2730 - Add HDF5 support for Index and MultiIndex
  • Issues #2679, #2767, #2774, #2783, #2798 - Resolve chpl 1.32 deprecation warnings
  • Issue #2805 - Upgrades arrow to 11.0.0
  • Issues #2759, #2757, #2762, #2764, #2790, #2794 - Work to improve pdarray creation performance and out of memory error handling
  • Issue #2405 - Adds server <-> server transfers of pdarrays
  • Issue #2716 - Adds dataframe merge functionality
  • Issue #2712 - Adds stridable slicing to Strings
  • Issue #2744 - Implements skew and hist_all

Minor Updates

  • Issue #2778 - Adds convert_categoricals flag to dataframe.to_parquet
  • Issue #2771 - Adds pda.value_counts
  • Issue #2521 - Adds uint support to histogram
  • Issue #2474 - Updates ConcatenateMsg to use aggregation for bigint
Auto-Generated Release Notes

New Contributors

Full Changelog: v2023.09.06...v2023.10.06

Release Notes v2023.09.06

07 Sep 02:24
e574712

Bug Fixes

  • Issue #2596 - Fixes datetime columns in dataframe display bug
  • Issue #2612 - Fixes an out-of-bounds error with multi-locale read_hdf of a segarray with string values and empty segments
  • Issues #2560, #2268, #2566 - Fix missing empty segments for parquet read of segarray with string values
  • Issue #2567 - Fixes error when reading SegArray containing nan with parquet
  • Issue #2581 - Fixes Strings.get_null_indicies incorrect results
  • Issues #2681, #1493 - Fix Series equality bug
  • Issues #2644, #2645 - Fix uint cast to str/float and add str cast to str
  • Issue #2703 - Fixes sort bug with nans
  • Issue #2617 - Fixes bug in comparison of segarrays containing empty segments
  • Issue #2711 - Fixes bug in MultiIndex indexing with bigint values
  • Issue #2579 - Fixes compression when writing bools with Parquet
  • Issues #2508, #2519, #2505 - Fix uint scalar binop handling and overhaul mod/fmod
  • Issue #2635 - Fixes Strings double delete
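Issues #2508, #2519, and #2505 overhaul mod/fmod. In NumPy, `mod` (like Python's `%`) returns a result with the sign of the divisor, while `fmod` follows C and returns a result with the sign of the dividend. Assuming arkouda aims for the same NumPy semantics, the sketch below uses only the standard library to tabulate both for mixed-sign operands; `mod_table` is a hypothetical helper name.

```python
import math
from typing import Iterable, List, Tuple


def mod_table(pairs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int, float]]:
    """Return (a, b, a % b, fmod(a, b)) for each (dividend, divisor) pair."""
    rows = []
    for a, b in pairs:
        r = a % b            # sign follows the divisor b, like NumPy `mod`
        f = math.fmod(a, b)  # sign follows the dividend a, like C and NumPy `fmod`
        assert a == b * (a // b) + r  # `%` is the remainder of floor division
        rows.append((a, b, r, f))
    return rows


for row in mod_table([(7, 3), (-7, 3), (7, -3), (-7, -3)]):
    print(row)
```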

Major Updates

  • Issues #2548, #2550 - Drop support for Chapel 1.29 and recommend 1.31
  • Issues #2395, #2723, #2726, #2737 - Rework Register/Attach
  • Issues #2512, #2541, #2614 - Adds Snapshotting via HDF5
  • Issue #2493 - Adds parquet support for multi-column SegArray with String values
  • Issues #2749, #1166 - Add missing aggregations for Dataframe groupby and the ability to aggregate on a list of column names
  • Issue #2708 - Adds support for preserving DateTime, TimeDelta, and IPV4 when reading/writing with HDF5
Added New Testing Architecture
  • Issue #2504 - numeric_tests.py conversion for new test framework
  • Issue #2538 - client_test.py conversion for new test framework
  • Issue #2542 - message_test.py Conversion for new test framework
  • Issues #2547, #2553, #2554, #2555 - setops_tests.py conversion for new test framework
  • Issue #2526 - Add arc/hyperbolic Tests to new test framework
  • Issue #2570 - bigint_agg_test.py Conversion for new test framework
  • Issue #2536 - alignment_test.py Conversion for new test framework
  • Issue #2559 - dataframe_test.py conversion for new framework
  • Issue #2585 - parquet_test.py conversion for new framework
  • Issue #2537 - array_view_test conversion for new framework
  • Issue #2607 - groupby_test.py conversion to new framework
  • Issue #2616 - index_test.py conversion for new framework
  • Issue #2625 - coargsort_test.py conversion for new test framework
  • Issue #2602 - io_test.py conversion for new framework
  • Issue #2640 - Convert security_test.py to new test framework
  • Issue #2648 - logger_test.py conversion for new framework
  • Issue #2583 - client_dtypes_test conversion for new framework
  • Issue #2624 - import_export_test.py conversion for new test framework
  • Issue #2572 - bit_ops.py Conversion for new test framework
  • Issue #2605 - dtypes_test.py Conversion for new test framework
  • Issue #2659 - regex_test.py conversion to new test framework
  • Issue #2626 - join_test.py Conversion for new test framework
  • Issue #2668 - io_util_test.py refactor for new framework
  • Issue #2642 - segarray_test.py refactor for new test framework
  • Issue #2694 - extrema_test.py conversion for new test framework
  • Issue #2688 - operator_test.py conversion for new test framework
  • Issue #2684 - symbol_table_test.py refactor for new test framework
  • Issue #2705 - where_test.py refactor to new test framework
  • Issue #2700 - string_test.py conversion for new test framework
  • Issue #2654 - sort_test.py conversion for new test framework
  • Issue #2620 - indexing_test.py conversion for new framework
  • Issue #2656 - stats_test.py conversion for new test framework
  • Issue #2686 - categorical_test.py conversion for new test framework
  • Issue #2573 - pdarray_creation_test.py Conversion for new test framework
  • Issue #2697 - datetime_test.py conversion for new test framework
  • Issue #2651 - series_test.py conversion for new framework
  • Issue #2591 - Add dtype testing to test_multi_array_search_interval

Minor Updates

  • Issue #2053 - Adds Strings property accessors
  • Issue #2568 - Provides client access to arkouda locale memory information
  • Issue #2419 - Adds ability to log arkouda commands to a file
  • Issue #2702 - Capture only user-specified arkouda commands on client
  • Issues #2524, #2721, #2746 - Extend client to enable gRPC proxy server channel implementations
  • Issues #2477, #2728 - Provide dynamic available memory calculation for all locales
  • Issue #2400 - Adds arc and hyperbolic trig functions
  • Issue #2471 - Adds ak.full() for Strings
  • Issue #2603 - Reworks arange handling of uint and bigint arguments
  • Issue #2714 - Adds regex argument to categorical substring search
  • Issue #2695 - Adds uint support to extrema methods
  • Issues #2658, #2575 - Implement ServerStatusDaemon
  • Issue #2690 - Updates ak.load_all glob expression
Auto-Generated Release Notes

Release Notes v2023.06.20

20 Jun 16:33
df0de8a

Bug Fixes

  • Issue #2513 - Fixes a bug where the channel connection did not account for Slurm changing the server host between connections. The client now creates a new Channel for each client.connect execution.

Minor Updates

  • Issue #2515 - Updates file I/O documentation to include SegArrays containing string values
Auto-Generated Release Notes
  • refactored to create a new Channel for each client.connect execution by @hokiegeek2 in #2514
  • Closes #2515 - File I/O Doc Updates by @Ethan-DeBandi99 in #2516

Full Changelog: v2023.06.16...v2023.06.20

v2023.06.16

16 Jun 12:24
eeebe70

Bug Fixes

  • Issue #2481 - Fixes Multi-Column Parquet not handling Empty Files Properly
  • Issue #2506 - Fixes Categorical Optional Components Required Bug
  • Issue #2414 - Fixes overMemLimit calc error

Major Updates

  • Issues #2424 and #2432 - Adds Strings value support for SegArray
  • Issue #2443 - Read/Write SegArray of Strings for HDF5
  • Issue #2444 - Adds SegArray with Strings Values Parquet Support (Does not include Multi-Column)
  • Issue #2386 - Read/Write support for GroupBy objects in HDF5
  • Issues #2434, #2459, #2462, and #2463 - Adds hashing support for Segarray, Strings, Categorical, BigInt
  • Issues #2006, #2032, #2416, and #2431 - BigInt Support Improvements
  • Issue #2304 - Adds inner_join on Strings and Categorical
  • Issue #2417 - Filename_Codes match Categorical.codes
  • Issue #2425 - Import/Export lists from/to pandas

Minor Updates

  • Issue #2454 - Updates SegArray.__getitem__ to Always Return pdarray
  • Issue #2418 - Adds instructions to set max per-locale CPU cores and memory
  • Issue #2433 - Updates GroupBy Object to only be client side
Auto-Generated Release Notes
  • Adjust these modules to avoid deprecation warnings from non-default Math symbols by @lydia-duncan in #2415
  • Closes #2412: Update quickstart to v2023.05.05 by @pierce314159 in #2413
  • Fix overMemLimit calc error by @hokiegeek2 in #2421
  • Closes #2427 - Deprecation updates related to Memory and Memory.Diagnostics by @jabraham17 in #2422
  • Closes #2425 - Import/Export lists from/to pandas by @Ethan-DeBandi99 in #2428
  • Closes #2386 - `GroupBy.to_hdf` & `GroupBy.update_hdf` by @Ethan-DeBandi99 in #2426
  • add instructions to set max per-locale CPU cores and memory by @hokiegeek2 in #2429
  • Closes #2416 and #2006: bigint shift performance by @pierce314159 in #2423
  • Closes #2431: Add bigint broadcast by @pierce314159 in #2437
  • Closes #2441: Adds missing `use Biginteger` in gt-130 bigint compat by @pierce314159 in #2442
  • fixed typo by @hokiegeek2 in #2448
  • Closes #2417 - `Filename_Codes` match `Categorical.codes` by @Ethan-DeBandi99 in #2440
  • Closes #2445 - Deprecation updates related to string and byte factory functions by @jabraham17 in #2446
  • Closes #2451: Remove pragma no doc instances in Chapel code by @bmcdonald3 in #2452
  • Updates for Chapel `list.append` deprecation by @jeremiah-corrado in #2450
  • Closes #2432 - Revert SegArray to Client Side by @Ethan-DeBandi99 in #2439
  • Closes #2304: `inner_join` on `Strings` and `Categorical` by @pierce314159 in #2453
  • Closes #2467: Arrow compilation can fail with clang 15 upgrade changes default PIE by @bmcdonald3 in #2468
  • Closes #2032 - BigInt Support for HDF5 by @Ethan-DeBandi99 in #2460
  • Closes #2454 - Update `SegArray.__getitem__` to Always Return `pdarray` by @Ethan-DeBandi99 in #2466
  • Closes #2424 - Adds `SegArray` support for `Strings` Values by @Ethan-DeBandi99 in #2469
  • Closes #1211 - Remove TaskErrors Workaround by @Ethan-DeBandi99 in #2470
  • Closes #2433: GroupBy back to client only by @pierce314159 in #2456
  • Changes for Chapel `c_memcpy` replacement with `OS.POSIX.memcpy` by @jeremiah-corrado in #2479
  • Closes #2482 - Fix capitalization of POSIX compatibility module by @jeremiah-corrado in #2483
  • Closes #2443 - Read/Write SegArray of Strings HDF5 by @Ethan-DeBandi99 in #2478
  • Closes #2459 and #2434: `ak.hash` for `Segarray` and `Strings` by @pierce314159 in #2475
  • Fixes #2481 - Multi-Column Parquet does not handle Empty Files Properly by @Ethan-DeBandi99 in #2484
  • Closes #2436 - Updates `_buildReadAllJSON` to use `ObjType` Enum by @Ethan-DeBandi99 in #2486
  • Closes #2490: Change `checkInstall` path to be relative to script, not Arkouda by @bmcdonald3 in #2491
  • Closes #2462: Categorical hashing by @pierce314159 in #2487
  • Closes #2476: Updates Chapel Tutorial by @pierce314159 in #2494
  • Design and implement client Channel class hierarchy by @hokiegeek2 in #2496
  • Closes #2463: Hashing for bigint pdarrays by @pierce314159 in #2497
  • Closes #2500 - Remove Old Test Prototype by @Ethan-DeBandi99 in #2501
  • Closes #2502 - Remove ArkoudaWeeklyCall References by @Ethan-DeBandi99 in #2503
  • Closes #2488: Quiet deprecation warnings in prep for Chapel 1.31 by @bmcdonald3 in #2489
  • Fixes #2506 - Categorical Optional Components Required Bug by @Ethan-DeBandi99 in #2507
  • Closes #2444 - SegArray with String Values Parquet Support by @Ethan-DeBandi99 in #2492

Full Changelog: v2023.05.05...v2023.06.16