Releases: cylondata/cylon
0.6.0
Cylon 0.6.0 is a major release. We are excited to present UCC and Gloo integration, along with more distributed operations.
Features
Cylon C++ and Python
- Implementation of Slice, Head and Tail Operations
- adding conda docker
- UCC integration
- adding cylonflow as a submodule
- Use generic operator
- Summit fixes
- Adding custom mpirun params cmake var
- Adding cmake parallelism flag
- Gloo python binding
- Enabling gloo CI
- Add downloading catch2 header dynamically
- Dist sort cpu
- Cylon Gloo integration
- Adding distributed scalar aggregates
- Extending datatypes
- Allowing custom MPI_Comm for MPI
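To make the distributed Slice, Head and Tail semantics concrete, here is a minimal sketch in plain Python. It is not Cylon's API or implementation: workers are simulated as a list of row lists, and the function names (`distributed_slice`, `distributed_head`, `distributed_tail`) are hypothetical. It assumes rows are ordered globally by worker rank, so a global row range can be mapped to local ranges with a prefix sum over partition sizes.

```python
from itertools import accumulate


def distributed_slice(partitions, offset, length):
    """Return, per partition, the rows inside the global range
    [offset, offset + length). Assumes offset >= 0 and length >= 0."""
    sizes = [len(p) for p in partitions]
    # Exclusive prefix sum: global index of each partition's first row.
    starts = [0] + list(accumulate(sizes))[:-1]
    end = offset + length
    result = []
    for part, start in zip(partitions, starts):
        lo = max(offset, start) - start            # local start index
        hi = min(end, start + len(part)) - start   # local end index (exclusive)
        result.append(part[lo:hi] if lo < hi else [])
    return result


def distributed_head(partitions, n):
    """First n rows in global order."""
    return distributed_slice(partitions, 0, n)


def distributed_tail(partitions, n):
    """Last n rows in global order."""
    total = sum(len(p) for p in partitions)
    n = min(n, total)
    return distributed_slice(partitions, total - n, n)


# Three simulated workers holding 3, 2 and 4 rows.
parts = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
sliced = distributed_slice(parts, 2, 4)
head = distributed_head(parts, 4)
tail = distributed_tail(parts, 3)
```

In a real distributed setting the partition sizes would have to be exchanged between workers (for example with an all-gather) before each worker can compute its local range; the sketch skips that step because all partitions live in one process.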
Build
- Updating to Arrow 9.0.x
- Windows build support
- MacOS build support
- Conda build is the default build
- Improving docker build
You can download source code from GitHub
Conda binaries are available in Anaconda
Commits
91bdd54 Update conda-actions.yml (#645)
d1739ed Added buildable instructions for Rivanna (#643)
d9a6420 Arrow 9.0.0 and gcc-11 update (#601)
4c867b1 Summit Fixes (#623)
7f8a3b1 Fixing sample bug (#631)
ce12454 Cython binding for slice, head and tail (#619)
ef4c904 #610: SampleArray util method replaced by using arrow::compute::Take … (#612)
4694a9e Minor fixes (#608)
121b386 Fixing: Corrupted result when joining tables contain list data types #615 (#616)
68fa598 Summit fixes (#607)
de3ec7b fixing bash splitting (#606)
0a489fc adding cmake parallelism flag (#605)
035fd70 Implement Slice, Head and Tail Operation in both centralize and distr… (#592)
d99a6f2 adding custom mpirun params cmake var (#604)
f20c119 Update README-summit.md (#603)
4bc27f9 Create README-summit.md (#602)
e6b7306 Minor fixes (#596)
2e6ac80 adding conda docker (#600)
4dd359f Ucc integration (#591)
61b4a82 adding cylonflow as a submodule (#593)
e4dd38b Use generic operator (#583)
6c0dfa8 Gloo python binding (#587)
773f11f Gloo python bindings (#585)
2fc95be Add downloading catch2 header dynamically (#584)
c56ab2d Enabling gloo CI (#582)
a820ed8 Dist sort cpu (#574)
f68cc62 Adding UCC build (#579)
2759a30 Cylon Gloo integration (#576)
b2c0820 Adding distributed scalar aggregates (#570)
9c2fdc4 Extending datatypes (#568)
e3d553c Bump ua-parser-js from 0.7.22 to 0.7.31 in /docs (#566)
3bafb75 Bump ssri from 6.0.1 to 6.0.2 in /docs (#565)
814a463 minor fixes (#564)
be92253 Bump lodash from 4.17.20 to 4.17.21 in /docs (#561)
e87dd7c Bump shelljs from 0.8.4 to 0.8.5 in /docs (#562)
71bd8bf Bump nanoid from 3.1.22 to 3.2.0 in /docs (#563)
49b343d Allowing custom MPI_Comm for MPI (#559)
fa52dd4 Update contributors.md
54d4a53 added io functions (#550)
1a8c3d7 Fixing 554 (#558)
887ea18 update arrow link (#557)
1ce4c6b Fixing 552 (#553)
f5e31a1 Merging 0.5.0 release (#547)
Contributors
Ahmet Uyar
Chathura Widanage
Damitha Sandeepa Lenadora
dependabot[bot]
Hasara Maithree
Kaiying Shan
niranda perera
Supun Kamburugamuve
Vibhatha Lakmal Abeykoon
Ziyao22
Arup Kumar Sarker
Mills Wellons Staylor
Gregor von Laszewski
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
0.5.0
Cylon 0.5.0 is a major release. We are excited to present GCylon, a cuDF-based distributed
DataFrame for NVIDIA GPUs, UCX integration, Anaconda support, and much more.
Features
Cylon C++ and Python
- Adding UCX integration with MPI
- Adding read distribution
- Changing join column naming convention to match SQL and pandas
- Adding Dataframe.applymap, Dataframe.isin
- Add iloc operation to DataFrame
- Adding null handling to table operators and Comparators
- Adding Equal/distributed equal operators
- Adding array flattening
- Adding Repartition
- Adding mapreduce style group-by aggregators
- Adding table level AllGather, Gather and Broadcast operators
- Performance improvements and bug fixes
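As an illustration of what a mapreduce style group-by aggregation involves, the following plain-Python sketch computes a per-key mean in three phases: local combine, shuffle by key, and final reduce. It is an assumption-laden sketch rather than Cylon's code: workers are simulated as lists, and the helper names (`local_combine`, `shuffle`, `reduce_mean`, `groupby_mean`) are hypothetical.

```python
from collections import defaultdict


def local_combine(rows):
    """Map + combine: pre-aggregate (key, value) rows on one worker
    into a per-key partial state [sum, count]."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        state = partial[key]
        state[0] += value
        state[1] += 1
    return dict(partial)


def shuffle(partials, world_size):
    """Route every key's partial state to the worker that owns the key."""
    inboxes = [defaultdict(list) for _ in range(world_size)]
    for partial in partials:
        for key, state in partial.items():
            inboxes[hash(key) % world_size][key].append(state)
    return inboxes


def reduce_mean(inbox):
    """Reduce: merge the partial states of each key and finalize the mean."""
    out = {}
    for key, states in inbox.items():
        total = sum(s[0] for s in states)
        count = sum(s[1] for s in states)
        out[key] = total / count
    return out


def groupby_mean(partitions):
    partials = [local_combine(rows) for rows in partitions]
    inboxes = shuffle(partials, len(partitions))
    return [reduce_mean(inbox) for inbox in inboxes]


# Two simulated workers, each holding (key, value) rows.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
per_worker = groupby_mean(partitions)
merged = {k: v for worker in per_worker for k, v in worker.items()}
```

Combining locally before the shuffle means only one small state per key and worker crosses the network, instead of every row.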
Build
- Updating to Arrow 5.0.x
- Windows build support
- MacOS build support
- Conda build is the default build
- Improving docker build
GCylon
First release of GCylon, which supports distributed DataFrame processing on NVIDIA GPUs using cuDF:
- Implemented shuffling and distributed sorting
- Distributed Join/merge
- Distributed GroupBy
- DataFrame Set operations
- Repartitioning DataFrames
- Distributed IO for reading/writing CSV, JSON and Parquet files
You can download source code from GitHub
Conda binaries are available in Anaconda
Commits
3344bf9 Mapreduce style group-by aggregators (#535)
50ef890 Remove minor warnings (#544)
559e8eb Adding CPU serializer (#539)
abb4404 fixed unused variable/parameter and casting warnings (#542)
62a3f08 Distributed IO (#533)
15d06d6 Bump color-string from 1.5.4 to 1.7.4 in /docs (#534)
810c4ed fixing RNG issue (#538)
fbb049b fixing build error (#536)
a10e052 Bump algoliasearch-helper from 3.3.3 to 3.6.2 in /docs (#532)
112ea97 Repartition - CPU (#526)
79c4b73 create a MacOS yml file (#530)
b9e7a8c Repartition - GPU (#528)
2191b9f fixed function name change in cudf api from gcylon test files (#529)
3e9036e Upgrading to arrow 5.0.0 (#525)
24d182a Groupby values null handling (#527)
54a5074 Null handling for Comparators (#524)
0b9516e Adding array flattening (#522)
b3fc2a2 Implemented MergeOrSort when merging sorted tables (#523)
1e061b2 Feature/equal (#499)
e378d1d reformatted gcylon codes with tab size 2, non-functional changes (#521)
8450d9b Added support for sliced tables in gather, broadcast and sorting (#520)
92b8124 Update windows.yml
1f9790d Update macos.yml
d33f9ac Update conda-actions.yml
963d491 Update c-cpp.yml
2229981 added mpi datatype dispatching for primitive data types (#519)
d9936b4 Head tail operators (#512)
ac99d00 Formatting code (#518)
fff84cc Code formatting (#517)
f32f04d Null handling in splitters and build arrays (#511)
4cab7ca Delete files from CPP example folder that are not needed (#516)
d174430 moving tutorial repo to (#514)
9cd7911 Python example cleanup (#513)
fe4caf3 Distributed sorting (#510)
2302f58 Minor improvements to the Table API (#508)
71eb80a adding new test utils (#507)
24b83dd Adding to docker docs (#498)
6f2faf8 Update conda.md
4f8f3c7 Gcylon docs (#501)
a786258 Adding contributing guide to documentation (#496)
8ab8b2d changing join column naming convention to match SQL and pandas (#487)
f18b91f improvements to ucx build from conda (#484)
912fb54 Windows build (#482)
216758a making improvements to the build (#483)
4e2894e Add functions to dataframe (#481)
1f1ddd9 Documentation update (#479)
e623315 Bump tar from 6.1.5 to 6.1.11 in /docs (#477)
1e5db7b improve docs (#476)
58c0595 removing extra examples (#474)
3c823f6 Gcylon integration (#470)
92748eb Cpp example cleanup (#475)
fa14527 Docs improvements (#469)
1306220 Bump url-parse from 1.4.7 to 1.5.3 in /docs (#473)
8234ae7 Bump path-parse from 1.0.6 to 1.0.7 in /docs (#472)
c8b435b Bump tar from 6.0.5 to 6.1.5 in /docs (#471)
1cc28dd Performance improvements (#453)
9092bbf MacOS build (#464)
d59d91e Add iloc operation to DataFrame (#465)
8d7a8dc Removed glog files from the header files (#463)
ea62eef License updates (#462)
2f56265 changed all relative Cylon header references to global (#461)
123c93c Building in conda env without using conda-build (#457)
3b3a285 Compilation document improvements (#454)
8578b1f Adding barrier at the end of the test case (#458)
e6eded5 Fix for empty df (#455)
8f14992 Fixed mpi test case (#456)
cb06998 Changes to the Docs (#451)
4ce1d7e updates to the docker readme
e011e0f enhancing readme
adfa6c0 adding read distribution (#432)
bd2e024 UCX integration (#439)
a42d04a Bump ws from 6.2.1 to 6.2.2 in /docs (#437)
710b562 Bump dns-packet from 1.3.1 to 1.3.4 in /docs (#435)
07aee74 adding new operators to DataFrame API (#429)
71e57f8 Updating to arrow 4.0 (#418)
a490dc2 changing ctx to const reference in methods (#419)
18a5447 missing docs (#428)
38534f5 0.4.1 release (#427)
10f5a6a Enabling scalars in df set_item (#425)
0be7897 Op bench refactor (#417)
ec964d8 Bug fixes in dataframe (#420)
e0ba964 Update c-cpp.yml
0200c02 adding finalize check and removing destructor finalize call. (#412)
149919c Update README.md
016c5c9 adding missing test case
5609535 Update README.md
e3ca0bf 0.4.0 release (#411)
Contributors
Ahmet Uyar
Chathura Widanage
Damitha Sandeepa Lenadora
dependabot[bot]
Hasara Maithree
Kaiying Shan
niranda perera
Supun Kamburugamuve
Vibhatha Lakmal Abeykoon
Ziyao22
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
0.4.1
Cylon 0.4.1 is a bug fix release.
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
0.4.0
Cylon 0.4.0 is a major release with the following features.
Major Features
Python
- DataFrame API similar to Pandas supporting around 40 operators commonly used in Pandas.
- Conda build and conda-based binaries for installing on Linux.
- Python binding to all the operators added on the C++ level.
- Providing compute functions with both Arrow and Numpy for filtering, math operations and comparison operators.
- Added operator benchmarks.
- Added new options for CSV reading supporting all the options in PyArrow for reading CSV.
C++
- Added distributed multi-column operations on tables for join, union, intersection, set difference and sort.
- Added improved hash operations using Bytell Hash Maps. Improved performance by 2 times for union, intersection, set difference and unique.
- Added new aggregate operations for GroupBy operation (Mean, Variance, Std Dev, Quantile, NUnique, Median).
- Implemented GroupBy aggregators using CRTP (Curiously recurring template pattern).
- Improved indexing at the core by adding more types and improving the performance of indexed lookups.
- Added unique distributed operator.
- Added temporal data types like DateTime, Date32 (days resolution), Date64 (milliseconds resolution) and Timestamp (with time zone information).
- Other performance improvements and bug fixes.
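Aggregates such as Mean, Variance and Std Dev can be computed over distributed or chunked data by keeping a small mergeable state per group. The sketch below shows one well-known way to do that, the pairwise merge of (count, mean, M2) states. It is not taken from Cylon's source; the function names and the `ddof` parameter are illustrative assumptions.

```python
import math
from functools import reduce


def local_state(values):
    """Summarize one chunk as (count, mean, M2), where M2 is the sum of
    squared deviations from the chunk mean."""
    n = len(values)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return (n, mean, m2)


def merge(a, b):
    """Merge two partial states without revisiting the raw values."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return (n, mean, m2)


def finalize(state, ddof=1):
    """Turn a merged state into (mean, variance, std dev).
    ddof=1 gives the sample variance, ddof=0 the population variance."""
    n, mean, m2 = state
    variance = m2 / (n - ddof)
    return mean, variance, math.sqrt(variance)


# Values of a single group spread over three simulated workers.
chunks = [[2, 4, 4], [4, 5], [5, 7, 9]]
state = reduce(merge, (local_state(c) for c in chunks))
mean, variance, std = finalize(state, ddof=0)
```

Because `merge` is associative, the partial states can be combined in any order, which is what makes this shape of aggregator suitable for a distributed GroupBy. Quantile and Median do not decompose this way and need a different strategy.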
Build
- Compiling using external Apache Arrow installation (local/pip).
Applications and Benchmarks
- Implementing a subset of TPCx-BB queries (Queries 6, 7, 9, 14, 22, 23); the rest are in progress.
- Applications with connections to deep learning.
You can download source code from GitHub
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
0.3.1
Cylon 0.3.1 is a bug fix release.
You can download source code from GitHub
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
0.3.0
Cylon 0.3.0 adds the following features. Please note that this release may not be backward
compatible with previous releases.
Major Features
C++
- Adding order-by and distributed table sort operations
- Multiple partitioning schemes (modulo, hash, and range)
- C++ API refactoring
- Performance improvements in the existing C++ API
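The three partitioning schemes named above differ only in how a key is mapped to a target partition. The following plain-Python sketch illustrates the idea and shows why range partitioning is a natural building block for a distributed sort. It is a sketch under assumptions, not Cylon's implementation: the function names are hypothetical, keys are plain integers, and the splitters are given by hand (a real system would typically derive them by sampling the data).

```python
from bisect import bisect_right


def modulo_partition(keys, n):
    """Integer keys only: the target partition is key mod n."""
    parts = [[] for _ in range(n)]
    for k in keys:
        parts[k % n].append(k)
    return parts


def hash_partition(keys, n):
    """Any hashable key: the target partition is hash(key) mod n."""
    parts = [[] for _ in range(n)]
    for k in keys:
        parts[hash(k) % n].append(k)
    return parts


def range_partition(keys, splitters):
    """splitters must be sorted. Partition 0 receives keys below
    splitters[0]; partition i receives keys k with
    splitters[i - 1] <= k < splitters[i]; the last partition gets the rest."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for k in keys:
        parts[bisect_right(splitters, k)].append(k)
    return parts


keys = [15, 4, 9, 26, 12, 1, 20, 8]
by_modulo = modulo_partition(keys, 3)
by_range = range_partition(keys, [8, 16])

# Distributed sort sketch: range-partition, then sort each partition locally.
# Concatenating the partitions in order yields a globally sorted sequence.
locally_sorted = [sorted(p) for p in by_range]
globally_sorted = [k for p in locally_sorted for k in p]
```

Modulo and hash partitioning bring equal keys together, which is what joins and group-bys need, while range partitioning additionally preserves key order across partitions.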
Python (PyCylon)
- Exposing table operators similar to Pandas (28 new operators).
- Comparison operators
- Logical Operators
- Math operators
- Null/NA value filtering and filling
- Filtering and updating (including inplace ops)
- Schema refactoring
- Experimental indexing abstract
- Distributed Data sorting Python bindings
- Adding new examples for updated operations. (https://github.com/cylondata/cylon/tree/master/python/examples)
You can download source code from GitHub
Examples
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
0.2.0
Cylon 0.2.0 adds the following features. Please note that this release may not be backward
compatible with v0.1.0.
Major Features
C++
- Adding aggregates and group-by API
- Creating tables using std::vectors or cylon::Columns
- C++ API refactoring
- Major performance improvements in the existing C++ API
Python (PyCylon)
- Extending Cython API for extended development for other Cython/Python libraries
- Aggregates and Groupby addition
- Column name-based relational algebra operations and aggregate/groupby ops addition
- Major performance improvements in the existing Python API
Java (JCylon)
- Performance improvements
You can download source code from GitHub
Examples
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
Cylon Release 0.1.0
Cylon 0.1.0 is the first open-source public release of the Cylon project. We are excited to bring a high-performance
data engineering toolkit that can work as a library as well as a standalone framework. This is the first step towards
building a complete toolkit designed to work with AI/ML systems and integrate with data processing systems, with the
vision "data engineering everywhere".
You can download source code from GitHub
Who should use Cylon?
- Users of Pandas dataframes or SQL interface
- Those needing parallel data engineering
- Those needing Python, C++ and Java interoperability
- HPC Python (Dask) and Big Data (Kubernetes) environments
Major Features in v0.1.0
- Introducing Cylon C++ engine based on Apache Arrow.
- Cylon C++, Python (PyCylon) and Java language bindings
- Seamless integration with Pandas and NumPy
- Distributed operations using MPI
- Local and distributed operations (Select, Project, Joins, Intersection, Union, Subtract)
- Jupyter notebook support and experimental Google Colab support
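For readers less familiar with the relational operations listed above, here is a small plain-Python sketch of their local (single-process) semantics on rows represented as dictionaries. It does not use or mirror the Cylon or PyCylon API; all function names are hypothetical, and the set operations are assumed to return distinct rows.

```python
def select(rows, predicate):
    """Keep the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Keep only the named columns."""
    return [{c: r[c] for c in columns} for r in rows]


def inner_join(left, right, key):
    """Hash join: index the right side by key, then probe with the left."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def _freeze(rows):
    # Rows become hashable tuples so they can be stored in a set.
    return {tuple(sorted(r.items())) for r in rows}


def _thaw(frozen):
    # Back to dictionaries, in a deterministic order.
    return [dict(items) for items in sorted(frozen)]


def union(a, b):
    return _thaw(_freeze(a) | _freeze(b))


def intersection(a, b):
    return _thaw(_freeze(a) & _freeze(b))


def subtract(a, b):
    return _thaw(_freeze(a) - _freeze(b))


users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
orders = [{"id": 1, "total": 10}, {"id": 1, "total": 5}, {"id": 3, "total": 7}]

joined = inner_join(users, orders, "id")
names = project(select(users, lambda r: r["id"] > 1), ["name"])
common = intersection(users[:2], users[1:])
```

The distributed variants of these operations generally follow the same local logic after the rows have been repartitioned so that matching keys (or identical rows) end up on the same worker.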
Examples
License
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0