Releases: ray-project/ray
Ray-2.37.0
Ray Libraries
Ray Data
💫 Enhancements:
- Simplify custom metadata provider API (#47575)
- Change counts of metrics to rates of metrics (#47236)
- Throw exception for non-streaming HF datasets with "override_num_blocks" argument (#47559)
- Refactor custom optimizer rules (#47605)
🔨 Fixes:
- Remove ineffective retry code in plan_read_op (#47456)
- Fix incorrect pending task size if outputs are empty (#47604)
Ray Train
💫 Enhancements:
- Update run status and add stack trace to TrainRunInfo (#46875)
Ray Serve
💫 Enhancements:
- Allow control of some serve configuration via env vars (#47533)
- [serve] Faster detection of dead replicas (#47237)
🔨 Fixes:
- [Serve] fix component id logging field (#47609)
RLlib
💫 Enhancements:
- New API stack:
- Add restart-failed-env option to EnvRunners. (#47608)
- Offline RL: Store episodes in state form. (#47294)
- Offline RL: Replace GAE in MARWILOfflinePreLearner with GeneralAdvantageEstimation connector in learner pipeline. (#47532)
- Off-policy algos: Add episode sampling to EpisodeReplayBuffer. (#47500)
- RLModule APIs: Add SelfSupervisedLossAPI for RLModules that bring their own loss and InferenceOnlyAPI. (#47581, #47572)
Ray Core
💫 Enhancements:
- [aDAG] Allow custom NCCL group for aDAG (#47141)
- [aDAG] support buffered input (#47272)
- [aDAG] Support multi node multi reader (#47480)
- [Core] Make is_gpu, is_actor, root_detached_id fields late bind to workers. (#47212)
- [Core] Reconstruct actor to run lineage reconstruction triggered actor task (#47396)
- [Core] Optimize GetAllJobInfo API for performance (#47530)
🔨 Fixes:
- [aDAG] Fix ranks ordering for custom NCCL group (#47594)
Ray Clusters
📖 Documentation:
- [KubeRay] add a guide for deploying vLLM with RayService (#47038)
Thanks
Many thanks to all those who contributed to this release!
@ruisearch42, @andrewsykim, @timkpaine, @rkooo567, @WeichenXu123, @GeneDer, @sword865, @simonsays1980, @angelinalg, @sven1977, @jjyao, @woshiyyya, @aslonnie, @zcin, @omatthew98, @rueian, @khluu, @justinvyu, @bveeramani, @nikitavemuri, @chris-ray-zhang, @liuxsh9, @xingyu-long, @peytondmurray, @rynewang
Ray-2.36.1
Ray-2.36.0
Ray Libraries
Ray Data
💫 Enhancements:
- Remove limit on number of tasks launched per scheduling step (#47393)
- Allow user-defined Exception to be caught. (#47339)
🔨 Fixes:
- Display pending actors separately in the progress bar and not count them towards running resources (#46384)
- Fix bug where arrow_parquet_args aren't used (#47161)
- Skip empty JSON files in read_json() (#47378)
- Remove remote call for initializing Datasource in read_datasource() (#47467)
- Remove dead from_*_operator modules (#47457)
- Release test fixes
- Add AWS ACCESS_DENIED as retryable exception for multi-node Data+Train benchmarks (#47232)
- Get AWS credentials with boto (#47352)
- Use worker node instead of head node for read_images_comparison_microbenchmark_single_node release test (#47228)
📖 Documentation:
- Add docstring to explain Dataset.deserialize_lineage (#47203)
- Add a comment explaining the bundling behavior for map_batches with default batch_size (#47433)
Ray Train
💫 Enhancements:
- Decouple device-related modules and add Huawei NPU support to Ray Train (#44086)
🔨 Fixes:
- Update TORCH_NCCL_ASYNC_ERROR_HANDLING env var (#47292)
📖 Documentation:
- Add missing Train public API reference (#47134)
Ray Tune
📖 Documentation:
- Add missing Tune public API references (#47138)
Ray Serve
💫 Enhancements:
- Mark proxy as unready when its routers are aware of zero replicas (#47002)
- Setup default serve logger (#47229)
🔨 Fixes:
- Allow get_serve_logs_dir to run outside of Ray's context (#47224)
- Use serve logger name for logs in serve (#47205)
📖 Documentation:
- [HPU] [Serve] [experimental] Add vllm HPU support in vllm example (#45893)
🏗 Architecture refactoring:
- Remove support for nested DeploymentResponses (#47209)
RLlib
🎉 New Features:
- New API stack: Add CQL algorithm. (#47000, #47402)
- New API stack: Enable GPU and multi-GPU support for DQN/SAC/CQL. (#47179)
💫 Enhancements:
- New API stack: Offline RL enhancements: #47195, #47359
- Enhance new API stack stability: #46324, #47196, #47245, #47279
- Fix large batch size for synchronous algos (e.g. PPO) after EnvRunner failures. (#47356)
- Add torch.compile config options to old API stack. (#47340)
- Add kwargs to torch.nn.parallel.DistributedDataParallel (#47276)
- Enhanced CI stability: #47197, #47249
📖 Documentation:
- New API stack example scripts:
- Remove "new API stack experimental" hint from docs. (#47301)
🏗 Architecture refactoring:
- Remove 2nd Learner ConnectorV2 pass from PPO (#47401)
- Add separate learning rates for policy and alpha to SAC. (#47078)
🔨 Fixes:
Ray Core
💫 Enhancements:
- [ADAG] Raise proper error message for nccl within the same actor (#47250)
- [ADAG] Support multi-read of the same shm channel (#47311)
- Log why core worker is not idle during HandleExit (#47300)
- Add PREPARED state for placement groups in GCS for better fault tolerance. (#46858)
🔨 Fixes:
- Fix ray_unintentional_worker_failures_total to only count unintentional worker failures (#47368)
- Fix runtime env race condition when uploading the same package concurrently (#47482)
Dashboard
🔨 Fixes:
- Performance optimizations for dashboard backend logic (#47392) (#47367) (#47160) (#47213)
- Refactor to simplify dashboard backend logic (#47324)
Docs
💫 Enhancements:
- Add sphinx-autobuild and documentation for make local (#47275): Speeds up local docs builds with make local.
- Add Algolia search to docs (#46477)
- Update PyTorch Mnist Training doc for KubeRay 1.2.0 (#47321)
- Life-cycle of documentation policy of Ray APIs
Thanks
Many thanks to all those who contributed to this release!
@GeneDer, @Bye-legumes, @nikitavemuri, @kevin85421, @MortalHappiness, @LeoLiao123, @saihaj, @rmcsqrd, @bveeramani, @zcin, @matthewdeng, @raulchen, @mattip, @jjyao, @ruisearch42, @scottjlee, @can-anyscale, @khluu, @aslonnie, @rynewang, @edoakes, @zhanluxianshen, @venkatram-dev, @c21, @allenyin55, @alexeykudinkin, @snehakottapalli, @BitPhinix, @hongchaodeng, @dengwxn, @liuxsh9, @simonsays1980, @peytondmurray, @KepingYan, @bryant1410, @woshiyyya, @sven1977
Ray-2.35.0
Notice: Starting from this release, pip install ray[all] will not include ray[cpp], and will not install the respective ray-cpp package. To install everything that includes ray-cpp, one can use pip install ray[cpp-all] instead.
Ray Libraries
Ray Data
🎉 New Features:
- Upgrade supported Arrow version from 16 to 17 (#47034)
- Add support for reading from Iceberg (#46889)
💫 Enhancements:
- Various Progress Bar UX improvements (#46816, #46801, #46826, #46692, #46699, #46974, #46928, #47029, #46924, #47120, #47095, #47106)
- Try get size_bytes from metadata and consolidate metadata methods (#46862)
- Improve warning message when read task is large (#46942)
- Extend API to enable passing sample weights via ray.dataset.to_tf (#45701)
- Add a parameter to allow overriding LanceDB scanner options (#46975)
- Add failure retry logic for read_lance (#46976)
- Clarify warning for reading old Parquet data (#47049)
- Move datasource implementations to _internal subpackage (#46825)
- Handle logs from tensor extensions (#46943)
🔨 Fixes:
- Change type of DataContext.retried_io_errors from tuple to list (#46884)
- Make Parquet tests more robust and expose Parquet logic (#46944)
- Change pickling log level from warning to debug (#47032)
- Add validation for shuffle arg (#47055)
- Fix validation bug when size=0 in ActorPoolStrategy (#47072)
- Fix exception in async map (#47110)
- Fix wrong metrics group for Object Store Memory metrics on Ray Data Dashboard (#47170)
- Handle errors in SplitCoordinator when generating a new epoch (#47176)
📖 Documentation:
Ray Train
💫 Enhancements:
Ray Tune
🔨 Fixes:
- [RLlib; Tune] Fix WandB metric overlap after restore from checkpoint. (#46897)
Ray Serve
💫 Enhancements:
- Improved handling of replica death and replica unavailability in deployment handle routers before controller restarts replica (#47008)
- Eagerly create routers in proxy for better GCS fault tolerance (#47031)
- Immediately send ping in router when receiving new replica set (#47053)
🏗 Architecture refactoring:
- Deprecate passing arguments that contain DeploymentResponses in nested objects to downstream deployment handle calls (#46806)
RLlib
🎉 New Features:
- Offline RL on the new API stack:
💫 Enhancements:
- Add ObservationPreprocessor (ConnectorV2). (#47077)
🔨 Fixes:
- New API stack: Fix IMPALA/APPO + LSTM for single- and multi-GPU. (#47132, #47158)
- Various bug fixes: #46898, #47047, #46963, #47021, #46897
- Add more control to Algorithm.add_module/policy methods. (#46932, #46836)
📖 Documentation:
- Example scripts for new API stack:
- New API stack documentation:
🏗 Architecture refactoring:
- Rename MultiAgent...RLModule... into MultiRL...Module for more generality. (#46840)
- Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config. (#46900)
Ray Core
💫 Enhancements:
- Emit total lineage bytes metrics (#46725)
- Adding accelerator type H100 (#46823)
- More structured logging in core worker (#46906)
- Change all callbacks to move to save copies. (#46971)
- Add ray[adag] option to pip install (#47009)
🔨 Fixes:
- Fix dashboard process reporting on windows (#45578)
- Fix Ray-on-Spark cluster crashing bug when user cancels cell execution (#46899)
- Fix PinExistingReturnObject segfault by passing owner_address (#46973)
- Fix raylet CHECK failure from runtime env creation failure. (#46991)
- Fix typo in memray command (#47006)
- [ADAG] Fix for asyncio outputs (#46845)
📖 Documentation:
- Clarify behavior of placement_group_capture_child_tasks in docs (#46885)
- Update ray.available_resources() docstring (#47018)
🏗 Architecture refactoring:
- Async APIs for the New GcsClient. (#46788)
- Replace GCS stubs in the dashboard to use NewGcsAioClient. (#46846)
Dashboard
💫 Enhancements:
- Polish and minor improvements to the Serve page (#46811)
🔨 Fixes:
- Fix CPU/GPU/RAM not being reported correctly on Windows (#44578)
Docs
💫 Enhancements:
- Add more information about developer tooling for docs contributions (#46636), including esbonio section
🔨 Fixes:
- Use PyData Sphinx theme version switcher (#46936)
Thanks
Many thanks to all those who contributed to this release!
@simonsays1980, @bveeramani, @tungh2, @zcin, @xingyu-long, @WeichenXu123, @aslonnie, @MaxVanDijck, @can-anyscale, @galenhwang, @omatthew98, @matthewdeng, @raulchen, @sven1977, @shrekris-anyscale, @deepyaman, @alexeykudinkin, @stephanie-wang, @kevin85421, @ruisearch42, @hongchaodeng, @khluu, @alanwguo, @hongpeng-guo, @saihaj, @Superskyyy, @tespent, @slfan1989, @justinvyu, @rynewang, @nikitavemuri, @amogkam, @mattip, @dev-goyal, @ryanaoleary, @peytondmurray, @edoakes, @venkatajagannath, @jjyao, @cristianjd, @scottjlee, @Bye-legumes
Release 2.34.0 Notes
Ray Libraries
Ray Data
💫 Enhancements:
- Add better support for UDF returns from list of datetime objects (#46762)
🔨 Fixes:
- Remove read task warning if size bytes not set in metadata (#46765)
📖 Documentation:
Ray Train
🔨 Fixes:
- Sort workers by node ID rather than by node IP (#46163)
🏗 Architecture refactoring:
- Remove dead RayDatasetSpec (#46764)
RLlib
🎉 New Features:
- Offline RL support on new API stack:
💫 Enhancements:
- Move DQN into the TargetNetworkAPI (and deprecate RLModuleWithTargetNetworksInterface). (#46752)
🔨 Fixes:
- Numpy version fix: Rename all np.product usage to np.prod (#46317)
📖 Documentation:
- Examples for new API stack: Add 2 (count-based) curiosity examples. (#46737)
- Remove RLlib CLI from docs (soon to be deprecated and replaced by python API). (#46724)
🏗 Architecture refactoring:
- Cleanup, rename, clarify: Algorithm.workers/evaluation_workers, local_worker(), etc.. (#46726)
Ray Core
🏗 Architecture refactoring:
- New python GcsClient binding (#46186)
Many thanks to all those who contributed to this release! @KyleKoon, @ruisearch42, @rynewang, @sven1977, @saihaj, @aslonnie, @bveeramani, @akshay-anyscale, @kevin85421, @omatthew98, @anyscalesam, @MaxVanDijck, @justinvyu, @simonsays1980, @can-anyscale, @peytondmurray, @scottjlee
Ray-2.33.0
Ray Libraries
Ray Core
💫 Enhancements:
- Add "last exception" to error message when GCS connection fails in ray.init() (#46516)
🔨 Fixes:
- Add object back to memory store when object recovery is skipped (#46460)
- Task status should start with PENDING_ARGS_AVAIL when retry (#46494)
- Fix ObjectFetchTimedOutError (#46562)
- Make working_dir support files created before 1980 (#46634)
- Allow full path in conda runtime env. (#45550)
- Fix worker launch time formatting in state api (#43516)
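The working_dir fix for files created before 1980 touches a general limitation of the ZIP format, which cannot store timestamps earlier than 1980. The sketch below reproduces that limitation with Python's standard zipfile module and shows the strict_timestamps=False option that clamps old timestamps to 1980-01-01. It is an illustration only and makes no claim about how #46634 is implemented; the file names and the helper zip_old_file are hypothetical.

```python
import os
import tempfile
import zipfile


def zip_old_file():
    """Zip a file whose mtime predates 1980, with and without strict timestamps."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "old.txt")
        with open(src, "w") as f:
            f.write("hello")
        old = 400 * 86400  # seconds since the epoch, i.e. early 1971
        os.utime(src, (old, old))

        # Default (strict) mode rejects timestamps before 1980.
        strict_failed = False
        try:
            with zipfile.ZipFile(os.path.join(tmp, "strict.zip"), "w") as zf:
                zf.write(src, "old.txt")
        except ValueError:
            strict_failed = True

        # Non-strict mode clamps the stored timestamp to 1980-01-01.
        lenient_path = os.path.join(tmp, "lenient.zip")
        with zipfile.ZipFile(lenient_path, "w", strict_timestamps=False) as zf:
            zf.write(src, "old.txt")
        with zipfile.ZipFile(lenient_path) as zf:
            stored = zf.getinfo("old.txt").date_time

    return strict_failed, stored


strict_failed, stored = zip_old_file()
```

Clamping loses the original modification time, which is usually acceptable when the archive only exists to ship code to workers.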
Ray Data
🎉 New Features:
- Deprecate Dataset.get_internal_block_refs() (#46455)
- Add read API for reading Databricks table with Delta Sharing (#46072)
- Add support for objects to Arrow blocks (#45272)
💫 Enhancements:
- Change offsets to int64 and change to LargeList for ArrowTensorArray (#45352)
- Prevent from_pandas from combining input blocks (#46363)
- Update Dataset.count() to avoid unnecessarily keeping BlockRefs in-memory (#46369)
- Use Set to fix inefficient iteration over Arrow table columns (#46541)
- Add AWS Error UNKNOWN to list of retried write errors (#46646)
- Always print traceback for internal exceptions (#46647)
- Allow unknown estimate of operator output bundles and ProgressBar totals (#46601)
- Improve filesystem retry coverage (#46685)
🔨 Fixes:
- Replace lambda mutable default arguments (#46493)
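For readers unfamiliar with the pitfall behind this kind of fix, here is a minimal, general Python sketch of why mutable default arguments are risky. It is not the code changed in #46493; the function names are hypothetical. The same rule applies to lambda defaults, because a default value is evaluated once, when the function object is created.

```python
def append_shared(item, bucket=[]):
    # The default list is created once and shared by every call.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as a sentinel gives each call its own list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = list(append_shared(1))   # copy so the snapshot is stable
shared_second = list(append_shared(2))  # state leaked from the first call
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)
```

Here shared_second contains both items even though the two calls look independent, while the sentinel version keeps them separate.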
📖 Documentation:
Ray Train
💫 Enhancements:
- Update run status and actor status for train runs. (#46395)
🔨 Fixes:
- Replace lambda default arguments (#46576)
📖 Documentation:
- Add MNIST training using KubeRay doc page (#46123)
- Add example of pre-training Llama model on Intel Gaudi (#45459)
- Fix tensorflow example by using ScalingConfig (#46565)
Ray Tune
🔨 Fixes:
- Replace lambda default arguments (#46596)
Ray Serve
🎉 New Features:
- Fully deprecate target_num_ongoing_requests_per_replica and max_concurrent_queries, respectively replaced by target_ongoing_requests and max_ongoing_requests (#46392 and #46427)
- Configure the task launched by the controller to build an application with Serve’s logging config (#46347)
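When updating older Serve configs, the two renames are max_concurrent_queries to max_ongoing_requests and target_num_ongoing_requests_per_replica to target_ongoing_requests. A minimal sketch of a migration helper for plain option dictionaries follows. The helper migrate_options and the sample values are hypothetical and not part of Ray; only the option names come from Serve.

```python
# Old option name -> new option name.
RENAMED_OPTIONS = {
    "max_concurrent_queries": "max_ongoing_requests",
    "target_num_ongoing_requests_per_replica": "target_ongoing_requests",
}


def migrate_options(options):
    """Return a copy of `options` with deprecated keys renamed, recursing into nested dicts."""
    migrated = {}
    for key, value in options.items():
        if isinstance(value, dict):
            value = migrate_options(value)
        migrated[RENAMED_OPTIONS.get(key, key)] = value
    return migrated


old_options = {
    "max_concurrent_queries": 10,
    "autoscaling_config": {
        "min_replicas": 1,
        "max_replicas": 4,
        "target_num_ongoing_requests_per_replica": 2,
    },
}
new_options = migrate_options(old_options)
```

The helper returns a new dictionary and leaves the input untouched, so it can be run over a config before passing it on without side effects.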
RLlib
💫 Enhancements:
- Moving sampling coordination for batch_mode=complete_episodes to synchronous_parallel_sample. (#46321)
- Enable complex action spaces with stateful modules. (#46468)
🏗 Architecture refactoring:
- Enable multi-learner setup for hybrid stack BC. (#46436)
- Introduce Checkpointable API for RLlib components and subcomponents. (#46376)
🔨 Fixes:
- Replace Mapping typehint with Dict: #46474
📖 Documentation:
- More example scripts for new API stack: Two separate optimizers (w/ different learning rates). (#46540) and custom loss function. (#46445)
Dashboard
🔨 Fixes:
- Task end time showing the incorrect time (#46439)
- Events Table rows having really bad spacing (#46701)
- UI bugs in the serve dashboard page (#46599)
Thanks
Many thanks to all those who contributed to this release!
@alanwguo, @hongchaodeng, @anyscalesam, @brucebismarck, @bt2513, @woshiyyya, @terraflops1048576, @lorenzoritter, @omrishiv, @davidxia, @cchen777, @nono-Sang, @jackhumphries, @aslonnie, @JoshKarpel, @zjregee, @bveeramani, @khluu, @Superskyyy, @liuxsh9, @jjyao, @ruisearch42, @sven1977, @harborn, @saihaj, @zcin, @can-anyscale, @veekaybee, @chungen04, @WeichenXu123, @GeneDer, @sergey-serebryakov, @Bye-legumes, @scottjlee, @rynewang, @kevin85421, @cristianjd, @peytondmurray, @MortalHappiness, @MaxVanDijck, @simonsays1980, @mjovanovic9999
Ray-2.32.0
Highlight: aDAG Developer Preview
This is a new Ray Core specific feature called Ray accelerated DAGs (aDAGs).
- aDAGs give you a Ray Core-like API with the added ability to pre-compile execution paths across pre-allocated resources on a Ray cluster, which can improve throughput and latency. Some practical examples include:
- Up to 10x lower task execution time on single-node.
- Native support for GPU-GPU communication, via NCCL.
- This is still very early, but please reach out on #ray-core on Ray Slack to learn more!
Ray Libraries
Ray Data
💫 Enhancements:
- Support async callable classes in map_batches() (#46129)
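An async callable class is a class whose __call__ method is a coroutine. The sketch below shows the shape of such a class and drives it with plain asyncio so that it runs without a cluster. AsyncDoubler and run_batches are hypothetical names, and asyncio.gather is only a stand-in for the scheduling that Ray Data performs when the class is passed to map_batches().

```python
import asyncio


class AsyncDoubler:
    """Callable class whose __call__ is a coroutine operating on one batch."""

    async def __call__(self, batch):
        await asyncio.sleep(0)  # stand-in for real async I/O, e.g. a network call
        return {"x": [value * 2 for value in batch["x"]]}


async def run_batches(fn, batches):
    # Run all batches concurrently; results come back in input order.
    return await asyncio.gather(*(fn(batch) for batch in batches))


batches = [{"x": [1, 2]}, {"x": [3]}]
results = asyncio.run(run_batches(AsyncDoubler(), batches))
```

The benefit of the async form is that one worker can overlap many slow I/O calls instead of waiting on each batch in turn.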
🔨 Fixes:
- Ensure InputDataBuffer doesn't free block references (#46191)
- MapOperator.num_active_tasks should exclude pending actors (#46364)
- Fix progress bars being displayed as partially completed in Jupyter notebooks (#46289)
📖 Documentation:
- Fix docs: read_api.py docstring (#45690)
- Correct API annotation for tfrecords_datasource (#46171)
- Fix broken links in README and in ray.data.Dataset (#45345)
Ray Train
📖 Documentation:
- Update PyTorch Data Ingestion User Guide (#45421)
Ray Serve
💫 Enhancements:
- Optimize ServeController.get_app_config() (#45878)
- Change default for max and target ongoing requests (#45943)
- Integrate with Ray structured logging (#46215)
- Allow configuring handle cache size and controller max concurrency (#46278)
- Optimize DeploymentDetails.deployment_route_prefix_not_set() (#46305)
RLlib
🎉 New Features:
- APPO on new API stack (w/ EnvRunners). (#46216)
💫 Enhancements:
- Stability: APPO, SAC, and DQN activate multi-agent learning tests (#45542, #46299)
- Make Tune trial ID available in EnvRunners (and callbacks). (#46294)
- Add env- and agent_steps to custom evaluation function. (#45652)
- Remove default-metrics from Algorithm (tune does NOT error anymore if any stop-metric is missing). (#46200)
🔨 Fixes:
- Various bug fixes: #45542
📖 Documentation:
- Example for new API stack: Offline RL (BC) training on single-agent, while evaluating w/ multi-agent setup. (#46251)
- Example for new API stack: Custom RLModule with an LSTM. (#46276)
Ray Core
🎉 New Features:
- aDAG Developer Preview.
💫 Enhancements:
- Allow env setup logger encoding (#46242)
- ray list tasks filter state and name on GCS side (#46270)
- Log ray version and ray commit during GCS start (#46341)
🔨 Fixes:
- Decrement lineage ref count of an actor when the actor task return object reference is deleted (#46230)
- Fix negative ALIVE actors metric and introduce IDLE state (#45718)
- psutil process attr num_fds is not available on Windows (#46329)
Dashboard
🎉 New Features:
- Added customizable refresh frequency for metrics on Ray Dashboard (#44037)
💫 Enhancements:
- Upgraded to MUIv5 and React 18 (#45789)
🔨 Fixes:
- Fix for multi-line log items breaking log viewer rendering (#46391)
- Fix for UI inconsistency when a job submission creates more than one Ray job. (#46267)
- Fix filtering by job id for tasks API not filtering correctly. (#45017)
Docs
🔨 Fixes:
- Re-enabled automatic cross-reference link checking for Ray documentation, with Sphinx nitpicky mode (#46279)
- Enforced naming conventions for public and private APIs to maintain accuracy, starting with Ray Data API documentation (#46261)
📖 Documentation:
- Upgrade Python 3.12 support to alpha, marking the release of the Ray wheel to PyPI and conducting a sanity check of the most critical tests.
Thanks
Many thanks to all those who contributed to this release!
@stephanie-wang, @MortalHappiness, @aslonnie, @ryanaoleary, @jjyao, @jackhumphries, @nikitavemuri, @woshiyyya, @JoshKarpel, @ruisearch42, @sven1977, @alanwguo, @GeneDer, @saihaj, @raulchen, @liuxsh9, @khluu, @cristianjd, @scottjlee, @bveeramani, @zcin, @simonsays1980, @SumanthRH, @davidxia, @can-anyscale, @peytondmurray, @kevin85421
Ray-2.31.0
Ray Libraries
Ray Data
🔨 Fixes:
- Fixed bug where preserve_order doesn’t work with file reads (#46135)
📖 Documentation:
- Added documentation for dataset.Schema (#46170)
Ray Train
💫 Enhancements:
- Add API for Ray Train run stats (#45711)
Ray Tune
💫 Enhancements:
- Missing stopping criterion should not error (just warn). (#45613)
📖 Documentation:
- Fix broken references in Ray Tune documentation (#45233)
Ray Serve
WARNING: the following default values will change in Ray 2.32:
- Default for max_ongoing_requests will change from 100 to 5.
- Default for target_ongoing_requests will change from 1 to 2.
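To get a feel for what the target_ongoing_requests change means for autoscaled deployments, the following back-of-the-envelope sketch estimates the replica count the autoscaler aims for at a given load. It assumes a simplified model in which the desired count is the total number of ongoing requests divided by the target, rounded up, and it ignores smoothing, delays, and min/max replica bounds. The function desired_replicas is hypothetical, not a Serve API.

```python
import math


def desired_replicas(total_ongoing_requests, target_ongoing_requests):
    """Simplified estimate: enough replicas so each handles about the target load."""
    return math.ceil(total_ongoing_requests / target_ongoing_requests)


load = 10  # ongoing requests across the deployment (illustrative value)
replicas_with_old_default = desired_replicas(load, 1)  # old default target of 1
replicas_with_new_default = desired_replicas(load, 2)  # new default target of 2
```

Under this model the same load needs about half as many replicas with the new default, so deployments that rely on the old behavior may want to set both values explicitly before upgrading.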
💫 Enhancements:
- Optimize DeploymentStateManager.get_deployment_statuses (#45872)
🔨 Fixes:
- Fix logging error on passing traceback object into exc_info (#46105)
- Run __del__ even if constructor is still in-progress (#45882)
- Spread replicas with custom resources in torch tune serve release test (#46093)
- [1k release test] don't run replicas on head node (#46130)
📖 Documentation:
- Remove todo since issue is fixed (#45941)
RLlib
🎉 New Features:
- IMPALA runs on the new API stack (with EnvRunners and ConnectorV2s). (#42085)
- SAC/DQN: Prioritized multi-agent episode replay buffer. (#45576)
💫 Enhancements:
- New API stack stability: Add systematic CI learning tests for all possible combinations of: [PPO|IMPALA] + [1CPU|2CPU|1GPU|2GPU] + [single-agent|multi-agent]. (#46162, #46161)
📖 Documentation:
- New API stack: Example script for action masking (#46146)
- New API stack: PyFlight example script cleanup (#45956)
- Old API stack: Enhanced ONNX example (+LSTM). (#43592)
Ray Core and Ray Clusters
Ray Core
💫 Enhancements:
- [runtime-env] automatically infer worker path when starting worker in container (#42304)
🔨 Fixes:
- On GCS restart, destroy rather than forget the unused workers, fixing PG leaks. (#45854)
- Cancel lease requests before returning a PG bundle (#45919)
- Fix boost fiber stack overflow (#46133)
Thanks
Many thanks to all those who contributed to this release!
@jjyao, @kevin85421, @vincent-pli, @khluu, @simonsays1980, @sven1977, @rynewang, @can-anyscale, @richardsliu, @jackhumphries, @alexeykudinkin, @bveeramani, @ruisearch42, @shrekris-anyscale, @stephanie-wang, @matthewdeng, @zcin, @hongchaodeng, @ryanaoleary, @liuxsh9, @GeneDer, @aslonnie, @peytondmurray, @Bye-legumes, @woshiyyya, @scottjlee, @JoshKarpel
Ray-2.30.0
Ray Libraries
Ray Data
💫 Enhancements:
- Improve fractional CPU/GPU formatting (#45673)
- Use sampled fragments to estimate Parquet reader batch size (#45749)
- Refactoring ParquetDatasource and metadata fetching logic (#45728, #45727, #45733, #45734, #45767)
- Refactor planner.py (#45706)
Ray Tune
💫 Enhancements:
- Change the behavior of a missing stopping criterion metric to warn instead of raising an error. This enables the use case of reporting different sets of metrics on different iterations (ex: a separate set of training and validation metrics). (#45613)
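The behavior described above can be sketched in a few lines of plain Python. This is an illustration of the warn-instead-of-raise semantics, not Tune's implementation; should_stop and the metric names are hypothetical, and the comparison assumes the usual dictionary-style criterion where a trial stops once a metric reaches its threshold.

```python
import warnings


def should_stop(result, stop):
    """Return True if any stopping criterion present in `result` is met."""
    for metric, threshold in stop.items():
        if metric not in result:
            # A missing metric is skipped with a warning instead of an error.
            warnings.warn(f"Stopping criterion {metric!r} not found in result")
            continue
        if result[metric] >= threshold:
            return True
    return False


stop = {"val_accuracy": 0.9}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A training-only iteration does not report val_accuracy.
    stop_after_train = should_stop({"train_loss": 0.4}, stop)
    # A validation iteration reports it and crosses the threshold.
    stop_after_validation = should_stop({"val_accuracy": 0.93}, stop)
warning_count = len(caught)
```

Only the training-only iteration triggers a warning, and the run keeps going until an iteration that reports the metric meets the threshold.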
Ray Serve
💫 Enhancements:
- Create internal request id to track request objects (#45761)
RLlib
💫 Enhancements:
- Stability: DreamerV3 weekly release test (#45654); Add "official" benchmark script for Atari PPO benchmarks. (#45697)
- Enhance env-rendering callback (#45682)
🔨 Fixes:
- Bug fix in new MetricsLogger API: EMA stats w/o window would lead to infinite list mem-leak. (#45752)
- Various other bug fixes: (#45819, #45820, #45683, #45651, #45753)
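The EMA memory-leak fix concerns a statistic that, in principle, needs no history at all: an exponential moving average can be updated from its previous value alone. The sketch below shows that constant-memory update. It is not the MetricsLogger code; EmaStat and its coeff parameter are hypothetical names.

```python
class EmaStat:
    """Exponential moving average kept as a single running value."""

    def __init__(self, coeff=0.01):
        self.coeff = coeff  # weight given to each new observation
        self.value = None   # no stored list of past observations

    def push(self, x):
        if self.value is None:
            self.value = float(x)
        else:
            self.value = self.coeff * x + (1.0 - self.coeff) * self.value
        return self.value


stat = EmaStat(coeff=0.5)
stat.push(1.0)
ema_after_two = stat.push(3.0)  # 0.5 * 3.0 + 0.5 * 1.0
```

Because only one float is stored, memory use stays flat however many values are logged.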
📖 Documentation:
Ray Core
🎉 New Features:
- Alpha release of job-level logging configuration: users can now configure user logging to use the logfmt format with logging context attached. (#45344)
💫 Enhancements:
- Integrate amdsmi in AMDAcceleratorManager (#44572)
🔨 Fixes:
- Fix the C++ GcsClient Del not respecting del_by_prefix (#45604)
- Fix exit handling of FiberState threads (#45834)
Dashboard
💫 Enhancements:
- Parse out json logs (#45853)
Many thanks to all those who contributed to this release: @liuxsh9, @peytondmurray, @pcmoritz, @GeneDer, @saihaj, @khluu, @aslonnie, @yucai, @vickytsang, @can-anyscale, @bthananjeyan, @raulchen, @hongchaodeng, @x13n, @simonsays1980, @peterghaddad, @kevin85421, @rynewang, @angelinalg, @jjyao, @BenWilson2, @jackhumphries, @zcin, @chris-ray-zhang, @c21, @shrekris-anyscale, @alanwguo, @stephanie-wang, @Bye-legumes, @sven1977, @WeichenXu123, @bveeramani, @nikitavemuri
Ray-2.24.0
Ray Libraries
Ray Data
🎉 New Features:
- Allow user to configure timeout for actor pool (#45508)
- Add override_num_blocks to from_pandas and perform auto-partition (#44937)
- Upgrade Arrow version to 16 in CI (#45565)
💫 Enhancements:
- Clarify that num_rows_per_file isn't strict (#45529)
- Record more telemetry for newly added datasources (#45647)
- Avoid pickling LanceFragment when creating read tasks for Lance (#45392)
Ray Train
📖 Documentation:
- [HPU] Add example of Stable Diffusion fine-tuning and serving on Intel Gaudi (#45217)
- [HPU] Add example of Llama-2 fine-tuning on Intel Gaudi (#44667)
Ray Tune
🏗 Architecture refactoring:
- Improve excessive syncing warning and deprecate TUNE_RESULT_DIR, RAY_AIR_LOCAL_CACHE_DIR, local_dir (#45210)
Ray Serve
💫 Enhancements:
- Clean up Serve proxy files (#45486)
📖 Documentation:
- vllm example to serve llm models (#45430)
RLlib
💫 Enhancements:
- DreamerV3 on tf: Bug fix, so it can run again with tf==2.11.1 (2.11.0 is not available anymore) (#45419); Added weekly release test for DreamerV3.
- Added support for multi-agent off-policy algorithms (DQN and SAC) in the new API stack (#45182)
- Config option for APPO/IMPALA to change number of GPU-loader threads (#45467)
🔨 Fixes:
- Various MetricsLogger bug fixes (#45543, #45585, #45575)
- Other fixes: #45588, #45617, #45517, #45465
📖 Documentation:
- Example script for new API stack: How-to restore 1 of n agents from a checkpoint. (#45462)
- Example script for new API stack: Autoregressive action module. (#45525)
Ray Core
💫 Enhancements:
- Improve node death observability (#45320, #45357, #45533, #45644, #45497)
- Ray c++ backend structured logging (#44468)
🔨 Fixes:
- Fix worker crash when getting actor name from runtime context (#45194)
- log dedup should not dedup number only lines (#45385)
📖 Documentation:
- Improve doc for
--object-store-memory
to describe how the default value is set (#45301)
Dashboard
🔨 Fixes:
- Move Job package uploading to another thread to unblock the event loop. (#45282)
Many thanks to all those who contributed to this release: @maxliuofficial, @simonsays1980, @GeneDer, @dudeperf3ct, @khluu, @justinvyu, @andrewsykim, @Catch-Bull, @zcin, @bveeramani, @rynewang, @angelinalg, @matthewdeng, @jjyao, @kira-lin, @harborn, @hongchaodeng, @peytondmurray, @aslonnie, @timkpaine, @982945902, @maxpumperla, @stephanie-wang, @ruisearch42, @alanwguo, @can-anyscale, @c21, @Atry, @KamenShah, @sven1977, @raulchen