[VL] Global memory OOM #7249

Open
FelixYBW opened this issue Sep 14, 2024 · 0 comments
Labels
bug Something isn't working triage

Comments

@FelixYBW
Contributor

Backend

VL (Velox)

Bug description

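This is a new issue triggered by #6988.

The root cause is that Velox's sort needs to allocate a large buffer from global memory when a spill is triggered. In the log below, a 256.00MB allocation from the __sys_spilling__ pool fails during OrderBy reclaim because it would exceed the 3.00GB memory allocator limit. This looks like a design issue in the spill path.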

W20240914 06:04:39.696241 48552 MallocAllocator.cpp:267] [MEM] Failed to allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB
E20240914 06:04:39.696458 48552 Exceptions.h:67] Line: /home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/common/memory/MemoryPool.cpp:1314, Function:handleAllocationFailure, Expression:  allocate failed with 256.00MB from Memory Pool[__sys_spilling__ LEAF root[__sys_root__] parent[__sys_root__] MALLOC no-usage-track thread-safe]<unlimited max capacity unlimited capacity used 0B available 0B reservation [used 0B, reserved 0B, min 0B] counters [allocs 109, frees 103, reserves 0, releases 0, collisions 0])> Failed to allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB, Source: RUNTIME, ErrorCode: MEM_ALLOC_ERROR
24/09/14 06:04:39 ERROR [Executor task launch worker for task 1188.0 in stage 2.0 (TID 116257)] listener.ManagedReservationListener: Error reserving memory from target
org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: MEM_ALLOC_ERROR
Reason: allocate failed with 256.00MB from Memory Pool[__sys_spilling__ LEAF root[__sys_root__] parent[__sys_root__] MALLOC no-usage-track thread-safe]<unlimited max capacity unlimited capacity used 0B available 0B reservation [used 0B, reserved 0B, min 0B] counters [allocs 109, frees 103, reserves 0, releases 0, collisions 0])> Failed to allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB
Retriable: True
Context: Operator: OrderBy[1] 1
Function: handleAllocationFailure
File: /home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/common/memory/MemoryPool.cpp
Line: 1314
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox6memory14MemoryPoolImpl23handleAllocationFailureERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
# 4  _ZN8facebook5velox6memory14MemoryPoolImpl8allocateEl
# 5  _ZN8facebook5velox4exec7Spiller13fillSpillRunsEPNS1_20RowContainerIteratorE
# 6  _ZN8facebook5velox4exec7Spiller5spillEPKNS1_20RowContainerIteratorE
# 7  _ZN8facebook5velox4exec10SortBuffer10spillInputEv
# 8  _ZN8facebook5velox4exec7OrderBy7reclaimEmRNS0_6memory15MemoryReclaimer5StatsE
# 9  _ZNSt17_Function_handlerIFlvEZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS2_6memory10MemoryPoolEmmRNS6_15MemoryReclaimer5StatsEEUlvE_E9_M_invokeERKSt9_Any_data
# 10 _ZN8facebook5velox6memory15MemoryReclaimer3runERKSt8functionIFlvEERNS2_5StatsE
# 11 _ZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE
# 12 _ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
# 13 _ZN8facebook5velox4exec23ParallelMemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS3_15MemoryReclaimer5StatsE
# 14 _ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
# 15 _ZN8facebook5velox4exec4Task15MemoryReclaimer11reclaimTaskERKSt10shared_ptrIS2_EmmRNS0_6memory15MemoryReclaimer5StatsE
# 16 _ZN8facebook5velox4exec4Task15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE
# 17 _ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
# 18 _ZN6gluten20ListenableArbitrator14shrinkCapacityEmbb
# 19 _ZN6gluten24WholeStageResultIterator14spillFixedSizeEl
# 20 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeSpill
# 21 0x00007ff1f89bf427
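
To make the failure mode concrete, here is a minimal toy model, not Velox code. It assumes the allocator rejects any request that would push total usage past its limit, and that a spill must obtain a scratch buffer before it can free any operator memory. The 3.00GB limit and the 256.00MB request come from the log above; the class and function names and the 2900MB operator usage are hypothetical, chosen only for illustration.

```python
MB = 1 << 20
GB = 1 << 30


class CappedAllocator:
    """Hypothetical stand-in for a process-wide allocator with a hard byte limit."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allocate(self, nbytes):
        # Assumption: a request fails if it would push usage past the limit.
        if self.used + nbytes > self.limit:
            return False
        self.used += nbytes
        return True

    def free(self, nbytes):
        self.used -= nbytes


def spill(allocator, scratch_bytes, operator_bytes):
    """Toy spill: take scratch memory first, then release the operator's memory.

    Returns the number of bytes reclaimed from the operator.
    """
    if not allocator.allocate(scratch_bytes):
        raise MemoryError(
            f"failed to allocate {scratch_bytes // MB}MB: "
            f"would exceed limit of {allocator.limit // GB}GB"
        )
    allocator.free(operator_bytes)  # rows written out, operator memory released
    allocator.free(scratch_bytes)   # scratch buffer released after the spill
    return operator_bytes


if __name__ == "__main__":
    allocator = CappedAllocator(3 * GB)
    allocator.allocate(2900 * MB)  # hypothetical usage at the time of the spill
    try:
        spill(allocator, 256 * MB, 2900 * MB)
    except MemoryError as err:
        print("spill failed:", err)
```

In this model the spill is the only way to free memory, yet it cannot start once less than 256MB of headroom remains under the limit. Reserving the scratch buffer ahead of time, or sizing it to the remaining headroom, would avoid that deadlock in the toy; whether either fits Velox's design is a question for the maintainers.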

@zhztheplayer

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response
