🐛 Bug
Llama-3-8B-Instruct-q4f16_1-MLC does not run: `mlc_llm chat` triggers JIT compilation of the model library, every TVM compilation stage completes, but the final link step fails (clang cannot execute its linker), so no lib.dll is produced and the chat aborts.
To Reproduce
Steps to reproduce the behavior:
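The exact command was not preserved in this report; judging from the traceback (mlc_llm\cli\chat.py) and the model URL in the log, the invocation was presumably:

```
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
```

The device is then auto-detected (vulkan:0 here) and the model library is JIT-compiled, producing the log below: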
[2024-09-12 15:20:08] INFO auto_device.py:88: Not found device: cuda:0
[2024-09-12 15:20:10] INFO auto_device.py:88: Not found device: rocm:0
[2024-09-12 15:20:12] INFO auto_device.py:88: Not found device: metal:0
[2024-09-12 15:20:13] INFO auto_device.py:79: Found device: vulkan:0
[2024-09-12 15:20:15] INFO auto_device.py:88: Not found device: opencl:0
[2024-09-12 15:20:15] INFO auto_device.py:35: Using device: vulkan:0
[2024-09-12 15:20:15] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
[2024-09-12 15:20:15] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-09-12 15:20:15] INFO download_cache.py:166: Weights already downloaded: C:\Users\username\AppData\Local\mlc_llm\model_weights\hf\mlc-ai\Llama-3-8B-Instruct-q4f16_1-MLC
[2024-09-12 15:20:15] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-09-12 15:20:15] INFO jit.py:118: Compiling using commands below:
[2024-09-12 15:20:15] INFO jit.py:119: 'C:\ProgramData\miniconda3\envs\mlc-prebuilt\python.exe' -m mlc_llm compile 'C:\Users\username\AppData\Local\mlc_llm\model_weights\hf\mlc-ai\Llama-3-8B-Instruct-q4f16_1-MLC' --opt 'flashinfer=1;cublas_gemm=1;faster_transformer=0;cudagraph=1;cutlass=1;ipc_allreduce_strategy=NONE' --overrides '' --device vulkan:0 --output 'C:\Users\username\AppData\Local\Temp\tmpr7njn151\lib.dll'
[2024-09-12 15:20:18] INFO auto_config.py:70: Found model configuration: C:\Users\username\AppData\Local\mlc_llm\model_weights\hf\mlc-ai\Llama-3-8B-Instruct-q4f16_1-MLC\mlc-chat-config.json
[2024-09-12 15:20:18] INFO auto_target.py:91: Detecting target device: vulkan:0
[2024-09-12 15:20:18] INFO auto_target.py:93: Found target: {"thread_warp_size": runtime.BoxInt(1), "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(32768), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
[2024-09-12 15:20:18] INFO auto_target.py:110: Found host LLVM triple: x86_64-pc-windows-msvc
[2024-09-12 15:20:18] INFO auto_target.py:111: Found host LLVM CPU: alderlake
[2024-09-12 15:20:18] INFO auto_config.py:154: Found model type: llama. Use --model-type to override.
Compiling with arguments:
--config LlamaConfig(hidden_size=4096, intermediate_size=14336, num_attention_heads=32, num_hidden_layers=32, rms_norm_eps=1e-05, vocab_size=128256, tie_word_embeddings=False, position_embedding_base=500000.0, rope_scaling=None, context_window_size=8192, prefill_chunk_size=2048, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, pipeline_parallel_stages=1, max_batch_size=80, kwargs={})
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
--model-type llama
--target {"thread_warp_size": runtime.BoxInt(1), "host": {"mtriple": "x86_64-pc-windows-msvc", "tag": "", "kind": "llvm", "mcpu": "alderlake", "keys": ["cpu"]}, "supports_float32": runtime.BoxBool(true), "supports_int16": runtime.BoxBool(true), "max_threads_per_block": runtime.BoxInt(1024), "supports_storage_buffer_storage_class": runtime.BoxBool(true), "supports_int8": runtime.BoxBool(true), "supports_8bit_buffer": runtime.BoxBool(true), "supports_int64": runtime.BoxBool(true), "max_num_threads": runtime.BoxInt(256), "kind": "vulkan", "tag": "", "max_shared_memory_per_block": runtime.BoxInt(32768), "supports_16bit_buffer": runtime.BoxBool(true), "supports_int32": runtime.BoxBool(true), "keys": ["vulkan", "gpu"], "supports_float16": runtime.BoxBool(true)}
--opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
--system-lib-prefix ""
--output C:\Users\username\AppData\Local\Temp\tmpr7njn151\lib.dll
--overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None
[2024-09-12 15:20:18] INFO compile.py:140: Creating model from: LlamaConfig(hidden_size=4096, intermediate_size=14336, num_attention_heads=32, num_hidden_layers=32, rms_norm_eps=1e-05, vocab_size=128256, tie_word_embeddings=False, position_embedding_base=500000.0, rope_scaling=None, context_window_size=8192, prefill_chunk_size=2048, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, pipeline_parallel_stages=1, max_batch_size=80, kwargs={})
[2024-09-12 15:20:18] INFO compile.py:158: Exporting the model to TVM Unity compiler
[2024-09-12 15:20:22] INFO compile.py:164: Running optimizations using TVM Unity
[2024-09-12 15:20:22] INFO compile.py:185: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 8192, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'kv_state_kind': 'kv_cache', 'max_batch_size': 80}
[2024-09-12 15:20:24] INFO pipeline.py:54: Running TVM Relax graph-level optimizations
[2024-09-12 15:20:31] INFO pipeline.py:54: Lowering to TVM TIR kernels
[2024-09-12 15:20:40] INFO pipeline.py:54: Running TVM TIR-level optimizations
[2024-09-12 15:21:03] INFO pipeline.py:54: Running TVM Dlight low-level optimizations
[2024-09-12 15:21:05] INFO pipeline.py:54: Lowering to VM bytecode
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function alloc_embedding_tensor: 16.00 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function argsort_probs: 0.00 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_decode: 11.56 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_decode_to_last_hidden_states: 12.19 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_prefill: 296.62 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_prefill_to_last_hidden_states: 312.00 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_select_last_hidden_states: 0.62 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_verify: 296.00 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function batch_verify_to_last_hidden_states: 312.00 MB
[2024-09-12 15:21:10] INFO estimate_memory_usage.py:58: [Memory usage] Function create_tir_paged_kv_cache: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function decode: 0.14 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function decode_to_last_hidden_states: 0.15 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function embed: 16.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function gather_hidden_states: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function get_logits: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function multinomial_from_uniform: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function prefill: 296.01 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function prefill_to_last_hidden_states: 312.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function renormalize_by_top_p: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function sample_with_top_p: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function sampler_take_probs: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function sampler_verify_draft_tokens: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function scatter_hidden_states: 0.00 MB
[2024-09-12 15:21:11] INFO estimate_memory_usage.py:58: [Memory usage] Function softmax_with_temperature: 0.00 MB
[2024-09-12 15:21:13] INFO pipeline.py:54: Compiling external modules
[2024-09-12 15:21:13] INFO pipeline.py:54: Compilation complete! Exporting to disk
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm_main.py", line 64, in
main()
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm_main.py", line 33, in main
cli.main(sys.argv[2:])
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\cli\compile.py", line 129, in main
compile(
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\interface\compile.py", line 243, in compile
_compile(args, model_config)
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\interface\compile.py", line 188, in _compile
args.build_func(
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\support\auto_target.py", line 316, in build
).export_library(
^^^^^^^^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\tvm\relax\vm_build.py", line 146, in export_library
return self.mod.export_library(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\tvm\runtime\module.py", line 624, in export_library
return fcompile(file_name, files, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\tvm\contrib\cc.py", line 96, in create_shared
_windows_compile(output, objects, options, cwd, ccache_env)
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\tvm\contrib\cc.py", line 418, in _windows_compile
raise RuntimeError(msg)
RuntimeError: Compilation error:
clang -O2 --target=x86_64 -shared -o C:\Users\username\AppData\Local\Temp\tmpr7njn151\lib.dll C:\Users\username\AppData\Local\Temp\tmp8zcmbzym\lib0.o C:\Users\username\AppData\Local\Temp\tmp8zcmbzym\devc.o
clang: error: unable to execute command: program not executable
clang: error: linker (via gcc) command failed with exit code 1 (use -v to see invocation)
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Scripts\mlc_llm.exe_main.py", line 7, in
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm_main.py", line 45, in main
cli.main(sys.argv[2:])
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\cli\chat.py", line 36, in main
chat(
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\interface\chat.py", line 285, in chat
JSONFFIEngine(
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\json_ffi\engine.py", line 232, in init
model_args = _process_model_args(models, device, engine_config)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in _process_model_args
model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\serve\engine_base.py", line 171, in
model_args: List[Tuple[str, str]] = [_convert_model_info(model) for model in models]
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\serve\engine_base.py", line 164, in _convert_model_info
model_lib = jit.jit(
^^^^^^^^
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\interface\jit.py", line 164, in jit
_run_jit(
File "C:\ProgramData\miniconda3\envs\mlc-prebuilt\Lib\site-packages\mlc_llm\interface\jit.py", line 124, in _run_jit
raise RuntimeError("Cannot find compilation output, compilation failed")
RuntimeError: Cannot find compilation output, compilation failed
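Note that every TVM compilation stage succeeds; the failure is the final native link, where TVM shells out to clang to turn the generated object files into lib.dll and clang cannot execute its linker ("linker (via gcc)" suggests clang is falling back to a gcc-style link driver that is not installed). A minimal sanity check of the toolchain, run from the same conda prompt (plain Python, nothing MLC-specific; ld.lld and link are just linkers clang might dispatch to, listed here as assumptions):

```python
import shutil

# Report which toolchain binaries are actually reachable on PATH.
# clang is what TVM invokes; if gcc/ld.lld/link all resolve to None,
# clang has no working linker to delegate to.
for tool in ("clang", "gcc", "ld.lld", "link"):
    print(f"{tool}: {shutil.which(tool)}")
```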
Expected behavior
The chat session starts and Llama-3-8B-Instruct-q4f16_1-MLC responds to prompts.
Environment
Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Vulkan (auto-detected, see log)
Operating system (e.g. Ubuntu/Windows/MacOS/...): Windows
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source):
TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
Any other relevant information:
Additional context
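To decouple the failure from MLC-LLM, the failing step can be reproduced directly with TVM's create_shared helper, the same call that raised the RuntimeError in tvm/contrib/cc.py. The object-file paths below are hypothetical placeholders (the temp directories from the log are removed after the run), so this is a sketch, not a verified repro:

```python
# Sketch: re-run only the Windows link step that failed above.
# Assumes TVM is installed and two object files from a compile attempt
# are available; the paths here are placeholders, not the real temp dirs.
from tvm.contrib import cc

objects = [
    r"C:\path\to\lib0.o",
    r"C:\path\to\devc.o",
]

# On Windows this builds and runs roughly:
#   clang -O2 --target=x86_64 -shared -o lib.dll lib0.o devc.o
# If it fails the same way, the local clang/linker setup is at fault
# rather than the MLC-LLM compilation pipeline.
cc.create_shared(r"C:\path\to\lib.dll", objects)
```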