Commit
Fix runtime errors reported when using long input sequence lengths with LoRA (#343)

This PR has the following fixes:

- Increase the size of the indices tensors used to maintain multi-LoRA state information from max_num_batched_tokens to 3*max_num_batched_tokens. The increase provides a buffer for the padding done in the batch and sequence dimensions.
- Move the logic that removes padding from lora_logits out of execute_model() and back into the LogitsProcessorWithLoRA class. This fixes a race condition caused by updating multi-LoRA state information directly.

FIX #237
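The sizing concern behind the first fix can be illustrated with a small, self-contained sketch. It is not the code from this PR: the helper names (`round_up`, `padded_token_count`, `allocate_indices`, `strip_padding`) and the bucket sizes are hypothetical, and plain Python lists stand in for tensors. It only shows how padding in both the batch and sequence dimensions can push the padded token count past max_num_batched_tokens, and how padded rows might be sliced off without modifying shared state.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple (hypothetical bucketing rule)."""
    return ((value + multiple - 1) // multiple) * multiple


def padded_token_count(seq_lens, batch_bucket, seq_bucket):
    """Token slots needed once the batch and sequence dims are both padded."""
    padded_batch = round_up(len(seq_lens), batch_bucket)
    padded_seq = round_up(max(seq_lens), seq_bucket)
    return padded_batch * padded_seq


def allocate_indices(max_num_batched_tokens, factor=3):
    """Index buffer sized with headroom for padding (factor of 3, as in the PR text)."""
    return [0] * (factor * max_num_batched_tokens)


def strip_padding(rows, num_real_rows):
    """Return only the real rows; the input list is left unmodified."""
    return rows[:num_real_rows]


# Assumed values, for illustration only.
max_num_batched_tokens = 2048
seq_lens = [1900, 100]  # 2000 real tokens, within the limit

needed = padded_token_count(seq_lens, batch_bucket=2, seq_bucket=128)
indices = allocate_indices(max_num_batched_tokens)

print(sum(seq_lens) <= max_num_batched_tokens)  # real tokens fit the limit
print(needed > max_num_batched_tokens)          # padded tokens overflow a 1x buffer
print(needed <= len(indices))                   # but fit in the 3x buffer
```

With these assumed bucket sizes, a batch that respects the token limit before padding no longer fits a buffer of max_num_batched_tokens entries after padding, which is the kind of overflow a larger buffer guards against. Slicing in `strip_padding` returns a new list rather than editing the input, mirroring the idea of removing padding locally instead of updating shared state.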