
Update dependency transformers to v4.45.2 #130

Open
wants to merge 1 commit into base: master from renovate/transformers-4.x

Conversation

renovate[bot]
Contributor

@renovate renovate bot commented Jun 5, 2024

This PR contains the following updates:

Package: transformers
Change: ==4.38.0 -> ==4.45.2

Release Notes

huggingface/transformers (transformers)

v4.45.2

Compare Source

Patch release v4.45.2

Mostly fixes for warnings that were not properly removed ⚠️:

🔴 There was a small regression with the dynamic Cache 🔴
  • Cache: revert DynamicCache init for BC #33861 by @gante

A small fix for Idefics 🐩:

And a fix for SigLIP 🤧!

v4.45.1: Patch Release v4.45.1

Compare Source

Patches for v4.45.1

v4.45.0: Llama 3.2, mllama, Qwen2-Audio, Qwen2-VL, OLMoE, Llava Onevision, Pixtral, FalconMamba, Modular Transformers

Compare Source

New model additions

mllama

The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.


Qwen2-VL

Qwen2-VL is a major update to the previous Qwen-VL by the Qwen team.

An extract from the Qwen2-VL blog post, available here, is as follows:

Qwen2-VL is the latest version of the vision-language models based on Qwen2 in the Qwen model family. Compared with Qwen-VL, Qwen2-VL has the following capabilities:

  • SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
  • Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
  • Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
  • Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.


Qwen2-Audio

Qwen2-Audio is a new series of large audio-language models from the Qwen team. Qwen2-Audio can accept various audio signal inputs and perform audio analysis or respond directly in text to speech instructions.

They introduce two distinct audio interaction modes:

  • voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input
  • audio analysis: users can provide audio and text instructions for analysis during the interaction


OLMoE

OLMoE is a series of Open Language Models using sparse Mixture-of-Experts designed to enable the science of language models. The team releases all code, checkpoints, logs, and details involved in training these models.


Llava Onevision

LLaVA-Onevision is a vision-language model that can generate text conditioned on one or several images/videos. The model consists of a SigLIP vision encoder and a Qwen2 language backbone. Images are processed with the anyres-9 technique, where the image is split into 9 patches to better handle high-resolution images and capture as much detail as possible. Videos, by contrast, are pooled to a sequence length of 196 tokens per frame for more memory-efficient computation. LLaVA-Onevision is available in three sizes (0.5B, 7B and 72B) and achieves remarkable performance on benchmark evaluations.
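As a rough illustration of the anyres-9 idea described above (a sketch of the tiling concept, not the library's actual preprocessing code), splitting an image into a 3x3 grid of equally sized patches can be computed as:

```python
def split_anyres9(width, height):
    """Sketch of the anyres-9 idea: tile an image into a 3x3 grid of patches.

    Returns (x, y, w, h) boxes; edge remainders are ignored for simplicity.
    """
    pw, ph = width // 3, height // 3
    return [(col * pw, row * ph, pw, ph) for row in range(3) for col in range(3)]

patches = split_anyres9(900, 600)
print(len(patches))   # 9 patches
print(patches[0])     # (0, 0, 300, 200)
```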


FalconMamba

The FalconMamba model was proposed by TII UAE (Technology Innovation Institute) in their release.

The model has been trained on approximately 6T tokens consisting of a mixture of many data sources such as RefinedWeb, Cosmopedia and math data.

The team releases an accompanying blog post.


Granite Language Models

The Granite model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

PowerLM-3B is a 3B state-of-the-art small language model trained with the Power learning rate scheduler. It is trained on a wide range of open-source and synthetic datasets with permissive licenses. PowerLM-3B has shown promising results compared to other models in the size categories across various benchmarks, including natural language multi-choices, code generation, and math reasoning.


Granite MOE

The GraniteMoe model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x active parameters across various benchmarks, including natural language multi-choices, code generation, and math reasoning.

Descript-Audio-Codec

The Descript Audio Codec (DAC) model is a powerful tool for compressing audio data, making it highly efficient for storage and transmission. By compressing 44.1 kHz audio into tokens at just 8 kbps bandwidth, the DAC model enables high-quality audio processing while significantly reducing the data footprint. This is particularly useful in scenarios where bandwidth is limited or storage space is at a premium, such as in streaming applications, remote conferencing, and archiving large audio datasets.
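To put those numbers in perspective (assuming 16-bit mono PCM as the uncompressed baseline, which is an assumption, not stated above), the compression factor works out to roughly 88x:

```python
# Raw 16-bit mono PCM at 44.1 kHz versus DAC's 8 kbps token stream.
raw_kbps = 44_100 * 16 / 1000     # 705.6 kbps of raw audio
dac_kbps = 8                      # DAC bandwidth quoted above
compression_ratio = raw_kbps / dac_kbps
print(f"{compression_ratio:.1f}x")  # ~88.2x smaller
```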


Pixtral

The Pixtral model was released by the Mistral AI team. Pixtral is a multimodal model, taking images and text as input, and producing text as output. This model follows the Llava family, meaning image embeddings are placed where the [IMG] token placeholders appear.

The model uses PixtralVisionModel for its vision encoder and MistralForCausalLM for its language decoder. The main contributions are 2D RoPE (rotary position embeddings) on the images and support for arbitrary image sizes (the images are neither padded together nor resized).
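The 2D RoPE idea can be sketched in plain Python (a toy illustration under assumed conventions, not Pixtral's actual code): half of each vector's dimensions are rotated by the patch's row index and the other half by its column index, so position only changes the phase of each pair of dimensions, never the vector's norm.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Standard 1D RoPE: rotate consecutive pairs of dims by pos-dependent angles."""
    d = len(vec)
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def rope_2d(vec, row, col):
    """2D RoPE sketch: first half of dims encodes the row, second half the column."""
    half = len(vec) // 2
    return rope_rotate(vec[:half], row) + rope_rotate(vec[half:], col)

v = [1.0, 0.0, 0.0, 1.0]
rotated = rope_2d(v, row=3, col=5)
# Rotations preserve the vector norm regardless of (row, col).
```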

Mimi

The Mimi model was proposed in Moshi: a speech-text foundation model for real-time dialogue by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour. Mimi is a high-fidelity audio codec model developed by the Kyutai team that combines semantic and acoustic information into audio tokens running at 12 Hz and a bitrate of 1.1 kbps. In other words, it can be used to map audio waveforms into “audio tokens”, known as “codebooks”.


Quantization

GGUF

GGUF support continues to be enhanced in the library: GGUF models can be loaded within transformers by dequantizing them, and then re-quantized for reuse within the GGUF/GGML ecosystem.

Torch AO

An ongoing effort is to add the ability to use torchao as a quantization backend. Future PRs will enable saving and fine-tuning with peft.

Liger Kernel

The Liger kernel is now supported in the Trainer class.

Modular Transformers

This PR introduces modularity for transformers, something that has until now been prohibited when working with transformers (see the blog post for the accompanying design philosophy).

The core idea behind this PR is to facilitate model additions by enabling Pythonic inheritance while staying true to our single-file policy, in which models/processors must be contained within a single file, so that one can work on the object without going through 10 layers of abstraction.

It is strongly recommended to read the PR description in order to understand the depth of the change: https://github.com/huggingface/transformers/pull/33248
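As a rough sketch of the idea (class names hypothetical, not the actual transformers code), a modular model file states only what differs from an existing model via inheritance, and tooling later expands it into the flat single-file model that ships in the library:

```python
# Existing model code (what would already live in the library).
class LlamaMLP:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def forward(self, x):
        return [v * 2 for v in x]  # stand-in for the real computation

# In a hypothetical modular_newmodel.py, only the differences are written;
# a conversion step would unroll the inheritance back into a single file.
class NewModelMLP(LlamaMLP):
    def forward(self, x):
        return [v * 3 for v in x]  # the one overridden behavior
```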


Agents

Agents continue to improve with each release; this time it is much simpler to leverage a local engine through a local Transformers Engine.

Dynamic cache for decoder-only models

This PR adds dynamic cache support to all decoder-only models (except XLNet).

The documentation for the Dynamic cache can be found here, and documentation related to the KV cache in transformers in general can be found here.
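Conceptually, a dynamic cache simply grows its stored key/value states one generation step at a time, with nothing pre-allocated. The following is a toy illustration of that behavior, not the transformers `DynamicCache` API:

```python
class ToyDynamicCache:
    """Toy KV cache that grows with each decoding step."""

    def __init__(self):
        self.keys, self.values = [], []

    def update(self, key_states, value_states):
        # Append this step's states; the cache length tracks tokens generated.
        self.keys.append(key_states)
        self.values.append(value_states)
        return self.keys, self.values

    def __len__(self):
        return len(self.keys)

cache = ToyDynamicCache()
for step in range(4):                 # pretend we decode 4 tokens
    cache.update([step], [step * 10])
print(len(cache))                     # 4
```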

Chat templates updates

We've made several updates to our handling of chat models and chat templates. The most noticeable change is that assistant prefill is now supported. This means you can end a chat with an assistant message, and the model will continue that message instead of starting a new one, allowing you to guide the model's response:

from transformers import pipeline

pipe = pipeline("text-generation", model_checkpoint)  # model_checkpoint: path or Hub ID of your chat model

chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'}
]

output = pipe(chat)   # The model will continue outputting JSON!

We've also enabled several new functionalities in Jinja that will allow more powerful templates in the future, including Loop Controls and a strftime_now function that can get the current date and time, which is commonly used in system messages. For more details, see the updated chat template docs.
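For illustration, a strftime_now-style helper can be emulated in plain Python (a sketch of the behavior described above, not the Jinja binding transformers registers):

```python
from datetime import datetime

def strftime_now(fmt):
    """Return the current date/time formatted with a strftime pattern,
    mirroring the template function described above."""
    return datetime.now().strftime(fmt)

# e.g. a system message template could embed strftime_now("%d %B %Y")
print(strftime_now("%d %B %Y"))
```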

Bugfixes and improvements


Configuration

📅 Schedule: Branch creation - "* 0-4 * * 3" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/transformers-4.x branch from d3f070c to 24ba2af Compare July 1, 2024 15:55
@renovate renovate bot changed the title Update dependency transformers to v4.41.2 Update dependency transformers to v4.42.0 Jul 1, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 24ba2af to 9ef59a3 Compare July 1, 2024 19:30
@renovate renovate bot changed the title Update dependency transformers to v4.42.0 Update dependency transformers to v4.42.1 Jul 1, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 9ef59a3 to 9abb6f3 Compare July 2, 2024 07:16
@renovate renovate bot changed the title Update dependency transformers to v4.42.1 Update dependency transformers to v4.42.2 Jul 2, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 9abb6f3 to d173aa7 Compare July 2, 2024 16:10
@renovate renovate bot changed the title Update dependency transformers to v4.42.2 Update dependency transformers to v4.42.3 Jul 2, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from d173aa7 to 1a54c3d Compare July 15, 2024 18:50
@renovate renovate bot changed the title Update dependency transformers to v4.42.3 Update dependency transformers to v4.42.4 Jul 15, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 1a54c3d to 0ded9ee Compare July 27, 2024 16:05
@renovate renovate bot changed the title Update dependency transformers to v4.42.4 Update dependency transformers to v4.43.1 Jul 27, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 0ded9ee to 50a5053 Compare July 28, 2024 16:32
@renovate renovate bot changed the title Update dependency transformers to v4.43.1 Update dependency transformers to v4.43.2 Jul 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 50a5053 to daab913 Compare July 30, 2024 16:30
@renovate renovate bot changed the title Update dependency transformers to v4.43.2 Update dependency transformers to v4.43.3 Jul 30, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from daab913 to 0023214 Compare August 9, 2024 12:28
@renovate renovate bot changed the title Update dependency transformers to v4.43.3 Update dependency transformers to v4.43.4 Aug 9, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 0023214 to ccebe27 Compare August 10, 2024 22:22
@renovate renovate bot changed the title Update dependency transformers to v4.43.4 Update dependency transformers to v4.44.0 Aug 10, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from ccebe27 to ce803f6 Compare August 24, 2024 18:38
@renovate renovate bot changed the title Update dependency transformers to v4.44.0 Update dependency transformers to v4.44.1 Aug 24, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from ce803f6 to 7ecf72b Compare August 26, 2024 18:54
@renovate renovate bot changed the title Update dependency transformers to v4.44.1 Update dependency transformers to v4.44.2 Aug 26, 2024
@renovate renovate bot changed the title Update dependency transformers to v4.44.2 Update dependency transformers to v4.45.0 Sep 29, 2024
@renovate renovate bot changed the title Update dependency transformers to v4.45.0 Update dependency transformers to v4.45.1 Sep 30, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 07fa8a0 to f1816da Compare October 11, 2024 18:50
@renovate renovate bot changed the title Update dependency transformers to v4.45.1 Update dependency transformers to v4.45.2 Oct 11, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from f1816da to 89065f4 Compare October 28, 2024 10:29
@renovate renovate bot changed the title Update dependency transformers to v4.45.2 Update dependency transformers to v4.46.0 Oct 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 89065f4 to ba78e43 Compare October 29, 2024 16:37
@renovate renovate bot changed the title Update dependency transformers to v4.46.0 Update dependency transformers to v4.45.2 Oct 29, 2024