Update dependency transformers to v4.45.2 #130
Open
renovate wants to merge 1 commit into master from renovate/transformers-4.x
Conversation
renovate bot force-pushed the renovate/transformers-4.x branch from d3f070c to 24ba2af on July 1, 2024 15:55
renovate bot changed the title from "Update dependency transformers to v4.41.2" to "Update dependency transformers to v4.42.0" on Jul 1, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 24ba2af to 9ef59a3 on July 1, 2024 19:30
renovate bot changed the title from "Update dependency transformers to v4.42.0" to "Update dependency transformers to v4.42.1" on Jul 1, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 9ef59a3 to 9abb6f3 on July 2, 2024 07:16
renovate bot changed the title from "Update dependency transformers to v4.42.1" to "Update dependency transformers to v4.42.2" on Jul 2, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 9abb6f3 to d173aa7 on July 2, 2024 16:10
renovate bot changed the title from "Update dependency transformers to v4.42.2" to "Update dependency transformers to v4.42.3" on Jul 2, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from d173aa7 to 1a54c3d on July 15, 2024 18:50
renovate bot changed the title from "Update dependency transformers to v4.42.3" to "Update dependency transformers to v4.42.4" on Jul 15, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 1a54c3d to 0ded9ee on July 27, 2024 16:05
renovate bot changed the title from "Update dependency transformers to v4.42.4" to "Update dependency transformers to v4.43.1" on Jul 27, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 0ded9ee to 50a5053 on July 28, 2024 16:32
renovate bot changed the title from "Update dependency transformers to v4.43.1" to "Update dependency transformers to v4.43.2" on Jul 28, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 50a5053 to daab913 on July 30, 2024 16:30
renovate bot changed the title from "Update dependency transformers to v4.43.2" to "Update dependency transformers to v4.43.3" on Jul 30, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from daab913 to 0023214 on August 9, 2024 12:28
renovate bot changed the title from "Update dependency transformers to v4.43.3" to "Update dependency transformers to v4.43.4" on Aug 9, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 0023214 to ccebe27 on August 10, 2024 22:22
renovate bot changed the title from "Update dependency transformers to v4.43.4" to "Update dependency transformers to v4.44.0" on Aug 10, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from ccebe27 to ce803f6 on August 24, 2024 18:38
renovate bot changed the title from "Update dependency transformers to v4.44.0" to "Update dependency transformers to v4.44.1" on Aug 24, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from ce803f6 to 7ecf72b on August 26, 2024 18:54
renovate bot changed the title from "Update dependency transformers to v4.44.1" to "Update dependency transformers to v4.44.2" on Aug 26, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 7ecf72b to f7a3fb2 on September 29, 2024 19:28
renovate bot changed the title from "Update dependency transformers to v4.44.2" to "Update dependency transformers to v4.45.0" on Sep 29, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from f7a3fb2 to 07fa8a0 on September 30, 2024 18:27
renovate bot changed the title from "Update dependency transformers to v4.45.0" to "Update dependency transformers to v4.45.1" on Sep 30, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 07fa8a0 to f1816da on October 11, 2024 18:50
renovate bot changed the title from "Update dependency transformers to v4.45.1" to "Update dependency transformers to v4.45.2" on Oct 11, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from f1816da to 89065f4 on October 28, 2024 10:29
renovate bot changed the title from "Update dependency transformers to v4.45.2" to "Update dependency transformers to v4.46.0" on Oct 28, 2024
renovate bot force-pushed the renovate/transformers-4.x branch from 89065f4 to ba78e43 on October 29, 2024 16:37
renovate bot changed the title from "Update dependency transformers to v4.46.0" to "Update dependency transformers to v4.45.2" on Oct 29, 2024
This PR contains the following updates:
transformers: ==4.38.0 -> ==4.45.2
Release Notes
huggingface/transformers (transformers)
v4.45.2
Compare Source
Patch release v4.45.2
Mostly some warnings that were not properly removed ⚠️:
🔴 Had a small regression with dynamic Cache 🔴
Cache: revert DynamicCache init for BC #33861 by @gante
A small fix for idefics 🐩
And a fix for Siglip 🤧!
v4.45.1: Patch Release v4.45.1
Compare Source
Patches for v4.45.1
v4.45.0: Llama 3.2, mllama, Qwen2-Audio, Qwen2-VL, OLMoE, Llava Onevision, Pixtral, FalconMamba, Modular Transformers
Compare Source
New model additions
mllama
The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.
Qwen2-VL
The Qwen2-VL is a major update from the previous Qwen-VL by the Qwen team.
An extract from the Qwen2-VL blogpost available here is as follows:
Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model families. Compared with Qwen-VL, Qwen2-VL has the capabilities of:
Qwen2-Audio
The Qwen2-Audio is the new model series of large audio-language models from the Qwen team. Qwen2-Audio is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.
They introduce two distinct audio interaction modes:
OLMoE
OLMoE is a series of Open Language Models using sparse Mixture-of-Experts designed to enable the science of language models. The team releases all code, checkpoints, logs, and details involved in training these models.
Llava Onevision
LLaVA-Onevision is a Vision-Language Model that can generate text conditioned on one or several images/videos. The model consists of a SigLIP vision encoder and a Qwen2 language backbone. Images are processed with the anyres-9 technique, where the image is split into 9 patches to better process high-resolution images and capture as much detail as possible. Videos, however, are pooled to a total sequence length of 196 tokens per frame for more memory-efficient computation. LLaVA-Onevision is available in three sizes: 0.5B, 7B and 72B, and achieves remarkable performance on benchmark evaluations.
FalconMamba
The FalconMamba model was proposed by TII UAE (Technology Innovation Institute) in their release.
The model has been trained on approximately 6T tokens consisting of a mixture of many data sources such as RefinedWeb, Cosmopedia and Math data.
The team releases an accompanying blog post.
Granite Language Models
The Granite model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.
PowerLM-3B is a 3B state-of-the-art small language model trained with the Power learning rate scheduler. It is trained on a wide range of open-source and synthetic datasets with permissive licenses. PowerLM-3B has shown promising results compared to other models in the size categories across various benchmarks, including natural language multi-choices, code generation, and math reasoning.
Granite MOE
The GraniteMoe model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.
PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x active parameters across various benchmarks, including natural language multi-choices, code generation, and math reasoning.
Descript-Audio-Codec
The Descript Audio Codec (DAC) model is a powerful tool for compressing audio data, making it highly efficient for storage and transmission. By compressing 44.1 kHz audio into tokens at just 8 kbps bandwidth, the DAC model enables high-quality audio processing while significantly reducing the data footprint. This is particularly useful in scenarios where bandwidth is limited or storage space is at a premium, such as in streaming applications, remote conferencing, and archiving large audio datasets.
Pixtral
The Pixtral model was released by the Mistral AI team. Pixtral is a multimodal model, taking images and text as input, and producing text as output. This model follows the Llava family, meaning image embeddings are placed instead of the [IMG] token placeholders.
The model uses PixtralVisionModel for its vision encoder, and MistralForCausalLM for its language decoder. The main contribution is the 2D RoPE (rotary position embeddings) on the images, and support for arbitrary image sizes (the images are not padded together nor are they resized).
Mimi
The Mimi model was proposed in Moshi: a speech-text foundation model for real-time dialogue by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour. Mimi is a high-fidelity audio codec model developed by the Kyutai team, that combines semantic and acoustic information into audio tokens running at 12Hz and a bitrate of 1.1kbps. In other words, it can be used to map audio waveforms into “audio tokens”, known as “codebooks”.
Quantization
GGUF
GGUF support continues to be enhanced in the library by offering a way to load GGUF models within transformers by dequantizing them, before re-quantizing them for re-use within the GGUF/GGML ecosystem.
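As a rough illustration of that loading path, a minimal sketch; the repo id and filename below are placeholders, not part of the release notes:

```python
# Minimal sketch: loading a GGUF checkpoint through transformers.
# The repo id and GGUF filename are illustrative placeholders for any Hub repo
# that ships a GGUF file; the weights are dequantized into a regular torch model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # placeholder repo id
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # placeholder GGUF filename

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```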
Torch AO
An ongoing effort is to add the ability to use torchao as a quantization backend. Future PRs will enable saving and fine-tuning with peft.
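A sketch of what using the torchao backend looks like, assuming the TorchAoConfig quantization config and an installed torchao package; the model id is a placeholder:

```python
# Sketch: quantizing a model on load with the torchao backend
# (requires the `torchao` package and a GPU; model id is a placeholder).
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
```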
Liger Kernel
The Liger kernel is now supported in the Trainer class.
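A minimal sketch of enabling it, assuming the use_liger_kernel training argument and an installed liger-kernel package; the other arguments and the model/dataset wiring are placeholders:

```python
# Sketch: turning on the Liger kernel through TrainingArguments.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    use_liger_kernel=True,  # patch supported models with Liger's fused kernels
)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```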
Modular Transformers
This PR introduces modularity for transformers, which has always been prohibited when working with transformers (see the blog post for the accompanying design philosophy).
The core idea behind this PR is to facilitate model addition by enabling Pythonic inheritance while keeping true to our single-file policy, in which models/processors must be contained within a single file, enabling working around the object without going through 10 layers of abstractions.
It is heavily recommended to read the PR description in order to understand the depth of the change: https://github.com/huggingface/transformers/pull/33248
transformers: modularity and inheritance for new model additions by @ArthurZucker in #33248
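To make the inheritance idea concrete, a hypothetical sketch of a modular definition file; every name here is illustrative, not from the release notes:

```python
# Hypothetical modular_my_model.py: the new model is written as plain Python
# inheritance from an existing model, and the library's converter expands it
# back into a self-contained modeling file.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaModel


class MyModelConfig(LlamaConfig):
    model_type = "my_model"


class MyModel(LlamaModel):
    # Inherits the full Llama implementation; only the differences would be overridden.
    config_class = MyModelConfig


class MyModelForCausalLM(LlamaForCausalLM):
    config_class = MyModelConfig
```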
Agents
Agents continue being improved at each release; this time making it much simpler to leverage a local engine through a local Transformers Engine.
Dynamic cache for decoder-only models
This PR adds support for dynamic cache to all decoder-only models (except for XLNet).
The documentation for the dynamic cache can be found here, and documentation related to the KV cache in transformers in general can be found here.
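For illustration, a minimal sketch of passing an explicit DynamicCache to generate(); the model id is a placeholder:

```python
# Sketch: generating with an explicit DynamicCache on a decoder-only model.
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "gpt2"  # placeholder; any causal LM supporting the dynamic cache works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The dynamic cache", return_tensors="pt")
past_key_values = DynamicCache()  # grows with the sequence instead of being pre-allocated
out = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```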
Chat templates updates
We've made several updates to our handling of chat models and chat templates. The most noticeable change is that assistant prefill is now supported. This means you can end a chat with an assistant message, and the model will continue that message instead of starting a new one, allowing you to guide the model's response.
We've also enabled several new functionalities in Jinja that will allow more powerful templates in the future, including Loop Controls and a strftime_now function that can get the current date and time, which is commonly used in system messages. For more details, see the updated chat template docs.
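A sketch of assistant prefill, assuming the continue_final_message flag of apply_chat_template drives this behaviour; the model id is a placeholder:

```python
# Sketch: the chat ends with a partial assistant message and the model continues it
# rather than starting a new turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # placeholder chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Summarize this release in one sentence."},
    {"role": "assistant", "content": "In short, this release"},  # prefill to be continued
]
input_ids = tokenizer.apply_chat_template(
    messages, continue_final_message=True, return_tensors="pt"
)
out = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```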
Bugfixes and improvements
mask_generation.md to Korean by @jeongiin in #32257
idefics.md to Korean by @boyunJang in #32258
image_to_image.md to Korean by @shinhyunji36 in #32327
gptq.md to Korean by @1kmmk1 in #32293
prompting.md to Korean by @chhaewxn in #32294
quantization/quanto.md to Korean by @fabxoe in #32281
image_feature_extraction.md to Korean by @mreraser in #32239
chat_templating.md to Korean by @enchantee00 in #32362
_supports_sdpa to True by @pocca2048 in #32457
ko-llm_tutorial_optimization.md to Korean by @010kim in #32372
trainer.md to Korean by @cjfghk5697 in #32260
eetq.md to Korean by @jun048098 in #32352
fsdp.md to Korean by @win2dvp21 in #32261
bitsandbytes.md to Korean by @SeungAhSon in #32408
inputs_embeds as input by @molbap in #32493
test_static_cache_exportability with torch 2.4.0 by @guangy10 in #32516
agent.md to Korean by @Jwaminju in #32351
encodec model names by @Sai-Suraj-27 in #32581
.push_to_hub(..., create_pr=True, revision="my-branch") when creating PR on not-owned repo by @Wauplin in #32094
deepspeed.md to Korean by @4N3MONE in #32431
awq.md to Korean by @ahnjj in #32324
test_find_base_model_checkpoint by @Sai-Suraj-27 in #32638
is_torch_mps_available() function to include min_version argument by @Sai-Suraj-27 in #32545
transformers tag to the modelcard by @LysandreJik in #32623
WhisperGenerationMixin by @faaany in #32316
test_tokenization_utils.py by @Sai-Suraj-27 in #32601
tests/utils/test_add_new_model_like.py by @Sai-Suraj-27 in #32678
JetMoeIntegrationTest by @ydshieh in #32332
doctest_glob by @Sai-Suraj-27 in #32475
falcon-mamba-7b model checkpoint name by @Sai-Suraj-27 in #32837
LogitsWarper and LogitsProcessor by @gante in #32626
batch_size instead of max_batch_size by @gante in #32657
to in DoLa body, causing exceptions in multi-gpu generation by @gante in #32856
test_sdpa_can_compile_dynamic device-agnostic by @faaany in #32519
whisper-large-v2 model link in docs by @Sai-Suraj-27 in #32871
norm_before_gate usage by @vasqu in #32686
tensor.norm() with decomposed version for CLIP executorch export by @qubvel in #32887
return_timestamps when return_timestamps is not passed to generate function by @hrl in #31296
huggingface_hub installation to workflows by @Sai-Suraj-27 in #32891
exceptions.ConnectionError by @younesbelkada in #31469
AttributeError raised when using Trainer with eval_on_start=True in Jupyter Notebook by @fshp971 in #32849
Processor.save_pretrained caused by #31691 by @leloykun in #32921
use_cache=False by @gante in #32863
PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 by @gante in #32659
atol in test_forward_with_num_logits_to_keep by @gante in #33093
isin_mps_friendly, a wrapper function for torch.isin by @gante in #33099
pydantic required version in dockerfiles to make it compatible with DeepSpeed by @Sai-Suraj-27 in #33105
efficientnet pipeline timeout and prevent future similar issues due to large image size by @gante in #33123
conversations.md to Korean by @newfull5 in #32468
llm_optims.md to Korean by @yijun-lee in #32325
return_dict_in_generate is False but should be True by @gante in #33146
bitsandbytes in docstrings by @rapsealk in #33230
torch.from_numpy() to create tensors for np.ndarrays by @shinyano in #33201
num_logits_to_keep in composite models by @zucchini-nlp in #33168
FalconMamba training issues due to incompatible kernels
Configuration
📅 Schedule: Branch creation - "* 0-4 * * 3" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.