
Update dependency transformers to v4.45.2 #130

Open
wants to merge 1 commit into base: master from renovate/transformers-4.x

Conversation

renovate[bot]
Contributor

@renovate renovate bot commented Jun 5, 2024

This PR contains the following updates:

Package: transformers
Change: ==4.38.0 -> ==4.45.2

Release Notes

huggingface/transformers (transformers)

v4.45.2

Compare Source

Patch release v4.45.2

Mostly fixes for warnings that were not properly removed ⚠️:

🔴 There was a small regression with the dynamic Cache 🔴
  • Cache: revert DynamicCache init for BC #33861 by @gante

A small fix for Idefics 🐩:

And a fix for SigLIP 🤧!

v4.45.1: Patch Release v4.45.1

Compare Source

Patches for v4.45.1

v4.45.0: Llama 3.2, mllama, Qwen2-Audio, Qwen2-VL, OLMoE, Llava Onevision, Pixtral, FalconMamba, Modular Transformers

Compare Source

New model additions

mllama

The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.


Qwen2-VL

Qwen2-VL is a major update to the previous Qwen-VL by the Qwen team.

An extract from the Qwen2-VL blog post, available here, is as follows:

Qwen2-VL is the latest version of the vision-language models based on Qwen2 in the Qwen model family. Compared with Qwen-VL, Qwen2-VL has the following capabilities:

  • SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
  • Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
  • Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
  • Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.


Qwen2-Audio

Qwen2-Audio is a new series of large audio-language models from the Qwen team. Qwen2-Audio can accept various audio signal inputs and perform audio analysis or respond directly in text to speech instructions.

They introduce two distinct audio interaction modes:

  • voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input
  • audio analysis: users can provide audio and text instructions for analysis during the interaction


OLMoE

OLMoE is a series of Open Language Models using sparse Mixture-of-Experts designed to enable the science of language models. The team releases all code, checkpoints, logs, and details involved in training these models.


Llava Onevision

LLaVA-Onevision is a vision-language model that can generate text conditioned on one or several images/videos. The model consists of a SigLIP vision encoder and a Qwen2 language backbone. Images are processed with the anyres-9 technique, where the image is split into 9 patches to better handle high-resolution images and capture as much detail as possible. Videos, by contrast, are pooled to a sequence length of 196 tokens per frame for more memory-efficient computation. LLaVA-Onevision is available in three sizes (0.5B, 7B and 72B) and achieves remarkable performance on benchmark evaluations.
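As a rough illustration of the anyres-9 idea described above (a sketch of the tiling concept, not the library's actual preprocessing code), splitting an image into a 3x3 grid of equally sized patches can be computed as:

```python
def split_anyres9(width, height):
    """Sketch of the anyres-9 idea: tile an image into a 3x3 grid of patches.

    Returns (x, y, w, h) boxes; edge remainders are ignored for simplicity.
    """
    pw, ph = width // 3, height // 3
    return [(col * pw, row * ph, pw, ph) for row in range(3) for col in range(3)]

patches = split_anyres9(900, 600)
print(len(patches))   # 9 patches
print(patches[0])     # (0, 0, 300, 200)
```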


FalconMamba

The FalconMamba model was proposed by TII UAE (Technology Innovation Institute) in their release.

The model has been trained on approximately 6T tokens consisting of a mixture of many data sources such as RefinedWeb, Cosmopedia and math data.

The team releases an accompanying blog post.


Granite Language Models

The Granite model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

PowerLM-3B is a 3B state-of-the-art small language model trained with the Power learning rate scheduler. It is trained on a wide range of open-source and synthetic datasets with permissive licenses. PowerLM-3B has shown promising results compared to other models in the size categories across various benchmarks, including natural language multi-choices, code generation, and math reasoning.


Granite MOE

The GraniteMoe model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x active parameters across various benchmarks, including natural language multi-choices, code generation, and math reasoning.

Descript-Audio-Codec

The Descript Audio Codec (DAC) model is a powerful tool for compressing audio data, making it highly efficient for storage and transmission. By compressing 44.1 kHz audio into tokens at just 8 kbps bandwidth, the DAC model enables high-quality audio processing while significantly reducing the data footprint. This is particularly useful in scenarios where bandwidth is limited or storage space is at a premium, such as in streaming applications, remote conferencing, and archiving large audio datasets.
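To put those numbers in perspective (assuming 16-bit mono PCM as the uncompressed baseline, which is an assumption, not stated above), the compression factor works out to roughly 88x:

```python
# Raw 16-bit mono PCM at 44.1 kHz versus DAC's 8 kbps token stream.
raw_kbps = 44_100 * 16 / 1000     # 705.6 kbps of raw audio
dac_kbps = 8                      # DAC bandwidth quoted above
compression_ratio = raw_kbps / dac_kbps
print(f"{compression_ratio:.1f}x")  # ~88.2x smaller
```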


Pixtral

The Pixtral model was released by the Mistral AI team. Pixtral is a multimodal model, taking images and text as input, and producing text as output. This model follows the Llava family, meaning image embeddings are placed where the [IMG] token placeholders appear.

The model uses PixtralVisionModel for its vision encoder and MistralForCausalLM for its language decoder. The main contributions are 2D RoPE (rotary position embeddings) on the images and support for arbitrary image sizes (the images are neither padded together nor resized).
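The 2D RoPE idea can be sketched in plain Python (a toy illustration under assumed conventions, not Pixtral's actual code): half of each vector's dimensions are rotated by the patch's row index and the other half by its column index, so position only changes the phase of each pair of dimensions, never the vector's norm.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Standard 1D RoPE: rotate consecutive pairs of dims by pos-dependent angles."""
    d = len(vec)
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def rope_2d(vec, row, col):
    """2D RoPE sketch: first half of dims encodes the row, second half the column."""
    half = len(vec) // 2
    return rope_rotate(vec[:half], row) + rope_rotate(vec[half:], col)

v = [1.0, 0.0, 0.0, 1.0]
rotated = rope_2d(v, row=3, col=5)
# Rotations preserve the vector norm regardless of (row, col).
```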

Mimi

The Mimi model was proposed in Moshi: a speech-text foundation model for real-time dialogue by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour. Mimi is a high-fidelity audio codec model developed by the Kyutai team that combines semantic and acoustic information into audio tokens running at 12 Hz and a bitrate of 1.1 kbps. In other words, it can be used to map audio waveforms into “audio tokens”, known as “codebooks”.


Quantization

GGUF

GGUF support continues to be enhanced in the library: GGUF models can be loaded within transformers by dequantizing them, and then re-quantized for reuse within the GGUF/GGML ecosystem.

Torch AO

An ongoing effort is to add the ability to use torchao as a quantization backend. Future PRs will enable saving and fine-tuning with peft.

Liger Kernel

The Liger kernel is now supported in the Trainer class.

Modular Transformers

This PR introduces modularity for transformers, something that has until now been prohibited when working with transformers (see the blog post for the accompanying design philosophy).

The core idea behind this PR is to facilitate model additions by enabling Pythonic inheritance while staying true to our single-file policy, in which models/processors must be contained within a single file, so that one can work on the object without going through 10 layers of abstraction.

It is strongly recommended to read the PR description in order to understand the depth of the change: https://github.com/huggingface/transformers/pull/33248
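As a rough sketch of the idea (class names hypothetical, not the actual transformers code), a modular model file states only what differs from an existing model via inheritance, and tooling later expands it into the flat single-file model that ships in the library:

```python
# Existing model code (what would already live in the library).
class LlamaMLP:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def forward(self, x):
        return [v * 2 for v in x]  # stand-in for the real computation

# In a hypothetical modular_newmodel.py, only the differences are written;
# a conversion step would unroll the inheritance back into a single file.
class NewModelMLP(LlamaMLP):
    def forward(self, x):
        return [v * 3 for v in x]  # the one overridden behavior
```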


Agents

Agents continue to improve with each release; this time it is much simpler to leverage a local engine through a local Transformers Engine.

Dynamic cache for decoder-only models

This PR adds dynamic cache support to all decoder-only models (except XLNet).

The documentation for the Dynamic cache can be found here, and documentation related to the KV cache in transformers in general can be found here.
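Conceptually, a dynamic cache simply grows its stored key/value states one generation step at a time, with nothing pre-allocated. The following is a toy illustration of that behavior, not the transformers `DynamicCache` API:

```python
class ToyDynamicCache:
    """Toy KV cache that grows with each decoding step."""

    def __init__(self):
        self.keys, self.values = [], []

    def update(self, key_states, value_states):
        # Append this step's states; the cache length tracks tokens generated.
        self.keys.append(key_states)
        self.values.append(value_states)
        return self.keys, self.values

    def __len__(self):
        return len(self.keys)

cache = ToyDynamicCache()
for step in range(4):                 # pretend we decode 4 tokens
    cache.update([step], [step * 10])
print(len(cache))                     # 4
```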

Chat templates updates

We've made several updates to our handling of chat models and chat templates. The most noticeable change is that assistant prefill is now supported. This means you can end a chat with an assistant message, and the model will continue that message instead of starting a new one, allowing you to guide the model's response:

from transformers import pipeline

pipe = pipeline("text-generation", model_checkpoint)  # model_checkpoint: path or Hub ID of your chat model

chat = [
    {"role": "user", "content": "Can you format the answer in JSON?"},
    {"role": "assistant", "content": '{"name": "'}
]

output = pipe(chat)   # The model will continue outputting JSON!

We've also enabled several new functionalities in Jinja that will allow more powerful templates in the future, including Loop Controls and a strftime_now function that can get the current date and time, which is commonly used in system messages. For more details, see the updated chat template docs.
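For illustration, a strftime_now-style helper can be emulated in plain Python (a sketch of the behavior described above, not the Jinja binding transformers registers):

```python
from datetime import datetime

def strftime_now(fmt):
    """Return the current date/time formatted with a strftime pattern,
    mirroring the template function described above."""
    return datetime.now().strftime(fmt)

# e.g. a system message template could embed strftime_now("%d %B %Y")
print(strftime_now("%d %B %Y"))
```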

Bugfixes and improvements


Configuration

📅 Schedule: Branch creation - "* 0-4 * * 3" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot force-pushed the renovate/transformers-4.x branch from d3f070c to 24ba2af Compare July 1, 2024 15:55
@renovate renovate bot changed the title Update dependency transformers to v4.41.2 Update dependency transformers to v4.42.0 Jul 1, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 24ba2af to 9ef59a3 Compare July 1, 2024 19:30
@renovate renovate bot changed the title Update dependency transformers to v4.42.0 Update dependency transformers to v4.42.1 Jul 1, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 9ef59a3 to 9abb6f3 Compare July 2, 2024 07:16
@renovate renovate bot changed the title Update dependency transformers to v4.42.1 Update dependency transformers to v4.42.2 Jul 2, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 9abb6f3 to d173aa7 Compare July 2, 2024 16:10
@renovate renovate bot changed the title Update dependency transformers to v4.42.2 Update dependency transformers to v4.42.3 Jul 2, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from d173aa7 to 1a54c3d Compare July 15, 2024 18:50
@renovate renovate bot changed the title Update dependency transformers to v4.42.3 Update dependency transformers to v4.42.4 Jul 15, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 1a54c3d to 0ded9ee Compare July 27, 2024 16:05
@renovate renovate bot changed the title Update dependency transformers to v4.42.4 Update dependency transformers to v4.43.1 Jul 27, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 0ded9ee to 50a5053 Compare July 28, 2024 16:32
@renovate renovate bot changed the title Update dependency transformers to v4.43.1 Update dependency transformers to v4.43.2 Jul 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 50a5053 to daab913 Compare July 30, 2024 16:30
@renovate renovate bot changed the title Update dependency transformers to v4.43.2 Update dependency transformers to v4.43.3 Jul 30, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from daab913 to 0023214 Compare August 9, 2024 12:28
@renovate renovate bot changed the title Update dependency transformers to v4.43.3 Update dependency transformers to v4.43.4 Aug 9, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 0023214 to ccebe27 Compare August 10, 2024 22:22
@renovate renovate bot changed the title Update dependency transformers to v4.43.4 Update dependency transformers to v4.44.0 Aug 10, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from ccebe27 to ce803f6 Compare August 24, 2024 18:38
@renovate renovate bot changed the title Update dependency transformers to v4.44.0 Update dependency transformers to v4.44.1 Aug 24, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from ce803f6 to 7ecf72b Compare August 26, 2024 18:54
@renovate renovate bot changed the title Update dependency transformers to v4.44.1 Update dependency transformers to v4.44.2 Aug 26, 2024
@renovate renovate bot changed the title Update dependency transformers to v4.44.2 Update dependency transformers to v4.45.0 Sep 29, 2024
@renovate renovate bot changed the title Update dependency transformers to v4.45.0 Update dependency transformers to v4.45.1 Sep 30, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 07fa8a0 to f1816da Compare October 11, 2024 18:50
@renovate renovate bot changed the title Update dependency transformers to v4.45.1 Update dependency transformers to v4.45.2 Oct 11, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from f1816da to 89065f4 Compare October 28, 2024 10:29
@renovate renovate bot changed the title Update dependency transformers to v4.45.2 Update dependency transformers to v4.46.0 Oct 28, 2024
@renovate renovate bot force-pushed the renovate/transformers-4.x branch from 89065f4 to ba78e43 Compare October 29, 2024 16:37
@renovate renovate bot changed the title Update dependency transformers to v4.46.0 Update dependency transformers to v4.45.2 Oct 29, 2024