
Commit

Merge remote-tracking branch 'upstream/main' into dev/prompt_tuning
lenglaender committed Nov 19, 2023
2 parents 159b8bb + d45d951 commit a2fbf30
Showing 38 changed files with 426 additions and 222 deletions.
78 changes: 0 additions & 78 deletions CONTRIBUTING.md

This file was deleted.

4 changes: 4 additions & 0 deletions README.md
@@ -153,6 +153,10 @@ Currently, adapters integrates all architectures and methods listed below:

We currently support the PyTorch versions of all models listed on the **[Model Overview](https://docs.adapterhub.ml/model_overview.html) page** in our documentation.

## Developing & Contributing

To get started with developing on _Adapters_ yourself and learn more about ways to contribute, please see https://docs.adapterhub.ml/contributing.html.

## Citation

If you use this library for your work, please consider citing our paper [AdapterHub: A Framework for Adapting Transformers](https://arxiv.org/abs/2007.07779):
10 changes: 6 additions & 4 deletions docs/adapter_composition.md
@@ -42,14 +42,16 @@ The following table gives an overview on the supported composition blocks and their support by different adapter methods.

| Block | Bottleneck<br> Adapters | Prefix<br> Tuning | Compacter | LoRA | (IA)³ |
| --- | --- | --- | --- | --- | --- |
| [`Stack`](#stack) | ✅ | ✅ | ✅ | ✅(*) | ✅(*) |
| [`Fuse`](#fuse) | ✅ | | ✅ | | |
| [`Split`](#split) | ✅ | | ✅ | | |
| [`BatchSplit`](#batchsplit) | ✅ | ✅ | ✅ | ✅(*) | ✅(*) |
| [`Parallel`](#parallel) | ✅ | ✅ | ✅ | ✅(*) | ✅(*) |
| [Output averaging](#output-averaging) | ✅ | | ✅ | ✅(*) | ✅(*) |
| [Parameter averaging](#parameter-averaging) | ✅ | ✅ | ✅ | ✅ | ✅ |

(*) except for Deberta-v1, GPT-2.
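As a quick illustration (a minimal sketch; the adapter names `"a"` and `"b"` are placeholders), composition blocks from `adapters.composition` are set as the active adapter setup of a model:

```
import adapters.composition as ac
from adapters import AutoAdapterModel

# Load a model with adapter support and add two bottleneck adapters.
model = AutoAdapterModel.from_pretrained("roberta-base")
model.add_adapter("a")
model.add_adapter("b")

# Activate a Stack composition: inputs pass through adapter "a", then adapter "b".
model.active_adapters = ac.Stack("a", "b")
```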

Next, we present all composition blocks in more detail.

## `Stack`
1 change: 0 additions & 1 deletion docs/contributing.md

This file was deleted.

78 changes: 78 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,78 @@
# Contributing to AdapterHub

There are many ways in which you can contribute to AdapterHub and the `adapters` library.
This includes code contributions such as:
- implementing new adapter methods
- adding support for new Transformer models
- fixing open issues

as well as non-code contributions such as:
- training and uploading adapters to the Hub
- writing documentation and blog posts
- helping others with their issues and questions

Whichever way you'd like to contribute, you're very welcome to do so!

## Contributing to the `adapters` codebase

### Setting up your dev environment

To get started with writing code for `adapters`, you'll want to set up the project in a local development environment.

`adapters` closely follows the original Hugging Face Transformers repository in many aspects.
This guide assumes that you want to set up your dev environment on a local machine and that you have basic knowledge of `git`.
Additionally, you need **Python 3.8** or above installed to get started.

In the following, we go through the setup procedure step by step:

1. Fork [the `adapters` repository](https://github.com/adapter-hub/adapters) to get a local copy of the code under your user account.
2. Clone your fork to your local machine:
```
git clone --recursive git@github.com:<YOUR_USERNAME>/adapters.git
cd adapters
```
**Note:** The `--recursive` flag is important to initialize git submodules.
3. Create a virtual environment, e.g. via `virtualenv` or `conda`.
4. Install PyTorch, following the installation command for your environment [on their website](https://pytorch.org/get-started/locally/).
5. Install Hugging Face Transformers from the local git submodule:
```
pip install ./hf_transformers
```
6. Install `adapters` and required dev dependencies:
```
pip install -e ".[dev]"
```
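If the setup succeeded, a quick sanity check is to import the package (a minimal sketch; it assumes `adapters` exposes `__version__`, as most packages do):
```
python -c "import adapters; print(adapters.__version__)"
```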
### Adding Adapter Methods

How to integrate new efficient fine-tuning/adapter methods into `adapters` is described at [https://docs.adapterhub.ml/contributing/adding_adapter_methods.html](https://docs.adapterhub.ml/contributing/adding_adapter_methods.html).
### Adding Adapters to a Model

How to add adapter support to a model type already supported by Hugging Face Transformers is described at [https://docs.adapterhub.ml/contributing/adding_adapters_to_a_model.html](https://docs.adapterhub.ml/contributing/adding_adapters_to_a_model.html).
### Testing your changes to the codebase

`adapters` provides multiple Makefile targets for easily running tests and repo checks.
Make sure these checks run without errors, as they are also required to pass the CI pipeline when you open a pull request.

To **run all tests** in the repository:
```
make test
```

To **auto-format code and imports** in the whole codebase:
```
make style
```
This will run `black` and `isort`.

To **run all quality checks** ensuring code style and repo consistency:
```
make quality
```
This will run checks with `black`, `isort` and `flake8` as well as additional custom checks.
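While iterating on a change, it can be faster to run only a subset of tests directly with `pytest` (a sketch; the `-k` expression and test directory are examples, adjust them to the tests you are touching):
```
python -m pytest -k "lora" tests/
```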
## Contributing Adapters to the Hub

How to make your own trained adapters accessible via AdapterHub is described at [https://docs.adapterhub.ml/hub_contributing.html](https://docs.adapterhub.ml/hub_contributing.html).
1 change: 0 additions & 1 deletion docs/index.rst
@@ -94,7 +94,6 @@ Currently, we support the PyTorch versions of all models as listed on the `Model

classes/adapter_config
classes/model_adapters_config
classes/adapter_modules
classes/adapter_layer
classes/model_mixins
classes/adapter_training
17 changes: 16 additions & 1 deletion src/adapters/composition.py
@@ -1,6 +1,8 @@
import itertools
from collections.abc import Sequence
from typing import List, Optional, Set, Tuple, Union

import torch


class AdapterCompositionBlock(Sequence):
@@ -242,3 +244,16 @@ def adjust_tensors_for_parallel_(hidden_states, *tensors):
            repeats[0] = hidden_states.shape[0] // tensor.shape[0]
            new_tensor = tensor.repeat(*repeats)
            tensor.set_(new_tensor)


def match_attn_matrices_for_parallel(query, key, value) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """
    Matches the shapes of query, key and value matrices for parallel composition.
    """
    max_bsz = max(query.shape[0], key.shape[0], value.shape[0])

    query = query.repeat(max_bsz // query.shape[0], *([1] * len(query.shape[1:])))
    key = key.repeat(max_bsz // key.shape[0], *([1] * len(key.shape[1:])))
    value = value.repeat(max_bsz // value.shape[0], *([1] * len(value.shape[1:])))

    return query, key, value
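To illustrate what this helper does (a standalone sketch with dummy tensors, not part of the diff): if the query, key and value projections end up with different batch sizes because only some of them were already parallelized, each tensor is repeated along the batch dimension until all three match the largest batch size.

```
import torch

from adapters.composition import match_attn_matrices_for_parallel

# Query not yet replicated (batch 2); key/value parallelized over 2 channels (batch 4).
query = torch.randn(2, 8, 16)
key = torch.randn(4, 8, 16)
value = torch.randn(4, 8, 16)

query, key, value = match_attn_matrices_for_parallel(query, key, value)
print(query.shape, key.shape, value.shape)  # all torch.Size([4, 8, 16])
```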
17 changes: 14 additions & 3 deletions src/adapters/methods/adapter_layer_base.py
@@ -150,10 +150,13 @@ class ComposableAdapterLayerBase(AdapterLayerBase):
    Base class for all adapter methods that support composition.

    Make sure the 'adapter_modules_name' and 'supported_compositions' attributes as well as all abstract methods are
    overridden in derived classes. 'allow_multi_parallelize' can be set to True to allow inputs to be parallelized
    independently multiple times. This is useful when there are multiple parallel input flows through an adapter layer
    (e.g. in LoRA).
    """

    supported_compositions = []
    allow_multi_parallelize = False

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
@@ -382,15 +385,23 @@ def compose_parallel(self, adapter_setup: Parallel, state: NamedTuple, lvl: int
            orig_batch_size = self._bsz(state)
            state = self.repeat(state, adapter_setup.parallel_channels)
            context.adapters_parallelized = True
            context.original_batch_size = orig_batch_size
        else:
            bsz = self._bsz(state)
            # If the input was already parallelized, we can parallelize it again.
            # This is useful e.g. for LoRA, where attention matrices are parallelized independently.
            if self.allow_multi_parallelize and bsz == getattr(context, "original_batch_size", -1):
                state = self.repeat(state, adapter_setup.parallel_channels)
                orig_batch_size = bsz
            # The base model should handle replication of input.
            # Therefore, we assume the (replicated) input batch to be divisible by the number of parallel channels.
            elif bsz % adapter_setup.parallel_channels != 0:
                raise ValueError(
                    "The total input batch size in a Parallel adapter block must be divisible by the number of"
                    " parallel channels."
                )
            else:
                orig_batch_size = bsz // adapter_setup.parallel_channels

        state = self.pre_block(adapter_setup, state)

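To make the batch arithmetic above concrete, here is a standalone sketch with dummy tensors (plain PyTorch; `repeat` stands in for the layer's `self.repeat`): with `n` parallel channels, the hidden states are replicated `n` times along the batch dimension, and the original batch size is recovered by dividing by `n`, which is why the divisibility check is needed.

```
import torch

parallel_channels = 2
hidden_states = torch.randn(4, 16, 64)  # (batch, seq_len, hidden_dim)

# Replicate the batch once per parallel channel, as compose_parallel() does via self.repeat().
parallelized = hidden_states.repeat(parallel_channels, 1, 1)  # batch 4 -> 8

# Each channel then operates on its own slice of the replicated batch.
orig_batch_size = parallelized.shape[0] // parallel_channels  # 8 // 2 == 4
channel_outputs = parallelized.split(orig_batch_size, dim=0)  # 2 chunks of batch 4
```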
