
Commit

Merge remote-tracking branch 'upstream/main' into dev/prompt_tuning
lenglaender committed Nov 19, 2023
2 parents 159b8bb + d45d951 commit a2fbf30
Showing 38 changed files with 426 additions and 222 deletions.
78 changes: 0 additions & 78 deletions CONTRIBUTING.md

This file was deleted.

4 changes: 4 additions & 0 deletions README.md
@@ -153,6 +153,10 @@ Currently, adapters integrates all architectures and methods listed below:

We currently support the PyTorch versions of all models listed on the **[Model Overview](https://docs.adapterhub.ml/model_overview.html) page** in our documentation.

## Developing & Contributing

To get started with developing on _Adapters_ yourself and learn more about ways to contribute, please see https://docs.adapterhub.ml/contributing.html.

## Citation

If you use this library for your work, please consider citing our paper [AdapterHub: A Framework for Adapting Transformers](https://arxiv.org/abs/2007.07779):
10 changes: 6 additions & 4 deletions docs/adapter_composition.md
@@ -42,14 +42,16 @@ The following table gives an overview on the supported composition blocks and their support by different adapter methods.

| Block | Bottleneck<br> Adapters | Prefix<br> Tuning | Compacter | LoRA | (IA)³ |
| --- | --- | --- | --- | --- | --- |
| [`Stack`](#stack) | ✅ | ✅ | ✅ | ✅(*) | ✅(*) |
| [`Fuse`](#fuse) | ✅ | | ✅ | | |
| [`Split`](#split) | ✅ | | ✅ | | |
| [`BatchSplit`](#batchsplit) | ✅ | ✅ | ✅ | ✅(*) | ✅(*) |
| [`Parallel`](#parallel) | ✅ | ✅ | ✅ | ✅(*) | ✅(*) |
| [Output averaging](#output-averaging) | ✅ | | ✅ | ✅(*) | ✅(*) |
| [Parameter averaging](#parameter-averaging) | ✅ | ✅ | ✅ | ✅ | ✅ |

(*) except for Deberta-v1, GPT-2.
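As a quick illustration (a minimal sketch; the adapter names `"a"` and `"b"` are placeholders), composition blocks from `adapters.composition` are set as the active adapter setup of a model:

```
import adapters.composition as ac
from adapters import AutoAdapterModel

# Load a model with adapter support and add two bottleneck adapters.
model = AutoAdapterModel.from_pretrained("roberta-base")
model.add_adapter("a")
model.add_adapter("b")

# Activate a Stack composition: inputs pass through adapter "a", then adapter "b".
model.active_adapters = ac.Stack("a", "b")
```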

Next, we present all composition blocks in more detail.

## `Stack`
1 change: 0 additions & 1 deletion docs/contributing.md

This file was deleted.

78 changes: 78 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,78 @@
# Contributing to AdapterHub

There are many ways in which you can contribute to AdapterHub and the `adapters` library.
This includes code contributions such as:
- implementing new adapter methods
- adding support for new Transformer models
- fixing open issues

as well as non-code contributions such as:
- training and uploading adapters to the Hub
- writing documentation and blog posts
- helping others with their issues and questions

Whichever way you'd like to contribute, you're very welcome to do so!

## Contributing to the `adapters` codebase

### Setting up your dev environment

To get started with writing code for `adapters`, you'll want to set up the project in a local development environment.

`adapters` closely follows the original Hugging Face Transformers repository in many aspects.
This guide assumes that you want to set up your dev environment on a local machine and that you have basic knowledge of `git`.
Additionally, you need **Python 3.8** or above installed to get started.

In the following, we go through the setup procedure step by step:

1. Fork [the `adapters` repository](https://github.com/adapter-hub/adapters) to get a local copy of the code under your user account.
2. Clone your fork to your local machine:
```
git clone --recursive git@github.com:<YOUR_USERNAME>/adapters.git
cd adapters
```
**Note:** The `--recursive` flag is important to initialize git submodules.
3. Create a virtual environment, e.g. via `virtualenv` or `conda`.
4. Install PyTorch, following the installation command for your environment [on their website](https://pytorch.org/get-started/locally/).
5. Install Hugging Face Transformers from the local git submodule:
```
pip install ./hf_transformers
```
6. Install `adapters` and required dev dependencies:
```
pip install -e ".[dev]"
```
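If the setup succeeded, a quick sanity check is to import the package (a minimal sketch; it assumes `adapters` exposes `__version__`, as most packages do):
```
python -c "import adapters; print(adapters.__version__)"
```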
### Adding Adapter Methods

How to integrate new efficient fine-tuning/adapter methods into `adapters` is described at [https://docs.adapterhub.ml/contributing/adding_adapter_methods.html](https://docs.adapterhub.ml/contributing/adding_adapter_methods.html).
### Adding Adapters to a Model

How to add adapter support to a model type already supported by Hugging Face Transformers is described at [https://docs.adapterhub.ml/contributing/adding_adapters_to_a_model.html](https://docs.adapterhub.ml/contributing/adding_adapters_to_a_model.html).
### Testing your changes to the codebase

`adapters` provides multiple Makefile targets for easily running tests and repo checks.
Make sure these checks run without errors, as they are also required to pass the CI pipeline when you open a pull request.

To **run all tests** in the repository:
```
make test
```

To **auto-format code and imports** in the whole codebase:
```
make style
```
This will run `black` and `isort`.

To **run all quality checks** ensuring code style and repo consistency:
```
make quality
```
This will run checks with `black`, `isort` and `flake8` as well as additional custom checks.
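While iterating on a change, it can be faster to run only a subset of tests directly with `pytest` (a sketch; the `-k` expression and test directory are examples, adjust them to the tests you are touching):
```
python -m pytest -k "lora" tests/
```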
## Contributing Adapters to the Hub

How to make your own trained adapters accessible via AdapterHub is described at [https://docs.adapterhub.ml/hub_contributing.html](https://docs.adapterhub.ml/hub_contributing.html).
1 change: 0 additions & 1 deletion docs/index.rst
@@ -94,7 +94,6 @@ Currently, we support the PyTorch versions of all models as listed on the `Model

classes/adapter_config
classes/model_adapters_config
classes/adapter_modules
classes/adapter_layer
classes/model_mixins
classes/adapter_training
17 changes: 16 additions & 1 deletion src/adapters/composition.py
@@ -1,6 +1,8 @@
import itertools
from collections.abc import Sequence
from typing import List, Optional, Set, Tuple, Union

import torch


class AdapterCompositionBlock(Sequence):
@@ -242,3 +244,16 @@ def adjust_tensors_for_parallel_(hidden_states, *tensors):
            repeats[0] = hidden_states.shape[0] // tensor.shape[0]
            new_tensor = tensor.repeat(*repeats)
            tensor.set_(new_tensor)


def match_attn_matrices_for_parallel(query, key, value) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """
    Matches the shapes of query, key and value matrices for parallel composition.
    """
    max_bsz = max(query.shape[0], key.shape[0], value.shape[0])

    query = query.repeat(max_bsz // query.shape[0], *([1] * len(query.shape[1:])))
    key = key.repeat(max_bsz // key.shape[0], *([1] * len(key.shape[1:])))
    value = value.repeat(max_bsz // value.shape[0], *([1] * len(value.shape[1:])))

    return query, key, value
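To illustrate what this helper does (a standalone sketch with dummy tensors, not part of the diff): if the query, key and value projections end up with different batch sizes because only some of them were already parallelized, each tensor is repeated along the batch dimension until all three match the largest batch size.

```
import torch

from adapters.composition import match_attn_matrices_for_parallel

# Query not yet replicated (batch 2); key/value parallelized over 2 channels (batch 4).
query = torch.randn(2, 8, 16)
key = torch.randn(4, 8, 16)
value = torch.randn(4, 8, 16)

query, key, value = match_attn_matrices_for_parallel(query, key, value)
print(query.shape, key.shape, value.shape)  # all torch.Size([4, 8, 16])
```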
17 changes: 14 additions & 3 deletions src/adapters/methods/adapter_layer_base.py
@@ -150,10 +150,13 @@ class ComposableAdapterLayerBase(AdapterLayerBase):
    Base class for all adapter methods that support composition.

    Make sure the 'adapter_modules_name' and 'supported_compositions' attributes as well as all abstract methods are
    overridden in derived classes. 'allow_multi_parallelize' can be set to True to allow inputs to be parallelized
    independently multiple times. This is useful when there are multiple parallel input flows through an adapter layer
    (e.g. in LoRA).
    """

    supported_compositions = []
    allow_multi_parallelize = False

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
@@ -382,15 +385,23 @@ def compose_parallel(self, adapter_setup: Parallel, state: NamedTuple, lvl: int
            orig_batch_size = self._bsz(state)
            state = self.repeat(state, adapter_setup.parallel_channels)
            context.adapters_parallelized = True
            context.original_batch_size = orig_batch_size
        else:
            bsz = self._bsz(state)
            # If the input was already parallelized, we can parallelize it again.
            # This is useful e.g. for LoRA, where attention matrices are parallelized independently.
            if self.allow_multi_parallelize and bsz == getattr(context, "original_batch_size", -1):
                state = self.repeat(state, adapter_setup.parallel_channels)
                orig_batch_size = bsz
            # The base model should handle replication of input.
            # Therefore, we assume the (replicated) input batch to be divisible by the number of parallel channels.
            elif bsz % adapter_setup.parallel_channels != 0:
                raise ValueError(
                    "The total input batch size in a Parallel adapter block must be divisible by the number of"
                    " parallel channels."
                )
            else:
                orig_batch_size = bsz // adapter_setup.parallel_channels

        state = self.pre_block(adapter_setup, state)

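To make the batch arithmetic above concrete, here is a standalone sketch with dummy tensors (plain PyTorch; `repeat` stands in for the layer's `self.repeat`): with `n` parallel channels, the hidden states are replicated `n` times along the batch dimension, and the original batch size is recovered by dividing by `n`, which is why the divisibility check is needed.

```
import torch

parallel_channels = 2
hidden_states = torch.randn(4, 16, 64)  # (batch, seq_len, hidden_dim)

# Replicate the batch once per parallel channel, as compose_parallel() does via self.repeat().
parallelized = hidden_states.repeat(parallel_channels, 1, 1)  # batch 4 -> 8

# Each channel then operates on its own slice of the replicated batch.
orig_batch_size = parallelized.shape[0] // parallel_channels  # 8 // 2 == 4
channel_outputs = parallelized.split(orig_batch_size, dim=0)  # 2 chunks of batch 4
```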
