
Commit

[FEAT][MambaTransformer] [README]
Kye committed Jan 13, 2024
1 parent 99ad44a commit 854b370
Showing 5 changed files with 339 additions and 92 deletions.
113 changes: 31 additions & 82 deletions README.md
@@ -1,104 +1,53 @@
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Python Package Template
An easy, reliable, fluid template for Python packages, complete with docs, testing suites, READMEs, GitHub workflows, linting, and much more.
# Mamba Transformer
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling.

This is a 100% novel architecture that I have designed to combine the strengths of SSMs and attention while offsetting their respective weaknesses, an all-new advanced architecture built to surpass our old limits: faster processing speed, longer context lengths, lower perplexity over long sequences, and enhanced, superior reasoning, all while remaining small and compact.

The architecture is essentially: `x -> norm -> mamba -> norm -> transformer -> norm -> ffn -> norm -> out`.

I added many normalizations because I believe that, by default, training stability would be severely degraded by two foreign architectures being integrated with one another; a rough sketch of the resulting layer ordering follows the install snippet below.

## Installation

You can install the package using pip:

```bash
pip install -e .
```
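
As a rough illustration of that `norm -> mamba -> norm -> transformer -> norm -> ffn -> norm` ordering, here is a minimal PyTorch sketch. It is not the repo's actual implementation (that lives in `mamba_transformer/model.py`); the sub-modules are stand-ins and the residual connections are an assumption:

```python
import torch
from torch import nn

class BlockSketch(nn.Module):
    """Illustrative wiring only: norm -> mamba -> norm -> attention -> norm -> ffn -> norm."""

    def __init__(self, dim: int, mamba: nn.Module, attn: nn.Module, ffn: nn.Module):
        super().__init__()
        self.mamba, self.attn, self.ffn = mamba, attn, ffn
        # A normalization before each sub-module plus one on the output,
        # intended to keep training stable when mixing the two architectures.
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.mamba(self.norms[0](x)) + x  # SSM sub-layer (residual assumed)
        x = self.attn(self.norms[1](x)) + x   # attention sub-layer (residual assumed)
        x = self.ffn(self.norms[2](x)) + x    # feed-forward sub-layer (residual assumed)
        return self.norms[3](x)               # final output normalization
```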
## Structure
```
├── LICENSE
├── Makefile
├── README.md
├── agorabanner.png
├── example.py
├── package
│   ├── __init__.py
│   ├── main.py
│   └── subfolder
│       ├── __init__.py
│       └── main.py
├── pyproject.toml
└── requirements.txt

2 directories, 11 files
```
# Usage

# Documentation

## Install

`pip3 install mambatransformer`


### Code Quality 🧹

We provide two handy commands inside the `Makefile`, namely:

- `make style` to format the code
- `make check_code_quality` to check code quality (PEP8, basically)

So far, **there is no type checking with mypy**. See [issue](https://github.com/roboflow-ai/template-python/issues/4).

### Usage
```python
import torch
from mamba_transformer import MambaTransformer

# Generate a random tensor of shape (1, 10) with token ids between 0 and 99
x = torch.randint(0, 100, (1, 10))

# Create an instance of the MambaTransformer model
model = MambaTransformer(
    num_tokens=100,  # Size of the token vocabulary
    dim=512,         # Dimension of the model
    heads=8,         # Number of attention heads
    depth=4,         # Number of transformer layers
    dim_head=64,     # Dimension of each attention head
    d_state=512,     # Dimension of the SSM state
    dropout=0.1,     # Dropout rate
    ff_mult=4,       # Multiplier for the feed-forward layer dimension
)

# Pass the input tensor through the model and print the output shape
print(model(x).shape)

# Switch to evaluation mode for inference (call model.train() before training)
model.eval()

# Would you like to train this model? Zeta Corporation offers unmatched GPU clusters at unbeatable prices. Let's partner!

# Tokenizer-driven generation (assumes `text` holds a tensor of input token ids)
model.generate(text)
```
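
To actually train it, a minimal, hypothetical next-token training step might look like the sketch below. It assumes the model returns per-token logits of shape `(batch, seq_len, num_tokens)`; the optimizer, learning rate, and loss here are illustrative choices, not part of this repo:

```python
import torch
import torch.nn.functional as F
from mamba_transformer import MambaTransformer

model = MambaTransformer(
    num_tokens=100, dim=512, heads=8, depth=4,
    dim_head=64, d_state=512, dropout=0.1, ff_mult=4,
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy batch: learn to predict each token from the ones before it
tokens = torch.randint(0, 100, (1, 11))
inputs, targets = tokens[:, :-1], tokens[:, 1:]

model.train()
logits = model(inputs)  # assumed shape: (1, 10, 100)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```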

### Tests 🧪

[`pytest`](https://docs.pytest.org/en/7.1.x/) is used to run our tests.

### Publish on PyPi 🚀

**Important**: Before publishing, edit `__version__` in [src/__init__](/src/__init__.py) to match the wanted new version.

We use [`twine`](https://twine.readthedocs.io/en/stable/) to make our life easier. You can publish by using

```
export PYPI_USERNAME="your_username"
export PYPI_PASSWORD="your_password"
export PYPI_TEST_PASSWORD="your_password_for_test_pypi"
make publish -e PYPI_USERNAME=$PYPI_USERNAME -e PYPI_PASSWORD=$PYPI_PASSWORD -e PYPI_TEST_PASSWORD=$PYPI_TEST_PASSWORD
```

You can also use a token for auth, see the [pypi doc](https://pypi.org/help/#apitoken). In that case,

```
export PYPI_USERNAME="__token__"
export PYPI_PASSWORD="your_token"
export PYPI_TEST_PASSWORD="your_token_for_test_pypi"
make publish -e PYPI_USERNAME=$PYPI_USERNAME -e PYPI_PASSWORD=$PYPI_PASSWORD -e PYPI_TEST_PASSWORD=$PYPI_TEST_PASSWORD
```

**Note**: We will try to push to [test pypi](https://test.pypi.org/) before pushing to PyPi, to check that everything works.

### CI/CD 🤖

We use [GitHub actions](https://github.com/features/actions) to automatically run tests and check code quality when a new PR is opened against `main`.

On any pull request, we check code quality and run the tests.

When a new release is created, we try to push the new code to PyPi, again using [`twine`](https://twine.readthedocs.io/en/stable/).

The **correct steps** to create a new release are the following:
- edit `__version__` in [src/__init__](/src/__init__.py) to match the wanted new version.
- create a new [`tag`](https://git-scm.com/docs/git-tag) with the release name, e.g. `git tag v0.0.1 && git push origin v0.0.1` or from the GitHub UI.
- create a new release from GitHub UI

The CI will run when you create the new release.

# Docs
We use MkDocs. This repo comes with the Zeta docs; all the docs configuration is already here, along with the Read the Docs config.

# Q&A

## Why no cookiecutter?
This is a template repo; it's meant to be used directly from GitHub upon repo creation.

## Why reinvent the wheel?

There are several very good templates on GitHub, but I prefer to use code we wrote ourselves instead of blindly taking the most-starred template and inheriting features we don't need. From experience, it's better to keep things simple and general enough for our specific use cases.

# Architecture

# License
MIT
22 changes: 22 additions & 0 deletions example.py
@@ -0,0 +1,22 @@
import torch
from mamba_transformer.model import MambaTransformer

# Generate a random tensor of shape (1, 10) with values between 0 and 99
x = torch.randint(0, 100, (1, 10))

# Create an instance of the MambaTransformer model
model = MambaTransformer(
    num_tokens=100,  # Size of the token vocabulary
    dim=512,         # Dimension of the model
    heads=8,         # Number of attention heads
    depth=4,         # Number of transformer layers
    dim_head=64,     # Dimension of each attention head
    d_state=512,     # Dimension of the SSM state
    dropout=0.1,     # Dropout rate
    ff_mult=4,       # Multiplier for the feed-forward layer dimension
)

# Pass the input tensor through the model and print the output shape
print(model(x).shape)


7 changes: 7 additions & 0 deletions mamba_transformer/__init__.py
@@ -0,0 +1,7 @@
from mamba_transformer.model import RMSNorm, MambaTransformerblock, MambaTransformer

__all__ = [
"RMSNorm",
"MambaTransformerblock",
"MambaTransformer"
]
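
With these re-exports, the public classes are importable straight from the package root; a trivial sanity check (assuming the package is installed):

```python
from mamba_transformer import MambaTransformer, MambaTransformerblock, RMSNorm

# All three classes live in mamba_transformer.model but are
# re-exported at the package root by this __init__.py.
print(MambaTransformer, MambaTransformerblock, RMSNorm)
```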
