STAR: Structured Token-mixing Adaptive Residual Networks

STAR is a neural network architecture built from Linear Input-Varying (LIV) operators with structured token mixing patterns. It targets efficient sequence modeling by combining several mixing structures (listed below) with adaptive residual connections.

Key Features

Flexible token mixing structures (Diagonal, Low-Rank, Scaled-Toeplitz, Sequential Semi-Separable)
Configurable channel mixing patterns (Diagonal, Dense, Grouped)
Feature sharing mechanisms for improved efficiency
Adaptive residual connections with pre-norm architecture
Genome-based architecture specification

Installation

pip install star-backbone

Quick Start

from star import STARBackbone, LIVConfig, TokenMixingStructure, ChannelMixingStructure

# Configure model
dim = 512
depth = 24

# Define genome
genome = [
    [1, 1, 1, 1, 1],  # SA-1
    [9, 1, 1, 1, 1],  # GMemless
    [1, 2, 1, 2, 1],  # SA-1 with sharing
]

# Configure operators
configs = {
    1: LIVConfig(
        featurizer_class=1,
        token_mixing=TokenMixingStructure.LOW_RANK,
        sparsity_mask=False,
        nonlinearity="softmax",
        channel_mixing=ChannelMixingStructure.GROUPED
    ),
    9: LIVConfig(
        featurizer_class=9,
        token_mixing=TokenMixingStructure.DIAGONAL,
        sparsity_mask=False,
        nonlinearity="silu",
        channel_mixing=ChannelMixingStructure.DENSE
    )
}

# Create model
model = STARBackbone(dim, depth, genome, configs)

Architecture Details

LIV Operators

The core building blocks are Linear Input-Varying (LIV) operators that combine:

Token mixing structures for sequence interaction
Channel mixing patterns for feature transformation
Nonlinear activations
Optional sparsity masks
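
As a rough illustration of what "linear input-varying" means, the sketch below computes y = T(x) x: the operator T is built from the input, then applied to that same input as an ordinary linear map. The names `liv_apply` and `sigmoid_gate` are hypothetical and not part of the `star` package, and each token is a single scalar to keep the example small.

```python
import math

def matvec(T, x):
    """Apply a dense matrix T (list of rows) to a vector x."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

def liv_apply(x, featurize):
    """Compute y = T(x) x: build the operator from the input, then apply it."""
    T = featurize(x)
    return matvec(T, x)

def sigmoid_gate(x):
    """Hypothetical featurizer: a diagonal operator with entries sigmoid(x_i)."""
    n = len(x)
    return [
        [1.0 / (1.0 + math.exp(-x[i])) if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]

x = [0.0, 2.0, -1.0]
y = liv_apply(x, sigmoid_gate)  # each y_i equals x_i * sigmoid(x_i)
```

With this particular featurizer the operator is diagonal and the result is the SiLU gate x * sigmoid(x). Swapping in a featurizer that returns a non-diagonal matrix lets tokens interact.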

Token Mixing Structures

DIAGONAL: Element-wise scaling
LOW_RANK: Attention-like mechanisms with Q/K/V projections
SCALED_TOEPLITZ: Convolution-based local mixing
SEQUENTIAL_SEMI_SEPARABLE: Recurrent processing with gating
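
The four structures can be pictured as patterns in an n-by-n matrix that mixes n tokens. The sketch below builds textbook versions of each pattern in plain Python. It is not STAR's implementation: the function names are hypothetical, the low-rank case omits the softmax and value projection, and reading "scaled" Toeplitz as a per-row scale times a causal convolution kernel is an assumption.

```python
def matvec(T, x):
    """Apply a dense matrix T (list of rows) to a vector x."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

def diagonal(d):
    """T[i][i] = d[i]: each token is scaled on its own, no token interaction."""
    n = len(d)
    return [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def low_rank(q, k):
    """T[i][j] = <q_i, k_j>: rank is at most the feature size of q and k."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

def scaled_toeplitz(kernel, scale):
    """T[i][j] = scale[i] * kernel[i - j] inside the causal kernel window."""
    n = len(scale)
    return [
        [scale[i] * kernel[i - j] if 0 <= i - j < len(kernel) else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def semi_separable(a, b, c):
    """T[i][j] = c[i] * a[j+1] * ... * a[i] * b[j] for j <= i, else 0."""
    n = len(a)
    T = [[0.0] * n for _ in range(n)]
    for j in range(n):
        decay = 1.0
        for i in range(j, n):
            if i > j:
                decay *= a[i]
            T[i][j] = c[i] * decay * b[j]
    return T

def recurrence(a, b, c, x):
    """Gated scan h_i = a_i * h_{i-1} + b_i * x_i, y_i = c_i * h_i."""
    h, ys = 0.0, []
    for a_i, b_i, c_i, x_i in zip(a, b, c, x):
        h = a_i * h + b_i * x_i
        ys.append(c_i * h)
    return ys

# The semi-separable matrix and the gated recurrence give the same output.
a, b, c = [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
x = [1.0, 2.0, 3.0]
y_matrix = matvec(semi_separable(a, b, c), x)
y_scan = recurrence(a, b, c, x)
```

The last lines show why the semi-separable structure is described as recurrent: the same output can be computed with a linear-time scan instead of materializing the full matrix.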

Channel Mixing Types

DIAGONAL: Independent channel scaling
DENSE: Full channel interaction
GROUPED: Group-wise channel mixing
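
The three channel mixing types differ in which channels may influence each other. The following is a minimal sketch on a single token's channel vector, assuming that grouped mixing splits channels into contiguous groups with one square weight matrix per group. The function names are illustrative and not taken from the `star` package.

```python
def diagonal_mix(x, scale):
    """Scale every channel independently."""
    return [s * v for s, v in zip(scale, x)]

def dense_mix(x, W):
    """Every output channel sees every input channel."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def grouped_mix(x, group_weights):
    """Dense mixing inside each contiguous group, nothing across groups."""
    out, start = [], 0
    for W in group_weights:
        size = len(W)
        out.extend(dense_mix(x[start:start + size], W))
        start += size
    if start != len(x):
        raise ValueError("group sizes must add up to the number of channels")
    return out

x = [1.0, 2.0, 3.0, 4.0]
swap = [[0.0, 1.0], [1.0, 0.0]]      # first group: swap its two channels
double = [[2.0, 0.0], [0.0, 2.0]]    # second group: double both channels
y = grouped_mix(x, [swap, double])
```

For d channels split into g equal groups, the weight count is d for diagonal, d*d/g for grouped, and d*d for dense, so grouped mixing sits between the other two in cost.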

Genome Specification

Each layer is specified by a 5-integer sequence:

1. LIV operator class ID
2. Featurizer sharing group
3. Reserved
4. Feature sharing group
5. Reserved
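
The five positions above can be unpacked into named fields to make a genome easier to read and validate. This is a sketch only: `LayerGene`, `parse_genome`, and `featurizer_groups` are hypothetical helpers, not part of the `star` package, and how STAR itself interprets a sharing group is not specified here.

```python
from collections import defaultdict
from typing import NamedTuple

class LayerGene(NamedTuple):
    """One genome row, with field names following the list above."""
    liv_class: int
    featurizer_group: int
    reserved_1: int
    feature_group: int
    reserved_2: int

def parse_genome(genome):
    """Turn rows of five integers into LayerGene records, checking the length."""
    genes = []
    for index, row in enumerate(genome):
        if len(row) != 5:
            raise ValueError(f"layer {index}: expected 5 integers, got {len(row)}")
        genes.append(LayerGene(*(int(v) for v in row)))
    return genes

def featurizer_groups(genes):
    """Map each featurizer sharing group ID to the layer indices that list it."""
    groups = defaultdict(list)
    for index, gene in enumerate(genes):
        groups[gene.featurizer_group].append(index)
    return dict(groups)

# The genome from the Quick Start section
genome = [
    [1, 1, 1, 1, 1],
    [9, 1, 1, 1, 1],
    [1, 2, 1, 2, 1],
]
genes = parse_genome(genome)
groups = featurizer_groups(genes)
```

For the Quick Start genome, layers 0 and 1 list featurizer group 1 and layer 2 lists group 2.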

Configuration

The LIVConfig dataclass specifies:

featurizer_class: Integer ID for featurizer type
token_mixing: TokenMixingStructure enum value
channel_mixing: ChannelMixingStructure enum value
sparsity_mask: Boolean for optional sparsity
nonlinearity: Optional activation function name
expansion_factor: Channel expansion multiplier
repeat_factor: Feature repeat factor
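
To show how these fields fit together, here is a stand-in dataclass that mirrors the list above, plus a small check that every operator class ID used in a genome has a config entry. `LIVConfigSketch`, `TokenMixing`, `ChannelMixing`, and `missing_configs` are hypothetical names. The enum values and the defaults for `expansion_factor` and `repeat_factor` are assumptions; the real `LIVConfig` may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TokenMixing(Enum):
    DIAGONAL = "diagonal"
    LOW_RANK = "low_rank"
    SCALED_TOEPLITZ = "scaled_toeplitz"
    SEQUENTIAL_SEMI_SEPARABLE = "sequential_semi_separable"

class ChannelMixing(Enum):
    DIAGONAL = "diagonal"
    DENSE = "dense"
    GROUPED = "grouped"

@dataclass
class LIVConfigSketch:
    """Stand-in for LIVConfig; field names follow the list above."""
    featurizer_class: int
    token_mixing: TokenMixing
    channel_mixing: ChannelMixing
    sparsity_mask: bool = False
    nonlinearity: Optional[str] = None
    expansion_factor: int = 1  # assumed default
    repeat_factor: int = 1     # assumed default

def missing_configs(genome, configs):
    """Return operator class IDs used in the genome that have no config entry."""
    return sorted({row[0] for row in genome} - set(configs))

configs = {
    1: LIVConfigSketch(
        featurizer_class=1,
        token_mixing=TokenMixing.LOW_RANK,
        channel_mixing=ChannelMixing.GROUPED,
        nonlinearity="softmax",
    ),
}
genome = [[1, 1, 1, 1, 1], [9, 1, 1, 1, 1]]
missing = missing_configs(genome, configs)  # class 9 has no config here
```

Running a check like this before building a model catches a genome that refers to an operator class with no matching configuration.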

Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/name)
3. Commit your changes (git commit -am 'Add feature')
4. Push the branch (git push origin feature/name)
5. Open a Pull Request

License

MIT License. See LICENSE file for details.

Citation

If you use STAR in your research, please cite:

@article{star2024,
  title={STAR: Structured Token-mixing Adaptive Residual Networks},
  author={[Authors]},
  journal={[Journal]},
  year={2024}
}
