STAR is a novel neural network architecture that implements Linear Input-Varying (LIV) operators with structured token mixing patterns. This architecture provides an efficient approach to sequence modeling by combining different mixing structures with adaptive residual connections.
- Flexible token mixing structures (Diagonal, Low-Rank, Scaled-Toeplitz, Sequential Semi-Separable)
- Configurable channel mixing patterns (Diagonal, Dense, Grouped)
- Feature sharing mechanisms for improved efficiency
- Adaptive residual connections with pre-norm architecture
- Genome-based architecture specification
pip install star-backbone
from star import STARBackbone, LIVConfig, TokenMixingStructure, ChannelMixingStructure
# Configure model
dim = 512
depth = 24
# Define genome
genome = [
    [1, 1, 1, 1, 1],  # SA-1
    [9, 1, 1, 1, 1],  # GMemless
    [1, 2, 1, 2, 1],  # SA-1 with sharing
]
# Configure operators
configs = {
    1: LIVConfig(
        featurizer_class=1,
        token_mixing=TokenMixingStructure.LOW_RANK,
        sparsity_mask=False,
        nonlinearity="softmax",
        channel_mixing=ChannelMixingStructure.GROUPED,
    ),
    9: LIVConfig(
        featurizer_class=9,
        token_mixing=TokenMixingStructure.DIAGONAL,
        sparsity_mask=False,
        nonlinearity="silu",
        channel_mixing=ChannelMixingStructure.DENSE,
    ),
}
# Create model
model = STARBackbone(dim, depth, genome, configs)
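Before constructing the model, it can be useful to check that every operator class ID used in the genome has an entry in configs. The helper below is a hypothetical sketch and not part of the star package; it assumes the first integer of each genome row is the operator class ID, and it uses plain strings as stand-ins for LIVConfig objects so that it runs without the library installed.

```python
def missing_operator_ids(genome, configs):
    """Return the sorted operator class IDs used in `genome` but absent from `configs`.

    Assumes the first integer of each genome row is the LIV operator class ID.
    """
    used = {row[0] for row in genome}
    return sorted(used - set(configs))


# Same genome as above; the dict values are placeholders for LIVConfig objects.
genome = [
    [1, 1, 1, 1, 1],
    [9, 1, 1, 1, 1],
    [1, 2, 1, 2, 1],
]
configs = {1: "config for SA-1", 9: "config for GMemless"}

missing = missing_operator_ids(genome, configs)                       # nothing missing
incomplete = missing_operator_ids(genome, {1: "config for SA-1"})     # ID 9 has no config
```

Running a check like this first turns a missing config into an explicit list of IDs rather than a lookup error deep inside model construction.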
The core building blocks are Linear Input-Varying (LIV) operators that combine:
- Token mixing structures for sequence interaction
- Channel mixing patterns for feature transformation
- Nonlinear activations
- Optional sparsity masks
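The idea behind "linear input-varying" can be sketched in a few lines: the mixing matrix T is computed from the input, but once T is fixed the output is a linear function of the input, y = T(x) x. The code below is an illustrative sketch for a single-channel sequence, not the library's implementation; the featurizer (pairwise products), the causal mask, and the function names are assumptions, and channel mixing is left out.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def liv_operator(x, causal=True):
    """Compute y = T(x) @ x for a single-channel sequence x (list of floats).

    T is built from x itself (input-varying), yet for a fixed T the map
    from x to y is linear.
    """
    n = len(x)
    # Featurizer (assumed form): pairwise scores from products of inputs.
    scores = [[x[i] * x[j] for j in range(n)] for i in range(n)]
    if causal:
        # Sparsity mask: position i only mixes positions j <= i.
        scores = [
            [s if j <= i else -math.inf for j, s in enumerate(row)]
            for i, row in enumerate(scores)
        ]
    T = [softmax(row) for row in scores]  # nonlinearity applied row-wise
    y = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]  # token mixing
    return T, y


T, y = liv_operator([1.0, 2.0, 0.5])
```

With the causal mask enabled, the first row of T places all of its weight on the first token, so the first output equals the first input.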
Token mixing structures:
- DIAGONAL: Element-wise scaling
- LOW_RANK: Attention-like mechanisms with Q/K/V projections
- SCALED_TOEPLITZ: Convolution-based local mixing
- SEQUENTIAL_SEMI_SEPARABLE: Recurrent processing with gating
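Each of the four token mixing structures can be viewed as a constraint on the sequence-by-sequence matrix T that mixes positions. The sketch below builds each matrix explicitly in pure Python for illustration. The function names and parameterizations are assumptions rather than the package's API, and in an actual LIV operator the parameters (d, Q, K, scale, a, b, c) would be computed from the input rather than passed in by hand.

```python
def diagonal(d):
    # T[i][i] = d[i]: each position is scaled, no interaction between positions.
    n = len(d)
    return [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def low_rank(Q, K):
    # T = Q K^T: rank is at most the feature width of Q and K (attention-like).
    return [[sum(q * k for q, k in zip(qi, kj)) for kj in K] for qi in Q]


def scaled_toeplitz(scale, taps):
    # T[i][j] = scale[i] * taps[i - j]: a causal convolution, rescaled per row.
    n = len(scale)
    return [
        [scale[i] * taps[i - j] if 0 <= i - j < len(taps) else 0.0 for j in range(n)]
        for i in range(n)
    ]


def semi_separable(a, b, c):
    # T[i][j] = c[i] * a[j+1] * ... * a[i] * b[j] for j <= i, which matches the
    # gated recurrence h[i] = a[i] * h[i-1] + b[i] * x[i], y[i] = c[i] * h[i].
    n = len(a)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            T[i][j] = c[i] * decay * b[j]
    return T


def mix(T, x):
    # Apply a token mixing matrix to a single-channel sequence.
    return [sum(t * v for t, v in zip(row, x)) for row in T]


x = [1.0, 2.0, 3.0]
a, b, c = [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
y_matrix = mix(semi_separable(a, b, c), x)

# The same output computed as a recurrence, one step per position.
h, y_scan = 0.0, []
for a_i, b_i, c_i, x_i in zip(a, b, c, x):
    h = a_i * h + b_i * x_i
    y_scan.append(c_i * h)
```

The last part shows why the semi-separable case is described as recurrent: the explicit matrix and the step-by-step scan give the same output, but the scan never materializes T.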
Channel mixing structures:
- DIAGONAL: Independent channel scaling
- DENSE: Full channel interaction
- GROUPED: Group-wise channel mixing
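The three channel mixing patterns differ in which channels are allowed to interact. The following sketch applies each pattern to one token's channel vector. The function names are hypothetical and chosen for illustration only, and the grouped case assumes contiguous channel groups, which the package may or may not use.

```python
def channel_mix_diagonal(x, scale):
    # Each channel is scaled on its own; no interaction between channels.
    return [s * v for s, v in zip(scale, x)]


def channel_mix_dense(x, W):
    # Every output channel is a weighted sum of all input channels.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def channel_mix_grouped(x, blocks):
    # Block-diagonal mixing: each square block acts on its own contiguous
    # slice of channels (contiguity is an assumption of this sketch).
    out, start = [], 0
    for W in blocks:
        size = len(W)
        out.extend(channel_mix_dense(x[start:start + size], W))
        start += size
    return out


x = [1.0, 2.0, 3.0, 4.0]
blocks = [
    [[0.0, 1.0], [1.0, 0.0]],  # swaps channels 0 and 1
    [[2.0, 0.0], [0.0, 2.0]],  # doubles channels 2 and 3
]
grouped = channel_mix_grouped(x, blocks)
scaled = channel_mix_diagonal(x, [1.0, 0.5, 2.0, 0.0])
```

For d channels split into g equal groups, the grouped pattern needs d*d/g weights, compared with d*d for dense and d for diagonal.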
Each layer is specified by a 5-integer sequence:
- LIV operator class ID
- Featurizer sharing group
- Reserved
- Feature sharing group
- Reserved
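To make the five positions easier to work with, a genome can be parsed into named records. The sketch below is a hypothetical helper, not part of the package: the field names are invented for illustration, and it only groups layer indices by sharing group without assuming how the library actually shares parameters between those layers.

```python
from collections import defaultdict
from typing import NamedTuple


class LayerGene(NamedTuple):
    operator_id: int        # LIV operator class ID
    featurizer_group: int   # featurizer sharing group
    reserved_a: int         # reserved
    feature_group: int      # feature sharing group
    reserved_b: int         # reserved


def parse_genome(genome):
    """Convert rows of 5 integers into LayerGene records, validating length."""
    genes = []
    for index, row in enumerate(genome):
        if len(row) != 5:
            raise ValueError(f"layer {index}: expected 5 integers, got {len(row)}")
        genes.append(LayerGene(*row))
    return genes


def featurizer_groups(genes):
    """Map each featurizer sharing group to the layer indices that use it."""
    groups = defaultdict(list)
    for index, gene in enumerate(genes):
        groups[gene.featurizer_group].append(index)
    return dict(groups)


genes = parse_genome([
    [1, 1, 1, 1, 1],
    [9, 1, 1, 1, 1],
    [1, 2, 1, 2, 1],
])
groups = featurizer_groups(genes)
```

For the three-layer genome used earlier, the first two layers fall into featurizer group 1 and the third into group 2.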
The LIVConfig dataclass specifies:
- featurizer_class: Integer ID for featurizer type
- token_mixing: TokenMixingStructure enum value
- channel_mixing: ChannelMixingStructure enum value
- sparsity_mask: Boolean for optional sparsity
- nonlinearity: Optional activation function name
- expansion_factor: Channel expansion multiplier
- repeat_factor: Feature repeat factor
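For readers who want to see the shape of such a configuration without installing the package, here is a stand-in dataclass with the same field names. It is a sketch only: the class name LIVConfigSketch, the enum values, the field types, and all default values are assumptions and may differ from the real LIVConfig.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TokenMixingStructure(Enum):
    # Member names follow the README; the string values are assumed.
    DIAGONAL = "diagonal"
    LOW_RANK = "low_rank"
    SCALED_TOEPLITZ = "scaled_toeplitz"
    SEQUENTIAL_SEMI_SEPARABLE = "sequential_semi_separable"


class ChannelMixingStructure(Enum):
    DIAGONAL = "diagonal"
    DENSE = "dense"
    GROUPED = "grouped"


@dataclass
class LIVConfigSketch:
    featurizer_class: int
    token_mixing: TokenMixingStructure
    channel_mixing: ChannelMixingStructure
    sparsity_mask: bool = False          # assumed default
    nonlinearity: Optional[str] = None   # assumed default
    expansion_factor: int = 1            # assumed type and default
    repeat_factor: int = 1               # assumed type and default


cfg = LIVConfigSketch(
    featurizer_class=1,
    token_mixing=TokenMixingStructure.LOW_RANK,
    channel_mixing=ChannelMixingStructure.GROUPED,
    nonlinearity="softmax",
)
```

Using enums for the two mixing fields means a misspelled structure name fails when the config is written, not later when the model is built.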
- Fork the repository
- Create feature branch (git checkout -b feature/name)
- Commit changes (git commit -am 'Add feature')
- Push branch (git push origin feature/name)
- Open Pull Request
MIT License. See LICENSE file for details.
If you use STAR in your research, please cite:
@article{star2024,
  title={STAR: Structured Token-mixing Adaptive Residual Networks},
  author={[Authors]},
  journal={[Journal]},
  year={2024}
}