Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
Paper
• 2601.04890 • Published
• 42
Mamba is now available in transformers. Thanks to @tridao and @albertgu for this brilliant model, and for the amazing mamba-ssm kernels powering it!