Triton MoE
Self-contained Triton implementation of the HF/OpenAI-style expert MoE layer.
Exports CUDA inference layers:
- triton_moe.layers.MoE: functional benchmark signature.
- triton_moe.layers.OpenaiExperts: HFOpenaiExperts-compatible stateless layer signature.
The implementation uses Triton kernels for route grouping, ragged grouped GEMMs, fused routing-weight application, and token-wise top-k reduction. It does not implement a backward pass, so it is suitable for inference only.
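To make the four stages concrete, here is a minimal pure-Python reference sketch of the same data flow. It is not the Triton code from this repository. The function names (`matvec`, `moe_forward_reference`) and argument layout are hypothetical, and each expert is simplified to a single weight matrix, which is an assumption made for brevity and probably not how the real expert MLP is structured.

```python
# Reference sketch only: plain Python lists stand in for GPU tensors.
# All names below are hypothetical and not part of triton_moe.

def matvec(w, x):
    """Multiply matrix w (out_dim x in_dim) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def moe_forward_reference(tokens, expert_weights, topk_ids, topk_weights):
    """Forward pass of a top-k MoE layer with one linear map per expert.

    tokens:         list of input vectors, one per token
    expert_weights: list of matrices, one per expert (simplifying assumption)
    topk_ids:       per token, the k selected expert indices
    topk_weights:   per token, the k routing weights
    """
    num_experts = len(expert_weights)
    out_dim = len(expert_weights[0])

    # 1. Route grouping: bucket every (token, routing weight) pair by expert,
    #    so each expert sees one contiguous, variable-length group of rows.
    groups = [[] for _ in range(num_experts)]
    for t, (ids, ws) in enumerate(zip(topk_ids, topk_weights)):
        for e, w in zip(ids, ws):
            groups[e].append((t, w))

    out = [[0.0] * out_dim for _ in tokens]
    for e, group in enumerate(groups):
        for t, w in group:
            # 2. Ragged grouped GEMM: group sizes differ per expert.
            y = matvec(expert_weights[e], tokens[t])
            # 3. Routing weight applied to the expert output, and
            # 4. top-k reduction: the k contributions sum into the token row.
            for j in range(out_dim):
                out[t][j] += w * y[j]
    return out
```

A GPU implementation would typically batch step 2 into a few kernel launches over the grouped rows and fold step 3 into the GEMM epilogue. The sketch keeps the stages separate so the output can serve as a small correctness oracle.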





