Converted from a Composer checkpoint.
This build uses Flash Attention 2 (the Triton attention kernels are not used), sets `max_seq_len` to 170, and was trained with `amp_bf16` mixed precision.
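The settings above can be sketched as a config-override dictionary. This is a minimal illustration, assuming MosaicML MPT-style configuration field names (`attn_config`, `attn_impl`, `max_seq_len`); the field names and values other than those stated above are assumptions, and no checkpoint is loaded here.

```python
# Hypothetical override dict mirroring the build settings described above.
# Field names follow the MPT-style config convention (an assumption).
build_settings = {
    "attn_config": {"attn_impl": "flash"},  # Flash Attention 2; Triton kernels unused
    "max_seq_len": 170,                     # maximum sequence length used in training
    "precision": "amp_bf16",                # bfloat16 automatic mixed precision
}
```

When loading the converted checkpoint with `transformers`, overrides like these would typically be applied to the model config (for example via `AutoConfig.from_pretrained(..., trust_remote_code=True)` followed by setting the attributes) before calling `from_pretrained`, with `torch_dtype=torch.bfloat16` to match the `amp_bf16` training precision.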