Converted from a Composer checkpoint.
This build uses Flash Attention 2 (the Triton attention kernels are not used), sets `max_seq_len` to 170, and was trained with `amp_bf16` mixed precision.
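The settings above can be sketched as a config-override dictionary. This is a minimal illustration, assuming MosaicML MPT-style configuration field names (`attn_config`, `attn_impl`, `max_seq_len`); the field names and values other than those stated above are assumptions, and no checkpoint is loaded here.

```python
# Hypothetical override dict mirroring the build settings described above.
# Field names follow the MPT-style config convention (an assumption).
build_settings = {
    "attn_config": {"attn_impl": "flash"},  # Flash Attention 2; Triton kernels unused
    "max_seq_len": 170,                     # maximum sequence length used in training
    "precision": "amp_bf16",                # bfloat16 automatic mixed precision
}
```

When loading the converted checkpoint with `transformers`, overrides like these would typically be applied to the model config (for example via `AutoConfig.from_pretrained(..., trust_remote_code=True)` followed by setting the attributes) before calling `from_pretrained`, with `torch_dtype=torch.bfloat16` to match the `amp_bf16` training precision.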