BuSTv2 β€” Siamese Transformer for AI-Paraphrase Detection

Custom siamese transformer encoder for binary classification of a text pair (source vs. its paraphrase): it determines whether the source text was generated or paraphrased by a language model.

Architecture

| Parameter   | Value                   |
|-------------|-------------------------|
| vocab_size  | 30003                   |
| hid_dim     | 256                     |
| n_layers    | 3                       |
| n_heads     | 8                       |
| pf_dim      | 512                     |
| dropout     | 0.25                    |
| max_length  | 1024                    |
| num_classes | 2                       |
| tokenizer   | usmiva/bert-web-bg-cased |

The encoder is applied to both texts (shared weights), then mean-pooling β†’ concatenation β†’ linear classifier (hid_dim*2 β†’ 2).
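A minimal PyTorch sketch of this pairing logic follows. It assumes a standard `nn.TransformerEncoder` backbone with the hyperparameters from the table above; class and method names are illustrative and are not the released implementation.

```python
import torch
import torch.nn as nn

class BuSTSiamese(nn.Module):
    """Illustrative sketch: shared transformer encoder, mean-pooling, pair classifier."""

    def __init__(self, vocab_size=30003, hid_dim=256, n_layers=3, n_heads=8,
                 pf_dim=512, dropout=0.25, max_length=1024, num_classes=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hid_dim)
        self.pos_emb = nn.Embedding(max_length, hid_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=hid_dim, nhead=n_heads, dim_feedforward=pf_dim,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(hid_dim * 2, num_classes)  # hid_dim*2 -> 2

    def encode(self, ids, mask):
        # Shared-weight encoding followed by masked mean-pooling.
        pos = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        x = self.encoder(x, src_key_padding_mask=~mask.bool())
        mask = mask.unsqueeze(-1).float()
        return (x * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    def forward(self, src_ids, src_mask, para_ids, para_mask):
        # The same encoder is applied to both texts, then the pooled
        # vectors are concatenated and passed to the linear head.
        pooled = torch.cat([self.encode(src_ids, src_mask),
                            self.encode(para_ids, para_mask)], dim=-1)
        return self.classifier(pooled)
```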

Labels

  • 0 β€” human
  • 1 β€” ai

Paper

This model is described in:

BuST: A Siamese Transformer Model for AI Text Detection in Bulgarian
Andrii Maslo, Silvia Gargova. Proceedings of the Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models (OMMM 2025), pages 45–52, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.

Abstract

We introduce BuST (Bulgarian Siamese Transformer), a novel method for detecting machine-generated Bulgarian text using paraphrase-based semantic similarity. Inspired by the RAIDAR approach, BuST employs a Siamese Transformer architecture to compare input texts with their LLM-generated paraphrases, identifying subtle linguistic patterns that indicate synthetic origin. In pilot experiments, BuST achieved 88.79% accuracy and an F1-score of 88.0%, performing competitively with strong baselines. While BERT reached higher raw scores, BuST offers a model-agnostic and adaptable framework for low-resource settings, demonstrating the promise of paraphrase-driven detection strategies.

Citation

@inproceedings{maslo-gargova-2025-bust,
    title     = "{B}u{ST}: A Siamese Transformer Model for {AI} Text Detection in {B}ulgarian",
    author    = "Maslo, Andrii  and Gargova, Silvia",
    booktitle = "Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models",
    month     = sep,
    year      = "2025",
    address   = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url       = "https://aclanthology.org/2025.ommm-1.5/",
    pages     = "45--52"
}