# BuSTv2: Siamese Transformer for AI-Paraphrase Detection
A custom Siamese transformer encoder for binary classification of text pairs (source vs. paraphrase): it determines whether the source text was generated or paraphrased by a language model.
## Architecture
| Parameter | Value |
|---|---|
| vocab_size | 30003 |
| hid_dim | 256 |
| n_layers | 3 |
| n_heads | 8 |
| pf_dim | 512 |
| dropout | 0.25 |
| max_length | 1024 |
| num_classes | 2 |
| tokenizer | usmiva/bert-web-bg-cased |
The encoder is applied to both texts (shared weights), followed by mean-pooling → concatenation → a linear classifier (hid_dim*2 → 2).
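The pipeline above can be sketched in PyTorch. The hyperparameter names follow the table, but the internal layout (learned positional embeddings, `nn.TransformerEncoder`) is an assumption for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class SiameseParaphraseDetector(nn.Module):
    """Sketch of the BuST architecture: one shared transformer encoder
    applied to both texts, mean-pooled, concatenated, linear head."""

    def __init__(self, vocab_size=30003, hid_dim=256, n_layers=3,
                 n_heads=8, pf_dim=512, dropout=0.25, max_length=1024,
                 num_classes=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hid_dim)
        # Learned positional embeddings are an assumption.
        self.pos_emb = nn.Embedding(max_length, hid_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=hid_dim, nhead=n_heads, dim_feedforward=pf_dim,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(hid_dim * 2, num_classes)

    def encode(self, ids):
        # Shared weights: the same encoder is used for both texts.
        pos = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        x = self.encoder(self.tok_emb(ids) + self.pos_emb(pos))
        return x.mean(dim=1)  # mean-pooling over the token dimension

    def forward(self, src_ids, para_ids):
        pooled = torch.cat([self.encode(src_ids),
                            self.encode(para_ids)], dim=-1)
        return self.classifier(pooled)  # (batch, num_classes) logits
```

Because each text is pooled independently before concatenation, the two inputs may have different lengths.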
## Labels
- 0 → human
- 1 → ai
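Mapping the classifier's logits back to these label names is a single argmax; the tensor below is a made-up example, not model output:

```python
import torch

LABELS = {0: "human", 1: "ai"}

logits = torch.tensor([[0.2, 1.3]])          # one pair, two class scores
pred = LABELS[logits.argmax(dim=-1).item()]  # -> "ai"
```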
## Paper
This model is described in:
BuST: A Siamese Transformer Model for AI Text Detection in Bulgarian
Andrii Maslo, Silvia Gargova. Proceedings of the Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models (OMMM 2025), pages 45–52, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- ACL Anthology: https://aclanthology.org/2025.ommm-1.5/
- PDF: https://aclanthology.org/2025.ommm-1.5.pdf
## Abstract
We introduce BuST (Bulgarian Siamese Transformer), a novel method for detecting machine-generated Bulgarian text using paraphrase-based semantic similarity. Inspired by the RAIDAR approach, BuST employs a Siamese Transformer architecture to compare input texts with their LLM-generated paraphrases, identifying subtle linguistic patterns that indicate synthetic origin. In pilot experiments, BuST achieved 88.79% accuracy and an F1-score of 88.0%, performing competitively with strong baselines. While BERT reached higher raw scores, BuST offers a model-agnostic and adaptable framework for low-resource settings, demonstrating the promise of paraphrase-driven detection strategies.
## Citation

```bibtex
@inproceedings{maslo-gargova-2025-bust,
  title     = "{B}u{ST}: A Siamese Transformer Model for {AI} Text Detection in {B}ulgarian",
  author    = "Maslo, Andrii and Gargova, Silvia",
  booktitle = "Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models",
  month     = sep,
  year      = "2025",
  address   = "Varna, Bulgaria",
  publisher = "INCOMA Ltd., Shoumen, Bulgaria",
  url       = "https://aclanthology.org/2025.ommm-1.5/",
  pages     = "45--52"
}
```
## Base model
usmiva/bert-web-bg-cased