Democracy Detector — Multilingual Modern Bert - Binary Classifier
Task
Binary classification of sentences from political party press releases:
- 0 — Not democracy: Sentence does not contain a democratic appeal.
- 1 — Democracy: Sentence contains a democratic appeal (any rhetorical invocation of democracy, democratic norms, institutions, or principles).
This is Stage 1 of a two-stage classification pipeline:
- Stage 1 (this model): Fast binary detection of democracy-related sentences.
- Stage 2 (GPT-based): Strategy classification of detected sentences (self-assertion, accusation, counter-claim, agenda-setting).
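The routing between the two stages can be sketched as below. This is a minimal illustration, not the project's actual pipeline code: `run_pipeline`, `detect`, `classify_strategy`, and the `0.5` cutoff are hypothetical names and values chosen for the example, with both stages passed in as plain callables.

```python
from typing import Callable, Dict, List

# The four Stage 2 strategy labels named in this card.
STRATEGIES = ("self-assertion", "accusation", "counter-claim", "agenda-setting")


def run_pipeline(
    sentences: List[str],
    detect: Callable[[str], float],
    classify_strategy: Callable[[str], str],
    threshold: float = 0.5,  # assumed cutoff, not documented in this card
) -> List[Dict[str, object]]:
    """Send every sentence through Stage 1, and only flagged ones through Stage 2."""
    results = []
    for sentence in sentences:
        prob = detect(sentence)  # Stage 1: probability of class 1 (Democracy)
        strategy = None
        if prob >= threshold:
            # Stage 2 runs only on sentences that Stage 1 flagged.
            strategy = classify_strategy(sentence)
            if strategy not in STRATEGIES:
                raise ValueError(f"unexpected strategy label: {strategy!r}")
        results.append({"sentence": sentence, "prob": prob, "strategy": strategy})
    return results
```

Because the slower GPT-based stage is only called for sentences above the cutoff, the cost of Stage 2 scales with the number of detected sentences rather than with the size of the corpus.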
Model Details
- Base model: jhu-clsp/mmBERT-base
- Fine-tuned on: 3,654 hand-coded sentences (training split) from the PartyPress dataset
- Languages: German, Swedish, English, Danish, Polish and Spanish (multilingual press releases)
- Max sequence length: 104 tokens
Training Configuration
| Parameter | Value |
|---|---|
| Learning rate | 0.0001 |
| Epochs | 3 |
| Batch size | 16 |
| Warmup ratio | 0.1 |
| Weight decay | 0.01 |
| Scheduler | cosine |
| Class weights | True |
| Focal loss | False (gamma=2.0) |
| Precision | fp16 |
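To make the learning rate, warmup ratio, and cosine scheduler concrete, the sketch below computes the learning rate at a given optimizer step. It assumes one optimizer step per batch of 16 over the 3,654 training sentences (no gradient accumulation, last partial batch kept) and a half-cosine decay to zero after linear warmup; the card does not confirm these details, and `lr_at` is a hypothetical helper.

```python
import math

BASE_LR = 1e-4
EPOCHS = 3
BATCH_SIZE = 16
WARMUP_RATIO = 0.1
TRAIN_SIZE = 3654  # size of the training split reported in this card

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then half-cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run has 687 optimizer steps, of which the first 69 are warmup.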
Training Data
| Split | Total | Democracy (1) | Not democracy (0) |
|---|---|---|---|
| Train | 3654 | 1512 | 2142 |
| Val | 731 | 205 | 526 |
| Test | 412 | 169 | 243 |
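The configuration enables class weights, but the card does not say how they were computed. One common choice is inverse-frequency weighting, the same formula as scikit-learn's "balanced" heuristic; the sketch below applies it to the training-split counts as an assumption, with `balanced_class_weights` as a hypothetical helper.

```python
# Training-split counts from the table above.
train_counts = {0: 2142, 1: 1512}  # 0 = Not democracy, 1 = Democracy


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(train_counts)
for label, weight in sorted(weights.items()):
    print(label, round(weight, 3))
```

With these counts the minority "Democracy" class gets a weight of about 1.208 and the majority class about 0.853, so errors on democracy sentences count somewhat more in the loss.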
Performance (Test Set)
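| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Not democracy | 0.907 | 0.918 | 0.912 | 243 |
| Democracy | 0.880 | 0.864 | 0.872 | 169 |
| Accuracy | | | 0.896 | 412 |
| Macro avg | 0.893 | 0.891 | 0.892 | 412 |
| Weighted avg | 0.895 | 0.896 | 0.895 | 412 |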
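As a sanity check on how the reported figures relate, the sketch below recomputes them from confusion-matrix counts. The counts themselves are not given in the source: they are inferred from the reported recall and support (223 of 243 "Not democracy" and 146 of 169 "Democracy" sentences correct) and should be read as an assumption.

```python
# Counts inferred from the reported recall and support, not stated in the source.
tn, fp = 223, 20   # true class 0: predicted 0 / predicted 1
fn, tp = 23, 146   # true class 1: predicted 0 / predicted 1


def prf(correct, predicted_total, actual_total):
    """Precision, recall, and F1 for one class."""
    precision = correct / predicted_total
    recall = correct / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


not_dem = prf(tn, tn + fn, tn + fp)
dem = prf(tp, tp + fp, tp + fn)
accuracy = (tn + tp) / (tn + fp + fn + tp)
macro_f1 = (not_dem[2] + dem[2]) / 2

print("Not democracy", [round(x, 3) for x in not_dem])
print("Democracy    ", [round(x, 3) for x in dem])
print("accuracy", round(accuracy, 3), "macro F1", round(macro_f1, 3))
```

Rounded to three decimals, these inferred counts reproduce the per-class precision, recall, and F1, the accuracy, and the macro F1 reported above.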
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "LBenoit/democracy-mmBert"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()  # inference mode: disables dropout

# Assumed decision threshold on the class-1 probability; this card does not document one.
threshold = 0.5

# English: "The AfD endangers our democratic constitutional order."
sentence = "Die AfD gefährdet unsere demokratische Grundordnung."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=104)

with torch.no_grad():
    logits = model(**inputs).logits

# Index 1 is the "Democracy" class.
prob = torch.softmax(logits, dim=-1)[0, 1].item()
label = "Democracy" if prob >= threshold else "Not democracy"
print(f"{label} (p={prob:.3f})")
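The usage snippet compares the class-1 probability against a `threshold`, but this card does not say how that value is chosen. One possible approach, shown here purely as an assumption, is to pick the cutoff that maximizes F1 for the "Democracy" class on held-out validation predictions. The helper names and the toy scores below are invented for illustration.

```python
from typing import Sequence, Tuple


def f1_at(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 for class 1 when predicting 1 whenever prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    probs: Sequence[float], labels: Sequence[int], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, f1) with the highest F1; ties go to the lower threshold."""
    scored = [(f1_at(probs, labels, t), t) for t in candidates]
    best_f1, best_t = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return best_t, best_f1


# Toy validation scores, invented purely for illustration.
val_probs = [0.05, 0.20, 0.35, 0.55, 0.70, 0.95]
val_labels = [0, 0, 1, 0, 1, 1]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
print(best_threshold(val_probs, val_labels, grid))
```

Since this model feeds a second stage, a lower threshold that favors recall may be preferable to the F1-optimal one; that trade-off depends on how costly Stage 2 calls are.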
Citation
Part of a PhD dissertation on democratic credibility competition in European party systems.
Author
Léandre Benoit
