Prompt Injection DeBERTa
Fine-tuned DeBERTa-based prompt injection detection
Build secure, reliable, long-lived AI systems focused on safety, reasoning, and developer tooling.
AI Security • Prompt Defense • LLM Safety
Building secure, reliable AI systems focused on prompt security, adversarial robustness, and practical safety tooling.
A prompt injection and adversarial intent detection framework that classifies malicious jailbreak patterns across real-world and large-scale synthetic attack types.
Datasets: neuralchemy/prompt-injection-Threat-Matrix, a curated, leakage-free classification dataset of 32,000+ entries labeled against a 5-dimensional security ontology (dimensions include Intent, Technique, and Severity).
neuralchemy/prompt-injection-dataset, 6,000+ prompt injection and benign samples collected from realistic attack scenarios.
DeBERTa Fine-Tuned Model: neuralchemy/prompt-injection-deberta, a Transformer-based prompt injection classifier.
DistilBERT Base Model: neuralchemy/distilbert-base-threat-matrix, a Transformer classifier with a reported 99.4% F1 score, optimized for fast, accurate prompt intent gating.
Classical ML Models: neuralchemy/prompt-injection-detector, ultra-lightweight machine learning classifiers (random forest and logistic regression) for legacy or offline prompt risk detection.
Live Demo Space: Prompt-injection-DeBERTa, an interactive inference demo for prompt safety classification.
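As a rough illustration of how one of the classifiers listed above might sit in front of an LLM, here is a minimal gating sketch. It assumes the DeBERTa model loads through the standard Hugging Face `text-classification` pipeline, which this page does not confirm. The helper names (`load_classifier`, `is_blocked`), the `"INJECTION"` label string, and the 0.5 threshold are all hypothetical; check the model card for the labels the model actually emits.

```python
from typing import Callable, Dict, List

# Model id taken from the list above. Whether it loads with the generic
# pipeline, and which label names it returns, are assumptions.
MODEL_ID = "neuralchemy/prompt-injection-deberta"

# Shape of a text-classification pipeline result for one input string:
# a list holding one {"label": ..., "score": ...} dict.
Classifier = Callable[[str], List[Dict[str, object]]]


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the gating logic below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def is_blocked(
    classifier: Classifier,
    prompt: str,
    injection_label: str = "INJECTION",  # hypothetical label name
    threshold: float = 0.5,  # hypothetical cut-off, tune on validation data
) -> bool:
    """Return True when the top prediction is the injection label at or above the threshold."""
    top = classifier(prompt)[0]
    label_matches = str(top["label"]).upper() == injection_label.upper()
    return label_matches and float(top["score"]) >= threshold
```

A call such as `is_blocked(load_classifier(), user_prompt)` would then decide whether to forward the prompt. Passing the classifier in as an argument keeps the gate independent of any one model, so the DistilBERT or classical models could be swapped in behind the same interface.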
Advancing AI security through enterprise-grade open-source datasets, robust model deployment, and adversarial safety research.
Building safer AI systems through open security research. 🚀