Nikita Kezins
entfane
AI & ML interests
LLM post-training, adversarial training, safety, knowledge transfer
Recent Activity
updated a dataset 2 days ago: entfane/jailbreaks-only
published a dataset 2 days ago: entfane/jailbreaks-only
updated a model 2 days ago: entfane/llama-guard-binary
Organizations
Models 24
entfane/llama-guard-binary
Text Classification • 0.3B • Updated • 63
entfane/Toxic_Llama8B
Text Classification • 8B • Updated • 124
entfane/gpt2_constitutional_classifier_violence
Text Classification • 0.1B • Updated • 37
entfane/bert_cyberharm
Text Classification • 0.1B • Updated • 34
entfane/toxic_gemma2b_classifier
3B • Updated • 100
entfane/toxic_gpt2_lm_value_head
0.1B • Updated • 3
entfane/gpt2_constitutional_classifier_with_value_head
Text Generation • 0.1B • Updated • 5
entfane/gpt2_constitutional_classifier
Text Classification • 0.1B • Updated • 164
entfane/baby-math-135m
0.1B • Updated • 2
entfane/coder-reasoner-7Bv8
Text Generation • 8B • Updated • 4
Datasets 14
entfane/jailbreaks-only
Viewer • Updated • 666 • 36
entfane/construction_points
Viewer • Updated • 10k • 170
entfane/violent_eval
Viewer • Updated • 22.4k • 42
entfane/harmful_subsets
Viewer • Updated • 571k • 7
entfane/preprocessed_toxigen
Viewer • Updated • 10.1k • 180
entfane/toxic_classification
Viewer • Updated • 38.9k • 7
entfane/toxic_chat
Viewer • Updated • 1.25M • 14
entfane/EmotionAtlas-chat
Viewer • Updated • 3.3k • 11
entfane/EmotionAtlas
Viewer • Updated • 3.3k • 8
entfane/professor-mathematics
Viewer • Updated • 64.2k • 6 • 1