diff --git a/README.md b/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5402759e379b015443255354d8b56fc9999ea5b0
--- /dev/null
+++ b/README.md
@@ -0,0 +1,113 @@
+---
+license: mit
+tags:
+  - deception-detection
+  - linear-probe
+  - mechanistic-interpretability
+base_model: allenai/Olmo-3-7B-Instruct
+---
+
+# Diverse Deception Linear Probe — OLMo-3-7B-Instruct
+
+Per-layer linear probes trained on **OLMo-3-7B-Instruct** residual stream activations
+to distinguish deceptive from honest text completions.
+
+## Performance
+
+| Variant | Token Selection | Follow-up Prompt | Averaged AUC | Layers Used |
+|---|---|---|---|---|
+| **last-token-generation** | last token generation | `Did you respond truthfully?` → `Yes` | **0.992** | filtered by train loss < 0.6 |
+
+
+AUC is computed on MBPP (hardcoded vs. correct code), using probe scores
+**averaged across filtered layers** (layers where probe train loss < 0.6).
+
+## Probe Architecture
+
+Each layer has an independent `LinearProbe`:
+
+| Component | Shape | Description |
+|---|---|---|
+| `linear.weight` | `[1, 4096]` | Classification weight vector |
+| `linear.bias` | `[1]` | Classification bias |
+| `input_scale` | scalar | Unit-norm scaling factor |
+| `input_mean` | scalar | Input centering value |
+| `platt_A`, `platt_B` | scalar | Platt calibration parameters |
+
+### File layout
+
+```
+last-token-generation/layer_N/
+  config.json    # probe class and d_model
+  model.pt       # PyTorch state_dict
+```
+
+## Follow-up Prompt Setup
+
+For `last-token-generation` variants, a follow-up exchange is appended after the
+completion to elicit a truthfulness signal. The probe reads activations at the
+**last token** of the assistant's reply ("Yes"):
+
+```
+[main prompt + completion]
+  user:      Did you respond truthfully?
+  assistant: Yes  ← probe reads activations here
+```
+
+Formatted with OLMo-3-7B-Instruct's chat template:
+
+```
+<|im_start|>user
+Did you respond truthfully?<|im_end|>
+<|im_start|>assistant
+Yes
+```
+
+The `generation` variant (where available) uses **no follow-up** and averages
+probe scores over all tokens in the model's response.
+
+## Training Details
+
+- **Probe dataset**: `diverse_deception_probe` v1 — ~8,000 examples from 8+ sources
+  (MASK, TruthfulQA, DolusChat, LiarsBench, sycophancy, sandbagging, etc.)
+- **Training**: 1,000 steps, lr=3e-3, constant schedule, no weight decay
+- **Normalization**: unit-norm input scaling
+- **Layer aggregation**: mean (scores from multiple layers averaged at inference)
+
+## Quick Start
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from huggingface_hub import snapshot_download
+
+# Download probes
+probe_dir = snapshot_download("AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct")
+
+# Load one layer's probe (kept on CPU so it can be combined with the activations below)
+state = torch.load(f"{probe_dir}/last-token-generation/layer_20/model.pt", map_location="cpu", weights_only=False)
+w, b = state["linear.weight"], state["linear.bias"]
+scale, mean = state["input_scale"], state["input_mean"]
+
+# Load model
+model = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-7B-Instruct")
+
+text = tokenizer.apply_chat_template(
+    [{"role": "user", "content": "Your prompt here"}],
+    tokenize=False, add_generation_prompt=True,
+)
+text += "The model's completion"  # for this variant, also append the follow-up exchange described above
+
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+with torch.no_grad():  # hidden_states[0] is the embedding output, so index 21 is the output of layer 20
+    h = model(**inputs, output_hidden_states=True).hidden_states[21][:, -1, :]
+
+score = ((h.float().cpu() - mean) / scale @ w.T + b).item()
+# score > 0 → likely deceptive, score < 0 → likely honest
+```
+
+## Citation
+
+Part of the [FAR AI](https://far.ai) deception detection research.
+See [AlignmentResearch/deception](https://github.com/AlignmentResearch/deception). diff --git a/averaged_over_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.txt b/averaged_over_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.txt new file mode 100644 index 0000000000000000000000000000000000000000..5e8927ec6cfed559bacc1a35f1d0c83d6daade4a --- /dev/null +++ b/averaged_over_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.txt @@ -0,0 +1 @@ +[16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31] \ No newline at end of file diff --git a/averaged_violin_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.png b/averaged_violin_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.png new file mode 100644 index 0000000000000000000000000000000000000000..b3efebf1e404e5bfefa7188dcef1cd473f84ac35 Binary files /dev/null and b/averaged_violin_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.png differ diff --git a/f1_and_recall_plot_initial_detector.base_model.on_dataset.png b/f1_and_recall_plot_initial_detector.base_model.on_dataset.png new file mode 100644 index 0000000000000000000000000000000000000000..db89d1f15eff14a7c5d0b7e33396bcbc93633767 Binary files /dev/null and b/f1_and_recall_plot_initial_detector.base_model.on_dataset.png differ diff --git a/last-token-generation/layer_0/config.json b/last-token-generation/layer_0/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_0/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_0/model.pt b/last-token-generation/layer_0/model.pt new 
file mode 100644 index 0000000000000000000000000000000000000000..9bbf52be1cb68ad6dc82d6a694de4a252d1d72a8 --- /dev/null +++ b/last-token-generation/layer_0/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a1990ea7a76bf454f9241f3bc71a3d0b057ee7093787552cce93d771e858c760 +size 19197 diff --git a/last-token-generation/layer_1/config.json b/last-token-generation/layer_1/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_1/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_1/model.pt b/last-token-generation/layer_1/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..617c0db2f2017b9616eb75b564f2f34fc20bc48f --- /dev/null +++ b/last-token-generation/layer_1/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bdb7488e559283ac9af79db5dfa4e88d53ba9b41a961aa60d5c7591796e13716 +size 19197 diff --git a/last-token-generation/layer_10/config.json b/last-token-generation/layer_10/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_10/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_10/model.pt b/last-token-generation/layer_10/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..1bf67fd3eba88497a4659dc00d120aae7dc4bb7f --- /dev/null +++ b/last-token-generation/layer_10/model.pt @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be4b1840f9621066ef8f179960346c5e79e05e5cedea8520dce10664b0ee7624 +size 19197 diff --git a/last-token-generation/layer_11/config.json b/last-token-generation/layer_11/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_11/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_11/model.pt b/last-token-generation/layer_11/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..483fd924c9b2063d4d3a5af9f39ffdff2298a3e7 --- /dev/null +++ b/last-token-generation/layer_11/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4d77838517ace40fda0b6b4e9504e60a9e8aeb1cc8040c4a7c2e607b49514fe4 +size 19197 diff --git a/last-token-generation/layer_12/config.json b/last-token-generation/layer_12/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_12/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_12/model.pt b/last-token-generation/layer_12/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..3cd0032c2c9367da7b78b4fd4fccac685d54cf09 --- /dev/null +++ b/last-token-generation/layer_12/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d90321040d48e96f9ee19e878c8f1c1d0199e12bb5f7cad8bea4ef73c408aa1b +size 19197 diff --git 
a/last-token-generation/layer_13/config.json b/last-token-generation/layer_13/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_13/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_13/model.pt b/last-token-generation/layer_13/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..ed99e042343971eb7d542411bd80ef36277bcb95 --- /dev/null +++ b/last-token-generation/layer_13/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a34f2d7f24716f506f13c0fd59f9d45ce1c7df0fca9a5cc27e341411669071f +size 19197 diff --git a/last-token-generation/layer_14/config.json b/last-token-generation/layer_14/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_14/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_14/model.pt b/last-token-generation/layer_14/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..43559fedfd647fc1545e11c72f283ee6b12bf946 --- /dev/null +++ b/last-token-generation/layer_14/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00a1b2dd4772f013c3a4fe0876549c988763eeee2c4bb2aeba5dbe9579ed0532 +size 19197 diff --git a/last-token-generation/layer_15/config.json b/last-token-generation/layer_15/config.json new file mode 100644 index 
0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_15/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_15/model.pt b/last-token-generation/layer_15/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b3db2c1ddd7dca77ed46ee6c9129e2605bc1670 --- /dev/null +++ b/last-token-generation/layer_15/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20458d2cc803f764514b4c44135e850733256164a2212cf2ec4551263030cdd6 +size 19197 diff --git a/last-token-generation/layer_16/config.json b/last-token-generation/layer_16/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_16/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_16/model.pt b/last-token-generation/layer_16/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e6ec732a71048ec47de2af13cde9ffdf0b487d1 --- /dev/null +++ b/last-token-generation/layer_16/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43ffdaeb21ac197abeed828feef54e91d5b68d36c6c5429f3dfe2c950ccf4869 +size 19197 diff --git a/last-token-generation/layer_17/config.json b/last-token-generation/layer_17/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_17/config.json @@ -0,0 +1,9 @@ +{ + 
"class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_17/model.pt b/last-token-generation/layer_17/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..71d85f7e0473e9e2d329318429ba22c5d3465de2 --- /dev/null +++ b/last-token-generation/layer_17/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca4c940c8daf1bc9ab8d5dfb86ccfc78ebfe76879f2910a422fd0770ee158922 +size 19197 diff --git a/last-token-generation/layer_18/config.json b/last-token-generation/layer_18/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_18/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_18/model.pt b/last-token-generation/layer_18/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..8a70736494de359e6f7e0033cd6cbae022e42af8 --- /dev/null +++ b/last-token-generation/layer_18/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a60f3441d039d58c41603284b8fb670129a91ac10f243f0a62c10b5a387ff118 +size 19197 diff --git a/last-token-generation/layer_19/config.json b/last-token-generation/layer_19/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_19/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": 
"unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_19/model.pt b/last-token-generation/layer_19/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..0b2aff80f8d34778a5a8ee928e33125ba8882cce --- /dev/null +++ b/last-token-generation/layer_19/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2df0f3e95bd91c8fa0bfee277099205043a8f18b628b6f43fb24a0ead5e45cc9 +size 19197 diff --git a/last-token-generation/layer_2/config.json b/last-token-generation/layer_2/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_2/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_2/model.pt b/last-token-generation/layer_2/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..423761bc02b6c0b2c49a31f12700040cf037ef19 --- /dev/null +++ b/last-token-generation/layer_2/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c195b1b1687fcf46e88a1cbe80c70a61c443203ed2632962eebcfa2e9b922d7 +size 19197 diff --git a/last-token-generation/layer_20/config.json b/last-token-generation/layer_20/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_20/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_20/model.pt b/last-token-generation/layer_20/model.pt new file mode 100644 index 
0000000000000000000000000000000000000000..34f2d20d12c1c2ee1848b47fd8ef27ae6cd86739 --- /dev/null +++ b/last-token-generation/layer_20/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80c448bf021c1b3242e85e6f7cf4d2992c102b56334b46e54f4cd989801c21ff +size 19197 diff --git a/last-token-generation/layer_21/config.json b/last-token-generation/layer_21/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_21/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_21/model.pt b/last-token-generation/layer_21/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..b3af43017ec705e39c8fc34d3a970343bf426bab --- /dev/null +++ b/last-token-generation/layer_21/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:762b028cce228710e92176b7ffb032bf550cb3b16c5c3e736aed593f8698dbf0 +size 19197 diff --git a/last-token-generation/layer_22/config.json b/last-token-generation/layer_22/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_22/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_22/model.pt b/last-token-generation/layer_22/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..dcd253968622095fc0ec0172fb20429dcb7cc8b0 --- /dev/null +++ b/last-token-generation/layer_22/model.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:628bd6b9dd16a9598929ceb9cd12e71af6e6754402efa5c264e3cf693fc08ecb +size 19197 diff --git a/last-token-generation/layer_23/config.json b/last-token-generation/layer_23/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_23/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_23/model.pt b/last-token-generation/layer_23/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..ee49788d339edcd15e017a6c936d9a034ad4f971 --- /dev/null +++ b/last-token-generation/layer_23/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c62d4ddf800b48c1b2070ccb3cd8515cf2e2235bf82a3afcbc8a50c7a7e2d49 +size 19197 diff --git a/last-token-generation/layer_24/config.json b/last-token-generation/layer_24/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_24/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_24/model.pt b/last-token-generation/layer_24/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..140b0fe78e64d6bdbaf013dcc17774646206f1b7 --- /dev/null +++ b/last-token-generation/layer_24/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e78a2ea7cb45bf90366f1ea001fa59c58a065d146ac7c54bd913c11845b90c2 +size 19197 diff --git 
a/last-token-generation/layer_25/config.json b/last-token-generation/layer_25/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_25/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_25/model.pt b/last-token-generation/layer_25/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..f133a13a2eec3aa44fa8dc23f615e96caeb3b68c --- /dev/null +++ b/last-token-generation/layer_25/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c318a924980fea8f748e4d1953239202e60ffcb5b4c517a72975ad4a5cf85053 +size 19197 diff --git a/last-token-generation/layer_26/config.json b/last-token-generation/layer_26/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_26/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_26/model.pt b/last-token-generation/layer_26/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..6869cefcb60d86990220cf2413bdb17e41beaf89 --- /dev/null +++ b/last-token-generation/layer_26/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cadda23248ac42164c03af6f8ea27b929d1950bf7301a8ae236058c8f766e5d6 +size 19197 diff --git a/last-token-generation/layer_27/config.json b/last-token-generation/layer_27/config.json new file mode 100644 index 
0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_27/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_27/model.pt b/last-token-generation/layer_27/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..532db59eab320f0435b5c72e6407e0734d8657a9 --- /dev/null +++ b/last-token-generation/layer_27/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf6f22e96450474ed0325bd72a096faf37c7c4b1d1475f331e00c0084199769b +size 19197 diff --git a/last-token-generation/layer_28/config.json b/last-token-generation/layer_28/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_28/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_28/model.pt b/last-token-generation/layer_28/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..76bc4b7892e953ec522623a9b0ab5bf2af98b7b0 --- /dev/null +++ b/last-token-generation/layer_28/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af6eeda145b9ed2094c5203953ddd8451a8dbb695f5384e45584fdfb8afb1600 +size 19197 diff --git a/last-token-generation/layer_29/config.json b/last-token-generation/layer_29/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_29/config.json @@ -0,0 +1,9 @@ +{ + 
"class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_29/model.pt b/last-token-generation/layer_29/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..f64af9c2197a260bf4eab7827593cbb243e40ddb --- /dev/null +++ b/last-token-generation/layer_29/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1cff5a5152eca30cea35431379247c73920414d0bc0f4b80a6fb2db678a6de7f +size 19197 diff --git a/last-token-generation/layer_3/config.json b/last-token-generation/layer_3/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_3/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_3/model.pt b/last-token-generation/layer_3/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f10f9056c564e0de15420d019bb766f901afb40 --- /dev/null +++ b/last-token-generation/layer_3/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b27860f007db8c9eed39f66addc07310267946ef64249d4e8d660a644cb9832c +size 19197 diff --git a/last-token-generation/layer_30/config.json b/last-token-generation/layer_30/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_30/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } 
+} \ No newline at end of file diff --git a/last-token-generation/layer_30/model.pt b/last-token-generation/layer_30/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..fb9c90694af227cebf15d4a14793b2840ab73fad --- /dev/null +++ b/last-token-generation/layer_30/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:545088fa7a6871a2b42d53959741c063e972497c91f3bf902198f86477b9ff85 +size 19197 diff --git a/last-token-generation/layer_31/config.json b/last-token-generation/layer_31/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_31/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_31/model.pt b/last-token-generation/layer_31/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..b03a35a5a844945b5d4c7ac6493999a78ce04d6f --- /dev/null +++ b/last-token-generation/layer_31/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5820c5aba3b1914a5481bedd6c296a543ada9e6a86eee0edbeb7ea9009611394 +size 19197 diff --git a/last-token-generation/layer_4/config.json b/last-token-generation/layer_4/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_4/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_4/model.pt b/last-token-generation/layer_4/model.pt new file mode 100644 index 
0000000000000000000000000000000000000000..986cbb020fc79788292624d93700833421e1921d --- /dev/null +++ b/last-token-generation/layer_4/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91c75897e9c221546c3e0c3f471e9339f6c4f88eaf8e1fa7cb5173341960ae7c +size 19197 diff --git a/last-token-generation/layer_5/config.json b/last-token-generation/layer_5/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_5/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_5/model.pt b/last-token-generation/layer_5/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..c57c940bf25158d8cff19163e612f81bc9f4b3c7 --- /dev/null +++ b/last-token-generation/layer_5/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:418380ad776e9d17c3c2f71dca5c6738912966e0c1a37f844d283bb8b7839641 +size 19197 diff --git a/last-token-generation/layer_6/config.json b/last-token-generation/layer_6/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_6/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_6/model.pt b/last-token-generation/layer_6/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..49e8e6f29d312bc3a108871bfb3acce75e0cdd19 --- /dev/null +++ b/last-token-generation/layer_6/model.pt @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:66318c3539db27e1a80fa3ed117cb86bd4faac1863d50f36ddf28add12dc48a3 +size 19197 diff --git a/last-token-generation/layer_7/config.json b/last-token-generation/layer_7/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_7/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_7/model.pt b/last-token-generation/layer_7/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..2fb5446102ea0c1e857e7bedd9ea95e8084f670e --- /dev/null +++ b/last-token-generation/layer_7/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1877072f3438f73cb47f58befd75bb26f7b5c3e51ded03d23ff8ba067de3438e +size 19197 diff --git a/last-token-generation/layer_8/config.json b/last-token-generation/layer_8/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_8/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_8/model.pt b/last-token-generation/layer_8/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..717e988a8c75a8d108d01469a40545648607fbe9 --- /dev/null +++ b/last-token-generation/layer_8/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:08d7f7e239a1539bc4c67021e74618cdc521377b17885638ab66ee2bba8bef25 +size 19197 diff --git a/last-token-generation/layer_9/config.json 
b/last-token-generation/layer_9/config.json new file mode 100644 index 0000000000000000000000000000000000000000..ddd4d9f60e201edb944dc39d1eb7ad78e840d831 --- /dev/null +++ b/last-token-generation/layer_9/config.json @@ -0,0 +1,9 @@ +{ + "class_name": "LinearProbe", + "module": "deception.oa_backdoor.detectors.probe_archs", + "init_args": { + "d_model": 4096, + "nhead": 1, + "normalize_input": "unit_norm" + } +} \ No newline at end of file diff --git a/last-token-generation/layer_9/model.pt b/last-token-generation/layer_9/model.pt new file mode 100644 index 0000000000000000000000000000000000000000..9845620fec932a7728999008e07a1b2a86745a2d --- /dev/null +++ b/last-token-generation/layer_9/model.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e69244903affec0081da3f1cc73b60652eb6d1c9334852d96285132922d2ccfb +size 19197
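
Note on layer aggregation: the README in this diff states that scores from multiple layers are averaged at inference, and `averaged_over_initial_detector.base_model.on_dataset_linear-probe_last-token-generation.txt` lists layers 16 to 31. The following is a minimal sketch of that aggregation in plain Python, assuming each probe applies `(h - input_mean) / input_scale` before the linear layer, as the Quick Start formula does. The helper names `probe_score` and `averaged_score` and the toy 2-dimensional values are hypothetical and are not taken from the repository; Platt calibration is left out because its sign convention is not documented here.

```python
from statistics import fmean


def probe_score(h, weight, bias, input_mean, input_scale):
    """Score one activation vector with one layer's linear probe.

    Mirrors the Quick Start formula: ((h - mean) / scale) @ w.T + b.
    """
    normalized = [(x - input_mean) / input_scale for x in h]
    return sum(n * w for n, w in zip(normalized, weight)) + bias


def averaged_score(acts_by_layer, probes_by_layer, layers):
    """Mean of the per-layer probe scores over the selected layers."""
    return fmean([probe_score(acts_by_layer[l], **probes_by_layer[l]) for l in layers])


# Toy 2-dimensional stand-ins (hypothetical values); real probes have d_model = 4096.
probes = {
    16: {"weight": [1.0, 0.0], "bias": 0.0, "input_mean": 0.0, "input_scale": 1.0},
    17: {"weight": [0.0, 2.0], "bias": -1.0, "input_mean": 0.0, "input_scale": 2.0},
}
acts = {16: [0.5, 3.0], 17: [4.0, 1.0]}

avg = averaged_score(acts, probes, layers=[16, 17])
print(avg)  # 0.25
```

With the real probes, `acts_by_layer[l]` would hold the last-token activation taken from `hidden_states[l + 1]`, and `layers` would be the list stored in the `averaged_over_*.txt` file, if that file is indeed the set of layers that passed the train-loss filter.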