Validation data NOTICE
======================
The file `val_200.jsonl` in this directory contains 200 publicly posted Reddit
comments. It is included as a small held-out evaluation set with per-token
`r_true` labels, so that reviewers can reproduce the metrics reported in §5 of
the paper without rerunning training.
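
For reviewers scripting the reproduction, a minimal Python sketch for loading the file follows. It assumes only that each line is one JSON object whose `r_true` field is a list of per-token labels; no other field names are assumed, and `DATA.md` remains the reference for the actual schema. The function name `load_val_records` is hypothetical and is not part of this release.

```python
import json
from pathlib import Path


def load_val_records(path):
    """Read a JSON Lines file: one JSON object per non-empty line.

    Assumption (not confirmed by this NOTICE): every record has an
    `r_true` field holding a list of per-token labels. See DATA.md
    for the real schema.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            # Fail loudly if the assumed per-token label list is missing.
            if not isinstance(record.get("r_true"), list):
                raise ValueError(f"line {line_no}: expected a list in 'r_true'")
            records.append(record)
    return records
```

Run from this directory, `load_val_records("val_200.jsonl")` should return 200 records if the one-comment-per-line assumption holds.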

Copyright and licensing
-----------------------
- The comment text remains the intellectual property of the original Reddit
  authors. It is included here under a research / fair-use rationale, solely
  to enable reproduction of published evaluation numbers.
- The `r_true` annotations, the schema, and the file packaging are released
  under Apache-2.0 (see ../LICENSE).
- This sample is NOT a license to redistribute the underlying Reddit content
  for any other purpose.

Removal requests
----------------
If you are the author of one of these comments and would like it removed
from the distribution, contact the corresponding author listed in
`../paper.pdf`. Removals will be honored in the next release.

Reproducing the full corpus
---------------------------
The 1M-sample training corpus is not redistributed here. See `DATA.md` for the
schema and the steps required to reconstruct it from the public Reddit API.