Validation data NOTICE
======================
The file `val_200.jsonl` in this directory contains 200 publicly posted Reddit
comments, included as a small held-out evaluation set with per-token `r_true`
labels so that reviewers can reproduce paper §5 metrics without rerunning
training.
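As a convenience for reviewers, the following is a minimal sketch of loading the file and checking that every record carries labels. It assumes one JSON object per line with a top-level `r_true` key; that layout and the helper name `load_validation_set` are assumptions for illustration, not part of the release, and `DATA.md` describes the schema.

```python
import json
from pathlib import Path


def load_validation_set(path):
    """Read a JSONL file, one JSON object per non-empty line.

    Assumption: every record exposes its per-token labels under a
    top-level "r_true" key. Adjust this if the schema differs.
    """
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if "r_true" not in record:
                raise ValueError(f"line {line_no}: missing 'r_true' labels")
            records.append(record)
    return records


if __name__ == "__main__":
    path = Path("val_200.jsonl")
    if path.exists():
        records = load_validation_set(path)
        # This notice states that the file holds 200 comments.
        print(f"loaded {len(records)} records")
```

Failing fast on a record without `r_true` is a deliberate choice here: a silently skipped record would change the evaluation set and make metric comparisons misleading.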
Copyright and licensing
-----------------------
- The comment text remains the intellectual property of the original Reddit
authors. It is included here under a research / fair-use rationale, solely
to enable reproduction of published evaluation numbers.
- The `r_true` annotations, the schema, and the file packaging are released
under Apache-2.0 (see `../LICENSE`).
- Inclusion in this sample does NOT grant a license to redistribute the
underlying Reddit content for any other purpose.
Removal requests
----------------
If you are the author of one of these comments and would like it removed
from the distribution, contact the corresponding author listed in
`../paper.pdf`. Removals will be honored in the next release.
Reproducing the full corpus
---------------------------
The 1M-sample training corpus is not redistributed here. See `DATA.md` for the
schema and the steps required to reconstruct it from the public Reddit API.