HOTE-8B

HOTE-8B is an 8B-parameter deep research model trained with Hybrid Open-Ended Tri-Evolution (HOTE), a reinforcement-learning framework for open-ended research agents. The model is introduced in the paper "Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher" (arXiv:2606.13710v2, 2026-06-15).

HOTE trains a deep research system through the co-evolution of three roles:

  • Solver: plans, searches, integrates retrieved evidence, and writes long-form research reports with citations.
  • Judge: generates and updates rubrics, evaluates multiple solver responses, and provides reward signals for tasks that have no single deterministic answer.
  • Proposer: searches for weaknesses identified by the judge and proposes challenging but learnable research tasks.
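To make the data flow between the three roles concrete, here is a minimal sketch of one round. It is an illustration under stated assumptions, not the training procedure from the paper: `run_round`, `ScoredReport`, the callable signatures, and the idea of approximating a "weakness" by the lowest mean reward are all hypothetical, and no policy update is shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class ScoredReport:
    task: str
    report: str
    reward: float


def run_round(
    tasks: List[str],
    solve: Callable[[str], str],                     # solver: task -> report
    judge: Callable[[str, List[str]], List[float]],  # judge: one reward per report
    propose: Callable[[List[str]], List[str]],       # proposer: weak tasks -> new tasks
    samples_per_task: int = 2,
) -> Tuple[List[ScoredReport], List[str]]:
    """One illustrative solver -> judge -> proposer pass (no learning step)."""
    scored: List[ScoredReport] = []
    mean_reward: Dict[str, float] = {}
    for task in tasks:
        # The judge sees several solver responses for the same task.
        reports = [solve(task) for _ in range(samples_per_task)]
        rewards = judge(task, reports)
        scored.extend(ScoredReport(task, rep, rew) for rep, rew in zip(reports, rewards))
        mean_reward[task] = mean(rewards)
    # ASSUMPTION: the task with the lowest mean reward stands in for a
    # "weakness identified by the judge"; the paper may define this differently.
    weakest = sorted(tasks, key=lambda t: mean_reward[t])[:1]
    return scored, propose(weakest)
```

In a real system each callable would wrap a model (the solver and proposer checkpoints, plus a judge), and the scored reports would feed a reinforcement-learning update.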

The framework uses a dual-mode strategy with both tool-use and no-tool training. According to the paper, this improves training efficiency while allowing the tool-use and no-tool modes to benefit each other.

Repository Contents

This repository contains the following checkpoint folders:

  • step_700/: HOTE-8B deep research model checkpoint.
  • step_700_query/: proposer checkpoint used in the HOTE framework.
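Because the checkpoints live in subfolders rather than at the repository root, loading them likely needs an explicit `subfolder` argument. The sketch below assumes the folders contain Hugging Face Transformers-format weights and tokenizer files, which this card does not state; the repository id `IQuestLab/HOTE-8B` is taken from this model page, and the helper names are hypothetical.

```python
from typing import Tuple

REPO_ID = "IQuestLab/HOTE-8B"

# Folder names as listed in this repository.
SUBFOLDERS = {"solver": "step_700", "proposer": "step_700_query"}


def checkpoint_location(role: str) -> Tuple[str, str]:
    """Return (repo_id, subfolder) for the 'solver' or 'proposer' checkpoint."""
    return REPO_ID, SUBFOLDERS[role]


def load_checkpoint(role: str):
    """Load tokenizer and model for one role.

    ASSUMPTION: each subfolder holds a complete Transformers checkpoint
    (config, weights, tokenizer). Requires `transformers` and a backend
    such as PyTorch, and downloads the weights on first use.
    """
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id, subfolder = checkpoint_location(role)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)
    return tokenizer, model
```

Calling `load_checkpoint("solver")` would then fetch the deep research model, and `load_checkpoint("proposer")` the proposer.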

Intended Use

HOTE-8B is intended for research on long-form deep research agents, search-augmented report generation, open-ended agent evolution, and reinforcement learning for non-verifiable tasks.

The model is most useful when integrated with a search-enabled agent runtime. In the paper, the solver operates with ReAct-style actions including thinking, tool calls, final answers, and citations. The model weights alone do not provide web search, browsing, paper search, citation validation, or tool execution.
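A runtime of this kind can be sketched as a small loop that alternates between model generations and tool executions. The example below is a minimal illustration, not the paper's runtime: the `<tool_call>`, `<observation>`, and `<answer>` tags, the JSON call format, and the `generate` and `tools` interfaces are all assumptions chosen for the sketch, and the real prompt and action format may differ.

```python
import json
import re
from typing import Callable, Dict, List, Optional

# ASSUMPTION: hypothetical tag names; the model's actual action format may differ.
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_agent(
    generate: Callable[[str], str],          # wraps the model: prompt -> next output
    tools: Dict[str, Callable[..., str]],    # tool name -> implementation
    question: str,
    max_steps: int = 8,
) -> Optional[str]:
    """Alternate between generation and tool execution until a final answer."""
    transcript: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        output = generate("\n".join(transcript))
        transcript.append(output)

        answer = ANSWER.search(output)
        if answer:
            return answer.group(1).strip()

        call = TOOL_CALL.search(output)
        if call is None:
            break  # neither a tool call nor an answer: stop rather than loop

        request = json.loads(call.group(1))
        tool = tools.get(request["name"])
        if tool is None:
            observation = f"unknown tool: {request['name']}"
        else:
            observation = tool(**request["arguments"])
        # Feed the tool result back so the next generation can use it.
        transcript.append(f"<observation>{observation}</observation>")
    return None
```

The step limit keeps a misbehaving generation from looping forever; a production runtime would also need timeouts, error handling for malformed JSON, and source-quality checks on tool results.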

Limitations

  • The model is designed for deep research workflows and should be paired with robust tool execution, citation validation, and source-quality checks.
  • The model may generate inaccurate, incomplete, outdated, or unsupported claims, especially without retrieval tools.
  • The paper notes that evolution slows as training progresses and that the upper bound may still be constrained by model scale.
  • The HOTE method still relies on initial training data; fully data-free open-ended deep research evolution is left for future work.
  • Research outputs in sensitive domains such as healthcare, law, finance, or public policy should be reviewed by qualified experts.
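As one example of the citation validation mentioned above, the sketch below checks that a report's citation markers line up with the sources that were actually retrieved. It assumes numeric markers of the form `[n]` and a mapping from source number to URL, neither of which is specified by this card, and it only checks structural consistency: it cannot tell whether a cited source supports the claim.

```python
import re
from typing import Dict, List

# ASSUMPTION: citations appear as numeric markers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def check_citations(report: str, sources: Dict[int, str]) -> Dict[str, List[int]]:
    """Compare citation markers in a report against the retrieved sources."""
    cited = {int(n) for n in CITATION.findall(report)}
    known = set(sources)
    return {
        "dangling": sorted(cited - known),  # cited but never retrieved
        "unused": sorted(known - cited),    # retrieved but never cited
    }
```

A non-empty `dangling` list is a strong signal that the report contains an unsupported or hallucinated reference and should be regenerated or reviewed.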

Citation

@misc{piao2026hybridopenendedtrievolutionmakes,
  title = {Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher},
  author = {Hongming Piao and Chi Liu and Mengzhuo Chen and Yan Shu and Xidong Wang and Derek Li and Ying Wei and Bryan Dai},
  year = {2026},
  eprint = {2606.13710},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2606.13710}
}

Base Model

IQuestLab/HOTE-8B is fine-tuned from Qwen/Qwen3-8B.
