HOTE-8B

HOTE-8B is an 8B-parameter deep research model trained with Hybrid Open-Ended Tri-Evolution (HOTE), a reinforcement-learning framework for open-ended research agents. The model is introduced in the paper "Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher" (arXiv:2606.13710v2, 2026-06-15).

HOTE trains a deep research system through the co-evolution of three roles:

Solver: plans, searches, integrates retrieved evidence, and writes long-form research reports with citations.
Judge: generates and updates rubrics, evaluates multiple solver responses, and provides rewards beyond deterministic-answer tasks.
Proposer: searches for weaknesses identified by the judge and proposes challenging but learnable research tasks.

The framework uses a dual-mode strategy with both tool-use and no-tool training. According to the paper, this improves training efficiency while allowing the tool-use and no-tool modes to benefit each other.
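The data flow between the three roles and the two modes can be pictured with a toy round like the one below. This is only a schematic under loud assumptions: every function is a hypothetical stub, the rubric is a placeholder, and nothing here reproduces the paper's actual prompts, reward design, or policy update.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    prompt: str
    use_tools: bool  # dual-mode flag: tool-use rollout vs no-tool rollout


def propose(weaknesses: List[str], rng: random.Random) -> Task:
    """Stub proposer (hypothetical): turn one judged weakness into a new task."""
    topic = rng.choice(weaknesses)
    return Task(prompt=f"Write a cited report on {topic}", use_tools=rng.random() < 0.5)


def solve(task: Task, n_samples: int) -> List[str]:
    """Stub solver (hypothetical): return several candidate reports for one task."""
    mode = "tool" if task.use_tools else "no-tool"
    return [f"({mode}) draft {i}: " + "evidence [1] " * i for i in range(n_samples)]


def judge(task: Task, responses: List[str]) -> List[float]:
    """Stub judge (hypothetical): score each response in [0, 1] with a toy rubric."""
    # Toy rubric: count citation markers, capped at 3 per response.
    return [min(r.count("[1]"), 3) / 3 for r in responses]


def one_round(weaknesses: List[str], rng: random.Random,
              n_samples: int = 4) -> Tuple[Task, List[str], List[float]]:
    """Proposer -> solver -> judge. A real system would feed the rewards to an RL update."""
    task = propose(weaknesses, rng)
    responses = solve(task, n_samples)
    rewards = judge(task, responses)
    return task, responses, rewards


task, responses, rewards = one_round(
    ["long-horizon planning", "source attribution"], random.Random(0)
)
```

The point of the sketch is the shape of the loop: one proposed task, several solver responses to it, and one judge score per response.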

Repository Contents

This repository contains the following checkpoint folders:

step_700/: HOTE-8B deep research model checkpoint.
step_700_query/: proposer checkpoint used in the HOTE framework.
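A minimal loading sketch follows. It assumes each folder holds standard Transformers-format weights and tokenizer files, which this card does not confirm. The repository id `IQuestLab/HOTE-8B` is taken from the hub page, the role names `solver` and `proposer` are labels chosen here for illustration, and `subfolder` is the regular `from_pretrained` argument for loading from a directory inside a repository.

```python
# Role-to-folder mapping taken from the repository listing above.
CHECKPOINT_FOLDERS = {"solver": "step_700", "proposer": "step_700_query"}


def checkpoint_folder(role: str) -> str:
    """Return the checkpoint folder for a role, or raise ValueError if unknown."""
    try:
        return CHECKPOINT_FOLDERS[role]
    except KeyError:
        expected = sorted(CHECKPOINT_FOLDERS)
        raise ValueError(f"unknown role {role!r}; expected one of {expected}") from None


def load_checkpoint(role: str, repo_id: str = "IQuestLab/HOTE-8B"):
    """Load tokenizer and model for a role (assumes Transformers-format files)."""
    # Imported lazily so checkpoint_folder() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    folder = checkpoint_folder(role)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=folder)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, subfolder=folder, torch_dtype="auto"
    )
    return tokenizer, model
```

Calling `load_checkpoint("solver")` would download the `step_700/` weights; if the tokenizer files turn out to live at the repository root rather than in the subfolder, drop `subfolder` from the tokenizer call.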

Intended Use

HOTE-8B is intended for research on long-form deep research agents, search-augmented report generation, open-ended agent evolution, and reinforcement learning for non-verifiable tasks.

The model is most useful when integrated with a search-enabled agent runtime. In the paper, the solver operates with ReAct-style actions including thinking, tool calls, final answers, and citations. The model weights alone do not provide web search, browsing, paper search, citation validation, or tool execution.
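To make the division of labor concrete, here is a minimal sketch of a ReAct-style loop in which the runtime, not the model, executes tools. The tag names (`<think>`, `<tool_call>`, `<answer>`, `<observation>`), the `name: argument` tool-call syntax, and the `run_agent` helper are all hypothetical; the action format actually used by HOTE-8B is defined by the paper and its runtime, not by this example. The model and the search tool are scripted stubs so the loop runs without weights or network access.

```python
import re
from typing import Callable, Dict

# Hypothetical action tags; the real format may differ.
ACTION_RE = re.compile(r"<(think|tool_call|answer)>(.*?)</\1>", re.DOTALL)


def run_agent(model: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              question: str,
              max_steps: int = 8) -> str:
    """Alternate model turns and tool executions until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = model(transcript)
        transcript += output + "\n"
        for kind, body in ACTION_RE.findall(output):
            body = body.strip()
            if kind == "answer":
                return body
            if kind == "tool_call":
                # Assumed call syntax: "tool_name: argument".
                name, _, arg = body.partition(":")
                tool = tools.get(name.strip())
                result = tool(arg.strip()) if tool else f"unknown tool: {name.strip()}"
                transcript += f"<observation>{result}</observation>\n"
    raise RuntimeError("no final answer within max_steps")


def scripted_model(transcript: str) -> str:
    """Stand-in for the solver: search once, then answer."""
    if "<observation>" not in transcript:
        return "<think>I need evidence.</think><tool_call>search: HOTE</tool_call>"
    return "<answer>HOTE co-evolves three roles [1].</answer>"


def fake_search(query: str) -> str:
    """Stand-in for a real search tool."""
    return f"[1] stub result for '{query}'"


final = run_agent(scripted_model, {"search": fake_search}, "What is HOTE?")
```

In a real deployment, `scripted_model` would be replaced by a call to the loaded checkpoint and `fake_search` by an actual search or browsing backend.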

Limitations

The model is designed for deep research workflows and should be paired with robust tool execution, citation validation, and source-quality checks.
The model may generate inaccurate, incomplete, outdated, or unsupported claims, especially without retrieval tools.
The paper notes that evolution slows as training progresses and that the upper bound may still be constrained by model scale.
The HOTE method still relies on initial training data; fully data-free open-ended deep research evolution is left for future work.
Research outputs in sensitive domains such as healthcare, law, finance, or public policy should be reviewed by qualified experts.
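As one small example of the citation validation mentioned in the first limitation, the sketch below flags citation markers that point to no retrieved source. It assumes numeric markers of the form `[n]` and a hypothetical `sources` mapping from citation number to URL, neither of which is specified by this card. It checks only that a cited source exists, not that the source supports the claim.

```python
import re
from typing import Dict, List


def unsupported_citations(report: str, sources: Dict[int, str]) -> List[int]:
    """Return citation numbers used in the report that have no retrieved source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", report)}
    return sorted(cited - set(sources))


# Hypothetical report and source table for illustration.
report = "Claim A [1]. Claim B [3]. Claim C [1]."
sources = {1: "https://example.org/a"}
missing = unsupported_citations(report, sources)
```

Here `missing` lists the citation numbers a reviewer or downstream filter would need to resolve before trusting the report.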

Citation

@misc{piao2026hybridopenendedtrievolutionmakes,
  title = {Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher},
  author = {Hongming Piao and Chi Liu and Mengzhuo Chen and Yan Shu and Xidong Wang and Derek Li and Ying Wei and Bryan Dai},
  year = {2026},
  eprint = {2606.13710},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2606.13710}
}

Model Tree

Base model: Qwen/Qwen3-8B-Base
Finetuned: Qwen/Qwen3-8B
Finetuned: IQuestLab/HOTE-8B (this model)

Evaluation Results

All scores are self-reported.

HealthBench: 54.4
ResearchQA: 76.9
DeepResearchBench: 45.9