Tool Calling Training Data Analysis
Generated: 2026-04-06
Files Analyzed:
- training-data/tool_examples.jsonl (original)
- training-data_v2/tool_examples.jsonl (regenerated)
Executive Summary
The original tool calling training data had significant quality issues that limited its usefulness for training a production AI coding assistant. The data was synthetically generated, and its errors are systematic rather than random.
Key Findings on Original Data:
- ❌ ~10.7% of examples (107 of 1,000) have incorrect tool parameters (mostly mismatched search queries, plus wrong file paths)
- ❌ Heavy prompt duplication (7.5x average)
- ❌ No multi-step tool chains (every example makes exactly one tool call)
- ❌ All examples use identical tool definitions
Action Taken: Generated 500 new examples using the project's generator script.
Recommendation: The original data needs substantial improvements before use in training.
1. Statistics Overview
Original Data (tool_examples.jsonl)
| Metric | Value |
|---|---|
| Total Examples | 1,000 |
| Unique Prompts | 133 |
| Average Duplication | 7.52x |
| Unique Tool Sequences | 5 |
| Examples with Issues | ~107 (10.7%) |
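The duplication figures above can be reproduced with a short script. This is a minimal sketch, not the tooling actually used for this analysis. It assumes, hypothetically, that each JSONL line is an object whose user prompt sits under a top-level "prompt" key. The real schema is not shown in this report, so the field name may need adjusting.

```python
import json
from collections import Counter

def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def duplication_stats(records):
    """Summarize prompt duplication for a list of example records.

    Assumes a top-level "prompt" string per record (hypothetical schema).
    """
    counts = Counter(r["prompt"] for r in records)
    total = len(records)
    unique = len(counts)
    return {
        "total": total,
        "unique_prompts": unique,
        # Average number of copies per distinct prompt.
        "avg_duplication": round(total / unique, 2) if unique else 0.0,
        # The three most frequent prompts with their counts.
        "most_common": counts.most_common(3),
    }
```

For example, `duplication_stats(load_jsonl("training-data/tool_examples.jsonl"))` should report an avg_duplication of 7.52 if the file holds 1,000 records over 133 distinct prompts (1000 / 133 ≈ 7.52).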
New Data (training-data_v2/tool_examples.jsonl)
| Metric | Value |
|---|---|
| Total Examples | 500 |
| File Size | 1.9 MB |
| Tool Definitions per Example | 5 (same static set) |
Tool Call Distribution (Original)
| Tool | Call Count |
|---|---|
| Bash | 200 |
| FileRead | 200 |
| FileWrite | 200 |
| WebSearch | 200 |
| Grep | 200 |
All examples have exactly one tool call; no multi-step chains exist.
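The single-call finding is easy to check mechanically. The sketch below assumes, hypothetically, that each record lists its calls under a "tool_calls" key and that each call carries a "name" field.

```python
from collections import Counter

def tool_call_profile(records):
    """Count calls per tool and the number of tool calls per example.

    Assumes each record has a "tool_calls" list whose items carry a "name"
    key (hypothetical schema; adjust to the real JSONL layout).
    """
    per_tool = Counter()       # tool name -> total calls
    chain_lengths = Counter()  # calls per example -> number of examples
    for r in records:
        calls = r.get("tool_calls", [])
        chain_lengths[len(calls)] += 1
        per_tool.update(c["name"] for c in calls)
    # Examples with more than one call are multi-step chains.
    multi_step = sum(n for length, n in chain_lengths.items() if length > 1)
    return per_tool, chain_lengths, multi_step
```

On data matching the table above, per_tool would show 200 calls for each of the five tools. chain_lengths would be {1: 1000}, and multi_step would be 0.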
2. Prompt Diversity Analysis (Original Data)
Prompt Categories
| Category | Count | Percentage |
|---|---|---|
| Python | 207 | 20.7% |
| React | 149 | 14.9% |
| File Read | 134 | 13.4% |
| File Write | 119 | 11.9% |
| Other | 114 | 11.4% |
| Run Command | 80 | 8.0% |
| Docker/K8s | 67 | 6.7% |
| Search | 50 | 5.0% |
| Git | 40 | 4.0% |
| Testing | 31 | 3.1% |
| Package Management | 9 | 0.9% |
Most Duplicated Prompts
| Prompt | Occurrences |
|---|---|
| "Write a simple React component to src/components/Button.jsx" | 67 |
| "Run the tests with pytest" | 40 |
| "Run npm install to install dependencies" | 40 |
3. Tool Usage Breakdown
Tool Definitions
All 1,000 original examples use identical tool definitions with 5 tools:
- Bash: Execute bash commands
- FileRead: Read file contents
- FileWrite: Create/overwrite files
- WebSearch: Search the web
- Grep: Search for patterns in files
Tool Call Issues Found (Original Data)
Wrong Search Queries (105 instances; 10.5% of all examples, 52.5% of the 200 WebSearch calls)
The WebSearch tool frequently uses queries that don't match the user's question:
| User Question | Actual Search Query |
|---|---|
| "How do I use async/await in Python?" | "AWS Lambda cold start optimization" |
| "How do I use React hooks properly?" | "SQL join types explained" |
| "What's the difference between Docker and Kubernetes?" | "Git rebase vs merge" |
| "How do I use React hooks properly?" | "TypeScript generics tutorial" |
| "What's the difference between Docker and Kubernetes?" | "TypeScript generics tutorial" |
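The mismatches above share no topical words with the question, so even a crude lexical check would catch them. This is a minimal sketch. The stopword list and the one-word overlap threshold are illustrative assumptions and are not part of the generator.

```python
import re

# Illustrative stopword list (an assumption; tune for real prompts).
STOPWORDS = {"a", "an", "and", "the", "how", "do", "i", "use", "what", "s",
             "is", "between", "difference", "vs", "to", "in", "of", "properly"}

def content_words(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def query_matches_question(question, query, min_overlap=1):
    """Treat a WebSearch query as relevant if it shares enough content words."""
    return len(content_words(question) & content_words(query)) >= min_overlap
```

A word-overlap test misses paraphrases. For example, a relevant query such as "useState and useEffect guide" shares no words with "How do I use React hooks properly?". It is therefore best used as a cheap first-pass filter whose rejections get reviewed.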
Wrong File Paths (2 instances)
The FileWrite tool sometimes writes to a different file (and file type) than the one requested:
| User Request | Written Path |
|---|---|
| "Create a src/components/Header.jsx file" | Written to config.json |
| "Create a src/middleware.py file with settings" | Written to config.yaml |
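Both wrong-path cases name an explicit target path in the request, which makes them straightforward to flag. The sketch below assumes the requested path is the first path-like token in the prompt, meaning something with a file extension. The regex is an illustrative heuristic and is not taken from the generator.

```python
import re
from pathlib import PurePosixPath

# Heuristic: a run of word chars, dots, slashes or dashes ending in ".ext".
PATH_RE = re.compile(r"[\w./-]+\.[A-Za-z0-9]+")

def requested_path(prompt):
    """Return the first path-like token in the prompt, or None."""
    m = PATH_RE.search(prompt)
    return m.group(0) if m else None

def write_target_ok(prompt, written_path):
    """True if the prompt names no path, or the FileWrite target matches it."""
    wanted = requested_path(prompt)
    if wanted is None:
        return True
    # Compare as POSIX paths so redundant "./" or "//" segments don't matter.
    return PurePosixPath(written_path) == PurePosixPath(wanted)
```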
Pattern/File Type Mismatches (Grep)
The Grep tool sometimes searches with mismatched patterns:
| Pattern | File Pattern | Issue |
|---|---|---|
| class | *.ts | Python pattern in TypeScript files |
| SELECT | *.js | SQL pattern in JavaScript files |
| TODO | *.md | Searching TODO in markdown files |
4. Data Quality Issues
Critical Issues
No Multi-Step Tool Chains
- All 1,000 examples use exactly one tool call
- Real coding tasks typically require two to five or more tool calls
- Example: "Read file → Find pattern → Search docs → Write fix"
Search Query Mismatches
- 105 of the 200 WebSearch calls (52.5%) have irrelevant queries, i.e. 10.5% of all examples
- This points to a logic error in the generator script
Heavy Prompt Duplication
- 133 unique prompts are duplicated across 1,000 examples
- "Write a simple React component" appears 67 times
- This risks overfitting to specific prompts
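Besides generating new prompts, the number of copies of existing prompts can be capped. This is a minimal sketch. It assumes a top-level "prompt" field (hypothetical) and treats prompts that differ only in case or whitespace as duplicates.

```python
from collections import defaultdict

def cap_duplicates(records, max_per_prompt=2, key="prompt"):
    """Keep at most max_per_prompt records per normalized prompt, in input order.

    The "prompt" key is a hypothetical schema field.
    """
    seen = defaultdict(int)
    kept = []
    for r in records:
        # Normalize case and collapse whitespace so trivial variants collide.
        p = " ".join(r[key].lower().split())
        if seen[p] < max_per_prompt:
            seen[p] += 1
            kept.append(r)
    return kept
```

Capping alone shrinks the dataset: 133 prompts at two copies each leaves at most 266 examples. It therefore complements, rather than replaces, the prompt-diversity work recommended in section 5.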
Identical Tool Definitions
- All examples use the same 5 tools with identical descriptions
- No variation in tool schemas or parameter structures
Moderate Issues
File Path Hallucination
- Tool calls reference files that don't exist in the actual codebase
- Example: the prompt asks for tests/test_main.py, but the tool call reads src/app.js
Response Fabrication
- Assistant responses sometimes claim to show content that wasn't actually read
- Example: "Here's the README.md" when README.md wasn't the file requested
5. Recommendations for Improvement
Immediate Actions (Completed)
- ✅ Regenerated Data: generated 500 new examples in training-data_v2/tool_examples.jsonl
Script Fixes Needed
The generator script (scripts/generate_tool_data.py) needs:
- Fix the TOOL_CALL_PAIRS mapping: queries don't match questions
- Fix FILE_PATTERNS: wrong file types for requested content
- Add multi-step chain generation
- Add prompt variation templates
- Add validation to check query/content relevance
Future Improvements
Add Multi-Step Examples
- Real tasks require reading files, searching, editing
- Generate chains of 2-4 tool calls per example
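One way to generate chains is to draw every call's arguments from a single shared task context. This keeps the file path, grep pattern, and search query consistent with one another, which is meant to prevent the mismatches listed in section 3. The step set, the fixed workflow ordering, and the record layout below are all hypothetical.

```python
import random

# Hypothetical argument builders; every step reads from the same task context.
STEPS = {
    "FileRead": lambda ctx: {"path": ctx["path"]},
    "Grep": lambda ctx: {"pattern": ctx["pattern"], "file_pattern": ctx["glob"]},
    "WebSearch": lambda ctx: {"query": ctx["query"]},
    "FileWrite": lambda ctx: {"path": ctx["path"], "content": ctx["new_content"]},
}
# Plausible workflow order: read, find, look up, then write the fix.
ORDER = ["FileRead", "Grep", "WebSearch", "FileWrite"]

def make_chain(ctx, rng, min_len=2, max_len=4):
    """Build an ordered chain of min_len..max_len tool calls for one task."""
    n = rng.randint(min_len, max_len)  # inclusive on both ends
    # Pick n distinct steps, then restore the workflow order.
    chosen = sorted(rng.sample(ORDER, n), key=ORDER.index)
    return [{"name": name, "arguments": STEPS[name](ctx)} for name in chosen]
```

Passing a seeded `random.Random` makes the generated chains reproducible between runs of the generator.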
Increase Prompt Diversity
- Target 500+ unique prompts instead of duplicating
- Use template variations and paraphrasing
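Template expansion multiplies distinct phrasings instead of copying one string. The templates and slot values below are hypothetical, and every template is assumed to use every slot name.

```python
from itertools import product

# Hypothetical templates and slot values for one intent.
TEMPLATES = [
    "Run the tests with {runner}",
    "Can you run the {runner} test suite?",
    "Execute {runner} and show me any failures",
]
SLOTS = {"runner": ["pytest", "unittest", "jest"]}

def expand(templates, slots):
    """Fill every template with every combination of slot values."""
    names = list(slots)
    prompts = []
    for template, values in product(templates, product(*(slots[n] for n in names))):
        prompts.append(template.format(**dict(zip(names, values))))
    return prompts
```

With t templates and slot lists of sizes n1, n2, ..., expansion yields t × n1 × n2 × ... prompts. A handful of templates per intent can therefore reach the 500+ target without verbatim repeats. Paraphrases of one template are still closely related, so it may be worth treating them as near-duplicates when judging diversity.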
Vary Tool Definitions
- Offer a different subset of tools per example
- Add tool variations (e.g., different Bash commands)
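A simple way to vary definitions is to offer each example a random subset of tools that always includes the tool or tools it actually calls. The sketch below uses a hypothetical {"name", "description"} definition layout; real definitions presumably also carry parameter schemas.

```python
import random

# The five tools used by the original data (descriptions as listed in section 3).
TOOLS = {
    "Bash": "Execute bash commands",
    "FileRead": "Read file contents",
    "FileWrite": "Create/overwrite files",
    "WebSearch": "Search the web",
    "Grep": "Search for patterns in files",
}

def sample_tool_defs(required, rng):
    """Return a shuffled random subset of TOOLS that always includes `required`."""
    optional = [name for name in TOOLS if name not in required]
    # Add between zero and all of the optional tools.
    extra = rng.sample(optional, rng.randint(0, len(optional)))
    chosen = list(required) + extra
    rng.shuffle(chosen)  # vary list order as well as membership
    return [{"name": name, "description": TOOLS[name]} for name in chosen]
```

Shuffling is intended to discourage the model from associating a tool with a fixed position in the list. Paraphrasing the descriptions would add further variation.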
6. Conclusion
The original tool_examples.jsonl data is NOT suitable for production training without significant improvements:
- ~10% of examples have incorrect tool parameters
- Heavy duplication leads to overfitting
- With no multi-step chains, the data does not represent real coding workflows
- Synthetic generation errors are systematic
Action Completed: Generated 500 new examples via the project's generator script.
Remaining Work: Fix the underlying generator script to eliminate the systematic errors before full-scale regeneration.
Appendix: Quick Stats
Original Data
Total examples: 1,000
Unique prompts: 133
Tool call issues: 107 (10.7%)
Multi-tool chains: 0 (0%)
Identical tool defs: 100%
Average duplication: 7.52x
New Data (Generated)
Total examples: 500
File size: 1.9 MB
Location: training-data_v2/tool_examples.jsonl