arxiv:2605.23899

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

Published on May 22 · Submitted by taesiri on May 25

AI-generated summary

Language agents benefit from reusable skills that encode domain-specific procedures, but their effectiveness varies significantly across different extraction and consumption scenarios, requiring careful evaluation and meta-skill guidance to optimize performance.

Abstract

Language agents increasingly improve by reusing skills -- structured procedural artifacts distilled from past experience. In particular, domain-level and model-generated skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recurring procedures, and they scale beyond labor-intensive hand-crafting. However, while extraction methods continue to proliferate, understanding remains limited, with no comprehensive study spanning the full skill lifecycle -- experience generation, skill extraction, and skill consumption -- to ask whether such skills actually work, when they work, and what makes them succeed or fail. To close this gap, we build a utility-grounded evaluation framework that provides systematic experimental results across extractors and target agents, covering five diverse agentic task domains. We find that model-generated skills are beneficial on average but exhibit non-trivial negative transfer, and that neither extractors nor targets behave uniformly. A model can be a strong extractor yet a weak consumer, or vice versa, with skill utility independent of model scale or baseline task strength. To explain these patterns, we then dissect each lifecycle stage in depth, analyzing how experience composition shapes skill quality, what properties characterize useful skills, and how the same skill transfers across different consumers. Finally, we translate these findings into a concrete meta-skill that guides skill extraction toward the features tied to actual utility, which consistently improves skill quality across domains and substantially reduces negative transfer.
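The abstract does not spell out how utility is computed. As a rough illustration only, the sketch below assumes utility is the change in a target agent's success rate when a skill is added, and counts negative transfer as any cell where that change is below zero. The `Run` record, its field names, the model and domain labels, and all numbers are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One evaluation cell. All field names are illustrative assumptions."""
    extractor: str            # model that wrote the skill
    target: str               # agent that consumes the skill
    domain: str               # task domain
    baseline_success: float   # target success rate without the skill
    skill_success: float      # target success rate with the skill


def utility(run: Run) -> float:
    """Assumed utility: success-rate gain from adding the skill."""
    return run.skill_success - run.baseline_success


def summarize(runs: list[Run]) -> dict[str, float]:
    """Return mean utility and the share of cells where the skill hurt."""
    if not runs:
        raise ValueError("need at least one run")
    deltas = [utility(r) for r in runs]
    negative = sum(1 for d in deltas if d < 0)
    return {
        "mean_utility": sum(deltas) / len(deltas),
        "negative_transfer_rate": negative / len(deltas),
    }


# Made-up numbers, for illustration only.
runs = [
    Run("model_a", "model_b", "web", 0.40, 0.55),
    Run("model_a", "model_c", "web", 0.60, 0.50),
    Run("model_b", "model_a", "code", 0.30, 0.45),
    Run("model_b", "model_c", "code", 0.50, 0.50),
]
summary = summarize(runs)
print(summary)
```

With these toy values the mean utility is positive while one cell in four shows negative transfer, which is the kind of pattern the abstract describes: beneficial on average, but not uniformly so.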

Community

Paper author

🚀 Understanding when agent skills actually help, and when they hurt.

Language agents increasingly reuse skills: structured procedural artifacts distilled from past experience. But while skill extraction methods are rapidly growing, we still lack a clear answer to a basic question:

Do model-generated skills actually work, and what makes them succeed or fail? 🧠✨

In this work, we study the full skill lifecycle:
🧪 experience generation → ✍️ skill extraction → 🤖 skill consumption

Across five diverse agentic task domains, we show that model-generated skills are useful on average, but can also cause negative transfer. Skill utility is not determined simply by model scale or baseline performance: strong skill extractors can be weak consumers, and vice versa. We further analyze which experiences and skill properties drive successful transfer across target agents.
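One way to see how a model can be a strong extractor yet a weak consumer is to average utility separately over each role. The sketch below is a hypothetical illustration, not the paper's analysis: the `grid` values are invented, and utility is assumed to be a success-rate delta per (extractor, target) pair.

```python
from collections import defaultdict

# (extractor, target) -> assumed utility delta. Values are made up.
grid = {
    ("model_a", "model_b"): 0.12,
    ("model_a", "model_c"): 0.08,
    ("model_b", "model_a"): -0.04,
    ("model_b", "model_c"): 0.02,
    ("model_c", "model_a"): 0.00,
    ("model_c", "model_b"): 0.06,
}


def role_means(cells):
    """Average utility per model, once as extractor and once as consumer."""
    as_extractor = defaultdict(list)
    as_consumer = defaultdict(list)
    for (extractor, target), delta in cells.items():
        as_extractor[extractor].append(delta)
        as_consumer[target].append(delta)

    def mean(values):
        return sum(values) / len(values)

    extractor_mean = {m: mean(v) for m, v in as_extractor.items()}
    consumer_mean = {m: mean(v) for m, v in as_consumer.items()}
    return extractor_mean, consumer_mean


extractor_mean, consumer_mean = role_means(grid)
best_extractor = max(extractor_mean, key=extractor_mean.get)
weakest_consumer = min(consumer_mean, key=consumer_mean.get)
print(best_extractor, weakest_consumer)
```

In this toy grid the same model is both the best extractor and the weakest consumer, so a ranking in one role says little about the other.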

As agents move from answering questions to performing real tasks, reusable and reliable procedural skills may become a key adaptation layer beyond prompting and finetuning. 🛠️

🌐 Project Page: https://aka.ms/SkillLens
πŸ“„ Paper: https://arxiv.org/abs/2605.23899
πŸ’» Code: https://github.com/microsoft/SkillLens


