# Simulation vs Production Mode

OpenEnv has two related but different ideas of "mode":

- **Simulation mode** is for training, evaluation, and any workflow where the orchestrator controls episode boundaries.
- **Production mode** is for exposing tools directly to clients over MCP without the training loop controlling `reset()`, `step()`, or `state()`.

This guide explains when to use each mode and how they interact with MCP tools.

## The Short Version

Use **simulation mode** when you need:

- `reset()`, `step()`, and `state()`
- rewards and `done` signals
- one action per step in a controlled trajectory
- training or evaluation loops

Use **production mode** when you need:

- direct MCP tool access
- no simulation control routes
- an agent or client talking to tools as a live service
- the environment to get out of the way and expose the tool interface directly

## Why OpenEnv Has Two Modes

This split follows the core OpenEnv design principles:

- training and evaluation need a controlled step loop
- production integrations need direct tool access
- the same environment should support both without inventing separate environment implementations

In practice, simulation mode models **trajectory time** and production mode models **service time**.

## Simulation Mode

Simulation mode is the default environment-control model.

In simulation mode, the orchestrator owns the episode:

1. Call `reset()` to start an episode.
2. Call `step()` for each action.
3. Read `reward`, `done`, and `state()` as part of the rollout.

For environments with MCP tools, the canonical simulation-mode pattern is still `step()`.

```python
from openenv.core.env_server.mcp_types import CallToolAction, ListToolsAction

obs = env.step(ListToolsAction())

obs = env.step(
    CallToolAction(
        tool_name="echo_message",
        arguments={"message": "Hello from simulation mode"},
    )
)
```

That pattern matters because the training loop can then:

- count tool usage as actions
- assign rewards to tool interactions
- record a full trajectory
- preserve the same `reset`/`step` contract across environments

### Simulation-Mode Routes

When an `HTTPEnvServer` registers routes in simulation mode, it exposes the full control surface:

- `/ws`
- `/mcp`
- `/reset`
- `/step`
- `/state`

This is the right mode for RL training infrastructure and for most environment testing.

## Production Mode

Production mode is for exposing tools directly.

In production mode, clients should interact with MCP tools as a service instead of driving the environment through `reset()` and `step()` as a trajectory loop.

Production mode keeps the MCP surface and removes the HTTP simulation control routes.

### Production-Mode Routes

When an `HTTPEnvServer` registers routes in production mode, OpenEnv does **not** expose:

- `/reset`
- `/step`
- `/state`

It still registers `/ws`, because the WebSocket transport remains part of the infrastructure boundary.

That does **not** mean `/ws` should be exposed to agents.

- `/ws` is for orchestration and simulation control
- `/mcp` is the agent-facing boundary
- production deployments should restrict `/ws` at the network, auth, or gateway layer if agents can reach the service directly

In other words, production mode removes the HTTP simulation endpoints, but operators must still treat `/ws` as infrastructure-only.

This is the right mode when:

- you are serving a tool-backed environment to external clients
- you do not want callers controlling episode boundaries
- the MCP interface is the product surface

## How MCP Fits Into Both Modes

MCP is available in both modes, but the role is different.

### In simulation mode

MCP tools are part of the environment action space.

- tool discovery can still happen
- tool calls are modeled as actions
- rewards and episode control remain in the OpenEnv loop

This is why examples such as `examples/echo_mcp_demo.py` use `ListToolsAction` and `CallToolAction` through `step()`.

### In production mode

MCP is the primary interface.

- clients call tools directly
- OpenEnv does not present simulation-control endpoints
- the service behaves like a live MCP endpoint, not an RL rollout loop

## Server-Side Configuration

The server-side switch happens when routes are registered.

```python
from fastapi import FastAPI

from openenv.core.env_server.http_server import HTTPEnvServer
from openenv.core.env_server.types import ServerMode

app = FastAPI()
server = HTTPEnvServer(env=MyEnv, action_cls=MyAction, observation_cls=MyObservation)

# Training / evaluation
server.register_routes(app, mode=ServerMode.SIMULATION)

# Direct MCP serving
server.register_routes(app, mode=ServerMode.PRODUCTION)
```

`ServerMode.SIMULATION` is the default route-registration mode.

## Client-Side Patterns

For simulation-style interaction, use a client that participates in the OpenEnv control loop.

Examples:

- an environment-specific `EnvClient[...]` subclass
- `GenericEnvClient(base_url=..., mode="simulation")`
- `step(ListToolsAction())` and `step(CallToolAction(...))` for MCP-backed environments

For direct MCP access, use an MCP-oriented client.

Examples:

- `MCPToolClient(base_url=...)`
- environment-specific clients built on top of `MCPToolClient`

`MCPToolClient` defaults to production mode and rejects `mode="simulation"`.

## Mode-Aware Tools

`MCPEnvironment` supports mode-aware tool registration, so you can expose different tools depending on how the environment is being used.

```python
class MyEnv(MCPEnvironment):
    def __init__(self):
        @self.tool(mode="simulation")
        def score_candidate(answer: str) -> str:
            return "Used inside the training loop"

        @self.tool(mode="production")
        def lookup_docs(query: str) -> str:
            return "Used by live MCP clients"
```

This lets one environment preserve the training contract while still serving a cleaner production surface.

## Choosing the Right Mode

Choose **simulation mode** if the caller needs to control trajectories.

Typical cases:

- RL training
- policy evaluation
- benchmarking with rewards
- environments where tool calls should count as agent actions

Choose **production mode** if the caller needs direct tool access.

Typical cases:

- agent runtimes that speak MCP directly
- demos and hosted services
- integrations where `reset()` and `step()` should not be public

## Common Mistake

The most common confusion is assuming that "MCP environment" automatically means "production mode only".

That is not the model OpenEnv uses.

- An MCP-backed environment can still run in **simulation mode**.
- In simulation mode, MCP tool interactions are represented through the OpenEnv step loop.
- Production mode changes the public control surface, not the underlying environment concept.

## Related Reading

- [Core API](../reference/core)
- [Getting Started Tutorials](../tutorials/index)
- [RFC 002: Environment Spec](https://github.com/huggingface/OpenEnv/blob/main/rfcs/002-env-spec.md)
- [RFC 005: Agentic Harnesses](https://github.com/huggingface/OpenEnv/blob/main/rfcs/005-agentic-harnesses.md)