TokenOptEnv

TokenOptEnv is an OpenEnv environment for CAMRE, a cost-aware meta-reasoning benchmark for code and log tasks. Agents must not only solve tasks; they must also learn to manage context, retrieval, memory, checkpoints, compression, and model-routing strategy under explicit token and cost budgets.

This repository captures the full hackathon story across both V1 and V2 of CAMRE.

The current deployed runtime is the V2 environment, but the repo intentionally preserves the V1 notebook, earlier benchmark framing, and the evolution path between the two versions.

Version Overview

V1 Core Contributions

V2 Extensions

Together, V1 and V2 bring CAMRE much closer to the real systems problem the benchmark is trying to measure: not just solving a task, but solving it efficiently, strategically, and durably over long trajectories.

Benchmark Goals

CAMRE is designed to answer questions like:

The benchmark is deterministic enough for repeatable RL training and structured enough for decomposed reward analysis.
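"Deterministic enough for repeatable RL training" means a scenario id plus a seed should pin down the episode dynamics. As a toy illustration of that pattern (not the actual CAMRE simulator code), deriving all stochasticity from a single seeded RNG makes any rollout exactly replayable:

```python
import random


class ToySimulator:
    """Toy episode simulator: all randomness flows from one seeded RNG,
    so the (scenario_id, seed) pair fully determines the trajectory."""

    def __init__(self, scenario_id: str, seed: int):
        # random.seed hashes str seeds deterministically (via SHA-512),
        # so the same scenario/seed pair always yields the same stream.
        self.rng = random.Random(f"{scenario_id}:{seed}")

    def step(self) -> int:
        # Stands in for any sampled environment dynamics (latency, noise, ...).
        return self.rng.randint(0, 999)


def rollout(scenario_id: str, seed: int, steps: int = 5) -> list[int]:
    sim = ToySimulator(scenario_id, seed)
    return [sim.step() for _ in range(steps)]
```

Replaying `rollout("incident-medium-kafka-backpressure", 7)` twice produces identical trajectories, which is what makes decomposed reward analysis comparable across runs.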

Project Links

Result Snapshots

The current public training artifact is the step-100 export from the first stable CAMRE V2 GRPO run; it is the checkpoint linked from the Spaces and model repo. The broader project and benchmark narrative, however, spans both the original V1 environment design and the later V2 long-horizon extension.

Reward Progress

CAMRE V2 reward progress

Training Diagnostics

CAMRE V2 training diagnostics

Task Families

CAMRE currently ships three code/log-focused task families:

Each scenario includes the V1 benchmark core, with V2 runtime extensions layered on top where appropriate:

Action Space

V1 Core Actions

V2 Extended Actions

The current deployed environment adds these long-horizon control actions on top of the V1 core:

Across both versions, the design goal is the same: make the control problem explicit instead of hiding retrieval, memory, and routing choices inside middleware.

Observation Surface

V1 exposes the core task, budget, cache, compression, and routing telemetry. The current V2 deployment extends that surface so the agent can additionally observe:

Hidden state remains private inside the environment so the agent cannot directly inspect oracle truth, milestone conditions, or routing internals.
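This public/private split is a standard information-hiding pattern. As a minimal sketch (hypothetical names, not the actual module layout), the environment keeps oracle truth and milestone predicates in private fields and only ever hands the agent a derived observation:

```python
from dataclasses import dataclass, field


@dataclass
class PublicObservation:
    # Telemetry the agent is allowed to see.
    token_budget_used: int
    achieved_milestone_ids: list[str]


@dataclass
class ToyEnv:
    # Underscore-prefixed fields are env-private: oracle truth and
    # milestone conditions never cross into the observation directly.
    _oracle_answer: str = "rebalance consumer group"
    _milestones: dict[str, bool] = field(default_factory=dict)
    _tokens_used: int = 0

    def observe(self) -> PublicObservation:
        # Only derived, agent-safe views leave the environment.
        return PublicObservation(
            token_budget_used=self._tokens_used,
            achieved_milestone_ids=[
                m for m, done in self._milestones.items() if done
            ],
        )
```

The observation object carries no reference back to the private fields, so even a fully introspective agent sees only what the benchmark intends it to see.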

Reward Design

CAMRE uses a composite reward design across both versions.

V1 Reward Foundations

V2 Additions

V2 keeps the V1 components and adds longer-horizon process supervision:

Guardrails penalize failure modes such as:
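Concretely, a composite reward of this shape can be written as a weighted sum of an outcome term, a process-supervision bonus, an efficiency term, and guardrail penalties. The weights and term names below are illustrative assumptions, not the values in rewards.py:

```python
def composite_reward(
    task_solved: bool,
    milestone_bonus: float,      # process supervision: partial credit for milestones
    tokens_used: int,
    token_budget: int,
    guardrail_violations: int,   # count of penalized failure modes this episode
) -> float:
    # Outcome term: did the agent actually solve the task?
    outcome = 1.0 if task_solved else 0.0
    # Efficiency term: reward finishing under budget, clipped at zero.
    efficiency = max(0.0, 1.0 - tokens_used / token_budget)
    # Illustrative weights; the real rewards.py weighting may differ.
    return (
        1.0 * outcome
        + 0.5 * milestone_bonus
        + 0.25 * efficiency
        - 0.2 * guardrail_violations
    )
```

The point of the decomposition is that each term can be logged and analyzed separately, which is what makes the benchmark's "decomposed reward analysis" possible.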

Model Catalog

The frozen catalog lives in catalog.py and includes:

The catalog is based on real-world model identities and metadata, but runtime behavior is simulated so training and evaluation remain reproducible.
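As an illustration of what a frozen catalog entry might carry (the field names here are assumptions, not the actual catalog.py schema), each model can be pinned with its identity plus the cost and capacity metadata a routing policy needs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: entries are immutable, keeping runs reproducible
class ModelEntry:
    model_id: str
    input_cost_per_1k: float    # simulated $ per 1k input tokens
    output_cost_per_1k: float   # simulated $ per 1k output tokens
    context_window: int

    def call_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Cost of a single simulated model call.
        return (
            (input_tokens / 1000) * self.input_cost_per_1k
            + (output_tokens / 1000) * self.output_cost_per_1k
        )


# Hypothetical entries showing the cheap-vs-strong routing trade-off.
CATALOG = {
    "small-fast": ModelEntry("small-fast", 0.1, 0.3, 16_000),
    "large-strong": ModelEntry("large-strong", 2.0, 6.0, 128_000),
}
```

Because entries are frozen and costs are simulated, the same routing decisions always produce the same spend, which is what keeps training and evaluation reproducible.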

Runtime Architecture

The current deployed runtime is the V2 architecture, split into focused modules:

Quick Start

```python
from TokenOptEnv import ActionType, TokenOptEnvAction
from TokenOptEnv.server.TokenOptEnv_environment import TokenOptEnvEnvironment

env = TokenOptEnvEnvironment()

# Deterministic episodes: a scenario id plus a seed pins the trajectory.
obs = env.reset(scenario_id="incident-medium-kafka-backpressure", seed=7)

# Open an artifact; the result carries access metadata and a segment preview.
obs = env.step(
    TokenOptEnvAction(
        action_type=ActionType.READ_ARTIFACT,
        artifact_id="log-reconciler-5521",
    )
)

print(obs.last_tool_result.payload["access_mode"])
print(obs.last_tool_result.payload["preview_segment_ids"])

# Hydrate a previewed segment into working memory, spending token budget.
obs = env.step(
    TokenOptEnvAction(
        action_type=ActionType.HYDRATE_SEGMENTS,
        context_segment_ids=["log-reconciler-5521-s1"],
    )
)

print(obs.working_memory.token_budget_used)
print(obs.milestones.achieved_ids)
```

Training Notebooks

The repo includes both versions of the training workflow:

In other words: V1 documents the initial benchmark/training path, and V2 documents the longer-horizon upgrade path.

Running the Server Locally

```shell
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
```

Repo Structure

```text
TokenOptEnv/
|-- assets/
|   `-- plots/
|       |-- camre_v2_reward_progress.png
|       `-- camre_v2_training_diagnostics.png
|-- Dockerfile
|-- __init__.py
|-- catalog.py
|-- episode_runtime.py
|-- memory_manager.py
|-- milestone_engine.py
|-- models.py
|-- notebooks/
|   |-- CAMRE_GRPO_Training.ipynb
|   |-- CAMRE_GRPO_Training_V2.ipynb
|   `-- CAMRE_GRPO_Training_V2_Clean.ipynb
|-- observation_builder.py
|-- openenv.yaml
|-- pyproject.toml
|-- README.md
|-- rewards.py
|-- scenario_store.py
|-- simulators.py
|-- task_defs.py
|-- uv.lock
`-- server/
    |-- __init__.py
    |-- TokenOptEnv_environment.py
    |-- app.py
    `-- Dockerfile
```

Current Positioning

CAMRE as a project is positioned as a research benchmark first:

Public Artifact Policy

This keeps the environment, training workflow, and model artifact separately reusable while still linking them together as one benchmark story that includes both the original V1 benchmark and the extended V2 runtime.

Notes