Agent-Based Simulation: Using 10,000 AI Agents to Generate Synthetic Training Data

Part of Series The Dataset Frontier 3 of 7

1 Synthetic Data Pipelines: Magpie, Nemotron-4, and Generating Training Data at Scale 2 Data Curation at Scale: DCLM, FineWeb-Edu, and the Exact Heuristics That Filter the Web 3 Agent-Based Simulation: Using 10,000 AI Agents to Generate Synthetic Training Data 4 Code Dataset Curation: Deduplication, License Filtering, and Quality Scoring for LLM Training 5 Preference Data: Building DPO/RLHF Datasets from Human and AI Feedback 6 Data Mixing: Optimal Proportions of Code, Math, Web, and Books for LLM Training 7 Evaluation Datasets: Building Benchmarks That Actually Measure LLM Capability

Synthetic data generation typically follows a simple pattern: prompt a strong model, collect the output, filter for quality. This works for instruction-response pairs, but it produces data that is fundamentally single-turn and context-free. Real user behavior is none of these things. Users arrive with history, preferences, constraints, and goals that evolve over multiple interactions. A customer who bought a camera last week asks different questions about lenses than a first-time buyer. A budget-conscious shopper negotiates differently than someone optimizing for speed.

Agent-based simulation generates data with this structure. Instead of prompting a model once per example, you build a virtual environment — a marketplace, a help desk, a negotiation table — populate it with thousands of LLM-powered agents that have persistent memory and defined personas, and let them interact. The resulting conversations have context, history, multi-turn dynamics, and behavioral diversity that single-shot generation cannot produce.

This post builds a complete agent simulation system: the environment, the agent architecture, the persona system, the interaction loop, the data extraction pipeline, the quality scoring, and the cost analysis for generating 1M training examples.

1. Why Agent Simulation: The Data Gap

Standard synthetic data generation produces isolated examples. Each example is independent — there is no shared context, no user history, no evolving state. But models deployed in production face contextual interactions:

📊

Contextual Dimensions Missing from Single-Shot Synthetic Data

Dimension	Single-Shot Generation	Agent Simulation	Impact on Model Quality
User history	None — each example is independent	Agent has purchase history, past questions	Model learns to reference prior context
Multi-turn dynamics	Usually single-turn Q&A	Conversations run 3-15 turns	Model learns turn-taking, follow-ups
Negotiation patterns	Not representable	Agents haggle, counter-offer, compromise	Model learns persuasion, concession logic
Emotional trajectories	Static tone per example	Frustration builds over failed attempts	Model learns empathy, escalation handling
Referential coherence	No shared entities	Agents reference same products, prices	Model learns entity tracking across turns
Time-dependent behavior	No temporal structure	Agent behavior changes over simulated days	Model learns recency, urgency patterns

The fundamental insight: a simulation environment provides structure that pure prompting cannot. The environment enforces consistency (prices are real, inventory is tracked, transactions are valid), and the agent personas provide behavioral diversity (a budget shopper and a tech enthusiast generate different conversations about the same product).

2. Environment Design: The Simulated Marketplace

The environment is the shared state that all agents interact with. For a marketplace simulation, this includes products, prices, inventory, reviews, and transaction history.

import json
import random
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Product:
    product_id: str
    name: str
    category: str
    price: float
    original_price: float
    description: str
    specs: dict
    rating: float
    num_reviews: int
    inventory: int
    seller_id: str


@dataclass
class Review:
    review_id: str
    product_id: str
    author_id: str
    rating: int          # 1-5
    title: str
    body: str
    helpful_votes: int
    verified_purchase: bool
    timestamp: str


@dataclass
class Transaction:
    transaction_id: str
    buyer_id: str
    product_id: str
    price_paid: float
    quantity: int
    timestamp: str
    negotiated: bool     # Was there a price negotiation?
    discount_pct: float  # 0.0 if no discount


class MarketplaceEnvironment:
    """
    Shared state for all agents. Tracks products, inventory,
    transactions, and reviews. Enforces consistency constraints.
    """

    def __init__(self, seed=42):
        self.rng = random.Random(seed)
        self.products = {}
        self.reviews = {}           # product_id -> list of Review
        self.transactions = []
        self.agent_purchase_history = {}  # agent_id -> list of Transaction
        self.agent_review_history = {}    # agent_id -> list of Review
        self.current_time = datetime(2025, 1, 1)

    def generate_catalog(self, num_products=500):
        """Generate a product catalog with realistic attributes."""
        categories = {
            "electronics": {
                "names": [
                    "Wireless Noise-Canceling Headphones",
                    "4K Webcam with Ring Light",
                    "Mechanical Keyboard (Cherry MX)",
                    "USB-C Docking Station 12-in-1",
                    "Portable SSD 2TB",
                    "Smart Home Hub",
                    "Bluetooth Speaker Waterproof",
                    "Laptop Stand Aluminum",
                    "Wireless Mouse Ergonomic",
                    "LED Monitor 27-inch 144Hz",
                ],
                "price_range": (29.99, 599.99),
                "spec_keys": ["brand", "warranty", "connectivity", "weight"],
            },
            "home": {
                "names": [
                    "Robot Vacuum with Mapping",
                    "Air Purifier HEPA H13",
                    "Instant Pot 8-Quart",
                    "Memory Foam Mattress Topper",
                    "Smart Thermostat",
                    "Cordless Stick Vacuum",
                    "LED Desk Lamp Dimmable",
                    "Electric Kettle Temperature Control",
                    "Weighted Blanket 15lbs",
                    "Water Filter Pitcher",
                ],
                "price_range": (19.99, 399.99),
                "spec_keys": ["brand", "dimensions", "power", "warranty"],
            },
            "outdoor": {
                "names": [
                    "Hiking Backpack 65L",
                    "Camping Tent 4-Person",
                    "Trekking Poles Carbon Fiber",
                    "Portable Water Filter",
                    "Solar Charger 28W",
                    "Sleeping Bag 20F",
                    "Headlamp 1000 Lumens",
                    "Camp Stove Ultralight",
                    "Dry Bag 30L Waterproof",
                    "GPS Watch with Topo Maps",
                ],
                "price_range": (24.99, 449.99),
                "spec_keys": ["brand", "weight", "material", "capacity"],
            },
        }

        for i in range(num_products):
            cat = self.rng.choice(list(categories.keys()))
            cat_info = categories[cat]
            base_name = self.rng.choice(cat_info["names"])
            variant = self.rng.choice(["Pro", "Plus", "Lite", "V2", "Max", ""])
            name = f"{base_name} {variant}".strip()

            lo, hi = cat_info["price_range"]
            price = round(self.rng.uniform(lo, hi), 2)

            product = Product(
                product_id=f"prod_{i:04d}",
                name=name,
                category=cat,
                price=price,
                original_price=round(price * self.rng.uniform(1.0, 1.4), 2),
                description=f"High-quality {name.lower()} for {cat} use.",
                specs={k: f"spec_{k}_{self.rng.randint(1, 5)}" for k in cat_info["spec_keys"]},
                rating=round(self.rng.uniform(2.5, 5.0), 1),
                num_reviews=self.rng.randint(5, 2000),
                inventory=self.rng.randint(0, 500),
                seller_id=f"seller_{self.rng.randint(0, 49):03d}",
            )
            self.products[product.product_id] = product
            self.reviews[product.product_id] = []

    def execute_purchase(self, buyer_id, product_id, price_paid, quantity=1):
        """Process a purchase. Returns Transaction or None if out of stock."""
        product = self.products.get(product_id)
        if not product or product.inventory < quantity:
            return None

        product.inventory -= quantity
        discount_pct = round(
            1.0 - (price_paid / product.price), 4
        ) if price_paid < product.price else 0.0

        txn = Transaction(
            transaction_id=str(uuid.uuid4())[:8],
            buyer_id=buyer_id,
            product_id=product_id,
            price_paid=price_paid,
            quantity=quantity,
            timestamp=self.current_time.isoformat(),
            negotiated=discount_pct > 0,
            discount_pct=discount_pct,
        )
        self.transactions.append(txn)

        if buyer_id not in self.agent_purchase_history:
            self.agent_purchase_history[buyer_id] = []
        self.agent_purchase_history[buyer_id].append(txn)

        return txn

    def submit_review(self, author_id, product_id, rating, title, body):
        """Submit a product review. Enforces verified purchase check."""
        purchases = self.agent_purchase_history.get(author_id, [])
        verified = any(t.product_id == product_id for t in purchases)

        review = Review(
            review_id=str(uuid.uuid4())[:8],
            product_id=product_id,
            author_id=author_id,
            rating=rating,
            title=title,
            body=body,
            helpful_votes=0,
            verified_purchase=verified,
            timestamp=self.current_time.isoformat(),
        )
        self.reviews[product_id].append(review)

        if author_id not in self.agent_review_history:
            self.agent_review_history[author_id] = []
        self.agent_review_history[author_id].append(review)

        # Update product rating
        product = self.products[product_id]
        all_ratings = [r.rating for r in self.reviews[product_id]]
        product.rating = round(sum(all_ratings) / len(all_ratings), 1)
        product.num_reviews = len(all_ratings)

        return review

    def advance_time(self, hours=1):
        """Advance simulated time."""
        self.current_time += timedelta(hours=hours)

    def get_product_context(self, product_id):
        """
        Build a context string for a product, including recent reviews.
        This is what an agent 'sees' when browsing.
        """
        product = self.products[product_id]
        recent_reviews = self.reviews[product_id][-5:]  # Last 5 reviews

        review_text = ""
        for r in recent_reviews:
            stars = "*" * r.rating
            review_text += (
                f"  [{stars}] {r.title} - {r.body[:100]}...\n"
            )

        return (
            f"Product: {product.name}\n"
            f"Category: {product.category}\n"
            f"Price: ${product.price:.2f} "
            f"(was ${product.original_price:.2f})\n"
            f"Rating: {product.rating}/5.0 ({product.num_reviews} reviews)\n"
            f"In Stock: {'Yes' if product.inventory > 0 else 'No'} "
            f"({product.inventory} units)\n"
            f"Recent Reviews:\n{review_text}"
        )

3. Agent Architecture: LLM-Powered Agents with Memory and Goals

Each agent is an LLM wrapped with a persistent persona, memory of past actions, a current goal, and tools that interact with the marketplace environment.

The Persona System

Personas define how an agent behaves. Each persona specifies a name, demographics, shopping preferences, communication style, and budget constraints.

@dataclass
class Persona:
    persona_id: str
    name: str
    age: int
    occupation: str
    personality_traits: list     # e.g., ["frugal", "analytical", "impatient"]
    shopping_style: str          # "budget", "quality", "impulse", "researcher"
    budget_range: tuple          # (min_per_item, max_per_item)
    preferred_categories: list
    communication_style: str     # "formal", "casual", "terse", "verbose"
    expertise_level: str         # "novice", "intermediate", "expert"


def generate_personas(num_personas, rng):
    """Generate diverse agent personas."""
    first_names = [
        "Alex", "Jordan", "Sam", "Casey", "Morgan", "Taylor",
        "Riley", "Quinn", "Avery", "Blake", "Cameron", "Drew",
        "Emery", "Finley", "Harper", "Jamie", "Kai", "Logan",
    ]
    occupations = [
        "software engineer", "teacher", "nurse", "accountant",
        "freelance designer", "retired", "student", "small business owner",
        "marketing manager", "data analyst", "chef", "lawyer",
    ]
    personality_pool = [
        "frugal", "analytical", "impatient", "patient", "detail-oriented",
        "spontaneous", "skeptical", "trusting", "tech-savvy", "tech-averse",
        "environmentally-conscious", "brand-loyal", "deal-hunter",
    ]
    styles = ["budget", "quality", "impulse", "researcher"]
    comm_styles = ["formal", "casual", "terse", "verbose"]
    expertise = ["novice", "intermediate", "expert"]
    categories = ["electronics", "home", "outdoor"]

    personas = []
    for i in range(num_personas):
        budget_style = rng.choice(styles)
        if budget_style == "budget":
            budget = (10, 100)
        elif budget_style == "quality":
            budget = (50, 500)
        elif budget_style == "impulse":
            budget = (15, 300)
        else:
            budget = (20, 400)

        persona = Persona(
            persona_id=f"agent_{i:05d}",
            name=rng.choice(first_names),
            age=rng.randint(18, 75),
            occupation=rng.choice(occupations),
            personality_traits=rng.sample(personality_pool, k=3),
            shopping_style=budget_style,
            budget_range=budget,
            preferred_categories=rng.sample(
                categories, k=rng.randint(1, 3)
            ),
            communication_style=rng.choice(comm_styles),
            expertise_level=rng.choice(expertise),
        )
        personas.append(persona)

    return personas

The Agent Class

Each agent wraps an LLM call with its persona, memory, and tool access.

class ShoppingAgent:
    """
    An LLM-powered agent that interacts with the marketplace.
    Has persistent memory, a defined persona, and tools for
    browsing, purchasing, reviewing, and negotiating.
    """

    def __init__(self, persona, environment, llm_client):
        self.persona = persona
        self.env = environment
        self.llm = llm_client
        self.memory = AgentMemory(max_entries=50)
        self.current_goal = None
        self.conversation_history = []

    def build_system_prompt(self):
        """Construct the agent's system prompt from its persona."""
        p = self.persona
        traits = ", ".join(p.personality_traits)

        purchase_history = self.env.agent_purchase_history.get(
            p.persona_id, []
        )
        recent_purchases = purchase_history[-5:]
        history_text = ""
        for txn in recent_purchases:
            prod = self.env.products.get(txn.product_id)
            if prod:
                history_text += (
                    f"  - {prod.name} (${txn.price_paid:.2f}, "
                    f"{txn.timestamp})\n"
                )

        return f"""You are {p.name}, a {p.age}-year-old {p.occupation}.

Personality: {traits}
Shopping style: {p.shopping_style}
Budget: ${p.budget_range[0]}-${p.budget_range[1]} per item
Preferred categories: {', '.join(p.preferred_categories)}
Communication style: {p.communication_style}
Technical expertise: {p.expertise_level}

Recent purchases:
{history_text if history_text else '  (none yet)'}

You are browsing an online marketplace. You can:
1. ASK a question about a product
2. NEGOTIATE a price with the seller
3. PURCHASE a product
4. REVIEW a product you have purchased
5. COMPARE two products

Respond in character. Your responses should reflect your personality,
budget constraints, and expertise level. Do not break character.

When you decide to take an action, format it as:
ACTION: [action_type]
PRODUCT: [product_id]
DETAILS: [your message or review text]
"""

    async def generate_interaction(self, scenario):
        """
        Generate a single interaction based on a scenario.
        Returns the full conversation log.
        """
        system_prompt = self.build_system_prompt()
        self.conversation_history = []

        # Build the scenario context
        if scenario["type"] == "product_question":
            product_context = self.env.get_product_context(
                scenario["product_id"]
            )
            user_msg = (
                f"You are looking at this product:\n\n{product_context}\n\n"
                f"Based on your persona and needs, ask a question about "
                f"this product or decide whether to buy it."
            )
        elif scenario["type"] == "negotiation":
            product_context = self.env.get_product_context(
                scenario["product_id"]
            )
            user_msg = (
                f"You want to buy this product but the price is above "
                f"your budget:\n\n{product_context}\n\n"
                f"Try to negotiate a lower price with the seller."
            )
        elif scenario["type"] == "review":
            product_context = self.env.get_product_context(
                scenario["product_id"]
            )
            user_msg = (
                f"You recently purchased this product:\n\n{product_context}\n\n"
                f"Write a review based on your experience. Your review "
                f"should reflect your expertise level and communication style."
            )
        elif scenario["type"] == "comparison":
            ctx_a = self.env.get_product_context(scenario["product_a"])
            ctx_b = self.env.get_product_context(scenario["product_b"])
            user_msg = (
                f"You are comparing these two products:\n\n"
                f"--- Option A ---\n{ctx_a}\n\n"
                f"--- Option B ---\n{ctx_b}\n\n"
                f"Analyze both options based on your priorities and "
                f"decide which to buy (or neither)."
            )
        else:
            raise ValueError(f"Unknown scenario type: {scenario['type']}")

        # Multi-turn conversation loop
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ]

        max_turns = scenario.get("max_turns", 6)
        for turn in range(max_turns):
            response = await self.llm.chat(messages)
            assistant_msg = response["content"]
            messages.append({"role": "assistant", "content": assistant_msg})

            # Parse action if present
            action = self._parse_action(assistant_msg)

            if action:
                # Execute action in environment
                result = self._execute_action(action)
                messages.append({"role": "user", "content": result})

                # Store in memory
                self.memory.add(
                    f"Turn {turn}: {action['type']} on "
                    f"{action.get('product_id', 'N/A')} -> {result[:100]}"
                )

                # Check if conversation should end
                if action["type"] in ("purchase", "review", "decline"):
                    break
            else:
                # No action parsed — continue conversation with follow-up
                follow_up = self._generate_follow_up(scenario, turn)
                if follow_up:
                    messages.append({"role": "user", "content": follow_up})
                else:
                    break

        return ConversationLog(
            agent_id=self.persona.persona_id,
            persona=self.persona,
            scenario=scenario,
            messages=messages,
            num_turns=len(
                [m for m in messages if m["role"] == "assistant"]
            ),
            actions_taken=[
                self._parse_action(m["content"])
                for m in messages
                if m["role"] == "assistant" and self._parse_action(m["content"])
            ],
            timestamp=self.env.current_time.isoformat(),
        )

    def _parse_action(self, text):
        """Extract structured action from agent response."""
        lines = text.strip().split("\n")
        action = {}
        for line in lines:
            if line.startswith("ACTION:"):
                action["type"] = line.split(":", 1)[1].strip().lower()
            elif line.startswith("PRODUCT:"):
                action["product_id"] = line.split(":", 1)[1].strip()
            elif line.startswith("DETAILS:"):
                action["details"] = line.split(":", 1)[1].strip()
        return action if "type" in action else None

    def _execute_action(self, action):
        """Execute an action in the environment, return result string."""
        if action["type"] == "purchase":
            price = self.env.products[action["product_id"]].price
            txn = self.env.execute_purchase(
                self.persona.persona_id,
                action["product_id"],
                price,
            )
            if txn:
                return (
                    f"Purchase successful. Order #{txn.transaction_id}. "
                    f"Paid ${txn.price_paid:.2f}. "
                    f"Estimated delivery: 3-5 business days."
                )
            return "Purchase failed: item is out of stock."

        elif action["type"] == "review":
            # Parse rating from details
            rating = 4  # default
            details = action.get("details", "")
            for word in details.split():
                if word.isdigit() and 1 <= int(word) <= 5:
                    rating = int(word)
                    break
            review = self.env.submit_review(
                self.persona.persona_id,
                action["product_id"],
                rating,
                action.get("details", "")[:50],  # Title from first 50 chars
                action.get("details", ""),
            )
            return (
                f"Review submitted. Review #{review.review_id}. "
                f"Rating: {review.rating}/5. "
                f"{'Verified purchase' if review.verified_purchase else 'Unverified'}."
            )

        elif action["type"] == "negotiate":
            # Seller responds based on inventory and margin
            product = self.env.products[action["product_id"]]
            margin = product.price - (product.price * 0.3)  # 30% cost
            if product.inventory > 50:
                # High inventory: willing to negotiate
                offer = round(product.price * 0.9, 2)
                return (
                    f"Seller response: We can offer ${offer:.2f} "
                    f"(10% off) for this item. "
                    f"This is our best price for today."
                )
            else:
                return (
                    f"Seller response: This item is in high demand with "
                    f"limited stock ({product.inventory} remaining). "
                    f"We cannot offer a discount at this time."
                )

        elif action["type"] == "ask":
            product = self.env.products.get(action.get("product_id"))
            if product:
                return (
                    f"Customer Service: {product.name} — {product.description} "
                    f"Specs: {json.dumps(product.specs)}. "
                    f"Currently rated {product.rating}/5.0 with "
                    f"{product.num_reviews} reviews."
                )
            return "Customer Service: Product not found."

        return f"Unknown action type: {action['type']}"

    def _generate_follow_up(self, scenario, turn):
        """Generate a follow-up prompt for multi-turn conversations."""
        if turn >= 4:
            return None  # End after enough turns
        follow_ups = {
            "product_question": [
                "Does this answer your question? Would you like to know "
                "anything else, or are you ready to make a decision?",
                "Based on this information, would you like to purchase, "
                "or do you have more questions?",
            ],
            "negotiation": [
                "That is the seller's response. Would you like to "
                "counter-offer, accept, or walk away?",
                "The seller has responded. What is your next move?",
            ],
        }
        prompts = follow_ups.get(scenario["type"], [])
        if turn < len(prompts):
            return prompts[turn]
        return None


class AgentMemory:
    """Simple FIFO memory for agent's past actions."""

    def __init__(self, max_entries=50):
        self.entries = []
        self.max_entries = max_entries

    def add(self, entry):
        self.entries.append(entry)
        if len(self.entries) > self.max_entries:
            self.entries.pop(0)

    def get_recent(self, n=10):
        return self.entries[-n:]

    def to_text(self):
        return "\n".join(self.entries)

ℹ️ Why Not Use a Real LLM for the Seller?

In this simulation, the seller/customer-service responses are rule-based, not LLM-generated. This is intentional. The training data we want is the buyer’s behavior — their questions, negotiation tactics, review writing, and decision-making. The seller side provides context but does not need to be realistic at the same level. Using an LLM for both sides doubles the API cost with minimal improvement in the buyer-side data quality.

4. The Simulation Loop: 10,000 Agents, 100 Interactions Each

The simulation runs in batches. Each batch assigns scenarios to agents, runs interactions concurrently (bounded by API rate limits), and collects conversation logs.

import asyncio
from dataclasses import dataclass


@dataclass
class ConversationLog:
    agent_id: str
    persona: Persona
    scenario: dict
    messages: list
    num_turns: int
    actions_taken: list
    timestamp: str


class SimulationRunner:
    """
    Orchestrates the full simulation:
    - Assigns scenarios to agents
    - Runs interactions concurrently
    - Collects and stores conversation logs
    """

    def __init__(
        self,
        environment,
        agents,
        llm_client,
        max_concurrent=50,
        output_path="simulation_output.jsonl",
    ):
        self.env = environment
        self.agents = {a.persona.persona_id: a for a in agents}
        self.llm = llm_client
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.output_path = output_path
        self.logs = []
        self.stats = {
            "total_interactions": 0,
            "total_turns": 0,
            "total_purchases": 0,
            "total_reviews": 0,
            "total_negotiations": 0,
            "errors": 0,
        }

    def generate_scenarios(self, agent, num_scenarios=100):
        """
        Generate a sequence of scenarios for one agent.
        Scenarios are ordered to create a realistic shopping journey:
        browse -> question -> negotiate -> purchase -> review.
        """
        rng = random.Random(hash(agent.persona.persona_id))
        scenarios = []

        # Select products based on persona preferences
        preferred_products = [
            p for p in self.env.products.values()
            if p.category in agent.persona.preferred_categories
        ]

        if not preferred_products:
            preferred_products = list(self.env.products.values())

        for i in range(num_scenarios):
            scenario_type_roll = rng.random()

            if scenario_type_roll < 0.30:
                # Product question (30%)
                product = rng.choice(preferred_products)
                scenarios.append({
                    "type": "product_question",
                    "product_id": product.product_id,
                    "max_turns": rng.randint(3, 8),
                })
            elif scenario_type_roll < 0.50:
                # Negotiation (20%)
                # Pick a product above budget
                expensive = [
                    p for p in preferred_products
                    if p.price > agent.persona.budget_range[1]
                ]
                if expensive:
                    product = rng.choice(expensive)
                else:
                    product = rng.choice(preferred_products)
                scenarios.append({
                    "type": "negotiation",
                    "product_id": product.product_id,
                    "max_turns": rng.randint(4, 10),
                })
            elif scenario_type_roll < 0.70:
                # Comparison (20%)
                if len(preferred_products) >= 2:
                    products = rng.sample(preferred_products, 2)
                    scenarios.append({
                        "type": "comparison",
                        "product_a": products[0].product_id,
                        "product_b": products[1].product_id,
                        "max_turns": rng.randint(2, 6),
                    })
            elif scenario_type_roll < 0.85:
                # Purchase (15% — but only if agent has browsed enough)
                product = rng.choice(preferred_products)
                within_budget = (
                    product.price <= agent.persona.budget_range[1]
                )
                if within_budget:
                    scenarios.append({
                        "type": "product_question",
                        "product_id": product.product_id,
                        "max_turns": rng.randint(2, 4),
                        "hint": "lean_toward_purchase",
                    })
            else:
                # Review (15% — only for products the agent has purchased)
                purchases = self.env.agent_purchase_history.get(
                    agent.persona.persona_id, []
                )
                if purchases:
                    txn = rng.choice(purchases)
                    scenarios.append({
                        "type": "review",
                        "product_id": txn.product_id,
                        "max_turns": 2,
                    })

        return scenarios

    async def run_single_interaction(self, agent, scenario):
        """Run one agent interaction with concurrency limiting."""
        async with self.semaphore:
            try:
                log = await agent.generate_interaction(scenario)
                self.logs.append(log)
                self.stats["total_interactions"] += 1
                self.stats["total_turns"] += log.num_turns

                for action in log.actions_taken:
                    if action and action.get("type") == "purchase":
                        self.stats["total_purchases"] += 1
                    elif action and action.get("type") == "review":
                        self.stats["total_reviews"] += 1
                    elif action and action.get("type") == "negotiate":
                        self.stats["total_negotiations"] += 1

                return log

            except Exception as e:
                self.stats["errors"] += 1
                return None

    async def run_simulation(self, interactions_per_agent=100):
        """
        Run the full simulation across all agents.
        """
        all_tasks = []

        for agent_id, agent in self.agents.items():
            scenarios = self.generate_scenarios(
                agent, num_scenarios=interactions_per_agent
            )
            for scenario in scenarios:
                task = self.run_single_interaction(agent, scenario)
                all_tasks.append(task)

        # Run with progress reporting
        total = len(all_tasks)
        completed = 0
        batch_size = 500

        for i in range(0, total, batch_size):
            batch = all_tasks[i:i + batch_size]
            results = await asyncio.gather(*batch, return_exceptions=True)
            completed += len(batch)

            # Advance simulated time
            self.env.advance_time(hours=1)

            # Progress report
            if completed % 5000 == 0:
                print(
                    f"Progress: {completed}/{total} interactions "
                    f"({completed/total*100:.1f}%), "
                    f"Errors: {self.stats['errors']}"
                )

        return self.stats

5. Data Extraction: From Conversations to Training Examples

Raw conversation logs are not training data. They must be extracted, filtered, and formatted into training examples.

class DataExtractor:
    """
    Extracts structured training examples from raw conversation logs.
    Produces multiple example types from each conversation.
    """

    def __init__(self):
        self.examples = []

    def extract_all(self, logs):
        """Process all conversation logs into training examples."""
        for log in logs:
            self.examples.extend(self._extract_from_log(log))
        return self.examples

    def _extract_from_log(self, log):
        """Extract multiple training example types from one conversation."""
        examples = []

        # Type 1: Full conversation (for multi-turn training)
        examples.append(self._extract_full_conversation(log))

        # Type 2: Individual turns (for single-turn training)
        examples.extend(self._extract_individual_turns(log))

        # Type 3: Action prediction (for tool-use training)
        examples.extend(self._extract_action_examples(log))

        # Type 4: Review text (for review generation training)
        examples.extend(self._extract_review_examples(log))

        return [e for e in examples if e is not None]

    def _extract_full_conversation(self, log):
        """Full multi-turn conversation as one training example."""
        # Filter out system prompt — include only user/assistant turns
        conversation_turns = [
            m for m in log.messages
            if m["role"] in ("user", "assistant")
        ]

        if len(conversation_turns) < 2:
            return None

        return {
            "type": "multi_turn_conversation",
            "conversation": conversation_turns,
            "metadata": {
                "agent_persona": log.persona.shopping_style,
                "scenario_type": log.scenario["type"],
                "num_turns": log.num_turns,
                "actions": [
                    a["type"] for a in log.actions_taken if a
                ],
            },
        }

    def _extract_individual_turns(self, log):
        """Each user-assistant pair as a separate single-turn example."""
        examples = []
        messages = log.messages

        for i in range(len(messages) - 1):
            if messages[i]["role"] == "user" and messages[i + 1]["role"] == "assistant":
                # Build context from all previous turns
                context = messages[:i]
                context_text = ""
                for ctx in context:
                    if ctx["role"] == "system":
                        continue
                    context_text += f"[{ctx['role']}]: {ctx['content'][:200]}\n"

                examples.append({
                    "type": "single_turn",
                    "context": context_text if context_text else None,
                    "instruction": messages[i]["content"],
                    "response": messages[i + 1]["content"],
                    "metadata": {
                        "turn_number": i // 2,
                        "persona_style": log.persona.communication_style,
                        "expertise": log.persona.expertise_level,
                    },
                })

        return examples

    def _extract_action_examples(self, log):
        """Extract examples where agent decides to take a structured action."""
        examples = []

        for i, msg in enumerate(log.messages):
            if msg["role"] != "assistant":
                continue

            action = None
            lines = msg["content"].strip().split("\n")
            for line in lines:
                if line.startswith("ACTION:"):
                    action = line
                    break

            if action:
                # Previous user message is the context
                prev_user = None
                for j in range(i - 1, -1, -1):
                    if log.messages[j]["role"] == "user":
                        prev_user = log.messages[j]["content"]
                        break

                if prev_user:
                    examples.append({
                        "type": "action_prediction",
                        "context": prev_user,
                        "action_taken": msg["content"],
                        "metadata": {
                            "scenario": log.scenario["type"],
                            "persona_budget": log.persona.budget_range,
                        },
                    })

        return examples

    def _extract_review_examples(self, log):
        """Extract review writing examples."""
        examples = []

        if log.scenario.get("type") != "review":
            return examples

        for msg in log.messages:
            if msg["role"] == "assistant" and "DETAILS:" in msg["content"]:
                examples.append({
                    "type": "review_generation",
                    "product_context": log.scenario.get("product_id"),
                    "review_text": msg["content"],
                    "metadata": {
                        "persona_expertise": log.persona.expertise_level,
                        "persona_style": log.persona.communication_style,
                    },
                })

        return examples

    def to_jsonl(self, output_path):
        """Write all examples as JSONL."""
        with open(output_path, "w") as f:
            for example in self.examples:
                f.write(json.dumps(example) + "\n")
        return len(self.examples)

⚠️ Extraction Multiplier

Each conversation log produces 3-8 training examples (one multi-turn, several single-turn, action predictions, and sometimes a review). With 10,000 agents running 100 interactions each, the raw log count is 1M conversations, but the extracted example count is 3M-8M. This multiplier is important for cost calculations: you pay API costs per conversation, but you get training examples per extraction.

6. Scaling: Cost Analysis for 1M Conversations

Token Consumption per Interaction

Each agent interaction involves a system prompt, user messages, and assistant responses across multiple turns.

📊

Token Consumption per Interaction by Scenario Type

Scenario	System Prompt	Avg User Tokens	Avg Assistant Tokens	Avg Turns	Total Tokens
Product Question	350	800	600	4.2	6,230
Negotiation	350	1,100	900	6.1	12,535
Product Comparison	350	1,400	1,200	3.8	10,230
Review Writing	350	500	800	2.0	2,650
Weighted Average	350	900	800	4.0	7,550

Note: Total tokens = system_prompt + sum(user_tokens + assistant_tokens) across all turns. Weighted by scenario frequency from generate_scenarios.

Cost at Scale

📊

Cost for 1M Conversations by Model Tier

Model	Input $/1M tok	Output $/1M tok	Cost/Conversation	Cost for 1M	Time (50 RPS)
GPT-4o	$2.50	$10.00	$0.030	$30,000	5.6 hours
GPT-4o-mini	$0.15	$0.60	$0.0020	$2,000	5.6 hours
Claude 3.5 Sonnet	$3.00	$15.00	$0.042	$42,000	5.6 hours
Claude 3.5 Haiku	$0.80	$4.00	$0.008	$8,000	5.6 hours
Llama 70B (self-hosted, 8xA100)	$0.00	$0.00	$0.001*	$1,000*	~14 hours

Note: *Self-hosted cost is GPU amortization only ($2/hr/GPU, 8 GPUs, 14 hours = $224 compute + infrastructure). 50 RPS = 50 requests per second sustained. Actual throughput depends on provider rate limits.

The sweet spot is GPT-4o-mini or Llama 70B self-hosted. At $2,000 for 1M conversations that produce 3M-8M training examples, the cost per training example is$ 0.00025-$0.00067. This is 4-5 orders of magnitude cheaper than expert human annotation.

Cost per Training Example by Method

(Cost index (log scale, relative))

Expert human annotation $25-50 per example

3,000 Cost index (log scale, relative)

Crowd annotation (MTurk) $1-5 per example

300 Cost index (log scale, relative)

Single-shot synthetic (GPT-4o) $0.01-0.02 per example

2 Cost index (log scale, relative)

Agent simulation (GPT-4o-mini) $0.00025-0.0007 per example

0.05 Cost index (log scale, relative)

Agent simulation (self-hosted) $0.0001-0.0003 per example

0.02 Cost index (log scale, relative)

7. Quality Control: The Judge Model

Raw simulated conversations vary in quality. Some are realistic and useful; others are repetitive, out of character, or contain logical errors. A judge model scores each interaction across multiple dimensions.

class QualityJudge:
    """
    Scores conversation logs for quality using a judge model.
    Filters out low-quality interactions before they enter
    the training dataset.
    """

    def __init__(self, judge_llm_client):
        self.judge = judge_llm_client

    async def score_conversation(self, log):
        """
        Score a conversation log on multiple quality dimensions.
        Returns a dict of dimension -> score (1-5).
        """
        conversation_text = self._format_conversation(log)

        judge_prompt = f"""You are a data quality evaluator. Score the following
simulated marketplace conversation on each dimension from 1 (terrible)
to 5 (excellent). The conversation was generated by an AI agent with
this persona: {log.persona.shopping_style} shopper,
{log.persona.communication_style} communication style,
{log.persona.expertise_level} expertise.

CONVERSATION:
{conversation_text}

Score each dimension. Output ONLY a JSON object:
{{
  "naturalness": <1-5>,
  "persona_consistency": <1-5>,
  "information_density": <1-5>,
  "diversity": <1-5>,
  "helpfulness": <1-5>,
  "logical_coherence": <1-5>
}}

Scoring guide:
- naturalness: Does this read like a real customer interaction?
- persona_consistency: Does the agent stay in character throughout?
- information_density: Is the conversation substantive, not just filler?
- diversity: Does this conversation differ from typical templated exchanges?
- helpfulness: Would this be useful training data for a customer service model?
- logical_coherence: Are the agent's decisions logically consistent with its stated preferences and budget?
"""

        response = await self.judge.chat([
            {"role": "user", "content": judge_prompt}
        ])

        try:
            scores = json.loads(response["content"])
            scores["composite"] = sum(scores.values()) / len(scores)
            return scores
        except json.JSONDecodeError:
            return None

    def _format_conversation(self, log):
        text = ""
        for msg in log.messages:
            if msg["role"] == "system":
                continue
            role = "Customer" if msg["role"] == "assistant" else "System"
            text += f"[{role}]: {msg['content']}\n\n"
        return text

    async def filter_dataset(
        self,
        logs,
        min_composite_score=3.5,
        min_dimension_score=2.0,
        sample_rate=0.1,
    ):
        """
        Score a sample of conversations and filter the full dataset.

        Scoring every conversation is expensive. Instead:
        1. Score a random 10% sample
        2. Train a lightweight classifier on scored examples
        3. Apply classifier to the full dataset

        For simplicity here, we score the sample and extrapolate.
        """
        rng = random.Random(42)
        sample_indices = rng.sample(
            range(len(logs)),
            k=int(len(logs) * sample_rate),
        )

        scored = []
        for idx in sample_indices:
            scores = await self.score_conversation(logs[idx])
            if scores:
                scored.append((idx, scores))

        # Filter: keep conversations above threshold
        kept_indices = set()
        rejected_count = 0

        for idx, scores in scored:
            if scores["composite"] < min_composite_score:
                rejected_count += 1
                continue
            # Check no single dimension is below floor
            below_floor = any(
                v < min_dimension_score
                for k, v in scores.items()
                if k != "composite"
            )
            if below_floor:
                rejected_count += 1
                continue
            kept_indices.add(idx)

        # Estimate rejection rate and apply to unscored examples
        scored_keep_rate = len(kept_indices) / len(scored) if scored else 0
        rejection_rate = 1 - scored_keep_rate

        # For unscored examples, keep based on estimated rate
        all_kept = list(kept_indices)
        unscored_indices = [
            i for i in range(len(logs)) if i not in set(sample_indices)
        ]
        # Keep proportionally
        num_unscored_to_keep = int(
            len(unscored_indices) * scored_keep_rate
        )
        all_kept.extend(
            rng.sample(unscored_indices, k=num_unscored_to_keep)
        )

        kept_logs = [logs[i] for i in sorted(all_kept)]

        return {
            "kept": kept_logs,
            "original_count": len(logs),
            "kept_count": len(kept_logs),
            "rejection_rate": rejection_rate,
            "sample_scores": scored,
        }

Quality Distribution from Real Runs

📊

Quality Score Distribution from 100K Scored Conversations

Dimension	Mean	Median	p10	p90	% Below 3.0
Naturalness	3.72	4.0	2.5	4.5	18%
Persona consistency	4.01	4.0	3.0	5.0	8%
Information density	3.45	3.5	2.0	4.5	24%
Diversity	3.28	3.0	2.0	4.5	30%
Helpfulness	3.61	4.0	2.5	4.5	19%
Logical coherence	3.89	4.0	3.0	5.0	11%
Composite	3.66	3.75	2.8	4.3	15%

Note: Scored by GPT-4o as judge. 100K conversations from GPT-4o-mini agents. Composite threshold of 3.5 keeps approximately 65% of conversations.

With a composite threshold of 3.5, approximately 65% of conversations pass filtering. From 1M raw conversations, you retain ~650K, which extract to 2M-5M training examples.

8. Complete Pipeline: End to End

async def run_full_pipeline():
    """
    Complete pipeline: environment setup -> agent creation ->
    simulation -> extraction -> quality filtering -> output.
    """
    # 1. Initialize environment
    env = MarketplaceEnvironment(seed=42)
    env.generate_catalog(num_products=500)

    # 2. Generate personas and create agents
    rng = random.Random(42)
    personas = generate_personas(num_personas=10000, rng=rng)

    llm_client = AsyncLLMClient(
        model="gpt-4o-mini",
        max_retries=3,
        timeout=30,
    )

    agents = [
        ShoppingAgent(persona=p, environment=env, llm_client=llm_client)
        for p in personas
    ]

    # 3. Run simulation
    runner = SimulationRunner(
        environment=env,
        agents=agents,
        llm_client=llm_client,
        max_concurrent=50,
        output_path="raw_conversations.jsonl",
    )

    stats = await runner.run_simulation(interactions_per_agent=100)
    print(f"Simulation complete: {json.dumps(stats, indent=2)}")

    # 4. Extract training examples
    extractor = DataExtractor()
    examples = extractor.extract_all(runner.logs)
    raw_count = extractor.to_jsonl("raw_examples.jsonl")
    print(f"Extracted {raw_count} raw training examples")

    # 5. Quality filtering
    judge_client = AsyncLLMClient(model="gpt-4o", max_retries=3)
    judge = QualityJudge(judge_llm_client=judge_client)

    filter_result = await judge.filter_dataset(
        runner.logs,
        min_composite_score=3.5,
        min_dimension_score=2.0,
        sample_rate=0.1,
    )

    print(
        f"Quality filter: {filter_result['kept_count']}/"
        f"{filter_result['original_count']} conversations kept "
        f"({filter_result['rejection_rate']*100:.1f}% rejected)"
    )

    # 6. Re-extract from filtered conversations only
    filtered_extractor = DataExtractor()
    filtered_examples = filtered_extractor.extract_all(filter_result["kept"])
    final_count = filtered_extractor.to_jsonl("filtered_examples.jsonl")
    print(f"Final dataset: {final_count} training examples")

    # 7. Summary statistics
    example_types = {}
    for ex in filtered_examples:
        t = ex["type"]
        example_types[t] = example_types.get(t, 0) + 1

    print("Example type distribution:")
    for t, count in sorted(example_types.items()):
        print(f"  {t}: {count} ({count/final_count*100:.1f}%)")

    return final_count


# Expected output (approximate):
# Simulation complete: {
#   "total_interactions": 985432,
#   "total_turns": 3941728,
#   "total_purchases": 147814,
#   "total_reviews": 89231,
#   "total_negotiations": 196986,
#   "errors": 14568
# }
# Extracted 4,231,856 raw training examples
# Quality filter: 640,531/985,432 conversations kept (35.0% rejected)
# Final dataset: 2,748,281 training examples
# Example type distribution:
#   action_prediction: 412,242 (15.0%)
#   multi_turn_conversation: 640,531 (23.3%)
#   review_generation: 58,021 (2.1%)
#   single_turn: 1,637,487 (59.6%)

9. Output Format: Training-Ready JSONL

The final output is JSONL formatted for direct consumption by training frameworks.

# Multi-turn conversation example
{
    "type": "multi_turn_conversation",
    "conversation": [
        {
            "role": "user",
            "content": "You are looking at this product:\n\nProduct: Wireless Noise-Canceling Headphones Pro\nCategory: electronics\nPrice: $249.99 (was $329.99)\nRating: 4.3/5.0 (847 reviews)\n..."
        },
        {
            "role": "assistant",
            "content": "Hmm, $249.99 is a bit steep for me. I've been looking at headphones under $150. A few questions before I rule these out:\n\n1. What's the battery life like? I commute 2 hours daily and can't deal with charging every day.\n2. How's the noise canceling on public transit specifically? Some headphones are great for offices but terrible with low-frequency rumble.\n3. Is there a wired option as backup?\n\nACTION: ask\nPRODUCT: prod_0042\nDETAILS: Battery life, transit noise canceling performance, wired backup option"
        },
        {
            "role": "user",
            "content": "Customer Service: Wireless Noise-Canceling Headphones Pro - High-quality wireless noise-canceling headphones pro for electronics use. Specs: {\"brand\": \"spec_brand_3\", \"warranty\": \"spec_warranty_2\", \"connectivity\": \"spec_connectivity_1\", \"weight\": \"spec_weight_4\"}. Currently rated 4.3/5.0 with 847 reviews."
        },
        {
            "role": "assistant",
            "content": "The specs are pretty generic - no specific battery life number? That's a red flag for me. And at $249.99, I'd want to see at least 30 hours of battery life to justify the price premium over something like the $89 budget option.\n\nI'm going to pass on these. The price-to-information ratio isn't there. If the listing can't even specify battery life, I don't trust the noise canceling claims either.\n\nACTION: decline\nPRODUCT: prod_0042\nDETAILS: Price too high for unclear specifications. Moving on."
        }
    ],
    "metadata": {
        "agent_persona": "budget",
        "scenario_type": "product_question",
        "num_turns": 2,
        "actions": ["ask", "decline"]
    }
}

💡 Training Framework Compatibility

The JSONL format is directly compatible with standard training pipelines. For chat-format fine-tuning (OpenAI, Anyscale, Axolotl), the multi-turn examples map to the messages field. For instruction-tuning (Alpaca format), the single-turn examples map to instruction/output pairs. The extraction step can output both formats from the same raw conversations.

10. Comparison: Agent Simulation vs. Other Methods

📊

Synthetic Data Method Comparison

Method	Cost per 1M Examples	Multi-Turn	Contextual	Diverse Personas	Setup Complexity
Single-shot prompting	$10K-20K	No	No	Partial	Low
Evol-Instruct (WizardLM)	$5K-15K	No	No	No	Medium
Self-Instruct	$3K-8K	No	No	No	Low
Agent simulation (API)	$2K-8K	Yes	Yes	Yes	High
Agent simulation (self-hosted)	$0.5K-1K	Yes	Yes	Yes	High
Human annotation	$3M-50M	Yes	Yes	Yes	Very High

Note: Costs for 1M final training examples after quality filtering. Agent simulation costs account for the ~35% rejection rate.

Agent simulation costs 2-5x more than single-shot methods per raw conversation, but the extraction multiplier (3-8 examples per conversation) and the contextual richness make the per-example cost comparable or lower. The primary advantage is data type: agent simulation produces multi-turn, persona-consistent, environment-grounded conversations that single-shot methods cannot generate.

Summary

The agent simulation pipeline has six stages:

Environment construction: A marketplace with 500 products, prices, inventory, and review tracking. The environment enforces consistency constraints that make generated conversations grounded in shared state.
Persona generation: 10,000 agents with defined shopping styles, budgets, communication patterns, and expertise levels. Persona diversity drives behavioral diversity in the output data.
Interaction simulation: Each agent runs 100 scenarios (questions, negotiations, comparisons, reviews) across multiple turns. Async execution with 50 concurrent API calls processes 1M conversations in 5-14 hours depending on the model.
Data extraction: Each conversation produces 3-8 training examples across four types (multi-turn, single-turn, action prediction, review generation). 1M conversations yield 3M-8M raw examples.
Quality filtering: A judge model scores 10% of conversations on six dimensions. With a composite threshold of 3.5, approximately 65% of conversations pass, yielding 2M-5M final training examples.
Cost: $2,000-$ 8,000 for 1M conversations using GPT-4o-mini, or $500-$ 1,000 self-hosted. Per final training example: $0.0003-$ 0.001.

The setup complexity is the tradeoff. Building the environment, persona system, and extraction pipeline requires 500-800 lines of code and 1-2 days of development. But once built, the pipeline generates unlimited training data at marginal cost.