# Structured Output & JSON Mode

March 28, 2026
#genai #ai #llm #prompt-engineering

When you’re building applications with LLMs, you usually need the output in a specific format — JSON, CSV, a particular schema. But LLMs are text generators by nature, and they don’t always cooperate. They might add explanatory text around your JSON, produce invalid syntax, or miss required fields.

This tutorial covers the techniques and tools for getting reliable structured output from LLMs. You should be familiar with Prompt Engineering Fundamentals and System Prompts & Role Design before reading this.

## The Problem

Ask an LLM to return JSON and you might get:

````
Sure! Here's the JSON you requested:

```json
{"name": "Alice", "age": 30}
```

Hope that helps!
````


That's great for a human reading a chat, but if your code does `JSON.parse(response)`, it crashes. The model wrapped the JSON in markdown and added conversational text.
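One common mitigation (a defensive sketch, not tied to any SDK) is to extract the JSON object from whatever text surrounds it before parsing:

```javascript
// Defensive parsing sketch: pull the first {...} span out of the reply,
// ignoring markdown fences and conversational text around it.
function parseLooseJSON(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("No JSON object found");
  return JSON.parse(text.slice(start, end + 1));
}

const reply = 'Sure! Here is the JSON you requested:\n{"name": "Alice", "age": 30}\nHope that helps!';
parseLooseJSON(reply);
// → { name: "Alice", age: 30 }
```

This only papers over the problem — a reply with stray braces or truncated JSON still fails — but it catches the common markdown-wrapping case. The approaches below attack the problem at the source instead.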

## Approach 1: Prompt Instructions

The simplest approach is to be very explicit in your prompt about the output format:

```
Extract the person's name and age from the following text. Respond with ONLY a valid JSON object. No explanation, no markdown, no other text.

Text: "My name is Alice and I'm 30 years old."
```


This works surprisingly well with modern models, especially at low temperature. But it's not guaranteed — the model might still add extra text, especially for complex prompts or longer conversations.

You can improve reliability by combining this with few-shot examples:

```
Extract structured data from the text. Respond with only valid JSON.

Text: "Bob is a 25-year-old engineer from London."
{"name": "Bob", "age": 25, "occupation": "engineer", "city": "London"}

Text: "My name is Alice and I'm 30 years old."
```


### When This Is Enough

For quick scripts, prototypes, or cases where you can tolerate occasional failures and retry, prompt-based JSON extraction works fine. Wrap it in a try/catch and retry on parse failure:

```javascript
async function extractJSON(prompt, retries = 2) {
  for (let i = 0; i <= retries; i++) {
    const res = await client.chat.completions.create({
      model: "gpt-4o",
      temperature: 0,
      messages: [{ role: "user", content: prompt }],
    });
    try {
      return JSON.parse(res.choices[0].message.content);
    } catch {
      if (i === retries) throw new Error("Failed to parse JSON after retries");
    }
  }
}
```

## Approach 2: JSON Mode

Most LLM providers now offer a JSON mode that constrains the model to only output valid JSON. This is a model-level setting, not just a prompt instruction.

### OpenAI JSON Mode

```javascript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  response_format: { type: "json_object" },
  messages: [
    {
      role: "system",
      content: 'Extract the name and age. Respond in JSON with keys "name" and "age".',
    },
    {
      role: "user",
      content: "My name is Alice and I'm 30 years old.",
    },
  ],
});

const data = JSON.parse(response.choices[0].message.content);
// { name: "Alice", age: 30 }
```

JSON mode guarantees the output is valid JSON, but it doesn’t guarantee the structure — the model might use different key names or include extra fields.
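Because of that, it's worth validating the parsed object before using it. A minimal hand-rolled check might look like this (illustrative; a schema library works just as well):

```javascript
// JSON mode guarantees valid syntax, not shape, so verify the keys and
// types ourselves and drop anything unexpected.
function validatePerson(data) {
  if (typeof data?.name !== "string") throw new Error('Missing or invalid "name"');
  if (!Number.isInteger(data?.age)) throw new Error('Missing or invalid "age"');
  return { name: data.name, age: data.age }; // keep only the fields we expect
}

validatePerson(JSON.parse('{"name": "Alice", "age": 30, "note": "extra"}'));
// → { name: "Alice", age: 30 }
```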

### Anthropic JSON Mode

With Claude, you guide JSON output through the prompt and prefill:

```javascript
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 256,
  messages: [
    {
      role: "user",
      content: 'Extract the name and age from: "My name is Alice and I\'m 30 years old." Return JSON with keys "name" and "age".',
    },
    {
      role: "assistant",
      content: "{",
    },
  ],
});

const data = JSON.parse("{" + response.content[0].text);
```

The assistant prefill trick (starting the response with `{`) strongly guides the model to continue with JSON.

## Approach 3: Structured Outputs (Schema Enforcement)

The most reliable approach is structured outputs, where you provide a JSON schema and the model is constrained to produce output matching that schema exactly. This goes beyond JSON mode — it guarantees not just valid JSON, but the correct structure.

### OpenAI Structured Outputs

```javascript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "person",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          age: { type: "integer" },
          occupation: { type: "string" },
        },
        required: ["name", "age", "occupation"],
        additionalProperties: false,
      },
    },
  },
  messages: [
    {
      role: "system",
      content: "Extract person details from the text.",
    },
    {
      role: "user",
      content: "Alice is a 30-year-old software engineer.",
    },
  ],
});

const person = JSON.parse(response.choices[0].message.content);
// { name: "Alice", age: 30, occupation: "software engineer" }
// Guaranteed to match the schema
```

With `strict: true`, the output is guaranteed to match your schema. The model cannot add extra fields, omit required fields, or use the wrong types.

### Using Zod with the OpenAI SDK

If you’re using TypeScript, the OpenAI SDK integrates with Zod for a cleaner developer experience:

```javascript
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const Person = z.object({
  name: z.string(),
  age: z.number().int(),
  occupation: z.string(),
});

const response = await client.beta.chat.completions.parse({
  model: "gpt-4o",
  response_format: zodResponseFormat(Person, "person"),
  messages: [
    { role: "system", content: "Extract person details from the text." },
    { role: "user", content: "Alice is a 30-year-old software engineer." },
  ],
});

const person = response.choices[0].message.parsed;
// Typed as { name: string, age: number, occupation: string }
```

This gives you type safety end-to-end — the schema is defined once in Zod and used for both the API call and the TypeScript type.

## Approach 4: Tool/Function Calling

Another way to get structured output is to define a “function” the model can call. The model returns structured arguments for the function, which you can parse directly. This is the same mechanism used for AI agents and tool use, but it works well for pure data extraction too.

```javascript
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Alice is a 30-year-old software engineer." },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "extract_person",
        description: "Extract person details from text",
        parameters: {
          type: "object",
          properties: {
            name: { type: "string" },
            age: { type: "integer" },
            occupation: { type: "string" },
          },
          required: ["name", "age", "occupation"],
        },
      },
    },
  ],
  tool_choice: { type: "function", function: { name: "extract_person" } },
});

const args = JSON.parse(response.choices[0].message.tool_calls[0].function.arguments);
// { name: "Alice", age: 30, occupation: "software engineer" }
```

Setting `tool_choice` to a specific function forces the model to call it, guaranteeing you get structured output.

## Which Approach Should You Use?

| Approach | Reliability | Flexibility | Provider Support |
| --- | --- | --- | --- |
| Prompt instructions | Medium | High | All providers |
| JSON mode | High | Medium | Most providers |
| Structured outputs (schema) | Highest | Low (must define schema) | OpenAI, growing |
| Tool/function calling | High | Medium | Most providers |

Start with structured outputs if your provider supports them — they’re the most reliable. Fall back to JSON mode or tool calling if you need broader provider compatibility. Use prompt instructions only for quick prototypes or providers that don’t support the other options.
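That decision can be sketched as a simple fallback chain. The capability flags here are hypothetical (no real SDK exposes them under these names); they just encode the guidance above:

```javascript
// Hypothetical capability flags; pick the most reliable approach available.
function chooseApproach(caps) {
  if (caps.jsonSchema) return "structured_outputs"; // most reliable
  if (caps.jsonMode) return "json_mode";
  if (caps.toolCalling) return "tool_calling";
  return "prompt_instructions"; // last resort: prompt + parse-and-retry
}

chooseApproach({ jsonSchema: false, jsonMode: true, toolCalling: true });
// → "json_mode"
```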

## What's Next?

Now that you can get structured data out of LLMs reliably, it’s time to start building real applications. In Calling LLM APIs with JavaScript, we’ll set up a project from scratch, make API calls, handle conversations, and build a working application.
