Calling LLM APIs with JavaScript
Time to write some code. In this tutorial, we’ll go from zero to making LLM API calls in JavaScript — setting up a project, making your first completion request, managing conversations, handling errors, and building a simple interactive chatbot.
We’ll use the OpenAI API as the primary example since it’s the most widely used, but the patterns apply to any LLM provider. You should be familiar with the concepts from What is Generative AI? and Tokens, Context Windows & Model Parameters.
Setup
Prerequisites
- Node.js 18+ installed
- An OpenAI API key (sign up at platform.openai.com)
Create a Project
mkdir llm-js-demo && cd llm-js-demo
npm init -y
npm install openai
Add "type": "module" to your package.json to use ES module imports (and the top-level await these examples rely on).
Set Your API Key
Set your API key as an environment variable. The OpenAI SDK reads it automatically from OPENAI_API_KEY:
export OPENAI_API_KEY="your-key-here"
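If the key is missing, the SDK will only fail later with an authentication error. A small guard can fail fast with a clearer message — `requireApiKey` here is a hypothetical helper, not part of the SDK:

```javascript
// Hypothetical helper: fail fast with a clear message if the key is
// missing, instead of a confusing auth error on the first API call.
function requireApiKey(env = process.env) {
  if (!env.OPENAI_API_KEY) {
    throw new Error(
      'Set OPENAI_API_KEY before running, e.g. export OPENAI_API_KEY="your-key-here"'
    );
  }
  return env.OPENAI_API_KEY;
}
```

Call it once at the top of your script before constructing the client.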
Your First API Call
Create a file called index.js:
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "What is a closure in JavaScript?" }],
});
console.log(response.choices[0].message.content);
Run it:
node index.js
That’s it — you’ve made your first LLM API call. Let’s break down what’s happening.
Understanding the Chat Completions API
The core API is chat.completions.create. It takes a model name and an array of messages, and returns a completion.
Messages
Messages are the conversation history. Each message has a role and content:
const messages = [
{ role: "system", content: "You are a helpful coding tutor." },
{ role: "user", content: "What is a promise?" },
{ role: "assistant", content: "A promise is an object representing..." },
{ role: "user", content: "Can you show me an example?" },
];
- system — Sets the model’s behavior (see System Prompts & Role Design)
- user — The human’s messages
- assistant — The model’s previous responses
The model is stateless — it doesn’t remember previous calls. To have a conversation, you send the entire message history with each request.
The Response Object
The response contains metadata and the generated message:
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Say hello" }],
});
console.log(response.choices[0].message.content); // The generated text
console.log(response.choices[0].finish_reason); // "stop", "length", etc.
console.log(response.usage.prompt_tokens); // Tokens in your input
console.log(response.usage.completion_tokens); // Tokens in the output
console.log(response.usage.total_tokens); // Total tokens used
The usage field is useful for tracking costs and staying within token budgets.
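For example, you can turn the usage field into a rough dollar estimate. This is a sketch: the per-million-token prices below are placeholders, so check your provider's current pricing before relying on the numbers.

```javascript
// Placeholder prices in USD per 1M tokens -- NOT current pricing,
// just illustrative values for the calculation.
const PRICES = {
  "gpt-4o": { input: 2.5, output: 10 },
};

// Rough cost estimate from a response's usage field.
function estimateCost(model, usage) {
  const p = PRICES[model];
  if (!p) return null; // unknown model: no estimate
  return (
    (usage.prompt_tokens * p.input + usage.completion_tokens * p.output) / 1e6
  );
}
```

Logging this per request makes it easy to spot which calls dominate your bill.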
Managing Conversations
Since the API is stateless, you manage conversation history yourself. Here’s a simple pattern:
import OpenAI from "openai";
const client = new OpenAI();
const messages = [
{ role: "system", content: "You are a concise JavaScript tutor." },
];
async function chat(userMessage) {
messages.push({ role: "user", content: userMessage });
const response = await client.chat.completions.create({
model: "gpt-4o",
messages,
});
const reply = response.choices[0].message.content;
messages.push({ role: "assistant", content: reply });
return reply;
}
console.log(await chat("What is a closure?"));
console.log(await chat("Can you show a simple example?"));
// The model remembers the first question because the full history is sent
Each call sends the entire conversation, so the model has full context. The trade-off is that longer conversations use more tokens (and cost more).
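One common mitigation is to trim old messages before each request. The sketch below keeps the system message plus the most recent N messages; counting messages is a crude proxy — trimming by token count would be more precise:

```javascript
// Keep the system message plus the last `keep` messages, so the
// history sent with each request doesn't grow without bound.
// Message-count trimming is a simple heuristic; token-based trimming
// is more accurate but needs a tokenizer.
function trimHistory(messages, keep = 10) {
  const [system, ...rest] = messages;
  return [system, ...rest.slice(-keep)];
}
```

Call `trimHistory(messages)` right before `chat.completions.create` in the pattern above.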
Tuning Parameters
Control the model’s behavior with parameters we covered in Tokens, Context Windows & Model Parameters:
const response = await client.chat.completions.create({
model: "gpt-4o",
temperature: 0, // Deterministic output
max_tokens: 500, // Limit response length
messages: [
{ role: "user", content: "Write a haiku about JavaScript." },
],
});
Streaming Responses
By default, the API waits until the entire response is generated before returning. For a better user experience, you can stream the response token by token:
const stream = await client.chat.completions.create({
model: "gpt-4o",
stream: true,
messages: [{ role: "user", content: "Explain event loops in Node.js." }],
});
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content;
if (text) process.stdout.write(text);
}
console.log(); // newline at the end
Streaming is essential for chat interfaces — users see the response appear in real time instead of waiting for the full generation.
Error Handling
API calls can fail for several reasons. Here’s a robust pattern:
import OpenAI from "openai";
const client = new OpenAI();
async function safeChatCompletion(messages, retries = 2) {
for (let i = 0; i <= retries; i++) {
try {
return await client.chat.completions.create({
model: "gpt-4o",
messages,
});
} catch (err) {
if (err instanceof OpenAI.RateLimitError) {
const wait = Math.pow(2, i) * 1000;
console.log(`Rate limited. Retrying in ${wait}ms...`);
await new Promise((r) => setTimeout(r, wait));
} else if (err instanceof OpenAI.APIError && err.status >= 500) {
const wait = Math.pow(2, i) * 1000;
console.log(`Server error. Retrying in ${wait}ms...`);
await new Promise((r) => setTimeout(r, wait));
} else {
throw err; // Don't retry client errors (400, 401, etc.)
}
}
}
throw new Error("Max retries exceeded");
}
Common error types:
| Error | Cause | Action |
|---|---|---|
| RateLimitError (429) | Too many requests | Retry with exponential backoff |
| APIError (500+) | Server issue | Retry with backoff |
| AuthenticationError (401) | Invalid API key | Fix your key, don't retry |
| BadRequestError (400) | Invalid request (e.g., too many tokens) | Fix the request |
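The backoff delay in the retry loop above can be factored into a small helper. Adding random jitter is an optional refinement (not in the code above) that spreads out retries when many clients hit a rate limit at the same time:

```javascript
// Exponential backoff: 2^attempt * baseMs, matching the retry loop
// above. Optional jitter adds up to baseMs of randomness so that many
// clients rate-limited together don't all retry in lockstep.
function backoffDelay(attempt, baseMs = 1000, jitter = false) {
  const delay = Math.pow(2, attempt) * baseMs;
  return jitter ? delay + Math.random() * baseMs : delay;
}
```

Inside the catch block you would then write `await new Promise((r) => setTimeout(r, backoffDelay(i, 1000, true)))`.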
Building a CLI Chatbot
Let’s put it all together into an interactive chatbot you can run in your terminal:
import OpenAI from "openai";
import * as readline from "node:readline";
const client = new OpenAI();
const messages = [
{ role: "system", content: "You are a helpful, concise coding assistant." },
];
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
function ask(prompt) {
return new Promise((resolve) => rl.question(prompt, resolve));
}
console.log('Chat started. Type "quit" to exit.\n');
while (true) {
const input = await ask("You: ");
if (input.toLowerCase() === "quit") break;
messages.push({ role: "user", content: input });
const stream = await client.chat.completions.create({
model: "gpt-4o",
stream: true,
messages,
});
process.stdout.write("AI: ");
let reply = "";
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content;
if (text) {
process.stdout.write(text);
reply += text;
}
}
console.log("\n");
messages.push({ role: "assistant", content: reply });
}
rl.close();
Run it with node index.js and you have a working chatbot with streaming responses and conversation memory.
Using Other Providers
The patterns above work with any LLM provider. Most offer OpenAI-compatible APIs, so you can often just change the base URL:
// Anthropic (using their SDK)
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
const response = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
system: "You are a helpful coding assistant.",
messages: [{ role: "user", content: "What is a closure?" }],
});
console.log(response.content[0].text);
// Any OpenAI-compatible provider (e.g., local models via Ollama)
const client = new OpenAI({
baseURL: "http://localhost:11434/v1",
apiKey: "ollama", // Ollama doesn't need a real key
});
What’s Next?
You’re now making LLM API calls in JavaScript. If Python is more your speed, check out Calling LLM APIs with Python for the same patterns in Python. Otherwise, continue to the RAG section of this series where we’ll build applications that combine LLM calls with your own data.