Calling LLM APIs with JavaScript
Time to write some code. In this tutorial, we’ll go from zero to making LLM API calls in JavaScript — setting up a project, making your first completion request, managing conversations, handling errors, and building a simple interactive chatbot.
We’ll use the OpenAI API as the primary example since it’s the most widely used, but the patterns apply to any LLM provider. You should be familiar with the concepts from What is Generative AI? and Tokens, Context Windows & Model Parameters.
Setup
Prerequisites
- Node.js 18+ installed
- An OpenAI API key (sign up at platform.openai.com)
Create a Project
mkdir llm-js-demo && cd llm-js-demo
npm init -y
npm install openai
Add "type": "module" to your package.json to use ES module imports (and the top-level await these examples rely on).
Set Your API Key
Set your API key as an environment variable. The OpenAI SDK reads it automatically from OPENAI_API_KEY:
export OPENAI_API_KEY="your-key-here"
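If the key is missing, the SDK will only fail later with an authentication error. A small guard can fail fast with a clearer message — `requireApiKey` here is a hypothetical helper, not part of the SDK:

```javascript
// Hypothetical helper: fail fast with a clear message if the key is
// missing, instead of a confusing auth error on the first API call.
function requireApiKey(env = process.env) {
  if (!env.OPENAI_API_KEY) {
    throw new Error(
      'Set OPENAI_API_KEY before running, e.g. export OPENAI_API_KEY="your-key-here"'
    );
  }
  return env.OPENAI_API_KEY;
}
```

Call it once at the top of your script before constructing the client.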
Your First API Call
Create a file called index.js:
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "What is a closure in JavaScript?" }],
});
console.log(response.choices[0].message.content);
Run it:
node index.js
That’s it — you’ve made your first LLM API call. Let’s break down what’s happening.
Understanding the Chat Completions API
The core API is chat.completions.create. It takes a model name and an array of messages, and returns a completion.
Messages
Messages are the conversation history. Each message has a role and content:
const messages = [
{ role: "system", content: "You are a helpful coding tutor." },
{ role: "user", content: "What is a promise?" },
{ role: "assistant", content: "A promise is an object representing..." },
{ role: "user", content: "Can you show me an example?" },
];
- system — Sets the model’s behavior (see System Prompts & Role Design)
- user — The human’s messages
- assistant — The model’s previous responses
The model is stateless — it doesn’t remember previous calls. To have a conversation, you send the entire message history with each request.
The Response Object
The response contains metadata and the generated message:
const response = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Say hello" }],
});
console.log(response.choices[0].message.content); // The generated text
console.log(response.choices[0].finish_reason); // "stop", "length", etc.
console.log(response.usage.prompt_tokens); // Tokens in your input
console.log(response.usage.completion_tokens); // Tokens in the output
console.log(response.usage.total_tokens); // Total tokens used
The usage field is useful for tracking costs and staying within token budgets.
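For example, you can turn the usage field into a rough dollar estimate. This is a sketch: the per-million-token prices below are placeholders, so check your provider's current pricing before relying on the numbers.

```javascript
// Placeholder prices in USD per 1M tokens -- NOT current pricing,
// just illustrative values for the calculation.
const PRICES = {
  "gpt-4o": { input: 2.5, output: 10 },
};

// Rough cost estimate from a response's usage field.
function estimateCost(model, usage) {
  const p = PRICES[model];
  if (!p) return null; // unknown model: no estimate
  return (
    (usage.prompt_tokens * p.input + usage.completion_tokens * p.output) / 1e6
  );
}
```

Logging this per request makes it easy to spot which calls dominate your bill.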
Managing Conversations
Since the API is stateless, you manage conversation history yourself. Here’s a simple pattern:
import OpenAI from "openai";
const client = new OpenAI();
const messages = [
{ role: "system", content: "You are a concise JavaScript tutor." },
];
async function chat(userMessage) {
messages.push({ role: "user", content: userMessage });
const response = await client.chat.completions.create({
model: "gpt-4o",
messages,
});
const reply = response.choices[0].message.content;
messages.push({ role: "assistant", content: reply });
return reply;
}
console.log(await chat("What is a closure?"));
console.log(await chat("Can you show a simple example?"));
// The model remembers the first question because the full history is sent
Each call sends the entire conversation, so the model has full context. The trade-off is that longer conversations use more tokens (and cost more).
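One common mitigation is to trim old messages before each request. The sketch below keeps the system message plus the most recent N messages; counting messages is a crude proxy — trimming by token count would be more precise:

```javascript
// Keep the system message plus the last `keep` messages, so the
// history sent with each request doesn't grow without bound.
// Message-count trimming is a simple heuristic; token-based trimming
// is more accurate but needs a tokenizer.
function trimHistory(messages, keep = 10) {
  const [system, ...rest] = messages;
  return [system, ...rest.slice(-keep)];
}
```

Call `trimHistory(messages)` right before `chat.completions.create` in the pattern above.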
Tuning Parameters
Control the model’s behavior with parameters we covered in Tokens, Context Windows & Model Parameters:
const response = await client.chat.completions.create({
model: "gpt-4o",
temperature: 0, // Deterministic output
max_tokens: 500, // Limit response length
messages: [
{ role: "user", content: "Write a haiku about JavaScript." },
],
});
Streaming Responses
By default, the API waits until the entire response is generated before returning. For a better user experience, you can stream the response token by token:
const stream = await client.chat.completions.create({
model: "gpt-4o",
stream: true,
messages: [{ role: "user", content: "Explain event loops in Node.js." }],
});
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content;
if (text) process.stdout.write(text);
}
console.log(); // newline at the end
Streaming is essential for chat interfaces — users see the response appear in real time instead of waiting for the full generation.
Error Handling
API calls can fail for several reasons. Here’s a robust pattern:
import OpenAI from "openai";
const client = new OpenAI();
async function safeChatCompletion(messages, retries = 2) {
for (let i = 0; i <= retries; i++) {
try {
return await client.chat.completions.create({
model: "gpt-4o",
messages,
});
} catch (err) {
if (err instanceof OpenAI.RateLimitError) {
const wait = Math.pow(2, i) * 1000;
console.log(`Rate limited. Retrying in ${wait}ms...`);
await new Promise((r) => setTimeout(r, wait));
} else if (err instanceof OpenAI.APIError && err.status >= 500) {
const wait = Math.pow(2, i) * 1000;
console.log(`Server error. Retrying in ${wait}ms...`);
await new Promise((r) => setTimeout(r, wait));
} else {
throw err; // Don't retry client errors (400, 401, etc.)
}
}
}
throw new Error("Max retries exceeded");
}
Common error types:
| Error | Cause | Action |
|---|---|---|
| RateLimitError (429) | Too many requests | Retry with exponential backoff |
| APIError (500+) | Server issue | Retry with backoff |
| AuthenticationError (401) | Invalid API key | Fix your key, don't retry |
| BadRequestError (400) | Invalid request (e.g., too many tokens) | Fix the request |
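The backoff delay in the retry loop above can be factored into a small helper. Adding random jitter is an optional refinement (not in the code above) that spreads out retries when many clients hit a rate limit at the same time:

```javascript
// Exponential backoff: 2^attempt * baseMs, matching the retry loop
// above. Optional jitter adds up to baseMs of randomness so that many
// clients rate-limited together don't all retry in lockstep.
function backoffDelay(attempt, baseMs = 1000, jitter = false) {
  const delay = Math.pow(2, attempt) * baseMs;
  return jitter ? delay + Math.random() * baseMs : delay;
}
```

Inside the catch block you would then write `await new Promise((r) => setTimeout(r, backoffDelay(i, 1000, true)))`.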
Building a CLI Chatbot
Let’s put it all together into an interactive chatbot you can run in your terminal:
import OpenAI from "openai";
import * as readline from "node:readline";
const client = new OpenAI();
const messages = [
{ role: "system", content: "You are a helpful, concise coding assistant." },
];
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
function ask(prompt) {
return new Promise((resolve) => rl.question(prompt, resolve));
}
console.log('Chat started. Type "quit" to exit.\n');
while (true) {
const input = await ask("You: ");
if (input.toLowerCase() === "quit") break;
messages.push({ role: "user", content: input });
const stream = await client.chat.completions.create({
model: "gpt-4o",
stream: true,
messages,
});
process.stdout.write("AI: ");
let reply = "";
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content;
if (text) {
process.stdout.write(text);
reply += text;
}
}
console.log("\n");
messages.push({ role: "assistant", content: reply });
}
rl.close();
Run it with node index.js and you have a working chatbot with streaming responses and conversation memory.
Using Other Providers
The patterns above work with any LLM provider. Most offer OpenAI-compatible APIs, so you can often just change the base URL:
// Anthropic (using their SDK)
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic();
const response = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
system: "You are a helpful coding assistant.",
messages: [{ role: "user", content: "What is a closure?" }],
});
console.log(response.content[0].text);
// Any OpenAI-compatible provider (e.g., local models via Ollama)
const client = new OpenAI({
baseURL: "http://localhost:11434/v1",
apiKey: "ollama", // Ollama doesn't need a real key
});
What’s Next?
You’re now making LLM API calls in JavaScript. If Python is more your speed, check out Calling LLM APIs with Python for the same patterns in Python. Otherwise, continue to the RAG section of this series where we’ll build applications that combine LLM calls with your own data.