#python
Calling LLM APIs with Python
This tutorial covers calling LLM APIs using Python — the same concepts as Calling LLM APIs with JavaScript, but with Python’s SDK and idioms. If you’ve already read the JavaScript version, this will feel familiar; if Python is your primary language, start here. You should be familiar with What is Generative AI? and Tokens, Context Windows & Model Parameters. Setup prerequisites: Python 3.9+ and an OpenAI API key (sign up at platform. Read more →
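As a taste of what the full tutorial covers, here is a minimal stdlib-only sketch of a chat completion request. The endpoint URL, model name, and response shape follow OpenAI's chat completions convention, but treat them as assumptions to verify against the provider's docs; the helper names are illustrative, not from the tutorial.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; confirm against the provider's API reference.
API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(user_message, model="gpt-4o-mini", temperature=0.7):
    """Assemble the JSON body for a single-turn chat completion request."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

def chat(user_message):
    """Send one request and return the assistant's reply text.

    Requires OPENAI_API_KEY in the environment; performs a network call.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_message)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Chat-completions responses put the reply under choices[0].message.content.
    return data["choices"][0]["message"]["content"]
```

In practice you would use the official `openai` Python SDK rather than raw HTTP; the sketch just makes the request/response shape visible.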
March 28, 2026
Streaming Responses
When you make a standard LLM API call, you wait for the entire response to be generated before you see anything. For short answers that’s fine, but for longer responses the user stares at a blank screen for seconds. Streaming fixes this by sending tokens to the client as they’re generated, creating the “typing” effect you see in ChatGPT and other AI chat interfaces. This tutorial covers streaming in depth — how it works under the hood, implementation in both JavaScript and Python, and how to integrate streaming into web applications. Read more →
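The core client-side pattern is simple: iterate over chunks as they arrive, display each one immediately, and accumulate the full reply. A toy sketch, with a local generator standing in for the API's token stream (the function names are illustrative, not the SDK's):

```python
import time

def fake_token_stream(text, delay=0.0):
    """Stand-in for an API stream: yields the reply a few characters at a time."""
    for i in range(0, len(text), 4):
        time.sleep(delay)  # simulate network latency between chunks
        yield text[i:i + 4]

def consume_stream(stream):
    """Print each chunk as it arrives (flush so it appears immediately)
    and accumulate the full reply, as a client does with a real stream."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)

reply = consume_stream(fake_token_stream("Tokens arrive as they are generated."))
```

With a real API the loop body is the same; only the stream source changes (e.g. an SSE connection or the SDK's streaming iterator).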
March 28, 2026
Error Handling & Rate Limits
LLM API calls fail. Servers go down, rate limits get hit, tokens exceed context windows, and networks time out. If your application doesn’t handle these failures gracefully, your users get cryptic errors or broken experiences. This tutorial covers the common failure modes, how to detect them, and how to build retry logic that keeps your application running. You should have read Calling LLM APIs with JavaScript or Calling LLM APIs with Python first. Read more →
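The standard remedy for transient failures (rate limits, 5xx errors, timeouts) is retrying with exponential backoff and jitter. A minimal sketch, using plain exceptions as stand-ins for the SDK's error types; the helper name and defaults are illustrative:

```python
import random
import time

# Stand-ins for retryable conditions (429s, 5xx, network timeouts).
RETRYABLE = (TimeoutError, ConnectionError)

def with_retries(fn, max_attempts=4, base_delay=0.01):
    """Call fn(), retrying transient failures with exponential backoff + jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RETRYABLE:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error to the caller
            # Double the delay each attempt; add jitter so clients don't retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay)
            time.sleep(delay)

# Demo: a flaky call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"

result = with_retries(flaky)
```

Non-retryable errors (invalid request, context window exceeded) should fail fast instead: retrying them just wastes time and budget.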
March 28, 2026