Fine-Tuning vs RAG vs Prompt Engineering
You’ve built a prototype with an LLM and it works pretty well, but the model doesn’t know about your company’s products, it sometimes gets the tone wrong, or it hallucinates facts about your domain. How do you fix that?
There are three main approaches to customizing LLM behavior: prompt engineering, RAG (Retrieval-Augmented Generation), and fine-tuning. Each solves different problems, and choosing the wrong one wastes time and money. This tutorial breaks down when to use each.
You should be familiar with Prompt Engineering Fundamentals and Embeddings & Vector Search before reading this.
The Three Approaches at a Glance
| | Prompt Engineering | RAG | Fine-Tuning |
|---|---|---|---|
| What it does | Crafts better instructions for the model | Feeds relevant documents to the model at query time | Trains the model further on your own data |
| Changes the model? | No | No | Yes |
| Needs training data? | No | Needs a document corpus | Needs labeled examples |
| Setup time | Minutes | Hours to days | Days to weeks |
| Cost | Low (just API calls) | Medium (embeddings + vector DB + API calls) | High (training compute + hosting) |
| Best for | Controlling format, tone, and behavior | Grounding answers in specific, up-to-date data | Teaching the model new skills or domain-specific patterns |
Prompt Engineering
Prompt engineering is always your starting point. Before reaching for more complex solutions, see how far you can get by writing better prompts.
What It Solves
- The model’s output format isn’t what you want
- The tone or style is wrong
- The model doesn’t follow your specific rules
- You need the model to play a specific role
What It Doesn’t Solve
- The model doesn’t know about your proprietary data
- The model’s knowledge is outdated
- The model consistently gets domain-specific facts wrong
- You need the model to behave in ways fundamentally different from its training
Example
```
You are a customer support agent for Acme Cloud Storage.

Rules:
- Only answer questions about Acme products
- Use a friendly, professional tone
- If you don't know the answer, say "Let me connect you with our team"
- Never mention competitor products

User: How do I upload files larger than 5GB?
```
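In practice, a prompt like this is sent as a system message alongside the user's question. Here's a minimal sketch of assembling that message array in JavaScript; the `buildMessages` helper is illustrative, not any specific provider's API, though most chat APIs accept this general shape:

```javascript
// Assemble a chat-style message array from a role description,
// behavior rules, and the user's question.
function buildMessages(role, rules, userQuestion) {
  const system = [
    `You are ${role}.`,
    "Rules:",
    ...rules.map((r) => `- ${r}`),
  ].join("\n");
  return [
    { role: "system", content: system },
    { role: "user", content: userQuestion },
  ];
}

const messages = buildMessages(
  "a customer support agent for Acme Cloud Storage",
  [
    "Only answer questions about Acme products",
    "Use a friendly, professional tone",
    'If you don\'t know the answer, say "Let me connect you with our team"',
    "Never mention competitor products",
  ],
  "How do I upload files larger than 5GB?"
);
```

Keeping the rules in a plain array like this makes it easy to iterate on them without touching the rest of your code.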
This works well for controlling behavior, but the model is making up the answer about Acme’s upload process based on general knowledge. It doesn’t actually know Acme’s specific file size limits or upload procedures.
When to stop here: If prompt engineering gives you consistently good results, you’re done. Don’t add complexity you don’t need.
RAG (Retrieval-Augmented Generation)
RAG solves the knowledge problem. Instead of hoping the model knows about your data, you give it the relevant information at query time.
How It Works
- User asks a question
- Your system searches a knowledge base (using embeddings and vector search) to find relevant documents
- Those documents are injected into the prompt as context
- The model generates an answer grounded in the provided documents
```
System: You are a support agent for Acme Cloud Storage.
Answer questions using ONLY the provided documentation.
If the documentation doesn't contain the answer, say so.

Documentation:
---
Acme supports file uploads up to 50GB. Files larger than 5GB must use
multipart upload. To initiate a multipart upload, call the
POST /api/v2/uploads/multipart endpoint with the file size in the header.
Chunks must be between 5MB and 500MB each.
---

User: How do I upload files larger than 5GB?
```
Now the model answers based on actual documentation, not guesses.
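The retrieve-then-inject loop can be sketched end to end. In this toy version, a word-overlap score stands in for real embedding similarity (a production system would use an embedding model and a vector index instead), and `buildRagPrompt` is an illustrative helper, not a library function:

```javascript
// Score a document against a query by counting shared words --
// a crude stand-in for embedding similarity in this sketch.
function score(query, doc) {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return doc.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length;
}

// Retrieve the top-k documents and inject them into a grounded prompt.
function buildRagPrompt(query, docs, k = 1) {
  const top = [...docs]
    .sort((a, b) => score(query, b) - score(query, a))
    .slice(0, k);
  return [
    "Answer questions using ONLY the provided documentation.",
    "If the documentation doesn't contain the answer, say so.",
    "Documentation:",
    "---",
    ...top,
    "---",
    `User: ${query}`,
  ].join("\n");
}

const docs = [
  "Acme supports file uploads up to 50GB. Files larger than 5GB must use multipart upload.",
  "Acme billing plans are charged monthly per seat.",
];
const prompt = buildRagPrompt("How do I upload files larger than 5GB?", docs);
```

The key property: only the documents relevant to the question make it into the prompt, so the model answers from your data rather than its general knowledge.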
What It Solves
- The model needs access to your proprietary data (docs, knowledge bases, databases)
- Information changes frequently and needs to stay current
- You need the model to cite sources or stay grounded in facts
- You want to reduce hallucinations about domain-specific topics
What It Doesn’t Solve
- The model’s fundamental behavior or writing style needs to change
- You need the model to learn a new skill (like generating code in a proprietary language)
- The task requires reasoning patterns the model wasn’t trained on
Trade-offs
- Latency — Each query requires an embedding lookup + document retrieval before the LLM call
- Retrieval quality — If the search doesn’t find the right documents, the answer will be wrong or incomplete
- Context window — Retrieved documents consume tokens, leaving less room for the conversation
- Maintenance — You need to keep your document index up to date
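The context-window trade-off is worth handling explicitly: cap how many retrieved documents you inject. A rough sketch, using an approximate 4-characters-per-token estimate (a real system should count tokens with the model's actual tokenizer; `fitToBudget` is an illustrative helper):

```javascript
// Keep retrieved documents within a token budget so they don't crowd
// out the rest of the conversation. Assumes ~4 characters per token,
// which is only a rough heuristic.
function fitToBudget(docs, maxTokens) {
  const estimateTokens = (text) => Math.ceil(text.length / 4);
  const kept = [];
  let used = 0;
  for (const doc of docs) {
    const cost = estimateTokens(doc);
    if (used + cost > maxTokens) break; // docs are assumed ranked by relevance
    kept.push(doc);
    used += cost;
  }
  return kept;
}

const ranked = ["a".repeat(400), "b".repeat(400), "c".repeat(400)];
const kept = fitToBudget(ranked, 250); // ~100 tokens each, so only two fit
```

Because the documents arrive ranked by relevance, truncating from the end drops the least useful context first.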
We cover RAG implementation in detail in Introduction to RAG.
Fine-Tuning
Fine-tuning means taking a pre-trained model and training it further on your own dataset. This actually changes the model’s weights, teaching it new patterns and behaviors.
How It Works
- Prepare a dataset of example inputs and desired outputs (typically hundreds to thousands of examples)
- Run a training job that adjusts the model’s parameters based on your examples
- Deploy the fine-tuned model
- Use it like any other model via API
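Step 1 is usually the bulk of the work. Many fine-tuning APIs accept training data as JSON Lines of chat messages, though the exact schema varies by provider; here's a hedged sketch of converting input/output pairs into that shape (`toTrainingJsonl` is an illustrative helper):

```javascript
// Convert (input, output) example pairs into JSON Lines, one training
// example per line. Check your provider's docs for the exact schema.
function toTrainingJsonl(systemPrompt, examples) {
  return examples
    .map(({ input, output }) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: input },
          { role: "assistant", content: output },
        ],
      })
    )
    .join("\n");
}

const jsonl = toTrainingJsonl(
  "You are a support agent for Acme Cloud Storage.",
  [
    { input: "What's the max file size?", output: "Uploads up to 50GB are supported." },
    { input: "Do you support SFTP?", output: "Let me connect you with our team." },
  ]
);
```

Note that each line repeats the system prompt: the model learns the pairing of that instruction with the desired responses, not just the responses alone.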
What It Solves
- You need the model to consistently follow a very specific output style or format
- The model needs to understand domain-specific terminology or jargon deeply
- You want to replicate a specific “voice” across all outputs
- You need the model to perform a specialized task that general models struggle with
- You want to use a smaller, cheaper model that performs like a larger one on your specific task
What It Doesn’t Solve
- The model needs access to frequently changing data (use RAG instead)
- You just need to control the output format (use prompt engineering)
- You have very little training data (fine-tuning needs hundreds of quality examples minimum)
Trade-offs
- Cost — Training is expensive, and you may need to host a custom model
- Data requirements — You need high-quality labeled examples
- Maintenance — When the base model updates, you may need to re-fine-tune
- Overfitting risk — The model might become too specialized and lose general capabilities
- Time — Training takes hours to days, vs. minutes for prompt changes
Decision Framework
Here’s a practical flowchart for choosing your approach:
Start with prompt engineering. Can you get good results by writing better prompts, using few-shot examples, or adding a system prompt?
- ✅ Yes → You’re done. Ship it.
- ❌ No → Continue.
Is the problem that the model lacks specific knowledge? Does it need access to your docs, database, or other proprietary information?
- ✅ Yes → Use RAG. Embed your documents, set up vector search, and inject relevant context into prompts.
- ❌ No → Continue.
Is the problem that the model’s behavior or style is fundamentally wrong? Does it need to learn a new skill, adopt a very specific voice, or handle a specialized task?
- ✅ Yes → Consider fine-tuning. But first, make sure you have enough quality training data (500+ examples minimum for meaningful results).
- ❌ No → Revisit your prompt engineering. The issue is likely in how you’re framing the task.
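The flowchart above is simple enough to encode directly, which can be handy for team discussions. The flags and return strings here are purely illustrative:

```javascript
// Encode the decision framework as a helper. Flags mirror the
// questions in the flowchart above.
function chooseApproach({
  promptingWorks,
  lacksKnowledge,
  needsNewBehavior,
  trainingExamples = 0,
}) {
  if (promptingWorks) return "prompt engineering";
  if (lacksKnowledge) return "RAG";
  if (needsNewBehavior) {
    // Fine-tuning only pays off with enough quality examples.
    return trainingExamples >= 500
      ? "fine-tuning"
      : "collect more training data first";
  }
  return "revisit prompt engineering";
}
```

For example, a model that can't answer questions about your products maps to `{ promptingWorks: false, lacksKnowledge: true }`, which returns `"RAG"`.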
Combining Approaches
These approaches aren’t mutually exclusive. In fact, the most effective production systems often combine them:
- Prompt engineering + RAG — The most common combination. Use system prompts to define behavior and RAG to provide knowledge. This covers the vast majority of use cases.
- Fine-tuning + RAG — Fine-tune a model for your domain’s style and terminology, then use RAG to provide current data. Useful for specialized applications like medical or legal assistants.
- Fine-tuning + prompt engineering — Fine-tune for the base behavior, then use prompts for task-specific instructions.
Quick Comparison
| Scenario | Best Approach |
|---|---|
| “The output format is wrong” | Prompt engineering |
| “The model doesn’t know about our products” | RAG |
| “The model needs to sound like our brand voice” | Prompt engineering (try first) → Fine-tuning |
| “The model needs to answer questions about today’s data” | RAG |
| “The model can’t do this specialized task well” | Fine-tuning |
| “The model hallucinates about our domain” | RAG |
| “I want a small model to perform like a large one on my task” | Fine-tuning |
| “I need to control what topics the model discusses” | Prompt engineering |
What’s Next?
With the foundations complete, you’re ready to start building. The next articles move into hands-on development: Structured Output & JSON Mode covers getting reliable structured data from LLMs, and Calling LLM APIs with JavaScript gets you writing real code against LLM APIs.