Making AI Work with Your Newsroom's Data
A Practical Guide to RAG, Agents, and Fine-Tuning
Workshop
·
75 min
What Is an LLM?
- Trained on massive text — books, web, code, news articles
- Learns to predict the next token
- Builds a compressed model of language, facts, and reasoning patterns
- Only "knows" what was in its training data
Billions of text documents
↓
Training
↓
LLM
Knowledge frozen at
training cutoff date
GPT-4, Claude, Llama, Mistral — same core idea, different training data and techniques.
In-Context Learning
LLMs can reason over text you put directly in the prompt — no retraining needed.
System instructions
Your documents / context
User question
- This is why context windows matter: 8K → 128K → 1M tokens
- Few-shot examples, documents, instructions — all work this way
- The model treats your context as if it "knew" it all along
This is the foundation — every approach we'll discuss is about getting the right data into that green zone.
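In code terms, in-context learning is just string assembly — a minimal sketch (all names and texts here are illustrative, not our actual pipeline):

```python
# Minimal sketch of in-context learning: everything the model "knows"
# at answer time is just concatenated text in one prompt.

def build_prompt(system: str, documents: list[str], question: str) -> str:
    """Assemble system instructions, context documents, and the user
    question into a single prompt string — the model sees nothing else."""
    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    return f"{system}\n\n{context}\n\nQuestion: {question}"

prompt = build_prompt(
    system="Answer only from the documents below.",
    documents=["The press release was issued on 2023-05-12."],
    question="When was the press release issued?",
)
```

Every approach in this deck differs only in *how* the documents list gets filled.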
The Gap
- Your archives, sources, internal docs — not in the training set
- LLMs hallucinate when they don't know — confidently wrong
- Recent events may be past the knowledge cutoff
- You can paste docs into the prompt, but that doesn't scale to thousands of articles
55,895 press releases
× avg 500 words each
=
~40M tokens
Won't fit in any context window
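The back-of-the-envelope math, assuming roughly 1.4 tokens per word for mixed European-language text (a rule of thumb; the exact ratio depends on the tokenizer):

```python
# Why the corpus can't be pasted into a prompt: rough token count.
docs = 55_895
avg_words = 500
tokens_per_word = 1.4  # assumption; varies by tokenizer and language

total_tokens = docs * avg_words * tokens_per_word
print(f"{total_tokens / 1e6:.0f}M tokens")  # ~39M — call it ~40M
# Even a 1M-token context window is ~40x too small.
```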
How do we bridge the gap?
Three Ways to Bridge the Gap
1. RAG
Search your data, inject results into the prompt. The model answers based on what you found.
Automated copy-paste
2. Agents
Give the model tools to search and reason on its own. It decides what to look for.
A researcher with access
3. Fine-Tuning
Retrain the model on your data so it internalizes your domain knowledge and style.
Training a new journalist
Each has different strengths, costs, and failure modes. Let's look at each one.
Approach 1
RAG — Retrieval-Augmented Generation
- User asks a question
- System searches your data for relevant documents
- Top results are injected into the prompt as context
- LLM generates an answer grounded in those documents
Think: automated copy-paste. The system finds the right documents so you don't have to.
Instructions
Retrieved doc 1
Retrieved doc 2
Retrieved doc 3
User question
What the LLM actually sees
Approach 1
RAG: How It Works
User Question
↓
Search (Data System)
↓ top K results
Question + Documents
↓
LLM
↓
Grounded Answer
Key characteristics
- One search, one LLM call
- Simple, fast, predictable
- The quality of the answer depends entirely on what the search returns
- No reasoning about what to search for — the user query goes straight to the data system
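The whole flow fits in a few lines. A sketch with stand-in functions — `search` and `call_llm` here are toy stubs, not our real backend or LLM API:

```python
# Single-shot RAG: one search, one LLM call, no loops.

def search(query: str, k: int = 3) -> list[str]:
    # Stand-in: a real system would query Typesense/Elasticsearch here.
    corpus = {
        "covid policy": "Spain extended its COVID certificate scheme in 2022.",
        "climate policy": "Germany announced a new climate package in 2023.",
    }
    return [text for key, text in corpus.items() if key in query][:k]

def call_llm(prompt: str) -> str:
    # Stand-in for an LLM API call (e.g. via OpenRouter).
    n = prompt.count("[Doc]")
    return f"(answer grounded in {n} document(s))"

def rag_answer(question: str) -> str:
    docs = search(question)                        # one search
    context = "\n".join(f"[Doc] {d}" for d in docs)
    prompt = f"Answer from the context only.\n{context}\nQ: {question}"
    return call_llm(prompt)                        # one LLM call

print(rag_answer("What was the covid policy?"))
```

Note that the user's question goes straight into `search` — there is no step where the model rephrases or refines it.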
Approach 1
RAG: Weaknesses
- Retrieval ceiling — the answer is only as good as what the search returns
- One-shot retrieval — if the first search misses, there's no second chance
- Query sensitivity — slight rephrasing can return completely different results
- "Lost in the middle" — LLMs pay less attention to context buried in long prompts
- No cross-referencing — can't combine insights from multiple searches or follow leads
- Context window limits — can only fit so many documents before quality degrades
Great for direct factual questions. Struggles with complex, multi-faceted research.
Approach 2
Agentic Systems
- The LLM doesn't just answer — it drives the research
- Given tools (search, APIs, databases), it decides what to look for
- It can refine queries, follow leads, cross-reference results
- Multiple search → reason → search cycles
Less like copy-paste, more like giving a researcher access to your archive.
Agent thinking:
"Let me search for covid policy in Spain..."
→ 2,438 results
"Now let me compare with Luxembourg..."
→ 1,176 results
"Interesting. Let me check the timeline..."
→ filtered by year
"Now I can synthesize an answer."
Approach 2
Agents: How They Work
User Question
↓
Agent (LLM)
plans research
↓ tool call · ↑ results (loop)
Data System
↓
Synthesized Answer
Key characteristics
- Multiple loops — the agent controls the research process
- Can use different tools: search, filter, count, compare
- Each step informed by previous results
- Produces deeper analysis but takes longer and costs more
Tools: Claude Code, custom agents with tool use, LangChain, CrewAI
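The loop above can be sketched in a few lines. This toy version uses a *scripted* stand-in for the model — a real agent would call an LLM API on each turn and parse its chosen tool call; all names and numbers are illustrative:

```python
# Toy agent loop: model picks an action, sees the result, repeats.

def fake_model(history: list[str]) -> str:
    # Scripted plan standing in for real per-turn LLM decisions.
    plan = [
        "SEARCH covid policy Spain",
        "SEARCH covid policy Luxembourg",
        "ANSWER Spain issued far more covid releases than Luxembourg.",
    ]
    return plan[len(history)]

def search_tool(query: str) -> str:
    counts = {"Spain": 2438, "Luxembourg": 1176}  # illustrative numbers
    country = query.split()[-1]
    return f"{counts.get(country, 0)} results"

def run_agent(question: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        action = fake_model(history)
        if action.startswith("ANSWER"):
            return action.removeprefix("ANSWER ").strip()
        result = search_tool(action.removeprefix("SEARCH "))
        history.append(f"{action} -> {result}")  # next step sees prior results
    return "gave up"

print(run_agent("Compare covid policy coverage: Spain vs Luxembourg"))
```

The `max_steps` cap matters in practice: it bounds cost and stops runaway loops.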
Approach 2
Agents: Weaknesses
- Cost — multiple LLM calls per question, 5–50× more expensive than RAG
- Latency — research loops take seconds to minutes, not milliseconds
- Unpredictability — same question can produce different research paths and answers
- Harder to evaluate — no single retrieval step to measure; the whole chain matters
- Tool misuse — the agent can search for the wrong thing or misinterpret results
- Compounding errors — a wrong turn early in the chain poisons everything downstream
Powerful for complex investigations. Overkill (and expensive) for simple lookups.
Approach 3
Fine-Tuning with LoRA
- Instead of showing data at runtime, bake it into the model
- LoRA — Low-Rank Adaptation: train a small adapter on top of a base model
- Much cheaper than full fine-tuning — updates a fraction of parameters
- Shines at simpler, focused tasks: classification, sentiment analysis, tagging, summarization in a house style
- Can run locally — no API costs at inference time
Model parameters
Typically 0.1–5% of total parameters
Great for newsroom tasks like:
Topic classification
Sentiment analysis
Auto-tagging
Headline generation
Language detection
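Why the adapter is so small: each adapted d×d weight matrix gets two low-rank factors instead of a full update. Rough arithmetic for a 7B-class model shape — the layer count, hidden size, rank, and choice of adapted matrices here are illustrative, not a spec:

```python
# LoRA trainable-parameter count, back of the envelope.
layers, hidden, rank = 32, 4096, 16
adapted_matrices_per_layer = 4  # e.g. attention q/k/v/o projections

# Each adapted d×d matrix gets two low-rank factors: A (r×d) and B (d×r).
lora_params = layers * adapted_matrices_per_layer * 2 * rank * hidden
total_params = 7_000_000_000

print(f"{lora_params:,} trainable ({lora_params / total_params:.2%} of total)")
```

Roughly 17M trainable parameters — about a quarter of a percent of the base model, which is why LoRA training fits on a single GPU.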
Approach 3
Fine-Tuning: How It Works
Training phase (offline)
Data System
↓ export examples
Curate Q&A pairs
↓
Base Model
+
LoRA Training
↓
Fine-Tuned Model
Inference (runtime)
Question
→
Model
→
Answer
Key characteristics
- Data system feeds training, not inference
- Knowledge is in the weights — no search at runtime
- Fast inference, no retrieval latency
- But knowledge is frozen until you retrain
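The "curate Q&A pairs" step typically produces a JSONL file of input/output examples. A sketch — the field names are illustrative, since the exact schema depends on your training framework:

```python
# Fine-tuning data: one JSON object per line (JSONL), exported from
# the data system. Example task: topic classification.
import json

examples = [
    {"input": "Spain extends COVID certificate scheme through summer.",
     "output": "health"},
    {"input": "Germany unveils new climate investment package.",
     "output": "environment"},
]

jsonl = "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)
print(jsonl)
```

Quality matters more than quantity here: a few thousand clean, consistent pairs usually beats a large noisy export.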
Approach 3
Fine-Tuning: Weaknesses
- Frozen knowledge — model only knows what it was trained on; new data requires retraining
- Data curation — needs high-quality input/output pairs; garbage in, garbage out
- No citations — can't point to source documents; knowledge is opaque
- Catastrophic forgetting — can lose general capabilities when overtrained on narrow data
- Maintenance burden — must retrain when data changes, test for regressions each time
- Requires ML expertise — hyperparameters, GPU infrastructure, evaluation pipelines
Best for style/format adaptation. Not ideal when you need current facts with sources.
The Pattern
The Common Foundation
RAG
Query →
Data System
↓
LLM → Answer
Agents
Agent ↔
Data System
↓ ↑ ↓ ↑
Agent → Answer
Fine-Tuning
Data System → Export
↓
Train → Model
Every approach starts with the same green box
Before you choose an AI approach, build a searchable, structured, API-accessible data system.
Get this right and you can swap AI approaches as needs evolve.
The Foundation
What Goes in That Box?
Typesense
Fast, typo-tolerant, multilingual. Easy to self-host. Built-in vector search. Great developer experience.
Elasticsearch / OpenSearch
Industry standard. Extremely powerful and flexible. Complex to operate. Supports vectors via kNN.
Meilisearch
Developer-friendly, fast for small-to-medium datasets. Growing vector and AI search features.
PostgreSQL + pgvector
Add vector search to your existing database. Zero new infrastructure. Good enough for many use cases.
Apache Solr
Mature, battle-tested. Common in large newsrooms and archives. Dense vector support via plugins.
Dedicated Vector DBs
Pinecone, Weaviate, Qdrant, Chroma. Built for embeddings. Less useful for traditional keyword search.
The right choice depends on your data size, team skills, and existing infrastructure.
The Foundation
Inside the Green Box
Two ways to search
Keyword search
Match words directly. Fast, exact, interpretable. Struggles with synonyms and meaning.
"climate policy" → finds "climate policy"
misses "environmental regulation"
Vector search
Compare meaning, not words. Finds semantically similar content across languages.
"climate policy" → also finds
"Klimaschutzpolitik", "política climática"
What are embeddings?
- An embedding is a list of numbers that represents the meaning of a text
- An embedding model reads your document and outputs a vector (e.g., 384 numbers)
- Texts with similar meaning end up close together in vector space
- At query time: embed the question, find the nearest document vectors
"climate policy"
→
Embedding Model
→
[0.12, -0.34, 0.78, …]
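"Close together" is usually measured with cosine similarity. A toy illustration with tiny hand-made vectors (real embedding models output 384+ dimensions):

```python
# Cosine similarity: 1.0 = same direction/meaning, ~0 = unrelated.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

climate_en = [0.9, 0.1, 0.2]     # "climate policy"
climate_de = [0.85, 0.15, 0.25]  # "Klimaschutzpolitik"
football   = [0.1, 0.9, 0.3]     # "football results"

# The German and English climate texts score far higher than an
# unrelated topic — this is what makes cross-language search work.
assert cosine(climate_en, climate_de) > cosine(climate_en, football)
```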
Best systems combine both: hybrid search — keyword precision + semantic recall.
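One common way to fuse the two rankings is reciprocal rank fusion (RRF): documents that rank well in *either* list rise to the top. The result IDs below are illustrative:

```python
# Reciprocal rank fusion: score each doc by sum of 1/(k + rank)
# over every ranking it appears in, then sort by fused score.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # exact-match ranking
vector_hits  = ["doc1", "doc5", "doc3"]  # semantic ranking
print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top
```

The constant `k` (60 is a conventional default) damps the influence of top ranks so one list can't dominate.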
Workshop
Our Setup
Data
- Typesense — self-hosted, single container
- 55,895 EU government press releases
- 4 countries — Spain, Germany, Luxembourg, Italy
- 4 languages — ES, DE, FR, IT
- Multilingual embeddings (e5-small, 384d)
- Full-text + vector + faceted search
Tools
- Open WebUI — RAG chat interface
- Claude Code — agentic research via API
- LLMs via OpenRouter
- Everything on one server
User → Open WebUI → Pipeline → Typesense
↓
OpenRouter (LLM)
All data from the EU Open Data Portal. Open-licensed (CC-BY-4.0 / CC0).
Live Demo
Typesense Search
Exploring 56K press releases — full-text search, facets, typo tolerance, multilingual queries
Keyword search
→
Facet by country/year
→
Typo tolerance
→
Cross-language
Typesense Dashboard · :8109
Live Demo
RAG in Action
Asking questions about EU press releases — Typesense retrieves, LLM answers with citations
Ask a question
→
Typesense searches
→
LLM synthesizes
→
Grounded response
Open WebUI · :3000
Live Demo
Agent Research
Claude Code investigates a complex question — multiple searches, cross-referencing, synthesis
Complex question
→
Agent plans
→
Searches & reasons
→
Synthesized report
Claude Code · terminal
Choosing Your Approach
| | RAG | Agents | Fine-Tuning |
| --- | --- | --- | --- |
| Best for | Direct factual Q&A | Complex, multi-step research | Classification, tagging, sentiment, style |
| Task complexity | Simple questions | Open-ended investigation | Focused, repeatable tasks |
| Latency | Seconds | Minutes | Milliseconds |
| Cost / query | $ | $$$$ | Free (after training) |
| Data freshness | Real-time | Real-time | Stale until retrained |
| Can cite sources | Yes | Yes | No |
| Setup complexity | Low | Medium | High (ML expertise) |
| Needs data system | At runtime | At runtime | At training time only |
These aren't mutually exclusive. A newsroom might use RAG for reporter Q&A, agents for investigations, and a fine-tuned model to auto-tag incoming articles.