GraphRAG Tutorial: From Documents to Knowledge Graph–Powered RAG
Build a practical GraphRAG pipeline: extract a knowledge graph, index nodes and chunks, retrieve local paths and global summaries, and synthesize grounded answers.
Overview
Retrieval-augmented generation (RAG) couples a large language model (LLM) with a retriever to ground answers in your data. GraphRAG extends this idea by building and querying a knowledge graph so the LLM can reason over entities, relations, and global structure—not just isolated chunks. The payoff is better multi-hop answers, disambiguation, and explainability via paths and citations.
This tutorial walks you through the whole pipeline end to end: ingesting documents, extracting a graph, indexing both text and graph signals, and implementing a two-level retriever (local and global) that feeds a final answer-synthesis prompt.
When to use GraphRAG
- Your questions require multi-hop reasoning (A → related-to → B → causes → C).
- You need interpretable answers with entity/edge citations.
- Your corpus has recurring entities across documents (people, orgs, APIs, components).
- You want global summaries of regions of the graph (communities, topics) to complement local evidence.
Architecture at a glance
- Ingestion: parse and chunk documents.
- Graph extraction: LLM or NLP pipeline yields triples (subject, relation, object) with evidence.
- Storage: text chunks in a vector store; graph in NetworkX or Neo4j (optional).
- Indexing: embeddings for chunks and for node/edge text; community detection for global structure.
- Retrieval:
  - Local: semantic search over chunks + node neighborhoods.
  - Global: community- or subgraph-level summaries.
- Synthesis: structured prompt that includes local facts, paths, global summaries, and citations.
ASCII sketch:
```
[Docs] -> [Chunker] -> (1) [Vector Index]
                    \-> (2) [Triple Extractor] -> [Graph DB] -> [Communities + Summaries]

Query -> [Entity Linking + Seed Nodes] -> [Neighborhood Expand] -> [Local + Global Context] -> [LLM Answer]
```
Prerequisites
- Python 3.10+
- Packages: networkx, sentence-transformers, faiss-cpu (or Chroma), scikit-learn, pydantic, python-dotenv, spacy (optional), fastapi (optional for serving)
- An embedding model (e.g., sentence-transformers) and an LLM provider (any; wrap behind a simple function).
Project setup
Create a minimal environment and install dependencies:
```bash
python -m venv .venv && source .venv/bin/activate
pip install networkx sentence-transformers faiss-cpu scikit-learn pydantic python-dotenv spacy
python -m spacy download en_core_web_sm
```
A simple layout:
```
project/
  data/                   # raw docs (*.md)
  build/
    chunks.jsonl          # chunk cache
    graph.jsonl           # triples cache
    node_summaries.jsonl  # community summaries
  app/
    ingest.py
    llm.py
    extract_graph.py
    index.py
    summarize_graph.py
    embed.py
    retriever.py
    answer.py
```
Step 1 — Ingest and chunk documents
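Keep chunks small enough for precise retrieval but large enough to carry context (roughly 400–800 tokens), and store each chunk's text with its metadata (doc id, page, headings). The splitter below packs whole paragraphs up to a character budget; at roughly four characters per English token, 1,800 characters is about 450 tokens.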
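```python
# app/ingest.py
import json
import re
from pathlib import Path
from typing import Dict, List


def simple_md_split(text: str, max_chars: int = 1800) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of at most ~max_chars.

    A single paragraph longer than max_chars becomes its own (oversized) chunk.
    """
    paras = [p.strip() for p in re.split(r"\n\n+", text) if p.strip()]
    chunks: List[str] = []
    buf = ""
    for p in paras:
        # Flush the buffer if adding this paragraph (plus "\n\n") would overflow.
        if buf and len(buf) + len(p) + 2 > max_chars:
            chunks.append(buf)
            buf = ""
        buf = f"{buf}\n\n{p}" if buf else p
    if buf:
        chunks.append(buf)
    return chunks


def load_docs(path: str = "data") -> Dict[str, List[str]]:
    docs = {}
    root = Path(path)
    for f in sorted(root.glob("**/*.md")):
        # Use the relative path (minus extension) as doc_id so files with the
        # same name in different folders don't overwrite each other.
        doc_id = f.relative_to(root).with_suffix("").as_posix()
        docs[doc_id] = simple_md_split(f.read_text(encoding="utf-8"))
    return docs


if __name__ == "__main__":
    docs = load_docs()
    Path("build").mkdir(exist_ok=True)
    with open("build/chunks.jsonl", "w", encoding="utf-8") as w:
        for doc_id, chunks in docs.items():
            for i, ch in enumerate(chunks):
                w.write(json.dumps({"doc_id": doc_id, "chunk_id": i, "text": ch}) + "\n")
```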
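The splitter above packs paragraphs without overlap, so a fact that straddles a chunk boundary can end up split across two chunks. A common alternative is a sliding window with overlap. Below is a minimal sketch under two assumptions: whitespace-separated words stand in for model tokens, and `window_chunks` (a hypothetical helper, not used elsewhere in this pipeline) receives one document's text at a time.

```python
from typing import List


def window_chunks(text: str, size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window. Words approximate tokens (an assumption)."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `window_chunks(text, size=500, overlap=100)` advances 400 words per window. Overlap improves recall at chunk boundaries at the cost of storing and embedding some text twice.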
Step 2 — Extract entities and relations (triples)
You can use either:
- LLM-based extraction with a JSON schema (best quality, higher cost), or
- Lightweight NLP (spaCy + patterns) as a fallback.
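To make the fallback concrete, here is a stdlib-only sketch of the "patterns" half, without spaCy. The two regexes and their relation labels are hypothetical examples: they only match capitalized noun phrases and assign a fixed confidence. The output uses the same keys as TRIPLE_SCHEMA below, so downstream code can consume either source.

```python
import re
from typing import Dict, List

# A capitalized noun phrase such as "Billing Service" (hypothetical heuristic).
NP = r"[A-Z][\w\-]*(?: [A-Z][\w\-]*)*"

# Hypothetical hand-written patterns: (regex with subject/object groups, relation).
PATTERNS = [
    (re.compile(rf"\b({NP}) (?:uses|calls) ({NP})"), "uses"),
    (re.compile(rf"\b({NP}) depends on ({NP})"), "depends_on"),
]


def pattern_triples(text: str) -> List[Dict]:
    triples = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for rx, rel in PATTERNS:
            for m in rx.finditer(sentence):
                triples.append({
                    "subject": m.group(1),
                    "relation": rel,
                    "object": m.group(2),
                    "evidence": sentence.strip(),
                    "confidence": 0.5,  # fixed prior: patterns are noisy
                })
    return triples
```

With spaCy, the regexes would typically give way to dependency-based subject/verb/object matching; the output shape stays the same.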
LLM wrapper (provider-agnostic):
```python
# app/llm.py
# Implement call_llm to call your LLM provider (OpenAI, Azure, Anthropic, local, etc.).
# It should return the model's raw text response: a JSON string matching the
# schema in the prompt (callers run json.loads on it).
def call_llm(system: str, prompt: str) -> str:
    raise NotImplementedError("Plug in your LLM provider here.")


TRIPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "triples": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "subject": {"type": "string"},
                    "relation": {"type": "string"},
                    "object": {"type": "string"},
                    "evidence": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["subject", "relation", "object", "evidence", "confidence"],
            },
        }
    },
    "required": ["triples"],
}

EXTRACT_SYSTEM = """
You extract knowledge graph triples from text. Output compact JSON only.
Entities should be canonical (merge aliases). Use relation verbs or nouns.
Include a short evidence quote from the text and a confidence in [0,1].
"""

# Delimited with ''' because the template itself wraps the text in """.
EXTRACT_PROMPT_TMPL = '''
Text:
"""{text}"""

Respond with JSON per schema: {schema}
'''.strip()
```
Extraction driver:
```python
# app/extract_graph.py
import json
import sys

from llm import call_llm, EXTRACT_SYSTEM, EXTRACT_PROMPT_TMPL, TRIPLE_SCHEMA

REQUIRED = ("subject", "relation", "object", "evidence", "confidence")


def extract_triples_for_chunks(chunks_path="build/chunks.jsonl", out_path="build/graph.jsonl"):
    failed = 0
    with open(chunks_path, "r", encoding="utf-8") as r, open(out_path, "w", encoding="utf-8") as w:
        for line in r:
            rec = json.loads(line)
            prompt = EXTRACT_PROMPT_TMPL.format(text=rec["text"], schema=json.dumps(TRIPLE_SCHEMA))
            try:
                data = json.loads(call_llm(EXTRACT_SYSTEM, prompt))
            except Exception as e:  # provider error or invalid JSON: log and continue
                failed += 1
                print(f"skip {rec['doc_id']}#{rec['chunk_id']}: {e}", file=sys.stderr)
                continue
            triples = data.get("triples", []) if isinstance(data, dict) else []
            for t in triples:
                if not all(k in t for k in REQUIRED):
                    continue  # drop malformed triples
                t.update({"doc_id": rec["doc_id"], "chunk_id": rec["chunk_id"]})
                w.write(json.dumps(t) + "\n")
    if failed:
        print(f"{failed} chunk(s) failed extraction", file=sys.stderr)


if __name__ == "__main__":
    extract_triples_for_chunks()
```
Tip: post-process to normalize entity names (lowercase, strip punctuation, map aliases like “IBM” ↔ “International Business Machines”).
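A minimal normalization pass for that tip, as a sketch: `normalize_entity` is a hypothetical helper and the alias table is an example you would curate per corpus. Apply it to subject and object before writing graph.jsonl so aliases collapse into one node.

```python
import string
from typing import Dict

# Hypothetical alias table: normalized alias -> normalized canonical name.
ALIASES: Dict[str, str] = {
    "ibm": "international business machines",
}

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_entity(name: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Lowercase, strip ASCII punctuation, collapse whitespace, then map aliases."""
    key = " ".join(name.lower().translate(_STRIP_PUNCT).split())
    return aliases.get(key, key)
```

Stripping punctuation is lossy ("Node.js" becomes "nodejs", "Component-X" becomes "componentx"), and any query-side entity matching must apply the same normalization or it will miss the stored names.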
Step 3 — Build the graph and compute communities
Use NetworkX for a portable graph. Optionally mirror to Neo4j if you need Cypher queries or a production-grade store.
```python
# app/index.py
import json
from collections import defaultdict

import networkx as nx


class GraphIndex:
    def __init__(self):
        # MultiDiGraph keeps parallel edges, e.g. one entity pair linked by several relations.
        self.G = nx.MultiDiGraph()
        # entity -> evidence snippets; used later to build node profiles for embedding
        self.node_text = defaultdict(list)

    def add_triple(self, s, r, o, evidence, meta):
        self.G.add_node(s)
        self.G.add_node(o)
        self.G.add_edge(s, o, relation=r, evidence=evidence, **meta)
        self.node_text[s].append(evidence)
        self.node_text[o].append(evidence)

    @classmethod
    def from_jsonl(cls, path="build/graph.jsonl"):
        gi = cls()
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                t = json.loads(line)
                gi.add_triple(t["subject"], t["relation"], t["object"], t["evidence"],
                              {"doc_id": t["doc_id"], "chunk_id": t["chunk_id"]})
        return gi


if __name__ == "__main__":
    gi = GraphIndex.from_jsonl()
    print(gi.G.number_of_nodes(), gi.G.number_of_edges())
```
Community detection (global structure) and per-community summaries:
```python
# app/summarize_graph.py
import json

import networkx as nx

from index import GraphIndex
from llm import call_llm

COMMUNITY_PROMPT = """
Summarize this set of related entities and relations in 5-8 bullet points.
Be factual; cite 2-4 key entity names.
Input triples:
{triples}
"""


def detect_communities(G):
    # Toy placeholder: connected components of the undirected view.
    # On a real corpus this often yields one giant component; swap in
    # Louvain or Leiden for meaningful communities.
    return [set(c) for c in nx.connected_components(G.to_undirected())]


def summarize_communities(G, out_path="build/node_summaries.jsonl"):
    comms = detect_communities(G)
    with open(out_path, "w", encoding="utf-8") as w:
        for i, nodes in enumerate(comms):
            sub = G.subgraph(nodes)
            triples = [
                f"({u}) -[{d.get('relation', 'related')}]-> ({v})"
                for u, v, d in sub.edges(data=True)
            ]
            prompt = COMMUNITY_PROMPT.format(triples="\n".join(triples[:60]))
            try:
                summary = call_llm("You write precise technical summaries.", prompt)
            except Exception:
                # Fallback when no LLM is configured: list a few member entities.
                summary = "- Related entities: " + ", ".join(sorted(nodes)[:8])
            w.write(json.dumps({"community_id": i, "nodes": sorted(nodes), "summary": summary}) + "\n")


if __name__ == "__main__":
    gi = GraphIndex.from_jsonl()
    summarize_communities(gi.G)
```
Step 4 — Dual index: vectors for text and nodes
Embed both chunk texts and node profiles (concatenated evidence snippets). Use the same embedding model so you can rank both together.
```python
# app/embed.py
import json

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer


class DualIndex:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.vec_dim = self.model.get_sentence_embedding_dimension()
        # Inner product on L2-normalized vectors equals cosine similarity.
        self.faiss_chunks = faiss.IndexFlatIP(self.vec_dim)
        self.faiss_nodes = faiss.IndexFlatIP(self.vec_dim)
        self.chunk_meta = []
        self.node_meta = []

    def _emb(self, texts):
        X = self.model.encode(texts, normalize_embeddings=True)
        return np.asarray(X, dtype="float32")

    def add_chunks(self, chunks_jsonl="build/chunks.jsonl"):
        texts, metas = [], []
        with open(chunks_jsonl, "r", encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                texts.append(rec["text"])
                # Keep the text so retrieved chunks can be shown to the LLM.
                metas.append({k: rec[k] for k in ("doc_id", "chunk_id", "text")})
        if texts:  # FAISS rejects empty input
            self.faiss_chunks.add(self._emb(texts))
        self.chunk_meta = metas

    def add_nodes(self, graph_index, min_snips=3):
        # Node profile = entity name + up to 10 evidence snippets.
        texts, metas = [], []
        for node, snips in graph_index.node_text.items():
            if len(snips) < min_snips:
                continue
            txt = f"Entity: {node}\nEvidence:\n- " + "\n- ".join(snips[:10])
            texts.append(txt)
            metas.append({"entity": node})
        if texts:
            self.faiss_nodes.add(self._emb(texts))
        self.node_meta = metas
```
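Because both indices use the same model and normalized vectors, their scores sit on one cosine scale. The NumPy sketch below reproduces the ranking `IndexFlatIP` computes, which is handy for testing without FAISS, and adds a hypothetical `merge_hits` helper that ranks chunk and node hits together. The `(label, score)` tuple format is an assumption of this example.

```python
import numpy as np


def topk_inner_product(X: np.ndarray, q: np.ndarray, k: int = 5):
    """Brute-force top-k by inner product (the ranking IndexFlatIP returns).

    With L2-normalized rows and query, the scores are cosine similarities.
    Accepts q as shape (d,) or (1, d).
    """
    scores = X @ np.ravel(q)
    k = min(k, len(scores))
    idx = np.argsort(-scores, kind="stable")[:k]
    return idx, scores[idx]


def merge_hits(chunk_hits, node_hits, k=5):
    """Rank (label, score) hits from both indices on one scale.
    Only meaningful because both indices share one embedding model."""
    return sorted(chunk_hits + node_hits, key=lambda h: h[1], reverse=True)[:k]
```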
Step 5 — Query-time retrieval: local + graph
At query time we:
- Detect mentioned entities to seed a neighborhood search.
- Run semantic search over text chunks.
- Expand the graph k steps from seed nodes to collect high-signal edges and nearby nodes.
- Retrieve community summaries for any touched communities.
- Assemble a structured context with citations and paths.
```python
# app/retriever.py
import re
from typing import List

MENTION_RX = re.compile(r"[A-Z][A-Za-z0-9_\-]{2,}")


def detect_entities(q: str) -> List[str]:
    # Very naive: capitalized tokens. Use spaCy NER in production.
    return sorted(set(MENTION_RX.findall(q)))


def k_hop_neighborhood(G, seeds: List[str], k=2, max_nodes=60):
    # Keep only seeds that are graph nodes (NetworkX would otherwise iterate
    # a missing string seed character by character).
    seeds = [s for s in seeds if s in G]
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(k):
        nxt = set()
        for u in frontier:
            nxt.update(v for _, v in G.out_edges(u))  # successors
            nxt.update(v for v, _ in G.in_edges(u))   # predecessors
        frontier = nxt - visited
        visited |= frontier
        if len(visited) > max_nodes:
            break  # soft cap: the last hop may overshoot max_nodes
    return G.subgraph(visited)


def paths_as_text(SG):
    lines = []
    for u, v, d in SG.edges(data=True):
        rel = d.get("relation", "related")
        ev = d.get("evidence", "")
        lines.append(f"({u}) -[{rel}]-> ({v}); evidence: {ev[:120]}")
    return "\n".join(lines[:80])


def search_faiss(index, query_vec, topk=5):
    # Ask for at most ntotal results so FAISS never pads with -1 ids.
    k = min(topk, index.ntotal)
    if k == 0:
        return [], []
    D, I = index.search(query_vec, k)
    return I[0], D[0]


def embed_query(model, q):
    # Normalized float32 row vector, matching how DualIndex embeds documents.
    return model.encode([q], normalize_embeddings=True).astype("float32")
```
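`detect_entities` only catches capitalized tokens, and many of them (such as "How") are not graph nodes. Linking against the node names the graph actually contains is usually more reliable. Below is a stdlib sketch; `link_entities` is a hypothetical helper that you would call with `gi.G.nodes`, and exact case-insensitive n-gram matching is a simplifying assumption.

```python
import re
from typing import Iterable, List


def link_entities(q: str, node_names: Iterable[str], max_ngram: int = 4) -> List[str]:
    """Match query word n-grams against graph node names, case-insensitively.
    Longer n-grams are tried first, and matched words are not reused."""
    by_key = {}
    for name in node_names:
        by_key.setdefault(name.lower(), name)
    words = re.findall(r"[\w\-]+", q.lower())
    found, used = [], set()
    for n in range(min(max_ngram, len(words)), 0, -1):
        for i in range(len(words) - n + 1):
            if any(j in used for j in range(i, i + n)):
                continue  # overlaps a longer match
            key = " ".join(words[i:i + n])
            if key in by_key:
                found.append(by_key[key])
                used.update(range(i, i + n))
    return found
```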
Answer synthesis with a structured prompt:
```python
# app/answer.py
import json

from embed import DualIndex
from index import GraphIndex
from llm import call_llm
from retriever import detect_entities, embed_query, k_hop_neighborhood, paths_as_text, search_faiss

SYNTH_PROMPT = """
You are a careful assistant. Use only the provided context. Cite entities or doc_ids.
Question: {q}
Local evidence (top chunks):
{local_blocks}
Graph paths (k-hop neighborhood):
{graph_paths}
Global summaries (communities):
{global_summaries}
Instructions:
- First list 3-6 key grounded facts with citations.
- Then produce a concise answer.
- Finally, show 1-3 critical paths as bullet points: (A) -[rel]-> (B) -[rel]-> (C).
"""


def build_and_answer(q: str):
    # Rebuilds both indices on every call: fine for a demo; build once and reuse when serving.
    gi = GraphIndex.from_jsonl()
    di = DualIndex()
    di.add_chunks()
    di.add_nodes(gi)
    qv = embed_query(di.model, q)

    # Local text search; pass each chunk's text to the LLM, not just its ids.
    I, _ = search_faiss(di.faiss_chunks, qv, topk=6)
    local_blocks = []
    for idx in I:
        if idx < 0:  # FAISS pads missing results with -1
            continue
        meta = di.chunk_meta[idx]
        local_blocks.append(f"- doc={meta['doc_id']} chunk={meta['chunk_id']}\n{meta.get('text', '')}")

    # Graph neighborhood: seed from detected mentions that exist in the graph.
    seeds = [s for s in detect_entities(q) if s in gi.G]
    if not seeds:
        # Otherwise seed from the best-matching node profiles.
        NI, _ = search_faiss(di.faiss_nodes, qv, topk=3)
        seeds = [di.node_meta[i]["entity"] for i in NI if i >= 0]
    SG = k_hop_neighborhood(gi.G, seeds, k=2, max_nodes=80)
    graph_paths = paths_as_text(SG)

    # Global summaries: precomputed communities that overlap the neighborhood.
    comm_summaries = []
    sg_nodes = set(SG.nodes)
    try:
        with open("build/node_summaries.jsonl", "r", encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if sg_nodes.intersection(rec["nodes"]):
                    comm_summaries.append(f"- c{rec['community_id']}: {rec['summary']}")
    except FileNotFoundError:
        pass  # no summaries yet: answer from local and graph evidence only

    prompt = SYNTH_PROMPT.format(
        q=q,
        local_blocks="\n".join(local_blocks),
        graph_paths=graph_paths,
        global_summaries="\n".join(comm_summaries[:4]),
    )
    return call_llm("Grounded answering with rigorous citations.", prompt)
```
Step 6 — Running the pipeline
Run every command from the project root; the scripts read data/ and write build/ relative to it.
- Ingest: python app/ingest.py
- Extract triples: python app/extract_graph.py
- Build the graph and community summaries: python app/index.py (prints node and edge counts), then python app/summarize_graph.py
- Answer a question (the modules import each other by bare name, so put app/ on the import path): PYTHONPATH=app python -c "from answer import build_and_answer; print(build_and_answer('How does Component X integrate with Service Y?'))"
Prompting tips that matter
- Extraction: require JSON with confidence scores. Cap the number of triples per chunk to control cost.
- Canonicalization: normalize case; map aliases with a small dictionary; merge near-duplicates by Jaccard similarity.
- Summaries: keep them terse and cache them; refresh when the subgraph changes.
- Answering: separate local facts, graph paths, and global summaries; ask the LLM to cite entities/doc_ids explicitly.
Evaluation and debugging
- Faithfulness: check whether cited entities/edges actually exist. Auto-verify by matching answer citations to the graph.
- Coverage: fraction of gold edges present in retrieved subgraph for a QA set.
- Latency: break down time spent in embedding search, neighborhood expansion, and LLM calls.
- Ablations: compare vanilla RAG vs +node paths vs +global summaries.
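For the faithfulness check, one simple approach is to parse the critical-path bullets that the synthesis prompt requests, `(A) -[rel]-> (B) -[rel]-> (C)`, and flag every hop that is not an edge in the graph. Below is a stdlib sketch; `unsupported_hops` is a hypothetical helper, and `edges` is assumed to be built as `(u, d["relation"], v)` over `gi.G.edges(data=True)`. Matching is exact, so paraphrased relation names will be flagged as well.

```python
import re
from typing import Iterable, List, Set, Tuple

# One hop "(A) -[rel]-> (B)"; the lookahead lets chained hops share node B.
HOP_RX = re.compile(r"\(([^()]+)\)\s*-\[([^\]]+)\]->\s*(?=\(([^()]+)\))")


def unsupported_hops(answer: str, edges: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Return cited (subject, relation, object) hops that are not in `edges`."""
    known: Set[Tuple[str, str, str]] = {(s.strip(), r.strip(), o.strip()) for s, r, o in edges}
    hops = [(s.strip(), r.strip(), o.strip()) for s, r, o in HOP_RX.findall(answer)]
    return [h for h in hops if h not in known]
```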
Debug routines to add:
- Print top-5 retrieved chunks with scores.
- Visualize k-hop subgraph with colors per community.
- Show the three highest betweenness paths connecting seed entities.
Production considerations
- Storage: Neo4j (with vector indexes) is ideal for large graphs; NetworkX is fine for prototypes.
- Incremental updates: re-extract triples only for changed chunks; recompute affected subgraphs/communities.
- Caching: memoize extraction and summaries; use an on-disk key-value store.
- Guardrails: drop low-confidence triples; require two pieces of evidence for critical edges.
- Privacy: strip PII during extraction; consider an allowlist of relations.
- Cost control: batch LLM calls; use smaller models for extraction and bigger ones for final synthesis.
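The guardrail bullet can be sketched as a filter over extracted triples. In this sketch, `filter_triples`, the 0.6 confidence cutoff, and the set of "critical" relations are all hypothetical placeholders that you would tune per domain. "Two pieces of evidence" is read here as two distinct evidence snippets supporting the same (subject, relation, object).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def filter_triples(triples: Iterable[Dict], min_conf: float = 0.6,
                   critical_relations: Iterable[str] = (), min_evidence: int = 2) -> List[Dict]:
    """Drop low-confidence triples; keep an edge with a critical relation only
    if at least `min_evidence` distinct evidence snippets support it."""
    critical = set(critical_relations)
    confident = [t for t in triples if t["confidence"] >= min_conf]
    evidence = defaultdict(set)
    for t in confident:
        evidence[(t["subject"], t["relation"], t["object"])].add(t["evidence"])
    return [
        t for t in confident
        if t["relation"] not in critical
        or len(evidence[(t["subject"], t["relation"], t["object"])]) >= min_evidence
    ]
```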
Variations and extensions
- Path planning: ask the LLM to propose target entity pairs, then compute k-shortest paths in the graph and re-rank by semantic similarity to the question.
- Temporal GraphRAG: attach timestamps to edges; filter neighborhoods by time range for time-sensitive QA.
- Heterogeneous graphs: distinct node/edge types (e.g., API, endpoint, product, team) with type-specific prompts.
- Hybrid scoring: combine vector similarity, PageRank on the subgraph, and relation priors to rank evidence.
- Structured querying: expose a Cypher tool to the LLM for precise graph lookups when needed.
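For the path-planning variation, once an entity pair has been proposed, NetworkX's `shortest_simple_paths` can enumerate candidate paths to re-rank. In the sketch below, `k_shortest_paths` is a hypothetical helper that returns paths in the synthesis prompt's `(A) -[rel]-> (B)` notation. It follows edge direction and keeps only the first relation seen between two nodes; both are simplifying assumptions.

```python
from itertools import islice

import networkx as nx


def k_shortest_paths(G, source, target, k=3):
    """Up to k loop-free paths from source to target, fewest hops first.

    shortest_simple_paths does not accept multigraphs, so parallel edges are
    collapsed into a simple DiGraph that keeps one relation label per pair.
    """
    D = nx.DiGraph()
    for u, v, d in G.edges(data=True):
        if not D.has_edge(u, v):
            D.add_edge(u, v, relation=d.get("relation", "related"))
    if source not in D or target not in D:
        return []
    try:
        paths = list(islice(nx.shortest_simple_paths(D, source, target), k))
    except nx.NetworkXNoPath:
        return []
    return [
        " ".join([f"({p[0]})"] + [f"-[{D[a][b]['relation']}]-> ({b})" for a, b in zip(p, p[1:])])
        for p in paths
    ]
```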
What you should have now
- A working pipeline that builds a knowledge graph from your documents.
- Dual indices over chunks and graph entities.
- A two-level retriever (local + global) and a synthesis prompt that yields grounded, multi-hop answers with citations and paths.
Next steps
- Swap in your preferred LLM provider in app/llm.py.
- Replace the toy community detector with Louvain/Leiden.
- Add a front end (FastAPI + simple UI) and telemetry to log queries, retrieved edges, and answer citations.
By adding graph structure to RAG, you give the model the scaffolding it needs for reliable multi-hop reasoning—while keeping answers grounded, auditable, and maintainable as your corpus grows.