CrewAI Multi‑Agent System Tutorial: From First Crew to Production
Build a production-ready multi-agent system with CrewAI: agents, tasks, tools, orchestration, validation, and deployment in under an hour.
Overview
Multi-agent systems turn one large, ambiguous request into a coordinated set of smaller, specialized actions. CrewAI is a Python framework that makes this practical: you define agents with clear roles, give them tools, arrange tasks, and let an orchestrator handle the workflow. In this tutorial you’ll build a working crew from scratch, learn orchestration patterns (sequential and hierarchical), add web tools, enforce structured outputs, and package everything for production.
What you’ll build:
- A two-agent research-and-writing crew
- Optional manager agent for hierarchical orchestration
- Custom tool for fetching external data
- Validation around JSON outputs
- A minimal FastAPI service to run your crew on demand
Prerequisites
- Python 3.10+
- An LLM provider key (for example, OpenAI)
- Optional API keys for tools (for example, Serper for web search)
Installation and project setup
Create a fresh environment and install dependencies.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -U crewai crewai-tools langchain-openai python-dotenv fastapi uvicorn
Add your keys to a .env file in the project root:
# .env
OPENAI_API_KEY=sk-...
SERPER_API_KEY=serper_...
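It helps to fail fast when a key is missing rather than discover it mid-run. A minimal, dependency-free sketch (`missing_keys` is a hypothetical helper, not part of CrewAI; call it after load_dotenv()):

```python
import os

def missing_keys(required, env=None):
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping; in practice pass only the required list
assert missing_keys(["OPENAI_API_KEY"], {"OPENAI_API_KEY": "sk-test"}) == []
```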
Project layout:
crewai-tutorial/
├─ .env
├─ crew_quickstart.py
├─ tools/
│ └─ readme_tool.py
└─ service.py
Quickstart: A research-and-writing crew
We’ll assemble two specialists:
- Researcher: Finds and synthesizes facts using web tools
- Writer: Turns research into a crisp article with a defined structure
Step 1 — Define tools
We’ll use a search API and a simple web scraper from crewai-tools.
# crew_quickstart.py
from dotenv import load_dotenv
load_dotenv()
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
search = SerperDevTool()
scrape = ScrapeWebsiteTool()
Step 2 — Pick your LLM
CrewAI works with multiple providers. Here we’ll use LangChain’s OpenAI chat wrapper for clarity.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2) # pick a capable, cost-effective model
Step 3 — Define agents
Each agent gets a role, goal, backstory, and tools. Keep these crisp—good role design is half the battle.
from crewai import Agent
researcher = Agent(
    role="Senior Researcher",
    goal=(
        "Discover accurate, up-to-date information and craft objective summaries "
        "with citations and source links."
    ),
    backstory=(
        "You are meticulous and skeptical. You verify claims using multiple sources "
        "and flag uncertainty."
    ),
    tools=[search, scrape],
    llm=llm,
    verbose=True,
    allow_delegation=False,
)
writer = Agent(
    role="Technical Writer",
    goal=(
        "Transform research into a concise, well-structured article with clear headings, "
        "callouts, and an executive summary."
    ),
    backstory=(
        "You write in a professional tone for engineers and product teams, optimizing for clarity."
    ),
    llm=llm,
    verbose=True,
    allow_delegation=False,
)
Step 4 — Define tasks
Tasks describe the work product and the success criteria. Use placeholders like {topic} that are filled at runtime via inputs.
from crewai import Task
research_task = Task(
    description=(
        "Research the topic: '{topic}'. Identify 5–8 key insights, important definitions, "
        "benefits, trade-offs, and 3–5 reputable sources. Capture brief notes and URLs."
    ),
    expected_output=(
        "JSON with fields: insights (array of strings), definitions (array), pros (array), "
        "cons (array), sources (array of {title, url}). Keep it factual and neutral."
    ),
    agent=researcher,
)
writing_task = Task(
    description=(
        "Write a 600–900 word article about '{topic}' using the research JSON from the previous task. "
        "Include: (1) executive summary, (2) background, (3) key insights with bullets, "
        "(4) trade-offs, (5) recommended next steps, (6) references as links."
    ),
    expected_output="Markdown article body only. No front-matter.",
    agent=writer,
)
Step 5 — Orchestrate the crew
Crew orchestrates agents and tasks. Start with a sequential process.
from crewai import Crew, Process
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # run tasks in order; each task's output feeds the next as context
    verbose=True,
    memory=True,  # enable short-term memory across the run
)
Step 6 — Run it
Provide runtime inputs to fill placeholders across tasks.
if __name__ == "__main__":
    result = crew.kickoff(inputs={"topic": "CrewAI multi-agent system patterns"})
    print("\n=== Final Article ===\n")
    print(result)
Run it:
python crew_quickstart.py
You should see agent-by-agent reasoning logs (summarized) and a final Markdown article printed to the console.
Going deeper: Hierarchical orchestration with a manager
Sequential is simple, but complex projects benefit from a manager agent that delegates, reviews, and requests revisions.
from crewai import Crew, Process, Agent
manager = Agent(
    role="Project Manager",
    goal=(
        "Ensure the crew delivers accurate, high-quality outputs. Assign tasks, request clarifications, "
        "and enforce standards and timelines."
    ),
    backstory="You balance speed, quality, and scope; you ask for revisions when needed.",
    llm=llm,
    verbose=True,
    allow_delegation=True,
)
hierarchical_crew = Crew(
    agents=[researcher, writer],  # the manager is passed separately, not listed with the workers
    tasks=[research_task, writing_task],
    process=Process.hierarchical,  # manager oversees and can re-assign/revise
    manager_agent=manager,
    memory=True,
)
if __name__ == "__main__":
    result = hierarchical_crew.kickoff(inputs={"topic": "Vector databases for RAG"})
    print(result)
When to prefer hierarchical:
- Ambiguous scopes where requirements evolve during the run
- Multi-stage outputs that benefit from review-and-revise loops
- Large crews where handoffs must be actively managed
Add a custom tool
You can expose domain-specific capabilities as tools. Tools run synchronously and return strings (or JSON as a string).
# tools/readme_tool.py
from crewai.tools import BaseTool
import requests
class ReadmeFinderTool(BaseTool):
    # BaseTool is a Pydantic model, so name and description need type annotations
    name: str = "find_readme"
    description: str = "Given a GitHub repo URL, return the README.md contents (HEAD)."

    def _run(self, repo_url: str) -> str:
        url = repo_url.rstrip("/") + "/raw/HEAD/README.md"
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return resp.text
Wire it into an agent:
from tools.readme_tool import ReadmeFinderTool
repo_reader = ReadmeFinderTool()
researcher.tools.extend([repo_reader])
Prompt tip: In the agent’s goal/backstory, say when the tool is appropriate and what format to return.
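The URL mapping inside the tool is worth unit-testing on its own, since GitHub's raw-file scheme is easy to get subtly wrong. A sketch of the pure part (`readme_raw_url` is an illustrative helper, not CrewAI API):

```python
def readme_raw_url(repo_url: str) -> str:
    """Map a GitHub repo URL to the raw README.md URL on the default branch (HEAD)."""
    return repo_url.rstrip("/") + "/raw/HEAD/README.md"

# Trailing slashes are normalized away
print(readme_raw_url("https://github.com/example/repo/"))
# → https://github.com/example/repo/raw/HEAD/README.md
```

Keeping this logic in a plain function lets you test it without any network calls.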
Enforce structured outputs with validation
Relying on natural language alone can be brittle. Ask for strict JSON and validate post-run.
# validator.py
from pydantic import BaseModel, HttpUrl, ValidationError
from typing import List
class Source(BaseModel):
    title: str
    url: HttpUrl

class ResearchSchema(BaseModel):
    insights: List[str]
    definitions: List[str]
    pros: List[str]
    cons: List[str]
    sources: List[Source]
# in your run pipeline, after research_task completes
raw_json = research_task.output.raw  # TaskOutput.raw holds the task's final text
try:
    data = ResearchSchema.model_validate_json(raw_json)
except ValidationError as e:
    # Optionally ask the manager/agent to self-repair using the error message
    print("Validation failed:", e)
Prompt snippet for the research_task expected_output:
- “Return STRICT JSON only. Do not include prose. If uncertain, use an empty array. Use this schema: …”
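The self-repair idea can be made concrete as a bounded loop: validate, and on failure feed the error message back for one or two retries. A dependency-free sketch (`ask_model` stands in for whatever call re-prompts your agent; both helpers are illustrative, not CrewAI API):

```python
import json

def validate_research(raw: str) -> dict:
    """Minimal structural check mirroring ResearchSchema; raises ValueError on mismatch."""
    data = json.loads(raw)
    for field in ("insights", "definitions", "pros", "cons", "sources"):
        if not isinstance(data.get(field), list):
            raise ValueError(f"field '{field}' missing or not an array")
    return data

def validate_with_repair(raw: str, ask_model, max_retries: int = 2) -> dict:
    """Try to validate; on failure, re-prompt with the error message, up to max_retries times."""
    for _ in range(max_retries + 1):
        try:
            return validate_research(raw)
        except (ValueError, json.JSONDecodeError) as err:
            raw = ask_model(f"Your JSON was invalid ({err}). Return STRICT JSON only.")
    raise ValueError("could not obtain valid JSON after retries")
```

Bounding the retries matters: an unbounded repair loop can burn tokens on a model that keeps producing the same malformed output.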
Knowledge and memory
- Memory: Enabling memory=True lets later tasks reference earlier outputs without complex prompt plumbing.
- Knowledge: For domain packets (docs, playbooks), pre-summarize key sections and attach as context via tools (e.g., a local file search tool) or include short extracts directly in task descriptions. Keep the context concise to control tokens.
Observability and debugging
- Set verbose=True for step-by-step traces.
- Log tool inputs/outputs (redact secrets).
- Capture intermediate artifacts: task outputs, chosen tools, and final messages.
- Compare runs with different temperatures; keep generation deterministic for tests (temperature=0).
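Redacting secrets from logged tool inputs can be as simple as masking known key shapes before the line reaches your logs. A minimal sketch (the patterns are illustrative; extend them for your providers):

```python
import re

# Illustrative patterns for common credential shapes; add your providers' formats
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),            # OpenAI-style keys
    re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)\S+"),  # key=value / key: value pairs
]

def redact(text: str) -> str:
    """Mask anything that looks like a credential before it is logged."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]", text
        )
    return text

print(redact("calling tool with api_key=abc123"))
# → calling tool with api_key=[REDACTED]
```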
Testing your crew
Add lightweight tests to avoid regressions when you tweak prompts or swap models.
# test_quickstart.py
import re
def test_article_has_sections(run_article):
    body = run_article(topic="Test Topic")
    assert "Executive Summary" in body or re.search(r"^#?\s*Summary", body, re.I)
    assert "References" in body
Tip: Inject a smaller, faster model for CI to control cost; record fixtures of expected shapes rather than exact text.
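The `run_article` fixture above is assumed, not provided by CrewAI. In CI you can stub it with a canned article so tests check shape rather than exact text; swap in a real `crew.kickoff` wrapper for integration runs. A sketch (the canned body is illustrative):

```python
# conftest.py (sketch)
import pytest

def fake_article(topic: str) -> str:
    """Canned article used in CI instead of a live crew run."""
    return (
        f"# {topic}\n\n"
        "## Executive Summary\nShort summary here.\n\n"
        "## References\n- https://example.com\n"
    )

@pytest.fixture
def run_article():
    # In integration tests, return a wrapper around crew.kickoff instead
    return fake_article
```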
Deploy as an API
Expose the crew behind a simple FastAPI route.
# service.py
from fastapi import FastAPI
from pydantic import BaseModel
from crew_quickstart import crew
app = FastAPI()
class RunRequest(BaseModel):
    topic: str

@app.post("/run")
async def run(req: RunRequest):
    result = crew.kickoff(inputs={"topic": req.topic})
    return {"article": str(result)}  # kickoff returns a CrewOutput; str() yields the final text
# run: uvicorn service:app --reload --port 8000
Operational tips:
- Use async workers and a queue for higher throughput.
- Add request IDs and persist artifacts to blob storage.
- Rate-limit and retry on provider errors; implement backoff.
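The retry-with-backoff tip can be sketched as a small helper: exponential delays with full jitter, capped at a maximum wait (the defaults and names here are illustrative, not from any particular SDK):

```python
import random
import time

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Exponential backoff delays with full jitter, capped at `cap` seconds."""
    return [random.uniform(0, min(cap, base * (2 ** i))) for i in range(attempts)]

def with_retries(fn, attempts: int = 4, retriable=(TimeoutError,)):
    """Call fn; on a retriable failure, sleep a jittered backoff and try again."""
    delays = backoff_delays(attempts - 1)
    for i in range(attempts):
        try:
            return fn()
        except retriable:
            if i == attempts - 1:
                raise
            time.sleep(delays[i])
```

Full jitter (a uniform draw up to the exponential bound) spreads retries out, which avoids thundering-herd retries when many requests hit a rate limit at once.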
Prompt design best practices
- Roles ≠ tasks: Keep roles stable and reuse across projects; tune tasks to the job.
- State the audience and constraints (length, format, tone) explicitly.
- Give agents permission boundaries: which tools, what decisions they can make, what to escalate.
- Prefer checklists and JSON fields to vague prose.
- Add self-checks: “Before finalizing, verify all URLs resolve (HEAD). If any fail, replace or flag.”
Troubleshooting
- Hallucinations: Strengthen expected_output with schemas, require citations, set temperature lower.
- Tool misuse: Clarify when to use each tool; add examples in the backstory.
- Token overuse: Trim context, pre-summarize, and keep prompts DRY.
- Rate limits: Batch runs, add jittered retries, use smaller models for intermediate steps.
- Non-determinism: Fix seeds if supported; set temperature=0; keep instructions stable.
Extending this tutorial
- Add a Reviewer agent that checks style and factual consistency before publishing.
- Introduce a Planner agent that breaks a large topic into subtopics, spawning subtasks.
- Integrate a vector search tool for organization-specific knowledge.
- Stream partial outputs to a UI for better UX.
Summary
CrewAI provides a pragmatic way to build multi-agent systems:
- Define sharp roles and responsibilities
- Attach only the tools that matter
- Write tasks with explicit deliverables and formats
- Choose an orchestration mode that matches complexity
- Validate outputs and observe the run

With these patterns, you can scale from a simple two-agent pipeline to a robust, production-grade multi-agent service.