CrewAI Multi‑Agent System Tutorial: From First Crew to Production
Build a production-ready multi-agent system with CrewAI: agents, tasks, tools, orchestration, validation, and deployment in under an hour.
Overview
Multi-agent systems turn one large, ambiguous request into a coordinated set of smaller, specialized actions. CrewAI is a Python framework that makes this practical: you define agents with clear roles, give them tools, arrange tasks, and let an orchestrator handle the workflow. In this tutorial you’ll build a working crew from scratch, learn orchestration patterns (sequential and hierarchical), add web tools, enforce structured outputs, and package everything for production.
What you’ll build:
- A two-agent research-and-writing crew
- Optional manager agent for hierarchical orchestration
- Custom tool for fetching external data
- Validation around JSON outputs
- A minimal FastAPI service to run your crew on demand
Prerequisites
- Python 3.10+
- An LLM provider key (for example, OpenAI)
- Optional API keys for tools (for example, Serper for web search)
Installation and project setup
Create a fresh environment and install dependencies.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -U crewai crewai-tools langchain-openai python-dotenv fastapi uvicorn
Add your keys to a .env file in the project root:
# .env
OPENAI_API_KEY=sk-...
SERPER_API_KEY=serper_...
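It helps to fail fast when a key is missing rather than discover it mid-run. A minimal, dependency-free sketch (`missing_keys` is a hypothetical helper, not part of CrewAI; call it after load_dotenv()):

```python
import os

def missing_keys(required, env=None):
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping; in practice pass only the required list
assert missing_keys(["OPENAI_API_KEY"], {"OPENAI_API_KEY": "sk-test"}) == []
```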
Project layout:
crewai-tutorial/
├─ .env
├─ crew_quickstart.py
├─ tools/
│ └─ readme_tool.py
└─ service.py
Quickstart: A research-and-writing crew
We’ll assemble two specialists:
- Researcher: Finds and synthesizes facts using web tools
- Writer: Turns research into a crisp article with a defined structure
Step 1 — Define tools
We’ll use a search API and a simple web scraper from crewai-tools.
# crew_quickstart.py
from dotenv import load_dotenv
load_dotenv()
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
search = SerperDevTool()
scrape = ScrapeWebsiteTool()
Step 2 — Pick your LLM
CrewAI works with multiple providers. Here we’ll use LangChain’s OpenAI chat wrapper for clarity.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2) # pick a capable, cost-effective model
Step 3 — Define agents
Each agent gets a role, goal, backstory, and tools. Keep these crisp—good role design is half the battle.
from crewai import Agent
researcher = Agent(
    role="Senior Researcher",
    goal=(
        "Discover accurate, up-to-date information and craft objective summaries "
        "with citations and source links."
    ),
    backstory=(
        "You are meticulous and skeptical. You verify claims using multiple sources "
        "and flag uncertainty."
    ),
    tools=[search, scrape],
    llm=llm,
    verbose=True,
    allow_delegation=False,
)
writer = Agent(
    role="Technical Writer",
    goal=(
        "Transform research into a concise, well-structured article with clear headings, "
        "callouts, and an executive summary."
    ),
    backstory=(
        "You write in a professional tone for engineers and product teams, optimizing for clarity."
    ),
    llm=llm,
    verbose=True,
    allow_delegation=False,
)
Step 4 — Define tasks
Tasks describe the work product and the success criteria. Use placeholders like {topic} that are filled at runtime via inputs.
from crewai import Task
research_task = Task(
    description=(
        "Research the topic: '{topic}'. Identify 5–8 key insights, important definitions, "
        "benefits, trade-offs, and 3–5 reputable sources. Capture brief notes and URLs."
    ),
    expected_output=(
        "JSON with fields: insights (array of strings), definitions (array), pros (array), "
        "cons (array), sources (array of {title, url}). Keep it factual and neutral."
    ),
    agent=researcher,
)
writing_task = Task(
    description=(
        "Write a 600–900 word article about '{topic}' using the research JSON from the previous task. "
        "Include: (1) executive summary, (2) background, (3) key insights with bullets, "
        "(4) trade-offs, (5) recommended next steps, (6) references as links."
    ),
    expected_output="Markdown article body only. No front-matter.",
    agent=writer,
)
Step 5 — Orchestrate the crew
Crew orchestrates agents and tasks. Start with a sequential process.
from crewai import Crew, Process
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,  # run tasks in order; each task's output feeds the next as context
    verbose=True,
    memory=True,  # enable short-term memory across the run
)
Step 6 — Run it
Provide runtime inputs to fill placeholders across tasks.
if __name__ == "__main__":
    result = crew.kickoff(inputs={"topic": "CrewAI multi-agent system patterns"})
    print("\n=== Final Article ===\n")
    print(result)
Run it:
python crew_quickstart.py
You should see agent-by-agent reasoning logs (summarized) and a final Markdown article printed to the console.
Going deeper: Hierarchical orchestration with a manager
Sequential is simple, but complex projects benefit from a manager agent that delegates, reviews, and requests revisions.
from crewai import Crew, Process, Agent
manager = Agent(
    role="Project Manager",
    goal=(
        "Ensure the crew delivers accurate, high-quality outputs. Assign tasks, request clarifications, "
        "and enforce standards and timelines."
    ),
    backstory="You balance speed, quality, and scope; you ask for revisions when needed.",
    llm=llm,
    verbose=True,
    allow_delegation=True,
)
hierarchical_crew = Crew(
    agents=[researcher, writer],  # the manager is passed separately, not listed with the workers
    tasks=[research_task, writing_task],
    process=Process.hierarchical,  # manager oversees and can re-assign/revise
    manager_agent=manager,
    memory=True,
)
if __name__ == "__main__":
    result = hierarchical_crew.kickoff(inputs={"topic": "Vector databases for RAG"})
    print(result)
When to prefer hierarchical:
- Ambiguous scopes where requirements evolve during the run
- Multi-stage outputs that benefit from review-and-revise loops
- Large crews where handoffs must be actively managed
Add a custom tool
You can expose domain-specific capabilities as tools. Tools run synchronously and return strings (or JSON as a string).
# tools/readme_tool.py
from crewai.tools import BaseTool
import requests
class ReadmeFinderTool(BaseTool):
    # BaseTool is a Pydantic model, so name and description need type annotations
    name: str = "find_readme"
    description: str = "Given a GitHub repo URL, return the README.md contents (HEAD)."

    def _run(self, repo_url: str) -> str:
        url = repo_url.rstrip("/") + "/raw/HEAD/README.md"
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return resp.text
Wire it into an agent:
from tools.readme_tool import ReadmeFinderTool
repo_reader = ReadmeFinderTool()
researcher.tools.extend([repo_reader])
Prompt tip: In the agent’s goal/backstory, say when the tool is appropriate and what format to return.
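The URL mapping inside the tool is worth unit-testing on its own, since GitHub's raw-file scheme is easy to get subtly wrong. A sketch of the pure part (`readme_raw_url` is an illustrative helper, not CrewAI API):

```python
def readme_raw_url(repo_url: str) -> str:
    """Map a GitHub repo URL to the raw README.md URL on the default branch (HEAD)."""
    return repo_url.rstrip("/") + "/raw/HEAD/README.md"

# Trailing slashes are normalized away
print(readme_raw_url("https://github.com/example/repo/"))
# → https://github.com/example/repo/raw/HEAD/README.md
```

Keeping this logic in a plain function lets you test it without any network calls.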
Enforce structured outputs with validation
Relying on natural language alone can be brittle. Ask for strict JSON and validate post-run.
# validator.py
from pydantic import BaseModel, HttpUrl, ValidationError
from typing import List
class Source(BaseModel):
    title: str
    url: HttpUrl

class ResearchSchema(BaseModel):
    insights: List[str]
    definitions: List[str]
    pros: List[str]
    cons: List[str]
    sources: List[Source]
# in your run pipeline, after research_task completes
raw_json = research_task.output.raw  # TaskOutput.raw holds the task's final text
try:
    data = ResearchSchema.model_validate_json(raw_json)
except ValidationError as e:
    # Optionally ask the manager/agent to self-repair using the error message
    print("Validation failed:", e)
Prompt snippet for the research_task expected_output:
- “Return STRICT JSON only. Do not include prose. If uncertain, use an empty array. Use this schema: …”
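The self-repair idea can be made concrete as a bounded loop: validate, and on failure feed the error message back for one or two retries. A dependency-free sketch (`ask_model` stands in for whatever call re-prompts your agent; both helpers are illustrative, not CrewAI API):

```python
import json

def validate_research(raw: str) -> dict:
    """Minimal structural check mirroring ResearchSchema; raises ValueError on mismatch."""
    data = json.loads(raw)
    for field in ("insights", "definitions", "pros", "cons", "sources"):
        if not isinstance(data.get(field), list):
            raise ValueError(f"field '{field}' missing or not an array")
    return data

def validate_with_repair(raw: str, ask_model, max_retries: int = 2) -> dict:
    """Try to validate; on failure, re-prompt with the error message, up to max_retries times."""
    for _ in range(max_retries + 1):
        try:
            return validate_research(raw)
        except (ValueError, json.JSONDecodeError) as err:
            raw = ask_model(f"Your JSON was invalid ({err}). Return STRICT JSON only.")
    raise ValueError("could not obtain valid JSON after retries")
```

Bounding the retries matters: an unbounded repair loop can burn tokens on a model that keeps producing the same malformed output.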
Knowledge and memory
- Memory: Enabling memory=True lets later tasks reference earlier outputs without complex prompt plumbing.
- Knowledge: For domain packets (docs, playbooks), pre-summarize key sections and attach as context via tools (e.g., a local file search tool) or include short extracts directly in task descriptions. Keep the context concise to control tokens.
Observability and debugging
- Set verbose=True for step-by-step traces.
- Log tool inputs/outputs (redact secrets).
- Capture intermediate artifacts: task outputs, chosen tools, and final messages.
- Compare runs with different temperatures; keep generation deterministic for tests (temperature=0).
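Redacting secrets from logged tool inputs can be as simple as masking known key shapes before the line reaches your logs. A minimal sketch (the patterns are illustrative; extend them for your providers):

```python
import re

# Illustrative patterns for common credential shapes; add your providers' formats
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),            # OpenAI-style keys
    re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)\S+"),  # key=value / key: value pairs
]

def redact(text: str) -> str:
    """Mask anything that looks like a credential before it is logged."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]", text
        )
    return text

print(redact("calling tool with api_key=abc123"))
# → calling tool with api_key=[REDACTED]
```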
Testing your crew
Add lightweight tests to avoid regressions when you tweak prompts or swap models.
# test_quickstart.py
import re
def test_article_has_sections(run_article):
    body = run_article(topic="Test Topic")
    assert "Executive Summary" in body or re.search(r"^#?\s*Summary", body, re.I)
    assert "References" in body
Tip: Inject a smaller, faster model for CI to control cost; record fixtures of expected shapes rather than exact text.
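The `run_article` fixture above is assumed, not provided by CrewAI. In CI you can stub it with a canned article so tests check shape rather than exact text; swap in a real `crew.kickoff` wrapper for integration runs. A sketch (the canned body is illustrative):

```python
# conftest.py (sketch)
import pytest

def fake_article(topic: str) -> str:
    """Canned article used in CI instead of a live crew run."""
    return (
        f"# {topic}\n\n"
        "## Executive Summary\nShort summary here.\n\n"
        "## References\n- https://example.com\n"
    )

@pytest.fixture
def run_article():
    # In integration tests, return a wrapper around crew.kickoff instead
    return fake_article
```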
Deploy as an API
Expose the crew behind a simple FastAPI route.
# service.py
from fastapi import FastAPI
from pydantic import BaseModel
from crew_quickstart import crew
app = FastAPI()
class RunRequest(BaseModel):
    topic: str

@app.post("/run")
async def run(req: RunRequest):
    result = crew.kickoff(inputs={"topic": req.topic})
    return {"article": str(result)}  # kickoff returns a CrewOutput; str() yields the final text
# run: uvicorn service:app --reload --port 8000
Operational tips:
- Use async workers and a queue for higher throughput.
- Add request IDs and persist artifacts to blob storage.
- Rate-limit and retry on provider errors; implement backoff.
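The retry-with-backoff tip can be sketched as a small helper: exponential delays with full jitter, capped at a maximum wait (the defaults and names here are illustrative, not from any particular SDK):

```python
import random
import time

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0):
    """Exponential backoff delays with full jitter, capped at `cap` seconds."""
    return [random.uniform(0, min(cap, base * (2 ** i))) for i in range(attempts)]

def with_retries(fn, attempts: int = 4, retriable=(TimeoutError,)):
    """Call fn; on a retriable failure, sleep a jittered backoff and try again."""
    delays = backoff_delays(attempts - 1)
    for i in range(attempts):
        try:
            return fn()
        except retriable:
            if i == attempts - 1:
                raise
            time.sleep(delays[i])
```

Full jitter (a uniform draw up to the exponential bound) spreads retries out, which avoids thundering-herd retries when many requests hit a rate limit at once.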
Prompt design best practices
- Roles ≠ tasks: Keep roles stable and reuse across projects; tune tasks to the job.
- State the audience and constraints (length, format, tone) explicitly.
- Give agents permission boundaries: which tools, what decisions they can make, what to escalate.
- Prefer checklists and JSON fields to vague prose.
- Add self-checks: “Before finalizing, verify all URLs resolve (HEAD). If any fail, replace or flag.”
Troubleshooting
- Hallucinations: Strengthen expected_output with schemas, require citations, set temperature lower.
- Tool misuse: Clarify when to use each tool; add examples in the backstory.
- Token overuse: Trim context, pre-summarize, and keep prompts DRY.
- Rate limits: Batch runs, add jittered retries, use smaller models for intermediate steps.
- Non-determinism: Fix seeds if supported; set temperature=0; keep instructions stable.
Extending this tutorial
- Add a Reviewer agent that checks style and factual consistency before publishing.
- Introduce a Planner agent that breaks a large topic into subtopics, spawning subtasks.
- Integrate a vector search tool for organization-specific knowledge.
- Stream partial outputs to a UI for better UX.
Summary
CrewAI provides a pragmatic way to build multi-agent systems:
- Define sharp roles and responsibilities
- Attach only the tools that matter
- Write tasks with explicit deliverables and formats
- Choose an orchestration mode that matches complexity
- Validate outputs and observe the run

With these patterns, you can scale from a simple two-agent pipeline to a robust, production-grade multi-agent service.