AI Engineering

Building an AI Email Assistant with APIs: Architecture, Code, and Best Practices

Build a production-ready AI email assistant: architecture, Gmail/Graph integration, LLM prompts, security, reliability, and code examples.

ASOasis

May 29, 2026

Overview

An API-driven AI email assistant turns your inbox into an event-driven system that reads, understands, and acts on messages with minimal human intervention. Done well, it can triage messages, draft or send replies, schedule meetings, update CRMs, and escalate exceptions while respecting privacy, compliance, and brand tone.

This guide walks through architectures, tooling choices, LLM prompting strategies, reliability patterns, and code examples to help you ship a production-grade assistant—not just a demo.

Common use cases

Smart triage: auto-label, prioritize, and route messages to queues or teams.
Reply drafting: on-brand, context-aware replies with citations to the source email.
Autonomous replies: send routine answers (e.g., FAQs, order status) under policy gates.
Scheduling: propose times, confirm meetings, attach calendar invites.
CRM/ticketing sync: extract entities and log to Salesforce, HubSpot, Zendesk, Jira.
Summarization: generate thread digests or daily inbox summaries for executives.

Reference architecture

+--------------------+        +-------------------+        +--------------------+
| Email Provider     |  Push  | Ingestion Layer   |  Msg   | Work Queue         |
| (Gmail/Graph/IMAP) | -----> | (Webhook/Watcher) | -----> | (SQS/PubSub/Kafka) |
+--------------------+        +-------------------+        +--------------------+
                                                           | Dead-letter Q      |
                                                           +----------+---------+
                                                                      |
                                                                      v
                                                           +--------------------+
                                                           | Orchestrator       |
                                                           | (Functions/        |
                                                           |  Workers)          |
                                                           +-----+--------+-----+
                                                                 |        |
                        +----------------------+                 |        |
                        | Tools & Integrations | <---------------+        |
                        | (Calendar/CRM/DB/    |                          |
                        |  Search/RAG)         |                          |
                        +----------+-----------+                          |
                                   |                                      |
                                   v                                      v
                            +--------------+                       +--------------+
                            | LLM Gateway  | <-------------------> | Policy/Guard |
                            | (provider(s))|                       | Rails        |
                            +------+-------+                       +------+-------+
                                   |                                      |
                                   v                                      v
                           +-----------------+                    +----------------+
                           | Outbox/Approval | -----------------> | Observability  |
                           | Workflow (HITL) |                    | (logs, traces, |
                           +-----------------+                    |  metrics)      |
                                                                  +----------------+

Key ideas:

Event-driven: treat each new/updated message as a job; avoid polling where possible.
Separation of concerns: ingestion (email plumbing), orchestration (state machine), reasoning (LLM), and actuation (send email, update systems) are distinct.
Human-in-the-loop (HITL): sensitive actions require review unless certain policies pass.

Choosing your email transport

Gmail API
- Pros: robust, granular scopes, watch/push via Pub/Sub, labels, historyId cursors for incremental sync.
- Cons: OAuth complexity; per-user consent or domain-wide delegation.
Microsoft Graph (Outlook/Exchange)
- Pros: unified API for mail/calendar; change notifications; application permissions in Entra ID.
- Cons: permission granularity/config can be intricate; throttling requires backoff discipline.
IMAP/SMTP (fallback)
- Pros: universal, simple.
- Cons: no webhooks (IMAP IDLE or polling only), brittle flags, limited metadata; use only when a provider's native API isn't an option.

Recommendation: use native provider APIs for production; reserve IMAP/SMTP for edge cases.

Data model and state

Message identity: use provider messageId + threadId as the stable key; store the provider's history cursor (e.g., Gmail historyId) to resume incremental sync and skip changes already processed.
Normalized record: from, to/cc/bcc (hash or redact as needed), subject, text/html bodies, attachments metadata, receivedAt, labels, thread snippet.
Processing state machine: NEW → PARSED → CLASSIFIED → DRAFTED → APPROVED → SENT/LOGGED → DONE; with ERROR and RETRY states.
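
The state list above can be enforced in code so that a job never skips a step. The following is a minimal sketch, not a prescribed design: the `ALLOWED` table, `advance`, and `job_key` are hypothetical names, SENT stands in for SENT/LOGGED, and the ERROR/RETRY edges are one plausible reading of the state list.

```python
from enum import Enum


class State(Enum):
    NEW = "NEW"
    PARSED = "PARSED"
    CLASSIFIED = "CLASSIFIED"
    DRAFTED = "DRAFTED"
    APPROVED = "APPROVED"
    SENT = "SENT"  # stands in for SENT/LOGGED
    DONE = "DONE"
    ERROR = "ERROR"
    RETRY = "RETRY"


# Assumed happy-path edges; failures are handled separately in advance().
ALLOWED = {
    State.NEW: {State.PARSED},
    State.PARSED: {State.CLASSIFIED},
    State.CLASSIFIED: {State.DRAFTED, State.DONE},  # e.g. spam needs no draft
    State.DRAFTED: {State.APPROVED},
    State.APPROVED: {State.SENT},
    State.SENT: {State.DONE},
    State.ERROR: {State.RETRY},
    State.RETRY: {State.NEW},  # assumption: a retry restarts the pipeline
    State.DONE: set(),
}


def advance(current: State, target: State) -> State:
    """Return target if the transition is legal, otherwise raise ValueError."""
    # Any state that is still in flight may fail into ERROR.
    if target is State.ERROR and current not in (State.DONE, State.ERROR):
        return target
    if target in ALLOWED[current]:
        return target
    raise ValueError(f"illegal transition {current.value} -> {target.value}")


def job_key(message_id: str, thread_id: str) -> str:
    """Stable dedupe key built from the provider identifiers."""
    return f"{thread_id}:{message_id}"
```

Persisting the state alongside `job_key(...)` gives a worker enough information to refuse a duplicate or out-of-order job.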

LLM design patterns

Intent + entities: classify “what is this email about?” and extract structured fields (account number, dates, sentiment, SLA).
Tool use (function calling): let the model request tools like “lookupOrder”, “findTimeslots”, or “createTicket”.
Structured outputs: constrain responses to a JSON Schema to eliminate malformed output and simplify downstream logic.
Memory and context: summarize long threads to a running “thread memory” and pass that plus the latest user turn.
Retrieval augmentation (RAG): index policies, FAQs, and product docs; cite passages used to ground replies.
Safety rails: policies for when to answer vs. escalate; never invent confidential data; redact PII before logging.
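
Tool use is safest when the orchestrator, not the model, decides what can run. The sketch below assumes the model returns a call shaped like `{"name": ..., "arguments": ...}` (the exact shape is provider-dependent) and uses stub implementations for the `lookupOrder` and `findTimeslots` tools named above; `dispatch_tool_call` is a hypothetical helper.

```python
import json
from typing import Any, Callable, Dict


def lookup_order(order_id: str) -> dict:
    # Stub: a real implementation would query the order system.
    return {"order_id": order_id, "status": "unknown"}


def find_timeslots(date: str) -> dict:
    # Stub: a real implementation would query the calendar API.
    return {"date": date, "slots": []}


# Only tools listed here can ever be executed, whatever the model asks for.
TOOLS: Dict[str, Callable[..., Any]] = {
    "lookupOrder": lookup_order,
    "findTimeslots": find_timeslots,
}


def dispatch_tool_call(call: dict) -> str:
    """Run one model-requested tool call and return the result as JSON text."""
    name = call.get("name")
    args = call.get("arguments", {})
    if isinstance(args, str):  # some providers send arguments as a JSON string
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return json.dumps({"error": "arguments are not valid JSON"})
    func = TOOLS.get(name)
    if func is None:
        return json.dumps({"error": f"tool not allowed: {name}"})
    try:
        return json.dumps(func(**args))
    except TypeError as exc:  # wrong, missing, or non-mapping arguments
        return json.dumps({"error": f"bad arguments: {exc}"})
```

Errors are returned as data rather than raised so the result can be fed back to the model as the tool response.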

Prompting strategy (minimal, stable, testable)

System prompt: define role, tone, brand rules, forbidden behaviors, escalation criteria.
Few-shot exemplars: include 3–5 canonical threads that demonstrate expected behavior and JSON outputs.
Guarded decoding: short max tokens for classification calls; higher for drafting; temperature 0–0.3 for reliability.
Deterministic fallbacks: if JSON validation fails, repair by re-prompting the model with the schema and previous output.
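
The repair loop described above can be sketched as follows. This is an illustration under assumptions: `call_model` is a hypothetical callable that takes a prompt and returns raw text, and the hand-rolled `validation_error` check is a simplified stand-in for a real JSON Schema validator.

```python
import json
from typing import Callable, Optional

# Simplified stand-in for a JSON Schema: required field name -> accepted type(s).
REQUIRED = {"intent": str, "confidence": (int, float), "draft": str, "auto_send_ok": bool}


def validation_error(raw: str) -> Optional[str]:
    """Return a description of the first problem, or None if raw is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "top-level value must be an object"
    for key, expected in REQUIRED.items():
        if key not in data:
            return f"missing required field: {key}"
        value = data[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            return f"wrong type for field: {key}"
        if not isinstance(value, expected):
            return f"wrong type for field: {key}"
    return None


def decide_with_repair(call_model: Callable[[str], str], prompt: str, max_repairs: int = 2) -> dict:
    """Call the model, re-prompting with the error and prior output on failure."""
    raw = call_model(prompt)
    for _ in range(max_repairs):
        error = validation_error(raw)
        if error is None:
            return json.loads(raw)
        repair_prompt = (
            f"{prompt}\n\nYour previous output was invalid ({error}).\n"
            f"Previous output:\n{raw}\n\n"
            f"Return only a JSON object with fields: {', '.join(REQUIRED)}."
        )
        raw = call_model(repair_prompt)
    if validation_error(raw) is None:
        return json.loads(raw)
    # Bounded retries: give up and let the caller escalate to a human.
    raise ValueError("model output still invalid after repairs")
```

Capping the number of repairs keeps cost predictable; a message that still fails validation is a natural candidate for the approval queue.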

Security, privacy, and compliance

Least privilege: request only read-only scopes for classification; add send/modify scopes in isolated workers.
Secrets management: store OAuth tokens and API keys in a KMS-backed vault; rotate regularly.
Encryption: TLS in transit; at-rest encryption with separation of duties for key access.
PII handling: redact or tokenize emails before logging; enable field-level encryption for bodies.
Tenant isolation: separate queues/storage per customer; enforce row-level security.
Auditability: log every action (who/what/when/before/after) with immutable event streams.
Policy overlays: if you operate in regulated environments (e.g., finance/health), apply DLP, allow-list tools, and HITL for sends.
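
As one illustration of redacting before logging, the sketch below replaces email addresses with salted hash tokens and phone-like digit runs with a placeholder. The regular expressions are deliberately simple assumptions and will miss or over-match some inputs; `redact` and `tokenize_email` are hypothetical helpers, and a production system would normally lean on a dedicated DLP service.

```python
import hashlib
import re

# One combined pattern so replaced text is never rescanned by the other rule.
PII_RE = re.compile(
    r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<phone>\+?\d[\d\s().-]{6,}\d)"  # loose: 8+ characters of digits and separators
)


def tokenize_email(address: str, salt: str = "demo-salt") -> str:
    """Map an address to a stable token so log lines can still be correlated."""
    digest = hashlib.sha256((salt + address.lower()).encode("utf-8")).hexdigest()
    return f"<email:{digest[:12]}>"


def redact(text: str, salt: str = "demo-salt") -> str:
    """Return text with email addresses tokenized and phone-like runs masked."""
    def _replace(match: re.Match) -> str:
        if match.group("email"):
            return tokenize_email(match.group("email"), salt)
        return "<phone>"

    return PII_RE.sub(_replace, text)
```

Call `redact(...)` on any message-derived string at the logging boundary; the salt here is a placeholder and would come from the secrets vault in practice.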

Implementation: Python skeleton

The snippet below shows an end-to-end path: fetch unread emails (Gmail API), classify and draft via an LLM, then queue for approval or send.

import base64, json, os, time
from email.message import EmailMessage
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
import requests

GMAIL_SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.send",
    "https://www.googleapis.com/auth/gmail.modify",
]

LLM_ENDPOINT = os.environ["LLM_ENDPOINT"]  # e.g., https://api.your-llm.com/v1/chat/completions
LLM_KEY = os.environ["LLM_KEY"]

creds = Credentials.from_authorized_user_file("token.json", GMAIL_SCOPES)
gmail = build("gmail", "v1", credentials=creds)

SCHEMA = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": [
            "faq", "support", "sales", "scheduling", "spam", "other"
        ]},
        "confidence": {"type": "number"},
        "entities": {"type": "object"},
        "draft": {"type": "string"},
        "auto_send_ok": {"type": "boolean"}
    },
    "required": ["intent", "confidence", "draft", "auto_send_ok"]
}

SYSTEM_PROMPT = (
    "You are an email assistant. Classify intent, extract entities, and draft a concise,"
    " on-brand reply. Obey policy: never invent facts, cite provided context snippets,"
    " and only set auto_send_ok=true for routine FAQs with >=0.85 confidence."
)


def call_llm(thread_context: str) -> dict:
    payload = {
        "model": "your-model-name",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": thread_context}
        ],
        "temperature": 0.2,
        "response_format": {"type": "json_schema", "json_schema": {"name": "email_decision", "schema": SCHEMA}}
    }
    r = requests.post(LLM_ENDPOINT, headers={"Authorization": f"Bearer {LLM_KEY}"}, json=payload, timeout=30)
    r.raise_for_status()
    return json.loads(r.json()["choices"][0]["message"]["content"])  # provider-dependent


def fetch_unread_messages(user_id="me", max_results=10):
    res = gmail.users().messages().list(userId=user_id, q="is:unread in:inbox", maxResults=max_results).execute()
    return res.get("messages", [])


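def get_message(message_id: str) -> dict:
    # Fetch the full message once; headers and body are both read from it.
    return gmail.users().messages().get(userId="me", id=message_id, format="full").execute()


def get_headers(msg: dict) -> dict:
    # Header names are case-insensitive, so normalize them to lower case.
    return {h['name'].lower(): h['value'] for h in msg.get('payload', {}).get('headers', [])}


def get_thread_text(msg: dict) -> str:
    headers = get_headers(msg)
    subject = headers.get('subject', '(no subject)')
    frm = headers.get('from', '')
    snippet = msg.get('snippet', '')

    # Walk the MIME tree depth-first and return the first text/plain part.
    def walk(p):
        data = p.get('body', {}).get('data')
        if p.get('mimeType') == 'text/plain' and data:
            # Bodies are URL-safe base64; restore padding in case it is missing.
            padded = data + '=' * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode('utf-8', errors='ignore')
        for part in p.get('parts', []) or []:
            t = walk(part)
            if t:
                return t
        return ''

    body = walk(msg.get('payload', {}))
    # Truncate the body to keep the prompt at a predictable size.
    return f"From: {frm}\nSubject: {subject}\n\nSnippet: {snippet}\n\nBody:\n{body[:8000]}"


def send_reply(thread_id: str, rfc_message_id: str, to_addr: str, subject: str, body_text: str):
    message = EmailMessage()
    message['To'] = to_addr
    # Avoid stacking "Re: Re:" when the original subject is already a reply.
    message['Subject'] = subject if subject.lower().startswith('re:') else f"Re: {subject}"
    # Threading headers take the original Message-ID header value,
    # not the Gmail API message id.
    if rfc_message_id:
        message['In-Reply-To'] = rfc_message_id
        message['References'] = rfc_message_id
    message.set_content(body_text)
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
    # threadId keeps the reply in the same Gmail conversation.
    gmail.users().messages().send(userId="me", body={"raw": raw, "threadId": thread_id}).execute()


def process():
    for m in fetch_unread_messages():
        mid = m['id']
        msg = get_message(mid)
        headers = get_headers(msg)
        decision = call_llm(get_thread_text(msg))
        # Mark as read and starred so the next "is:unread" query skips this message
        gmail.users().messages().modify(userId="me", id=mid, body={"removeLabelIds": ["UNREAD"], "addLabelIds": ["STARRED"]}).execute()
        # Route
        if decision.get('intent') == 'spam':
            gmail.users().messages().trash(userId="me", id=mid).execute()
            continue
        # HITL gate example: enforce the threshold in code, not only in the prompt
        if decision.get('auto_send_ok') and decision.get('confidence', 0) >= 0.85:
            to_addr = headers.get('reply-to') or headers.get('from', '')
            print(f"Auto-sending reply to {to_addr}...")
            # Sending stays commented out until policy checks and audit logging are in place.
            # send_reply(msg['threadId'], headers.get('message-id', ''), to_addr,
            #            headers.get('subject', ''), decision['draft'])
        else:
            # enqueue for approval UI with decision + draft
            print(json.dumps({"message_id": mid, "decision": decision}, indent=2))
        time.sleep(0.25)  # basic pacing; replace with async workers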

if __name__ == "__main__":
    process()

Notes:

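For production, subscribe to Gmail push notifications (users.watch / users.stop) and process deltas with users.history.list from the last stored historyId. Pub/Sub delivery is at-least-once, so pair it with idempotent handlers to get effectively-once processing.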
Replace the LLM call with your provider and SDK of choice; ensure structured output validation.
Never auto-send without explicit policy checks and audit trails.
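
The delta-processing note can be illustrated without any network calls. The sketch below assumes `history_response` is the parsed JSON of a Gmail `users.history.list` call (with `history`, `messagesAdded`, and `historyId` fields); `pending_message_ids`, `run_batch`, and the in-memory `processed` set are hypothetical and would be backed by durable storage in practice.

```python
from typing import Callable, List, Set


def pending_message_ids(history_response: dict, processed: Set[str]) -> List[str]:
    """Return IDs of newly added INBOX messages that have not been handled yet."""
    pending: List[str] = []
    in_batch: Set[str] = set()
    for record in history_response.get("history", []):
        for added in record.get("messagesAdded", []):
            message = added.get("message", {})
            mid = message.get("id")
            # Skip anything already handled or repeated within this response.
            if not mid or mid in processed or mid in in_batch:
                continue
            if "INBOX" not in message.get("labelIds", []):
                continue
            in_batch.add(mid)
            pending.append(mid)
    return pending


def run_batch(history_response: dict, processed: Set[str], cursor: str,
              handle: Callable[[str], None]) -> str:
    """Handle each pending message once and return the cursor to store next."""
    for mid in pending_message_ids(history_response, processed):
        handle(mid)         # may raise; the stored cursor is then left unchanged
        processed.add(mid)  # record success so a redelivery becomes a no-op
    return history_response.get("historyId", cursor)
```

Because the cursor only advances after every handler succeeds, a crash mid-batch causes a replay, and the `processed` set turns that replay into a no-op for messages already handled.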

Node.js: intent-and-draft microservice

import express from "express";
import Ajv from "ajv";
import fetch from "node-fetch";

const app = express();
app.use(express.json({ limit: "1mb" }));

const schema = { type: "object", properties: { intent: { type: "string" }, draft: { type: "string" } }, required: ["intent", "draft"] };
const ajv = new Ajv();
const validate = ajv.compile(schema);

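app.post("/classify-draft", async (req, res) => {
  const { threadContext } = req.body ?? {};
  if (typeof threadContext !== "string" || !threadContext) {
    return res.status(400).json({ error: "threadContext must be a non-empty string" });
  }
  try {
    // Plain JavaScript: no TypeScript non-null assertion on the env var.
    const r = await fetch(process.env.LLM_ENDPOINT, {
      method: "POST",
      headers: { Authorization: `Bearer ${process.env.LLM_KEY}`, "Content-Type": "application/json" },
      body: JSON.stringify({ model: "your-model", messages: [{ role: "system", content: "You are an email assistant." }, { role: "user", content: threadContext }], response_format: { type: "json_schema", json_schema: { name: "out", schema } } })
    });
    if (!r.ok) return res.status(502).json({ error: `LLM upstream returned ${r.status}` });
    const data = await r.json();
    // Response shape is provider-dependent; this assumes a chat-completions style payload.
    const content = JSON.parse(data.choices[0].message.content);
    if (!validate(content)) return res.status(422).json({ error: validate.errors });
    res.json(content);
  } catch (err) {
    // Network failures and malformed JSON end up here instead of an unhandled rejection.
    res.status(502).json({ error: "LLM call failed" });
  }
});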

app.listen(3000, () => console.log("up on :3000"));

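Because it exposes plain HTTP, this microservice can be called from services written in any language, and swapping LLM providers only means changing the endpoint, key, and response parsing.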

Reliability and scale patterns

Idempotency: key jobs by provider messageId + historyId so that a redelivered notification for the same change maps to the same job; store a processing checksum to prevent duplicates.
Retries with backoff and jitter: handle 429/5xx from email and LLM APIs gracefully.
Rate limits: batch operations where allowed; prefer incremental sync via webhooks.
Timeouts and circuit breakers: isolate flaky dependencies; fail closed for sending.
Deterministic pipelines: split classification, RAG retrieval, drafting, and sending into distinct steps with persisted artifacts.
Testing: snapshot tests for prompts and outputs; red-team adversarial emails; regression suites per release.
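
Retries with backoff and jitter can be sketched in a few lines. This example uses the "full jitter" variant (a uniform random delay up to an exponentially growing cap), which is one common choice rather than the only one; `RetryableError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the caller is assumed to raise `RetryableError` for 429/5xx responses.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by callers for responses worth retrying, e.g. HTTP 429 or 5xx."""


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Callable[[], float] = random.random) -> Iterator[float]:
    """Yield full-jitter delays: rng() * min(cap, base * 2**n) for n = 0, 1, ..."""
    for n in range(attempts):
        yield rng() * min(cap, base * (2 ** n))


def call_with_retry(func: Callable[[], T], attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep, **backoff) -> T:
    """Call func until it succeeds or the attempt budget is exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, **backoff)
    for attempt in range(attempts):
        try:
            return func()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(next(delays))
    raise AssertionError("unreachable")
```

For sends, combine this with the fail-closed rule above: if retries are exhausted, route the draft to the approval queue rather than sending later without review.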

Cost and latency management

Model selection: use small, fast models for intent extraction; reserve larger models for complex drafts.
Prompt compaction: summarize long threads, pass only the latest turns plus a compressed memory.
Caching: semantic cache for FAQs; store top K retrieved passages to bypass repeated RAG calls.
Batching: process new messages in micro-batches to amortize startup overhead.
Streaming: surface partial drafts to the approval UI while the model completes.
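
Model selection and prompt compaction can be combined in a small pre-processing step. The sketch below is illustrative only: the model names, the 6000-character threshold, and the helpers `compact_thread` and `pick_model` are assumptions, and character counts are used as a rough proxy for tokens.

```python
from typing import List


def compact_thread(memory: str, turns: List[str], keep_last: int = 2,
                   max_chars_per_turn: int = 2000) -> str:
    """Build prompt context from a running summary plus the latest turns."""
    recent = turns[-keep_last:] if keep_last > 0 else []
    parts = [f"Thread memory:\n{memory}"] if memory else []
    for turn in recent:
        # Truncate each turn so one very long email cannot dominate the prompt.
        parts.append(f"Message:\n{turn[:max_chars_per_turn]}")
    return "\n\n".join(parts)


def pick_model(task: str, context_chars: int) -> str:
    """Route cheap classification work to a small model, everything else to a large one."""
    if task == "classify" and context_chars <= 6000:
        return "small-model"  # placeholder name
    return "large-model"      # placeholder name
```

A typical call would be `pick_model("classify", len(compact_thread(memory, turns)))`, with the older turns folded into `memory` by a separate summarization step.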

Tooling and ecosystem

SDKs: official Gmail/Microsoft Graph SDKs; IMAP/SMTP libraries for legacy.
Orchestration: serverless functions for bursty workloads; queues (SQS, Pub/Sub) for backpressure.
Evaluation: prompt/unit tests, human rating of drafts, rubric scoring, and automatic regression dashboards.
Observability: distributed tracing around LLM calls; redact spans; capture token/latency/cost per step.
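
Per-step token, latency, and cost capture does not require a tracing vendor to prototype. The following sketch uses only the standard library; `Trace`, `StepRecord`, and the per-1K-token prices passed to `cost` are hypothetical, and a real deployment would export these records to its tracing backend.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepRecord:
    name: str
    latency_s: float = 0.0
    tokens_in: int = 0
    tokens_out: int = 0
    error: str = ""


@dataclass
class Trace:
    message_id: str
    steps: List[StepRecord] = field(default_factory=list)

    @contextmanager
    def step(self, name: str):
        """Time one pipeline step; the caller fills in token counts on the record."""
        record = StepRecord(name=name)
        start = time.perf_counter()
        try:
            yield record
        except Exception as exc:
            # Store only the exception class so email content never reaches the trace.
            record.error = type(exc).__name__
            raise
        finally:
            record.latency_s = time.perf_counter() - start
            self.steps.append(record)

    def cost(self, usd_per_1k_in: float, usd_per_1k_out: float) -> float:
        """Total cost across steps, given assumed per-1K-token prices."""
        total = sum(s.tokens_in * usd_per_1k_in + s.tokens_out * usd_per_1k_out for s in self.steps)
        return total / 1000
```

Usage is `with trace.step("classify") as s:` around each LLM call, setting `s.tokens_in` and `s.tokens_out` from the provider's usage fields.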

Production checklist

OAuth scopes are least-privilege and segregated by worker role.
Webhook/watch is configured; polling used only as a fallback.
Idempotency and deduplication validated with provider cursors.
Structured outputs enforced with JSON Schema and repair loop.
HITL approvals for non-trivial messages; audit logs enabled.
PII redaction in logs/traces; encryption keys rotated.
Prompt and RAG sources versioned; regression tests green.
Dashboards track accuracy, auto-send rate, deflection, latency, and cost.

Final thoughts

Great AI email assistants are predictable systems, not magic. By separating ingestion, reasoning, and actuation; enforcing structure and policies; and designing for reliability from day one, you can move from “cool demo” to an assistant that safely saves hours every week—and scales with your team and your customers.
