Build an AI App with Flutter: Architecture, Streaming, and Production Best Practices

Build an AI app with Flutter: architecture, streaming chat UI, secure proxy, on‑device ML, voice, testing, and deployment best practices.

ASOasis

Overview

Flutter lets you ship AI experiences to iOS, Android, web, and desktop from a single codebase. In this guide, you’ll plan the architecture, wire up a streaming chat UI, secure provider keys behind a proxy, add optional on‑device ML, and ship with confidence. Examples use idiomatic Dart and Flutter patterns you can adapt to any AI model provider.

Architecture at a glance

A production AI app typically has three layers:

  • Client (Flutter): UI, state management, prompt assembly, lightweight caching, and safety UX (disclaimers, feedback).
  • Edge/backend proxy: Hides provider API keys, unifies vendor differences, adds rate limiting, logging, and usage metering. Streams tokens back to the app.
  • Providers/models: Hosted LLMs, embedding services, vision and speech APIs, and optional on‑device models for offline or low‑latency paths.

Recommendation: keep your Flutter client provider‑agnostic by defining a small interface (e.g., LlmClient) and implementing adapters behind your proxy.

Choosing models and capabilities

Select models by the job to be done, not by hype:

  • Chat/assistants: general multimodal LLM with streaming.
  • Summarization/classification: fast, smaller models; batch where possible.
  • Vision: OCR, captioning, or VQA (visual question answering) via a provider or on‑device ML Kit/TFLite.
  • Speech: streaming STT and TTS for voice agents.
  • Embeddings: for search, RAG, and similarity; store locally (Hive/Sqflite) or in a vector service.

Plan for vendor churn: standardize request/response shapes in your proxy so you can swap providers without touching Flutter UI code.
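
To make that swap concrete, the proxy can map each vendor's streamed chunk into one client‑facing frame. A minimal JavaScript sketch, with hypothetical vendor names and field paths (no specific provider's schema is implied):

```javascript
// Normalize two hypothetical provider chunk shapes into the single
// { delta } frame the Flutter client expects. The field paths
// (choices/delta/content, output_text) are illustrative only.
function normalizeChunk(provider, chunk) {
  switch (provider) {
    case 'vendorA':
      // e.g. { choices: [{ delta: { content: 'hi' } }] }
      return { delta: chunk.choices?.[0]?.delta?.content ?? '' };
    case 'vendorB':
      // e.g. { output_text: 'hi' }
      return { delta: chunk.output_text ?? '' };
    default:
      return { delta: '' };
  }
}
```

The Flutter client then only ever parses `{ delta }` frames, no matter which provider is active behind the proxy.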

Project setup

Create a new Flutter app and add a few essentials.

# pubspec.yaml (high-level deps; pin versions in your project)
dependencies:
  flutter:
    sdk: flutter
  dio: any                 # robust HTTP + streaming
  flutter_riverpod: any    # state management
  go_router: any           # navigation
  freezed_annotation: any  # data classes
  json_annotation: any     # (de)serialization
  hive_flutter: any        # local cache/persistence
  uuid: any                # ids for messages/sessions
  speech_to_text: any      # optional voice input
  flutter_tts: any         # optional voice output
  tflite_flutter: any      # optional on-device models

dev_dependencies:
  build_runner: any
  freezed: any
  json_serializable: any
  flutter_test:
    sdk: flutter

Run flutter pub get and set up your usual analysis options and CI (format + analyze + test).

Data model and state

Define a minimal, serializable message model and an abstraction for the LLM client.

// message.dart
import 'package:freezed_annotation/freezed_annotation.dart';
part 'message.freezed.dart';
part 'message.g.dart';

@freezed
class Msg with _$Msg {
  const factory Msg({
    required String id,
    required String role, // 'user' | 'assistant' | 'system'
    required String content,
    DateTime? createdAt,
  }) = _Msg;

  factory Msg.fromJson(Map<String, dynamic> json) => _$MsgFromJson(json);
}
// llm_client.dart
abstract class LlmClient {
  // Returns tokens as they stream in
  Stream<String> streamChat({required List<Msg> messages});
}

Riverpod providers keep UI reactive:

// providers.dart
final messagesProvider = StateProvider<List<Msg>>((_) => const []);

final llmClientProvider = Provider<LlmClient>((ref) {
  final baseUrl = const String.fromEnvironment('PROXY_URL');
  return HttpLlmClient(baseUrl: baseUrl);
});

// Optional: a family lets Riverpod manage the stream's lifecycle.
// Note: family arguments are compared with ==, so a plain List<Msg>
// key creates a fresh stream on every call.
final streamProvider = StreamProvider.family<String, List<Msg>>((ref, msgs) {
  final client = ref.watch(llmClientProvider);
  return client.streamChat(messages: msgs);
});

HTTP streaming from a proxy

Flutter should never ship provider API keys. Use a thin proxy that authenticates your app (JWT, session, or signed nonce), enforces quotas, and forwards to the model API. The proxy should normalize responses and emit Server‑Sent Events (SSE) or chunked NDJSON.

Example: proxy with Cloudflare Workers (SSE passthrough)

// worker.js (simplified)
export default {
  async fetch(req, env) {
    if (req.method !== 'POST') return new Response('Method not allowed', { status: 405 });

    // Authenticate your app request here (omitted for brevity)

    const body = await req.json(); // { messages: [...], model: 'default' }

    const upstream = await fetch(env.UPSTREAM_URL, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.PROVIDER_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: env.DEFAULT_MODEL,
        messages: body.messages,
        stream: true,
      }),
    });

    return new Response(upstream.body, {
      headers: { 'Content-Type': 'text/event-stream' },
    });
  }
}

Bind PROVIDER_KEY, DEFAULT_MODEL, and UPSTREAM_URL as encrypted environment variables. Add logging, rate limiting, and safety filters in production.
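
As one sketch of that rate limiting, a fixed‑window counter can run before the upstream fetch. This in‑memory version is illustrative only; a real Worker would persist counters in Durable Objects or KV:

```javascript
// Minimal fixed-window rate limiter: allow `limit` requests per user
// per `windowMs`. In-memory state is a sketch, not production-grade.
const windows = new Map(); // userId -> { start, count }

function allowRequest(userId, limit = 20, windowMs = 60_000, now = Date.now()) {
  const w = windows.get(userId);
  if (!w || now - w.start >= windowMs) {
    // First request in a fresh window.
    windows.set(userId, { start: now, count: 1 });
    return true;
  }
  w.count += 1;
  return w.count <= limit;
}
```

Reject with HTTP 429 when `allowRequest` returns false, before any tokens are spent upstream.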

Flutter client for SSE

// http_llm_client.dart
import 'dart:convert';
import 'package:dio/dio.dart';
import 'message.dart';
import 'llm_client.dart';

class HttpLlmClient implements LlmClient {
  HttpLlmClient({required this.baseUrl});
  final String baseUrl;
  // `late` so the initializer can reference the instance field.
  late final _dio = Dio(BaseOptions(
    baseUrl: baseUrl,
    connectTimeout: const Duration(seconds: 15),
  ));

  @override
  Stream<String> streamChat({required List<Msg> messages}) async* {
    final req = await _dio.fetch<ResponseBody>(Options(
      method: 'POST',
      responseType: ResponseType.stream,
      headers: {'Accept': 'text/event-stream'},
    ).compose(_dio.options, '/v1/chat/stream', data: {
      'messages': messages.map((m) => m.toJson()).toList(),
    }));

    // SSE frames arrive as lines like: data: {"delta":"he"}
    // Transform the whole stream rather than splitting each chunk,
    // so frames split across network chunks are reassembled correctly.
    final lines = req.data!.stream
        .cast<List<int>>()
        .transform(utf8.decoder)
        .transform(const LineSplitter());
    await for (final line in lines) {
      if (!line.startsWith('data:')) continue;
      final payload = line.substring(5).trim();
      if (payload == '[DONE]') return;
      try {
        final map = json.decode(payload) as Map<String, dynamic>;
        final delta = map['delta'] as String?;
        if (delta != null && delta.isNotEmpty) yield delta;
      } catch (_) {
        // Tolerate keep-alive/heartbeat frames that are not JSON.
      }
    }
  }
}

Building the chat UI

A minimal streaming chat screen:

// chat_page.dart
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'providers.dart';
import 'message.dart';
import 'package:uuid/uuid.dart';

class ChatPage extends ConsumerStatefulWidget {
  const ChatPage({super.key});
  @override
  ConsumerState<ChatPage> createState() => _ChatPageState();
}

class _ChatPageState extends ConsumerState<ChatPage> {
  final _ctrl = TextEditingController();
  final _uuid = const Uuid();

  @override
  void dispose() {
    _ctrl.dispose();
    super.dispose();
  }

  Future<void> _send() async {
    final text = _ctrl.text.trim();
    if (text.isEmpty) return;
    _ctrl.clear();

    final id = _uuid.v4();
    final userMsg = Msg(id: id, role: 'user', content: text, createdAt: DateTime.now());
    ref.read(messagesProvider.notifier).update((list) => [...list, userMsg]);

    final messages = ref.read(messagesProvider);
    // Call the client directly for a one-shot request; a family keyed
    // on a growing List<Msg> would create a fresh stream every turn.
    final stream = ref.read(llmClientProvider).streamChat(messages: messages);

    final aiId = _uuid.v4();
    var acc = '';
    ref.read(messagesProvider.notifier).update((list) => [
      ...list,
      Msg(id: aiId, role: 'assistant', content: '', createdAt: DateTime.now()),
    ]);

    await for (final token in stream) {
      acc += token;
      // Replace the assistant bubble in place so list order is stable.
      ref.read(messagesProvider.notifier).update((list) => [
        for (final m in list)
          if (m.id == aiId) m.copyWith(content: acc) else m,
      ]);
    }
  }

  @override
  Widget build(BuildContext context) {
    final messages = ref.watch(messagesProvider);
    return Scaffold(
      appBar: AppBar(title: const Text('AI Assistant')),
      body: Column(children: [
        Expanded(
          child: ListView.builder(
            padding: const EdgeInsets.all(12),
            itemCount: messages.length,
            itemBuilder: (_, i) {
              final m = messages[i];
              final isUser = m.role == 'user';
              return Align(
                alignment: isUser ? Alignment.centerRight : Alignment.centerLeft,
                child: Container(
                  margin: const EdgeInsets.symmetric(vertical: 6),
                  padding: const EdgeInsets.all(12),
                  decoration: BoxDecoration(
                    color: isUser ? Colors.blue.shade100 : Colors.grey.shade200,
                    borderRadius: BorderRadius.circular(12),
                  ),
                  child: Text(m.content),
                ),
              );
            },
          ),
        ),
        SafeArea(
          child: Row(children: [
            Expanded(
              child: TextField(
                controller: _ctrl,
                minLines: 1,
                maxLines: 5,
                decoration: const InputDecoration(hintText: 'Type a message…'),
              ),
            ),
            IconButton(icon: const Icon(Icons.send), onPressed: _send),
          ]),
        ),
      ]),
    );
  }
}

Prompt assembly and system instructions

  • Keep system instructions short, stable, and versioned in your proxy.
  • Use message templates with placeholders and guardrails; avoid concatenating arbitrary user text into prompts without sanitization.
  • Consider storing “prompt variants” server‑side and referencing them by id from the client for A/B tests.
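
A server‑side variant registry can be as simple as a map keyed by id; the ids and prompt text below are hypothetical:

```javascript
// Versioned system prompts, stored server-side and referenced by id.
// Ids and wording are illustrative.
const promptVariants = {
  'assistant@v1': 'You are a concise, helpful assistant.',
  'assistant@v2': 'You are a friendly assistant. Cite sources when possible.',
};

function systemPromptFor(variantId) {
  // Fall back to a known-good default if the client sends a stale id.
  return promptVariants[variantId] ?? promptVariants['assistant@v1'];
}
```

Shipping a new variant id changes behavior for all clients without an app release, and lets you A/B two ids side by side.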

Persistence, sessions, and offline

  • Store recent messages in Hive for instant resume.
  • For long histories, snapshot to the backend and summarize context to control token usage.
  • Provide a read‑only fallback when offline; if you support on‑device models, switch adapters automatically.

On‑device ML options (optional)

  • Text tasks: ship a small TFLite sequence model for classification or intent routing.
  • Vision: use ML Kit/TFLite for OCR or basic object detection when you need offline capture.
  • Hybrid: perform light on‑device preprocessing (e.g., transcription, redaction) before sending to the LLM.

Example: running a TFLite model in an isolate to avoid janking the UI.

import 'dart:isolate';
import 'package:tflite_flutter/tflite_flutter.dart';

Future<List<double>> runModel(List<double> input) => Isolate.run(() async {
  // Shapes assume a model with a [1, 128] float output; adjust to yours.
  final interpreter = await Interpreter.fromAsset('model.tflite');
  final output = List.filled(128, 0.0).reshape([1, 128]);
  interpreter.run([input], output);
  interpreter.close(); // free native resources before the isolate exits
  return output.first.cast<double>();
});

Voice: speech in, speech out

  • Mic capture: speech_to_text for on‑device or stream audio chunks to your proxy for cloud STT.
  • TTS: flutter_tts or a provider TTS endpoint.
  • UX: show live partial transcripts and a large “hold to speak” button; clearly indicate recording state.

RAG (Retrieval‑Augmented Generation)

  • Generate embeddings for your documents server‑side; store vectors in a database.
  • On the client, send the query to the proxy; the proxy retrieves top‑k passages and constructs the final prompt.
  • Cache recent retrievals locally for snappy back/forward navigation.
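
The proxy‑side retrieval step can be sketched with plain cosine similarity over pre‑computed vectors; a real deployment would use an embeddings API and a vector database:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored document against the query vector and keep the
// top k. `docs` entries are sketches: { id, vector, text? }.
function topK(queryVec, docs, k = 3) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The proxy then splices the top‑k passages into the prompt before forwarding upstream, so the client never handles vectors at all.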

Safety, privacy, and guardrails

  • Never embed provider keys in the app bundle.
  • Add content filters in your proxy and clear user messaging about limitations.
  • For sensitive inputs, redact PII on‑device before sending upstream.
  • Log prompts/responses only with explicit consent; provide a “Forget my data” control.
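
As one concrete shape for such a filter, a naive regex redaction pass might look like the sketch below. These patterns only catch obvious emails and long digit runs; they are not a substitute for a proper PII detector:

```javascript
// Naive redaction sketch: mask email addresses and 9-16 digit runs
// (e.g. account or card numbers) before text leaves the trust boundary.
function redact(text) {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]')
    .replace(/\b\d{9,16}\b/g, '[number]');
}
```

The same regex approach ports directly to Dart for the on‑device case.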

Performance and battery

  • Stream responses to keep time‑to‑first‑token under a second for perceived speed.
  • Use Isolate.run for CPU‑heavy parsing or TFLite.
  • Batch network calls, compress large payloads, and keep request bodies lean (avoid sending entire histories every turn).
  • Cache images and transcripts; use memoizer patterns for repeated UI recompositions.

Cost control

  • Truncate or summarize context; show token/cost estimates before long operations.
  • Prefer smaller, faster models for background tasks.
  • Implement server‑side rate limits per user and per route.
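
Truncation can be sketched with a rough four‑characters‑per‑token heuristic (a common approximation, not a real tokenizer) and a newest‑first window:

```javascript
// Rough token estimate: ~4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit within a token budget, dropping
// the oldest turns first. `messages` entries are { content } sketches.
function windowMessages(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

For better fidelity, replace the heuristic with the provider's tokenizer and summarize the dropped turns instead of discarding them.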

Testing and quality

  • Unit: prompt builders, JSON mappers, and safety filters.
  • Widget/golden tests: message bubbles, loading cursors, error states.
  • Integration: a fake SSE server that emits known token sequences to verify UI streaming.

Example: fake SSE in tests.

import 'dart:convert';

// Emits SSE frames like a provider would
Stream<List<int>> fakeSseStream(List<String> tokens) async* {
  for (final t in tokens) {
    final frame = 'data: {"delta":"$t"}\n\n';
    yield utf8.encode(frame);
    await Future<void>.delayed(const Duration(milliseconds: 10));
  }
  yield utf8.encode('data: [DONE]\n\n');
}

Deployment checklist

  • iOS: add NSMicrophoneUsageDescription for voice; App Transport Security (ATS) enforces HTTPS; declare background modes if needed.
  • Android: RECORD_AUDIO permission; foreground service notification for long audio tasks.
  • Web: provide fallbacks where microphone or file‑system access is unavailable; enable CORS on the proxy.
  • Desktop: file system prompts require clear disclosure.
  • Env/CI: inject PROXY_URL via --dart-define per flavor (dev/stage/prod).

Observability and analytics

  • Log latency, tokens, and error codes at the proxy.
  • Capture client UX metrics (TTFT, drop‑offs, retries) with privacy in mind.
  • Add a “Was this helpful?” rating per response to improve prompts.

Common pitfalls

  • Storing secrets in the app. Fix: always go through a proxy.
  • Freezing the UI on large JSON payloads. Fix: parse in an isolate.
  • Unbounded histories causing token blow‑ups. Fix: summarize and window.
  • Ignoring error states. Fix: design for rate limits, timeouts, and partial outputs.

A minimal end‑to‑end flow

  1. User types a message; add it to local state.
  2. Flutter posts messages to /v1/chat/stream on your proxy.
  3. Proxy validates auth, selects a model, starts a streaming request upstream.
  4. Tokens stream back as SSE; Flutter updates the assistant bubble in place.
  5. On completion, persist the exchange locally and optionally sync a summary to the backend.

Extending the app

  • Multimodal: attach images/audio, display provider captions or transcripts inline.
  • Tools/functions: let the model call weather, calendar, or custom APIs through the proxy with strict schemas.
  • Memory: store user preferences as structured facts rather than raw prompt text.
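
Tool calls with strict schemas can be enforced at the proxy before anything external runs; the tool name, schema shape, and stubbed weather result below are all hypothetical:

```javascript
// Registry of tools the model may call, each with a strict argument
// contract. The weather tool and its result are stubs for illustration.
const tools = {
  get_weather: {
    required: ['city'],
    run: (args) => ({ city: args.city, tempC: 21 }), // stubbed result
  },
};

// Validate the model's requested call before executing anything.
function callTool(name, args) {
  const tool = tools[name];
  if (!tool) throw new Error(`unknown tool: ${name}`);
  for (const field of tool.required) {
    if (!(field in args)) throw new Error(`missing arg: ${field}`);
  }
  return tool.run(args);
}
```

Rejecting unknown tools and malformed arguments at the proxy keeps a misbehaving model from reaching arbitrary APIs.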

Starter backlog you can ship this week

  • Day 1–2: Scaffold Flutter app, Riverpod, chat UI, and Cloudflare Worker proxy.
  • Day 3: Streaming end‑to‑end with proper error handling and retries.
  • Day 4: Local persistence with Hive + session list; basic rate limiting in proxy.
  • Day 5: Voice input/output; simple analytics and feedback chip.

Conclusion

Flutter is a pragmatic foundation for AI apps: one codebase, native performance, and flexible integration with cloud and on‑device intelligence. Keep secrets server‑side, stream everything, measure relentlessly, and design for change—because your provider lineup will evolve. With the patterns here, you can deliver a responsive AI assistant today and keep iterating as models and hardware improve.
