A containerized stack of open-source services. Each component is isolated, replaceable, and independently scalable.
Every request stays on your network. Here's the path from browser to model and back.
Through the React web UI — upload a file, type a question, or trigger a TTS job.
Django backend validates the request, authenticates the session, and routes to the appropriate service handler.
For RAG queries: embeddings are generated, relevant chunks are retrieved from the vector store, and a prompt is constructed with context.
The request hits your chosen inference engine — Ollama, vLLM, LM Studio, or any OpenAI-compatible server running on your hardware.
Tokens stream back in real-time via SSE. No data leaves your network. Everything is logged locally.
Four core containers, each with a single responsibility. Swap any component without touching the others.
Next.js application served through a Caddy reverse proxy. Provides the full user interface — chat, file uploads, voice transcription, model management, and settings. Communicates with the Django backend via REST API and Server-Sent Events for real-time streaming.
Django REST API that orchestrates all services. Handles authentication, file parsing, RAG pipeline, prompt construction, and routes requests to the correct model engine or service. All business logic lives here.
Ollama runs as a bundled container by default, giving you instant access to hundreds of open-source models. You can also point Local AI at a host-installed Ollama instance or any other OpenAI-compatible inference server by changing a single environment variable.
PostgreSQL stores all application data — users, chat history, documents, and RAG embeddings. A separate RAG service handles document ingestion, chunking, and vector search. All data is persisted in named Docker volumes on your machine.
Here's what the default docker-compose.yml looks like.
local-ai is designed from the ground up for air-gapped, on-premise deployment.
No telemetry, no analytics, no outbound network requests. Runs fully offline after initial Docker pull.
All files, embeddings, chat history, and generated outputs stay in Docker volumes on your machine.
Built-in session auth for multi-user setups. Drop in your own SSO/LDAP provider via environment config.
Every query, file upload, and model call is logged locally. Export logs for compliance and review.
Install in under 2 minutes and explore the stack yourself.