LlamaFarm - Edge AI for Everyone
Enterprise AI capabilities on your own hardware. No cloud required.
LlamaFarm is an open-source AI platform that runs entirely on your hardware. Build RAG applications, train custom classifiers, detect anomalies, and run document processing—all locally with complete privacy.
- 🔒 Complete Privacy — Your data never leaves your device
- 💰 No API Costs — Use open-source models without per-token fees
- 🌐 Offline Capable — Works without internet once models are downloaded
- ⚡ Hardware Optimized — Automatic GPU/NPU acceleration on Apple Silicon, NVIDIA, and AMD
Desktop App Downloads
Get started instantly — no command line required:
| Platform | Download |
|----------|----------|
| Mac (Universal) | Download |
| Windows | Download |
| Linux (x86_64) | Download |
| Linux (ARM64) | Download |
What Can You Build?
| Capability | Description |
|-----------|-------------|
| RAG (Retrieval-Augmented Generation) | Ingest PDFs, docs, CSVs and query them with AI |
| Custom Classifiers | Train text classifiers with 8-16 examples using SetFit |
| Anomaly Detection | Detect outliers in logs, metrics, or transactions |
| OCR & Document Extraction | Extract text and structured data from images and PDFs |
| Named Entity Recognition | Find people, organizations, and locations |
| Multi-Model Runtime | Switch between Ollama, OpenAI, vLLM, or local GGUF models |
Video demo (90 seconds): https://youtu.be/W7MHGyN0MdQ
Quickstart
Option 1: Desktop App
Download the desktop app above and run it. No additional setup required.
Option 2: CLI + Development Mode
1. Install the CLI

   macOS / Linux:

   ```sh
   curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash
   ```

   Windows (PowerShell):

   ```powershell
   irm https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.ps1 | iex
   ```

   Or download directly from releases.

2. Create and run a project

   ```sh
   lf init my-project   # Generates llamafarm.yaml
   lf start             # Starts services and opens Designer UI
   ```

3. Chat with your AI

   ```sh
   lf chat                       # Interactive chat
   lf chat "Hello, LlamaFarm!"   # One-off message
   ```
The Designer web interface is available at http://localhost:8000.
Option 3: Development from Source
```sh
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm

# Install Nx globally and initialize the workspace
npm install -g nx
nx init --useDotNxInstallation --interactive=false   # Required on first clone

# Start all services (run each in a separate terminal)
nx start server              # FastAPI server (port 8000)
nx start rag                 # RAG worker for document processing
nx start universal-runtime   # ML models, OCR, embeddings (port 11540)
```
Architecture
LlamaFarm consists of three main services:
| Service | Port | Purpose |
|---------|------|---------|
| Server | 8000 | FastAPI REST API, Designer web UI, project management |
| RAG Worker | - | Celery worker for async document processing |
| Universal Runtime | 11540 | ML model inference, embeddings, OCR, anomaly detection |
All configuration lives in llamafarm.yaml—no scattered settings or hidden defaults.
Runtime Options
Universal Runtime (Recommended)
The Universal Runtime provides access to HuggingFace models plus specialized ML capabilities:
- Text Generation - Any HuggingFace text model
- Embeddings - sentence-transformers and other embedding models
- OCR - Text extraction from images/PDFs (Surya, EasyOCR, PaddleOCR, Tesseract)
- Document Extraction - Forms, invoices, receipts via vision models
- Text Classification - Pre-trained or custom models via SetFit
- Named Entity Recognition - Extract people, organizations, locations
- Reranking - Cross-encoder models for improved RAG quality
- Anomaly Detection - Isolation Forest, One-Class SVM, Local Outlier Factor, Autoencoders
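The anomaly detectors listed above (Isolation Forest, One-Class SVM, Local Outlier Factor, Autoencoders) all share one goal: flag points that deviate from the bulk of the data. As a minimal standalone illustration of that idea, not LlamaFarm code, here is a z-score detector in pure Python:

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical; nothing to flag
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10.1, 9.8, 10.0, 10.2, 9.9, 42.0]  # one obvious spike
print(zscore_outliers(readings, threshold=2.0))  # prints [42.0]
```

The detectors LlamaFarm ships are considerably more robust (a single extreme value inflates the mean and standard deviation here, which is exactly what methods like Isolation Forest avoid), but the input/output shape is the same: a series in, a list of flagged points out.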
```yaml
runtime:
  models:
    default:
      provider: universal
      model: Qwen/Qwen2.5-1.5B-Instruct
      base_url: http://127.0.0.1:11540/v1
```
Ollama
Simple setup for GGUF models with CPU/GPU acceleration:
```yaml
runtime:
  models:
    default:
      provider: ollama
      model: qwen3:8b
      base_url: http://localhost:11434/v1
```
OpenAI-Compatible
Works with vLLM, Together, Mistral API, or any OpenAI-compatible endpoint:
```yaml
runtime:
  models:
    default:
      provider: openai
      model: gpt-4o
      base_url: https://api.openai.com/v1
      api_key: ${OPENAI_API_KEY}
```
Core Workflows
CLI Commands
| Task | Command |
|------|---------|
| Initialize project | lf init my-project |
| Start services | lf start |
| Interactive chat | lf chat |
| One-off message | lf chat "Your question" |
| List models | lf models list |
| Use specific model | lf chat --model powerful "Question" |
| Create dataset | lf datasets create -s pdf_ingest -b main_db research |
| Upload files (auto-process by default) | lf datasets upload research ./docs/*.pdf |
| Process dataset (if you skipped auto-process) | lf datasets process research |
| Query RAG | lf rag query --database main_db "Your query" |
| Check RAG health | lf rag health |
RAG Pipeline
- Create a dataset linked to a processing strategy and database
- Upload files (PDF, DOCX, Markdown, TXT) — processing runs automatically unless you pass `--no-process`
- Process manually only when you intentionally skipped auto-processing (e.g., large batches)
- Query using semantic search with optional metadata filtering
```sh
lf datasets create -s default -b main_db research
lf datasets upload research ./papers/*.pdf   # auto-processes by default

# For large batches:
# lf datasets upload research ./papers/*.pdf --no-process
# lf datasets process research

lf rag query --database main_db "What are the key findings?"
```
Designer Web UI
The Designer at http://localhost:8000 provides:
- Visual dataset management with drag-and-drop uploads
- Interactive configuration editor with live validation
- Integrated chat with RAG context
- Switch between visual and YAML editing modes
Configuration
llamafarm.yaml is the source of truth for each project:
```yaml
version: v1
name: my-assistant
namespace: default

# Multi-model configuration
runtime:
  default_model: fast
  models:
    fast:
      description: "Fast local model"
      provider: universal
      model: Qwen/Qwen2.5-1.5B-Instruct
      base_url: http://127.0.0.1:11540/v1
    powerful:
      description: "More capable model"
      provider: universal
      model: Qwen/Qwen2.5-7B-Instruct
      base_url: http://127.0.0.1:11540/v1

# System prompts
prompts:
  - name: default
    messages:
      - role: system
        content: You are a helpful assistant.

# RAG configuration
rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: semantic_search
  embedding_strategies:
    - name: default_embeddings
      type: UniversalEmbedder
      config:
        model: sentence-transformers/all-MiniLM-L6-v2
        base_url: http://127.0.0.1:11540/v1
  retrieval_strategies:
    - name: semantic_search
      type: BasicSimilarityStrategy
      config:
        top_k: 5
  data_processing_strategies:
    - name: default
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1000
            chunk_overlap: 100
        - type: MarkdownParser_Python
          config:
            chunk_size: 1000
      extractors: []

# Dataset definitions
datasets:
  - name: research
    data_processing_strategy: default
    database: main_db
```
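The `runtime` section above maps named model entries to a `default_model` fallback, which is what makes `lf chat --model powerful` work. A hypothetical helper, shown here only to illustrate the lookup logic and not taken from the LlamaFarm codebase, might resolve a model entry like this:

```python
def select_model(runtime_cfg, name=None):
    """Pick a model entry from the `runtime` section of llamafarm.yaml.

    Falls back to `default_model` when no name is given.
    Illustrative only; not LlamaFarm's actual implementation.
    """
    models = runtime_cfg["models"]
    chosen = name or runtime_cfg["default_model"]
    if chosen not in models:
        raise KeyError(f"model '{chosen}' not defined in llamafarm.yaml")
    return models[chosen]

# The runtime section from the example config, as a parsed dict:
runtime = {
    "default_model": "fast",
    "models": {
        "fast": {"provider": "universal", "model": "Qwen/Qwen2.5-1.5B-Instruct"},
        "powerful": {"provider": "universal", "model": "Qwen/Qwen2.5-7B-Instruct"},
    },
}

print(select_model(runtime)["model"])              # prints Qwen/Qwen2.5-1.5B-Instruct
print(select_model(runtime, "powerful")["model"])  # prints Qwen/Qwen2.5-7B-Instruct
```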
Environment Variable Substitution
Use ${VAR} syntax to inject secrets from .env files:
```yaml
runtime:
  models:
    openai:
      api_key: ${OPENAI_API_KEY}

# With a default value:  ${OPENAI_API_KEY:-sk-default}
# From a specific file:  ${file:.env.production:API_KEY}
```
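To make the `${VAR}` and `${VAR:-default}` forms concrete, here is a minimal resolver sketch in Python. It is illustrative only, not LlamaFarm's actual implementation, and it omits the `${file:...:KEY}` form for brevity:

```python
import os
import re

_PATTERN = re.compile(r"\$\{([^}]+)\}")

def resolve(value, env=None):
    """Expand ${VAR} and ${VAR:-default} references in a config string.

    Missing variables without a default expand to the empty string.
    Illustrative sketch; not the LlamaFarm implementation.
    """
    env = env if env is not None else os.environ

    def _sub(match):
        expr = match.group(1)
        if ":-" in expr:
            name, default = expr.split(":-", 1)
            return env.get(name, default)
        return env.get(expr, "")

    return _PATTERN.sub(_sub, value)

print(resolve("${OPENAI_API_KEY:-sk-default}", env={}))  # prints sk-default
```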
See the Configuration Guide for complete reference.
REST API
LlamaFarm provides an OpenAI-compatible REST API:
Chat Completions
```sh
curl -X POST http://localhost:8000/v1/projects/default/my-project/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false,
    "rag_enabled": true
  }'
```
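The same call is easy to make from Python with only the standard library. The namespace and project name below are the placeholders from the curl example; actually sending the request requires a running LlamaFarm server:

```python
import json
from urllib import request

def build_chat_request(base_url, namespace, project, messages, rag_enabled=True):
    """Build the URL and JSON body for a LlamaFarm chat-completions call."""
    url = f"{base_url}/v1/projects/{namespace}/{project}/chat/completions"
    body = json.dumps(
        {"messages": messages, "stream": False, "rag_enabled": rag_enabled}
    )
    return url, body

url, body = build_chat_request(
    "http://localhost:8000", "default", "my-project",
    [{"role": "user", "content": "Hello"}],
)

# Uncomment to send against a running LlamaFarm server:
# req = request.Request(url, data=body.encode(),
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read().decode())
```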
RAG Query
```sh
curl -X POST http://localhost:8000/v1/projects/default/my-project/rag/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the requirements?",
    "database": "main_db",
    "top_k": 5
  }'
```
See the API Reference for all endpoints.
Pros
- Complete privacy with local data processing
- No per-token API costs
- Offline capabilities once models are downloaded
- Supports multiple AI models and frameworks
Cons
- Requires local hardware capable of running AI models
- Initial setup may be complex for non-technical users
- Limited community support compared to larger platforms
- Performance depends on local hardware specifications
Disclaimer: This content is sourced from GitHub open source projects for display and rating purposes only.
Copyright belongs to the original author llama-farm.
