mudler/LocalAI

πŸ€– The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, MCP, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference.

πŸ’‘ Get help - ❓ FAQ Β· πŸ’­ Discussions Β· πŸ’¬ Discord Β· πŸ“– Documentation website

πŸ’» Quickstart Β· πŸ–ΌοΈ Models Β· πŸš€ Roadmap Β· πŸ›« Examples Β· Try on Telegram


LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API compatible with the OpenAI (and Elevenlabs, Anthropic, ...) API specifications for local AI inference. It allows you to run LLMs and to generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU. It is created and maintained by Ettore Di Giacinto.
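As a sketch of what "drop-in replacement" means in practice: once an instance is running, you can talk to it exactly as you would to the OpenAI API. The host, port, and model name below are assumptions (any chat model installed via `local-ai run` or the gallery works):

# Query the OpenAI-compatible chat completions endpoint of a running
# LocalAI instance (localhost:8080 and the model name are assumptions)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3.2-1b-instruct:q4_k_m",
        "messages": [{"role": "user", "content": "How are you?"}]
      }'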

πŸ“šπŸ†• Local Stack Family

πŸ†• LocalAI is now part of a comprehensive suite of AI tools designed to work together:


LocalAGI

A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.


LocalRecall

A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.

Screenshots / Video

Youtube video




Screenshots

Talk Interface Β· Generate Audio
Models Overview Β· Generate Images
Chat Interface Β· Home
Login Β· Swarm (P2P dashboard)

πŸ’» Quickstart

⚠️ Note: The install.sh script is currently experiencing issues due to the heavy changes underway in LocalAI and may produce broken or misconfigured installations. Please use the Docker installation (see below) or a manual binary installation until issue #8032 is resolved.

Run the installer script:

# Basic installation
curl https://localai.io/install.sh | sh

For more installation options, see Installer Options.

macOS Download:

Download LocalAI for macOS

Note: the DMGs are not signed by Apple, so macOS quarantines them on first launch. See https://github.com/mudler/LocalAI/issues/6268 for a workaround; the fix is tracked in https://github.com/mudler/LocalAI/issues/6244.
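A common workaround for unsigned, quarantined apps on macOS (see the linked issue for details) is to clear the quarantine attribute. The application path below is an assumption; adjust it to wherever you installed LocalAI:

# Remove the macOS quarantine attribute from the app bundle
# (path is an assumption)
xattr -d com.apple.quarantine /Applications/LocalAI.app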

Containers (Docker, podman, ...)

πŸ’‘ Docker Run vs Docker Start

  • docker run creates and starts a new container. If a container with the same name already exists, this command will fail.
  • docker start starts an existing container that was previously created with docker run.

If you've already run LocalAI before and want to start it again, use: docker start -i local-ai
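Putting the two together, a typical container lifecycle looks like this (using the CPU image from the next section as an example):

# Create and start the container the first time
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# Stop it when finished
docker stop local-ai
# Reuse the same container on subsequent runs
docker start -i local-ai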

CPU only image:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU Images:

# CUDA 13.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson (L4T) ARM64
# CUDA 12 (for Nvidia AGX Orin and similar platforms)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# CUDA 13 (for Nvidia DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU Images (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU Images (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU Images:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

AIO Images (pre-downloaded models):

# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu

# NVIDIA CUDA 13 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13

# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel

# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas

For more information about the AIO images and pre-downloaded models, see Container Documentation.
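As a quick smoke test against a running AIO container, you can exercise one of the OpenAI-style endpoints with the pre-downloaded models. The request below is a sketch: the host and port are assumptions, and the parameters follow the OpenAI images API:

# Generate an image through the OpenAI-compatible endpoint of a
# running AIO container (localhost:8080 is an assumption)
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a cat wearing sunglasses", "size": "256x256"}'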

To load models:

# From the model gallery (list available models with `local-ai models list`, browse the WebUI's model tab, or visit https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# Start LocalAI with the phi-2 model directly from huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
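To verify which models a running instance exposes, you can query the OpenAI-compatible models endpoint (host and port are assumptions):

# List the models currently known to the instance
curl http://localhost:8080/v1/models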

⚑ Automatic Backend Detection: When you install models from the gallery or YAML files, LocalAI automatically detects your system's GPU capabilities (NVIDIA, AMD, Intel) and downloads the appropriate backend. For advanced configuration options, see GPU Acceleration.

For more information, see πŸ’» Getting started. If you are interested in our roadmap items and future enhancements, see the issues labeled as Roadmap.

πŸ“° Latest project news

Roadmap items: List of issues

πŸš€ Features

🧩 Supported Backends & Acceleration

LocalAI supports a comprehensive range of AI backends with multiple acceleration options:

Text Generation & Language Models

Backend | Description | Acceleration Support
llama.cpp | LLM inference in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU
vLLM | Fast LLM inference with PagedAttention | CUDA 12/13, ROCm, Intel
transformers | HuggingFace transformers framework | CUDA 12/13, ROCm, Intel, CPU
exllama2 | GPTQ inference library | CUDA 12/13
MLX | Apple Silicon LLM inference | Metal (M1/M2/M3+)
MLX-VLM | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+)

Audio & Speech Processing

Backend | Description | Acceleration Support
whisper.cpp | OpenAI Whisper in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU
faster-whisper | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, CPU
bark | Text-to-audio generation | CUDA 12/13, ROCm, Intel
bark-cpp | C++ implementation of Bark | CUDA, Metal, CPU
coqui | Advanced TTS with 1100+ languages | CUDA 12/13, ROCm, Intel, CPU
kokoro | Lightweight TTS model | CUDA 12/13, ROCm, Intel, CPU
chatterbox | Production-grade TTS | CUDA 12/13, CPU
piper | Fast neural TTS system | CPU
kitten-tts | Kitten TTS models | CPU
silero-vad | Voice Activity Detection | CPU
neutts | Text-to-speech with voice cloning | CUDA 12/13, ROCm, CPU
vibevoice | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU
pocket-tts | Lightweight CPU-based TTS | CUDA 12/13, ROCm, Intel, CPU
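As an illustration of how these TTS backends are consumed, a request against the OpenAI-style speech endpoint might look like the sketch below. The endpoint shape and the model name are assumptions here; check the documentation for the backends you have installed:

# Sketch: synthesize speech through the OpenAI-compatible endpoint
# (model name is a placeholder for any installed TTS model)
curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Hello from LocalAI"}' \
  -o speech.wav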

Image & Video Generation

Backend | Description | Acceleration Support
stablediffusion.cpp | Stable Diffusion in C/C++ | CUDA 12/13, Intel SYCL, Vulkan, CPU
diffusers | HuggingFace diffusion models | CUDA 12/13, ROCm, Intel, Metal, CPU

Specialized AI Tasks

Backend | Description | Acceleration Support
rfdetr | Real-time object detection | CUDA 12/13, Intel, CPU
rerankers | Document reranking API | CUDA 12/13, ROCm, Intel, CPU
local-store | Vector database | CPU
huggingface | HuggingFace API integration | API-based

Hardware Acceleration Matrix

Acceleration Type | Supported Backends | Hardware Support
NVIDIA CUDA 12 | All CUDA-compatible backends | Nvidia hardware
NVIDIA CUDA 13 | All CUDA-compatible backends | Nvidia hardware
AMD ROCm | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, bark, neutts, vibevoice, pocket-tts | AMD Graphics
Intel oneAPI | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark, vibevoice, pocket-tts | Intel Arc, Intel iGPUs
Apple Metal | llama.cpp, whisper, diffusers, MLX, MLX-VLM, bark-cpp | Apple M1/M2/M3+
Vulkan | llama.cpp, whisper, stablediffusion | Cross-platform GPUs
NVIDIA Jetson (CUDA 12) | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (AGX Orin, etc.)
NVIDIA Jetson (CUDA 13) | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (DGX Spark)
CPU Optimized | All backends | AVX/AVX2/AVX512, quantization support
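On Linux, you can quickly check which of these AVX instruction sets your CPU advertises (a Linux-only sketch; on macOS you would query sysctl instead):

# List AVX-family flags advertised by the CPU (Linux)
grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u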

πŸ”— Community and integrations

Build and deploy custom containers:

WebUIs:

Agentic Libraries:

MCPs:

Model galleries:

Voice:

Other:

πŸ”— Resources

πŸ“– πŸŽ₯ Media, Blogs, Social

Citation

If you utilize this repository or its data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},
}

❀️ Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project by covering CI expenses, and to everyone on our Sponsor list:


Individual sponsors

A special thanks to the individual sponsors who have contributed to the project. A full list is available on GitHub and Buy Me a Coffee; a special shout-out goes to drikster80 for his generosity. Thank you everyone!

🌟 Star history

LocalAI Star history Chart

πŸ“– License

LocalAI is a community-driven project created by Ettore Di Giacinto.

MIT - Author Ettore Di Giacinto mudler@localai.io

πŸ™‡ Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

πŸ€— Contributors

This is a community project, a special thanks to our contributors! πŸ€—