# Nebula

*Deploy your models. Anywhere.*

Nebula is an open-source platform that simplifies deploying and scaling large language models (LLMs). It provides a developer experience akin to Vercel or Heroku, but for AI models, abstracting away infrastructure concerns such as CUDA, Docker, and Kubernetes.
## The Problem
Machine learning infrastructure today is fragmented. Serving a model requires deep knowledge of:
- CUDA and GPU drivers
- Docker containerization
- Kubernetes orchestration
- Cloud provider APIs
- Inference runtime configuration
Developers want to focus on building AI applications, not managing infrastructure.
## The Solution
Nebula provides a "Vercel for LLMs" experience:

- Choose a model (e.g., `mistralai/Mistral-7B-Instruct`)
- Deploy with a single command: `nebulactl deploy mistral`
- Get an OpenAI-compatible endpoint within minutes
It works everywhere — from your laptop to AWS, GCP, or private clusters — with no vendor lock-in.
## Target Audience
AI/ML Developers who need to:
- Quickly deploy models to get accessible API endpoints
- Test fine-tuned models without infrastructure overhead
- Focus on model development, not DevOps
- Avoid vendor lock-in across cloud providers
## Key Features
- **Simple CLI** - Deploy models with a single command
- **Multiple Runtimes** - vLLM, TGI, and Ollama support
- **GPU Management** - Automatic GPU discovery and allocation
- **OpenAI-compatible API** - Works with existing tools and SDKs (see the Python sketch after this list)
- **SSH Provisioning** - Automatic node setup via SSH
- **Prometheus Metrics** - Built-in observability
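
Because each deployment exposes the standard OpenAI chat completions protocol, existing SDKs can talk to it with only a `base_url` change. Below is a minimal sketch using the official `openai` Python package; the base URL and model name are assumptions carried over from the Quick Example, and the API key is a placeholder on the assumption that a local deployment does not enforce authentication.

```python
from openai import OpenAI

# Point the standard OpenAI client at the Nebula endpoint.
# Base URL and model name are assumptions taken from the Quick Example below;
# the API key is a placeholder for a local, unauthenticated deployment.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed-locally",
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Swapping `base_url` is the only change needed to reuse tooling already built against the OpenAI API.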
## Quick Example
```bash
# Add your local machine as a compute node
nebulactl node add --local

# Deploy a model
nebulactl deploy mistralai/Mistral-7B-Instruct \
  --runtime vllm \
  --device gpu

# Test the endpoint (the OpenAI chat completions API requires a "model" field)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```
## Architecture Overview
Nebula follows a distributed, agent-based architecture with two main layers:

- **Orchestration Layer** - Manages cluster state, schedules deployments, and provisions nodes
- **Compute Layer** - The `nebula-agent` daemon runs on every node, managing containers and GPUs
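
To make the two layers concrete, here is a deliberately simplified, hypothetical sketch of an agent loop: a node discovers its GPUs and reports them to the orchestration layer. The control-plane address, the `/v1/nodes/heartbeat` path, and the payload shape are illustrative inventions, not Nebula's actual wire protocol; only the `nvidia-smi` query flags are real.

```python
import json
import subprocess
import time
import urllib.request

CONTROL_PLANE = "http://orchestrator.internal:9000"  # hypothetical address

def discover_gpus() -> list[dict]:
    """Query local GPUs with nvidia-smi; return [] on CPU-only nodes."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader"],
            text=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [
        {"name": name.strip(), "memory": mem.strip()}
        for name, mem in (line.split(",", 1) for line in out.splitlines() if line)
    ]

def heartbeat() -> None:
    # Hypothetical endpoint and payload; Nebula's real wire format may differ.
    payload = json.dumps({"node": "local", "gpus": discover_gpus()}).encode()
    req = urllib.request.Request(
        f"{CONTROL_PLANE}/v1/nodes/heartbeat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

if __name__ == "__main__":
    while True:
        heartbeat()
        time.sleep(15)  # report node state to the scheduler periodically
```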
## Next Steps
- **Quick Start Guide** - Get up and running in minutes
- **Architecture Overview** - Deep dive into system design
- **Deployment Guide** - Production deployment instructions