Nebula

Deploy your models. Anywhere.

Nebula is an open-source platform designed to simplify the deployment and scaling of large language models (LLMs). The platform provides a developer experience akin to Vercel or Heroku, but specifically for AI models, abstracting away the complexities of infrastructure management such as CUDA, Docker, and Kubernetes.

The Problem

Machine learning infrastructure today is fragmented. Serving a model requires deep knowledge of:

  • CUDA and GPU drivers
  • Docker containerization
  • Kubernetes orchestration
  • Cloud provider APIs
  • Inference runtime configuration

Developers want to focus on building AI applications, not managing infrastructure.

The Solution

Nebula provides a "Vercel for LLMs" experience:

  1. Choose a model (e.g., mistralai/Mistral-7B-Instruct)
  2. Deploy with a single command: nebulactl deploy mistral
  3. Get an OpenAI-compatible endpoint within minutes

Nebula runs anywhere: on your laptop, on AWS or GCP, or in private clusters, with no vendor lock-in.

Target Audience

AI/ML Developers who need to:

  • Quickly deploy models to get accessible API endpoints
  • Test fine-tuned models without infrastructure overhead
  • Focus on model development, not DevOps
  • Avoid vendor lock-in across cloud providers

Key Features

  • Simple CLI - Deploy models with a single command
  • Multiple Runtimes - vLLM, TGI, Ollama support
  • GPU Management - Automatic GPU discovery and allocation
  • OpenAI-compatible API - Works with existing tools and SDKs
  • SSH Provisioning - Automatic node setup via SSH
  • Prometheus Metrics - Built-in observability
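To make the GPU discovery feature above concrete, here is a minimal sketch of the kind of logic an agent could use: parsing the CSV output of `nvidia-smi`. This is a hypothetical illustration, not Nebula's actual implementation; the function name and output format are assumptions.

```python
import csv
import io
import subprocess

def discover_gpus(query_output=None):
    """Hypothetical sketch of agent-side GPU discovery: parse the output of
    `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`
    into a list of GPU descriptions."""
    if query_output is None:
        try:
            query_output = subprocess.check_output(
                ["nvidia-smi",
                 "--query-gpu=index,name,memory.total",
                 "--format=csv,noheader,nounits"],
                text=True,
            )
        except (OSError, subprocess.CalledProcessError):
            return []  # no NVIDIA driver or no GPUs: nothing to allocate
    gpus = []
    for row in csv.reader(io.StringIO(query_output)):
        index, name, mem_mib = (field.strip() for field in row)
        gpus.append({"index": int(index), "name": name, "memory_mib": int(mem_mib)})
    return gpus

# Canned sample output, so the sketch runs even on a machine without a GPU:
sample = "0, NVIDIA A100-SXM4-40GB, 40960\n1, NVIDIA A100-SXM4-40GB, 40960\n"
print(discover_gpus(sample))
```

Falling back to an empty list when `nvidia-smi` is absent lets the same discovery step run on CPU-only nodes.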

Quick Example

# Add your local machine as a compute node
nebulactl node add --local

# Deploy a model
nebulactl deploy mistralai/Mistral-7B-Instruct \
  --runtime vllm \
  --device gpu

# Test the endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-7B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
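Because the endpoint is OpenAI-compatible, the same call can be made from Python using only the standard library. A minimal sketch, assuming a deployment is serving at `localhost:8000` as in the curl example (the request is built here but only sent when a server is actually running):

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # from the curl example above

def build_chat_request(messages, model="mistralai/Mistral-7B-Instruct"):
    """Build an OpenAI-style chat-completions request for a Nebula endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])

# With a deployment running, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
#       print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible SDK should work the same way by pointing its base URL at the deployment.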

Architecture Overview

Nebula follows a distributed, agent-based architecture with two main layers:

  • Orchestration Layer - Manages cluster state, schedules deployments, and provisions nodes
  • Compute Layer - A nebula-agent daemon runs on every node, managing containers and GPUs
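To illustrate how the two layers interact, here is a toy sketch (not Nebula's actual code; all names are hypothetical) of an orchestration-layer scheduler choosing a compute node based on the GPU capacity each agent reports:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Capacity an agent on the compute layer might report to the orchestrator."""
    name: str
    total_gpus: int
    allocated_gpus: int = 0

    @property
    def free_gpus(self):
        return self.total_gpus - self.allocated_gpus

def schedule(nodes, gpus_needed):
    """Toy orchestration-layer step: pick the node with the most free GPUs
    that can satisfy the request, then record the allocation."""
    candidates = [n for n in nodes if n.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no node with enough free GPUs")
    chosen = max(candidates, key=lambda n: n.free_gpus)
    chosen.allocated_gpus += gpus_needed
    return chosen

cluster = [Node("laptop", total_gpus=1), Node("aws-a100", total_gpus=8, allocated_gpus=6)]
print(schedule(cluster, gpus_needed=2).name)  # aws-a100 (2 free GPUs; the laptop has only 1)
```

Real schedulers weigh more than free GPU count (memory, locality, runtime constraints), but the division of labor is the same: agents report state, the orchestrator decides placement.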

Next Steps