Nebula

Deploy your models. Anywhere.

Nebula is an open-source platform designed to simplify the deployment and scaling of large language models (LLMs). The platform provides a developer experience akin to Vercel or Heroku, but specifically for AI models, abstracting away the complexities of infrastructure management such as CUDA, Docker, and Kubernetes.

The Problem

Machine learning infrastructure today is fragmented. Serving a model requires deep knowledge of:

  • CUDA and GPU drivers
  • Docker containerization
  • Kubernetes orchestration
  • Cloud provider APIs
  • Inference runtime configuration

Developers want to focus on building AI applications, not managing infrastructure.

The Solution

Nebula provides a "Vercel for LLMs" experience:

  1. Choose a model (e.g., mistralai/Mistral-7B-Instruct)
  2. Deploy with a single command: nebulactl deploy mistral
  3. Get an OpenAI-compatible endpoint within minutes

Nebula runs anywhere: on your laptop, on AWS or GCP, or in private clusters, with no vendor lock-in.

Target Audience

AI/ML Developers who need to:

  • Quickly deploy models to get accessible API endpoints
  • Test fine-tuned models without infrastructure overhead
  • Focus on model development, not DevOps
  • Avoid vendor lock-in across cloud providers

Key Features

  • Simple CLI - Deploy models with a single command
  • Multiple Runtimes - vLLM, TGI, Ollama support
  • GPU Management - Automatic GPU discovery and allocation
  • OpenAI-compatible API - Works with existing tools and SDKs
  • SSH Provisioning - Automatic node setup via SSH
  • Prometheus Metrics - Built-in observability
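To make the GPU discovery feature above concrete, here is a minimal sketch of the kind of logic an agent could use: parsing the CSV output of `nvidia-smi`. This is a hypothetical illustration, not Nebula's actual implementation; the function name and output format are assumptions.

```python
import csv
import io
import subprocess

def discover_gpus(query_output=None):
    """Hypothetical sketch of agent-side GPU discovery: parse the output of
    `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`
    into a list of GPU descriptions."""
    if query_output is None:
        try:
            query_output = subprocess.check_output(
                ["nvidia-smi",
                 "--query-gpu=index,name,memory.total",
                 "--format=csv,noheader,nounits"],
                text=True,
            )
        except (OSError, subprocess.CalledProcessError):
            return []  # no NVIDIA driver or no GPUs: nothing to allocate
    gpus = []
    for row in csv.reader(io.StringIO(query_output)):
        index, name, mem_mib = (field.strip() for field in row)
        gpus.append({"index": int(index), "name": name, "memory_mib": int(mem_mib)})
    return gpus

# Canned sample output, so the sketch runs even on a machine without a GPU:
sample = "0, NVIDIA A100-SXM4-40GB, 40960\n1, NVIDIA A100-SXM4-40GB, 40960\n"
print(discover_gpus(sample))
```

Falling back to an empty list when `nvidia-smi` is absent lets the same discovery step run on CPU-only nodes.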

Quick Example

# Add your local machine as a compute node
nebulactl node add --local

# Deploy a model
nebulactl deploy mistralai/Mistral-7B-Instruct \
  --runtime vllm \
  --device gpu

# Test the endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/Mistral-7B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
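Because the endpoint is OpenAI-compatible, the same call can be made from Python using only the standard library. A minimal sketch, assuming a deployment is serving at `localhost:8000` as in the curl example (the request is built here but only sent when a server is actually running):

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # from the curl example above

def build_chat_request(messages, model="mistralai/Mistral-7B-Instruct"):
    """Build an OpenAI-style chat-completions request for a Nebula endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])

# With a deployment running, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
#       print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible SDK should work the same way by pointing its base URL at the deployment.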

Architecture Overview

Nebula follows a distributed, agent-based architecture with two main layers:

  • Orchestration Layer - Manages cluster state, schedules deployments, and provisions nodes
  • Compute Layer - A nebula-agent daemon runs on every node, managing containers and GPUs
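To illustrate how the two layers interact, here is a toy sketch (not Nebula's actual code; all names are hypothetical) of an orchestration-layer scheduler choosing a compute node based on the GPU capacity each agent reports:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Capacity an agent on the compute layer might report to the orchestrator."""
    name: str
    total_gpus: int
    allocated_gpus: int = 0

    @property
    def free_gpus(self):
        return self.total_gpus - self.allocated_gpus

def schedule(nodes, gpus_needed):
    """Toy orchestration-layer step: pick the node with the most free GPUs
    that can satisfy the request, then record the allocation."""
    candidates = [n for n in nodes if n.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no node with enough free GPUs")
    chosen = max(candidates, key=lambda n: n.free_gpus)
    chosen.allocated_gpus += gpus_needed
    return chosen

cluster = [Node("laptop", total_gpus=1), Node("aws-a100", total_gpus=8, allocated_gpus=6)]
print(schedule(cluster, gpus_needed=2).name)  # aws-a100 (2 free GPUs; the laptop has only 1)
```

Real schedulers weigh more than free GPU count (memory, locality, runtime constraints), but the division of labor is the same: agents report state, the orchestrator decides placement.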

Next Steps