Architecture Overview

Nebula is an open-source, cloud-agnostic platform for deploying and scaling Large Language Models (LLMs) across local, private, and cloud GPU infrastructure.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACES                         │
│        CLI (nebulactl) | Web UI (React) | TUI | REST API        │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                          CONTROL PLANE                          │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────────────┐     │
│ │ Scheduler   │ │ Provisioner  │ │ REST/gRPC API Gateway  │     │
│ ├─────────────┤ ├──────────────┤ ├────────────────────────┤     │
│ │ SSH Setup   │ │ Node Manager │ │ OpenAI-compatible      │     │
│ └─────────────┘ └──────────────┘ └────────────────────────┘     │
│                                                                 │
│                          SQLite Store                           │
│               (Nodes, Deployments, Stats, Events)               │
└────────────────────┬────────────────────────────────────────────┘
                     │ gRPC (Agent Communication)
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                    COMPUTE PLANE (GPU Nodes)                    │
│ ┌────────────────────────────────────────────────────────┐      │
│ │               Node Agent (nebula-agent)                │      │
│ ├────────────────────────────────────────────────────────┤      │
│ │ • gRPC Server (Agent Service)                          │      │
│ │ • Docker Container Manager                             │      │
│ │ • GPU Monitoring (NVML integration)                    │      │
│ │ • Model Cache Manager                                  │      │
│ │ • State Management (SQLite agent.db)                   │      │
│ │ • Prometheus Metrics Export                            │      │
│ └───────────────┬──────────────┬────────────────────────┘      │
│                 │              │                                │
│                 ▼              ▼                                │
│ ┌────────────────────┐  ┌───────────────────┐                   │
│ │   Model Runtimes   │  │ Docker Containers │                   │
│ ├────────────────────┤  ├───────────────────┤                   │
│ │ • vLLM             │  │ • Port Binding    │                   │
│ │ • TGI              │  │ • GPU Passthrough │                   │
│ │ • Ollama           │  │                   │                   │
│ └────────────────────┘  └───────────────────┘                   │
└─────────────────────────────────────────────────────────────────┘

Two-Plane Architecture

Control Plane (platform/)

The orchestration logic manages cluster state, schedules deployments, and provisions nodes.

Responsibilities:

  • Receive and validate user commands
  • Provision and manage compute nodes
  • Schedule deployments to appropriate nodes
  • Maintain cluster state (nodes, deployments)
  • Monitor node health via heartbeats
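
Heartbeat-based health monitoring amounts to a staleness check over the last heartbeat each node reported. The sketch below illustrates the idea; the Node struct, field names, and timeout value are hypothetical, not Nebula's actual types.

// Minimal sketch of heartbeat-based node health checking.
// Node and its fields are illustrative, not Nebula's real domain model.
package main

import (
    "fmt"
    "time"
)

type Node struct {
    ID            string
    LastHeartbeat time.Time
    Healthy       bool
}

// markStaleNodes flags any node whose last heartbeat is older than timeout.
func markStaleNodes(nodes []*Node, timeout time.Duration, now time.Time) {
    for _, n := range nodes {
        n.Healthy = now.Sub(n.LastHeartbeat) <= timeout
    }
}

func main() {
    nodes := []*Node{
        {ID: "gpu-1", LastHeartbeat: time.Now().Add(-5 * time.Second)},
        {ID: "gpu-2", LastHeartbeat: time.Now().Add(-2 * time.Minute)},
    }
    markStaleNodes(nodes, 30*time.Second, time.Now())
    for _, n := range nodes {
        fmt.Printf("%s healthy=%v\n", n.ID, n.Healthy)
    }
}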

Compute Plane (compute/)

The nebula-agent daemon runs on every compute node and manages all local operations autonomously.

Responsibilities:

  • Download and cache model files
  • Pull Docker images and manage containers
  • Monitor GPU resources (NVML integration)
  • Execute deployment lifecycle (start, stop, health checks)
  • Persist local state for crash recovery (sketched after this list)
  • Expose Prometheus metrics
  • Report heartbeats to the orchestrator
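
The crash-recovery item above boils down to replaying persisted deployment records from agent.db at startup and reconciling them against what is actually running. A minimal sketch, assuming a simple deployments table; the schema, driver choice, and reconcile step are illustrative only.

package main

import (
    "database/sql"
    "log"

    _ "github.com/mattn/go-sqlite3" // assumed SQLite driver; the agent's actual driver may differ
)

type Deployment struct {
    ID, Image, Status string
}

// loadDeployments reads persisted deployment records so the agent can
// reconcile them against actually-running containers after a restart.
func loadDeployments(db *sql.DB) ([]Deployment, error) {
    rows, err := db.Query(`SELECT id, image, status FROM deployments`) // hypothetical schema
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var out []Deployment
    for rows.Next() {
        var d Deployment
        if err := rows.Scan(&d.ID, &d.Image, &d.Status); err != nil {
            return nil, err
        }
        out = append(out, d)
    }
    return out, rows.Err()
}

func main() {
    db, err := sql.Open("sqlite3", "agent.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    deps, err := loadDeployments(db)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("recovered %d deployment records", len(deps))
    // A real agent would now compare these records against running
    // containers and restart anything that should be up but is not.
}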

Core Components

1. nebulactl CLI

Location: cmd/nebulactl/

The primary user interface for all Nebula operations.

Commands:

  • nebulactl deploy - Deploy models via CLI flags or YAML spec
  • nebulactl deployment - Manage deployments (list, get, logs, restart, delete)
  • nebulactl node - Manage compute nodes (add, list, status, remove)
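
Since the CLI is built on urfave/cli v3 (see Technology Stack below), the command tree might be wired roughly as follows. Command names come from the list above; the flags and actions are hypothetical placeholders, not nebulactl's real ones.

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/urfave/cli/v3"
)

func main() {
    cmd := &cli.Command{
        Name:  "nebulactl",
        Usage: "deploy and manage LLMs on Nebula",
        Commands: []*cli.Command{
            {
                Name:  "deploy",
                Usage: "deploy a model via flags or a YAML spec",
                Flags: []cli.Flag{
                    &cli.StringFlag{Name: "model"}, // hypothetical flag
                    &cli.StringFlag{Name: "file"},  // hypothetical flag
                },
                Action: func(ctx context.Context, c *cli.Command) error {
                    fmt.Printf("deploying %s\n", c.String("model"))
                    return nil
                },
            },
            {Name: "deployment", Usage: "list, get, logs, restart, delete"},
            {Name: "node", Usage: "add, list, status, remove"},
        },
    }
    if err := cmd.Run(context.Background(), os.Args); err != nil {
        log.Fatal(err)
    }
}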

2. nebula-agent

Location: cmd/nebula-agent/, compute/

A stateful daemon running on each compute node.

gRPC API Endpoints:

  • GetCapabilities - Report node capabilities (GPUs, memory, etc.)
  • PrepareModel - Download and cache model files
  • StartRuntime - Launch model container
  • StopRuntime - Stop running deployment
  • GetStats - Real-time GPU and system metrics
  • GetHealth - Health check for deployment
  • ListDeployments - List all deployments on node
  • GetLogs - Stream container logs
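
From the control plane's side, each endpoint is an ordinary gRPC call against the agent. Below is a minimal sketch of invoking GetStats; the agentpb import path, client constructor, and request message are hypothetical stand-ins for the generated stubs, which this document does not show.

package main

import (
    "context"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    agentpb "example.com/nebula/agentpb" // hypothetical generated stubs
)

func main() {
    // Dial the agent's gRPC server (port 9091 per the single-node example below).
    conn, err := grpc.Dial("localhost:9091",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := agentpb.NewAgentServiceClient(conn) // hypothetical constructor
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    stats, err := client.GetStats(ctx, &agentpb.GetStatsRequest{}) // hypothetical message
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("node stats: %+v", stats)
}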

3. Scheduler

Location: platform/service/deployment/scheduler.go

Selects the optimal node for each deployment based on resource requirements.

Scheduling Logic:

  • For GPU deployments: select the node with the most available GPUs
  • For CPU deployments: select the node with the most available CPU cores
  • Validate GPU memory and type constraints
  • Enforce homogeneous GPU requirements for multi-GPU deployments
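
A condensed sketch of that selection policy follows. NodeInfo and the constraint fields are illustrative; the real scheduler in scheduler.go will differ in detail.

package main

import "fmt"

// NodeInfo is an illustrative stand-in for the scheduler's view of a node.
type NodeInfo struct {
    Name     string
    FreeGPUs int
    GPUType  string // homogeneous per node in this sketch
    GPUMemGB int
}

// pickGPUNode returns the node with the most free GPUs that satisfies the
// deployment's GPU-memory and GPU-type constraints, or nil if none qualifies.
func pickGPUNode(nodes []NodeInfo, needGPUs, needMemGB int, gpuType string) *NodeInfo {
    var best *NodeInfo
    for i := range nodes {
        n := &nodes[i]
        if n.FreeGPUs < needGPUs || n.GPUMemGB < needMemGB {
            continue
        }
        // Multi-GPU deployments require one GPU type across all devices.
        if gpuType != "" && n.GPUType != gpuType {
            continue
        }
        if best == nil || n.FreeGPUs > best.FreeGPUs {
            best = n
        }
    }
    return best
}

func main() {
    nodes := []NodeInfo{
        {Name: "a", FreeGPUs: 2, GPUType: "A100", GPUMemGB: 80},
        {Name: "b", FreeGPUs: 4, GPUType: "A100", GPUMemGB: 80},
    }
    if n := pickGPUNode(nodes, 2, 40, "A100"); n != nil {
        fmt.Println("scheduled on", n.Name)
    }
}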

4. GPU Monitor

Location: compute/gpu/gpu.go

Integrates with NVIDIA NVML to discover and monitor GPUs.

Capabilities:

  • Discover all NVIDIA GPUs on host
  • Track GPU UUID, model name, memory capacity
  • Monitor real-time metrics (utilization, memory, temperature, power)
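
A minimal go-nvml discovery loop in the spirit of gpu.go might look like this (error handling trimmed for brevity; a sketch, not the actual implementation):

package main

import (
    "fmt"
    "log"

    "github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
    if ret := nvml.Init(); ret != nvml.SUCCESS {
        log.Fatalf("NVML init failed: %v", nvml.ErrorString(ret))
    }
    defer nvml.Shutdown()

    count, ret := nvml.DeviceGetCount()
    if ret != nvml.SUCCESS {
        log.Fatalf("device count: %v", nvml.ErrorString(ret))
    }
    for i := 0; i < count; i++ {
        dev, _ := nvml.DeviceGetHandleByIndex(i)
        uuid, _ := dev.GetUUID()
        name, _ := dev.GetName()
        mem, _ := dev.GetMemoryInfo()        // bytes
        util, _ := dev.GetUtilizationRates() // percent
        temp, _ := dev.GetTemperature(nvml.TEMPERATURE_GPU)
        power, _ := dev.GetPowerUsage() // milliwatts
        fmt.Printf("%s %s mem=%d/%d util=%d%% temp=%dC power=%.1fW\n",
            uuid, name, mem.Used, mem.Total, util.Gpu, temp, float64(power)/1000)
    }
}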

5. Docker Manager

Location: compute/docker/manager.go

Manages the lifecycle of model containers.

Operations:

  • Pull Docker images for runtimes
  • Create containers with GPU passthrough
  • Start/stop/remove containers
  • Stream container logs
  • Health checks with configurable timeouts
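
Creating a model container with port binding and GPU passthrough via the Docker SDK looks roughly like the sketch below, assuming a recent SDK version. The image and port come from vLLM's defaults; the container name and command are hypothetical, and the real manager.go will differ.

package main

import (
    "context"
    "log"

    "github.com/docker/docker/api/types/container"
    "github.com/docker/docker/client"
    "github.com/docker/go-connections/nat"
)

func main() {
    ctx := context.Background()
    cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    resp, err := cli.ContainerCreate(ctx,
        &container.Config{
            Image:        "vllm/vllm-openai", // runtime image from the table in section 6
            ExposedPorts: nat.PortSet{"8000/tcp": struct{}{}},
        },
        &container.HostConfig{
            PortBindings: nat.PortMap{
                "8000/tcp": {{HostIP: "0.0.0.0", HostPort: "8000"}},
            },
            // Request NVIDIA GPUs; Count: -1 means "all available".
            DeviceRequests: []container.DeviceRequest{
                {Driver: "nvidia", Count: -1, Capabilities: [][]string{{"gpu"}}},
            },
        },
        nil, nil, "nebula-vllm-demo") // hypothetical container name
    if err != nil {
        log.Fatal(err)
    }
    if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
        log.Fatal(err)
    }
    log.Println("started", resp.ID)
}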

6. Runtime Implementations

Location: compute/runtime/

Runtime   Image                                           Use Case                              Device Support
vLLM      vllm/vllm-openai                                OpenAI-compatible, high performance   GPU, CPU
TGI       ghcr.io/huggingface/text-generation-inference   Hugging Face models                   GPU
Ollama    ollama/ollama                                   Local models, quantized               GPU, CPU
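
Abstracting the engines behind a common interface lets the Docker manager stay agnostic about which runtime it is launching. The interface below is a hypothetical shape, not the actual contents of compute/runtime/; the vLLM image and port reflect that engine's documented defaults.

package main

import "fmt"

// Runtime is a hypothetical abstraction over vLLM, TGI, and Ollama.
type Runtime interface {
    Image() string              // Docker image to pull
    Args(model string) []string // container arguments for serving a model
    Port() int                  // port the engine listens on
}

type vllmRuntime struct{}

func (vllmRuntime) Image() string { return "vllm/vllm-openai" }
func (vllmRuntime) Args(model string) []string {
    return []string{"--model", model} // vLLM's OpenAI-compatible server takes --model
}
func (vllmRuntime) Port() int { return 8000 }

func main() {
    var r Runtime = vllmRuntime{}
    fmt.Println(r.Image(), r.Args("my-org/my-model"), r.Port())
}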

Technology Stack

Backend

  • Language: Go 1.24.0
  • RPC Framework: gRPC + Protocol Buffers 3
  • CLI Framework: urfave/cli v3
  • Containerization: Docker SDK
  • GPU Monitoring: NVIDIA go-nvml
  • Observability: Prometheus client
  • Database: SQLite

Frontend

  • Web UI: React + Tailwind CSS
  • TUI: Charmbracelet (bubbletea, lipgloss)

Directory Structure

nebula/
├── cmd/                # Executable entry points
│   ├── nebulactl/      # CLI tool
│   ├── nebula-agent/   # Agent daemon
│   └── nebulad/        # Control plane server
├── platform/           # Orchestration layer
│   ├── api/            # gRPC API definitions
│   ├── client/         # gRPC client wrapper
│   ├── domain/         # Domain models
│   ├── service/        # Business logic
│   └── storage/        # SQLite persistence
├── compute/            # Agent layer
│   ├── daemon/         # Agent daemon implementation
│   ├── docker/         # Docker container manager
│   ├── gpu/            # NVIDIA GPU monitoring
│   ├── cache/          # Model cache manager
│   ├── runtime/        # Runtime implementations
│   └── storage/        # Agent-side storage
├── shared/             # Shared utilities
└── ui/                 # Web UI

Deployment Patterns

Single Node (Local Development)

Developer Machine
└── nebulactl (CLI) ←→ nebula-agent (localhost:9091)
                        └── vLLM container (model serving)

Multi-Node (Enterprise)

Control Plane (separate server)
├─→ Agent Node 1 (GPU)
├─→ Agent Node 2 (GPU)
├─→ Agent Node 3 (GPU)
└─→ Agent Node 4 (CPU-only)