Architecture Overview
Nebula is an open-source, cloud-agnostic platform for deploying and scaling Large Language Models (LLMs) across local, private, and cloud GPU infrastructure.
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACES                         │
│        CLI (nebulactl) | Web UI (React) | TUI | REST API        │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                          CONTROL PLANE                          │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │  Scheduler  │  │ Provisioner  │  │ REST/gRPC API Gateway  │  │
│  ├─────────────┤  ├──────────────┤  ├────────────────────────┤  │
│  │  SSH Setup  │  │ Node Manager │  │   OpenAI-compatible    │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
│                                                                 │
│                          SQLite Store                           │
│               (Nodes, Deployments, Stats, Events)               │
└────────────────────┬────────────────────────────────────────────┘
                     │ gRPC (Agent Communication)
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                    COMPUTE PLANE (GPU Nodes)                    │
│  ┌────────────────────────────────────────────────────────┐     │
│  │               Node Agent (nebula-agent)                │     │
│  ├────────────────────────────────────────────────────────┤     │
│  │  • gRPC Server (Agent Service)                         │     │
│  │  • Docker Container Manager                            │     │
│  │  • GPU Monitoring (NVML integration)                   │     │
│  │  • Model Cache Manager                                 │     │
│  │  • State Management (SQLite agent.db)                  │     │
│  │  • Prometheus Metrics Export                           │     │
│  └───────────────┬──────────────┬─────────────────────────┘     │
│                  │              │                               │
│                  ▼              ▼                               │
│  ┌────────────────────┐  ┌───────────────────┐                  │
│  │   Model Runtimes   │  │      Docker       │                  │
│  ├────────────────────┤  │    Containers     │                  │
│  │  • vLLM            │  │                   │                  │
│  │  • TGI             │  │  • Port Binding   │                  │
│  │  • Ollama          │  │  • GPU Passthrough│                  │
│  └────────────────────┘  └───────────────────┘                  │
└─────────────────────────────────────────────────────────────────┘
Two-Plane Architecture
Control Plane (platform/)
The orchestration logic manages cluster state, schedules deployments, and provisions nodes.
Responsibilities:
- Receive and validate user commands
- Provision and manage compute nodes
- Schedule deployments to appropriate nodes
- Maintain cluster state (nodes, deployments)
- Monitor node health via heartbeats
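Health monitoring comes down to a staleness rule: if a node's last heartbeat is older than a cutoff, the node is marked unhealthy. Below is a minimal Go sketch of that rule; the Node type and field names are simplified stand-ins for illustration, not Nebula's actual domain model.

```go
// Illustrative sketch (not the actual platform/ code): mark nodes
// unhealthy when their last heartbeat is older than a timeout.
package main

import (
	"fmt"
	"time"
)

// Node is a hypothetical, simplified domain model.
type Node struct {
	ID            string
	Healthy       bool
	LastHeartbeat time.Time
}

// markStaleNodes flags nodes whose heartbeat is older than timeout.
func markStaleNodes(nodes []*Node, timeout time.Duration, now time.Time) {
	for _, n := range nodes {
		if now.Sub(n.LastHeartbeat) > timeout {
			n.Healthy = false
		}
	}
}

func main() {
	nodes := []*Node{
		{ID: "gpu-1", Healthy: true, LastHeartbeat: time.Now()},
		{ID: "gpu-2", Healthy: true, LastHeartbeat: time.Now().Add(-2 * time.Minute)},
	}
	markStaleNodes(nodes, 30*time.Second, time.Now())
	for _, n := range nodes {
		fmt.Printf("%s healthy=%v\n", n.ID, n.Healthy)
	}
}
```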
Compute Layer (compute/)
The nebula-agent daemon runs on every compute node and manages all local operations autonomously.
Responsibilities:
- Download and cache model files
- Pull Docker images and manage containers
- Monitor GPU resources (NVML integration)
- Execute deployment lifecycle (start, stop, health checks)
- Persist local state for crash recovery
- Expose Prometheus metrics
- Report heartbeat to orchestrator
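The heartbeat responsibility reduces to a ticker loop that reports on a fixed interval until the agent shuts down. A sketch of that loop follows; the HeartbeatSender interface is hypothetical (the real agent reports over its gRPC connection to the control plane):

```go
// Illustrative sketch of a periodic heartbeat loop, not Nebula's actual code.
package main

import (
	"context"
	"log"
	"time"
)

// HeartbeatSender is a hypothetical interface standing in for the
// agent's real gRPC reporting path.
type HeartbeatSender interface {
	SendHeartbeat(ctx context.Context, nodeID string) error
}

// logSender is a stub implementation so the example runs standalone.
type logSender struct{}

func (logSender) SendHeartbeat(_ context.Context, nodeID string) error {
	log.Printf("heartbeat from %s", nodeID)
	return nil
}

// runHeartbeat reports every interval until ctx is cancelled.
func runHeartbeat(ctx context.Context, hb HeartbeatSender, nodeID string, every time.Duration) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := hb.SendHeartbeat(ctx, nodeID); err != nil {
				// The orchestrator marks the node unhealthy after repeated misses.
				log.Printf("heartbeat failed: %v", err)
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	runHeartbeat(ctx, logSender{}, "gpu-node-1", time.Second)
}
```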
Core Components
1. nebulactl CLI
Location: cmd/nebulactl/
The primary user interface for all Nebula operations.
Commands:
- nebulactl deploy - Deploy models via CLI flags or YAML spec
- nebulactl deployment - Manage deployments (list, get, logs, restart, delete)
- nebulactl node - Manage compute nodes (add, list, status, remove)
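Since the CLI is built on urfave/cli v3 (see Technology Stack), a deploy subcommand could be wired up roughly as below. This is a sketch only: the flag names are illustrative, not necessarily nebulactl's actual flags.

```go
// Hedged sketch of a urfave/cli v3 command layout; flags are illustrative.
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/urfave/cli/v3"
)

func main() {
	cmd := &cli.Command{
		Name:  "nebulactl",
		Usage: "deploy and manage LLMs on Nebula",
		Commands: []*cli.Command{
			{
				Name:  "deploy",
				Usage: "deploy a model",
				Flags: []cli.Flag{
					&cli.StringFlag{Name: "model", Usage: "model ID to deploy"},
					&cli.IntFlag{Name: "gpus", Value: 1, Usage: "GPUs to allocate"},
				},
				Action: func(ctx context.Context, c *cli.Command) error {
					// A real implementation would call the control plane API here.
					fmt.Printf("deploying %s on %d GPU(s)\n", c.String("model"), c.Int("gpus"))
					return nil
				},
			},
		},
	}
	if err := cmd.Run(context.Background(), os.Args); err != nil {
		log.Fatal(err)
	}
}
```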
2. nebula-agent
Location: cmd/nebula-agent/, compute/
A stateful daemon running on each compute node.
gRPC API Endpoints:
- GetCapabilities - Report node capabilities (GPUs, memory, etc.)
- PrepareModel - Download and cache model files
- StartRuntime - Launch model container
- StopRuntime - Stop running deployment
- GetStats - Real-time GPU and system metrics
- GetHealth - Health check for deployment
- ListDeployments - List all deployments on node
- GetLogs - Stream container logs
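From the control plane's side, each endpoint is an ordinary gRPC call against the agent's port. The sketch below shows invoking GetCapabilities; the agentpb import path, stub names, and response fields are assumptions for illustration, since the actual generated stubs live in the repository's API packages.

```go
// Sketch: calling the agent's GetCapabilities RPC. Stub and message names
// (agentpb, AgentServiceClient, GetCapabilitiesRequest, GetGpus) are
// hypothetical placeholders for the project's generated code.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	agentpb "example.com/nebula/platform/api/agent" // hypothetical import path
)

func main() {
	// grpc.NewClient is the current grpc-go entry point for client connections.
	conn, err := grpc.NewClient("10.0.0.5:9091",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial agent: %v", err)
	}
	defer conn.Close()

	client := agentpb.NewAgentServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	caps, err := client.GetCapabilities(ctx, &agentpb.GetCapabilitiesRequest{})
	if err != nil {
		log.Fatalf("GetCapabilities: %v", err)
	}
	log.Printf("node reports %d GPU(s)", len(caps.GetGpus()))
}
```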
3. Scheduler
Location: platform/service/deployment/scheduler.go
Selects the optimal node for each deployment based on resource requirements.
Scheduling Logic:
- For GPU deployments: Select node with most available GPUs
- For CPU deployments: Select node with most available CPU cores
- Validate GPU memory and type constraints
- Enforce homogeneous GPU requirements for multi-GPU deployments
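A minimal sketch of the GPU selection rule ("most free GPUs wins, subject to constraints") follows. The types are simplified stand-ins, and the homogeneity check is reduced to a single per-node GPU model for brevity.

```go
// Illustrative sketch of the scheduling rule described above, not the
// actual scheduler.go implementation.
package main

import (
	"errors"
	"fmt"
)

// Node is a simplified stand-in for the scheduler's view of a node.
type Node struct {
	ID           string
	GPUModel     string
	FreeGPUs     int
	GPUMemoryGiB int
}

// pickGPUNode returns the candidate with the most free GPUs that satisfies
// the deployment's GPU count, memory, and GPU-type constraints.
func pickGPUNode(nodes []Node, wantGPUs, wantMemGiB int, wantModel string) (Node, error) {
	var best Node
	found := false
	for _, n := range nodes {
		if n.FreeGPUs < wantGPUs || n.GPUMemoryGiB < wantMemGiB {
			continue
		}
		if wantModel != "" && n.GPUModel != wantModel {
			continue
		}
		if !found || n.FreeGPUs > best.FreeGPUs {
			best, found = n, true
		}
	}
	if !found {
		return Node{}, errors.New("no node satisfies GPU constraints")
	}
	return best, nil
}

func main() {
	nodes := []Node{
		{ID: "a", GPUModel: "A100", FreeGPUs: 2, GPUMemoryGiB: 80},
		{ID: "b", GPUModel: "A100", FreeGPUs: 4, GPUMemoryGiB: 80},
	}
	n, err := pickGPUNode(nodes, 2, 40, "A100")
	fmt.Println(n.ID, err) // b <nil>
}
```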
4. GPU Monitor
Location: compute/gpu/gpu.go
Integrates with NVIDIA NVML to discover and monitor GPUs.
Capabilities:
- Discover all NVIDIA GPUs on host
- Track GPU UUID, model name, memory capacity
- Monitor real-time metrics (utilization, memory, temperature, power)
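Discovery and monitoring follow go-nvml's standard call pattern: Init, DeviceGetCount, then per-device queries. The sketch below shows that pattern; it requires an NVIDIA driver on the host, and error handling is abbreviated.

```go
// GPU discovery with NVIDIA go-nvml, the library named above.
package main

import (
	"fmt"
	"log"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		log.Fatalf("nvml init: %v", nvml.ErrorString(ret))
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		log.Fatalf("device count: %v", nvml.ErrorString(ret))
	}
	for i := 0; i < count; i++ {
		// Per-device queries; Return codes elided for brevity.
		dev, _ := nvml.DeviceGetHandleByIndex(i)
		name, _ := dev.GetName()
		uuid, _ := dev.GetUUID()
		mem, _ := dev.GetMemoryInfo()
		util, _ := dev.GetUtilizationRates()
		fmt.Printf("%s (%s): %d MiB total, %d%% busy\n",
			name, uuid, mem.Total/(1024*1024), util.Gpu)
	}
}
```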
5. Docker Manager
Location: compute/docker/manager.go
Manages the lifecycle of model containers.
Operations:
- Pull Docker images for runtimes
- Create containers with GPU passthrough
- Start/stop/remove containers
- Stream container logs
- Health checks with configurable timeouts
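Container creation with GPU passthrough relies on the Docker SDK's DeviceRequest mechanism: requesting the "gpu" capability routes the container through the NVIDIA runtime. Here is a sketch against a recent Docker SDK; the image, port, and container name are example values, not Nebula's actual configuration.

```go
// Sketch: create and start a GPU container via the Docker SDK.
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
	"github.com/docker/go-connections/nat"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	resp, err := cli.ContainerCreate(ctx,
		&container.Config{
			Image:        "vllm/vllm-openai", // example image from the runtime table
			ExposedPorts: nat.PortSet{"8000/tcp": struct{}{}},
		},
		&container.HostConfig{
			PortBindings: nat.PortMap{
				"8000/tcp": []nat.PortBinding{{HostIP: "0.0.0.0", HostPort: "8000"}},
			},
			Resources: container.Resources{
				DeviceRequests: []container.DeviceRequest{{
					Driver:       "nvidia",
					Count:        -1, // all GPUs; use DeviceIDs to pin specific ones
					Capabilities: [][]string{{"gpu"}},
				}},
			},
		},
		nil, nil, "vllm-demo")
	if err != nil {
		log.Fatal(err)
	}
	if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Printf("started %s", resp.ID)
}
```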
6. Runtime Implementations
Location: compute/runtime/
| Runtime | Image | Use Case | Device Support |
|---|---|---|---|
| vLLM | vllm/vllm-openai | OpenAI-compatible, high performance | GPU, CPU |
| TGI | ghcr.io/huggingface/text-generation-inference | Hugging Face models | GPU |
| Ollama | ollama/ollama | Local models, quantized | GPU, CPU |
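A uniform runtime abstraction is what lets one Docker manager drive all three backends. The interface below is a hypothetical sketch of that shape; the method and field names are illustrative, not Nebula's actual compute/runtime/ API.

```go
// Hypothetical sketch of a runtime abstraction; names are illustrative.
package runtime

// LaunchSpec carries what the Docker manager needs to start a model server.
type LaunchSpec struct {
	Image string   // e.g. "vllm/vllm-openai"
	Args  []string // runtime-specific serve arguments
	Port  int      // port the server listens on inside the container
	GPUs  int      // GPUs to pass through (0 = CPU-only)
}

// Runtime maps a model request onto a concrete serving container.
type Runtime interface {
	Name() string // "vllm", "tgi", "ollama"
	Launch(modelID string, gpus int) LaunchSpec
}
```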
Technology Stack
Backend
- Language: Go 1.24.0
- RPC Framework: gRPC + Protocol Buffers 3
- CLI Framework: urfave/cli v3
- Containerization: Docker SDK
- GPU Monitoring: NVIDIA go-nvml
- Observability: Prometheus client
- Database: SQLite
Frontend
- Web UI: React + Tailwind CSS
- TUI: Charmbracelet (bubbletea, lipgloss)
Directory Structure
nebula/
├── cmd/ # Executable entry points
│ ├── nebulactl/ # CLI tool
│ ├── nebula-agent/ # Agent daemon
│ └── nebulad/ # Control plane server
│
├── platform/ # Orchestration layer
│ ├── api/ # gRPC API definitions
│ ├── client/ # gRPC client wrapper
│ ├── domain/ # Domain models
│ ├── service/ # Business logic
│ └── storage/ # SQLite persistence
│
├── compute/ # Agent layer
│ ├── daemon/ # Agent daemon implementation
│ ├── docker/ # Docker container manager
│ ├── gpu/ # NVIDIA GPU monitoring
│ ├── cache/ # Model cache manager
│ ├── runtime/ # Runtime implementations
│ └── storage/ # Agent-side storage
│
├── shared/ # Shared utilities
└── ui/ # Web UI
Deployment Patterns
Single Node (Local Development)
Developer Machine
└── nebulactl (CLI) ←→ nebula-agent (localhost:9091)
        └── vLLM container (model serving)
Multi-Node (Enterprise)
Control Plane (separate server)
├─→ Agent Node 1 (GPU)
├─→ Agent Node 2 (GPU)
├─→ Agent Node 3 (GPU)
└─→ Agent Node 4 (CPU-only)