Nebula Daemon

Central control plane server for managing the Nebula platform. It orchestrates GPU agents, handles model deployments, and provides an OpenAI-compatible API gateway.

Features

Core Daemon (cmd/nebulad/main.go)

  • CLI based on urfave/cli with flag and environment variable support
  • Structured logging (slog) with JSON and text format support
  • Graceful shutdown handling (see the sketch after this list)
  • gRPC interceptors for logging and panic recovery
  • gRPC reflection for debugging
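
Since graceful shutdown interacts with the gRPC server lifecycle, a minimal sketch of the wiring may be useful. This is illustrative only (interceptor and reflection registration are elided), not the daemon's actual main.go:

package main

import (
    "context"
    "log/slog"
    "net"
    "os/signal"
    "syscall"

    "google.golang.org/grpc"
)

func main() {
    // Cancel this context on SIGINT/SIGTERM so components can wind down.
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    lis, err := net.Listen("tcp", ":9090")
    if err != nil {
        slog.Error("listen failed", "error", err)
        return
    }

    srv := grpc.NewServer() // logging/recovery interceptors and reflection would be registered here

    go func() {
        <-ctx.Done() // signal received
        slog.Info("shutting down gRPC server")
        srv.GracefulStop() // drain in-flight RPCs, then Serve returns
    }()

    if err := srv.Serve(lis); err != nil {
        slog.Error("serve failed", "error", err)
    }
}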

gRPC Server (Port 9090)

  • Agent registration and deregistration
  • Heartbeat processing with offline detection
  • Command queuing for agents
  • Bidirectional communication with nebula-agent instances

HTTP Gateway (Port 8080)

  • OpenAI-compatible REST API
  • Chat completions with streaming support (SSE)
  • Text completions and embeddings
  • Model listing and health checks
  • CORS support for web clients

Registry Service

  • In-memory agent state tracking
  • Automatic offline detection (90s timeout)
  • Pending commands queue for agents
  • Caddy and Route53 integration for dynamic routing

Deployment Service

  • Full deployment lifecycle management
  • Automatic node scheduling
  • Runtime adapter support (vLLM, TGI, Ollama)
  • Container logs and events streaming

Storage

  • SQLite database for persistent state
  • Node and deployment tracking
  • Event history

Usage

Running the Daemon

# Basic usage with defaults
./nebulad

# With custom ports
./nebulad --grpc-port 9090 --http-port 8080

# With Caddy integration
./nebulad --caddy-admin-url http://localhost:2019 --gateway-domain gateway.example.com

# With Route53 DNS management
./nebulad \
  --route53-hosted-zone-id Z1234567890ABC \
  --route53-access-key-id AKIA... \
  --route53-secret-access-key ... \
  --route53-region us-east-1 \
  --gateway-ip 10.0.0.1

# With debug logging
./nebulad --log-level debug --log-format json

CLI Options

Option                        Description                                 Default
--data-dir                    Data directory for SQLite database          /var/lib/nebulad
--grpc-port                   gRPC server port for agent communication    9090
--http-port                   HTTP API server port                        8080
--api-prefix                  API prefix for HTTP endpoints               /v1
--caddy-admin-url             Caddy Admin API URL                         http://localhost:2019
--gateway-domain              Base domain for agent gateways              gateway.nebulactl.com
--heartbeat-timeout           Seconds before marking agent offline        90
--default-heartbeat-interval  Default heartbeat interval (seconds)        30
--log-level                   Log level: debug, info, warn, error         info
--log-format                  Log format: json, text                      text
--route53-hosted-zone-id      AWS Route 53 hosted zone ID                 -
--route53-access-key-id       AWS access key ID                           -
--route53-secret-access-key   AWS secret access key                       -
--route53-region              AWS region                                  us-east-1
--route53-ttl                 DNS record TTL in seconds                   300
--gateway-ip                  Gateway IP for DNS A records                -

Configuration

Environment Variables

All CLI options can be set via environment variables with the NEBULAD_ prefix:

Variable                            Description
NEBULAD_DATA_DIR                    Data directory path
NEBULAD_GRPC_PORT                   gRPC server port
NEBULAD_HTTP_PORT                   HTTP API port
NEBULAD_API_PREFIX                  API prefix
NEBULAD_CADDY_ADMIN_URL             Caddy Admin API URL
NEBULAD_GATEWAY_DOMAIN              Gateway base domain
NEBULAD_HEARTBEAT_TIMEOUT           Offline timeout
NEBULAD_DEFAULT_HEARTBEAT_INTERVAL  Heartbeat interval
NEBULAD_LOG_LEVEL                   Log level
NEBULAD_LOG_FORMAT                  Log format
NEBULAD_ROUTE53_HOSTED_ZONE_ID      Route 53 zone ID
NEBULAD_ROUTE53_ACCESS_KEY_ID       AWS access key
NEBULAD_ROUTE53_SECRET_ACCESS_KEY   AWS secret key
NEBULAD_ROUTE53_REGION              AWS region
NEBULAD_ROUTE53_TTL                 DNS TTL
NEBULAD_GATEWAY_IP                  Gateway IP
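
As a sketch of how a flag and its NEBULAD_ environment variable are typically paired with urfave/cli (assuming the v2 API; the actual declarations in cmd/nebulad/main.go may differ):

import "github.com/urfave/cli/v2"

// Illustrative flag declaration showing the env-var pairing.
var dataDirFlag = &cli.StringFlag{
    Name:    "data-dir",
    Usage:   "Data directory for SQLite database",
    Value:   "/var/lib/nebulad",
    EnvVars: []string{"NEBULAD_DATA_DIR"},
}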

Example: Production Configuration

export NEBULAD_DATA_DIR=/var/lib/nebulad
export NEBULAD_GRPC_PORT=9090
export NEBULAD_HTTP_PORT=8080
export NEBULAD_CADDY_ADMIN_URL=http://localhost:2019
export NEBULAD_GATEWAY_DOMAIN=gateway.mycompany.com
export NEBULAD_LOG_LEVEL=info
export NEBULAD_LOG_FORMAT=json

# Optional: Route53 DNS management
export NEBULAD_ROUTE53_HOSTED_ZONE_ID=Z1234567890ABC
export NEBULAD_ROUTE53_ACCESS_KEY_ID=AKIA...
export NEBULAD_ROUTE53_SECRET_ACCESS_KEY=...
export NEBULAD_ROUTE53_REGION=us-east-1
export NEBULAD_GATEWAY_IP=10.0.0.1

./nebulad

gRPC API Reference

The daemon exposes a PlatformService for agent communication on the gRPC port (default 9090).

Service Definition

service PlatformService {
  rpc Register(RegisterRequest) returns (RegisterResponse);
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
  rpc Deregister(DeregisterRequest) returns (DeregisterResponse);
}
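
Because the daemon enables gRPC reflection, the service can also be discovered and inspected directly:

# List services and describe the platform service via reflection
grpcurl -plaintext localhost:9090 list
grpcurl -plaintext localhost:9090 describe proto.PlatformService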

Register RPC

Registers a new agent or re-registers an existing one.

Request:

message RegisterRequest {
  string node_id = 1;                       // UUID (auto-generated if empty)
  string node_name = 2;                     // Human-readable node name
  string version = 3;                       // Agent version
  string agent_address = 4;                 // host:port for callbacks
  NodeCapabilities capabilities = 5;
  repeated DeploymentInfo deployments = 6;
}

message NodeCapabilities {
  repeated GPUInfo gpus = 1;
  int64 total_memory_bytes = 2;
  int64 available_memory_bytes = 3;
  string os = 4;
  string arch = 5;
  int32 cpu_cores = 6;
  string cuda_version = 7;
}

Response:

message RegisterResponse {
  string node_id = 1;             // Assigned node ID
  int32 heartbeat_interval = 2;   // Heartbeat interval in seconds
  string gateway_subdomain = 3;   // Assigned subdomain (e.g., abc123.gateway.nebulactl.com)
}

Example with grpcurl:

grpcurl -plaintext -d '{
  "node_name": "gpu-node-1",
  "version": "0.1.0",
  "agent_address": "192.168.1.100:9091",
  "capabilities": {
    "gpus": [
      {"uuid": "GPU-abc123", "name": "NVIDIA A100", "memory_bytes": 42949672960}
    ],
    "total_memory_bytes": 137438953472,
    "os": "linux",
    "arch": "amd64",
    "cpu_cores": 32,
    "cuda_version": "12.1"
  }
}' localhost:9090 proto.PlatformService/Register

Heartbeat RPC

Receives periodic status updates from agents and returns any pending commands in the response.

Request:

message HeartbeatRequest {
  string node_id = 1;
  NodeStats stats = 2;
  HealthStatus health = 3;
  repeated DeploymentInfo deployments = 4;
  int64 timestamp = 5;
}

message NodeStats {
  float cpu_usage_percent = 1;
  int64 memory_used_bytes = 2;
  int64 memory_total_bytes = 3;
  repeated GPUStats gpu_stats = 4;
}

message GPUStats {
  string uuid = 1;
  float utilization_percent = 2;
  int64 memory_used_bytes = 3;
  int64 memory_total_bytes = 4;
  float temperature_celsius = 5;
  float power_watts = 6;
}

message HealthStatus {
  string status = 1;            // "healthy", "degraded", "unhealthy"
  repeated string issues = 2;
  int64 uptime_seconds = 3;
}

Response:

message HeartbeatResponse {
  bool acknowledged = 1;
  NodeConfig config = 2;
  repeated Command commands = 3;   // Pending commands for agent
}

message Command {
  string id = 1;
  string type = 2;                 // "start", "stop", "restart", "delete"
  string target_id = 3;            // Deployment ID
  map<string, string> params = 4;
}

Example with grpcurl:

grpcurl -plaintext -d '{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "stats": {
    "cpu_usage_percent": 45.5,
    "memory_used_bytes": 68719476736,
    "memory_total_bytes": 137438953472,
    "gpu_stats": [
      {
        "uuid": "GPU-abc123",
        "utilization_percent": 80.0,
        "memory_used_bytes": 34359738368,
        "memory_total_bytes": 42949672960,
        "temperature_celsius": 72.0,
        "power_watts": 250.0
      }
    ]
  },
  "health": {
    "status": "healthy",
    "uptime_seconds": 86400
  },
  "timestamp": 1702400000
}' localhost:9090 proto.PlatformService/Heartbeat
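
On the agent side, this exchange is conceptually a ticker loop that reports stats and applies whatever commands come back. A hypothetical sketch, assuming generated stubs in a pb package (the real loop lives in nebula-agent and will differ):

import (
    "context"
    "log/slog"
    "time"
)

// pb is assumed to be the Go package generated from platform.proto.
func heartbeatLoop(ctx context.Context, client pb.PlatformServiceClient, nodeID string, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            resp, err := client.Heartbeat(ctx, &pb.HeartbeatRequest{
                NodeId:    nodeID,
                Timestamp: time.Now().Unix(),
            })
            if err != nil {
                slog.Warn("heartbeat failed", "error", err)
                continue
            }
            // Pending commands piggyback on the heartbeat response.
            for _, cmd := range resp.Commands {
                slog.Info("command received", "id", cmd.Id, "type", cmd.Type, "target", cmd.TargetId)
            }
        }
    }
}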

Deregister RPC

Gracefully removes an agent from the platform.

Request:

message DeregisterRequest {
  string node_id = 1;
  string reason = 2;   // Optional reason for deregistration
}

Response:

message DeregisterResponse {
  bool success = 1;
  string message = 2;
}

Example with grpcurl:

grpcurl -plaintext -d '{
  "node_id": "550e8400-e29b-41d4-a716-446655440000",
  "reason": "graceful shutdown"
}' localhost:9090 proto.PlatformService/Deregister

HTTP API Reference (OpenAI-compatible)

The daemon provides an OpenAI-compatible HTTP API on the HTTP port (default 8080).

POST /v1/chat/completions

Generate chat completions using deployed models.

Request:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 2048,
    "stream": false
  }'

Request Body:

Field              Type    Required  Description
model              string  Yes       Model name (deployment name)
messages           array   Yes       Array of message objects
temperature        float   No        Sampling temperature (0-2)
top_p              float   No        Nucleus sampling parameter
max_tokens         int     No        Maximum tokens to generate
stream             bool    No        Enable streaming response
stop               array   No        Stop sequences
presence_penalty   float   No        Presence penalty (-2 to 2)
frequency_penalty  float   No        Frequency penalty (-2 to 2)

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1702400000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 18,
    "total_tokens": 43
  }
}

Streaming Response:

When stream: true, the response uses Server-Sent Events (SSE):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

Response chunks:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702400000,"model":"llama-3.1-8b","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702400000,"model":"llama-3.1-8b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702400000,"model":"llama-3.1-8b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1702400000,"model":"llama-3.1-8b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
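
A minimal Go client that consumes this stream could look like the following sketch (endpoint and payload mirror the curl example above; production code would also check the response status and scanner.Err):

package main

import (
    "bufio"
    "fmt"
    "log"
    "net/http"
    "strings"
)

func main() {
    body := `{"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}`
    resp, err := http.Post("http://localhost:8080/v1/chat/completions", "application/json", strings.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data: ") {
            continue // skip the blank separator lines between events
        }
        payload := strings.TrimPrefix(line, "data: ")
        if payload == "[DONE]" {
            break // end-of-stream sentinel
        }
        fmt.Println(payload) // each payload is one chat.completion.chunk JSON object
    }
}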

POST /v1/completions

Generate text completions (legacy API).

Request:

curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "llama-3.1-8b",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'

Response:

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1702400000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "text": ", in a land far away, there lived a wise old wizard...",
      "index": 0,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 100,
    "total_tokens": 104
  }
}

POST /v1/embeddings

Generate embeddings for text.

Request:

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "text-embedding-model",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023, -0.0094, 0.0156, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-model",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}

GET /v1/models

List all available models (deployed and ready).

Request:

curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer <token>"

Response:

{
  "object": "list",
  "data": [
    {
      "id": "llama-3.1-8b",
      "object": "model",
      "created": 1702400000,
      "owned_by": "nebula"
    },
    {
      "id": "mistral-7b",
      "object": "model",
      "created": 1702400000,
      "owned_by": "nebula"
    }
  ]
}

GET /health

Health check endpoint.

Request:

curl http://localhost:8080/health

Response:

{
  "status": "ok"
}

Services

Registry Service

Manages agent registration, heartbeats, and health monitoring.

Location: platform/service/registry/service.go

Key Operations:

Operation            Description
RegisterAgent        Creates/updates node, configures Caddy/Route53 routes
ProcessHeartbeat     Updates LastHeartbeat, updates status in DB
DeregisterAgent      Removes Caddy/Route53 routes, marks as offline
GetPendingCommands   Returns and clears pending commands for agent
QueueCommand         Adds command for delivery on next heartbeat
StartOfflineChecker  Background goroutine checking for offline agents

Agent State Structure:

type AgentState struct {
    NodeID        string
    LastHeartbeat time.Time
    Status        string // "online", "offline"
    AgentAddress  string
    Version       string
    Capabilities  *NodeCapabilities
}

Offline Detection:

  • Background checker runs every 30 seconds
  • Agents are marked offline if no heartbeat received for 90 seconds
  • Offline agents have their Caddy routes removed
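
A rough sketch of such a checker, assuming a mutex-guarded map keyed by node ID (field names follow the AgentState struct above; the actual code in platform/service/registry/service.go may differ):

import (
    "context"
    "log/slog"
    "sync"
    "time"
)

type Registry struct {
    mu     sync.Mutex
    agents map[string]*AgentState // AgentState as shown above
}

// StartOfflineChecker scans for stale heartbeats on a fixed interval.
func (r *Registry) StartOfflineChecker(ctx context.Context, timeout time.Duration) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            r.mu.Lock()
            for id, a := range r.agents {
                if a.Status == "online" && time.Since(a.LastHeartbeat) > timeout {
                    a.Status = "offline"
                    slog.Warn("agent marked offline", "node_id", id)
                    // Caddy/Route53 route removal would be triggered here.
                }
            }
            r.mu.Unlock()
        }
    }
}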

Deployment Service

Manages the full lifecycle of model deployments.

Location: platform/service/deployment/service.go

Key Operations:

Method       Description
Deploy       End-to-end: select node, start runtime on agent, create deployment
List         Get all deployments from all online nodes
Get          Get specific deployment by ID
Start        Restart a stopped deployment
Stop         Stop a running deployment
Delete       Delete deployment and stop container
Restart      Restart deployment (stop + start)
GetLogs      Get container logs
GetEvents    Get deployment event history
ExecCommand  Execute command inside container

Deployment Flow:

1. Client sends Deploy request
2. Generate Deployment ID (UUID)
3. Auto-detect ModelSource (ollama:// or hf://; see the sketch after this list)
4. Select node via scheduler (if not specified)
5. Connect to agent via gRPC
6. Generate access token (if not public)
7. Send StartRuntimeRequest to agent
8. Return Deployment with endpoint info
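
The source detection in step 3 reduces to a prefix check; a minimal sketch (the source names and the fallback are assumptions):

import "strings"

func detectModelSource(model string) string {
    switch {
    case strings.HasPrefix(model, "ollama://"):
        return "ollama"
    case strings.HasPrefix(model, "hf://"):
        return "huggingface"
    default:
        return "ollama" // fallback choice is an assumption
    }
}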

Scheduler Logic:

  • GPU deployments: Select node with most available GPUs
  • CPU deployments: Select node with most available CPU cores
  • Validates GPU memory and type constraints
  • Enforces homogeneous GPU requirements for multi-GPU
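
As a sketch of the "most available GPUs" rule described above (the Node fields are illustrative stand-ins for the daemon's domain model, and the memory/type constraint checks are omitted):

type Node struct {
    ID            string
    AvailableGPUs int
    Online        bool
}

// selectGPUNode returns the online node with the most free GPUs, if any.
func selectGPUNode(nodes []Node) (Node, bool) {
    var best Node
    found := false
    for _, n := range nodes {
        if !n.Online || n.AvailableGPUs == 0 {
            continue
        }
        if !found || n.AvailableGPUs > best.AvailableGPUs {
            best, found = n, true
        }
    }
    return best, found
}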

Gateway Router

Routes incoming API requests to the appropriate model deployments.

Location: platform/gateway/router.go

Route Structure:

type Route struct {
    DeploymentID string // Deployment ID
    Endpoint     string // http://host:port
    Token        string // Access token
    Runtime      string // ollama, vllm, tgi
    IsPublic     bool   // If true, no auth token required
    Model        string // Original model name
}

Routing Process:

1. Client sends POST /v1/chat/completions with model field
2. Router resolves model -> Deployment -> Node
3. Select appropriate Runtime adapter (Ollama/vLLM/TGI)
4. Adapter forwards request to endpoint with token
5. Result returned in OpenAI format
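
A hypothetical shape for this resolution step, reusing the Route struct above (the real interface in platform/gateway/adapter/adapter.go may differ):

import (
    "context"
    "fmt"
)

// Adapter abstracts runtime-specific request handling.
type Adapter interface {
    ChatCompletion(ctx context.Context, route Route, body []byte) ([]byte, error)
}

type Router struct {
    routes      map[string]Route // keyed by model name
    ollama      Adapter          // custom format translation
    passthrough Adapter          // vLLM/TGI/Triton speak OpenAI natively
}

func (r *Router) resolve(model string) (Route, Adapter, error) {
    route, ok := r.routes[model]
    if !ok {
        return Route{}, nil, fmt.Errorf("unknown model %q", model)
    }
    if route.Runtime == "ollama" {
        return route, r.ollama, nil
    }
    return route, r.passthrough, nil
}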

Runtime Adapters:

Runtime  Adapter             Notes
Ollama   OllamaAdapter       Custom format translation
vLLM     PassthroughAdapter  OpenAI-compatible
TGI      PassthroughAdapter  OpenAI-compatible
Triton   PassthroughAdapter  OpenAI-compatible

Integrations

Caddy Integration

Provides automatic HTTPS and reverse proxy for agent gateways.

Location: platform/service/caddy/client.go

Features:

  • Dynamic route creation via Caddy Admin API
  • Automatic TLS certificate management
  • Reverse proxy from subdomain to agent

How it works:

  1. When an agent registers, nebulad creates a unique subdomain
  2. Caddy is configured to proxy https://{subdomain}.gateway.nebulactl.com to the agent's auth proxy (port 8888)
  3. All traffic is automatically encrypted with TLS

Example Route:

https://abc123.gateway.nebulactl.com -> http://agent-ip:8888

Requirements:

  • Caddy server running with Admin API enabled (port 2019)
  • Wildcard DNS record pointing to Caddy server
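
For illustration, such a route can be added by POSTing to the Admin API's routes array; the server name "srv0" and the exact route shape depend on the Caddy configuration and are assumptions here:

import (
    "bytes"
    "encoding/json"
    "net/http"
)

// addCaddyRoute appends a reverse-proxy route via the Caddy Admin API.
func addCaddyRoute(host, agentAddr string) error {
    route := map[string]any{
        "match":  []map[string]any{{"host": []string{host}}},
        "handle": []map[string]any{{"handler": "reverse_proxy", "upstreams": []map[string]string{{"dial": agentAddr}}}},
    }
    body, err := json.Marshal(route)
    if err != nil {
        return err
    }
    // POSTing to a JSON array path appends the element.
    resp, err := http.Post("http://localhost:2019/config/apps/http/servers/srv0/routes",
        "application/json", bytes.NewReader(body))
    if err != nil {
        return err
    }
    return resp.Body.Close()
}

For the example route above, this would be called as addCaddyRoute("abc123.gateway.example.com", "agent-ip:8888").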

Route53 Integration

Optional AWS Route 53 integration for DNS management.

Location: platform/service/route53/client.go

Features:

  • Automatic A record creation for agent subdomains
  • Configurable TTL
  • Regional support

Configuration:

./nebulad \
  --route53-hosted-zone-id Z1234567890ABC \
  --route53-access-key-id AKIA... \
  --route53-secret-access-key ... \
  --route53-region us-east-1 \
  --route53-ttl 300 \
  --gateway-ip 10.0.0.1

How it works:

  1. When an agent registers, nebulad creates an A record
  2. Record points subdomain to the gateway IP
  3. Combined with Caddy, enables full HTTPS access to agents
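
A sketch of that A-record upsert using aws-sdk-go-v2 (values mirror the CLI flags above; the daemon's actual client in platform/service/route53/client.go may differ):

import (
    "context"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/route53"
    "github.com/aws/aws-sdk-go-v2/service/route53/types"
)

// upsertARecord creates or updates the A record for an agent subdomain.
func upsertARecord(ctx context.Context, client *route53.Client, zoneID, fqdn, ip string, ttl int64) error {
    _, err := client.ChangeResourceRecordSets(ctx, &route53.ChangeResourceRecordSetsInput{
        HostedZoneId: aws.String(zoneID),
        ChangeBatch: &types.ChangeBatch{
            Changes: []types.Change{{
                Action: types.ChangeActionUpsert, // create or update in one call
                ResourceRecordSet: &types.ResourceRecordSet{
                    Name:            aws.String(fqdn),
                    Type:            types.RRTypeA,
                    TTL:             aws.Int64(ttl),
                    ResourceRecords: []types.ResourceRecord{{Value: aws.String(ip)}},
                },
            }},
        },
    })
    return err
}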

Architecture

┌──────────────────────────────────────────────────────────────┐
│                           nebulad                            │
│                                                              │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │  gRPC Server   │  │  HTTP Gateway  │  │  SQLite Store  │  │
│  │     :9090      │  │     :8080      │  │                │  │
│  │                │  │                │  │  /var/lib/     │  │
│  │  - Register    │  │  - /v1/chat    │  │  nebulad/      │  │
│  │  - Heartbeat   │  │  - /v1/models  │  │  nebula.db     │  │
│  │  - Deregister  │  │  - /health     │  │                │  │
│  └───────┬────────┘  └───────┬────────┘  └───────┬────────┘  │
│          │                   │                   │           │
│  ┌───────┴───────────────────┴───────────────────┴────────┐  │
│  │                        Services                        │  │
│  │                                                        │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │  │
│  │  │ Registry     │  │ Deployment   │  │ Gateway      │  │  │
│  │  │ Service      │  │ Service      │  │ Router       │  │  │
│  │  │              │  │              │  │              │  │  │
│  │  │ - Agent      │  │ - Deploy     │  │ - Model      │  │  │
│  │  │   tracking   │  │ - Start/Stop │  │   routing    │  │  │
│  │  │ - Offline    │  │ - Scheduling │  │ - Runtime    │  │  │
│  │  │   detection  │  │ - Logs/Events│  │   adapters   │  │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘  │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                 External Integrations                  │  │
│  │                                                        │  │
│  │  ┌──────────────────┐  ┌────────────────────────────┐  │  │
│  │  │   Caddy Client   │  │       Route53 Client       │  │  │
│  │  │                  │  │                            │  │  │
│  │  │ - SSL termination│  │ - DNS A record management  │  │  │
│  │  │ - Reverse proxy  │  │ - Subdomain creation       │  │  │
│  │  │ - Dynamic routes │  │ - (Optional)               │  │  │
│  │  └──────────────────┘  └────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

                               │ gRPC (Agent Communication)

┌──────────────────────────────────────────────────────────────┐
│                   GPU Nodes (nebula-agent)                   │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │    Node 1    │  │    Node 2    │  │    Node 3    │  ...   │
│  │   4x A100    │  │   2x H100    │  │   8x A10G    │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
└──────────────────────────────────────────────────────────────┘

Directory Structure

platform/
├── api/
│   └── grpc/
│       ├── platform.proto              # Platform service definition
│       ├── agent.proto                 # Agent service definition
│       ├── proto/                      # Generated protobuf files
│       └── server/
│           ├── platform_server.go      # gRPC server for agents
│           └── agent_server.go
├── gateway/
│   ├── server.go                       # HTTP Gateway server
│   ├── router.go                       # Model routing logic
│   ├── api/
│   │   └── types.go                    # OpenAI API types
│   └── adapter/
│       ├── adapter.go                  # Runtime adapter interface
│       ├── ollama.go                   # Ollama adapter
│       └── passthrough.go              # Passthrough adapter
├── service/
│   ├── registry/
│   │   └── service.go                  # Agent registration service
│   ├── deployment/
│   │   ├── service.go                  # Deployment management
│   │   └── scheduler.go                # Node scheduling
│   ├── caddy/
│   │   └── client.go                   # Caddy Admin API client
│   └── route53/
│       ├── client.go                   # AWS Route 53 client
│       └── short_uuid.go               # Short UUID generation
├── storage/
│   ├── store.go                        # Storage interface
│   └── sqlite/
│       └── store.go                    # SQLite implementation
├── domain/
│   ├── node.go                         # Node domain model
│   ├── deployment.go                   # Deployment domain model
│   └── errors.go                       # Domain errors
└── client/
    └── grpc_client.go                  # Client for agent communication

Deployment Scenarios

Development (Single Node)

# Terminal 1: Start nebulad
./nebulad --data-dir ./data --log-level debug

# Terminal 2: Start nebula-agent
./nebula-agent --control-plane localhost:9090

# Terminal 3: Deploy a model
./nebulactl deploy --model llama3.2:1b --runtime ollama

Production (with Caddy)

  1. Set up Caddy with a wildcard certificate:

{
  admin :2019
}

*.gateway.example.com {
  # Routes added dynamically by nebulad
}

  2. Configure DNS: Create a wildcard A record *.gateway.example.com pointing to the Caddy server

  3. Start nebulad:

./nebulad \
  --data-dir /var/lib/nebulad \
  --caddy-admin-url http://localhost:2019 \
  --gateway-domain gateway.example.com \
  --log-level info \
  --log-format json

Production (with Route53)

./nebulad \
  --data-dir /var/lib/nebulad \
  --caddy-admin-url http://localhost:2019 \
  --gateway-domain gateway.example.com \
  --route53-hosted-zone-id Z1234567890ABC \
  --route53-access-key-id AKIA... \
  --route53-secret-access-key ... \
  --route53-region us-east-1 \
  --gateway-ip 10.0.0.1 \
  --log-level info \
  --log-format json

Logging

The daemon uses Go's structured logging package (slog) throughout.

Log Levels

Level  Description
debug  Detailed debugging information
info   General operational information
warn   Warning messages
error  Error conditions

Example Output (text format)

level=INFO msg="Starting Nebula Daemon" version=0.1.0
level=INFO msg="SQLite store initialized" path=/var/lib/nebulad/nebula.db
level=INFO msg="Caddy client initialized" url=http://localhost:2019
level=INFO msg="Registry service started"
level=INFO msg="Deployment service started"
level=INFO msg="HTTP Gateway starting" port=8080 prefix=/v1
level=INFO msg="gRPC server starting" port=9090
level=INFO msg="Nebula Daemon is ready"
level=INFO msg="Agent registered" node_id=550e8400-e29b-41d4-a716-446655440000 name=gpu-node-1
level=INFO msg="Heartbeat received" node_id=550e8400-e29b-41d4-a716-446655440000

Example Output (JSON format)

{"time":"2024-12-12T10:00:00Z","level":"INFO","msg":"Starting Nebula Daemon","version":"0.1.0"}
{"time":"2024-12-12T10:00:00Z","level":"INFO","msg":"Agent registered","node_id":"550e8400-e29b-41d4-a716-446655440000","name":"gpu-node-1"}

Building

Build for Current Platform

go build -o nebulad ./cmd/nebulad

Cross-compile for Linux

# Linux AMD64
GOOS=linux GOARCH=amd64 go build -o nebulad-linux-amd64 ./cmd/nebulad

# Linux ARM64
GOOS=linux GOARCH=arm64 go build -o nebulad-linux-arm64 ./cmd/nebulad

Build with Version Information

VERSION=$(git describe --tags --always)
BUILD_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

go build -ldflags "-X main.Version=$VERSION -X main.BuildTime=$BUILD_TIME" \
  -o nebulad ./cmd/nebulad
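
For the -X flags to take effect, main must declare matching package-level variables; a typical pattern (assumed here, not verified against cmd/nebulad/main.go):

package main

// Overridden at build time via:
//   -ldflags "-X main.Version=... -X main.BuildTime=..."
var (
    Version   = "dev"
    BuildTime = "unknown"
)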

System Requirements

  • Go 1.24+
  • SQLite (embedded, no external dependency)
  • Network access to GPU nodes
  • (Optional) Caddy server for HTTPS gateway
  • (Optional) AWS credentials for Route53 DNS management