gRPC Connection Types
Nebula supports two methods of connecting to agents via gRPC:
- Direct - Direct connection to agent's gRPC server
- SSH - Connection via SSH tunnel
Architecture
In version 1, the Nebula agent operates fully autonomously and has no knowledge of the control plane.
Key Principles
- Agent is autonomous: Nebula Agent is a standalone gRPC server that:
  - Manages local resources (GPU, Docker)
  - Runs model runtime containers
  - Exports metrics
  - Does NOT connect to control plane
  - Does NOT initiate outgoing connections
- nebulactl is the initiator: Only nebulactl:
  - Initiates gRPC connections to agents
  - Stores connection settings in local SQLite DB
  - Manages deployments via agent gRPC API
- Two connection methods:
  - Direct - Direct connection to agent's gRPC port
  - SSH - Connection via SSH tunnel (for agents behind NAT/firewall)
Interaction Diagram
┌─────────────┐          gRPC          ┌──────────────┐
│             │ ─────────────────────> │              │
│  nebulactl  │    (direct or SSH)     │ Nebula Agent │
│             │ <───────────────────── │    (gRPC)    │
└─────────────┘                        └──────────────┘
       │                                      │
       │                                      │
       v                                      v
┌─────────────┐                        ┌──────────────┐
│   SQLite    │                        │    Docker    │
│ (~/.nebula) │                        │    + GPUs    │
└─────────────┘                        └──────────────┘
Important: The agent does NOT know about nebulactl, and its configuration contains no control plane references.
Connection Type: Direct
A direct connection is used when:
- The agent is directly reachable over the network
- The agent's gRPC port is open and accessible
It is typically used for local agents.
Usage Example
# Add local node with direct connection
nebulactl node add localhost --local --connection-type direct
# Or explicitly specify connection-type for remote host
nebulactl node add 192.168.1.100 --connection-type direct --user admin
Technical Details
- Connection: host:grpc_port
- Transport: Direct TCP/IP
- Requirements: gRPC port must be accessible
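The reachability requirement above can be checked before adding a node. The following is a minimal sketch in Go using only the standard library; probeGRPCPort and openLocalPort are hypothetical helpers, not part of Nebula, and the probe only confirms that a TCP connection can be opened (it does not speak gRPC).

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// probeGRPCPort reports whether a TCP connection to host:port can be opened
// within the timeout. It checks reachability only; it does not speak gRPC.
// (Hypothetical helper, not part of nebulactl.)
func probeGRPCPort(host string, port int, timeout time.Duration) error {
	addr := net.JoinHostPort(host, strconv.Itoa(port))
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("gRPC port not reachable at %s: %w", addr, err)
	}
	return conn.Close()
}

// openLocalPort starts a throwaway listener so the sketch can run without a
// real agent. It returns the chosen port and a function that closes it.
func openLocalPort() (int, func() error, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: the OS picks a free port
	if err != nil {
		return 0, nil, err
	}
	return ln.Addr().(*net.TCPAddr).Port, ln.Close, nil
}

func main() {
	port, closePort, err := openLocalPort()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("while listening:", probeGRPCPort("127.0.0.1", port, time.Second))
	closePort()
	fmt.Println("after close:", probeGRPCPort("127.0.0.1", port, time.Second))
}
```

If the probe fails for a remote host, that is a hint the node is better added with the SSH connection type.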
Connection Type: SSH
SSH tunneling is used when:
- Agent is behind NAT/firewall
- Direct access to gRPC port is not possible
- Only SSH access to server is available
Usage Example
# Add remote node via SSH tunnel (default)
nebulactl node add user@remote.server.com \
--user admin \
--port 22 \
--key-path ~/.ssh/id_rsa
# Explicitly specify connection-type
nebulactl node add 10.0.0.50 \
--connection-type ssh \
--user root \
--key-path ~/.ssh/nebula_key
Technical Details
- SSH client establishes connection to remote server
- Local listener is created on 127.0.0.1:<random-port>
- SSH tunnel forwards traffic to remote 127.0.0.1:grpc_port
- gRPC client connects to local listener
nebulactl -> SSH Client -> SSH Tunnel -> Remote Agent (127.0.0.1:9091)
    |                               ^
    v                               |
Local Listener (127.0.0.1:random) --+
    ^
    |
gRPC Client
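The local-listener pattern described above can be sketched with the Go standard library alone. This is an illustration under stated assumptions, not Nebula's actual tunnel code: startForwarder, forward, startEcho and demo are hypothetical names, the remote agent is replaced by a local echo server, and the SSH hop is replaced by plain net.Dial. With golang.org/x/crypto/ssh, the Dial method of an established *ssh.Client has the same shape as dialFunc and could be passed in its place.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// dialFunc opens a connection to the agent side of the tunnel.
// Assumption: a real tunnel would pass the SSH client's Dial method here.
type dialFunc func(network, addr string) (net.Conn, error)

// startForwarder listens on 127.0.0.1:<random-port> and pipes every accepted
// connection to remoteAddr via dial. A gRPC client would connect to the
// returned listener's address.
func startForwarder(dial dialFunc, remoteAddr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: the OS picks a free port
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			local, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			go forward(local, dial, remoteAddr)
		}
	}()
	return ln, nil
}

// forward copies bytes in both directions until either side finishes.
func forward(local net.Conn, dial dialFunc, remoteAddr string) {
	defer local.Close()
	remote, err := dial("tcp", remoteAddr)
	if err != nil {
		return
	}
	defer remote.Close()
	done := make(chan struct{}, 2)
	go func() { io.Copy(remote, local); done <- struct{}{} }()
	go func() { io.Copy(local, remote); done <- struct{}{} }()
	<-done // the deferred Close calls then stop the other direction
}

// startEcho is a stand-in for the remote agent: it echoes back what it reads.
func startEcho() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()
	return ln, nil
}

// demo sends "ping" through the forwarder and returns the reply.
func demo() (string, error) {
	agent, err := startEcho()
	if err != nil {
		return "", err
	}
	defer agent.Close()
	tunnel, err := startForwarder(net.Dial, agent.Addr().String())
	if err != nil {
		return "", err
	}
	defer tunnel.Close()
	conn, err := net.Dial("tcp", tunnel.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(2 * time.Second))
	if _, err := conn.Write([]byte("ping")); err != nil {
		return "", err
	}
	buf := make([]byte, 4)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reply via forwarder:", reply)
}
```

Keeping the dialer pluggable means the listener and copy logic can be exercised without an SSH server, while the real SSH dial is supplied only where the tunnel is constructed.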
Database
Connection settings are stored in ~/.nebula/nebula.db:
CREATE TABLE nodes (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    host TEXT NOT NULL,
    grpc_port INTEGER NOT NULL,
    connection_type TEXT NOT NULL DEFAULT 'direct', -- 'direct' or 'ssh'
    ssh_user TEXT,
    ssh_port INTEGER,
    ssh_key_path TEXT,
    ...
);
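As a rough illustration of how these columns might be represented and checked in Go, the sketch below defines a Node struct that mirrors the visible columns and a Validate method. The field names and the validation rules are assumptions for illustration (in particular, requiring ssh_user for SSH nodes is not stated by the schema); the real type in nebula/platform/storage may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Node mirrors the columns of the nodes table that are shown in the schema.
// Field names are illustrative; the actual storage type may differ.
type Node struct {
	ID             string
	Name           string
	Host           string
	GRPCPort       int
	ConnectionType string // "direct" or "ssh"
	SSHUser        string // empty when the column is NULL
	SSHPort        int    // zero when the column is NULL
	SSHKeyPath     string // empty when the column is NULL
}

// Validate applies basic consistency checks before a node is stored.
// Assumption: SSH nodes need at least a user; the schema itself allows NULL.
func (n Node) Validate() error {
	if n.Host == "" {
		return errors.New("host is required")
	}
	if n.GRPCPort < 1 || n.GRPCPort > 65535 {
		return fmt.Errorf("grpc_port %d out of range", n.GRPCPort)
	}
	switch n.ConnectionType {
	case "direct":
		return nil
	case "ssh":
		if n.SSHUser == "" {
			return errors.New("ssh_user is required for ssh connections")
		}
		return nil
	default:
		return fmt.Errorf("unknown connection_type %q", n.ConnectionType)
	}
}

func main() {
	n := Node{
		ID:             "example-id",
		Name:           "gpu-server-1",
		Host:           "10.0.0.50",
		GRPCPort:       9091,
		ConnectionType: "ssh",
		SSHUser:        "root",
		SSHPort:        22,
		SSHKeyPath:     "~/.ssh/nebula_key",
	}
	if err := n.Validate(); err != nil {
		fmt.Println("invalid node:", err)
		return
	}
	fmt.Println("node is valid")
}
```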
Code Usage
Creating Client
import (
	"context"
	"fmt"

	"nebula/platform/client"
	"nebula/platform/storage"
)

// store is an open storage handle (from nebula/platform/storage);
// ctx and nodeID are assumed to be in scope.

// Get node from DB
node, err := store.GetNode(ctx, nodeID)
if err != nil {
	return err
}

// Create client (automatically determines connection type)
agentClient, err := client.NewAgentClient(ctx, node)
if err != nil {
	return err
}
defer agentClient.Close()

// Use client
health, err := agentClient.GetHealth(ctx)
if err != nil {
	return err
}
fmt.Printf("health: %+v\n", health)
Supported Methods
- GetHealth() - Agent health status
- GetCapabilities() - Node resource information
- GetStats() - Current metrics
- PrepareModel() - Model preparation
- StartRuntime() - Start runtime container
- StopRuntime() - Stop runtime
Testing Connection
# Test connection to agent
nebulactl node test <node-id>
The command:
- Establishes connection (direct or SSH)
- Checks health endpoint
- Gets capabilities
- Outputs node information
Example Output
Testing connection to node: my-gpu-server
Connection Type: ssh
SSH: admin@192.168.1.100:22
Connecting to agent...
✅ Connection established
Testing health endpoint...
✅ Health Status: healthy
Uptime: 3600 seconds
Testing capabilities endpoint...
✅ Capabilities retrieved
CPU Cores: 32
Memory: 128.00 GB
OS: linux x86_64
Driver Version: 535.129.03
CUDA Version: 12.2
GPUs: 4
GPU 0: NVIDIA A100-SXM4-40GB (40.00 GB)
GPU 1: NVIDIA A100-SXM4-40GB (40.00 GB)
GPU 2: NVIDIA A100-SXM4-40GB (40.00 GB)
GPU 3: NVIDIA A100-SXM4-40GB (40.00 GB)
Supported Runtimes: [vllm tgi]
✅ All tests passed successfully!
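The check sequence performed by the test command (health first, then capabilities, stopping at the first failure) might be structured roughly as follows. This is a sketch only: agentAPI is a hypothetical, reduced view of the agent client with simplified signatures (no context parameter, plain return values), fakeAgent is a test double, and treating any status other than "healthy" as a failure is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// agentAPI is a hypothetical, simplified subset of the agent client.
// The real client methods take a context and return richer types.
type agentAPI interface {
	GetHealth() (status string, err error)
	GetCapabilities() (gpuCount int, err error)
}

// runNodeTest runs the checks in order and stops at the first failure.
// It returns the names of the steps that passed.
func runNodeTest(c agentAPI) ([]string, error) {
	var passed []string

	status, err := c.GetHealth()
	if err != nil {
		return passed, fmt.Errorf("health check: %w", err)
	}
	if status != "healthy" { // assumption: only "healthy" counts as a pass
		return passed, fmt.Errorf("agent reports status %q", status)
	}
	passed = append(passed, "health")

	if _, err := c.GetCapabilities(); err != nil {
		return passed, fmt.Errorf("capabilities: %w", err)
	}
	passed = append(passed, "capabilities")
	return passed, nil
}

// fakeAgent is a test double that lets the sequence run without an agent.
type fakeAgent struct {
	status   string
	gpus     int
	failCaps bool
}

func (f fakeAgent) GetHealth() (string, error) { return f.status, nil }

func (f fakeAgent) GetCapabilities() (int, error) {
	if f.failCaps {
		return 0, errors.New("capabilities unavailable")
	}
	return f.gpus, nil
}

func main() {
	steps, err := runNodeTest(fakeAgent{status: "healthy", gpus: 4})
	if err != nil {
		fmt.Println("test failed after", steps, ":", err)
		return
	}
	fmt.Println("passed steps:", steps)
}
```

Returning the list of passed steps alongside the error makes it possible to report which stage failed, similar to the per-step lines in the example output.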
CLI Commands
Add Node
# Direct connection (local)
nebulactl node add localhost --local
# Direct connection (remote with open gRPC port)
nebulactl node add 192.168.1.100 --connection-type direct
# SSH tunnel (default for remote nodes)
nebulactl node add 10.0.0.50 --user ubuntu --key-path ~/.ssh/key.pem
# SSH tunnel (explicit)
nebulactl node add remote.server.com \
--connection-type ssh \
--user admin \
--port 2222 \
--key-path ~/.ssh/custom_key
View Nodes
# List all nodes
nebulactl node list
# Node status
nebulactl node status <node-id>
# Test connection
nebulactl node test <node-id>
Security
SSH Authentication
- Public key authentication is used (recommended)
- Password authentication is supported (not recommended)
- TODO: Add proper host key verification
gRPC Security
- Current version: insecure credentials
- TODO: Add TLS support
Current Limitations
- Host Key Verification: Uses ssh.InsecureIgnoreHostKey()
- gRPC without TLS: Connection without encryption
- SSH Tunnel Performance: Each connection creates a new tunnel
Agent Configuration
Example agent configuration (/etc/nebula/agent.yaml):
agent:
  node_id: "550e8400-e29b-41d4-a716-446655440000"
  node_name: "gpu-server-1"
  cache_path: "/var/lib/nebula/models"
  version: "0.1.0"
  grpc_port: 9091
docker:
  socket: "/var/run/docker.sock"
  network: "nebula-network"
metrics:
  enabled: true
  port: 9100
  path: "/metrics"
Note: Agent config does NOT contain:
- control_plane section
- Endpoints for outgoing connections
- Information about nebulactl
The agent simply starts a gRPC server and waits for incoming connections.