Architecture
This guide explains KSail's architecture and design decisions at a level useful for advanced users. For contributor-level internals (package layout, internal APIs, source structure), see CONTRIBUTING.md.
Design Principles
KSail is built on several core principles:
- Single Binary Distribution: All Kubernetes tools are embedded as Go libraries, eliminating external dependencies except Docker
- No Vendor Lock-In: Uses native distribution configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
- Declarative Configuration: Everything as code in version-controlled files
- Provider/Provisioner Separation: Infrastructure management separated from distribution configuration
- Composability: Modular architecture with clear boundaries between components
High-Level Architecture
graph TB
CLI[CLI Layer]
APIs[API Types]
Services[Services]
Clients[Tool Clients]
CLI --> Services
CLI --> APIs
Services --> APIs
Services --> Clients
subgraph "Infrastructure"
Provider[Providers<br/>Docker, Hetzner, Omni]
end
subgraph "Distributions"
Provisioner["Provisioners<br/>Vanilla (Kind), K3s (K3d), Talos, VCluster, KWOK"]
end
subgraph "Components"
Installer[Installers<br/>CNI, CSI, Metrics, etc.]
end
Services --> Provider
Services --> Provisioner
Services --> Installer
Provider vs Provisioner Architecture
KSail separates infrastructure management (providers) from distribution configuration (provisioners). This separation allows the same distribution (e.g., Talos) to run on different infrastructure (Docker, Hetzner, Omni).
Providers
Providers manage the underlying infrastructure where Kubernetes nodes run:
| Provider | Description | Supported Distributions |
|---|---|---|
| Docker | Runs nodes as Docker containers | Vanilla, K3s, Talos, VCluster, KWOK |
| Hetzner | Runs nodes on Hetzner Cloud servers | Talos |
| Omni | Manages Talos clusters via Sidero Omni API | Talos |
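The support matrix in the table above can be expressed as a small lookup, which is handy for validating a provider/distribution pair before creating anything. This is a minimal sketch: the map and the `Supports` function are illustrative names, not KSail's internal API.

```go
package main

import "fmt"

// supportedDistributions mirrors the provider table above.
// The variable name and shape are assumptions made for this sketch.
var supportedDistributions = map[string][]string{
	"Docker":  {"Vanilla", "K3s", "Talos", "VCluster", "KWOK"},
	"Hetzner": {"Talos"},
	"Omni":    {"Talos"},
}

// Supports reports whether the provider can host the distribution.
// Unknown providers have no entry and therefore support nothing.
func Supports(provider, distribution string) bool {
	for _, d := range supportedDistributions[provider] {
		if d == distribution {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Supports("Docker", "K3s"))  // true
	fmt.Println(Supports("Hetzner", "K3s")) // false
	fmt.Println(Supports("Omni", "Talos"))  // true
}
```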
Provisioners
Provisioners configure and manage Kubernetes distributions on top of provider infrastructure:
| Distribution | Tool | Description |
|---|---|---|
| Vanilla | Kind | Standard upstream Kubernetes |
| K3s | K3d | Lightweight K3s |
| Talos | Talos | Immutable Talos Linux |
| VCluster | vCluster | Virtual clusters via vCluster |
Interaction Flow
sequenceDiagram
participant User
participant CLI
participant Provisioner
participant Provider
User->>CLI: ksail cluster create
CLI->>Provisioner: Create cluster
Provisioner->>Provider: Create infrastructure
Provider-->>Provisioner: Nodes ready
Provisioner->>Provisioner: Bootstrap Kubernetes
Provisioner-->>CLI: Cluster ready
CLI-->>User: Success
Example: Talos Distribution
Talos demonstrates the provider/provisioner separation. The same Talos provisioner generates machine configs and bootstraps Kubernetes regardless of which provider is used: Docker (local containers), Hetzner (cloud servers with CCM and CSI), or Omni (Sidero API). This enables a consistent Talos experience across all environments.
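One way to picture this separation in Go is a provisioner that depends only on a provider interface. The sketch below is hypothetical: `Provider`, `namedProvider`, and `TalosProvisioner` are invented for illustration and do not reflect KSail's actual types, and the "steps" are plain strings rather than real operations.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider is a hypothetical interface for infrastructure backends.
type Provider interface {
	Name() string
	CreateNodes(cluster string) ([]string, error)
}

// namedProvider is a stand-in that only fabricates node names.
type namedProvider struct{ name string }

func (p namedProvider) Name() string { return p.name }

func (p namedProvider) CreateNodes(cluster string) ([]string, error) {
	node := fmt.Sprintf("%s-%s-node-1", cluster, strings.ToLower(p.name))
	return []string{node}, nil
}

// TalosProvisioner is a hypothetical provisioner that knows nothing
// about the concrete provider behind the interface.
type TalosProvisioner struct{ provider Provider }

// Create follows the interaction flow: ask the provider for
// infrastructure, then configure nodes and bootstrap Kubernetes.
func (t TalosProvisioner) Create(cluster string) ([]string, error) {
	nodes, err := t.provider.CreateNodes(cluster)
	if err != nil {
		return nil, fmt.Errorf("create infrastructure on %s: %w", t.provider.Name(), err)
	}
	steps := []string{"infrastructure:" + t.provider.Name()}
	for _, n := range nodes {
		steps = append(steps, "machine-config:"+n)
	}
	return append(steps, "bootstrap-kubernetes"), nil
}

func main() {
	for _, name := range []string{"Docker", "Hetzner", "Omni"} {
		steps, err := TalosProvisioner{provider: namedProvider{name}}.Create("dev")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(steps)
	}
}
```

Only the first step and the node names differ between providers; the provisioner logic is identical for all three.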
Component Lifecycle
KSail manages cluster components (CNI, CSI, metrics-server, cert-manager, policy engines, GitOps engines) through a structured lifecycle:
Installation Phases
Components are installed in two phases to ensure dependencies are met:
Phase 1: Infrastructure Components
Installed immediately after CNI becomes ready:
- CSI: Storage drivers (local-path, Longhorn, Hetzner CSI)
- Metrics Server: Resource metrics API
- LoadBalancer: Cloud Provider KIND (Vanilla) or MetalLB (Talos)
- Cert Manager: TLS certificate management
- Policy Engine: Kyverno or Gatekeeper
Phase 2: GitOps Engines
Installed after a cluster stability check confirms the API server is fully ready:
- Flux: GitOps continuous delivery
- ArgoCD: GitOps continuous delivery
Before Phase 2, KSail always performs a Cluster Stability Check with three sequential steps:
- API server stability: the API server must pass a run of consecutive health checks within a 2-minute timeout. The required count depends on the distribution: 3 checks for Vanilla, K3s, and KWOK (which stabilize faster after webhook registration), and 5 for Talos and VCluster.
- DaemonSet readiness: verifies all kube-system DaemonSets are ready within a 3-minute timeout. This step runs after the API server stability step because it does not retry transient transport errors.
- In-cluster API connectivity (Cilium only): creates a short-lived busybox pod that tests TCP connectivity to the API server ClusterIP (port 443) from within the cluster, with a 2-minute timeout. Only performed for Cilium CNI, where eBPF dataplane programming may lag behind DaemonSet readiness. Skipped for the default (distribution-provided) CNI and Calico.
This check always runs before GitOps engines, even when no Phase 1 components are installed. It prevents race conditions where K3s/K3d clusters report creation success before the API server is fully ready to serve requests. On setups with Phase 1 infrastructure components, it also ensures API connectivity has recovered after those components register webhooks and CRDs.
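The API server stability step can be sketched as a polling loop that resets its streak on any failure. This is a simplified illustration, not KSail's implementation: `requiredSuccesses`, `waitForStable`, `errNotReady`, and the polling interval are all assumed names and values, and falling back to the stricter threshold for unknown distributions is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a placeholder error for a failed health check.
var errNotReady = errors.New("API server not ready")

// requiredSuccesses returns the consecutive-success threshold described above.
func requiredSuccesses(distribution string) int {
	switch distribution {
	case "Vanilla", "K3s", "KWOK":
		return 3
	case "Talos", "VCluster":
		return 5
	default:
		return 5 // assumption: unknown distributions use the stricter threshold
	}
}

// waitForStable polls check until it succeeds `required` times in a row
// or the timeout elapses. Any failure resets the streak to zero.
func waitForStable(check func() error, required int, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	streak := 0
	for {
		if err := check(); err != nil {
			streak = 0
		} else {
			streak++
			if streak >= required {
				return nil
			}
		}
		if time.Now().After(deadline) {
			return errors.New("API server did not stabilize before the timeout")
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Simulated health check that fails once, on the third call.
	flaky := func() error {
		calls++
		if calls == 3 {
			return errNotReady
		}
		return nil
	}
	err := waitForStable(flaky, requiredSuccesses("K3s"), 2*time.Minute, time.Millisecond)
	fmt.Println(err, calls) // <nil> 6
}
```

Because the failure on the third call resets the streak, the check needs three further successes, which is why six calls are made in total.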
Detection and Updates
The detector service identifies installed components by querying Helm release history and the Kubernetes API, with additional checks against the Docker daemon where needed. It determines the active distribution, provider, and cluster name from the current kubeconfig context, and distinguishes KSail-managed GitOps resources from unrelated ones so it does not interfere with external GitOps setups.
The diff service classifies update impact as in-place (no disruption), reboot-required (node reboot), or recreate-required (full cluster recreation).
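The three impact classes lend themselves to an ordered enumeration. The sketch below assumes that the overall impact of an update is the most disruptive impact among its individual changes; that aggregation rule, the `Impact` type, and the field names are assumptions for illustration rather than a description of the diff service's code.

```go
package main

import "fmt"

// Impact is ordered by severity so that a larger value is more disruptive.
type Impact int

const (
	InPlace Impact = iota
	RebootRequired
	RecreateRequired
)

// String returns the label used in the text above.
func (i Impact) String() string {
	switch i {
	case InPlace:
		return "in-place"
	case RebootRequired:
		return "reboot-required"
	case RecreateRequired:
		return "recreate-required"
	default:
		return "unknown"
	}
}

// Overall returns the most disruptive impact among the changes.
// With no changes it returns InPlace.
func Overall(changes map[string]Impact) Impact {
	worst := InPlace
	for _, impact := range changes {
		if impact > worst {
			worst = impact
		}
	}
	return worst
}

func main() {
	// Field names and their impacts are made up for this example.
	changes := map[string]Impact{
		"example.fieldA": InPlace,
		"example.fieldB": RebootRequired,
	}
	fmt.Println(Overall(changes)) // reboot-required
}
```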
Embedded Tools Approach
KSail embeds Kubernetes tools as Go libraries instead of shelling out to CLI tools. This yields a single binary with no external tool dependencies beyond Docker, tool versions locked via go.mod, direct API access (no output parsing), no process-spawning overhead, and structured error handling from Go APIs.
Embedded Tools
| Tool | Purpose |
|---|---|
| kubectl | Kubernetes API operations |
| helm | Chart operations |
| kind | Vanilla provisioner |
| k3d | K3s provisioner |
| vcluster | VCluster provisioner |
| flux | Flux GitOps |
| argocd | ArgoCD GitOps |
| k9s | Terminal UI |
| kubeconform | Validation |
| kustomize | Rendering |
External Dependencies
Only Docker is required externally (as the container runtime for local clusters). Cloud providers require credentials: HCLOUD_TOKEN for Hetzner, and a service account key (default env: OMNI_SERVICE_ACCOUNT_KEY, configurable via spec.provider.omni.serviceAccountKeyEnvVar) for Omni.
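A pre-flight credential check along these lines could look as follows. The environment variable names come from the text above; the functions `credentialEnvVar` and `checkCredential`, and the injected `lookup` parameter, are hypothetical and exist only to keep the sketch testable.

```go
package main

import (
	"fmt"
	"os"
)

const defaultOmniKeyEnv = "OMNI_SERVICE_ACCOUNT_KEY"

// credentialEnvVar returns the environment variable a provider's credential
// is read from. omniOverride models spec.provider.omni.serviceAccountKeyEnvVar;
// an empty value means "use the default". Providers that need no credential
// (such as Docker) yield an empty name.
func credentialEnvVar(provider, omniOverride string) string {
	switch provider {
	case "Hetzner":
		return "HCLOUD_TOKEN"
	case "Omni":
		if omniOverride != "" {
			return omniOverride
		}
		return defaultOmniKeyEnv
	default:
		return ""
	}
}

// checkCredential verifies that the required variable is set and non-empty.
// lookup is injected so the check can run without touching the real environment.
func checkCredential(provider, omniOverride string, lookup func(string) (string, bool)) error {
	name := credentialEnvVar(provider, omniOverride)
	if name == "" {
		return nil
	}
	if v, ok := lookup(name); !ok || v == "" {
		return fmt.Errorf("%s provider requires %s to be set", provider, name)
	}
	return nil
}

func main() {
	for _, p := range []string{"Docker", "Hetzner", "Omni"} {
		if err := checkCredential(p, "", os.LookupEnv); err != nil {
			fmt.Println("missing:", err)
			continue
		}
		fmt.Println(p, "ok")
	}
}
```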
State Persistence
Talos and VCluster can introspect running configuration (Talos via API, VCluster via Kubernetes resources), so KSail needs no local state for them.
Vanilla (Kind) and K3s (K3d) don't expose cluster config via API, so KSail persists their ClusterSpecs to ~/.ksail/clusters/<name>/spec.json. This enables ksail cluster update to compare desired vs current state.
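The persistence scheme can be illustrated with a short round trip: write a spec as JSON under the per-cluster directory, read it back, and compare it with the desired spec. The `ClusterSpec` fields, the function names, and the file permissions below are assumptions for the sketch; only the path layout is taken from the text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ClusterSpec is a placeholder; the real spec's fields are not described here.
type ClusterSpec struct {
	Distribution string `json:"distribution"`
	Provider     string `json:"provider"`
}

// SpecPath builds <home>/.ksail/clusters/<name>/spec.json.
func SpecPath(home, name string) string {
	return filepath.Join(home, ".ksail", "clusters", name, "spec.json")
}

// SaveSpec writes the spec as indented JSON, creating parent directories.
func SaveSpec(home, name string, spec ClusterSpec) error {
	path := SpecPath(home, name)
	if err := os.MkdirAll(filepath.Dir(path), 0o700); err != nil {
		return err
	}
	data, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// LoadSpec reads a previously saved spec.
func LoadSpec(home, name string) (ClusterSpec, error) {
	var spec ClusterSpec
	data, err := os.ReadFile(SpecPath(home, name))
	if err != nil {
		return spec, err
	}
	err = json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	// A temporary directory stands in for the user's home directory.
	home, err := os.MkdirTemp("", "ksail-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(home)

	saved := ClusterSpec{Distribution: "Vanilla", Provider: "Docker"}
	if err := SaveSpec(home, "dev", saved); err != nil {
		panic(err)
	}
	current, err := LoadSpec(home, "dev")
	if err != nil {
		panic(err)
	}
	desired := ClusterSpec{Distribution: "K3s", Provider: "Docker"}
	fmt.Println("changed:", current != desired) // changed: true
}
```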
AI Integration
KSail provides two AI interfaces built on top of the same CLI tool infrastructure:
- ksail chat: interactive AI assistant using the GitHub Copilot SDK; supports Agent and Plan modes. See AI Chat.
- ksail mcp: Model Context Protocol server for Claude and other AI assistants; tools are auto-generated from the CLI command tree and grouped into read/write pairs. See MCP.
Further Reading
- Configuration: Declarative configuration, sources, and precedence
- Concepts: High-level concepts and mental models
- Development Guide: Build commands, coding standards, testing patterns, and CI/CD
- Contributing: Dev setup and PR process
- CLI Flags: Complete CLI reference
- GitHub Repository: Source code and issues