Architecture
This guide explains KSail's architecture and design decisions at a level useful for advanced users. For contributor-level internals (package layout, internal APIs, source structure), see CONTRIBUTING.md.
Design Principles
KSail is built on several core principles:
- Single Binary Distribution — All Kubernetes tools are embedded as Go libraries, eliminating external dependencies except Docker
- No Vendor Lock-In — Uses native distribution configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
- Declarative Configuration — Everything as code in version-controlled files
- Provider/Provisioner Separation — Infrastructure management separated from distribution configuration
- Composability — Modular architecture with clear boundaries between components
High-Level Architecture
graph TB
CLI[CLI Layer]
APIs[API Types]
Services[Services]
Clients[Tool Clients]
CLI --> Services
CLI --> APIs
Services --> APIs
Services --> Clients
subgraph "Infrastructure"
Provider[Providers<br/>Docker, Hetzner, Omni]
end
subgraph "Distributions"
Provisioner["Provisioners<br/>Vanilla (Kind), K3s (K3d), Talos, VCluster, KWOK"]
end
subgraph "Components"
Installer[Installers<br/>CNI, CSI, Metrics, etc.]
end
Services --> Provider
Services --> Provisioner
Services --> Installer
Provider vs Provisioner Architecture
KSail separates infrastructure management (providers) from distribution configuration (provisioners). This separation allows the same distribution (e.g., Talos) to run on different infrastructure (Docker, Hetzner, Omni).
For supported providers and distributions, see Concepts and the Distribution × Provider Matrix.
Interaction Flow
sequenceDiagram
participant User
participant CLI
participant Provisioner
participant Provider
User->>CLI: ksail cluster create
CLI->>Provisioner: Create cluster
Provisioner->>Provider: Create infrastructure
Provider-->>Provisioner: Nodes ready
Provisioner->>Provisioner: Bootstrap Kubernetes
Provisioner-->>CLI: Cluster ready
CLI-->>User: Success
Example: Talos Distribution
Talos demonstrates the provider/provisioner separation. The same Talos provisioner generates machine configs and bootstraps Kubernetes regardless of which provider is used: Docker (local containers), Hetzner (cloud servers with CCM and CSI), or Omni (Sidero API). This enables a consistent Talos experience across all environments.
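The separation described above can be sketched roughly as follows. This is an illustrative Python model, not KSail's Go implementation: the class names, the `create_nodes` method, and the node identifiers are all hypothetical. It only shows that the provisioner's distribution-specific steps stay the same while the provider behind it changes.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Provider(Protocol):
    """Infrastructure side: creates the machines or containers that become nodes."""

    def create_nodes(self, cluster: str, count: int) -> list[str]: ...


class DockerProvider:
    """Hypothetical provider that would start local containers."""

    def create_nodes(self, cluster: str, count: int) -> list[str]:
        return [f"docker://{cluster}-node-{i}" for i in range(count)]


class HetznerProvider:
    """Hypothetical provider that would create cloud servers."""

    def create_nodes(self, cluster: str, count: int) -> list[str]:
        return [f"hetzner://{cluster}-node-{i}" for i in range(count)]


@dataclass
class TalosProvisioner:
    """Distribution side: the same steps run regardless of the provider."""

    provider: Provider
    log: list[str] = field(default_factory=list)

    def create(self, cluster: str, nodes: int = 1) -> list[str]:
        # Step 1: delegate infrastructure creation to the provider.
        created = self.provider.create_nodes(cluster, nodes)
        self.log.append("infrastructure ready")
        # Steps 2 and 3: distribution-specific work, identical for every provider.
        self.log.append("machine configs generated")
        self.log.append("kubernetes bootstrapped")
        return created
```

Swapping `DockerProvider()` for `HetznerProvider()` changes only the returned node identifiers; the provisioner's recorded steps are the same in both cases, which mirrors the order shown in the interaction flow.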
Component Lifecycle
KSail manages cluster components (CNI, CSI, metrics-server, cert-manager, policy engines, GitOps engines) through a structured lifecycle:
Installation Phases
Components are installed in two phases to ensure dependencies are met:
Phase 1: Infrastructure Components
Installed immediately after CNI becomes ready:
- CSI — Storage drivers (local-path, Longhorn, Hetzner CSI)
- Metrics Server — Resource metrics API
- LoadBalancer — Cloud Provider KIND (Vanilla) or MetalLB (Talos)
- Cert Manager — TLS certificate management
- Policy Engine — Kyverno or Gatekeeper
Phase 2: GitOps Engines
Installed after a cluster stability check confirms the API server is fully ready:
- Flux — GitOps continuous delivery
- ArgoCD — GitOps continuous delivery
Before Phase 2, KSail always performs a Cluster Stability Check — verifying API server health, DaemonSet readiness, and (for Cilium) in-cluster connectivity. This prevents race conditions where clusters report creation success before the API server is fully ready. The check always runs even when no Phase 1 components are installed.
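The ordering of the two phases can be illustrated with a small planning function. This is a sketch under assumptions: the component identifiers and the `plan_install` function are hypothetical, and the install order within a phase simply follows the lists above rather than any documented KSail behavior. The one rule taken directly from the text is that the stability check is always scheduled between the phases.

```python
# Hypothetical component identifiers, listed in the order used in this guide.
PHASE_1 = ["csi", "metrics-server", "load-balancer", "cert-manager", "policy-engine"]
PHASE_2 = ["flux", "argocd"]


def plan_install(enabled: set[str]) -> list[str]:
    """Return the ordered steps for the enabled components."""
    # Phase 1: infrastructure components (installed once the CNI is ready).
    steps = [f"install {name}" for name in PHASE_1 if name in enabled]
    # The stability check runs unconditionally, even if Phase 1 was empty.
    steps.append("stability-check")
    # Phase 2: GitOps engines, only after the check has passed.
    steps += [f"install {name}" for name in PHASE_2 if name in enabled]
    return steps
```

For example, `plan_install({"csi", "flux"})` places the check between the CSI and Flux steps, and `plan_install(set())` still contains the check on its own.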
Detection and Updates
KSail detects installed components by querying Helm release history and the Kubernetes API (and, for Docker-based providers, inspecting the Docker daemon where needed), and determines the active distribution, provider, and cluster name from the kubeconfig context. It distinguishes KSail-managed GitOps resources from unrelated ones to avoid interfering with external GitOps setups.
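As a rough illustration of context-based detection, the sketch below maps a kubeconfig context name to a distribution and cluster name. It relies only on the naming conventions of Kind (`kind-<name>`) and K3d (`k3d-<name>`); how KSail actually performs this lookup, and how it resolves the provider or the other distributions, is not covered here. The `Detected` type and `detect` function are hypothetical.

```python
from typing import NamedTuple, Optional


class Detected(NamedTuple):
    distribution: str
    cluster: str


# Context prefixes written by the upstream tools; other distributions are omitted.
CONTEXT_PREFIXES = {"kind-": "Vanilla", "k3d-": "K3s"}


def detect(context: str) -> Optional[Detected]:
    """Infer distribution and cluster name from a kubeconfig context name."""
    for prefix, distribution in CONTEXT_PREFIXES.items():
        # Require a non-empty cluster name after the prefix.
        if context.startswith(prefix) and len(context) > len(prefix):
            return Detected(distribution, context[len(prefix):])
    return None  # Unknown context: leave it alone.
```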
The diff service classifies update impact as in-place (no disruption), reboot-required (node reboot), or recreate-required (full cluster recreation).
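One way to picture this classification is to rank the three impact levels and report the most disruptive one among the changed fields. The sketch below assumes exactly that; the field names, their assigned impacts, and the conservative default for unknown fields are hypothetical and are not taken from KSail's diff service.

```python
from enum import IntEnum


class Impact(IntEnum):
    IN_PLACE = 0           # no disruption
    REBOOT_REQUIRED = 1    # node reboot
    RECREATE_REQUIRED = 2  # full cluster recreation


# Hypothetical mapping from spec field to the impact of changing it.
FIELD_IMPACT = {
    "gitops_engine": Impact.IN_PLACE,
    "kernel_args": Impact.REBOOT_REQUIRED,
    "distribution": Impact.RECREATE_REQUIRED,
}


def classify(current: dict, desired: dict) -> Impact:
    """Return the most disruptive impact among all changed fields."""
    changed = [
        key
        for key in current.keys() | desired.keys()
        if current.get(key) != desired.get(key)
    ]
    # Assumption: fields without a known impact are treated as recreate-required.
    impacts = (FIELD_IMPACT.get(key, Impact.RECREATE_REQUIRED) for key in changed)
    # With no changes at all, nothing is disrupted.
    return max(impacts, default=Impact.IN_PLACE)
```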
Embedded Tools Approach
KSail embeds Kubernetes tools as Go libraries instead of shelling out to CLI tools. This yields a single binary with no external CLI dependencies, tool versions locked via go.mod, direct API access (no output parsing), no process-spawning overhead, and structured error handling from Go APIs.
Embedded Tools
| Tool | Purpose |
|------|---------|
| kubectl | Kubernetes API operations |
| helm | Chart operations |
| kind | Vanilla provisioner |
| k3d | K3s provisioner |
| vcluster | VCluster provisioner |
| flux | Flux GitOps |
| argocd | ArgoCD GitOps |
| k9s | Terminal UI |
| kubeconform | Validation |
| kustomize | Rendering |
External Dependencies
Only Docker is required externally (as the container runtime for local clusters). Cloud providers require credentials: HCLOUD_TOKEN for Hetzner, and a service account key (default env: OMNI_SERVICE_ACCOUNT_KEY, configurable via spec.provider.omni.serviceAccountKeyEnvVar) for Omni.
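The credential rules above could be checked before cluster creation along these lines. This is a minimal sketch: the function names are hypothetical, and `omni_key_env` stands in for the value of spec.provider.omni.serviceAccountKeyEnvVar. Only the environment variable names come from this guide.

```python
import os
from typing import Mapping, Optional

DEFAULT_OMNI_KEY_ENV = "OMNI_SERVICE_ACCOUNT_KEY"


def required_env_var(provider: str, omni_key_env: Optional[str] = None) -> Optional[str]:
    """Return the credential variable a provider needs, or None if it needs none."""
    name = provider.lower()
    if name == "hetzner":
        return "HCLOUD_TOKEN"
    if name == "omni":
        # A configured variable name overrides the default.
        return omni_key_env or DEFAULT_OMNI_KEY_ENV
    return None  # Docker requires no credential variable.


def missing_credential(
    provider: str,
    env: Mapping[str, str] = os.environ,
    omni_key_env: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the unset credential variable, or None if all is well."""
    var = required_env_var(provider, omni_key_env)
    if var is None or env.get(var):
        return None
    return var
```

Passing `env` explicitly keeps the check easy to exercise without touching the real process environment.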
State Persistence
Talos and VCluster can introspect running configuration (Talos via API, VCluster via Kubernetes resources), so KSail needs no local state for them.
Vanilla (Kind) and K3s (K3d) don't expose cluster config via API, so KSail persists their ClusterSpecs to ~/.ksail/clusters/<name>/spec.json. This enables ksail cluster update to compare desired vs current state.
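A persisted spec of this kind might be written and read back as sketched below. The path layout follows the one given above; the helper names, the `root` override (added so the example can run against a temporary directory), and the shape of the spec dictionary are assumptions, since the actual ClusterSpec schema is not described here.

```python
import json
from pathlib import Path
from typing import Optional


def spec_path(cluster: str, root: Optional[Path] = None) -> Path:
    """Location of the persisted spec: <root>/clusters/<name>/spec.json."""
    base = root if root is not None else Path.home() / ".ksail"
    return base / "clusters" / cluster / "spec.json"


def save_spec(cluster: str, spec: dict, root: Optional[Path] = None) -> Path:
    """Write the desired spec so a later update can compare against it."""
    path = spec_path(cluster, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, indent=2, sort_keys=True))
    return path


def load_spec(cluster: str, root: Optional[Path] = None) -> Optional[dict]:
    """Read the last persisted spec, or None if nothing was saved."""
    path = spec_path(cluster, root)
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

An update command could then call `load_spec` for the current state and compare it with the desired spec from the configuration file.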
AI Integration
KSail provides two AI interfaces built on top of the same CLI tool infrastructure:
- ksail chat — interactive AI assistant using the GitHub Copilot SDK; supports Agent and Plan modes. See AI Chat.
- ksail mcp — Model Context Protocol server for Claude and other AI assistants; tools are auto-generated from the CLI command tree and grouped into read/write pairs. See MCP.
Further Reading
- Configuration — Declarative configuration, sources, and precedence
- Concepts — High-level concepts and mental models
- Development Guide — Build commands, coding standards, testing patterns, and CI/CD
- Contributing — Dev setup and PR process
- CLI Flags — Complete CLI reference
- GitHub Repository — Source code and issues