Architecture
This guide explains KSail’s internal architecture, design decisions, and how the various components work together. This is essential reading for contributors and helpful for advanced users who want to understand the tool’s inner workings.
Design Principles
KSail is built on several core principles:
- Single Binary Distribution — All Kubernetes tools are embedded as Go libraries, eliminating external dependencies except Docker
- No Vendor Lock-In — Uses native distribution configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
- Declarative Configuration — Everything as code in version-controlled files
- Provider/Provisioner Separation — Infrastructure management separated from distribution configuration
- Composability — Modular architecture with clear boundaries between components
High-Level Architecture
graph TB
CLI[CLI Layer<br/>pkg/cli]
APIs[API Types<br/>pkg/apis]
Services[Services<br/>pkg/svc]
Clients[Tool Clients<br/>pkg/client]
CLI --> Services
CLI --> APIs
Services --> APIs
Services --> Clients
subgraph "Infrastructure"
Provider[Providers<br/>Docker, Hetzner, Omni]
end
subgraph "Distributions"
Provisioner["Provisioners<br/>Vanilla (Kind), K3s (K3d), Talos, VCluster"]
end
subgraph "Components"
Installer[Installers<br/>CNI, CSI, Metrics, etc.]
end
Services --> Provider
Services --> Provisioner
Services --> Installer
Package Structure
KSail uses a flat package structure in pkg/ to improve maintainability and reduce import complexity:
Core Packages
pkg/apis/
API types, schemas, and enums that define KSail’s configuration model.
- cluster/v1alpha1/ — ClusterSpec and ClusterConfig types; each enum type lives in its own domain-specific file (e.g., distribution.go, cni.go, csi.go, loadbalancer.go, gitopsengine.go); the EnumValuer interface is in enum.go
- Top-level schemas/ directory — Generated JSON schema for ksail.yaml (via go generate ./schemas/...) used for validation and editor autocomplete
pkg/cli/
CLI wiring, commands, and terminal UI components (cmd/ for Cobra commands, ui/ for TUI, lifecycle/ for cluster orchestration, setup/ for component installation).
pkg/client/
Embedded tool clients that wrap Kubernetes tools as Go libraries.
- kubectl/ — Kubernetes API client operations
- helm/ — Helm chart operations
- flux/ — Flux GitOps reconciliation
- argocd/ — ArgoCD GitOps reconciliation
- docker/ — Docker daemon operations
- k9s/ — K9s terminal UI integration
- kubeconform/ — Kubernetes manifest validation
- kustomize/ — Kustomize rendering
- oci/ — OCI registry operations
- netretry/ — Network retry utilities
- reconciler/ — Common base for GitOps clients
Note: Distribution tools like Kind, K3d, and VCluster are used directly via their SDKs in provisioners, not wrapped in pkg/client/.
pkg/svc/
Services implementing core business logic.
- provider/ — Infrastructure providers (Docker, Hetzner, Omni)
- provisioner/cluster/ — Distribution provisioners (Vanilla (Kind), K3s (K3d), Talos, VCluster)
- provisioner/registry/ — Registry provisioner for mirror registries
- installer/ — Component installers (CNI, CSI, LoadBalancer, policy engines, cert-manager, metrics-server)
- chat/ — AI chat integration using GitHub Copilot SDK
- detector/ — Detects installed components by querying Helm and Kubernetes API
- diff/ — Computes ClusterSpec diffs and classifies update impact
- image/ — Container image export/import services
- mcp/ — Model Context Protocol server for AI assistants
- registryresolver/ — OCI registry detection and resolution
- state/ — Cluster state persistence for distributions without introspection
Utility Packages
- pkg/di/ — Dependency injection container
- pkg/envvar/ — Environment variable utilities
- pkg/fsutil/ — Filesystem utilities (includes configmanager)
- pkg/k8s/ — Kubernetes helpers and templates
- pkg/notify/ — CLI notifications and progress display
- pkg/runner/ — Cobra command execution helpers
- pkg/timer/ — Command timing and performance tracking
- pkg/toolgen/ — AI tool generation for assistants
Internal Packages
- internal/buildmeta/ — Build-time version metadata (Version, Commit, Date) injected via ldflags
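To illustrate how ldflags injection works in Go generally, here is a minimal sketch. It is not KSail's buildmeta package: the variables live in package main so the example runs on its own, and the default values ("dev", "none", "unknown") are assumptions.

```go
package main

import "fmt"

// Package-level string variables can be overwritten at link time with
// -ldflags "-X <import path>.<name>=<value>". The defaults below are
// illustrative assumptions, not KSail's actual defaults.
var (
	Version = "dev"
	Commit  = "none"
	Date    = "unknown"
)

// versionString formats the build metadata for display.
func versionString() string {
	return fmt.Sprintf("%s (commit %s, built %s)", Version, Commit, Date)
}

func main() {
	fmt.Println(versionString())
}
```

Building this sketch with `go build -ldflags="-X main.Version=v1.2.3"` would replace the default Version without any source change. The same mechanism applies when the variables live in a package such as internal/buildmeta.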
Provider vs Provisioner Architecture
KSail separates infrastructure management (providers) from distribution configuration (provisioners). This separation allows the same distribution (e.g., Talos) to run on different infrastructure (Docker, Hetzner, Omni).
Providers
Providers manage the underlying infrastructure where Kubernetes nodes run:
| Provider | Description | Supported Distributions |
|---|---|---|
| Docker | Runs nodes as Docker containers | Vanilla, K3s, Talos, VCluster |
| Hetzner | Runs nodes on Hetzner Cloud servers | Talos |
| Omni | Manages Talos clusters via Sidero Omni API | Talos |
Responsibilities:
- Create/delete infrastructure resources (containers or cloud servers)
- Configure networking
- Start/stop nodes
- Provide node access for provisioners
Location: pkg/svc/provider/
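The compatibility table above can be expressed as a simple lookup. The following is a minimal sketch; the `supported` map and `validateCombination` function are hypothetical names for illustration and are not taken from KSail's source.

```go
package main

import (
	"fmt"
	"strings"
)

// supported mirrors the provider table: each provider lists the
// distributions it can host. Hypothetical structure for illustration.
var supported = map[string][]string{
	"Docker":  {"Vanilla", "K3s", "Talos", "VCluster"},
	"Hetzner": {"Talos"},
	"Omni":    {"Talos"},
}

// validateCombination returns an error when the provider is unknown or
// does not support the requested distribution.
func validateCombination(provider, distribution string) error {
	dists, ok := supported[provider]
	if !ok {
		return fmt.Errorf("unknown provider %q", provider)
	}
	for _, d := range dists {
		if d == distribution {
			return nil
		}
	}
	return fmt.Errorf("provider %q does not support distribution %q (supported: %s)",
		provider, distribution, strings.Join(dists, ", "))
}

func main() {
	fmt.Println(validateCombination("Docker", "K3s"))
	fmt.Println(validateCombination("Hetzner", "K3s"))
}
```

Rejecting an unsupported pairing before any infrastructure is created keeps the failure cheap and the error message specific.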
Provisioners
Provisioners configure and manage Kubernetes distributions on top of provider infrastructure:
| Distribution | Tool | Provisioner | Description |
|---|---|---|---|
| Vanilla | Kind | KindClusterProvisioner | Standard upstream Kubernetes via Kind SDK |
| K3s | K3d | K3dClusterProvisioner | Lightweight K3s via K3d Cobra/SDK |
| Talos | Talos | TalosProvisioner | Immutable Talos Linux via Talos SDK |
| VCluster | Vind | VClusterProvisioner | Virtual clusters via vCluster Go SDK (Vind driver) |
Responsibilities:
- Generate distribution-specific configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
- Bootstrap Kubernetes control plane
- Configure cluster components
- Manage cluster lifecycle
- Support cluster updates and upgrades
Location: pkg/svc/provisioner/cluster/
Interaction Flow
sequenceDiagram
participant User
participant CLI
participant Provisioner
participant Provider
User->>CLI: ksail cluster create
CLI->>Provisioner: Create cluster
Provisioner->>Provider: Create infrastructure
Provider-->>Provisioner: Nodes ready
Provisioner->>Provisioner: Bootstrap Kubernetes
Provisioner-->>CLI: Cluster ready
CLI-->>User: Success
Example: Talos Distribution
Talos demonstrates the provider/provisioner separation. The same Talos provisioner (pkg/svc/provisioner/cluster/talos/) generates machine configs and bootstraps Kubernetes regardless of which provider is used: Docker (local containers), Hetzner (cloud servers with CCM and CSI), or Omni (Sidero API). This enables a consistent Talos experience across all environments.
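The separation can be sketched in Go as one provisioner type that depends only on a provider interface. Every name here (`Provider`, `fakeProvider`, `Provisioner`, `createOn`) is hypothetical; KSail's real interfaces may look quite different.

```go
package main

import (
	"context"
	"fmt"
)

// Provider is a hypothetical interface for infrastructure backends.
type Provider interface {
	Name() string
	CreateNodes(ctx context.Context, cluster string, count int) ([]string, error)
}

// fakeProvider stands in for Docker, Hetzner, or Omni in this sketch.
type fakeProvider struct{ name string }

func (p fakeProvider) Name() string { return p.name }

func (p fakeProvider) CreateNodes(_ context.Context, cluster string, count int) ([]string, error) {
	nodes := make([]string, 0, count)
	for i := 1; i <= count; i++ {
		nodes = append(nodes, fmt.Sprintf("%s-%s-node-%d", cluster, p.name, i))
	}
	return nodes, nil
}

// Provisioner bootstraps a distribution on whatever nodes the provider returns.
type Provisioner struct {
	Distribution string
	Provider     Provider
}

// Create asks the provider for nodes, then reports the cluster as ready.
// A real provisioner would generate configs and bootstrap Kubernetes here.
func (p Provisioner) Create(ctx context.Context, cluster string, nodes int) (string, error) {
	created, err := p.Provider.CreateNodes(ctx, cluster, nodes)
	if err != nil {
		return "", fmt.Errorf("create infrastructure: %w", err)
	}
	return fmt.Sprintf("%s cluster %q ready on %s with %d node(s)",
		p.Distribution, cluster, p.Provider.Name(), len(created)), nil
}

// createOn runs the same Talos provisioner against the named provider.
func createOn(providerName string) (string, error) {
	p := Provisioner{Distribution: "Talos", Provider: fakeProvider{name: providerName}}
	return p.Create(context.Background(), "dev", 2)
}

func main() {
	for _, name := range []string{"docker", "hetzner"} {
		msg, err := createOn(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

Because the provisioner only sees the interface, swapping the provider changes where the nodes run without touching the bootstrap logic.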
Component Lifecycle
KSail manages cluster components (CNI, CSI, metrics-server, cert-manager, policy engines, GitOps engines) through a structured lifecycle:
Installation Phases
Components are installed in two phases to ensure dependencies are met:
Phase 1: Infrastructure Components
Installed immediately after CNI becomes ready:
- CSI — Storage drivers (local-path, Longhorn, Hetzner CSI)
- Metrics Server — Resource metrics API
- LoadBalancer — Cloud Provider KIND (Vanilla) or MetalLB (Talos)
- Cert Manager — TLS certificate management
- Policy Engine — Kyverno or Gatekeeper
Phase 2: GitOps Engines
Installed after Phase 1 completes and API server is stable:
- Flux — GitOps continuous delivery
- ArgoCD — GitOps continuous delivery
Between phases, KSail performs an API Server Stability Check (waitForAPIServerStability) that requires 3 consecutive successful health checks within a 2-minute timeout. This prevents GitOps engines from crashing with dial tcp 10.96.0.1:443: i/o timeout errors caused by infrastructure components temporarily destabilizing API server connectivity while registering webhooks and CRDs.
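The "N consecutive successes within a deadline" pattern can be sketched as follows. This is not the actual waitForAPIServerStability implementation: the function signature, the polling interval, and the `flakyCheck` and `simulate` helpers are assumptions. The demo also uses a 2-second deadline rather than 2 minutes so that it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForStability polls check until it succeeds `required` times in a row
// or ctx is done. Any failure resets the streak to zero.
func waitForStability(ctx context.Context, check func(context.Context) error, required int, interval time.Duration) error {
	streak := 0
	for {
		if err := check(ctx); err != nil {
			streak = 0
		} else {
			streak++
			if streak >= required {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("API server not stable: %w", ctx.Err())
		case <-time.After(interval):
		}
	}
}

// flakyCheck returns a probe that fails on the given 1-based attempt
// numbers and counts how often it was called.
func flakyCheck(calls *int, failOn ...int) func(context.Context) error {
	return func(context.Context) error {
		*calls++
		for _, n := range failOn {
			if *calls == n {
				return errors.New("i/o timeout")
			}
		}
		return nil
	}
}

// simulate runs the stability check against a flaky probe and reports
// how many attempts were needed.
func simulate(failOn ...int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	calls := 0
	err := waitForStability(ctx, flakyCheck(&calls, failOn...), 3, time.Millisecond)
	return calls, err
}

func main() {
	calls, err := simulate(2) // the second attempt fails and resets the streak
	fmt.Println(calls, err)
}
```

Requiring consecutive successes, rather than a single one, is what filters out the brief windows in which the API server answers between webhook or CRD registrations.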
Detection and Updates
The detector service (pkg/svc/detector/) identifies installed components by:
- Querying Helm release history for Helm-based components
- Querying Kubernetes API for components installed via kubectl
This detection enables:
- Accurate baselines — Know what’s currently installed before updates
- Intelligent updates — Only modify what changed
- Verification — Confirm components are running as expected
The diff service (pkg/svc/diff/) computes configuration differences and classifies update impact:
- In-place — Can update without disruption (e.g., CNI config changes)
- Reboot-required — Requires node reboot (e.g., Talos kernel changes)
- Recreate-required — Must destroy and recreate cluster (e.g., distribution change)
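One way to model this classification is an ordered enum where the most disruptive changed field decides the overall impact. The sketch below is hypothetical: the field names, the `fieldImpact` table, and the choice to treat unknown fields as recreate-required are assumptions, not KSail's diff logic.

```go
package main

import "fmt"

// Impact orders update categories from least to most disruptive.
type Impact int

const (
	InPlace Impact = iota
	RebootRequired
	RecreateRequired
)

func (i Impact) String() string {
	switch i {
	case InPlace:
		return "in-place"
	case RebootRequired:
		return "reboot-required"
	case RecreateRequired:
		return "recreate-required"
	default:
		return "unknown"
	}
}

// fieldImpact maps changed fields to their impact. The field names are
// illustrative assumptions.
var fieldImpact = map[string]Impact{
	"cni":          InPlace,
	"talos.kernel": RebootRequired,
	"distribution": RecreateRequired,
}

// classify returns the most disruptive impact among the changed fields.
// Unknown fields are treated conservatively as recreate-required.
func classify(changed []string) Impact {
	worst := InPlace
	for _, field := range changed {
		impact, ok := fieldImpact[field]
		if !ok {
			impact = RecreateRequired
		}
		if impact > worst {
			worst = impact
		}
	}
	return worst
}

func main() {
	fmt.Println(classify([]string{"cni", "talos.kernel"}))
}
```

Taking the maximum means a single recreate-required change dominates any number of in-place changes in the same update.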
Embedded Tools Approach
KSail embeds Kubernetes tools as Go libraries instead of shelling out to CLI tools:
Benefits
- Single Binary — No external dependencies except Docker
- Version Control — Tool versions locked in go.mod
- API Access — Direct access to tool APIs, not just CLI output parsing
- Performance — No process spawning overhead
- Error Handling — Structured errors from Go APIs vs parsing stderr
Embedded Tools
| Tool | Purpose | Location |
|---|---|---|
| kubectl | Kubernetes API operations | pkg/client/kubectl/ |
| helm | Chart operations | pkg/client/helm/ |
| kind | Vanilla provisioner | pkg/svc/provisioner/cluster/kind/ (uses Kind SDK directly) |
| k3d | K3s provisioner | pkg/svc/provisioner/cluster/k3d/ (uses K3d Cobra/SDK directly) |
| vcluster | VCluster provisioner | pkg/svc/provisioner/cluster/vcluster/ (uses vCluster Go SDK directly) |
| flux | Flux GitOps | pkg/client/flux/ |
| argocd | ArgoCD GitOps | pkg/client/argocd/ |
| k9s | Terminal UI | pkg/client/k9s/ |
| kubeconform | Validation | pkg/client/kubeconform/ |
| kustomize | Rendering | pkg/client/kustomize/ |
External Dependencies
Only Docker is required as an external dependency:
- Docker — Container runtime for local clusters (Docker provider)
Cloud providers (Hetzner, Omni) require credentials:
- Hetzner — HCLOUD_TOKEN environment variable
- Omni — Service account key (default: OMNI_SERVICE_ACCOUNT_KEY, configurable via spec.cluster.omni.serviceAccountKeyEnvVar)
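The "default variable name, overridable in config" behaviour for the Omni key might be resolved roughly as follows. This is a sketch under assumptions: `resolveOmniKey`, `ErrMissingCredential`, and the injected `lookup` function are hypothetical and not part of KSail's API.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// defaultOmniKeyEnvVar is the default variable name from the documentation.
const defaultOmniKeyEnvVar = "OMNI_SERVICE_ACCOUNT_KEY"

// ErrMissingCredential is a static sentinel callers can match with errors.Is.
var ErrMissingCredential = errors.New("credential environment variable is not set")

// resolveOmniKey reads the key from the configured variable name, falling
// back to the default name when none is configured. The lookup function is
// injected so the logic can be tested without touching the real environment.
func resolveOmniKey(configuredEnvVar string, lookup func(string) (string, bool)) (string, error) {
	name := configuredEnvVar
	if name == "" {
		name = defaultOmniKeyEnvVar
	}
	value, ok := lookup(name)
	if !ok || value == "" {
		return "", fmt.Errorf("%w: %s", ErrMissingCredential, name)
	}
	return value, nil
}

func main() {
	if _, err := resolveOmniKey("", os.LookupEnv); err != nil {
		fmt.Println("Omni credentials unavailable:", err)
		return
	}
	fmt.Println("Omni credentials found")
}
```

Naming the missing variable in the error tells the user exactly which variable to export, which matters once the name is configurable.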
State Persistence
Different distributions handle state differently:
Introspectable Distributions
Talos and VCluster can introspect running configuration:
- Talos can query machine configs via API
- VCluster stores config in Kubernetes resources
KSail can detect current configuration without local state files.
Non-Introspectable Distributions
Vanilla (Kind) and K3s (K3d) cannot introspect running configuration:
- Kind and K3d don’t expose cluster config via API
- Cannot reliably detect what was used to create cluster
KSail persists cluster specs to ~/.ksail/clusters/<name>/spec.json for these distributions.
State Service (pkg/svc/state/):
- Save — Write ClusterSpec to JSON after cluster creation
- Load — Read ClusterSpec for update operations
- Delete — Clean up state file on cluster deletion
This enables the ksail cluster update command to compare desired vs current state.
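The Save/Load/Delete operations can be sketched as a small JSON file store. The `Store` type, the trimmed `ClusterSpec` struct, the sample field values, and the `roundTrip` helper are assumptions for illustration. The demo writes to a temporary directory rather than ~/.ksail.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ClusterSpec is a trimmed-down stand-in for KSail's real type.
type ClusterSpec struct {
	Distribution string `json:"distribution"`
	CNI          string `json:"cni"`
}

// Store persists specs under <Root>/<name>/spec.json.
type Store struct{ Root string }

func (s Store) path(name string) string {
	return filepath.Join(s.Root, name, "spec.json")
}

// Save writes the spec as indented JSON, creating the directory if needed.
func (s Store) Save(name string, spec ClusterSpec) error {
	if err := os.MkdirAll(filepath.Dir(s.path(name)), 0o700); err != nil {
		return fmt.Errorf("create state dir: %w", err)
	}
	data, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		return fmt.Errorf("encode spec: %w", err)
	}
	return os.WriteFile(s.path(name), data, 0o600)
}

// Load reads the spec back for comparison during an update.
func (s Store) Load(name string) (ClusterSpec, error) {
	var spec ClusterSpec
	data, err := os.ReadFile(s.path(name))
	if err != nil {
		return spec, fmt.Errorf("read state: %w", err)
	}
	if err := json.Unmarshal(data, &spec); err != nil {
		return spec, fmt.Errorf("decode spec: %w", err)
	}
	return spec, nil
}

// Delete removes the cluster's state directory.
func (s Store) Delete(name string) error {
	return os.RemoveAll(filepath.Dir(s.path(name)))
}

// roundTrip saves, loads, and deletes a spec in a temporary directory and
// reports the loaded spec and whether it was gone after Delete.
func roundTrip() (ClusterSpec, bool, error) {
	root, err := os.MkdirTemp("", "ksail-state-")
	if err != nil {
		return ClusterSpec{}, false, err
	}
	defer os.RemoveAll(root)

	store := Store{Root: root}
	want := ClusterSpec{Distribution: "Vanilla", CNI: "Cilium"} // sample values
	if err := store.Save("dev", want); err != nil {
		return ClusterSpec{}, false, err
	}
	loaded, err := store.Load("dev")
	if err != nil {
		return ClusterSpec{}, false, err
	}
	if err := store.Delete("dev"); err != nil {
		return ClusterSpec{}, false, err
	}
	_, loadErr := store.Load("dev")
	return loaded, loadErr != nil, nil
}

func main() {
	fmt.Println(roundTrip())
}
```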
AI Integration
KSail provides two AI interfaces:
Chat Assistant
Command: ksail chat — Implementation: pkg/svc/chat/
Uses GitHub Copilot SDK for interactive cluster configuration and troubleshooting. Supports two modes: Agent (</>) with full tool execution and Plan (≡) for describing steps without executing. Authenticated via KSAIL_COPILOT_TOKEN or COPILOT_TOKEN. See AI Chat for full documentation.
MCP Server
Command: ksail mcp — Implementation: pkg/svc/mcp/
Exposes KSail as a Model Context Protocol server for Claude and other AI assistants. Provides tools for cluster management, workload deployment, and configuration. See MCP for setup instructions.
Testing Strategy
KSail uses multiple testing approaches:
- Unit tests — go test ./...; uses testify/mock, export_test.go for unexported symbols, t.Parallel(), and static error sentinels
- Integration tests — Test real cluster operations (Kind/K3d/VCluster clusters with real workloads)
- System tests (CI) — System tests run on Linux (ubuntu-latest, amd64) via .github/workflows/ci.yaml
- Benchmarks — go test -bench=. -benchmem ./... for performance regression tracking
Agentic Workflows
Automated AI-powered workflows run on schedules for continuous improvement: daily-code-quality, daily-plan, daily-builder, daily-workflow-maintenance, and daily-docs. These are Markdown-based agentic workflow definitions (not GitHub Actions YAML workflows) stored in .github/workflows/*.md.
Configuration Management
KSail uses a declarative configuration model:
Configuration Files
- ksail.yaml — Main cluster configuration
  - Cluster metadata (name, distribution, provider)
  - Component selection (CNI, CSI, metrics-server, etc.)
  - GitOps configuration
  - Mirror registries
  - Custom settings per distribution
- Distribution Configs — Native tool configurations
  - kind.yaml (Vanilla/Kind)
  - k3d.yaml (K3s/K3d)
  - talos/ directory (Talos machine configs and patches)
  - vcluster.yaml (VCluster)
- k8s/ Directory — Kubernetes manifests
  - Application deployments
  - Services
  - ConfigMaps/Secrets
  - Kustomize overlays
Schema Validation
JSON schemas provide editor autocomplete and validation:
Generate schemas: go generate ./schemas/...
Location: schemas/ksail-config.schema.json
VSCode integration: Automatic via .vscode/settings.json
Configuration Loading
The configmanager (pkg/fsutil/configmanager/) handles configuration loading:
- Load ksail.yaml and validate against schema
- Load distribution config (kind.yaml, k3d.yaml, etc.)
- Merge configs with defaults
- Validate combined configuration
- Return strongly-typed ClusterSpec
Build System
Building
# Development build
go build -o ksail
# Optimized build (strips debug symbols)
go build -ldflags="-s -w" -o ksail
# Build with version info (CI)
go build -ldflags="-s -w -X github.com/devantler-tech/ksail/v5/internal/buildmeta.Version=v5.x.x" -o ksail
Release Process
- Push to main triggers the release workflow (.github/workflows/release.yaml); it can also be started manually via workflow_dispatch.
- The release workflow delegates to a reusable workflow that builds binaries for all supported platforms/architectures:
  - Linux (amd64, arm64)
  - macOS (arm64)
  - Windows (amd64, arm64)
- Artifacts are uploaded to GitHub Releases by the workflow
- Checksums are generated for verification
- VSCode extension is packaged and published to the marketplace as part of the release workflow
Config: .github/workflows/release.yaml
Documentation
Documentation is built with Astro and Starlight:
cd docs/
npm ci
npm run build # Generates static site in dist/
npm run dev # Local development server
Deployed to: GitHub Pages (ksail.devantler.tech)
Workflow: .github/workflows/publish-pages.yaml
Contributing
See CONTRIBUTING.md for:
- Development setup
- Coding standards
- Pull request process
- Testing requirements
- Documentation guidelines
Further Reading
- Concepts — High-level concepts and mental models
- Configuration — Detailed configuration reference
- CLI Flags — Complete CLI reference
- GitHub Repository — Source code and issues