
Architecture

This guide explains KSail’s internal architecture, design decisions, and how the various components work together. This is essential reading for contributors and helpful for advanced users who want to understand the tool’s inner workings.

KSail is built on several core principles:

  1. Single Binary Distribution — All Kubernetes tools are embedded as Go libraries, eliminating external dependencies except Docker
  2. No Vendor Lock-In — Uses native distribution configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
  3. Declarative Configuration — Everything as code in version-controlled files
  4. Provider/Provisioner Separation — Infrastructure management separated from distribution configuration
  5. Composability — Modular architecture with clear boundaries between components
graph TB
    CLI[CLI Layer<br/>pkg/cli]
    APIs[API Types<br/>pkg/apis]
    Services[Services<br/>pkg/svc]
    Clients[Tool Clients<br/>pkg/client]

    CLI --> Services
    CLI --> APIs
    Services --> APIs
    Services --> Clients

    subgraph "Infrastructure"
        Provider[Providers<br/>Docker, Hetzner, Omni]
    end

    subgraph "Distributions"
        Provisioner["Provisioners<br/>Vanilla (Kind), K3s (K3d), Talos, VCluster"]
    end

    subgraph "Components"
        Installer[Installers<br/>CNI, CSI, Metrics, etc.]
    end

    Services --> Provider
    Services --> Provisioner
    Services --> Installer

KSail uses a flat package structure in pkg/ to improve maintainability and reduce import complexity:

pkg/apis/ holds the API types, schemas, and enums for KSail’s configuration model. cluster/v1alpha1/ contains the ClusterSpec/ClusterConfig types, with each enum in its own file (distribution.go, cni.go, etc.); the EnumValuer interface is defined in enum.go. The top-level schemas/ directory holds the generated JSON schema for ksail.yaml (via go generate ./schemas/...).

pkg/cli/ holds CLI wiring, commands, and terminal UI components: cmd/ for Cobra commands, ui/ for the TUI, lifecycle/ for cluster orchestration, and setup/ for component installation.

pkg/client/ holds the embedded tool clients: kubectl, helm, flux, argocd, docker, k9s, kubeconform, and kustomize. A reconciler package provides a shared base for GitOps reconciliation (Flux and ArgoCD). Distribution tools (Kind, K3d, VCluster) are used directly via their SDKs in the provisioners and are not wrapped here.

pkg/svc/ holds the core services: provider/ (infrastructure backends), provisioner/cluster/ (distribution provisioners), provisioner/registry/ (mirror registries), installer/ (CNI, CSI, LoadBalancer, cert-manager, etc.), chat/ (AI chat), detector/ (installed component detection, with cluster/ for kubeconfig-based distribution/provider/name detection and gitops/ for GitOps CR discovery), diff/ (config diff and impact classification), image/ (container image I/O), mcp/ (MCP server), registryresolver/ (OCI registry resolution), and state/ (cluster state persistence).

  • pkg/di/ — Dependency injection container
  • pkg/envvar/ — Environment variable utilities
  • pkg/fsutil/ — Filesystem utilities (includes configmanager)
  • pkg/k8s/ — Kubernetes helpers and templates
  • pkg/notify/ — CLI notifications and progress display; ProgressGroup manages parallel task execution with live spinner output and per-task duration tracking (injectable Clock via WithClock; WithTimer integrates with pkg/timer/ for stage totals)
  • pkg/runner/ — Cobra command execution helpers
  • pkg/timer/ — Command timing and performance tracking
  • pkg/toolgen/ — AI tool generation for assistants
  • internal/buildmeta/ — Build-time version metadata (Version, Commit, Date) injected via ldflags

KSail separates infrastructure management (providers) from distribution configuration (provisioners). This separation allows the same distribution (e.g., Talos) to run on different infrastructure (Docker, Hetzner, Omni).

Providers manage the underlying infrastructure where Kubernetes nodes run:

| Provider | Description | Supported Distributions |
|----------|-------------|-------------------------|
| Docker | Runs nodes as Docker containers | Vanilla, K3s, Talos, VCluster |
| Hetzner | Runs nodes on Hetzner Cloud servers | Talos |
| Omni | Manages Talos clusters via Sidero Omni API | Talos |

Provisioners configure and manage Kubernetes distributions on top of provider infrastructure:

| Distribution | Tool | Provisioner | Description |
|--------------|------|-------------|-------------|
| Vanilla | Kind | KindClusterProvisioner | Standard upstream Kubernetes via Kind SDK |
| K3s | K3d | K3dClusterProvisioner | Lightweight K3s via K3d Cobra/SDK |
| Talos | Talos | TalosProvisioner | Immutable Talos Linux via Talos SDK |
| VCluster | Vind | VClusterProvisioner | Virtual clusters via vCluster Go SDK (Vind driver) |

sequenceDiagram
    participant User
    participant CLI
    participant Provisioner
    participant Provider

    User->>CLI: ksail cluster create
    CLI->>Provisioner: Create cluster
    Provisioner->>Provider: Create infrastructure
    Provider-->>Provisioner: Nodes ready
    Provisioner->>Provisioner: Bootstrap Kubernetes
    Provisioner-->>CLI: Cluster ready
    CLI-->>User: Success

Talos demonstrates the provider/provisioner separation. The same Talos provisioner (pkg/svc/provisioner/cluster/talos/) generates machine configs and bootstraps Kubernetes regardless of which provider is used: Docker (local containers), Hetzner (cloud servers with CCM and CSI), or Omni (Sidero API). This enables a consistent Talos experience across all environments.

KSail manages cluster components (CNI, CSI, metrics-server, cert-manager, policy engines, GitOps engines) through a structured lifecycle:

Components are installed in two phases to ensure dependencies are met:

Phase 1 installs infrastructure components immediately after the CNI becomes ready:

  • CSI — Storage drivers (local-path, Longhorn, Hetzner CSI)
  • Metrics Server — Resource metrics API
  • LoadBalancer — Cloud Provider KIND (Vanilla) or MetalLB (Talos)
  • Cert Manager — TLS certificate management
  • Policy Engine — Kyverno or Gatekeeper

Phase 2 installs GitOps engines after Phase 1 completes and the API server is stable:

  • Flux — GitOps continuous delivery
  • ArgoCD — GitOps continuous delivery

Between phases, KSail performs a Cluster Stability Check (waitForClusterStability). The check requires 5 consecutive successful API server health checks within a 2-minute timeout and verifies that all kube-system DaemonSets are ready within a 3-minute timeout. This prevents GitOps engines from crashing with API server i/o timeout errors, which occur when infrastructure components temporarily destabilize API server connectivity while registering webhooks and CRDs, or when CNI dataplane programming is incomplete.

The detector service (pkg/svc/detector/) identifies installed components by querying Helm release history and the Kubernetes API, enabling accurate baselines, intelligent updates (only modify what changed), and post-update verification. It has two sub-packages:

  • detector/cluster/ — Detects the Kubernetes distribution, provider, and cluster name by analyzing kubeconfig context names and server endpoints. Exposes DetectInfo, DetectDistributionFromContext, and ResolveKubeconfigPath.
  • detector/gitops/ — Detects existing GitOps Custom Resources (FluxInstance, ArgoCD Application) in the source directory that are either labeled as managed by KSail or use KSail’s default resource names (for example, flux / ksail).
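
Context-name analysis can be illustrated with a short sketch. It relies on the default naming conventions of the upstream tools (Kind writes contexts as kind-<name>, K3d as k3d-<name>, and Talos as admin@<name>). Whether KSail matches exactly these patterns is an assumption, and detectFromContext is a hypothetical helper, not the real DetectDistributionFromContext.

```go
package main

import (
	"fmt"
	"strings"
)

// detectFromContext maps a kubeconfig context name to a distribution and
// cluster name using upstream default naming conventions.
// Hypothetical helper; VCluster and server-endpoint analysis are omitted.
func detectFromContext(contextName string) (distribution, cluster string, ok bool) {
	switch {
	case strings.HasPrefix(contextName, "kind-"):
		return "Vanilla", strings.TrimPrefix(contextName, "kind-"), true
	case strings.HasPrefix(contextName, "k3d-"):
		return "K3s", strings.TrimPrefix(contextName, "k3d-"), true
	case strings.HasPrefix(contextName, "admin@"):
		return "Talos", strings.TrimPrefix(contextName, "admin@"), true
	}
	return "", "", false
}

func main() {
	for _, name := range []string{"kind-dev", "k3d-staging", "admin@prod", "minikube"} {
		dist, cluster, ok := detectFromContext(name)
		fmt.Printf("%-12s -> %q %q %v\n", name, dist, cluster, ok)
	}
}
```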

The diff service (pkg/svc/diff/) classifies update impact as:

  • In-place — No disruption (e.g., CNI config changes)
  • Reboot-required — Node reboot needed (e.g., Talos kernel changes)
  • Recreate-required — Full cluster recreation required (e.g., distribution change)
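
The three impact levels form an ordering, so the impact of a whole update can be taken as the most disruptive impact of any changed field. The sketch below shows that reduction. The field names and the field-to-impact table are invented for illustration and are not taken from pkg/svc/diff/.

```go
package main

import "fmt"

// Impact is ordered from least to most disruptive.
type Impact int

const (
	InPlace Impact = iota
	RebootRequired
	RecreateRequired
)

func (i Impact) String() string {
	switch i {
	case InPlace:
		return "in-place"
	case RebootRequired:
		return "reboot-required"
	default:
		return "recreate-required"
	}
}

// fieldImpact is a hypothetical mapping from changed field to impact.
var fieldImpact = map[string]Impact{
	"cni.config":   InPlace,
	"talos.kernel": RebootRequired,
	"distribution": RecreateRequired,
}

// Classify returns the most disruptive impact among the changed fields.
// Unknown fields are treated as RecreateRequired, a conservative default
// chosen for this sketch.
func Classify(changed []string) Impact {
	worst := InPlace
	for _, field := range changed {
		impact, known := fieldImpact[field]
		if !known {
			impact = RecreateRequired
		}
		if impact > worst {
			worst = impact
		}
	}
	return worst
}

func main() {
	fmt.Println(Classify([]string{"cni.config"}))                 // in-place
	fmt.Println(Classify([]string{"cni.config", "talos.kernel"})) // reboot-required
	fmt.Println(Classify([]string{"distribution", "cni.config"})) // recreate-required
}
```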

KSail embeds Kubernetes tools as Go libraries instead of shelling out to CLI tools, delivering a single binary with no external dependencies, locked tool versions via go.mod, direct API access (no output parsing), no process-spawning overhead, and structured error handling from Go APIs.

| Tool | Purpose | Location |
|------|---------|----------|
| kubectl | Kubernetes API operations | pkg/client/kubectl/ |
| helm | Chart operations | pkg/client/helm/ |
| kind | Vanilla provisioner | pkg/svc/provisioner/cluster/kind/ (uses Kind SDK directly) |
| k3d | K3s provisioner | pkg/svc/provisioner/cluster/k3d/ (uses K3d Cobra/SDK directly) |
| vcluster | VCluster provisioner | pkg/svc/provisioner/cluster/vcluster/ (uses vCluster Go SDK directly) |
| flux | Flux GitOps | pkg/client/flux/ |
| argocd | ArgoCD GitOps | pkg/client/argocd/ |
| k9s | Terminal UI | pkg/client/k9s/ |
| kubeconform | Validation | pkg/client/kubeconform/ |
| kustomize | Rendering | pkg/client/kustomize/ |

Only Docker is required externally (as the container runtime for local clusters). Cloud providers require credentials: HCLOUD_TOKEN for Hetzner, and a service account key (default env: OMNI_SERVICE_ACCOUNT_KEY, configurable via spec.cluster.omni.serviceAccountKeyEnvVar) for Omni.

Different distributions handle state differently:

Talos and VCluster can introspect running configuration:

  • Talos can query machine configs via API
  • VCluster stores config in Kubernetes resources

KSail can detect current configuration without local state files.

Vanilla (Kind) and K3s (K3d) cannot introspect running configuration:

  • Kind and K3d don’t expose cluster config via API
  • Cannot reliably detect what was used to create cluster

KSail persists cluster specs to ~/.ksail/clusters/<name>/spec.json for these distributions.

State Service (pkg/svc/state/):

  • Save — Write ClusterSpec to JSON after cluster creation
  • Load — Read ClusterSpec for update operations
  • Delete — Clean up state file on cluster deletion

This enables the ksail cluster update command to compare desired vs current state.

KSail provides two AI interfaces:

Command: ksail chat — Implementation: pkg/svc/chat/

Uses GitHub Copilot SDK for interactive cluster configuration and troubleshooting. Supports two modes: Agent (</>) with full tool execution and Plan (≡) for describing steps without executing. Authenticated via KSAIL_COPILOT_TOKEN or COPILOT_TOKEN. See AI Chat for full documentation.

Command: ksail mcp — Implementation: pkg/svc/mcp/

Exposes KSail as a Model Context Protocol server for Claude and other AI assistants. Provides tools for cluster management, workload deployment, and configuration. See MCP for setup instructions.

KSail uses unit tests (go test ./..., testify/mock, export_test.go for unexported symbols, t.Parallel(), static error sentinels), integration tests against real clusters, system tests on Linux via .github/workflows/ci.yaml, and benchmarks (go test -bench=. -benchmem ./...).

Automated AI-powered agentic workflows (daily-code-quality, daily-plan, daily-builder, daily-workflow-maintenance, daily-docs) run on schedules for continuous improvement. These Markdown-based definitions live in .github/workflows/**/*.md.

KSail uses a declarative configuration model:

  1. ksail.yaml — Cluster metadata, component selection (CNI, CSI, metrics-server, etc.), GitOps engine, mirror registries, and distribution-specific settings
  2. Distribution configs — kind.yaml (Vanilla), k3d.yaml (K3s), talos/ patches, vcluster.yaml (VCluster)
  3. k8s/ directory — Kubernetes manifests (Deployments, Services, ConfigMaps, Kustomize overlays)

JSON schemas provide editor autocomplete and validation:

Generate schemas: go generate ./schemas/...

Location: schemas/ksail-config.schema.json

VSCode integration: Automatic via .vscode/settings.json

The configmanager (pkg/fsutil/configmanager/) handles configuration loading:

  1. Load ksail.yaml and validate against schema
  2. Load distribution config (kind.yaml, k3d.yaml, etc.)
  3. Merge configs with defaults
  4. Validate combined configuration
  5. Return strongly-typed ClusterSpec
# Development build
go build -o ksail
# Optimized build (strips debug symbols)
go build -ldflags="-s -w" -o ksail
# Build with version info (CI)
go build -ldflags="-s -w -X github.com/devantler-tech/ksail/v5/internal/buildmeta.Version=v5.x.x" -o ksail
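
The -X linker flag in the last command only works on package-level string variables, which is why build metadata is usually declared as plain vars with fallback values. The sketch below shows the pattern in a single main package for brevity. The fallback values and the versionString helper are assumptions; in KSail the variables live in internal/buildmeta/.

```go
package main

import "fmt"

// These must be package-level string variables so that
// `go build -ldflags "-X main.Version=v1.2.3"` can overwrite them.
// The fallback values are used for plain `go build` and are assumptions.
var (
	Version = "dev"
	Commit  = "none"
	Date    = "unknown"
)

// versionString formats the metadata for display (hypothetical helper).
func versionString() string {
	return fmt.Sprintf("ksail %s (commit %s, built %s)", Version, Commit, Date)
}

func main() {
	fmt.Println(versionString())
}
```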
  1. Push to main triggers the release workflow (.github/workflows/release.yaml); it can also be started manually via workflow_dispatch.
  2. The release workflow builds binaries for all supported platforms (Linux amd64/arm64, macOS arm64, Windows amd64/arm64), uploads artifacts to GitHub Releases, and generates checksums.
  3. The VSCode extension is packaged and published to the marketplace as part of the release workflow.

Config: .github/workflows/release.yaml

Documentation is built with Astro and Starlight:

cd docs/
npm ci
npm run build # Generates static site in dist/
npm run dev # Local development server

Deployed to: GitHub Pages (ksail.devantler.tech)

Workflow: .github/workflows/publish-pages.yaml

See CONTRIBUTING.md for development setup, coding standards, pull request process, testing requirements, and documentation guidelines.