
Architecture

This guide explains KSail’s architecture and design decisions at a level useful for advanced users. For contributor-level internals (package layout, internal APIs, source structure), see CONTRIBUTING.md.

KSail is built on several core principles:

  1. Single Binary Distribution — All Kubernetes tools are embedded as Go libraries, eliminating external dependencies except Docker
  2. No Vendor Lock-In — Uses native distribution configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
  3. Declarative Configuration — Everything as code in version-controlled files
  4. Provider/Provisioner Separation — Infrastructure management separated from distribution configuration
  5. Composability — Modular architecture with clear boundaries between components
```mermaid
graph TB
    CLI[CLI Layer]
    APIs[API Types]
    Services[Services]
    Clients[Tool Clients]

    CLI --> Services
    CLI --> APIs
    Services --> APIs
    Services --> Clients

    subgraph "Infrastructure"
        Provider[Providers<br/>Docker, Hetzner, Omni]
    end

    subgraph "Distributions"
        Provisioner["Provisioners<br/>Vanilla (Kind), K3s (K3d), Talos, VCluster, KWOK"]
    end

    subgraph "Components"
        Installer[Installers<br/>CNI, CSI, Metrics, etc.]
    end

    Services --> Provider
    Services --> Provisioner
    Services --> Installer
```

KSail separates infrastructure management (providers) from distribution configuration (provisioners). This separation allows the same distribution (e.g., Talos) to run on different infrastructure (Docker, Hetzner, Omni).

Providers manage the underlying infrastructure where Kubernetes nodes run:

| Provider | Description | Supported distributions |
|---|---|---|
| Docker | Runs nodes as Docker containers | Vanilla, K3s, Talos, VCluster, KWOK |
| Hetzner | Runs nodes on Hetzner Cloud servers | Talos |
| Omni | Manages Talos clusters via the Sidero Omni API | Talos |
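The provider table is effectively a compatibility matrix. A minimal Go sketch of checking a provider/distribution pair against it follows; the `supportedDistributions` map and the `validateCombination` helper are hypothetical names for illustration, not KSail's actual API.

```go
package main

import (
	"fmt"
	"slices"
)

// supportedDistributions mirrors the provider table (hypothetical structure).
var supportedDistributions = map[string][]string{
	"Docker":  {"Vanilla", "K3s", "Talos", "VCluster", "KWOK"},
	"Hetzner": {"Talos"},
	"Omni":    {"Talos"},
}

// validateCombination returns an error if the provider is unknown or does
// not support the requested distribution.
func validateCombination(provider, distribution string) error {
	dists, ok := supportedDistributions[provider]
	if !ok {
		return fmt.Errorf("unknown provider %q", provider)
	}
	if !slices.Contains(dists, distribution) {
		return fmt.Errorf("provider %q does not support distribution %q (supported: %v)",
			provider, distribution, dists)
	}
	return nil
}

func main() {
	fmt.Println(validateCombination("Hetzner", "Talos"))
	fmt.Println(validateCombination("Hetzner", "K3s"))
}
```

Rejecting an unsupported pair up front gives a clear error before any infrastructure is created.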

Provisioners configure and manage Kubernetes distributions on top of provider infrastructure:

| Distribution | Tool | Description |
|---|---|---|
| Vanilla | Kind | Standard upstream Kubernetes |
| K3s | K3d | Lightweight K3s |
| Talos | Talos | Immutable Talos Linux |
| VCluster | vCluster | Virtual clusters via vCluster |
```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Provisioner
    participant Provider

    User->>CLI: ksail cluster create
    CLI->>Provisioner: Create cluster
    Provisioner->>Provider: Create infrastructure
    Provider-->>Provisioner: Nodes ready
    Provisioner->>Provisioner: Bootstrap Kubernetes
    Provisioner-->>CLI: Cluster ready
    CLI-->>User: Success
```

Talos demonstrates the provider/provisioner separation. The same Talos provisioner generates machine configs and bootstraps Kubernetes regardless of which provider is used: Docker (local containers), Hetzner (cloud servers with the Hetzner cloud controller manager (CCM) and CSI driver), or Omni (Sidero API). This enables a consistent Talos experience across all environments.

KSail manages cluster components (CNI, CSI, metrics-server, cert-manager, policy engines, GitOps engines) through a structured lifecycle. Components are installed in two phases so that each component's dependencies are in place before it is installed:

Phase 1 installs infrastructure components immediately after the CNI becomes ready:

  • CSI — Storage drivers (local-path, Longhorn, Hetzner CSI)
  • Metrics Server — Resource metrics API
  • LoadBalancer — Cloud Provider KIND (Vanilla) or MetalLB (Talos)
  • Cert Manager — TLS certificate management
  • Policy Engine — Kyverno or Gatekeeper

Phase 2 installs GitOps engines after a cluster stability check confirms the API server is fully ready:

  • Flux — GitOps continuous delivery
  • ArgoCD — GitOps continuous delivery

Before Phase 2, KSail always performs a Cluster Stability Check consisting of up to three sequential steps:

  1. API server stability — requires consecutive successful health checks within a 2-minute timeout. The threshold is distribution-aware: 3 checks for Vanilla, K3s, and KWOK (which stabilize faster after webhook registration), and 5 for Talos and VCluster.
  2. DaemonSet readiness — verifies that all kube-system DaemonSets are ready within a 3-minute timeout. It runs after the API server stability step because it does not retry transient transport errors.
  3. In-cluster API connectivity (Cilium only) — creates a short-lived busybox pod that tests TCP connectivity to the API server ClusterIP (port 443) from within the cluster, with a 2-minute timeout. Only performed for Cilium CNI, where eBPF dataplane programming may lag behind DaemonSet readiness. Skipped for the default (distribution-provided) CNI and Calico.

This check always runs before GitOps engines, even when no Phase 1 components are installed. It prevents race conditions where K3s/K3d clusters report creation success before the API server is fully ready to serve requests. On setups with Phase 1 infrastructure components, it also ensures API connectivity has recovered after those components register webhooks and CRDs.
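The ordering guarantee can be expressed as a small function. In this sketch, the `Component` type and the injected `install` and `stabilityCheck` callbacks are hypothetical. The stability check sits between the two phases and runs every time, so it still runs when no Phase 1 components are selected.

```go
package main

import "fmt"

// Component is a hypothetical description of an installable component.
type Component struct {
	Name  string
	Phase int // 1 = infrastructure component, 2 = GitOps engine
}

// installAll installs Phase 1 components, then runs the stability check,
// then installs Phase 2 components. Callbacks are injected for testability.
func installAll(components []Component, install func(Component) error, stabilityCheck func() error) error {
	for _, c := range components {
		if c.Phase == 1 {
			if err := install(c); err != nil {
				return fmt.Errorf("phase 1: install %s: %w", c.Name, err)
			}
		}
	}
	// Runs even when no Phase 1 components were selected.
	if err := stabilityCheck(); err != nil {
		return fmt.Errorf("cluster stability check: %w", err)
	}
	for _, c := range components {
		if c.Phase == 2 {
			if err := install(c); err != nil {
				return fmt.Errorf("phase 2: install %s: %w", c.Name, err)
			}
		}
	}
	return nil
}

func main() {
	var order []string
	components := []Component{{"flux", 2}, {"cert-manager", 1}, {"metrics-server", 1}}
	err := installAll(components,
		func(c Component) error { order = append(order, c.Name); return nil },
		func() error { order = append(order, "stability-check"); return nil },
	)
	fmt.Println(order, err)
}
```

Even though `flux` is listed first, it is installed last, after both Phase 1 components and the stability check.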

The detector service identifies installed components by querying Helm release history and the Kubernetes API, with additional checks against the Docker daemon where needed. It determines the active distribution, provider, and cluster name from the current kubeconfig context, and distinguishes KSail-managed GitOps resources from unrelated ones so it does not interfere with external GitOps setups.
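To illustrate inferring the distribution and cluster name from the kubeconfig context, the sketch below relies on the default context names that kind (`kind-<name>`) and k3d (`k3d-<name>`) write to kubeconfig. This is an assumption for illustration only. KSail's detector may use different or additional signals, and contexts from other distributions are deliberately left unmatched. `parseContext` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// parseContext maps a kubeconfig context name to (distribution, cluster),
// assuming the default kind and k3d context naming. ok is false otherwise.
func parseContext(contextName string) (distribution, cluster string, ok bool) {
	if name, found := strings.CutPrefix(contextName, "kind-"); found && name != "" {
		return "Vanilla", name, true
	}
	if name, found := strings.CutPrefix(contextName, "k3d-"); found && name != "" {
		return "K3s", name, true
	}
	return "", "", false
}

func main() {
	for _, contextName := range []string{"kind-dev", "k3d-staging", "admin@prod"} {
		dist, name, ok := parseContext(contextName)
		fmt.Printf("%-12s distribution=%q cluster=%q recognized=%v\n", contextName, dist, name, ok)
	}
}
```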

The diff service classifies update impact as in-place (no disruption), reboot-required (node reboot), or recreate-required (full cluster recreation).
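Because the three impact levels are ordered by disruption, one natural way to aggregate them is to take the most disruptive impact among all changed fields. The sketch below assumes that rule and uses a made-up field-to-impact table (`metricsServer`, `kernelArgs`, `distribution`). KSail's actual classification rules are not described in this guide. Unrecognized fields are treated conservatively as recreate-required.

```go
package main

import "fmt"

// Impact levels, ordered from least to most disruptive.
type Impact int

const (
	InPlace Impact = iota
	RebootRequired
	RecreateRequired
)

func (i Impact) String() string {
	return [...]string{"in-place", "reboot-required", "recreate-required"}[i]
}

// fieldImpact is a hypothetical mapping from changed spec fields to impact.
var fieldImpact = map[string]Impact{
	"metricsServer": InPlace,
	"kernelArgs":    RebootRequired,
	"distribution":  RecreateRequired,
}

// classify returns the most disruptive impact among the changed fields.
func classify(changed []string) Impact {
	worst := InPlace
	for _, field := range changed {
		impact, known := fieldImpact[field]
		if !known {
			impact = RecreateRequired // be conservative about unrecognized fields
		}
		if impact > worst {
			worst = impact
		}
	}
	return worst
}

func main() {
	fmt.Println(classify([]string{"metricsServer"}))               // in-place
	fmt.Println(classify([]string{"metricsServer", "kernelArgs"})) // reboot-required
	fmt.Println(classify([]string{"distribution"}))                // recreate-required
}
```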

KSail embeds Kubernetes tools as Go libraries instead of shelling out to CLI tools, delivering a single binary with no external dependencies, locked tool versions via go.mod, direct API access (no output parsing), no process-spawning overhead, and structured error handling from Go APIs.

| Tool | Purpose |
|---|---|
| kubectl | Kubernetes API operations |
| helm | Chart operations |
| kind | Vanilla provisioner |
| k3d | K3s provisioner |
| vcluster | VCluster provisioner |
| flux | Flux GitOps |
| argocd | ArgoCD GitOps |
| k9s | Terminal UI |
| kubeconform | Validation |
| kustomize | Rendering |

Only Docker is required externally (as the container runtime for local clusters). Cloud providers require credentials: HCLOUD_TOKEN for Hetzner, and a service account key (default env: OMNI_SERVICE_ACCOUNT_KEY, configurable via spec.provider.omni.serviceAccountKeyEnvVar) for Omni.
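The credential lookup described above can be sketched as follows. The function names are hypothetical. The environment variable names (`HCLOUD_TOKEN`, `OMNI_SERVICE_ACCOUNT_KEY`) and the override setting `spec.provider.omni.serviceAccountKeyEnvVar` come from this guide. `lookup` is injected (normally `os.LookupEnv`) so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

const defaultOmniKeyEnv = "OMNI_SERVICE_ACCOUNT_KEY"

// omniKeyEnvVar returns the configured override (the value of
// spec.provider.omni.serviceAccountKeyEnvVar) or the default name.
func omniKeyEnvVar(override string) string {
	if override != "" {
		return override
	}
	return defaultOmniKeyEnv
}

// credentialFor resolves the credential a provider needs from the environment.
func credentialFor(provider, omniOverride string, lookup func(string) (string, bool)) (string, error) {
	var name string
	switch provider {
	case "Docker":
		return "", nil // local clusters need no cloud credentials
	case "Hetzner":
		name = "HCLOUD_TOKEN"
	case "Omni":
		name = omniKeyEnvVar(omniOverride)
	default:
		return "", fmt.Errorf("unknown provider %q", provider)
	}
	value, ok := lookup(name)
	if !ok || value == "" {
		return "", fmt.Errorf("%s provider requires %s to be set", provider, name)
	}
	return value, nil
}

func main() {
	// Never print the credential itself; only report whether it was found.
	if _, err := credentialFor("Hetzner", "", os.LookupEnv); err != nil {
		fmt.Println("missing credential:", err)
		return
	}
	fmt.Println("Hetzner credential found")
}
```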

Talos and VCluster can introspect running configuration (Talos via API, VCluster via Kubernetes resources), so KSail needs no local state for them.

Vanilla (Kind) and K3s (K3d) don’t expose cluster config via API, so KSail persists their ClusterSpecs to ~/.ksail/clusters/<name>/spec.json. This enables ksail cluster update to compare desired vs current state.

KSail provides two AI interfaces built on top of the same CLI tool infrastructure:

  • ksail chat — interactive AI assistant using GitHub Copilot SDK; supports Agent and Plan modes. See AI Chat.
  • ksail mcp — Model Context Protocol server for Claude and other AI assistants; tools are auto-generated from the CLI command tree and grouped into read/write pairs. See MCP.
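One way to picture how MCP tools could be derived from the command tree is to walk the tree and, for each top-level command group, collect leaf commands into one read tool and one write tool. This is a hypothetical sketch. The `Command` type, the `ReadOnly` flag, the `<group>_read`/`<group>_write` naming, and the example tree are illustrative assumptions, not KSail's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Command is a hypothetical node in a CLI command tree.
type Command struct {
	Name     string
	ReadOnly bool // hypothetical annotation: the command never mutates state
	Children []*Command
}

// toolGroups maps tool names such as "cluster_read" to the full command
// paths (for example "ksail cluster info") that the tool would expose.
func toolGroups(root *Command) map[string][]string {
	tools := map[string][]string{}
	var walk func(c *Command, group string, parent []string)
	walk = func(c *Command, group string, parent []string) {
		// Full slice expression forces a copy so siblings never share storage.
		path := append(parent[:len(parent):len(parent)], c.Name)
		if len(c.Children) == 0 {
			kind := "write"
			if c.ReadOnly {
				kind = "read"
			}
			tool := group + "_" + kind
			tools[tool] = append(tools[tool], strings.Join(path, " "))
			return
		}
		for _, child := range c.Children {
			walk(child, group, path)
		}
	}
	for _, group := range root.Children {
		walk(group, group.Name, []string{root.Name})
	}
	return tools
}

func main() {
	// Made-up tree for illustration.
	root := &Command{Name: "ksail", Children: []*Command{
		{Name: "cluster", Children: []*Command{
			{Name: "info", ReadOnly: true},
			{Name: "create"},
			{Name: "update"},
		}},
	}}
	tools := toolGroups(root)
	names := make([]string, 0, len(tools))
	for name := range tools {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, tools[name])
	}
}
```

Separating read tools from write tools lets an assistant be granted read-only access without exposing commands that change cluster state.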