
Architecture

This guide explains KSail’s internal architecture, design decisions, and how the various components work together. This is essential reading for contributors and helpful for advanced users who want to understand the tool’s inner workings.

KSail is built on several core principles:

  1. Single Binary Distribution — All Kubernetes tools are embedded as Go libraries, eliminating external dependencies except Docker
  2. No Vendor Lock-In — Uses native distribution configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
  3. Declarative Configuration — Everything as code in version-controlled files
  4. Provider/Provisioner Separation — Infrastructure management separated from distribution configuration
  5. Composability — Modular architecture with clear boundaries between components
```mermaid
graph TB
    CLI[CLI Layer<br/>pkg/cli]
    APIs[API Types<br/>pkg/apis]
    Services[Services<br/>pkg/svc]
    Clients[Tool Clients<br/>pkg/client]

    CLI --> Services
    CLI --> APIs
    Services --> APIs
    Services --> Clients

    subgraph "Infrastructure"
        Provider[Providers<br/>Docker, Hetzner, Omni]
    end

    subgraph "Distributions"
        Provisioner["Provisioners<br/>Vanilla (Kind), K3s (K3d), Talos, VCluster"]
    end

    subgraph "Components"
        Installer[Installers<br/>CNI, CSI, Metrics, etc.]
    end

    Services --> Provider
    Services --> Provisioner
    Services --> Installer
```

KSail uses a flat package structure in pkg/ to improve maintainability and reduce import complexity:

pkg/apis/ — API types, schemas, and enums that define KSail’s configuration model.

  • cluster/v1alpha1/ — ClusterSpec and ClusterConfig types; each enum type lives in its own domain-specific file (e.g., distribution.go, cni.go, csi.go, loadbalancer.go, gitopsengine.go); the EnumValuer interface is in enum.go
  • Top-level schemas/ directory — Generated JSON schema for ksail.yaml (via go generate ./schemas/...) used for validation and editor autocomplete
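As a sketch of how these enum types might fit together (the interface and method names below are assumptions based on the file layout above, not the actual definitions in enum.go and distribution.go):

```go
package main

import "fmt"

// EnumValuer is a hypothetical shape for the interface in enum.go:
// a common way to expose the valid values of each enum for validation
// and schema generation.
type EnumValuer interface {
	// Values returns the set of valid string values for the enum.
	Values() []string
}

// Distribution is an illustrative enum type mirroring distribution.go.
type Distribution string

const (
	DistributionVanilla  Distribution = "Vanilla"
	DistributionK3s      Distribution = "K3s"
	DistributionTalos    Distribution = "Talos"
	DistributionVCluster Distribution = "VCluster"
)

// Values satisfies EnumValuer for Distribution.
func (Distribution) Values() []string {
	return []string{"Vanilla", "K3s", "Talos", "VCluster"}
}

func main() {
	var e EnumValuer = DistributionTalos
	fmt.Println(e.Values())
}
```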

pkg/cli/ — CLI wiring, commands, and terminal UI components (cmd/ for Cobra commands, ui/ for TUI, lifecycle/ for cluster orchestration, setup/ for component installation).

pkg/client/ — Embedded tool clients that wrap Kubernetes tools as Go libraries.

  • kubectl/ — Kubernetes API client operations
  • helm/ — Helm chart operations
  • flux/ — Flux GitOps reconciliation
  • argocd/ — ArgoCD GitOps reconciliation
  • docker/ — Docker daemon operations
  • k9s/ — K9s terminal UI integration
  • kubeconform/ — Kubernetes manifest validation
  • kustomize/ — Kustomize rendering
  • oci/ — OCI registry operations
  • netretry/ — Network retry utilities
  • reconciler/ — Common base for GitOps clients

Note: Distribution tools like Kind, K3d, and VCluster are used directly via their SDKs in provisioners, not wrapped in pkg/client/.

pkg/svc/ — Services implementing core business logic.

  • provider/ — Infrastructure providers (Docker, Hetzner, Omni)
  • provisioner/cluster/ — Distribution provisioners (Vanilla (Kind), K3s (K3d), Talos, VCluster)
  • provisioner/registry/ — Registry provisioner for mirror registries
  • installer/ — Component installers (CNI, CSI, LoadBalancer, policy engines, cert-manager, metrics-server)
  • chat/ — AI chat integration using GitHub Copilot SDK
  • detector/ — Detects installed components by querying Helm and Kubernetes API
  • diff/ — Computes ClusterSpec diffs and classifies update impact
  • image/ — Container image export/import services
  • mcp/ — Model Context Protocol server for AI assistants
  • registryresolver/ — OCI registry detection and resolution
  • state/ — Cluster state persistence for distributions without introspection
Shared utility packages sit alongside these services:

  • pkg/di/ — Dependency injection container
  • pkg/envvar/ — Environment variable utilities
  • pkg/fsutil/ — Filesystem utilities (includes configmanager)
  • pkg/k8s/ — Kubernetes helpers and templates
  • pkg/notify/ — CLI notifications and progress display
  • pkg/runner/ — Cobra command execution helpers
  • pkg/timer/ — Command timing and performance tracking
  • pkg/toolgen/ — AI tool generation for assistants
  • internal/buildmeta/ — Build-time version metadata (Version, Commit, Date) injected via ldflags

KSail separates infrastructure management (providers) from distribution configuration (provisioners). This separation allows the same distribution (e.g., Talos) to run on different infrastructure (Docker, Hetzner, Omni).

Providers manage the underlying infrastructure where Kubernetes nodes run:

| Provider | Description | Supported Distributions |
| --- | --- | --- |
| Docker | Runs nodes as Docker containers | Vanilla, K3s, Talos, VCluster |
| Hetzner | Runs nodes on Hetzner Cloud servers | Talos |
| Omni | Manages Talos clusters via Sidero Omni API | Talos |

Responsibilities:

  • Create/delete infrastructure resources (containers or cloud servers)
  • Configure networking
  • Start/stop nodes
  • Provide node access for provisioners

Location: pkg/svc/provider/

Provisioners configure and manage Kubernetes distributions on top of provider infrastructure:

| Distribution | Tool | Provisioner | Description |
| --- | --- | --- | --- |
| Vanilla | Kind | KindClusterProvisioner | Standard upstream Kubernetes via Kind SDK |
| K3s | K3d | K3dClusterProvisioner | Lightweight K3s via K3d Cobra/SDK |
| Talos | Talos | TalosProvisioner | Immutable Talos Linux via Talos SDK |
| VCluster | Vind | VClusterProvisioner | Virtual clusters via vCluster Go SDK (Vind driver) |

Responsibilities:

  • Generate distribution-specific configs (kind.yaml, k3d.yaml, Talos patches, vcluster.yaml)
  • Bootstrap Kubernetes control plane
  • Configure cluster components
  • Manage cluster lifecycle
  • Support cluster updates and upgrades

Location: pkg/svc/provisioner/cluster/

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Provisioner
    participant Provider

    User->>CLI: ksail cluster create
    CLI->>Provisioner: Create cluster
    Provisioner->>Provider: Create infrastructure
    Provider-->>Provisioner: Nodes ready
    Provisioner->>Provisioner: Bootstrap Kubernetes
    Provisioner-->>CLI: Cluster ready
    CLI-->>User: Success
```

Talos demonstrates the provider/provisioner separation. The same Talos provisioner (pkg/svc/provisioner/cluster/talos/) generates machine configs and bootstraps Kubernetes regardless of which provider is used: Docker (local containers), Hetzner (cloud servers with CCM and CSI), or Omni (Sidero API). This enables a consistent Talos experience across all environments.

KSail manages cluster components (CNI, CSI, metrics-server, cert-manager, policy engines, GitOps engines) through a structured lifecycle:

Components are installed in two phases to ensure dependencies are met:

Phase 1 components are installed immediately after the CNI becomes ready:

  • CSI — Storage drivers (local-path, Longhorn, Hetzner CSI)
  • Metrics Server — Resource metrics API
  • LoadBalancer — Cloud Provider KIND (Vanilla) or MetalLB (Talos)
  • Cert Manager — TLS certificate management
  • Policy Engine — Kyverno or Gatekeeper

Phase 2 components are installed after Phase 1 completes and the API server is stable:

  • Flux — GitOps continuous delivery
  • ArgoCD — GitOps continuous delivery

Between phases, KSail performs an API Server Stability Check (waitForAPIServerStability) that requires 3 consecutive successful health checks within a 2-minute timeout. This prevents GitOps engines from crashing with dial tcp 10.96.0.1:443: i/o timeout errors caused by infrastructure components temporarily destabilizing API server connectivity while registering webhooks and CRDs.

The detector service (pkg/svc/detector/) identifies installed components by:

  • Querying Helm release history for Helm-based components
  • Querying Kubernetes API for components installed via kubectl

This detection enables:

  • Accurate baselines — Know what’s currently installed before updates
  • Intelligent updates — Only modify what changed
  • Verification — Confirm components are running as expected
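The detection logic above can be sketched as a Helm-first lookup with a Kubernetes API fallback. The client interfaces here are hypothetical stand-ins, not the real Helm and Kubernetes clients used by pkg/svc/detector/:

```go
package main

import "fmt"

// helmClient and kubeClient are assumed shapes for the real clients.
type helmClient interface {
	ReleaseExists(namespace, release string) bool
}

type kubeClient interface {
	DeploymentExists(namespace, name string) bool
}

// detect reports whether a component is installed: Helm release history
// is checked first, then the Kubernetes API for kubectl-installed components.
func detect(h helmClient, k kubeClient, namespace, release, deployment string) bool {
	if h.ReleaseExists(namespace, release) {
		return true
	}
	return k.DeploymentExists(namespace, deployment)
}

// In-memory fakes for demonstration.
type fakeHelm map[string]bool

func (f fakeHelm) ReleaseExists(ns, r string) bool { return f[ns+"/"+r] }

type fakeKube map[string]bool

func (f fakeKube) DeploymentExists(ns, n string) bool { return f[ns+"/"+n] }

func main() {
	helm := fakeHelm{"kube-system/metrics-server": true}
	kube := fakeKube{"cert-manager/cert-manager": true}
	fmt.Println(detect(helm, kube, "kube-system", "metrics-server", "metrics-server")) // found via Helm
	fmt.Println(detect(helm, kube, "cert-manager", "cert-manager", "cert-manager"))    // found via API
}
```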

The diff service (pkg/svc/diff/) computes configuration differences and classifies update impact:

  • In-place — Can update without disruption (e.g., CNI config changes)
  • Reboot-required — Requires node reboot (e.g., Talos kernel changes)
  • Recreate-required — Must destroy and recreate cluster (e.g., distribution change)
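A minimal sketch of this classification, assuming a "worst impact wins" rule; the type and field names are illustrative, not the actual code in pkg/svc/diff/:

```go
package main

import "fmt"

// UpdateImpact mirrors the three classifications described above.
type UpdateImpact int

const (
	InPlace          UpdateImpact = iota // no disruption
	RebootRequired                       // node reboot needed
	RecreateRequired                     // destroy and recreate the cluster
)

func (i UpdateImpact) String() string {
	return [...]string{"in-place", "reboot-required", "recreate-required"}[i]
}

// classifyField is a hypothetical helper mapping a changed field to its impact.
func classifyField(field string) UpdateImpact {
	switch field {
	case "distribution", "provider":
		return RecreateRequired // cluster identity changed
	case "kernelArgs":
		return RebootRequired // e.g. Talos kernel changes
	default:
		return InPlace // e.g. CNI config changes
	}
}

// classifyUpdate returns the worst impact across all changed fields.
func classifyUpdate(changed []string) UpdateImpact {
	worst := InPlace
	for _, f := range changed {
		if impact := classifyField(f); impact > worst {
			worst = impact
		}
	}
	return worst
}

func main() {
	// reboot-required outranks in-place when both kinds of fields changed.
	fmt.Println(classifyUpdate([]string{"cni", "kernelArgs"}))
}
```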

KSail embeds Kubernetes tools as Go libraries instead of shelling out to CLI tools:

  1. Single Binary — No external dependencies except Docker
  2. Version Control — Tool versions locked in go.mod
  3. API Access — Direct access to tool APIs, not just CLI output parsing
  4. Performance — No process spawning overhead
  5. Error Handling — Structured errors from Go APIs vs parsing stderr

| Tool | Purpose | Location |
| --- | --- | --- |
| kubectl | Kubernetes API operations | pkg/client/kubectl/ |
| helm | Chart operations | pkg/client/helm/ |
| kind | Vanilla provisioner | pkg/svc/provisioner/cluster/kind/ (uses Kind SDK directly) |
| k3d | K3s provisioner | pkg/svc/provisioner/cluster/k3d/ (uses K3d Cobra/SDK directly) |
| vcluster | VCluster provisioner | pkg/svc/provisioner/cluster/vcluster/ (uses vCluster Go SDK directly) |
| flux | Flux GitOps | pkg/client/flux/ |
| argocd | ArgoCD GitOps | pkg/client/argocd/ |
| k9s | Terminal UI | pkg/client/k9s/ |
| kubeconform | Validation | pkg/client/kubeconform/ |
| kustomize | Rendering | pkg/client/kustomize/ |

Only Docker is required as an external dependency:

  • Docker — Container runtime for local clusters (Docker provider)

Cloud providers (Hetzner, Omni) require credentials:

  • Hetzner — HCLOUD_TOKEN environment variable
  • Omni — Service account key (default: OMNI_SERVICE_ACCOUNT_KEY, configurable via spec.cluster.omni.serviceAccountKeyEnvVar)

Different distributions handle state differently:

Talos and VCluster can introspect running configuration:

  • Talos can query machine configs via API
  • VCluster stores config in Kubernetes resources

KSail can detect current configuration without local state files.

Vanilla (Kind) and K3s (K3d) cannot introspect running configuration:

  • Kind and K3d don’t expose cluster config via API
  • Cannot reliably detect what was used to create cluster

KSail persists cluster specs to ~/.ksail/clusters/<name>/spec.json for these distributions.

State Service (pkg/svc/state/):

  • Save — Write ClusterSpec to JSON after cluster creation
  • Load — Read ClusterSpec for update operations
  • Delete — Clean up state file on cluster deletion

This enables the ksail cluster update command to compare desired vs current state.

KSail provides two AI interfaces:

Command: ksail chat
Implementation: pkg/svc/chat/

Uses the GitHub Copilot SDK for interactive cluster configuration and troubleshooting. Supports two modes: Agent, which can execute tools, and Plan, which describes the steps without executing them. Authenticates via KSAIL_COPILOT_TOKEN or COPILOT_TOKEN. See AI Chat for full documentation.

Command: ksail mcp
Implementation: pkg/svc/mcp/

Exposes KSail as a Model Context Protocol server for Claude and other AI assistants. Provides tools for cluster management, workload deployment, and configuration. See MCP for setup instructions.

KSail uses multiple testing approaches:

  • Unit tests — go test ./...; uses testify/mock, export_test.go for unexported symbols, t.Parallel(), and static error sentinels
  • Integration tests — Exercise real cluster operations (Kind/K3d/VCluster clusters with real workloads)
  • System tests — Run in CI on Linux (ubuntu-latest, amd64) via .github/workflows/ci.yaml
  • Benchmarks — go test -bench=. -benchmem ./... for performance regression tracking

Automated AI-powered workflows run on schedules for continuous improvement: daily-code-quality, daily-plan, daily-builder, daily-workflow-maintenance, and daily-docs. These are Markdown-based agentic workflow definitions (not GitHub Actions YAML workflows) stored in .github/workflows/*.md.

KSail uses a declarative configuration model:

  1. ksail.yaml — Main cluster configuration

    • Cluster metadata (name, distribution, provider)
    • Component selection (CNI, CSI, metrics-server, etc.)
    • GitOps configuration
    • Mirror registries
    • Custom settings per distribution
  2. Distribution Configs — Native tool configurations

    • kind.yaml (Vanilla/Kind)
    • k3d.yaml (K3s/K3d)
    • talos/ directory (Talos machine configs and patches)
    • vcluster.yaml (VCluster)
  3. k8s/ Directory — Kubernetes manifests

    • Application deployments
    • Services
    • ConfigMaps/Secrets
    • Kustomize overlays
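Tying these together, a minimal ksail.yaml might look like the sketch below. The field names are illustrative assumptions, not the authoritative schema; validate real files against the generated JSON schema:

```yaml
# Illustrative sketch only — field names are assumptions.
# Validate real files against schemas/ksail-config.schema.json.
metadata:
  name: dev                # cluster name
spec:
  distribution: Talos      # Vanilla | K3s | Talos | VCluster
  provider: Docker         # Docker | Hetzner | Omni
  components:
    csi: local-path        # storage driver
    metricsServer: true
    policyEngine: Kyverno  # or Gatekeeper
  gitops:
    engine: Flux           # or ArgoCD
```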

JSON schemas provide editor autocomplete and validation:

Generate schemas: go generate ./schemas/...

Location: schemas/ksail-config.schema.json

VSCode integration: Automatic via .vscode/settings.json

The configmanager (pkg/fsutil/configmanager/) handles configuration loading:

  1. Load ksail.yaml and validate against schema
  2. Load distribution config (kind.yaml, k3d.yaml, etc.)
  3. Merge configs with defaults
  4. Validate combined configuration
  5. Return strongly-typed ClusterSpec
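The five steps above can be sketched as a single pipeline. This is a deliberately simplified illustration using string maps in place of parsed YAML; the real configmanager is more involved:

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterSpec is a stand-in for the real strongly-typed spec.
type ClusterSpec struct {
	Name         string
	Distribution string
	CNI          string
}

// loadConfig sketches the pipeline: load, merge with defaults
// (explicit values win), validate, and return a typed spec.
func loadConfig(ksailYAML, distConfig map[string]string) (ClusterSpec, error) {
	// 1-2. "Load" ksail.yaml and the distribution config
	//      (already parsed into maps here for brevity).
	// 3. Merge with defaults: later layers override earlier ones.
	defaults := map[string]string{"distribution": "Vanilla", "cni": "default"}
	merged := map[string]string{}
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range distConfig {
		merged[k] = v
	}
	for k, v := range ksailYAML {
		merged[k] = v
	}
	// 4. Validate the combined configuration.
	if merged["name"] == "" {
		return ClusterSpec{}, errors.New("cluster name is required")
	}
	// 5. Return a strongly-typed spec.
	return ClusterSpec{
		Name:         merged["name"],
		Distribution: merged["distribution"],
		CNI:          merged["cni"],
	}, nil
}

func main() {
	spec, err := loadConfig(map[string]string{"name": "dev", "cni": "cilium"}, nil)
	fmt.Println(spec, err)
}
```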
Building from source:

```shell
# Development build
go build -o ksail

# Optimized build (strips debug symbols)
go build -ldflags="-s -w" -o ksail

# Build with version info (CI)
go build -ldflags="-s -w -X github.com/devantler-tech/ksail/v5/internal/buildmeta.Version=v5.x.x" -o ksail
```
  1. Push to main triggers the release workflow (.github/workflows/release.yaml); it can also be started manually via workflow_dispatch.
  2. The release workflow delegates to a reusable workflow that builds binaries for all supported platforms/architectures:
    • Linux (amd64, arm64)
    • macOS (arm64)
    • Windows (amd64, arm64)
  3. Artifacts are uploaded to GitHub Releases by the workflow
  4. Checksums are generated for verification
  5. VSCode extension is packaged and published to the marketplace as part of the release workflow

Config: .github/workflows/release.yaml

Documentation is built with Astro and Starlight:

```shell
cd docs/
npm ci
npm run build   # Generates static site in dist/
npm run dev     # Local development server
```

Deployed to: GitHub Pages (ksail.devantler.tech)

Workflow: .github/workflows/publish-pages.yaml

See CONTRIBUTING.md for:

  • Development setup
  • Coding standards
  • Pull request process
  • Testing requirements
  • Documentation guidelines