Hetzner Provider

The Hetzner provider creates Kubernetes cluster nodes as Hetzner Cloud servers running Talos Linux. It provisions real cloud infrastructure with public IPs, private networking, load balancers, and persistent volumes, which suits production-grade clusters and cloud testing.

The Hetzner provider is ideal when you:

  • Need production-grade Kubernetes on affordable European cloud infrastructure
  • Want real cloud load balancers and persistent volumes (not emulated locally)
  • Run performance or integration tests that require dedicated server resources
  • Deploy Talos Linux clusters with high-availability placement groups

[!NOTE] The Hetzner provider only supports the Talos distribution. For local Docker-based clusters, see the Docker Provider or the Talos distribution guide.

  1. Hetzner Cloud account — Sign up at hetzner.com/cloud
  2. API token — Create one in the Hetzner Cloud Console → Project → Security → API Tokens (read/write permissions)
  3. Talos ISO or Schematic — Either a Talos Linux ISO must be available in your Hetzner Cloud project (x86 default: 125127 for Talos 1.12.4; for ARM, look up the matching ISO ID in the Hetzner Cloud Console), or a Talos factory schematic ID can be provided via spec.cluster.talos.schematicId so KSail builds the snapshot automatically. See Talos options.
  4. Docker — Required locally for ksail CLI operations

Export the Hetzner Cloud API token so KSail can manage servers:

export HCLOUD_TOKEN=your-hetzner-api-token

[!TIP] Add the export to your shell profile (~/.bashrc, ~/.zshrc) to persist across sessions.

By default, KSail reads from HCLOUD_TOKEN. To use a different environment variable name, set spec.provider.hetzner.tokenEnvVar in ksail.yaml.

The Hetzner provider is configured through spec.provider.hetzner in ksail.yaml:

apiVersion: ksail.io/v1alpha1
kind: Cluster
metadata:
  name: my-hetzner-cluster
spec:
  cluster:
    distribution: Talos
    provider: Hetzner
    controlPlanes: 1
    workers: 2
    talos:
      # ISO image ID for Hetzner (default: 125127 for Talos 1.12.4 x86;
      # look up the matching ARM ISO ID in the Hetzner Cloud Console).
      # Ignored when schematicId is set; KSail builds a snapshot instead.
      iso: 125127
      # Optional: Talos factory schematic ID. When set, KSail builds a Hetzner
      # snapshot from the Talos factory URL and uses it instead of the ISO.
      # Obtain a schematic ID from https://factory.talos.dev.
      # schematicId: ""
  provider:
    hetzner:
      # Server types (default: cx23 for both)
      controlPlaneServerType: "cx23"
      workerServerType: "cx23"
      # Datacenter location (default: fsn1)
      location: "fsn1"
      # Private network settings
      networkCidr: "10.0.0.0/16"
      # networkName: "my-network" # defaults to <cluster>-network
      # Optional: SSH key for server access (Talos API is primary)
      # sshKeyName: "my-key"
      # Optional: override the default env var name (default: HCLOUD_TOKEN)
      # tokenEnvVar: "MY_CUSTOM_HCLOUD_TOKEN"
      # Placement group settings for HA
      placementGroupStrategy: "Spread" # or "None"
      # placementGroup: "my-placement" # defaults to <cluster>-placement
      # fallbackLocations: ["nbg1", "hel1"]
      # placementGroupFallbackToNone: false
      # Talos OS-level ingress firewall (default: Enabled, for defense-in-depth)
      # ingressFirewall: "Enabled" # or "Disabled"
      # Restrict Kubernetes API (6443) and Talos API (50000) access to these CIDRs.
      # When empty, both APIs are open to 0.0.0.0/0 and ::/0 (all IPv4 and IPv6).
      # Applied to both the Hetzner Cloud Firewall and the Talos OS-level ingress firewall.
      # allowedCidrs: ["203.0.113.0/24", "198.51.100.0/24"]
      # Per-role public networking (default: both IPv4 and IPv6 enabled).
      # Set workerPublicIPv4: false for IPv4-less workers reached over the private
      # network. This requires KSail to run with private-network reachability and a
      # NAT gateway (or working IPv6) for egress. See "IPv4-less nodes" below.
      # workerPublicIPv4: true
      # workerPublicIPv6: true
      # controlPlanePublicIPv4: true
      # controlPlanePublicIPv6: true
      # Maximum Hetzner servers for this cluster (default: 10)
      # serverLimit: 10

The full spec.provider.hetzner field reference — every option with its type, default, and description — is generated from the API types: see spec.provider.hetzner (OptionsHetzner) in the Declarative Configuration reference.

By default every Hetzner node receives a public IPv4 (billed, ~€0.50/mo each) and a public IPv6 (free). Setting workerPublicIPv4: false (and/or controlPlanePublicIPv4: false) provisions nodes without a public IPv4, reducing both cost and public attack surface — the node is no longer directly reachable from the internet.

Because KSail manages each node over its Talos API, an IPv4-less node changes how KSail reaches it:

  • KSail connects over the private network. When a node has no public IPv4, KSail uses the node’s private-network IP for the Talos API, config apply, bootstrap, and (for control planes) the kube/talos endpoint. This requires KSail itself to have a route into the private network — run KSail from inside the network, over a VPN/WireGuard, or via a bastion. Without that route, provisioning hangs waiting for the Talos API.
  • Nodes still need egress. A node with no public IPv4 cannot pull container images, reach the Hetzner API (CCM/CSI), or join the cluster unless the private network provides egress. Provide a NAT gateway on the private network, or rely on public IPv6 where every required registry is reachable over IPv6.
  • IPv6-only egress is unreliable for image pulls. ghcr.io (and some other registries) are not consistently reachable over IPv6, so an IPv4-less node relying solely on IPv6 egress may fail to pull images. A NAT gateway is the robust option.
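As a minimal sketch of the points above, the ksail.yaml fragment below removes the public IPv4 from workers only. The field names are the ones shown in the configuration example earlier on this page. The route from KSail into the private network and the NAT gateway for node egress are prerequisites you provide yourself; this fragment does not create them.

```yaml
spec:
  provider:
    hetzner:
      # Workers get no public IPv4, so KSail reaches them over the private network.
      workerPublicIPv4: false
      # Keep the free public IPv6; on its own it may not reach every registry.
      workerPublicIPv6: true
      # Control planes keep a public IPv4 so the generated kubeconfig and
      # talosconfig endpoints remain reachable from outside the private network.
      controlPlanePublicIPv4: true
```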

IPv4-less control planes. Disabling controlPlanePublicIPv4 makes the generated kubeconfig and talosconfig point at the control plane’s private IP, so kubectl/ksail only work from inside the private network (or behind a load balancer / floating IP you place in front).

Reachability. Because an IPv4-less node is driven entirely over the private network, KSail must be able to route into it. If KSail can’t reach an IPv4-less node’s Talos API, provisioning fails with an actionable error pointing at the private-network-route and egress prerequisites above (rather than an opaque timeout). KSail also warns at config time when a role disables both IP families, since that node then depends on a NAT gateway for egress.

Autoscaler nodes. Autoscaler-created nodes inherit the worker public-net setting cluster-wide (KSail sets HCLOUD_PUBLIC_IPV4/HCLOUD_PUBLIC_IPV6 on the cluster-autoscaler). The upstream Hetzner cluster-autoscaler has no per-pool public-net control, so all autoscaler pools share the worker setting.

Load balancers. KSail does not create Hetzner load balancers, but when you expose Services with the Hetzner CCM and your nodes are IPv4-less, annotate the Service with load-balancer.hetzner.cloud/use-private-ip: "true" so the load balancer targets nodes by their private IP.
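A Service carrying that annotation might look like the sketch below. The Service name, selector label, and ports are hypothetical placeholders; only the annotation key and value come from the text above. Depending on how the CCM is configured, further annotations (for example a load balancer location) may also be needed.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                # hypothetical Service name
  annotations:
    # Target nodes by their private-network IP instead of a public IP.
    load-balancer.hetzner.cloud/use-private-ip: "true"
spec:
  type: LoadBalancer
  selector:
    app: web               # hypothetical Pod label
  ports:
    - port: 80
      targetPort: 80
```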

Talos options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| talos.iso | int | 125127 | Hetzner Cloud ISO/image ID for Talos 1.12.4 x86 (look up the matching ARM ID in the Hetzner Cloud Console). Ignored when schematicId is set. |
| talos.schematicId | string | (empty) | Talos factory schematic ID. When set, KSail automatically builds and manages a Hetzner snapshot image instead of booting from the ISO. spec.cluster.talos.version must also be set. Not supported for ARM64 server types (cax*). The snapshot is deleted when ksail cluster delete --delete-storage is run. |

KSail supports node and pod autoscaling for Hetzner clusters via spec.cluster.autoscaler. Node pools are Hetzner-specific and map directly to the Kubernetes Cluster Autoscaler node group format.

spec:
  cluster:
    autoscaler:
      node:
        enabled: true
        expander: LeastWaste # single value, or a priority list e.g. [LeastNodes, LeastWaste] (Price is not supported for Hetzner)
        maxNodesTotal: 20 # whole-cluster node ceiling (--max-nodes-total); 0 = no cap
        scaleDownUnneededTime: 10m
        pools:
          - name: workers-fsn1
            serverType: cx23
            location: fsn1
            min: 1
            max: 5
          - name: gpu-nbg1
            serverType: cx33
            location: nbg1
            min: 0
            max: 3
            labels: # applied to every node in this pool
              workload: gpu
            taints: # only pods that tolerate this land here
              - key: dedicated
                value: gpu
                effect: NoSchedule
      pod:
        horizontal: Disabled # Enabled | Disabled (pod autoscaler setting; metrics-server is configured separately)
        vertical: Disabled # Enabled | Disabled (reserved for future VPA support; not yet implemented)
  provider:
    hetzner:
      serverLimit: 20 # project quota; must be ≥ the reachable total (controlPlanes + workers + sum(pool.max), capped by maxNodesTotal)

The full autoscaler field reference (spec.cluster.autoscaler, including node pools and per-pool labels/taints) is generated from the API types: see spec.cluster.autoscaler (AutoscalerConfig) in the Declarative Configuration reference. Note that the Price expander is rejected by KSail with a validation error for the Hetzner provider, because the Hetzner cloud provider does not implement the pricing API.

[!NOTE] spec.provider.hetzner.serverLimit guards against exceeding your Hetzner Cloud project quota. KSail computes the reachable total as controlPlanes + workers + sum(pool.max), clamped to autoscaler.node.maxNodesTotal when that global cap is set, and rejects configurations where this total exceeds serverLimit. Because maxNodesTotal is the whole-cluster ceiling (the value passed to the autoscaler’s --max-nodes-total), keep it at or below serverLimit. Raise serverLimit to match your actual Hetzner project quota.
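The fragment below works through the check with the values used on this page, as a sketch only. It assumes that "clamped" means taking the smaller of the summed total and maxNodesTotal; the arithmetic lives in the comments and is not something KSail reads.

```yaml
spec:
  cluster:
    controlPlanes: 1            # static control planes
    workers: 2                  # static baseline workers
    autoscaler:
      node:
        enabled: true
        maxNodesTotal: 20       # whole-cluster ceiling
        pools:
          - name: workers-fsn1
            serverType: cx23
            location: fsn1
            min: 1
            max: 5
          - name: gpu-nbg1
            serverType: cx33
            location: nbg1
            min: 0
            max: 3
  provider:
    hetzner:
      # Summed total:    1 + 2 + (5 + 3) = 11
      # Reachable total: min(11, 20) = 11   (assumed reading of "clamped")
      # 11 <= 20, so this configuration passes the serverLimit check.
      serverLimit: 20
```

Raising the second pool's max to 15 would give a summed total of 23, still clamped to 20 by maxNodesTotal, so under the same assumption the configuration would continue to pass with serverLimit: 20.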

Distinguishing autoscaler nodes from baseline workers

KSail stamps the Kubernetes node label ksail.io/autoscaled=true on every autoscaler-provisioned worker, and on no static baseline worker. This gives workloads a discriminator to key node affinity off of — for example a soft preferredDuringSchedulingIgnoredDuringExecution nodeAffinity preferring nodes where ksail.io/autoscaled DoesNotExist, so pods land on baseline workers first and only spill onto autoscaler nodes under real pressure (keeping autoscaler nodes empty and quick to scale down).

[!NOTE] The label is applied via the autoscaler worker’s Talos machine.nodeLabels (the kubelet --node-labels flag), so it lands on the real Node object. The upstream Hetzner cluster-autoscaler deliberately does not push its per-pool nodeConfigs[].labels/taints to the kubelet — those only seed the in-memory template node used for scheduling simulation and scale-from-zero (see kubernetes/autoscaler#8492, closed as working-as-intended). Stamping the label in the worker cloud-init is therefore the canonical mechanism.
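A soft preference for baseline workers could be expressed as in the sketch below. The Deployment name, labels, image, and weight are hypothetical; only the ksail.io/autoscaled label key and the DoesNotExist operator come from the description above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        nodeAffinity:
          # Soft rule: prefer nodes without the autoscaler label, but still
          # allow scheduling onto autoscaler nodes when baseline workers are full.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: ksail.io/autoscaled
                    operator: DoesNotExist
      containers:
        - name: web
          image: nginx
```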

Each pool can declare labels and taints that KSail applies to every node provisioned in that pool — useful for steering workloads onto a specific pool (nodeSelector/nodeAffinity) or reserving a pool for tolerating workloads (e.g. GPU nodes).

KSail applies them through two complementary mechanisms so they are correct both on the live node and during scaling decisions:

  1. On the real Node — baked into each pool’s Talos worker cloud-init as machine.nodeLabels / machine.nodeTaints, which Talos reconciles onto the Node (this is what actually labels/taints the running node, per the mechanism described above).
  2. On the scale-from-zero template — mirrored into the autoscaler’s per-pool nodeConfigs[].labels/taints so the autoscaler knows, before any node of the pool exists, that scaling the pool would produce nodes carrying those labels/taints. Without this, the autoscaler would never scale a tainted pool up from zero (no template node tolerates the pending pod) and could scale the wrong pool for a label-selecting pod.

Under the hood this is delivered via the autoscaler’s HCLOUD_CLUSTER_CONFIG (per-pool cloud-init, image, labels, and taints), which KSail writes to the cluster-autoscaler-config Secret. Changing a pool’s labels or taints on ksail cluster update is applied in place and recycles that pool’s existing autoscaler nodes so they pick up the change.
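To illustrate how a workload targets such a pool, the sketch below selects and tolerates the label and taint from the gpu-nbg1 pool in the autoscaler example earlier on this page (workload: gpu and dedicated=gpu:NoSchedule). The Pod name, container name, and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                  # hypothetical Pod name
spec:
  # Only schedule onto nodes carrying the pool's label.
  nodeSelector:
    workload: gpu
  # Tolerate the pool's taint so the Pod is allowed onto those nodes.
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: main
      image: nginx               # placeholder image
```

With the pool's min set to 0, a pending Pod like this is the kind of demand that lets the autoscaler scale the pool up from zero, because the mirrored template node carries the matching label and taint.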

Set HCLOUD_TOKEN as described in the Configuration section above, then verify it is exported in your shell.

ksail cluster init \
  --name my-hetzner-cluster \
  --distribution Talos \
  --provider Hetzner \
  --control-planes 1 \
  --workers 2

This creates ksail.yaml and a talos/ directory for Talos configuration patches.

ksail cluster create

KSail creates Hetzner Cloud servers, boots them with Talos Linux, bootstraps Kubernetes, configures kubectl context, and installs the Hetzner Cloud Controller Manager and CSI driver.

Verify the cluster:

ksail cluster info
kubectl get nodes -o wide
kubectl get pods -n kube-system

Delete the cluster when you no longer need it:

ksail cluster delete

[!WARNING] This deletes the Hetzner Cloud servers, load balancers, and associated resources. You will stop being charged, but the operation is irreversible.

KSail provisions Hetzner Cloud servers running Talos Linux, connected via a private network. The Hetzner Cloud Controller Manager provides native load balancer integration, and the Hetzner CSI Driver provisions persistent volumes backed by Hetzner Block Storage.

graph TB
    subgraph "Your Machine"
        KSAIL["ksail CLI"]
    end

    subgraph "Hetzner Cloud"
        API["Hetzner Cloud API"]
        NET["Private Network"]
        LB["Cloud Load Balancer"]

        subgraph "Servers"
            CP["Control Plane (Talos)"]
            W1["Worker Node 1 (Talos)"]
            W2["Worker Node 2 (Talos)"]
        end
    end

    KSAIL -->|"HCLOUD_TOKEN"| API
    API --> CP
    API --> W1
    API --> W2
    CP --- NET
    W1 --- NET
    W2 --- NET
    LB -->|"traffic"| W1
    LB -->|"traffic"| W2
    CP -.->|"kubeconfig"| KSAIL

When using the Hetzner provider, KSail automatically installs:

  • Hetzner Cloud Controller Manager — Provisions Hetzner Cloud Load Balancers for type: LoadBalancer services
  • Hetzner CSI Driver — Provisions Hetzner Block Storage volumes for PersistentVolumeClaim resources
  • Placement groups — Distributes servers across physical hosts for high availability (configurable)
  • Cluster Autoscaler — Installed automatically when autoscaler.node.enabled: true (or the deprecated nodeAutoscaling: Enabled). Scales node pools defined under autoscaler.node.pools based on pending pod demand.
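As a sketch of how the CSI driver is typically consumed, the PersistentVolumeClaim below requests a Hetzner volume. The StorageClass name hcloud-volumes is the upstream Hetzner CSI driver's default and is an assumption here; confirm the name in your cluster with kubectl get storageclass. The claim name is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                         # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes   # assumption: upstream CSI default name
  resources:
    requests:
      storage: 10Gi                  # Hetzner volumes start at 10 GB
```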

KSail applies two independent firewall layers for Hetzner Talos clusters:

  1. Hetzner Cloud Firewall — Restricts external traffic at the network perimeter. KSail manages three rules: Talos API (50000/tcp), Kubernetes API (6443/tcp), and ICMP. Ports for cluster-internal communication (etcd, kubelet, trustd) are not exposed because nodes communicate over the private Hetzner Cloud Network. When allowedCidrs is set, the Kubernetes and Talos API rules restrict source IPs to those CIDR blocks instead of 0.0.0.0/0 and ::/0.
  2. Talos OS-level ingress firewall — Defense-in-depth at the node level, enabled by default (spec.provider.hetzner.ingressFirewall: Enabled). Blocks all ingress by default; opens only the ports required for each node role (control plane vs worker), restricted to the cluster subnet where appropriate. When allowedCidrs is set, the Kubernetes API and Talos API NetworkRuleConfig rules also restrict source IPs to those CIDRs. Disable with ingressFirewall: Disabled if you manage node-level firewall rules yourself.
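A minimal sketch of restricting both API ports to known networks follows. The field names are the ones described above; the CIDR blocks are documentation ranges and should be replaced with your own office, VPN, or CI egress addresses.

```yaml
spec:
  provider:
    hetzner:
      # Keep the node-level Talos firewall on (this is the default).
      ingressFirewall: "Enabled"
      # Applied to the Kubernetes API (6443) and Talos API (50000) rules
      # in both firewall layers.
      allowedCidrs:
        - "203.0.113.0/24"
        - "198.51.100.0/24"
```

Make sure the machine that runs KSail falls inside one of the listed ranges, since KSail itself needs the Talos and Kubernetes APIs.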

[!NOTE] Upgrading from older KSail versions: Clusters created before the firewall hardening change used 6 Hetzner Cloud Firewall rules. KSail now manages 3 rules (Talos API, Kubernetes API, and ICMP), with etcd, kubelet, and trustd access kept on the private Hetzner Cloud Network. Running ksail cluster update automatically migrates existing clusters to the current rule set.

When spec.cluster.talos.schematicId is set, KSail manages a Hetzner Cloud snapshot image (instead of booting from the ISO) with the following lifecycle:

  • Create: ksail cluster create calls the Talos factory to download the raw disk image for the specified schematic and Talos version, uploads it to Hetzner Cloud as a snapshot, and labels it with ksail.io/talos-version, ksail.io/talos-schematic, and ksail.io/cluster. If a matching snapshot already exists, it is reused.
  • Reuse: Subsequent ksail cluster create calls find the existing snapshot by label and skip the upload step.
  • Delete: ksail cluster delete --delete-storage removes the snapshot along with the cluster. Without --delete-storage, the snapshot is retained (useful when recreating the cluster).

[!NOTE] Snapshots are scoped per cluster name, so deleting one cluster does not affect snapshots used by another cluster with the same schematic.
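Enabling the snapshot path could look like the sketch below. The schematic ID is a placeholder to fill in from the Talos factory, and the exact format of the version string (with or without a leading v) is an assumption; check the Declarative Configuration reference for the accepted form.

```yaml
spec:
  cluster:
    distribution: Talos
    provider: Hetzner
    talos:
      # Required together with schematicId. The value format is an assumption.
      version: "v1.12.4"
      # Placeholder: paste the schematic ID generated at factory.talos.dev.
      schematicId: "<your-schematic-id>"
```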

Autoscaler-managed nodes are provisioned by the Cluster Autoscaler itself — booting the Talos snapshot and worker config baked into the cluster-autoscaler-config Secret — not by KSail directly. They are therefore not part of the in-place rolling Talos/Kubernetes upgrade ksail cluster update runs against the static control planes and workers (those are matched by the ksail.owned label; autoscaler nodes are not).

To keep them on the baseline, when a ksail cluster update changes the Talos or Kubernetes version, KSail:

  1. Rebuilds the Talos snapshot at the new version and refreshes the cluster-autoscaler-config Secret (so new autoscaler nodes boot the new version), then restarts the cluster-autoscaler to load it.
  2. Recycles the existing autoscaler nodes so they follow the new baseline instead of drifting: after the restarted autoscaler is ready, each autoscaler node is cordoned and drained one at a time (through the Kubernetes eviction API, honoring PodDisruptionBudgets) and its Hetzner server is deleted. The cluster-autoscaler then provisions any still-needed capacity from the refreshed snapshot on demand.

This mirrors the upstream Cluster Autoscaler model — the autoscaler owns node creation, so KSail removes the stale nodes and lets the autoscaler replace them rather than upgrading ephemeral, compute-only nodes in place. Recycling runs only when the version actually changes; a no-op cluster update leaves autoscaler nodes untouched.

[!NOTE] Recycling drains nodes one at a time, so a strict PodDisruptionBudget can slow or block it. If pods cannot be evicted within the drain timeout, the update fails and surfaces the condition instead of forcibly evicting workloads.
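For reference, a PodDisruptionBudget that still lets a drain make progress might look like the sketch below. The names and label are hypothetical, and it assumes the workload runs at least two replicas so that one Pod can always be evicted.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                  # hypothetical name
spec:
  # Allow one Pod at a time to be disrupted, which matches the
  # one-node-at-a-time drain described above.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web                   # hypothetical Pod label
```

By contrast, a budget such as minAvailable equal to the replica count leaves no room for eviction and is the kind of strict setting that would block recycling.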

List clusters:

ksail cluster list

Show cluster information:

ksail cluster info

Displays Kubernetes control-plane and core service endpoints for the current context.

ksail cluster update

Applies in-place changes to components (CNI, GitOps engine, cert-manager, etc.). Node scaling changes for Talos are applied in-place. A Talos or Kubernetes version bump rolls the static nodes in place and recycles autoscaler nodes so they follow the new baseline (see Autoscaler Node Upgrades). Changes to hetzner.location, hetzner.controlPlaneServerType, or hetzner.networkCidr require cluster recreation. See the Update Behavior table for details.

kubectl create deployment web --image=nginx --replicas=3
kubectl expose deployment web --port=80 --type=LoadBalancer
kubectl get svc web --watch

The type: LoadBalancer service provisions a real Hetzner Cloud Load Balancer with a public IP.

hcloud token is not set — The HCLOUD_TOKEN environment variable is missing or empty. Run echo $HCLOUD_TOKEN to check. Re-export if needed. If you use a custom variable name, verify it matches spec.provider.hetzner.tokenEnvVar in ksail.yaml.

server type not found — The server type name in controlPlaneServerType, workerServerType, or an autoscaler pool does not exist in the Hetzner Cloud API. Check available types in the Hetzner Cloud Console or with hcloud server-type list.

server type unavailable in all configured locations — The requested server type exists but is not available in your primary location or any configured fallback locations. KSail runs this precheck before creating any infrastructure, so no partial resources are left behind. Add more fallback locations or choose a different server type:

spec:
  provider:
    hetzner:
      location: "fsn1"
      fallbackLocations: ["nbg1", "hel1"]

Server creation fails with resource unavailability — If the precheck passes but server creation still fails (transient capacity issue), configure spec.provider.hetzner.fallbackLocations to try alternative datacenter locations automatically (see example above).

Placement group errors: Hetzner limits each spread placement group to 10 servers. Reduce your node count or set placementGroupStrategy: "None". For best-effort HA, set placementGroupFallbackToNone: true to fall back automatically when spread placement fails.
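The best-effort option can be sketched as follows, using only the field names from the configuration example earlier on this page:

```yaml
spec:
  provider:
    hetzner:
      # Try to spread servers across physical hosts first.
      placementGroupStrategy: "Spread"
      # If spread placement fails, continue without a placement group
      # instead of failing the create.
      placementGroupFallbackToNone: true
```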

autoscaler configuration exceeds Hetzner server limit — The reachable total node count (controlPlanes + workers + sum(pool.max), clamped by maxNodesTotal when set) exceeds hetzner.serverLimit. Increase serverLimit, reduce your pool max values, or lower maxNodesTotal to cap the cluster total.

cloud provider requires an external registry — Hetzner Cloud servers cannot reach Docker-based local registries running on your machine. When spec.cluster.localRegistry is enabled, it must point to an internet-accessible registry (e.g., ghcr.io/myorg). KSail returns this error early if a non-external registry is configured.

context deadline exceeded or connection errors — Verify HCLOUD_TOKEN is valid and has read/write permissions. Check connectivity with curl -sI https://api.hetzner.cloud/v1/servers -H "Authorization: Bearer $HCLOUD_TOKEN".

Cluster deletion leaves orphaned resources — If ksail cluster delete is interrupted, manually clean up in the Hetzner Cloud Console: delete servers, load balancers, networks, and placement groups associated with your cluster name.