Hetzner Provider

The Hetzner provider creates Kubernetes cluster nodes as Hetzner Cloud servers running Talos Linux. It provisions real cloud infrastructure with public IPs, private networking, load balancers, and persistent volumes — ideal for production-grade clusters and cloud testing.

The Hetzner provider is ideal when you:

  • Need production-grade Kubernetes on affordable European cloud infrastructure
  • Want real cloud load balancers and persistent volumes (not emulated locally)
  • Run performance or integration tests that require dedicated server resources
  • Deploy Talos Linux clusters with high-availability placement groups

Prerequisites

  1. Hetzner Cloud account — Sign up at hetzner.com/cloud
  2. API token — Create one in the Hetzner Cloud Console → Project → Security → API Tokens (read/write permissions)
  3. Talos ISO or Schematic — Either a Talos Linux ISO must be available in your Hetzner Cloud project (x86 default: 125127 for Talos 1.12.4; for ARM, look up the matching ISO ID in the Hetzner Cloud Console), or a Talos factory schematic ID can be provided via spec.cluster.talos.schematicId so KSail builds the snapshot automatically. See Talos options.
  4. Docker — Required locally for ksail CLI operations

Export the Hetzner Cloud API token so KSail can manage servers:

export HCLOUD_TOKEN=your-hetzner-api-token

By default, KSail reads from HCLOUD_TOKEN. To use a different environment variable name, set spec.provider.hetzner.tokenEnvVar in ksail.yaml.

The Hetzner provider is configured through spec.provider.hetzner in ksail.yaml:

apiVersion: ksail.io/v1alpha1
kind: Cluster
metadata:
  name: my-hetzner-cluster
spec:
  cluster:
    distribution: Talos
    provider: Hetzner
    controlPlanes: 1
    workers: 2
    talos:
      # ISO image ID for Hetzner (default: 125127 for Talos 1.12.4 x86;
      # look up the matching ARM ISO ID in the Hetzner Cloud Console).
      # Ignored when schematicId is set; KSail builds a snapshot instead.
      iso: 125127
      # Optional: Talos factory schematic ID. When set, KSail builds a Hetzner
      # snapshot from the Talos factory URL and uses it instead of the ISO.
      # Obtain a schematic ID from https://factory.talos.dev.
      # schematicId: ""
  provider:
    hetzner:
      # Server types (default: cx23 for both)
      controlPlaneServerType: "cx23"
      workerServerType: "cx23"
      # Datacenter location (default: fsn1)
      location: "fsn1"
      # Private network settings
      networkCidr: "10.0.0.0/16"
      # networkName: "my-network" # defaults to <cluster>-network
      # Optional: SSH key for server access (Talos API is primary)
      # sshKeyName: "my-key"
      # Optional: override the default env var name (default: HCLOUD_TOKEN)
      # tokenEnvVar: "MY_CUSTOM_HCLOUD_TOKEN"
      # Placement group settings for HA
      placementGroupStrategy: "Spread" # or "None"
      # placementGroup: "my-placement" # defaults to <cluster>-placement
      # fallbackLocations: ["nbg1", "hel1"]
      # placementGroupFallbackToNone: false
      # Talos OS-level ingress firewall (default: Enabled, provides defense-in-depth)
      # ingressFirewall: "Enabled" # or "Disabled"
      # Restrict Kubernetes API (6443) and Talos API (50000) access to these CIDRs.
      # When empty, both APIs are open to 0.0.0.0/0 and ::/0 (all IPv4 and IPv6).
      # Applied to both the Hetzner Cloud Firewall and the Talos OS-level ingress firewall.
      # allowedCidrs: ["203.0.113.0/24", "198.51.100.0/24"]
      # Per-role public networking (default: both IPv4 and IPv6 enabled).
      # Set workerPublicIPv4: false for IPv4-less workers reached over the private
      # network; this requires KSail to run with private-network reachability and a
      # NAT gateway (or working IPv6) for egress. See "IPv4-less nodes" below.
      # workerPublicIPv4: true
      # workerPublicIPv6: true
      # controlPlanePublicIPv4: true
      # controlPlanePublicIPv6: true
      # Maximum Hetzner servers for this cluster (default: 10)
      # serverLimit: 10

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| hetzner.controlPlaneServerType | string | cx23 | Hetzner server type for control-plane nodes |
| hetzner.workerServerType | string | cx23 | Hetzner server type for worker nodes |
| hetzner.location | string | fsn1 | Datacenter location (fsn1, nbg1, hel1) |
| hetzner.networkName | string | <cluster>-network | Private network name |
| hetzner.networkCidr | string | 10.0.0.0/16 | Network CIDR block |
| hetzner.sshKeyName | string | (empty) | SSH key for server access (optional) |
| hetzner.tokenEnvVar | string | HCLOUD_TOKEN | Environment variable containing the API token |
| hetzner.placementGroupStrategy | string | Spread | Spread (HA) or None (no placement group) |
| hetzner.placementGroup | string | <cluster>-placement | Placement group name |
| hetzner.fallbackLocations | list | ["nbg1", "hel1"] | Alternative locations if primary is unavailable |
| hetzner.placementGroupFallbackToNone | bool | false | Fall back to no placement group on capacity issues |
| hetzner.ingressFirewall | string | Enabled | Talos OS-level ingress firewall (Enabled or Disabled). Generates NetworkDefaultActionConfig + NetworkRuleConfig patches as defense-in-depth alongside the Hetzner Cloud Firewall. |
| hetzner.allowedCidrs | list | (empty) | CIDR blocks allowed to access the Kubernetes API (6443) and Talos API (50000) on control-plane nodes. When empty, both APIs are open to 0.0.0.0/0 and ::/0. Applied to both the Hetzner Cloud Firewall and the Talos OS-level ingress firewall for defense-in-depth. Example: ["203.0.113.0/24", "2001:db8::/32"] |
| hetzner.workerPublicIPv4 | bool | true | Assign a public IPv4 to worker nodes. Set false for IPv4-less workers reached over the private network. See IPv4-less nodes. |
| hetzner.workerPublicIPv6 | bool | true | Assign a public IPv6 to worker nodes (IPv6 is free on Hetzner). |
| hetzner.controlPlanePublicIPv4 | bool | true | Assign a public IPv4 to control-plane nodes. Set false for IPv4-less control planes; the kube/talos endpoint then resolves to the private-network IP (cluster reachable only from inside the private network). |
| hetzner.controlPlanePublicIPv6 | bool | true | Assign a public IPv6 to control-plane nodes. |
| hetzner.serverLimit | int | 10 | Maximum total Hetzner servers for this cluster (your project quota). Validated only when Hetzner node autoscaling is enabled: KSail rejects configs whose reachable total exceeds it, where the reachable total is control-planes + workers + sum(pool.max), clamped to autoscaler.node.maxNodesTotal when that global cap is set. Set to 0 to use the default of 10. |

IPv4-less nodes

By default every Hetzner node receives a public IPv4 (billed, ~€0.50/mo each) and a public IPv6 (free). Setting workerPublicIPv4: false (and/or controlPlanePublicIPv4: false) provisions nodes without a public IPv4, reducing both cost and public attack surface: the node is no longer directly reachable from the internet over IPv4.

Because KSail manages each node over its Talos API, an IPv4-less node changes how KSail reaches it:

  • KSail connects over the private network. When a node has no public IPv4, KSail uses the node's private-network IP for the Talos API, config apply, bootstrap, and (for control planes) the kube/talos endpoint. This requires KSail itself to have a route into the private network — run KSail from inside the network, over a VPN/WireGuard, or via a bastion. Without that route, provisioning hangs waiting for the Talos API.
  • Nodes still need egress. A node with no public IPv4 cannot pull container images, reach the Hetzner API (CCM/CSI), or join the cluster unless the private network provides egress. Provide a NAT gateway on the private network, or rely on public IPv6 where every required registry is reachable over IPv6.
  • IPv6-only egress is unreliable for image pulls. ghcr.io (and some other registries) are not consistently reachable over IPv6, so an IPv4-less node relying solely on IPv6 egress may fail to pull images. A NAT gateway is the robust option.
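
As a minimal sketch of the settings discussed above, the following ksail.yaml fragment removes the public IPv4 from workers only. It uses only fields from the configuration reference; it assumes you already run KSail with a route into the private network and have a NAT gateway in place, neither of which this fragment sets up.

```yaml
spec:
  provider:
    hetzner:
      # Workers get no public IPv4; KSail reaches them over the private network.
      workerPublicIPv4: false
      # Public IPv6 stays on (free on Hetzner), but it may not reach every registry.
      workerPublicIPv6: true
      # Control planes keep their public IPv4, so the kube/talos endpoint stays public.
      controlPlanePublicIPv4: true
```

Keeping the control-plane IPv4 enabled means kubectl still works from outside the private network, while the worker prerequisites listed above continue to apply.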

IPv4-less control planes. Disabling controlPlanePublicIPv4 makes the generated kubeconfig and talosconfig point at the control plane's private IP, so kubectl/ksail only work from inside the private network (or behind a load balancer / floating IP you place in front).

Reachability. Because an IPv4-less node is driven entirely over the private network, KSail must be able to route into it. If KSail can't reach an IPv4-less node's Talos API, provisioning fails with an actionable error pointing at the private-network-route and egress prerequisites above (rather than an opaque timeout). KSail also warns at config time when a role disables both IP families, since that node then depends on a NAT gateway for egress.

Autoscaler nodes. Autoscaler-created nodes inherit the worker public-net setting cluster-wide (KSail sets HCLOUD_PUBLIC_IPV4/HCLOUD_PUBLIC_IPV6 on the cluster-autoscaler). The upstream Hetzner cluster-autoscaler has no per-pool public-net control, so all autoscaler pools share the worker setting.

Load balancers. KSail does not create Hetzner load balancers, but when you expose Services with the Hetzner CCM and your nodes are IPv4-less, annotate the Service with load-balancer.hetzner.cloud/use-private-ip: "true" so the load balancer targets nodes by their private IP.
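
A Service carrying that annotation might look like the sketch below. The Service name, selector, and ports are hypothetical placeholders; only the annotation key and value come from the text above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web # hypothetical Service name
  annotations:
    # Ask the Hetzner CCM to register load balancer targets by private IP.
    load-balancer.hetzner.cloud/use-private-ip: "true"
spec:
  type: LoadBalancer
  selector:
    app: web # hypothetical pod label
  ports:
    - port: 80
      targetPort: 80
```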

Talos options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| talos.iso | int | 125127 | Hetzner Cloud ISO/image ID for Talos 1.12.4 x86 (look up the matching ARM ID in the Hetzner Cloud Console). Ignored when schematicId is set. |
| talos.schematicId | string | (empty) | Talos factory schematic ID. When set, KSail automatically builds and manages a Hetzner snapshot image instead of booting from the ISO. spec.cluster.talos.version must also be set. Not supported for ARM64 server types (cax*). The snapshot is deleted when ksail cluster delete --delete-storage is run. |

KSail supports node and pod autoscaling for Hetzner clusters via spec.cluster.autoscaler. Node pools are Hetzner-specific and map directly to the Kubernetes Cluster Autoscaler node group format.

spec:
  cluster:
    autoscaler:
      node:
        enabled: true
        expander: LeastWaste # LeastWaste | LeastNodes | Random (Price is not supported for Hetzner)
        maxNodesTotal: 20 # whole-cluster node ceiling (--max-nodes-total); 0 = no cap
        scaleDownUnneededTime: 10m
        pools:
          - name: workers-fsn1
            serverType: cx23
            location: fsn1
            min: 1
            max: 5
          - name: gpu-nbg1
            serverType: cx33
            location: nbg1
            min: 0
            max: 3
            labels: # applied to every node in this pool
              workload: gpu
            taints: # only pods that tolerate this land here
              - key: dedicated
                value: gpu
                effect: NoSchedule
      pod:
        horizontal: Disabled # Enabled | Disabled (pod autoscaler setting; metrics-server is configured separately)
        vertical: Disabled # Enabled | Disabled (reserved for future VPA support; not yet implemented)
  provider:
    hetzner:
      serverLimit: 20 # project quota; must be ≥ the reachable total (controlPlanes + workers + sum(pool.max), capped by maxNodesTotal)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| autoscaler.node.enabled | bool | false | true: defer node scaling to an external autoscaler; false: KSail manages node counts directly |
| autoscaler.node.expander | string | LeastWaste | Expander strategy: LeastWaste, LeastNodes, Random. (Price is rejected by KSail with a validation error for the Hetzner provider, because the Hetzner cloud provider does not implement the pricing API.) |
| autoscaler.node.maxNodesTotal | int | 0 | Maximum total nodes in the whole cluster (control-planes + workers + autoscaler nodes), passed to the cluster-autoscaler --max-nodes-total flag; this is not an autoscaler-only budget. 0 disables the global cap. Should be ≤ serverLimit. |
| autoscaler.node.scaleDownUnneededTime | string | 10m | How long a node must be unneeded before it is eligible for scale-down (e.g. 10m) |
| autoscaler.node.pools[].name | string | - | Unique pool name (DNS-1123 label, max 63 chars) |
| autoscaler.node.pools[].serverType | string | - | Hetzner server type for this pool (e.g. cx23, cax11) |
| autoscaler.node.pools[].location | string | - | Hetzner datacenter location for this pool (e.g. fsn1) |
| autoscaler.node.pools[].min | int | - | Minimum nodes in this pool |
| autoscaler.node.pools[].max | int | - | Maximum nodes in this pool |
| autoscaler.node.pools[].labels | map | (none) | Kubernetes node labels applied to every node in this pool (see Per-pool labels and taints). Keys must be valid Kubernetes label keys. |
| autoscaler.node.pools[].taints[] | list | (none) | Kubernetes node taints applied to every node in this pool. Each entry has key (required), value (optional), and effect (NoSchedule, PreferNoSchedule, or NoExecute). |
| autoscaler.pod.horizontal | string | Disabled | Enabled activates HPA support. Metrics-server is a prerequisite; configure it separately via spec.cluster.metricsServer. |
| autoscaler.pod.vertical | string | Disabled | Reserved for future VPA support; not yet implemented. |

Distinguishing autoscaler nodes from baseline workers

KSail stamps the Kubernetes node label ksail.io/autoscaled=true on every autoscaler-provisioned worker, and on no static baseline worker. This gives workloads a discriminator to key node affinity off of — for example a soft preferredDuringSchedulingIgnoredDuringExecution nodeAffinity preferring nodes where ksail.io/autoscaled DoesNotExist, so pods land on baseline workers first and only spill onto autoscaler nodes under real pressure (keeping autoscaler nodes empty and quick to scale down).
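
The soft affinity described above could be written roughly as follows. The Deployment name, labels, and image are hypothetical; the label key ksail.io/autoscaled and the DoesNotExist operator are taken from the paragraph above, and the weight of 100 is an arbitrary illustrative choice.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        nodeAffinity:
          # Soft preference: the scheduler may still use autoscaler nodes
          # when baseline workers have no room.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: ksail.io/autoscaled
                    operator: DoesNotExist
      containers:
        - name: web
          image: nginx
```

Because this is a preference and not a requirement, pods are never left pending solely because baseline workers are full.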

Per-pool labels and taints

Each pool can declare labels and taints that KSail applies to every node provisioned in that pool. This is useful for steering workloads onto a specific pool (nodeSelector/nodeAffinity) or reserving a pool for workloads that tolerate its taints (e.g. GPU nodes).

KSail applies them through two complementary mechanisms so they are correct both on the live node and during scaling decisions:

  1. On the real Node — baked into each pool's Talos worker cloud-init as machine.nodeLabels / machine.nodeTaints, which Talos reconciles onto the Node (this is what actually labels/taints the running node, per the mechanism described above).
  2. On the scale-from-zero template — mirrored into the autoscaler's per-pool nodeConfigs[].labels/taints so the autoscaler knows, before any node of the pool exists, that scaling the pool would produce nodes carrying those labels/taints. Without this, the autoscaler would never scale a tainted pool up from zero (no template node tolerates the pending pod) and could scale the wrong pool for a label-selecting pod.

Under the hood this is delivered via the autoscaler's HCLOUD_CLUSTER_CONFIG (per-pool cloud-init, image, labels, and taints), which KSail writes to the cluster-autoscaler-config Secret. Changing a pool's labels or taints on ksail cluster update is applied in place and recycles that pool's existing autoscaler nodes so they pick up the change.
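
To illustrate the workload side, here is a sketch of a Pod that targets the gpu-nbg1 pool from the autoscaler example earlier on this page. The Pod name, image, and command are hypothetical; the label (workload: gpu) and taint (dedicated=gpu:NoSchedule) match that example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job # hypothetical
spec:
  # Select nodes carrying the pool's label.
  nodeSelector:
    workload: gpu
  # Tolerate the pool's taint so the pod is allowed to schedule there.
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
```

With the pool at zero nodes, a pending pod like this is what the scale-from-zero template is meant to match, since the template advertises the same label and taint.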

Set HCLOUD_TOKEN as described in the Configuration section above, then verify it is exported in your shell.

ksail cluster init \
  --name my-hetzner-cluster \
  --distribution Talos \
  --provider Hetzner \
  --control-planes 1 \
  --workers 2

This creates ksail.yaml and a talos/ directory for Talos configuration patches.

ksail cluster create

KSail creates Hetzner Cloud servers, boots them with Talos Linux, bootstraps Kubernetes, configures kubectl context, and installs the Hetzner Cloud Controller Manager and CSI driver.

ksail cluster info
kubectl get nodes -o wide
kubectl get pods -n kube-system

When you no longer need the cluster, delete it:

ksail cluster delete

KSail provisions Hetzner Cloud servers running Talos Linux, connected via a private network. The Hetzner Cloud Controller Manager provides native load balancer integration, and the Hetzner CSI Driver provisions persistent volumes backed by Hetzner Block Storage.

graph TB
    subgraph "Your Machine"
        KSAIL["ksail CLI"]
    end

    subgraph "Hetzner Cloud"
        API["Hetzner Cloud API"]
        NET["Private Network"]
        LB["Cloud Load Balancer"]

        subgraph "Servers"
            CP["Control Plane (Talos)"]
            W1["Worker Node 1 (Talos)"]
            W2["Worker Node 2 (Talos)"]
        end
    end

    KSAIL -->|"HCLOUD_TOKEN"| API
    API --> CP
    API --> W1
    API --> W2
    CP --- NET
    W1 --- NET
    W2 --- NET
    LB -->|"traffic"| W1
    LB -->|"traffic"| W2
    CP -.->|"kubeconfig"| KSAIL

When using the Hetzner provider, KSail automatically installs:

  • Hetzner Cloud Controller Manager — Provisions Hetzner Cloud Load Balancers for type: LoadBalancer services
  • Hetzner CSI Driver — Provisions Hetzner Block Storage volumes for PersistentVolumeClaim resources
  • Placement groups — Distributes servers across physical hosts for high availability (configurable)
  • Cluster Autoscaler — Installed automatically when autoscaler.node.enabled: true (or the deprecated nodeAutoscaling: Enabled). Scales node pools defined under autoscaler.node.pools based on pending pod demand.
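
As a rough illustration of the CSI integration, a PersistentVolumeClaim like the one below would request a Hetzner Block Storage volume. This sketch assumes the CSI driver's StorageClass is the cluster default, which is why storageClassName is omitted; the claim name is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi # Hetzner volumes start at 10 GB
```

If the default StorageClass assumption does not hold in your cluster, check kubectl get storageclass and set storageClassName explicitly.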

KSail applies two independent firewall layers for Hetzner Talos clusters:

  1. Hetzner Cloud Firewall — Restricts external traffic at the network perimeter. KSail manages three rules: Talos API (50000/tcp), Kubernetes API (6443/tcp), and ICMP. Ports for cluster-internal communication (etcd, kubelet, trustd) are not exposed because nodes communicate over the private Hetzner Cloud Network. When allowedCidrs is set, the Kubernetes and Talos API rules restrict source IPs to those CIDR blocks instead of 0.0.0.0/0 and ::/0.
  2. Talos OS-level ingress firewall — Defense-in-depth at the node level, enabled by default (spec.provider.hetzner.ingressFirewall: Enabled). Blocks all ingress by default; opens only the ports required for each node role (control plane vs worker), restricted to the cluster subnet where appropriate. When allowedCidrs is set, the Kubernetes API and Talos API NetworkRuleConfig rules also restrict source IPs to those CIDRs. Disable with ingressFirewall: Disabled if you manage node-level firewall rules yourself.
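
A minimal sketch of locking both API ports down to known networks, using only fields from the configuration reference. The CIDRs are documentation ranges and must be replaced with your own.

```yaml
spec:
  provider:
    hetzner:
      # Keep the Talos OS-level firewall on (this is the default).
      ingressFirewall: "Enabled"
      # Applied to both the Hetzner Cloud Firewall and the Talos ingress firewall.
      allowedCidrs:
        - "203.0.113.0/24" # example IPv4 range, replace with yours
        - "2001:db8::/32" # example IPv6 range, replace with yours
```

Make sure the machine running KSail falls inside one of these ranges, since KSail itself needs the Talos and Kubernetes APIs.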

When spec.cluster.talos.schematicId is set, KSail manages a Hetzner Cloud snapshot image (instead of booting from the ISO) with the following lifecycle:

  • Create: ksail cluster create calls the Talos factory to download the raw disk image for the specified schematic and Talos version, uploads it to Hetzner Cloud as a snapshot, and labels it with ksail.io/talos-version, ksail.io/talos-schematic, and ksail.io/cluster. If a matching snapshot already exists, it is reused.
  • Reuse: Subsequent ksail cluster create calls find the existing snapshot by label and skip the upload step.
  • Delete: ksail cluster delete --delete-storage removes the snapshot along with the cluster. Without --delete-storage, the snapshot is retained (useful when recreating the cluster).
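
A configuration that opts into this snapshot lifecycle might look like the sketch below. The schematic ID is a placeholder you obtain from https://factory.talos.dev, and the exact format of the version string is an assumption.

```yaml
spec:
  cluster:
    distribution: Talos
    provider: Hetzner
    talos:
      # Required alongside schematicId; the string format shown is assumed.
      version: "1.12.4"
      # Placeholder: paste the schematic ID generated by the Talos factory.
      schematicId: "<your-schematic-id>"
```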

Autoscaler Node Upgrades

Autoscaler-managed nodes are provisioned by the Cluster Autoscaler itself, which boots the Talos snapshot and worker config baked into the cluster-autoscaler-config Secret; KSail does not provision them directly. They are therefore not part of the in-place rolling Talos/Kubernetes upgrade that ksail cluster update runs against the static control planes and workers (those are matched by the ksail.owned label; autoscaler nodes are not).

To keep them on the baseline, when a ksail cluster update changes the Talos or Kubernetes version, KSail:

  1. Rebuilds the Talos snapshot at the new version and refreshes the cluster-autoscaler-config Secret (so new autoscaler nodes boot the new version), then restarts the cluster-autoscaler to load it.
  2. Recycles the existing autoscaler nodes so they follow the new baseline instead of drifting: after the restarted autoscaler is ready, each autoscaler node is cordoned and drained one at a time (through the Kubernetes eviction API, honoring PodDisruptionBudgets) and its Hetzner server is deleted. The cluster-autoscaler then provisions any still-needed capacity from the refreshed snapshot on demand.

This mirrors the upstream Cluster Autoscaler model — the autoscaler owns node creation, so KSail removes the stale nodes and lets the autoscaler replace them rather than upgrading ephemeral, compute-only nodes in place. Recycling runs only when the version actually changes; a no-op cluster update leaves autoscaler nodes untouched.

ksail cluster list

ksail cluster info

Displays Kubernetes control-plane and core service endpoints for the current context.

ksail cluster update

Applies in-place changes to components (CNI, GitOps engine, cert-manager, etc.). Node scaling changes for Talos are applied in-place. A Talos or Kubernetes version bump rolls the static nodes in place and recycles autoscaler nodes so they follow the new baseline (see Autoscaler Node Upgrades). Changes to hetzner.location, hetzner.controlPlaneServerType, or hetzner.networkCidr require cluster recreation. See the Update Behavior table for details.

kubectl create deployment web --image=nginx --replicas=3
kubectl expose deployment web --port=80 --type=LoadBalancer
kubectl get svc web --watch

The type: LoadBalancer service provisions a real Hetzner Cloud Load Balancer with a public IP.

hcloud token is not set — The HCLOUD_TOKEN environment variable is missing or empty. Run echo $HCLOUD_TOKEN to check. Re-export if needed. If you use a custom variable name, verify it matches spec.provider.hetzner.tokenEnvVar in ksail.yaml.

server type not found — The server type name in controlPlaneServerType, workerServerType, or an autoscaler pool does not exist in the Hetzner Cloud API. Check available types in the Hetzner Cloud Console or with hcloud server-type list.

server type unavailable in all configured locations — The requested server type exists but is not available in your primary location or any configured fallback locations. KSail runs this precheck before creating any infrastructure, so no partial resources are left behind. Add more fallback locations or choose a different server type:

spec:
  provider:
    hetzner:
      location: "fsn1"
      fallbackLocations: ["nbg1", "hel1"]

Server creation fails with resource unavailability — If the precheck passes but server creation still fails (transient capacity issue), configure spec.provider.hetzner.fallbackLocations to try alternative datacenter locations automatically (see example above).

Placement group errors — Hetzner limits spread placement groups to 10 servers per datacenter. Reduce your node count or set placementGroupStrategy: "None". For best-effort HA, set placementGroupFallbackToNone: true to fall back automatically when spread placement fails.
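
For the best-effort option, a fragment along these lines keeps spread placement as the first choice and falls back when it cannot be satisfied. Both fields come from the configuration reference.

```yaml
spec:
  provider:
    hetzner:
      # Prefer spreading servers across physical hosts.
      placementGroupStrategy: "Spread"
      # Fall back to no placement group when spread placement fails.
      placementGroupFallbackToNone: true
```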

autoscaler configuration exceeds Hetzner server limit — The reachable total node count (controlPlanes + workers + sum(pool.max), clamped by maxNodesTotal when set) exceeds hetzner.serverLimit. Increase serverLimit, reduce your pool max values, or lower maxNodesTotal to cap the cluster total.
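
A worked example of that check, as a sketch with illustrative numbers. The arithmetic in the comments follows the formula stated above.

```yaml
spec:
  cluster:
    controlPlanes: 1
    workers: 2
    autoscaler:
      node:
        enabled: true
        maxNodesTotal: 0 # 0 = no global cap, so nothing clamps the total
        pools:
          - name: workers-fsn1
            serverType: cx23
            location: fsn1
            min: 1
            max: 5
          - name: gpu-nbg1
            serverType: cx33
            location: nbg1
            min: 0
            max: 3
  provider:
    hetzner:
      # Reachable total = 1 + 2 + (5 + 3) = 11, which exceeds 10: rejected.
      serverLimit: 10
```

Raising serverLimit to 11 or more, lowering a pool's max, or setting maxNodesTotal: 10 (which clamps the reachable total to 10) would each satisfy the check.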

cloud provider requires an external registry — Hetzner Cloud servers cannot reach Docker-based local registries running on your machine. When spec.cluster.localRegistry is enabled, it must point to an internet-accessible registry (e.g., ghcr.io/myorg). KSail returns this error early if a non-external registry is configured.

context deadline exceeded or connection errors — Verify HCLOUD_TOKEN is valid and has read/write permissions. Check connectivity with curl -sI https://api.hetzner.cloud/v1/servers -H "Authorization: Bearer $HCLOUD_TOKEN".

Cluster deletion leaves orphaned resources — If ksail cluster delete is interrupted, manually clean up in the Hetzner Cloud Console: delete servers, load balancers, networks, and placement groups associated with your cluster name.