Hetzner Provider
The Hetzner provider creates Kubernetes cluster nodes as Hetzner Cloud servers running Talos Linux. It provisions real cloud infrastructure with public IPs, private networking, load balancers, and persistent volumes — ideal for production-grade clusters and cloud testing.
When to Use the Hetzner Provider
The Hetzner provider is ideal when you:
- Need production-grade Kubernetes on affordable European cloud infrastructure
- Want real cloud load balancers and persistent volumes (not emulated locally)
- Run performance or integration tests that require dedicated server resources
- Deploy Talos Linux clusters with high-availability placement groups
[!NOTE] The Hetzner provider only supports the Talos distribution. For local Docker-based clusters, see the Docker Provider or the Talos distribution guide.
Prerequisites
- Hetzner Cloud account: Sign up at hetzner.com/cloud
- API token: Create one in the Hetzner Cloud Console → Project → Security → API Tokens (read/write permissions)
- Talos ISO or schematic: Either a Talos Linux ISO available in your Hetzner Cloud project (x86 default: 125127 for Talos 1.12.4; for ARM, look up the matching ISO ID in the Hetzner Cloud Console), or a Talos factory schematic ID set via spec.cluster.talos.schematicId so KSail builds the snapshot automatically. See Talos options.
- Docker: Required locally for ksail CLI operations
Configuration
Environment Variable
Export the Hetzner Cloud API token so KSail can manage servers:
```shell
export HCLOUD_TOKEN=your-hetzner-api-token
```
[!TIP] Add the export to your shell profile (~/.bashrc, ~/.zshrc) to persist it across sessions.
By default, KSail reads from HCLOUD_TOKEN. To use a different environment variable name, set spec.provider.hetzner.tokenEnvVar in ksail.yaml.
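The following Python sketch shows how the token lookup described above works. It is an illustration, not KSail's code. The function name resolve_hcloud_token is hypothetical. Treating an empty value the same as a missing one follows the "hcloud token is not set" entry under Troubleshooting.

```python
import os
from typing import Mapping, Optional

DEFAULT_TOKEN_ENV_VAR = "HCLOUD_TOKEN"


def resolve_hcloud_token(token_env_var: Optional[str] = None,
                         environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical sketch of reading the Hetzner Cloud API token.

    token_env_var mirrors spec.provider.hetzner.tokenEnvVar; when it is None
    or empty, the documented default HCLOUD_TOKEN is used instead.
    """
    name = token_env_var or DEFAULT_TOKEN_ENV_VAR
    token = environ.get(name, "")
    if not token:
        # A missing or empty variable is treated as "not set".
        raise RuntimeError(f"hcloud token is not set (looked in ${name})")
    return token


if __name__ == "__main__":
    # Inject a fake environment instead of reading the real one.
    fake_env = {"MY_CUSTOM_HCLOUD_TOKEN": "example-token"}
    print(resolve_hcloud_token("MY_CUSTOM_HCLOUD_TOKEN", fake_env))
```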
ksail.yaml Reference
The Hetzner provider is configured through spec.provider.hetzner in ksail.yaml:
```yaml
apiVersion: ksail.io/v1alpha1
kind: Cluster
metadata:
  name: my-hetzner-cluster
spec:
  cluster:
    distribution: Talos
    provider: Hetzner
    controlPlanes: 1
    workers: 2
    talos:
      # ISO image ID for Hetzner (default: 125127 for Talos 1.12.4 x86;
      # look up the matching ARM ISO ID in the Hetzner Cloud Console).
      # Ignored when schematicId is set; KSail builds a snapshot instead.
      iso: 125127
      # Optional: Talos factory schematic ID. When set, KSail builds a Hetzner
      # snapshot from the Talos factory URL and uses it instead of the ISO.
      # Obtain a schematic ID from https://factory.talos.dev.
      # schematicId: ""
  provider:
    hetzner:
      # Server types (default: cx23 for both)
      controlPlaneServerType: "cx23"
      workerServerType: "cx23"
      # Datacenter location (default: fsn1)
      location: "fsn1"
      # Private network settings
      networkCidr: "10.0.0.0/16"
      # networkName: "my-network"  # defaults to <cluster>-network
      # Optional: SSH key for server access (Talos API is primary)
      # sshKeyName: "my-key"
      # Optional: override the default env var name (default: HCLOUD_TOKEN)
      # tokenEnvVar: "MY_CUSTOM_HCLOUD_TOKEN"
      # Placement group settings for HA
      placementGroupStrategy: "Spread"  # or "None"
      # placementGroup: "my-placement"  # defaults to <cluster>-placement
      # fallbackLocations: ["nbg1", "hel1"]
      # placementGroupFallbackToNone: false
      # Talos OS-level ingress firewall (default: Enabled; provides defense-in-depth)
      # ingressFirewall: "Enabled"  # or "Disabled"
      # Restrict Kubernetes API (6443) and Talos API (50000) access to these CIDRs.
      # When empty, both APIs are open to 0.0.0.0/0 and ::/0 (all IPv4 and IPv6).
      # Applied to both the Hetzner Cloud Firewall and the Talos OS-level ingress firewall.
      # allowedCidrs: ["203.0.113.0/24", "198.51.100.0/24"]
      # Per-role public networking (default: both IPv4 and IPv6 enabled).
      # Set workerPublicIPv4: false for IPv4-less workers reached over the private
      # network; requires KSail to run with private-network reachability and a NAT
      # gateway (or working IPv6) for egress. See "IPv4-less Nodes" below.
      # workerPublicIPv4: true
      # workerPublicIPv6: true
      # controlPlanePublicIPv4: true
      # controlPlanePublicIPv6: true
      # Maximum Hetzner servers for this cluster (default: 10)
      # serverLimit: 10
```
The full spec.provider.hetzner field reference (every option with its type, default, and
description) is generated from the API types: see
spec.provider.hetzner (OptionsHetzner)
in the Declarative Configuration reference.
IPv4-less Nodes
By default every Hetzner node receives a public IPv4 (billed, ~€0.50/mo each) and a public IPv6
(free). Setting workerPublicIPv4: false (and/or controlPlanePublicIPv4: false) provisions nodes
without a public IPv4, reducing both cost and public attack surface — the node is no longer
directly reachable from the internet.
Because KSail manages each node over its Talos API, an IPv4-less node changes how KSail reaches it:
- KSail connects over the private network. When a node has no public IPv4, KSail uses the node’s private-network IP for the Talos API, config apply, bootstrap, and (for control planes) the kube/talos endpoint. This requires KSail itself to have a route into the private network — run KSail from inside the network, over a VPN/WireGuard, or via a bastion. Without that route, provisioning hangs waiting for the Talos API.
- Nodes still need egress. A node with no public IPv4 cannot pull container images, reach the Hetzner API (CCM/CSI), or join the cluster unless the private network provides egress. Provide a NAT gateway on the private network, or rely on public IPv6 where every required registry is reachable over IPv6.
- IPv6-only egress is unreliable for image pulls. ghcr.io and some other registries are not consistently reachable over IPv6, so an IPv4-less node relying solely on IPv6 egress may fail to pull images. A NAT gateway is the robust option.
IPv4-less control planes. Disabling controlPlanePublicIPv4 makes the generated kubeconfig and
talosconfig point at the control plane’s private IP, so kubectl and ksail only work from inside
the private network (or through a load balancer or floating IP you place in front of it).
Reachability. Because an IPv4-less node is driven entirely over the private network, KSail must be able to route into it. If KSail can’t reach an IPv4-less node’s Talos API, provisioning fails with an actionable error pointing at the private-network-route and egress prerequisites above (rather than an opaque timeout). KSail also warns at config time when a role disables both IP families, since that node then depends on a NAT gateway for egress.
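The following Python sketch summarizes two rules from above: which address KSail dials, and when it warns at config time. The Node type, the function names and the settings dictionary are hypothetical stand-ins for KSail's internals. Only the keys workerPublicIPv4, workerPublicIPv6, controlPlanePublicIPv4 and controlPlanePublicIPv6 come from ksail.yaml.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical view of a provisioned Hetzner server."""
    name: str
    private_ip: str
    public_ipv4: Optional[str] = None


def talos_endpoint(node: Node) -> str:
    """Address used for the Talos API, config apply, and bootstrap.

    Prefer the public IPv4; an IPv4-less node falls back to its private IP,
    which is only reachable if this machine can route into the private network.
    """
    return node.public_ipv4 or node.private_ip


def public_net_warnings(settings: Dict[str, bool]) -> List[str]:
    """Return one warning per role that disables both public IP families.

    Missing keys default to True, matching the documented defaults.
    """
    warnings = []
    for role, prefix in (("control plane", "controlPlane"), ("worker", "worker")):
        has_v4 = settings.get(f"{prefix}PublicIPv4", True)
        has_v6 = settings.get(f"{prefix}PublicIPv6", True)
        if not has_v4 and not has_v6:
            warnings.append(f"{role} nodes have no public IP; egress depends on a NAT gateway")
    return warnings
```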
Autoscaler nodes. Autoscaler-created nodes inherit the worker public-net setting
cluster-wide (KSail sets HCLOUD_PUBLIC_IPV4/HCLOUD_PUBLIC_IPV6 on the cluster-autoscaler).
The upstream Hetzner cluster-autoscaler has no per-pool public-net control, so all autoscaler pools
share the worker setting.
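Because the setting is cluster-wide, it reduces to a single pair of environment variables on the cluster-autoscaler container. The Python sketch below renders that pair in Kubernetes env format. The function name is hypothetical. The lowercase "true"/"false" encoding is an assumption, because the exact values KSail writes are not documented here.

```python
from typing import Dict, List


def autoscaler_public_net_env(worker_public_ipv4: bool = True,
                              worker_public_ipv6: bool = True) -> List[Dict[str, str]]:
    """Hypothetical sketch: container env entries for the cluster-autoscaler.

    One pair applies to every autoscaler pool, because the upstream Hetzner
    provider has no per-pool public-net control.
    """
    values = {
        "HCLOUD_PUBLIC_IPV4": worker_public_ipv4,
        "HCLOUD_PUBLIC_IPV6": worker_public_ipv6,
    }
    # Assumed encoding: Python bools rendered as lowercase strings.
    return [{"name": name, "value": str(flag).lower()} for name, flag in values.items()]
```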
Load balancers. KSail does not create Hetzner load balancers, but when you expose Services with
the Hetzner CCM and your nodes are IPv4-less, annotate the Service with
load-balancer.hetzner.cloud/use-private-ip: "true" so the load balancer targets nodes by their
private IP.
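For example, the Service below targets the web Deployment from Deploy with LoadBalancer (kubectl create deployment labels its pods app=web). The Python sketch only builds the manifest and prints it as JSON, which kubectl apply -f - accepts. The function name is hypothetical. The annotation is the one named above.

```python
import json


def private_ip_lb_service(name: str, app_label: str, port: int = 80) -> dict:
    """Build a LoadBalancer Service whose Hetzner load balancer targets
    nodes by private IP (needed when nodes have no public IPv4)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Annotation values must be strings, hence "true" not True.
                "load-balancer.hetzner.cloud/use-private-ip": "true",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # Usage (hypothetical file name): python svc.py | kubectl apply -f -
    print(json.dumps(private_ip_lb_service("web", "web"), indent=2))
```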
Talos options for Hetzner clusters
| Field | Type | Default | Description |
|---|---|---|---|
| talos.iso | int | 125127 | Hetzner Cloud ISO/image ID for Talos 1.12.4 x86 (look up the matching ARM ID in the Hetzner Cloud Console). Ignored when schematicId is set. |
| talos.schematicId | string | (empty) | Talos factory schematic ID. When set, KSail automatically builds and manages a Hetzner snapshot image instead of booting from the ISO. spec.cluster.talos.version must also be set. Not supported for ARM64 server types (cax*). The snapshot is deleted when ksail cluster delete --delete-storage is run. |
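The rules in the table can be read as a small decision function, sketched in Python below. It is illustrative only: the function name and return shape are hypothetical. It checks every configured server type for the cax prefix, which is an assumption about how KSail applies the ARM64 restriction.

```python
from typing import Optional, Sequence, Tuple, Union

DEFAULT_TALOS_ISO = 125127  # Talos 1.12.4, x86 (from the table above)


def select_boot_image(iso: int = DEFAULT_TALOS_ISO,
                      schematic_id: Optional[str] = None,
                      talos_version: Optional[str] = None,
                      server_types: Sequence[str] = ("cx23",)) -> Tuple[str, Union[int, str]]:
    """Hypothetical sketch of the talos.iso / talos.schematicId rules.

    Returns ("snapshot", schematic_id) when a schematic is configured,
    otherwise ("iso", iso).
    """
    if schematic_id:
        if not talos_version:
            raise ValueError("spec.cluster.talos.version must be set when schematicId is set")
        arm = [t for t in server_types if t.startswith("cax")]
        if arm:
            raise ValueError(f"schematicId is not supported for ARM64 server types: {arm}")
        # The ISO ID is ignored; KSail builds or reuses a snapshot instead.
        return ("snapshot", schematic_id)
    return ("iso", iso)
```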
Autoscaler Configuration
KSail supports node and pod autoscaling for Hetzner clusters via spec.cluster.autoscaler. Node pools are Hetzner-specific and map directly to the Kubernetes Cluster Autoscaler node group format.
```yaml
spec:
  cluster:
    autoscaler:
      node:
        enabled: true
        expander: LeastWaste  # single value, or a priority list e.g. [LeastNodes, LeastWaste] (Price is not supported for Hetzner)
        maxNodesTotal: 20  # whole-cluster node ceiling (--max-nodes-total); 0 = no cap
        scaleDownUnneededTime: 10m
        pools:
          - name: workers-fsn1
            serverType: cx23
            location: fsn1
            min: 1
            max: 5
          - name: gpu-nbg1
            serverType: cx33
            location: nbg1
            min: 0
            max: 3
            labels:  # applied to every node in this pool
              workload: gpu
            taints:  # only pods that tolerate this land here
              - key: dedicated
                value: gpu
                effect: NoSchedule
      pod:
        horizontal: Disabled  # Enabled | Disabled (pod autoscaler setting; metrics-server is configured separately)
        vertical: Disabled  # Enabled | Disabled (reserved for future VPA support; not yet implemented)
  provider:
    hetzner:
      serverLimit: 20  # project quota; must be ≥ the reachable total (controlPlanes + workers + sum(pool.max), capped by maxNodesTotal)
```
The full autoscaler field reference (spec.cluster.autoscaler, including node pools and per-pool
labels/taints) is generated from the API types: see
spec.cluster.autoscaler (AutoscalerConfig)
in the Declarative Configuration reference. Note that KSail rejects the Price expander with a
validation error for the Hetzner provider, because the Hetzner cloud provider does not
implement the pricing API.
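The Python sketch below covers the expander rules. It normalizes the single-value or priority-list form and rejects Price for Hetzner. The function names are hypothetical. The upstream --expander flag takes a comma-separated priority list such as least-nodes,least-waste. Mapping KSail's PascalCase names onto those kebab-case values is an assumption about how KSail passes the setting through.

```python
import re
from typing import List, Sequence, Union

# Per the text above, the Hetzner cloud provider has no pricing API.
UNSUPPORTED_ON_HETZNER = {"Price"}


def normalize_expanders(expander: Union[str, Sequence[str]],
                        provider: str = "Hetzner") -> List[str]:
    """Accept a single expander or a priority list; reject Price on Hetzner."""
    expanders = [expander] if isinstance(expander, str) else list(expander)
    if provider == "Hetzner":
        rejected = [e for e in expanders if e in UNSUPPORTED_ON_HETZNER]
        if rejected:
            raise ValueError(f"expander(s) {rejected} not supported by the Hetzner provider")
    return expanders


def expander_flag(expanders: Sequence[str]) -> str:
    """Render e.g. ["LeastNodes", "LeastWaste"] as --expander=least-nodes,least-waste."""
    # Assumed mapping: insert "-" before each inner capital, then lowercase.
    kebab = [re.sub(r"(?<!^)(?=[A-Z])", "-", e).lower() for e in expanders]
    return "--expander=" + ",".join(kebab)
```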
[!NOTE] spec.provider.hetzner.serverLimit guards against exceeding your Hetzner Cloud project quota. KSail rejects any configuration whose reachable total exceeds serverLimit. The reachable total is controlPlanes + workers + sum(pool.max), clamped to autoscaler.node.maxNodesTotal when that global cap is set. Because maxNodesTotal is the whole-cluster ceiling (the value passed to the autoscaler’s --max-nodes-total), set it ≤ serverLimit. Raise serverLimit to match your actual Hetzner project quota.
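Here is a worked example of the rule above. Assume 1 control plane and 2 workers, as in the ksail.yaml reference example, plus the two pools from the autoscaler example (max 5 and max 3). The reachable total is 1 + 2 + 5 + 3 = 11. That fits under maxNodesTotal: 20 and serverLimit: 20, but would exceed the default serverLimit of 10. The Python sketch below encodes the check; the function names are hypothetical.

```python
from typing import Sequence


def reachable_total(control_planes: int, workers: int,
                    pool_maxes: Sequence[int], max_nodes_total: int = 0) -> int:
    """controlPlanes + workers + sum(pool.max), clamped to maxNodesTotal when set (> 0)."""
    total = control_planes + workers + sum(pool_maxes)
    if max_nodes_total > 0:
        total = min(total, max_nodes_total)
    return total


def check_server_limit(control_planes: int, workers: int, pool_maxes: Sequence[int],
                       max_nodes_total: int = 0, server_limit: int = 10) -> int:
    """Raise if the reachable total exceeds serverLimit (default 10); else return it."""
    total = reachable_total(control_planes, workers, pool_maxes, max_nodes_total)
    if total > server_limit:
        raise ValueError(
            "autoscaler configuration exceeds Hetzner server limit: "
            f"reachable total {total} > serverLimit {server_limit}"
        )
    return total


if __name__ == "__main__":
    print(check_server_limit(1, 2, [5, 3], max_nodes_total=20, server_limit=20))
```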
Distinguishing autoscaler nodes from baseline workers
KSail stamps the Kubernetes node label ksail.io/autoscaled=true on every autoscaler-provisioned worker, and on no static baseline worker. This gives workloads a discriminator for node affinity. For example, a soft preferredDuringSchedulingIgnoredDuringExecution nodeAffinity can prefer nodes where ksail.io/autoscaled DoesNotExist. Pods then land on baseline workers first and spill onto autoscaler nodes only under real pressure, which keeps autoscaler nodes empty and quick to scale down.
[!NOTE] The label is applied via the autoscaler worker’s Talos machine.nodeLabels (the kubelet --node-labels flag), so it lands on the real Node object. The upstream Hetzner cluster-autoscaler deliberately does not push its per-pool nodeConfigs[].labels/taints to the kubelet. Those only seed the in-memory template node used for scheduling simulation and scale-from-zero (see kubernetes/autoscaler#8492, closed as working-as-intended). Stamping the label in the worker cloud-init is therefore the canonical mechanism.
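The soft preference described above could look like the following pod-spec affinity block. The Python sketch builds it as a dictionary and prints JSON that can go under a Deployment's spec.template.spec.affinity. The function name is hypothetical, and the default weight of 100 (the maximum Kubernetes allows for a preferred term) is an illustrative choice.

```python
import json


def prefer_baseline_workers(weight: int = 100) -> dict:
    """Affinity that prefers nodes WITHOUT the ksail.io/autoscaled label.

    A preferred (soft) term still allows scheduling onto autoscaler nodes
    when baseline workers are full.
    """
    if not 1 <= weight <= 100:
        raise ValueError("preferred node affinity weight must be in 1..100")
    return {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "preference": {
                        "matchExpressions": [
                            {"key": "ksail.io/autoscaled", "operator": "DoesNotExist"}
                        ]
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps({"affinity": prefer_baseline_workers()}, indent=2))
```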
Per-pool labels and taints
Each pool can declare labels and taints that KSail applies to every node provisioned in that pool — useful for steering workloads onto a specific pool (nodeSelector/nodeAffinity) or reserving a pool for tolerating workloads (e.g. GPU nodes).
KSail applies them through two complementary mechanisms so they are correct both on the live node and during scaling decisions:
- On the real Node: baked into each pool’s Talos worker cloud-init as machine.nodeLabels / machine.nodeTaints, which Talos reconciles onto the Node. This is what actually labels and taints the running node, per the mechanism described above.
- On the scale-from-zero template: mirrored into the autoscaler’s per-pool nodeConfigs[].labels / taints. The autoscaler then knows, before any node of the pool exists, that scaling the pool would produce nodes carrying those labels and taints. Without this, the autoscaler would never scale a tainted pool up from zero (no template node tolerates the pending pod) and could scale the wrong pool for a label-selecting pod.
Under the hood this is delivered via the autoscaler’s HCLOUD_CLUSTER_CONFIG (per-pool cloud-init, image, labels, and taints), which KSail writes to the cluster-autoscaler-config Secret. Changing a pool’s labels or taints on ksail cluster update is applied in place and recycles that pool’s existing autoscaler nodes so they pick up the change.
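The two mechanisms can be sketched as two projections of the same pool definition. The Python below approximates them and is not KSail's renderer. The function names are hypothetical. The Talos machine.nodeTaints value format ("value:Effect") follows Talos's map form. The nodeConfigs shape (cloudInit, labels and taints per pool name) is assumed to follow the upstream Hetzner cluster-autoscaler's HCLOUD_CLUSTER_CONFIG. Images and the exact Secret encoding are omitted.

```python
import json
from typing import Dict, List

Taint = Dict[str, str]  # {"key": ..., "value": ..., "effect": ...}


def talos_worker_patch(labels: Dict[str, str], taints: List[Taint]) -> dict:
    """Machine-config fragment that labels/taints the real Node (via kubelet)."""
    return {
        "machine": {
            "nodeLabels": dict(labels),
            # Talos expresses taints as a map of key -> "value:Effect".
            "nodeTaints": {t["key"]: f'{t["value"]}:{t["effect"]}' for t in taints},
        }
    }


def autoscaler_node_configs(pools: Dict[str, dict], cloud_init: Dict[str, str]) -> dict:
    """Per-pool template data used for scale-from-zero simulation (assumed shape)."""
    return {
        "nodeConfigs": {
            name: {
                # In practice this is the full Talos worker config for the pool,
                # which already contains the patch rendered above.
                "cloudInit": cloud_init[name],
                "labels": dict(pool.get("labels", {})),
                "taints": [dict(t) for t in pool.get("taints", [])],
            }
            for name, pool in pools.items()
        }
    }


if __name__ == "__main__":
    gpu = {"labels": {"workload": "gpu"},
           "taints": [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]}
    print(json.dumps(talos_worker_patch(gpu["labels"], gpu["taints"]), indent=2))
    print(json.dumps(autoscaler_node_configs({"gpu-nbg1": gpu}, {"gpu-nbg1": "<worker config>"}), indent=2))
```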
Quick Start
Step 1: Configure API Token
Set HCLOUD_TOKEN as described in the Configuration section above, then verify it is exported in your shell.
Step 2: Initialize Project
```shell
ksail cluster init \
  --name my-hetzner-cluster \
  --distribution Talos \
  --provider Hetzner \
  --control-planes 1 \
  --workers 2
```
This creates ksail.yaml and a talos/ directory for Talos configuration patches.
Step 3: Create Cluster
```shell
ksail cluster create
```
KSail creates Hetzner Cloud servers, boots them with Talos Linux, bootstraps Kubernetes, configures the kubectl context, and installs the Hetzner Cloud Controller Manager and CSI driver.
Step 4: Verify Cluster
```shell
ksail cluster info
kubectl get nodes -o wide
kubectl get pods -n kube-system
```
Step 5: Cleanup
```shell
ksail cluster delete
```
[!WARNING] This deletes the Hetzner Cloud servers, load balancers, and associated resources. Billing for those resources stops, but the operation is irreversible.
Architecture
KSail provisions Hetzner Cloud servers running Talos Linux, connected via a private network. The Hetzner Cloud Controller Manager provides native load balancer integration, and the Hetzner CSI Driver provisions persistent volumes backed by Hetzner Block Storage.
```mermaid
graph TB
  subgraph "Your Machine"
    KSAIL["ksail CLI"]
  end
  subgraph "Hetzner Cloud"
    API["Hetzner Cloud API"]
    NET["Private Network"]
    LB["Cloud Load Balancer"]
    subgraph "Servers"
      CP["Control Plane (Talos)"]
      W1["Worker Node 1 (Talos)"]
      W2["Worker Node 2 (Talos)"]
    end
  end
  KSAIL -->|"HCLOUD_TOKEN"| API
  API --> CP
  API --> W1
  API --> W2
  CP --- NET
  W1 --- NET
  W2 --- NET
  LB -->|"traffic"| W1
  LB -->|"traffic"| W2
  CP -.->|"kubeconfig"| KSAIL
```
Installed Components
When using the Hetzner provider, KSail automatically installs or configures:
- Hetzner Cloud Controller Manager: Provisions Hetzner Cloud Load Balancers for type: LoadBalancer services
- Hetzner CSI Driver: Provisions Hetzner Block Storage volumes for PersistentVolumeClaim resources
- Placement groups: Distributes servers across physical hosts for high availability (configurable)
- Cluster Autoscaler: Installed automatically when autoscaler.node.enabled: true (or the deprecated nodeAutoscaling: Enabled). Scales node pools defined under autoscaler.node.pools based on pending pod demand.
Firewall Layers
KSail applies two independent firewall layers for Hetzner Talos clusters:
- Hetzner Cloud Firewall: Restricts external traffic at the network perimeter. KSail manages three rules: Talos API (50000/tcp), Kubernetes API (6443/tcp), and ICMP. Ports for cluster-internal communication (etcd, kubelet, trustd) are not exposed, because nodes communicate over the private Hetzner Cloud Network. When allowedCidrs is set, the Kubernetes and Talos API rules restrict source IPs to those CIDR blocks instead of 0.0.0.0/0 and ::/0.
- Talos OS-level ingress firewall: Defense-in-depth at the node level, enabled by default (spec.provider.hetzner.ingressFirewall: Enabled). Blocks all ingress by default and opens only the ports required for each node role (control plane vs. worker), restricted to the cluster subnet where appropriate. When allowedCidrs is set, the Kubernetes API and Talos API NetworkRuleConfig rules also restrict source IPs to those CIDRs. Disable with ingressFirewall: Disabled if you manage node-level firewall rules yourself.
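The Hetzner Cloud Firewall layer reduces to three inbound rules. The Python sketch below renders them in the Hetzner Cloud API's rule shape (direction, protocol, port, source_ips, description). It illustrates the layout and is not KSail's code. Leaving ICMP open to all sources when allowedCidrs is set is an inference from the text above, which restricts only the two API rules.

```python
from typing import List, Optional, Sequence

ANYWHERE = ["0.0.0.0/0", "::/0"]


def cloud_firewall_rules(allowed_cidrs: Optional[Sequence[str]] = None) -> List[dict]:
    """Hypothetical sketch of the three KSail-managed inbound rules."""
    api_sources = list(allowed_cidrs) if allowed_cidrs else list(ANYWHERE)
    return [
        {"direction": "in", "protocol": "tcp", "port": "50000",
         "source_ips": list(api_sources), "description": "Talos API"},
        {"direction": "in", "protocol": "tcp", "port": "6443",
         "source_ips": list(api_sources), "description": "Kubernetes API"},
        # No rules for etcd, kubelet, or trustd: that traffic stays on the private network.
        {"direction": "in", "protocol": "icmp",
         "source_ips": list(ANYWHERE), "description": "ICMP"},
    ]
```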
[!NOTE] Upgrading from older KSail versions: Clusters created before the firewall hardening change used 6 Hetzner Cloud Firewall rules. KSail now manages 3 rules (Talos API, Kubernetes API, and ICMP), with etcd, kubelet, and trustd access kept on the private Hetzner Cloud Network. Running ksail cluster update automatically migrates existing clusters to the current rule set.
Operations
Talos Snapshot Lifecycle
When spec.cluster.talos.schematicId is set, KSail manages a Hetzner Cloud snapshot image (instead of booting from the ISO) with the following lifecycle:
- Create: ksail cluster create calls the Talos factory to download the raw disk image for the specified schematic and Talos version, uploads it to Hetzner Cloud as a snapshot, and labels it with ksail.io/talos-version, ksail.io/talos-schematic, and ksail.io/cluster. If a matching snapshot already exists, it is reused.
- Reuse: Subsequent ksail cluster create calls find the existing snapshot by label and skip the upload step.
- Delete: ksail cluster delete --delete-storage removes the snapshot along with the cluster. Without --delete-storage, the snapshot is retained (useful when recreating the cluster).
[!NOTE] Snapshots are scoped per cluster name, so deleting one cluster does not affect snapshots used by another cluster with the same schematic.
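The reuse step amounts to a label match that includes the cluster name, which is what keeps snapshots scoped per cluster. The Python sketch below filters an already-fetched list of Hetzner images. The function names are hypothetical. It assumes the three labels hold the cluster name, Talos version and schematic ID verbatim, though KSail may normalize the stored values. The selector string uses the k==v form of Hetzner Cloud label selectors.

```python
from typing import Dict, Iterable, Optional


def snapshot_labels(cluster: str, talos_version: str, schematic_id: str) -> Dict[str, str]:
    """The three labels named above, keyed as in the lifecycle description."""
    return {
        "ksail.io/cluster": cluster,
        "ksail.io/talos-version": talos_version,
        "ksail.io/talos-schematic": schematic_id,
    }


def label_selector(labels: Dict[str, str]) -> str:
    """Render labels as a Hetzner Cloud label selector (k==v, comma-separated)."""
    return ",".join(f"{k}=={v}" for k, v in labels.items())


def find_reusable_snapshot(images: Iterable[dict], cluster: str,
                           talos_version: str, schematic_id: str) -> Optional[dict]:
    """Return the first snapshot whose labels match all three values, else None."""
    wanted = snapshot_labels(cluster, talos_version, schematic_id)
    for image in images:
        labels = image.get("labels", {})
        if image.get("type") == "snapshot" and all(labels.get(k) == v for k, v in wanted.items()):
            return image
    return None
```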
Autoscaler Node Upgrades
Autoscaler-managed nodes are provisioned by the Cluster Autoscaler itself — booting the Talos snapshot and worker config baked into the cluster-autoscaler-config Secret — not by KSail directly. They are therefore not part of the in-place rolling Talos/Kubernetes upgrade ksail cluster update runs against the static control planes and workers (those are matched by the ksail.owned label; autoscaler nodes are not).
To keep them on the baseline, when a ksail cluster update changes the Talos or Kubernetes version, KSail:
- Rebuilds the Talos snapshot at the new version and refreshes the cluster-autoscaler-config Secret (so new autoscaler nodes boot the new version), then restarts the cluster-autoscaler to load it.
- Recycles the existing autoscaler nodes so they follow the new baseline instead of drifting. After the restarted autoscaler is ready, each autoscaler node is cordoned and drained one at a time (through the Kubernetes eviction API, honoring PodDisruptionBudgets), and its Hetzner server is deleted. The cluster-autoscaler then provisions any still-needed capacity from the refreshed snapshot on demand.
This mirrors the upstream Cluster Autoscaler model — the autoscaler owns node creation, so KSail removes the stale nodes and lets the autoscaler replace them rather than upgrading ephemeral, compute-only nodes in place. Recycling runs only when the version actually changes; a no-op cluster update leaves autoscaler nodes untouched.
[!NOTE] Recycling drains nodes one at a time, so a strict PodDisruptionBudget can slow or block it. Pods that cannot be evicted within the drain timeout fail the update so the condition surfaces rather than abruptly evicting workloads.
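The recycling loop can be sketched in Python as below. The callbacks stand in for cordoning, draining through the eviction API and deleting the Hetzner server. Their names and the exception-based failure signal are hypothetical. The sketch illustrates ordering: nodes are handled strictly one at a time, and a drain failure stops the loop and surfaces as an error instead of forcing evictions.

```python
from typing import Callable, Iterable, List


def recycle_autoscaler_nodes(nodes: Iterable[str],
                             cordon: Callable[[str], None],
                             drain: Callable[[str], None],
                             delete_server: Callable[[str], None],
                             version_changed: bool) -> List[str]:
    """Hypothetical sketch: returns the nodes that were recycled, in order."""
    if not version_changed:
        # A no-op update leaves autoscaler nodes untouched.
        return []
    recycled = []
    for node in nodes:
        cordon(node)
        # drain() is expected to raise if PodDisruptionBudgets block eviction
        # past the timeout; the exception propagates and fails the update.
        drain(node)
        delete_server(node)
        recycled.append(node)
    # The cluster-autoscaler recreates any still-needed capacity on demand.
    return recycled
```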
List Clusters
```shell
ksail cluster list
```
Cluster Info
```shell
ksail cluster info
```
Displays Kubernetes control-plane and core service endpoints for the current context.
Update Cluster
```shell
ksail cluster update
```
Applies in-place changes to components (CNI, GitOps engine, cert-manager, etc.). Node scaling changes for Talos are applied in place. A Talos or Kubernetes version bump rolls the static nodes in place and recycles autoscaler nodes so they follow the new baseline (see Autoscaler Node Upgrades). Changes to hetzner.location, hetzner.controlPlaneServerType, or hetzner.networkCidr require cluster recreation. See the Update Behavior table for details.
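The recreation rule for the three Hetzner fields named above can be expressed as a simple diff. The Python sketch is hypothetical and covers only those fields. The Update Behavior table remains the full list. It assumes both mappings already have defaults filled in, so an explicit "fsn1" and an omitted location compare equal.

```python
from typing import Dict, List

# Fields named above whose change requires recreating the cluster.
RECREATE_FIELDS = ("location", "controlPlaneServerType", "networkCidr")


def fields_requiring_recreation(old: Dict[str, object], new: Dict[str, object]) -> List[str]:
    """Compare two spec.provider.hetzner mappings and list recreate-only changes."""
    return [f for f in RECREATE_FIELDS if old.get(f) != new.get(f)]
```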
Deploy with LoadBalancer
```shell
kubectl create deployment web --image=nginx --replicas=3
kubectl expose deployment web --port=80 --type=LoadBalancer
kubectl get svc web --watch
```
The type: LoadBalancer service provisions a real Hetzner Cloud Load Balancer with a public IP.
Troubleshooting
hcloud token is not set — The HCLOUD_TOKEN environment variable is missing or empty. Run echo $HCLOUD_TOKEN to check. Re-export if needed. If you use a custom variable name, verify it matches spec.provider.hetzner.tokenEnvVar in ksail.yaml.
server type not found — The server type name in controlPlaneServerType, workerServerType, or an autoscaler pool does not exist in the Hetzner Cloud API. Check available types in the Hetzner Cloud Console or with hcloud server-type list.
server type unavailable in all configured locations — The requested server type exists but is not available in your primary location or any configured fallback locations. KSail runs this precheck before creating any infrastructure, so no partial resources are left behind. Add more fallback locations or choose a different server type:
```yaml
spec:
  provider:
    hetzner:
      location: "fsn1"
      fallbackLocations: ["nbg1", "hel1"]
```
Server creation fails with resource unavailability — If the precheck passes but server creation still fails (transient capacity issue), configure spec.provider.hetzner.fallbackLocations to try alternative datacenter locations automatically (see example above).
Placement group errors — Hetzner limits spread placement groups to 10 servers per datacenter. Reduce your node count or set placementGroupStrategy: "None". For best-effort HA, set placementGroupFallbackToNone: true to fall back automatically when spread placement fails.
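The Python sketch below applies the per-datacenter limit exactly as stated above. It models the check up front, whereas KSail falls back when spread placement actually fails. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Limit stated above for spread placement groups.
SPREAD_LIMIT_PER_DATACENTER = 10


def effective_placement(server_locations: Iterable[str], strategy: str = "Spread",
                        fallback_to_none: bool = False) -> str:
    """Hypothetical sketch: decide between "Spread" and "None" placement."""
    if strategy == "None":
        return "None"
    counts = Counter(server_locations)
    over = {loc: n for loc, n in counts.items() if n > SPREAD_LIMIT_PER_DATACENTER}
    if not over:
        return "Spread"
    if fallback_to_none:
        # Best-effort HA: give up on spreading rather than failing.
        return "None"
    raise ValueError(f"spread placement limit exceeded: {over}; "
                     "reduce nodes or set placementGroupStrategy: None")
```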
autoscaler configuration exceeds Hetzner server limit — The reachable total node count (controlPlanes + workers + sum(pool.max), clamped by maxNodesTotal when set) exceeds hetzner.serverLimit. Increase serverLimit, reduce your pool max values, or lower maxNodesTotal to cap the cluster total.
cloud provider requires an external registry — Hetzner Cloud servers cannot reach Docker-based local registries running on your machine. When spec.cluster.localRegistry is enabled, it must point to an internet-accessible registry (e.g., ghcr.io/myorg). KSail returns this error early if a non-external registry is configured.
context deadline exceeded or connection errors — Verify HCLOUD_TOKEN is valid and has read/write permissions. Check connectivity with curl -sI https://api.hetzner.cloud/v1/servers -H "Authorization: Bearer $HCLOUD_TOKEN".
Cluster deletion leaves orphaned resources — If ksail cluster delete is interrupted, manually clean up in the Hetzner Cloud Console: delete servers, load balancers, networks, and placement groups associated with your cluster name.