How to Implement Server-Side Sharded List and Watch in Kubernetes 1.36
Introduction
As Kubernetes clusters scale to tens of thousands of nodes, controllers that watch high-cardinality resources like Pods face a fundamental scaling challenge. Each replica of a horizontally scaled controller receives the entire event stream from the API server, deserializing every object only to discard those it's not responsible for. This wastes CPU, memory, and network bandwidth. Kubernetes 1.36 introduces an alpha feature—server-side sharded list and watch (KEP-5866)—that moves filtering upstream into the API server. With this feature enabled, each controller replica tells the API server which hash range it owns, and the API server sends only the matching events. This guide walks you through enabling and using this feature in your controllers.
What You Need
- A Kubernetes cluster running v1.36 or later (alpha features must be enabled)
- Cluster-admin permissions to enable feature gates
- A controller that uses client-go informers to list and watch resources (e.g., custom controllers or kube-state-metrics)
- Familiarity with Go programming and Kubernetes controller patterns
- A deployment strategy for multiple controller replicas (e.g., StatefulSet or Deployment with consistent replica count)
Step-by-Step Guide
Step 1: Verify Kubernetes Version and Enable the Alpha Feature Gate
First, ensure your cluster's API server is running v1.36 or newer. Then enable the ShardedListWatch feature gate. Add the flag --feature-gates=ShardedListWatch=true to the kube-apiserver configuration. If using kubeadm, edit the static pod manifest or update the kubeadm configuration file. For managed services like EKS or AKS, check provider documentation for enabling alpha features. After restarting the API server, verify the feature is active by checking the API server logs for messages about shard support.
Step 2: Determine the Number of Replicas and Their Hash Ranges
Decide how many controller replicas you want. Each replica will handle a contiguous portion of the 64-bit hash space (0 to 2^64-1). Compute the start and end values for each replica. For example, with 2 replicas: Replica 0 handles [0x0000000000000000, 0x8000000000000000) and Replica 1 handles [0x8000000000000000, 0xFFFFFFFFFFFFFFFF]. For 4 replicas, split equally: each gets a quarter of the space. Store these ranges in a configuration map or compute them programmatically based on the replica index (e.g., via a StatefulSet's pod ordinal). The hash is computed using FNV-1a, so the same field (e.g., metadata.uid) always maps to the same shard.
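The even split described above can be sketched in Go. `shardRangeFor` is a hypothetical helper name; the convention (exclusive end, except the last replica, whose end is clamped to the maximum hash value) follows the example ranges given here:

```go
package main

import (
	"fmt"
	"math"
)

// shardRangeFor splits the 64-bit hash space evenly among `replicas`
// controllers and returns replica `index`'s range: start is inclusive,
// end is exclusive, except for the last replica, whose end is clamped
// to the maximum hash value (inclusive), matching the example above.
func shardRangeFor(index, replicas int) (start, end uint64) {
	width := uint64(math.MaxUint64)/uint64(replicas) + 1
	start = uint64(index) * width
	end = start + width
	if index == replicas-1 {
		end = math.MaxUint64
	}
	return start, end
}

func main() {
	for i := 0; i < 2; i++ {
		s, e := shardRangeFor(i, 2)
		fmt.Printf("replica %d: 0x%016X to 0x%016X\n", i, s, e)
	}
}
```

With 2 replicas this prints the two halves from the example above; with 4, each replica gets a quarter of the space.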
Step 3: Modify Your Controller's Informer Setup to Include a Shard Selector
Update your controller code to inject the shard selector into the informer's ListOptions. Use the WithTweakListOptions option when creating the shared informer factory. The shard selector is a string of the form:

```
shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')
```

Replace the range values with those computed in Step 2. Here's an example in Go:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
)

// Replica 0 of 2: owns the lower half of the 64-bit hash space.
shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

factory := informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
	informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
		opts.ShardSelector = shardSelector
	}),
)
```

If your controller uses individual informers rather than a factory, set the ShardSelector field on the ListOptions passed to the NewInformer or NewFilteredSharedIndexInformer call. The currently supported field paths are object.metadata.uid and object.metadata.namespace; choose the one that best distributes your workload.
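Because the server-side hash is described as FNV-1a over the selected field, you can reproduce it locally to predict which replica will own a given object. A sketch using Go's standard hash/fnv package (the exact input encoding the API server applies to the field value is an assumption):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardHash returns the 64-bit FNV-1a hash of a field value such as a
// UID; the replica whose range contains this value owns the object.
func shardHash(value string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(value))
	return h.Sum64()
}

func main() {
	uid := "8f2a9c3e-1b4d-4e5f-9a6b-7c8d9e0f1a2b" // example UID string
	fmt.Printf("hash = 0x%016X\n", shardHash(uid))
}
```

Comparing this value against the ranges from Step 2 tells you which replica should react, which is useful when debugging Step 5.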
Step 4: Deploy the Controller Replicas with the Correct Shard Selectors
Deploy your controller as a StatefulSet or Deployment with the desired number of replicas. For each replica, pass its assigned hash range via an environment variable or command-line argument. In your controller startup code, read this value and construct the shard selector string accordingly. For example, with a StatefulSet, you can use the pod's hostname to derive the replica index. Configure the deployment so that each pod knows its unique range and never overlaps with another pod's range. Ensure the number of replicas is stable to avoid coverage gaps or duplication.
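One way to wire this up, as a sketch: derive the replica index from a StatefulSet pod's hostname (which ends in the pod ordinal) and build the selector string at startup. The `shard-controller` name and the hard-coded replica count are assumptions for illustration:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ordinalFromHostname extracts the StatefulSet pod ordinal from a
// hostname like "shard-controller-1" (the digits after the last dash).
func ordinalFromHostname(hostname string) (int, error) {
	i := strings.LastIndex(hostname, "-")
	if i < 0 {
		return 0, fmt.Errorf("hostname %q has no ordinal suffix", hostname)
	}
	return strconv.Atoi(hostname[i+1:])
}

// selectorForReplica builds this replica's shardRange selector by
// splitting the 64-bit hash space evenly, as in Step 2.
func selectorForReplica(ordinal, replicas int) string {
	width := ^uint64(0)/uint64(replicas) + 1
	start := uint64(ordinal) * width
	end := start + width
	if ordinal == replicas-1 {
		end = ^uint64(0) // last replica's end is inclusive
	}
	return fmt.Sprintf("shardRange(object.metadata.uid, '0x%016X', '0x%016X')", start, end)
}

func main() {
	host, _ := os.Hostname()
	ord, err := ordinalFromHostname(host)
	if err != nil {
		ord = 0 // fall back for local runs; a real deployment should fail fast instead
	}
	fmt.Println(selectorForReplica(ord, 2)) // replica count must match Step 2
}
```

The replica count passed to selectorForReplica must stay in sync with the StatefulSet's spec.replicas, or ranges will overlap or leave gaps.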
Step 5: Test and Monitor the Sharded Watch Behavior
After deployment, verify that each replica receives only a subset of events. Enable verbose logging in the API server (e.g., -v=6) to see whether shard filtering is applied. Check your controller's metrics: you should see a significant reduction in the number of objects processed per replica compared to a non-sharded setup. Use tools like kubectl top nodes to observe reduced CPU and memory usage on the control-plane nodes. To ensure correctness, confirm that the sum of objects across all replicas matches the total number of objects in the cluster. Also test that watch events are correctly filtered; for instance, create and delete Pods and confirm that only the responsible replica reacts.
Tips
- Start with a small number of replicas—2 or 4—to validate the sharding logic before scaling out further.
- Consider using metadata.namespace if your workload is naturally partitioned by namespace. This can simplify range assignment if you have a known number of namespaces.
- Handle rebalancing carefully: if you change the number of replicas, every pod must update its shard range simultaneously to avoid missing events. One approach is a rolling update that first adds new replicas with empty ranges, then redistributes.
- Monitor shard balance: FNV-1a is deterministic, but the distribution can be skewed if object UIDs (or whichever field you hash) are not uniformly distributed. Using a larger number of shards can smooth this out.
- Performance impact: The API server performs an additional hash computation per object. This overhead is negligible compared to the savings from reduced data transfer, but benchmark in your environment.
- Security: The shard selector is applied at the API server level; ensure RBAC permissions still enforce namespace isolation if needed.
- Fallback plan: If the alpha feature is disabled or removed, your controller should still work without sharding (i.e., ignore the ShardSelector field).
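To gauge how evenly a given shard count spreads objects before deploying, you can run a small simulation of the FNV-1a scheme described above (the synthetic `uid-N` keys are stand-ins for real UIDs):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// simulate hashes n synthetic keys with FNV-1a and counts how many land
// in each of `shards` equal ranges of the 64-bit hash space.
func simulate(n, shards int) []int {
	width := ^uint64(0)/uint64(shards) + 1
	counts := make([]int, shards)
	for i := 0; i < n; i++ {
		h := fnv.New64a()
		fmt.Fprintf(h, "uid-%d", i)
		counts[h.Sum64()/width]++
	}
	return counts
}

func main() {
	// With 100k keys and 4 shards, the counts should be roughly equal.
	fmt.Println(simulate(100000, 4))
}
```

Real UIDs are random, so they typically spread even more evenly than sequential synthetic keys; a heavily skewed result here is a signal to pick a different field or shard count.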