
Enhancing Deployment Resilience at GitHub with eBPF

Published: 2026-05-03 18:07:14 | Category: Open Source

At GitHub, we discovered a critical challenge: our deployment system had a circular dependency on our own service. If GitHub went down, we couldn't access our source code to fix it. To break this loop, we adopted eBPF (extended Berkeley Packet Filter) to monitor and block risky calls during deployments. This Q&A explores the problem, the types of circular dependencies, and how eBPF provides a lightweight, secure solution.

What circular dependency problem did GitHub face?

GitHub hosts its own source code on github.com, which creates a simple circular dependency: to deploy GitHub, you need GitHub. If an outage occurs, you cannot access the code to deploy a fix. This was mitigated by maintaining a local mirror, but deeper issues remained. For instance, deployment scripts might inadvertently depend on GitHub to download tools or check for updates, creating additional loops. These dependencies could cascade during an outage, making it impossible to roll out fixes. The challenge was to identify and break these direct, hidden, and transitive dependencies without hindering legitimate deployment operations. eBPF emerged as a powerful tool to selectively monitor and block such calls, ensuring that deployment scripts remain self-contained even if GitHub is unavailable.

Source: github.blog

What types of circular dependencies exist in deployment?

Using a hypothetical MySQL outage scenario, we identified three main types:

  • Direct dependency: A deploy script tries to pull a tool from GitHub, but GitHub is down, so the script fails.
  • Hidden dependency: A local tool checks for an update from GitHub on startup; if it can't reach GitHub, it hangs or errors.
  • Transitive dependency: A script calls an internal service that in turn tries to fetch something from GitHub, propagating the failure back.

These examples show that many dependencies are not obvious until an outage occurs. Traditional approaches require teams to manually review scripts, which is error-prone and time-consuming.

How did GitHub traditionally handle circular dependencies?

Previously, each team owning stateful hosts was responsible for auditing their deployment scripts to find and eliminate circular dependencies. This manual process was inefficient: dependencies often went unnoticed until a real outage, and teams had to rely on static analysis or personal knowledge. Moreover, scripts frequently evolved, introducing new dependencies over time. The lack of automation meant that even after a review, new risks could emerge. This reactive approach could not guarantee that all circular dependencies were caught, especially hidden or transient ones. GitHub sought a more systematic, runtime-based solution that could enforce constraints without burdening developers.

Why did GitHub choose eBPF for deployment safety?

eBPF (extended Berkeley Packet Filter) allows you to run sandboxed programs inside the Linux kernel without modifying kernel source code or loading modules. It can inspect system calls, network packets, and other events in real time. For deployment safety, we needed to selectively block or alert on specific outbound connections (e.g., to GitHub servers) from deployment scripts. eBPF's lightweight, dynamic nature meant we could attach probes to connect, sendto, or other syscalls and decide whether to allow or deny them based on destination IPs or process characteristics. This approach avoids restarting daemons or rewriting scripts, and it works across different languages. It provides granular control without significant performance overhead, making it ideal for production environments.

How does eBPF monitor and block circular dependency calls?

We implement an eBPF program that attaches to the connect syscall (or an equivalent tracing point). When a deployment script attempts to make a network connection, the eBPF program checks the destination IP address against a list of internal services (e.g., GitHub's own servers). If the destination is flagged as a potential circular dependency (e.g., github.com), the program can either log the attempt and continue, or block the connection entirely, returning an error. The decision is made right in the kernel, before the connection reaches the network stack. This ensures that even if a script tries to download a binary from GitHub, it fails fast, alerting the team to a problem. Additionally, we can track connections to other internal services that might introduce transitive dependencies. The key advantage is that this dynamic filtering adapts as dependencies change, without code changes.


Can you show a simple eBPF program example for deployment safety?

While detailed code is beyond this summary, a basic approach uses libbpf or BCC. We load an eBPF program with a tracepoint on syscalls/sys_enter_connect. Inside the program, we read the sockaddr_in structure to extract the destination IP. If it matches a predefined IP range (e.g., 192.30.252.0/22 for GitHub), we update a map with the process ID and timestamp for logging. Optionally, a kprobe on a function that permits error injection can use bpf_override_return to force the call to fail with EPERM; a plain tracepoint can only observe, not override. On the user-space side, a daemon reads the map and alerts operators. This setup runs on each deployment host. Note that we must consider security and validate that only legitimate deployment processes are monitored. The same technique can be extended to other syscalls like open or execve to detect file-based dependencies.

What benefits does eBPF offer over traditional review methods?

Manual reviews are static and only catch known patterns. eBPF provides runtime enforcement that catches unforeseen dependencies, especially those that appear only under certain conditions (e.g., an update check that runs only at startup). It reduces the burden on developers, who no longer need to audit every script line by hand. eBPF also works without modifying the target scripts, so legacy deployments benefit immediately. Moreover, it provides real-time monitoring and alerting, allowing teams to detect and fix dependency issues before they cause outages. Performance overhead remains negligible because eBPF programs run in kernel context with low per-event cost. Finally, the policy can be updated by simply loading a new eBPF program, enabling quick responses to new dependency patterns.

What challenges did GitHub face implementing eBPF for this use case?

One challenge was ensuring we block only the correct connections: legitimate use of GitHub services by non-deployment tools must not be affected. We solved this by attaching eBPF programs only to processes involved in deployment scripts (identified by cgroup or process ancestry). Another challenge was handling TLS connections: the destination IP is visible, but the full domain might be hidden behind CDNs or load balancers, so we built a mapping of known internal IP ranges rather than relying on DNS. Additionally, eBPF programs must satisfy the kernel's verifier, which rejects unsafe code before it loads; we followed its constraints and added our own safety checks. Finally, managing multiple eBPF programs across many hosts required a robust deployment pipeline. Despite these hurdles, eBPF proved to be a flexible and powerful tool that significantly improved our deployment safety.