KubeCon Europe Rundown
KubeCon Europe wrapped up last week, and as usual it was a great event with some fantastic content. The shift to a virtual format came off really well, and while it's not the same as being there in person, it made the content accessible to a much wider audience.
We were tracking a few topics of interest (eBPF, OpenTelemetry, and Prometheus) and wanted to share highlights from some of the sessions.
There were a number of sessions on eBPF.
- The Kinvolk team ran a hands-on tutorial on using eBPF. They covered a number of useful tools, including Inspektor-gadget and kubectl-trace. If you want a sense of some of the things eBPF can do, this isn't a bad starting point.
- Daniel Borkmann had a great talk, eBPF and Kubernetes: Little Helper Minions for Scaling Microservices, that spanned eBPF from its history to the latest developments. I really recommend this one. He mentioned that the original motivation for merging eBPF into the kernel was to reduce kernel complexity and feature creep: it gives maintainers a way to push back on merging niche features into the mainline kernel, since those features can be implemented in eBPF rather than as kernel modules. He had some great descriptions of the Linux kernel data path and how eBPF can be used to bypass portions of it (Cilium does some of this). There was also a quantitative performance comparison of XDP / tc / iptables. As you might expect, eBPF made things much faster. He even pointed out that eBPF currently has 347 contributors! Wow.
- The Isovalent team talked about collecting observability data with Hubble and eBPF.
- The Shopify team talked about how they are using Falco to handle intrusion detection in their containers. For a deeper dive on part of Falco, there was also a talk on the gRPC API the Falco team built on its security data.
- eBPF (and Inspektor-gadget) came up in Duffie Cooley’s talk on digging into the system calls made by applications as well.
- The Bloomberg team talked about how they integrated a K8s cluster with existing bare metal servers and used eBPF to investigate the interaction of iptables and their packet encapsulation.
OpenTelemetry continued to gain momentum.
- Nina Strawski had a cool talk on tracing user events with GraphQL and OpenTelemetry. Tracing is so frequently associated with backends that I found the frontend take on it really interesting.
- Intuit shared a success story about implementing and deploying tracing with OpenTelemetry. It was interesting to me that much of their team's consumption actually came from metrics aggregated from those traces, particularly once they could get <1 minute granularity.
- There was a talk on OpenTelemetry auto instrumentation. I’m partial to this topic because I know how hard it can be to add instrumentation, particularly for existing apps. The speakers talked about the Java and Python implementations they have in OpenTelemetry to solve this without code changes.
- Shopify talked about their migration from a custom set of tools and collectors to OpenCensus and later OpenTelemetry. It was a tricky process that ended well. As the speaker noted, trace instrumentation and collection is essentially commoditized; the hard work rests in the aggregation and analysis of that data.
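The trace-to-metrics pattern Intuit described is worth illustrating. Below is a minimal sketch, not their pipeline or any OpenTelemetry API: hypothetical spans are rolled up into per-operation, per-minute latency metrics, which is the kind of aggregation that turns a trace stream into dashboards.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Span:
    """A drastically simplified span: just an operation name and timing."""
    operation: str
    start_ts: float    # seconds since epoch
    duration_ms: float

def aggregate(spans, window_s=60):
    """Roll spans up into per-operation, per-window count/sum/max metrics."""
    buckets = defaultdict(lambda: {"count": 0, "sum_ms": 0.0, "max_ms": 0.0})
    for span in spans:
        # Align each span to the start of its time window.
        window = int(span.start_ts // window_s) * window_s
        m = buckets[(span.operation, window)]
        m["count"] += 1
        m["sum_ms"] += span.duration_ms
        m["max_ms"] = max(m["max_ms"], span.duration_ms)
    return dict(buckets)

# Three checkout spans spanning two one-minute windows.
spans = [
    Span("checkout", 0.0, 120.0),
    Span("checkout", 30.0, 80.0),
    Span("checkout", 65.0, 200.0),
]
metrics = aggregate(spans)
```

With count and sum per window you can derive rates and averages downstream; the finer the window (the "<1 minute granularity" mentioned above), the closer these derived metrics track what the traces actually saw.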
There was a bit too much Prometheus material to recount all of it, but a few talks stuck out for us.
- “Make Prometheus Restart Faster and Use Less Memory” told the story of two optimizations in Prometheus. The first, available in 2.19, reduces the memory Prometheus uses to hold compressed samples by writing completed chunks to disk and accessing them through memory-mapped files rather than keeping them resident. In some cases this yielded a 30-40% reduction in overall memory and could speed up startup time of the database as well. The second optimization speeds startup times by saving a snapshot of non-full chunks to disk so the database can quickly reconstruct its state on startup (without replaying the entire WAL).
- Björn Rabenstein spoke about improving histograms in Prometheus. The current approach is very dependent on the bucket layout, and the talk investigated whether automatic approaches to bucketing could be applied (HdrHistogram, etc.). These approaches are more challenging with the pull-based mechanism Prometheus uses but still seemed feasible based on the investigations presented. This work is ongoing, so you should expect further dialogue in the Prometheus community.
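To see why bucket layout matters so much, here is a minimal sketch of a Prometheus-style cumulative histogram. The class and its layout are my own illustration, not Prometheus client code: once the upper bounds are fixed, any value between two bounds is indistinguishable from any other, so a poor layout permanently limits quantile accuracy.

```python
import bisect

class CumulativeHistogram:
    """Sketch of a fixed-bucket histogram with cumulative ("le") counts."""
    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds)          # finite "le" bounds
        self.counts = [0] * (len(self.bounds) + 1)  # last slot acts as +Inf
        self.total = 0
        self.sum = 0.0

    def observe(self, value):
        # bisect_left finds the first bound >= value, matching
        # less-than-or-equal ("le") bucket semantics.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += 1
        self.sum += value

    def cumulative(self):
        """Counts as a Prometheus histogram exposes them: each bucket
        includes every observation in all smaller buckets."""
        out, running = [], 0
        for c in self.counts:
            running += c
            out.append(running)
        return out

# A layout chosen up front; observations must fit it or lose precision.
h = CumulativeHistogram([0.1, 0.5, 1.0])
for v in (0.05, 0.3, 0.3, 2.0):
    h.observe(v)
```

Note that 0.3 and 2.0 here land in buckets whose bounds say only "somewhere between 0.1 and 0.5" and "above 1.0": exactly the information loss that motivates the automatic-bucketing approaches discussed in the talk.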
Hopefully that gives you a taste of the topics we’ve been excited about from KubeCon Europe. Even virtually, there is a ton of energy and enthusiasm in this community, and at Flowmill, we’re happy to be part of it.