Field Notes From KubeCon #5: GitOps at Scale with Expedia Group

This blog post is a detailed live report from KubeCon, capturing insights from the key sessions I attended during the event.

Hi, Ishan here again! These are the field notes I took from the sessions I attended on KubeCon Days 1 and 2, so you don’t have to miss anything.

Expedia Group's GitOps Revolution: Extensive Scalability Testing on ArgoCD for 30K+ Applications

After chatting with a few attendees, I eagerly headed to the talk by Shivani Mehrotra from Expedia Group and Mohit Kumar from Coforge.

KubeCon India was abuzz with energy, and I was particularly excited about this session, titled "GitOps Evolution: Scaling ArgoCD for 30K+ Applications". It promised deep insights into how a behemoth like Expedia Group optimized its GitOps pipeline—an area that directly aligns with the work we do at Facets.cloud as we solve platform engineering challenges for diverse enterprises.

A Journey Through GitOps Evolution

Expedia Group’s journey began with their Runtime Compute Platform (RCP), a Kubernetes-based container orchestration platform initially managed via KubeFed. After KubeFed was deprecated, they transitioned to ArgoCD, leveraging its declarative GitOps capabilities. The migration wasn’t just about adopting a new tool; it was also about scaling that tool to manage over 30,000 applications across 110 clusters.

The team’s process involved:

  1. Setting Up ArgoCD at Scale:
    • A centralized ArgoCD control plane managed 10 target clusters, each hosting 11 virtual clusters (110 in total), which served as the testing ground for scalability.
    • Automation scripts in Go were used to dynamically generate application configurations for various use cases, ranging from lightweight 10-resource applications to complex 25-resource setups.
  2. Rigorous Scalability Testing:
    • Progressive deployment scenarios tested ArgoCD’s capability to reconcile and sync thousands of applications in under 40 minutes.
    • A mix of parallelism and dynamic application setups ensured a real-world simulation of the production environment.
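
To make the setup concrete, here is a rough Python sketch of what such a generator could look like. The team's scripts were written in Go and I don't have their code, so everything specific here is my own assumption: the cluster URLs, the repository URL, the `scale-app-` naming, the `scale-test/resources` label, and the `apps/<n>-resources` paths are all hypothetical. The sketch only builds Argo CD `Application` manifests as dictionaries and spreads them evenly across the 110 virtual clusters.

```python
import json

# Hypothetical API endpoints: 10 target clusters x 11 virtual clusters = 110.
VIRTUAL_CLUSTERS = [
    f"https://vc-{target:02d}-{virtual:02d}.example.internal"
    for target in range(10)
    for virtual in range(11)
]


def build_application(name, cluster_server, resource_count, repo_url):
    """Return an Argo CD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": name,
            "namespace": "argocd",
            # Hypothetical label, handy for filtering apps by size later.
            "labels": {"scale-test/resources": str(resource_count)},
        },
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                # Assumed repo layout: one directory per application size.
                "path": f"apps/{resource_count}-resources",
                "targetRevision": "HEAD",
            },
            "destination": {"server": cluster_server, "namespace": name},
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


def generate_applications(total, cluster_servers, resource_counts=(10, 25),
                          repo_url="https://example.com/scale-test.git"):
    """Generate `total` Application manifests spread evenly over the clusters."""
    apps = []
    for i in range(total):
        server = cluster_servers[i % len(cluster_servers)]
        # Switch the application size after each full pass over the clusters,
        # so every cluster receives a mix of light and heavy applications.
        size = resource_counts[(i // len(cluster_servers)) % len(resource_counts)]
        apps.append(build_application(f"scale-app-{i:05d}", server, size, repo_url))
    return apps


if __name__ == "__main__":
    apps = generate_applications(220, VIRTUAL_CLUSTERS)
    print(len(apps), "applications generated")
    print(json.dumps(apps[0], indent=2))
```

Raising `total` step by step (a few thousand applications at a time, say) is one simple way to mimic the progressive deployment scenarios described above. Since `kubectl` accepts JSON, the generated manifests can be applied as they are.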

Observations and Key Learnings

Tuning ArgoCD for Optimal Performance: Expedia’s scalability tests revealed valuable tuning parameters for ArgoCD:

  • Operation Processors: Setting these to 420 reduced sync times significantly.
  • Status Processors: Optimized at 300 for balancing reconciliation and resource usage.
  • Sharding Algorithms: A combination of manual and round-robin sharding ensured even load distribution across clusters.
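
For anyone who wants to try these values, here is a minimal sketch of how they might be expressed. The talk did not say how Expedia applied them, so this is an assumption on my part: I am using the `argocd-cmd-params-cm` ConfigMap, which the application controller reads for its `controller.operation.processors`, `controller.status.processors`, and `controller.sharding.algorithm` settings. The function name and the `argocd` namespace default are hypothetical.

```python
import json


def controller_tuning_configmap(operation_processors=420,
                                status_processors=300,
                                sharding_algorithm="round-robin",
                                namespace="argocd"):
    """Build an argocd-cmd-params-cm manifest carrying the controller tuning values.

    The defaults are the numbers reported in the talk; they were tuned for
    Expedia's workload and are unlikely to be right for every installation.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "argocd-cmd-params-cm",
            "namespace": namespace,
            "labels": {"app.kubernetes.io/part-of": "argocd"},
        },
        # ConfigMap values must be strings, even for numeric settings.
        "data": {
            "controller.operation.processors": str(operation_processors),
            "controller.status.processors": str(status_processors),
            "controller.sharding.algorithm": sharding_algorithm,
        },
    }


if __name__ == "__main__":
    print(json.dumps(controller_tuning_configmap(), indent=2))
```

The application controller has to be restarted before it picks up changes to this ConfigMap. The manual part of the sharding setup is not covered here: as far as I know, that is done per cluster through a `shard` field on the cluster secret rather than through this ConfigMap.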

Scaling Challenges:

  • Uneven resource utilization among shards required attention. Peaks ranged from 3 cores to 15 cores during heavy reconciliation.
  • Deleting clusters introduced delays, as unknown sync states needed cleanup before resuming normal operations.
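
My notes don't pin the uneven utilization on a single cause, but one plausible contributor is easy to illustrate: assigning clusters to shards by position balances the number of clusters per shard, not the number of applications. The toy sketch below is my own simplification, not ArgoCD's actual code, and the cluster names and application counts are made up.

```python
def round_robin_shard_load(cluster_app_counts, shard_count):
    """Assign clusters to shards by position (index % shard_count) and
    return the total number of applications each shard ends up managing."""
    load = [0] * shard_count
    # Sort by cluster name so the assignment is deterministic.
    for index, (_cluster, apps) in enumerate(sorted(cluster_app_counts.items())):
        load[index % shard_count] += apps
    return load


if __name__ == "__main__":
    # Made-up numbers: heavy and light clusters happen to alternate.
    clusters = {
        "vc-00": 900, "vc-01": 100,
        "vc-02": 800, "vc-03": 150,
        "vc-04": 850, "vc-05": 200,
    }
    print(round_robin_shard_load(clusters, shard_count=2))  # [2550, 450]
```

Both shards hold three clusters each, yet one carries more than five times the applications of the other. A skew like this is where pinning specific clusters to specific shards by hand can help.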

Community Contributions: Expedia’s efforts have enriched the GitOps ecosystem by stress-testing ArgoCD configurations and contributing insights back to the community. This aligns with the collaborative spirit of KubeCon and the broader Kubernetes ecosystem.

Relevance to Platform Engineering at Facets.cloud

At Facets.cloud, we actively build solutions for platform engineering teams. The lessons from Expedia Group resonate deeply with our work:

  • Scalability Frameworks: Their approach to scaling GitOps with ArgoCD mirrors the challenges we tackle for our clients. Automated testing pipelines and optimized resource allocation are central to our solutions.
  • Operational Resilience: Ensuring consistent application state across thousands of clusters is a critical part of our offerings, and Expedia’s insights into shard management and reconciliation times provide actionable takeaways.
  • Ecosystem Enrichment: Like Expedia, we believe in contributing back to the community. The parameters and strategies tested by their team are a treasure trove for anyone scaling Kubernetes in production.

Wrapping Up

As I walked out of the session, one thing was clear: scaling GitOps isn’t just about tools; it’s about tuning, testing, and a relentless focus on resilience. Shivani Mehrotra and her team at Expedia Group provided a masterclass in operational excellence, leaving us all inspired to push the boundaries of what’s possible in platform engineering.

Here’s to solving the next big challenge—one application set at a time!