Driving etcd Stability and Kubernetes Success


Introduction: The Critical Role of etcd in Cloud-Native Infrastructure

Imagine a cloud-native world without Kubernetes. It's hard, right? But have you ever considered the unsung hero that makes Kubernetes tick? Enter etcd, the distributed key-value store that serves as the central nervous system for Kubernetes. Etcd's ability to consistently store and replicate critical cluster state data is essential for maintaining the health and harmony of distributed systems.


etcd: The Backbone of Kubernetes

Think of Kubernetes as a magnificent vertebrate animal, capable of complex movements and adaptations. In this analogy, etcd is the animal's backbone – a strong, flexible structure that supports the entire system. Just as a backbone protects the spinal cord (which carries vital information), etcd safeguards the critical data that defines the Kubernetes environment. And just as a backbone connects to every other part of the body, etcd facilitates communication and coordination between all the components of Kubernetes, allowing it to move, adapt, and thrive in the dynamic world of distributed systems.

ALT TEXT
Credit: Original image xkcd.com/2347, alterations by Josh Berkus.

Google's Deep-Rooted Commitment to Open Source

Google has a long history of contributing to open source projects, and our commitment to etcd is no exception. As the initiator of Kubernetes, Google understands the critical role that etcd plays in its success. Google engineers consistently invest in etcd to enhance its functionality and reliability, driven by their extensive use of etcd for their own internal systems.


Google's Collaborative Impact on etcd Reliability

Google engineers have actively contributed to the stability and resilience of etcd, working alongside the wider community to address challenges and improve the project. Here are some key areas where their impact has been felt:

Post-Release Support: Following the release of etcd v3.5.0, Google engineers quickly identified and addressed several critical issues, demonstrating their commitment to maintaining a stable and production-ready etcd for Kubernetes and other systems.

Data Consistency: Early Detection and Swift Action: Google engineers led efforts to identify and resolve data inconsistency issues in etcd, advocating for public awareness and mitigation strategies. Drawing from their Site Reliability Engineering (SRE) expertise, they fostered a culture of "blameless postmortems" within the etcd community—a practice where the focus is on learning from incidents rather than assigning blame. Their detailed postmortem of the v3.5 data inconsistency issue and a co-presented KubeCon talk served to share these valuable lessons with the broader cloud-native community.

Refocusing on Stability and Testing: The v3.5 incident highlighted the need for more comprehensive testing and documentation. Google engineers took action on multiple fronts:

  • Improving Documentation: They contributed to creating "The Implicit Kubernetes-ETCD Contract," which formalizes the interactions between the two systems, guiding development and troubleshooting.
  • Prioritizing Stability and Testing: They developed the "etcd Robustness Tests," a rigorous framework simulating extreme scenarios to proactively identify inconsistency and correctness issues.

These contributions have fostered a collaborative environment where the entire community can learn from incidents and work together to improve etcd's stability and resilience. The etcd Robustness Tests have been particularly impactful, not only reproducing all the data inconsistencies found in v3.5 but also uncovering other bugs introduced in that version. Furthermore, they've found previously unnoticed bugs that existed in earlier etcd versions, some dating back to the original v3 implementation. These results demonstrate the effectiveness of the robustness tests and highlight how they've made etcd the most reliable it has ever been in the history of the project.


etcd Robustness Tests: Making etcd the Most Reliable It's Ever Been

The "etcd Robustness Tests," inspired by the Jepsen methodology, subject etcd to rigorous simulations of network partitions, node failures, and other real-world disruptions. This ensures etcd's data consistency and correctness even under extreme conditions. These tests have proven remarkably effective, identifying and addressing a variety of issues:

For deeper insights into ensuring etcd's data consistency, Marek Siarkowicz's talk, "On the Hunt for Etcd Data Inconsistencies," offers valuable information about distributed systems testing and the innovative approaches used to build these tests. To foster transparency and collaboration, the etcd community holds bi-weekly meetings to discuss test results, open to engineers, researchers, and other interested parties.


The Kubernetes-etcd Contract: A Partnership Forged in Rigorous Testing

To solidify the Kubernetes-etcd partnership, Google engineers formally defined the implicit contract between the two systems. This shared understanding guided development and troubleshooting, leading to improved testing strategies and ensuring etcd meets Kubernetes' demanding requirements.

When subtle issues were discovered in how Kubernetes utilized etcd watch, the value of this formal contract became clear. These issues could lead to missed events under specific conditions, potentially impacting Kubernetes' operation. In response, Google engineers are actively working to integrate the contract directly into the etcd Robustness Tests to proactively identify and prevent such compatibility issues.


Conclusion: Google's Continued Commitment to etcd and the Cloud-Native Community

Google's ongoing investment in etcd underscores their commitment to the stability and success of the cloud-native ecosystem. Their contributions, along with the wider community's efforts, have made etcd a more reliable and performant foundation for Kubernetes and other critical systems. As the ecosystem evolves, etcd remains a critical linchpin, empowering organizations to build and deploy distributed applications with confidence. We encourage all etcd and Kubernetes contributors to continue their active participation and contribute to the project's ongoing success.

By Marek Siarkowicz – GKE etcd

Batch processing support for Performance Max

What’s New

Starting with Google Ads API v17, BatchJobService supports AssetGroupOperation. With this change you can use batch processing to create and manage entire Performance Max campaigns.

Batch processing is a powerful feature in the Google Ads API that lets you dispatch a set of operations, which might be interdependent, to multiple services without waiting for each operation to complete. Batch processing also provides automatic retries for transient errors, and automatic grouping of operations We have made batch processing available for AssetGroupOperation in response to your feedback to provide another option for working with asset groups asynchronously.

Implementation Details

Using AssetGroupOperation with batch processing for Performance Max campaigns is similar to how you use other operations, with the following special considerations:

When creating asset groups using batch processing, AssetGroupOperation and AssetGroupAssetOperation must be sequential without other operations in between. This is due to how operations are grouped together when processed. You can learn more about the considerations and best practices for using batch processing to create Performance Max campaigns in the Batch processing for Performance Max guide.

Here’s how to create an asset group in a batch job:

  1. Create a MutateOperation containing an AssetGroupOperation. This is no different than creating a MutateOperation using the GoogleAdsService.Mutate service. See the mutating resources guide.
  2. Associate assets with the asset group by creating a MutateOperation containing operations of type AssetGroupAssetOperation for each asset required; the sequential AssetGroupOperation and operations of type AssetGroupAssetOperation must combine to meet applicable minimum asset requirements.
  3. Add the operations of type MutateOperation to the batch job as you would with any other type of operation.
  4. Run the batch job by calling RunBatchJob after adding all operations.

The following resources contain additional information to help you with your integration:

Improving Performance Max integrations Blog Series

This article is part of a series that discusses new and upcoming features that you have been asking for. We’ll cover what’s new and how it differs from the current implementation approach.

Keep an eye out for further updates and improvements on our developer blog, continue providing feedback on Performance Max integrations with the Google Ads API, and as always, contact our team if you need support.

Workshop Reminder

Register today for the Performance Max and the Google Ads API workshop happening on July 17!

Chrome Beta for Desktop Update

The Chrome team is excited to announce the promotion of Chrome 127 to the Beta channel for Windows, Mac and Linux. Chrome 127.0.6533.4/.5 contains our usual under-the-hood performance and stability tweaks, but there are also some cool new features to explore - please head to the Chromium blog to learn more!

A partial list of changes is available in the Git log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Daniel Yip
Google Chrome

Google Chrome Releases 2024-06-12 21:44:00

 

The Dev channel is being updated to OS version 127.0.6533.0 (Platform version 15917.2.0) for most ChromeOS devices. This build contains a number of bug fixes and security updates.

If you find new issues, please let us know one of the following ways

  1. File a bug
  2. Visit our ChromeOS communities
    1. General: Chromebook Help Community
    2. Beta Specific: ChromeOS Beta Help Community
  3. Report an issue or send feedback on Chrome

Interested in switching channels? Find out how.

Alon,
Google ChromeOS

Chrome Beta for Android Update

Hi everyone! We've just released Chrome Beta 127 (127.0.6533.2) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Erhu Akpobaro
Google Chrome

Time to challenge yourself in the 2024 Google CTF


It’s Google CTF time! Install your tools, commit your scripts, and clear your schedule. The competition kicks off on June 21 2024 6:00 PM UTC and runs through June 23 2024 6:00 PM UTC. Registration is now open at goo.gle/ctf.


Join the Google CTF (at goo.gle/ctf), a thrilling arena to showcase your technical prowess. The Google CTF consists of a set of computer security puzzles (or challenges) involving reverse-engineering, memory corruption, cryptography, web technologies, and more. Participants can use obscure security knowledge to find exploits through bugs and creative misuse, and with each completed challenge your team will earn points and move up through the ranks. 




The top 8 teams of the Google CTF will qualify for our Hackceler8 competition taking place in Málaga, Spain later this year as a part of our larger Escal8 event. Hackceler8 is our experimental esport-style hacking game competition, custom-made to mix CTF and speedrunning. 


Screenshot from last year’s Hackceler8 game




In the competition, teams need to find clever ways to abuse the game features to capture flags as quickly as possible.




Last year, teams assumed the role of Bartholomew (Mew for short), the fuzzy and adorable protagonist of Hackceler8 2023, set to defeat and overcome the evil rA.Ibbit taking over Silicon Valley! What adventures will Mew encounter this year? See the 2023 grand final to get a sense of the story and gameplay. The prize pool for this year’s Google CTF and Hackceler8 stands at more than $32,000.



Itching to get started early? Want to learn more, or get a leg up on the competition? Review challenges from previous years, including previous Hackceler8 matches, all open-sourced here. Or gain inspiration by binge watching hours of Hackceler8 2023 videos!




If you are just starting out in this space, check out our documentary H4CK1NG GOOGLE, it’s a great way to get acquainted with security. We also recommend checking out this year’s Beginner’s Quest that’ll be launching later this summer which will teach you some of the tools and tricks with simpler gamified challenges. For example, last year we explored hacking through time – you can use this to prepare for what’s yet to come.




Whether you’re a seasoned CTF player or just curious about cybersecurity and ethical hacking, we want to invite you to join us. Sign up for the Google CTF to expand your skill set, meet new friends in the security community, and even watch the pros in action. For the latest announcements, see goo.gle/ctf, subscribe to our mailing list, or follow us on Twitter @GoogleVRP. Interested in bug hunting for Google? Check out bughunters.google.com. See you there!