Hi there! I’m Merry, Santa’s CIE (Chief Information Elf), responsible for making sure computers help us deliver joy to the world each Christmas. My elf colleagues are really busy getting ready for the big day (or should I say night?), but this year, my team has things under control, thanks to our fully cloud-native architecture running on Google Cloud Platform (GCP)! What’s that? You didn’t know that the North Pole was running in the cloud? How else did you think that we could scale to meet the demands of bringing all those gifts to all those children around the world?
Of course, our toy list has changed a lot too. It used to be relatively simple — rocking horses, stuffed animals, dolls and toy trucks, mostly. The most complicated things we made when I was a young elf were Teddy Ruxpins (remember those?). Now toy cars and even trading card games come with their own apps and use machine learning.
This is where I come in. We build lots of computer programs to help us. My team is responsible for running hundreds of microservices. I explain microservices to Santa as a computer program that performs a single service. We have a microservice for processing incoming letters from kids, another microservice for calculating kids’ niceness scores, even a microservice for tracking reindeer games rankings.
|Here's an example of the Letter Processing Microservice, which takes handwritten letter in all languages (often including spelling and grammatical errors) and turns each one into text.|
Google lets us use projects, folders and orgs to group different VMs together. Multiple microservices can make up an application and everything together makes up our system. Our most important and most complicated application is our Christmas Planner application. Let’s talk about a few services in this application and how we make sure we have a successful Christmas Eve.
Small elves, big data
Our work starts months in advance, tracking naughty and nice kids by relying on parent reports, teacher reports, police reports and our mobile elves. Keeping track of almost 2 billion kids each year is no easy feat. Things really heat up around the beginning of December, when our army of Elves-on-the-Shelves are mobilized, reporting in nightly.
Deck the halls with SLO dashboards
Our most important service level indicator or SLI is “child delight”. We target “5 nines” or 99.999% delightment level meaning 99,999/100,000 nice children are delighted. This limit is our service level objective or SLO and one of the few things everyone here in the North Pole takes very seriously. Each individual service has SLOs we track as well.
We use Stackdriver for dashboards, which we show in our control center. We set up alerting policies to easily track when a service level indicator is below expected and notify us. Santa was a little grumpy since he wanted red and green to be represented equally and we explained that the red warning meant that there were alerts and incidents on a service, but we put candy canes on all our monitors and he was much happier.
Merry monitoring for all
We have a team of elite SREs (Site Reliability Elves, though they might be called Site Reliability Engineers by all you folks south of the North Pole) to make sure each and every microservice is working correctly, particularly around this most wonderful time of the year. One of the most important things to get right is the monitoring.
For example, we built our own “internet of things” or IoT where each toy production station has sensors and computers so we know the number of toys made, what their quota was and how many of them passed inspection. Last Tuesday, there was an alert that the number of failed toys had shot up. Our SREs sprang into action. They quickly pulled up the dashboards for the inspection stations and saw that the spike in failures was caused almost entirely by our baby doll line. They checked the logs and found that on Monday, a creative elf had come up with the idea of taping on arms and legs rather than sewing them to save time. They rolled back this change immediately. Crisis averted. Without the proper monitoring and logging, it would be very difficult to find and fix the issue, which is why our SREs consider it the base of their gift reliability pyramid.
All I want for Christmas is machine learning
Running things in Google Cloud has another benefit: we can use technology they’ve developed at Google. One of our most important services is our gift matching service, which takes 50 factors as input including the child’s wish list, niceness score, local regulations, existing toys, etc., and comes up with the final list of which gifts should be delivered to this child. Last year, we added machine learning or ML, where we gave the Cloud ML engine the last 10 years of inputs, gifts and child and parent delight levels. It automatically learned a new model to use in gift matching based on this data.
Using this new ML model, we reduced live animal gifts by 90%, ball pits by 50% and saw a 5% increase in child delight and a 250% increase in parent delight.
Tis the season for sharing
Know someone who loves technology that might enjoy this article or someone who reminds you of Santa — someone with many amazing skills but whose eyes get that “reindeer-in-the-headlights look” when you talk about cloud computing? Share this article with him or her and hopefully you’ll soon be chatting about all the cool things you can do with cloud computing over Christmas cookies and eggnog... And be sure to tell them to sign up for a free trial — Google Cloud’s gift to them!