Introducing Shared VPC for Google Kubernetes Engine



Containers have come a long way, from the darling of cloud-native startups to the de facto way to deploy production workloads in large enterprises. But as containers grow into this new role, networking continues to pose challenges in management, deployment, and scaling.

We are excited to introduce Shared Virtual Private Cloud (VPC) to help Google Kubernetes Engine tackle the scale and reliability needs of container networking for enterprise workloads.

Shared VPC for better control of enterprise network resources
In large enterprises, you often need to place different departments in different projects for budgeting, access control, and accounting purposes. And while isolation and network segmentation are recommended best practices, segmentation can make it harder to share resources. In Compute Engine environments, Shared VPC networks let enterprise administrators give multiple projects permission to communicate over a single, shared virtual network without handing over control of critical resources such as firewalls. Now, with the general availability of Kubernetes Engine 1.10, we are extending the Shared VPC model to Kubernetes Engine: clusters running version 1.8 and above in multiple projects can now connect to a common VPC network.

Before Shared VPC, you could approximate this setup, in a limited and less secure way, by giving each project its own VPC and bridging them with Cloud VPN tunnels. The problem with this approach was that it required N*(N-1)/2 connections to achieve full connectivity between projects. Another challenge was that the network for each cluster wasn’t fully configurable, making it difficult for one project to communicate with another without a NAT gateway in between. Security was also a concern, since the organization administrator had no control over firewalls in the other projects.

Now, with Shared VPC, you can overcome these challenges and compartmentalize Kubernetes Engine clusters into separate projects, for the following benefits to your enterprise:
  • Sharing of common resources - Shared VPC makes it easy to use network resources that must be shared across teams, such as a set of RFC 1918 IP ranges shared across multiple subnets and Kubernetes Engine clusters.
  • Security - Organizations want to leverage more granular IAM roles to separate access to sensitive projects and data. By restricting what individual users can do with network resources, network and security administrators can better protect enterprise assets from inadvertent or deliberate acts that could compromise the network. While a network administrator sets firewalls for every team, a cluster administrator’s responsibilities might be limited to managing workloads within a project. Shared VPC gives you centralized control of critical network resources under the organization administrator, while still giving cluster admins the flexibility to manage clusters in their own service projects.
  • Billing - Teams can use projects and separate Kubernetes Engine clusters to isolate their resource usage, which helps with accounting and budgeting needs by letting you view billing for each team separately.
  • Isolation and support for multi-tenant workloads - You can break up your deployment into projects and assign them to the teams working on them, lowering the chances that one team’s actions will inadvertently affect another team’s projects.

Below is an example of how you can map your organizational structure to your network resources using Shared VPC:

Here is what Spotify, one of the many enterprises using Kubernetes Engine, has to say:
“Kubernetes is our preferred orchestration solution for thousands of our backend services because of its capabilities for improved resiliency, features such as autoscaling, and the vibrant open-source community. Shared VPC in Kubernetes Engine is essential for us to be able to use Kubernetes Engine with our many GCP projects."
- Matt Brown, Software Engineer, Spotify

How does Shared VPC work?
The host project contains one or more shared network resources, while the service projects map to the different teams or departments in your organization. After setting up the correct IAM permissions for service accounts in both the host and service projects, a cluster admin can instantiate Compute Engine resources in any of the service projects. This way, critical resources like firewalls are centrally managed by the network or security admin, while cluster admins are able to create clusters in their respective service projects.

Shared VPC is built on top of Alias IP. Kubernetes Engine clusters in service projects will need to be configured with a primary CIDR range (from which to draw Node IP addresses), and two secondary CIDR ranges (from which to draw Kubernetes Pod and Service IP addresses). The following diagram illustrates a subnet with the three CIDR ranges from which the clusters in the Shared VPC are carved out.
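
As an illustrative sketch (the project, network, subnet, and IP range names below are placeholders, not values from this post), that setup might look roughly like this with gcloud:

```
# In the host project: a subnet with a primary range for Nodes and
# secondary ranges for Pods and Services
gcloud compute networks subnets create tier-1 \
    --project my-host-project \
    --network shared-net \
    --region us-central1 \
    --range 10.0.4.0/22 \
    --secondary-range tier-1-pods=10.4.0.0/14,tier-1-services=10.0.32.0/20

# In a service project: a cluster that draws its Node, Pod, and Service IPs
# from that shared subnet
gcloud container clusters create tier-1-cluster \
    --project my-service-project \
    --zone us-central1-a \
    --enable-ip-alias \
    --network projects/my-host-project/global/networks/shared-net \
    --subnetwork projects/my-host-project/regions/us-central1/subnetworks/tier-1 \
    --cluster-secondary-range-name tier-1-pods \
    --services-secondary-range-name tier-1-services
```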

The Shared VPC IAM permissions model
To get started with Shared VPC, the first step is to set up the right IAM permissions on service accounts. For the cluster admin to be able to create Kubernetes Engine clusters in the service projects, the host project administrator needs to grant the compute.networkUser and container.hostServiceAgentUser roles in the host project. This allows the service project's service accounts to use specific subnetworks and to perform the networking administrative actions needed to manage Kubernetes Engine clusters. For more detailed information on the IAM permissions model, see the Shared VPC documentation.
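
A rough sketch of those grants with gcloud (the project IDs and service-account names below are placeholders; the exact accounts depend on your setup):

```
# Allow the service project's Google APIs service account to use the shared network
gcloud projects add-iam-policy-binding my-host-project \
    --member serviceAccount:SERVICE_PROJECT_NUMBER@cloudservices.gserviceaccount.com \
    --role roles/compute.networkUser

# Allow the service project's Kubernetes Engine service agent to manage
# the network resources its clusters need
gcloud projects add-iam-policy-binding my-host-project \
    --member serviceAccount:service-SERVICE_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com \
    --role roles/container.hostServiceAgentUser
```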

Try it out today!
Create a Shared VPC cluster in Kubernetes Engine and get the ease of access and scale for your enterprise workloads. Don’t forget to sign up for the upcoming webinar, 3 reasons why you should run your enterprise workloads on Kubernetes Engine.

Now on iOS: new vehicle icons to spice up your drive

There’s now a new way to customize your drive on Google Maps for iOS. Depending on your mood, you can swap out the classic blue navigation arrow for a new icon—a stylish sedan, a timeless pickup truck, or a speedy SUV. Get started by tapping on the arrow while in driving navigation mode to select your vehicle of choice, and hit the road with a brand new car, so you can have that new car feeling without the down payment.

New icons on Google Maps iOS

Learn Kotlin Fast with new Kotlin Bootcamp course

Posted by Aleks Haecky, Training Developer & Word Artist, Google+, LinkedIn, Medium

The Kotlin Bootcamp Udacity course is a free, self-paced online course that teaches you the basics of the Kotlin programming language. This introduction to Kotlin was created by Google experts in collaboration with Udacity and is for people who already know how to program.

The Kotlin language lets you create apps in less time, writing less code, and with fewer errors.

This modern object-oriented language offers a strong type system, type inference, null safety, properties, lambdas, extensions, coroutines, higher-order functions, and many other features. Kotlin is so concise that you can create complete data classes with a single line of code.
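
For example, this one line declares a complete data class, with equals(), hashCode(), toString(), and copy() generated for you (the class itself is just an illustration):

data class Fish(val name: String, val size: Int)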

Kotlin is officially supported for building Android apps, fully interoperates with the Java programming language and libraries, and is included with IntelliJ and Android Studio.

In this course you will learn everything you need to program in Kotlin, including:

  1. Basics: Write Kotlin statements and expressions in the IntelliJ REPL Kotlin interpreter using nullable and non-nullable variables, data types, operators, and control structures.
  2. Functions: Create a main() function, create and call functions with default and variable arguments, pass functions as arguments to filters, and write simple lambdas, function types, and compact single-expression functions.
  3. Classes: Create a class with methods and properties. Implement constructors and init blocks. Learn about inheritance, interfaces, and abstract classes. Use the special-purpose classes: data classes, objects, enums, and sealed classes.
  4. Beyond the Basics: Dive deeper into Pairs, collections, and constants. Learn how to write extensions, implement generics, apply annotations, and use labeled breaks.
  5. Functional Manipulation: Explore more about lambdas, higher-order functions, and inline.

You'll learn how to use extension functions to add helpful functionality to existing classes.

Extend built-in types:

fun Int.print() = println(this)
5.print() // prints 5

Extend Android classes:

import android.content.Context
import android.widget.Toast

fun Context.toast(text: CharSequence, duration: Int = Toast.LENGTH_SHORT): Toast {
   return Toast.makeText(this, text, duration).apply { show() }
}
// Call it from any Context, for example inside an Activity:
toast("Hello Toast")

Extend your own classes:

class AquariumPlant(
       val color: String)

fun AquariumPlant.print() =
       println("Pretty Aquarium Plant")

val plant = AquariumPlant("green")
plant.print()
// prints -> Pretty Aquarium Plant

When you've completed the course, you will be able to create programs in Kotlin, taking advantage of the features and capabilities that make Kotlin unique.

The course is available free, online at Udacity; take it in your own time at your own pace.

Go learn how to build apps with less code at https://www.udacity.com/course/ud9011.

Google Kubernetes Engine 1.10 is generally available and ready for the enterprise



Today, we’re excited to announce the general availability of Google Kubernetes Engine 1.10, which lays the foundation for new features to enable greater enterprise usage. Here on the Kubernetes Engine team, we’ve long been thinking about challenges that are critical to the enterprise, such as security, networking, logging, and monitoring. Now, in parallel with the GA of Kubernetes Engine 1.10, we are introducing a train of new features to support enterprise use cases. These include:
  • Shared Virtual Private Cloud (VPC) for better control of your network resources
  • Regional Persistent Disks and Regional Clusters for higher availability and stronger SLAs
  • Node Auto-Repair GA, and Custom Horizontal Pod Autoscaler for greater automation
Better yet, these all come with the robust security that Kubernetes Engine provides by default.
Let’s take a look at some of the new features that we will add to Kubernetes Engine 1.10 in the coming weeks.
Networking: global hyperscale network for applications with Shared VPC
Large organizations with several teams prefer to share physical resources while maintaining logical separation of resources between departments. Now, you can deploy your workloads in Google’s global Virtual Private Cloud (VPC) in a Shared VPC model, giving you the flexibility to manage access to shared network resources using IAM permissions while still isolating your departments. Shared VPC lets your organization administrators delegate administrative responsibilities, such as creating and managing instances and clusters, to service project admins while maintaining centralized control over network resources like subnets, routes, and firewalls. Stay tuned for more on Shared VPC support this week, where we’ll demonstrate how enterprise users can separate resources owned by projects while allowing them to communicate with each other over a common internal network.

Storage: high availability with Regional Persistent Disks
To make it easier to build highly available solutions, Kubernetes Engine will provide support for the new Regional Persistent Disk (Regional PD). Regional PD, available in the coming days, provides durable network-attached block storage with synchronous replication of data between two zones in a region. With Regional PDs, you don’t have to worry about application-level replication and can take advantage of replication at the storage layer. This replication offers a convenient building block for implementing highly available solutions on Kubernetes Engine.
Reliability: improved uptime with Regional Clusters, node auto-repair
Regional clusters, to be generally available soon, let you create a Kubernetes Engine cluster with a multi-master, highly available control plane that spreads your masters across three zones in a region, an important feature for clusters with higher uptime requirements. Regional clusters also offer a zero-downtime upgrade experience when upgrading Kubernetes Engine masters. In addition to regional clusters, the node auto-repair feature is now generally available. Node auto-repair monitors the health of the nodes in your cluster and repairs any that are unhealthy.
Auto-scaling: Horizontal Pod Autoscaling with custom metrics
Our users have long asked for the ability to scale horizontally any way they like. In Kubernetes Engine 1.10, Horizontal Pod Autoscaler supports three different custom metric types in beta: External (e.g., for scaling based on Cloud Pub/Sub queue length, one of the most requested use cases), Pods (e.g., for scaling based on the average number of open connections per pod), and Object (e.g., for scaling based on a metric from Kafka running in your cluster).

Kubernetes Engine enterprise adoption

Since we launched it in 2014, Kubernetes has taken off like a rocket. It is becoming “the Linux of the cloud,” according to Jim Zemlin, Executive Director of the Linux Foundation. Analysts estimate that 54 percent of Fortune 100 companies use Kubernetes across a spectrum of industries including finance, manufacturing, media, and others.
Kubernetes Engine, the first production-grade managed Kubernetes service, has been generally available since 2015. Core-hours for the service have ballooned: in 2017, Kubernetes Engine core-hours grew 9X year over year, supporting a wide variety of applications. Usage of stateful workloads (e.g., databases and key-value stores) has grown since their initial launch in 2016 to over 40 percent of production Kubernetes Engine clusters.
Here is what a few of the enterprises who are using Kubernetes Engine have to say.
Alpha Vertex, a financial services company that delivers advanced analytical capabilities to the financial community, built a Kubernetes cluster of 150 64-core Intel Skylake processors in just 15 minutes and trains 20,000 machine learning models concurrently using Kubernetes Engine.
“Google Kubernetes Engine is like magic for us. It’s the best container environment there is. Without it, we couldn’t provide the advanced financial analytics we offer today. Scaling would be difficult and prohibitively expensive.”
- Michael Bishop, CTO and Co-Founder, Alpha Vertex
Philips Lighting builds lighting products, systems, and services. Philips uses Kubernetes Engine to handle 200 million transactions every day, including 25 million remote lighting commands.
“Google Kubernetes Engine delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency.”
- George Yianni, Head of Technology, Home Systems, Philips Lighting
Spotify, the digital music service that hosts more than 2 billion playlists and gives consumers access to more than 30 million songs, uses Kubernetes Engine for thousands of backend services.
“Kubernetes is our preferred orchestration solution for thousands of our backend services because of its capabilities for improved resiliency, features such as autoscaling, and the vibrant open source community. Shared VPC in Kubernetes Engine is essential for us to be able to use Kubernetes Engine with our many GCP projects.”
- Matt Brown, Software Engineer, Spotify
Get started today and let Google Cloud manage your enterprise applications on Kubernetes Engine. To learn more about Kubernetes Engine, join us for a deep dive into the Kubernetes 1.10 enterprise features in Kubernetes Engine by signing up for our upcoming webinar, 3 reasons why you should run your enterprise workloads on Kubernetes Engine.

Helping over 15M women in 150,000 villages unlock their true potential through the Internet Saathi Program

When we first met Parveen Begum, an Internet Saathi from a small village in Guntur, last year, she had many apprehensions and asked, ‘Is the Internet meant for me?’, ‘Will my family allow me to step outside the house to train other women?’, ‘How will the people of my village react to this change?’ And she isn't alone. Hundreds of thousands of women we meet along the journey of the Internet Saathi program ask us similar questions. But today, Parveen Begum and 48,000 other Internet Saathis have not only learnt to use the Internet themselves, but have also helped over 15 million women in 150,000 villages to unlock their true potential.

These women go on to do amazing things once they learn to use the Internet. One such story is that of Padmavati from Guntur, who was trained by Parveen Begum. Padmavati started her own lemongrass oil business: she learnt the production process and techniques, and then came up with the idea of setting up a small production plant in Guntur.

Today is an important day for the Internet Saathi program, as we reach the halfway point of our milestone. The program is spread across villages in Rajasthan, Gujarat, Jharkhand, Andhra Pradesh, Uttar Pradesh, Assam, West Bengal, Tripura, Maharashtra, Madhya Pradesh, Bihar, Haryana, and Tamil Nadu, and has also expanded to four new states (Uttarakhand, Goa, Karnataka, and Telangana) in the past couple of months.


We are committed to bridging the digital gender divide in India and will continue to work towards our larger ambition of reaching 300,000 villages across India, i.e., covering 50% of the total villages in the country.

The program focuses on educating women to use the Internet; these women then train other women in their communities and neighboring villages. What’s incredible to see is the impact the program has had on millions of women, their families, and their communities.

We are grateful to Tata Trusts, our on-ground NGO partners, local activists and several Google volunteers who have contributed to the success of this program. We watch with pride as the family of 48,000 Internet Saathis grows!

By Sapna Chadha, Director of Marketing, Southeast Asia & India, Google

22 international YouTubers, 15 countries, 4 days: Behind the scenes at #io18

Editor’s note: A few of these videos are in different languages. Luckily, YouTube has automatic closed captioning and you can even magically auto-translate those captions into English. Click on CC, then the settings cog to turn on auto-translate to English.

What happens when you let 22 YouTubers from 15 countries run wild inside Google HQ, behind the scenes at Google I/O and across San Francisco’s urban jungle?

Organized chaos, that’s what. Plus lots of selfies, vlogs and smiles for their millions of followers back home. Last week, we invited a delightful bunch of YouTubers from around the world to join us on an adventure and check out the latest tech from Google. Here’s a glimpse at #io18 through their eyes.

Day 1 — Google HQ tour, conference bikes and more


To get things started, we toured the whole Googleplex campus and met 10 of the top Android developers from around the world.

Gaurav Chaudary (AKA Technical Guruji) from India tours Google HQ campus. Don’t forget to turn on closed captions and select “auto-translate” to English if you don’t speak Hindi.

Felix Bahlinger from Germany rides a 7-person conference bike and goes speed-dating with top Android developers.

One of our very special guests was a lovely 72-year-old lady known as Korea Grandma—definitely a leading candidate as one of the most energetic and daring grandmothers in the world.

For our friends in China, here’s Chaping’s wrap up on Weibo and for those Spanish speakers out there, check out Topes DeGama’s YouTube video.

Day 2 — I/O keynote and product demos


If you didn’t catch the keynote live stream, here are some quick recaps in English, Hindi, Spanish, Chinese, Arabic and German.

After the keynote wrapped, Mr.Mobile, Tim Schofield and Technical Guruji got hands-on with the new Android P, and then our YouTuber crew explored the product demo sandboxes at I/O, including Liang Fen from China checking out our Accessibility tent and flowers that react to your emotions.

Technical Guruji interacts with our latest Internet of Things tech

Day 3 — Machine learning, Waymo, X, and digital wellbeing


To kick off day three, we went to an inspiring talk on the future of machine learning. Auman gave a detailed summary for his Hong Kong followers, while RayDu boiled machine learning down to just two simple lines.
Circles and squares

ML demystified by RayDu

We then visited our Alphabet cousins, Waymo and X, to hear about how machine learning is helping make new technologies like self-driving cars, Project Wing and more possible. In classic grandma style, Korea Grandma even handed out chocolates delivered by a Project Wing delivery drone to the team.

Technical Guruji films the arrival of chocolates delivered by Project Wing

After a busy start to the week, it was time to chill for a bit. Helping people maintain a healthy relationship with the way they want to use tech was a major focus of I/O this year, so we decided to take a break with a meditation session. Google VP Sameer Samat also dropped by to share how we’re building digital wellbeing controls into the next version of Android P. Creative Monkeyz, Flora, and Pierre couldn’t pass up the opportunity for a quick selfie too. 😉

Pierre (The Liu Pei) talks digital wellbeing in Android P with Sameer

We played Emoji Scavenger Hunt... Aaaaaand then we partied. Hard.

I/O after hours

Day 4 — Urban AI digital jungle


For our last day together, we trekked into San Francisco to road-test the latest tech in the real world. We went to San Francisco Zoo for a Google Lens Zoo Safari and then had lunch with menus written only in Japanese—so you had to use Word Lens in Google Translate to decide what you wanted to eat!
SF zoo scavenger hunt

Newrara with her Korea Grandma using Google Lens at the SF Zoo

Then we finished with a #TeamPixel Photo Tour of the city to try out Portrait Mode with our mint-fresh Pixel 2s.

You still want more? Okay! Here are some full trip vlogs with ALL the things.

Coisa de Nerd’s full trip summary

Topes De Gama’s full trip summary

Until our next adventure!

Introducing Git protocol version 2


Today we announce Git protocol version 2, a major update of Git's wire protocol (how clones, fetches and pushes are communicated between clients and servers). This update removes one of the most inefficient parts of the Git protocol and fixes an extensibility bottleneck, unblocking the path to more wire protocol improvements in the future.

The protocol version 2 spec can be found here. The main improvements are described below.
The main motivation for the new protocol was to enable server-side filtering of references (branches and tags). Prior to protocol v2, servers responded to all fetch commands with an initial reference advertisement listing every reference in the repository. This complete listing is sent even when a client only cares about updating a single branch, e.g.: `git fetch origin master`. For repositories that contain hundreds of thousands of references (the Chromium repository has over 500k branches and tags), the server could end up sending tens of megabytes of data that get ignored. This typically dominates both time and bandwidth during a fetch, especially when you are updating a branch that's only a few commits behind the remote, or even when you are only checking if you are up to date, resulting in a no-op fetch.

We recently rolled out support for protocol version 2 at Google and have seen a 3x performance improvement for no-op fetches of a single branch on repositories containing 500k references. Protocol v2 has also enabled an 8x reduction in the overhead (non-packfile) bytes sent from googlesource.com servers. The majority of this improvement comes from filtering the references advertised by the server down to the refs the client has expressed interest in.

    Getting over the hurdles

    The Git project has tried on a number of occasions over the years to either limit the initial ref advertisement or move to a new protocol altogether but continued to run into two problems: (1) the initial request is rigid and does not include a field that could be used to request that new servers modify their response without breaking compatibility with existing servers and (2) error handling is not well enough defined to allow safely using a new protocol that existing servers do not understand with a quick fallback to the old protocol. To migrate to a new protocol version, we needed to find a side channel which existing servers would ignore but could be used to safely communicate with newer servers.

    There are three main transports that are used to speak Git’s wire-protocol (git://, ssh://, and https://), and the side channel that we use to request v2 needs to communicate in such a way that an older server would ignore any additional data sent and not crash. The http transport was the easiest as we can simply include an additional http header in the request (“Git-Protocol: version=2”). The ssh transport is a bit more difficult as it requires sending an environment variable (“GIT_PROTOCOL=version=2”) to be set on the remote end. This is more challenging because it requires server administrators to configure sshd to accept the new environment variable on their server. The most difficult transport is the anonymous Git transport (git://).

Initial requests made to a server using the anonymous Git transport take the form of a single packet-line, which includes the requested service (git-upload-pack for fetches and git-receive-pack for pushes) and the repository, followed by a NUL byte. Later, virtualization support was added, and a hostname parameter could be tacked on, terminated by a NUL byte: `0033git-upload-pack /project.git\0host=myserver.com\0`. Ideally we’d be able to add a new parameter to request v2 in the same manner as the hostname was added: `003dgit-upload-pack /project.git\0host=myserver.com\0version=2\0`. Unfortunately, due to a bug introduced in 2006, we aren't able to place any extra arguments (separated by NULs) other than the host, because otherwise the parsing of those arguments would enter an infinite loop. When this bug was fixed in 2009, a check was put in place to disallow extra arguments so that new clients wouldn't trigger this bug in older servers.

Fortunately, that check doesn't notice if we send additional request arguments hidden behind a second NUL byte, which was pointed out back in 2009. This allows requests structured like: `003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0`. By placing version information behind a second NUL byte, we can skirt around both the infinite-loop bug and the explicit check disallowing extra arguments besides the hostname. Only newer servers will know to look for additional information hidden behind two NUL bytes, and older servers won’t croak.

    Now, in every case, a client can issue a request to use v2, using a transport-specific side channel, and v2 servers can respond using the new protocol while older servers will ignore the side channel and just respond with a ref advertisement.

    Try it for yourself

To try out protocol version 2 for yourself, you'll need an up-to-date version of Git (support for v2 was recently merged to Git's master branch and is expected to be part of Git 2.18) and a v2-enabled server (repositories on googlesource.com and Cloud Source Repositories are v2-enabled). If you enable packet tracing and run the `ls-remote` command querying for a single branch, you can see that the server sends a much smaller set of references when using protocol version 2:

    ```
    # Using the original wire protocol
    GIT_TRACE_PACKET=1 git -c protocol.version=0 ls-remote https://chromium.googlesource.com/chromium/src.git master

    # Using protocol version 2
    GIT_TRACE_PACKET=1 git -c protocol.version=2 ls-remote https://chromium.googlesource.com/chromium/src.git master
    ```
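
Once you have a v2-capable Git client, you can also opt in for all commands by setting the `protocol.version` configuration value instead of passing `-c` each time:

```
# Use protocol version 2 by default wherever the server supports it
git config --global protocol.version 2
```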

    By Brandon Williams, Git-core Team

    Sharding of timestamp-ordered data in Cloud Spanner



Cloud Spanner was designed from the ground up to offer horizontal scalability and a developer-friendly SQL interface. Because Cloud Spanner is a managed service, Google Cloud handles most database management tasks, but it’s up to you to ensure that there are no hotspots, as described in Schema Design Best Practices and Optimizing Schema Design for Cloud Spanner. In this article, we’ll look at how to efficiently insert and retrieve records with timestamp ordering. We’ll start with the high-level guidance provided in Anti-pattern: timestamp ordering and explore the scenario in more detail with a concrete example.

    Scenario

Let’s say we’re building an app that logs user activity along with timestamps, and that also allows users to query this activity by user id and time range. A good primary key for the table storing user activity (let’s call it LogEntries) is (UserId, Timestamp). Each user’s log entries are written in timestamp order, but because the key is prefixed by UserId, writes are naturally spread across users, giving us a uniform key distribution.

    Table LogEntries

UserId (PK)      Timestamp (PK)                  LogEntry
15b7bd1f-8473    2018-05-01T15:16:03.386257Z     ...
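
In Cloud Spanner DDL, that schema might be declared roughly as follows (the column lengths are illustrative):

CREATE TABLE LogEntries (
  UserId    STRING(64)  NOT NULL,
  Timestamp TIMESTAMP   NOT NULL,
  LogEntry  STRING(MAX),
) PRIMARY KEY (UserId, Timestamp);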


    Here’s a sample query to retrieve a list of log entries by user and time range:

SELECT UserId, Timestamp, LogEntry
FROM LogEntries
   WHERE UserId = '15b7bd1f-8473'
   AND Timestamp BETWEEN '2018-05-01T15:14:10.386257Z'
   AND '2018-05-01T15:16:10.386257Z';
    
    This query takes advantage of the primary key and thus performs well.

    Now let’s make things more interesting. What if we wanted to group users by the company they work for so we can segment reports by company? This is a fairly common use case for Cloud Spanner, especially with multi-tenant SaaS applications. To support this, we create a table with the following schema.
    Table LogEntries


CompanyId (PK)    UserId (PK)      Timestamp (PK)                  LogEntry
Acme              15b7bd1f-8473    2018-05-01T15:16:03.386257Z     ...


    And here’s the corresponding query to retrieve the log entries:

SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries
   WHERE CompanyId = 'Acme'
   AND UserId = '15b7bd1f-8473'
   AND Timestamp BETWEEN '2018-05-01T15:14:10.386257Z'
   AND '2018-05-01T15:16:10.386257Z';
    


    Here’s the query to retrieve log entries by CompanyId and time range (user field not specified):

SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries
   WHERE CompanyId = 'Acme'
   AND Timestamp BETWEEN '2018-05-01T15:14:10.386257Z'
   AND '2018-05-01T15:16:10.386257Z';
    
To support the above query, we add a separate secondary index. Initially, we include just two columns:

CREATE INDEX LogEntriesByCompany ON LogEntries(CompanyId, Timestamp)
    

    Challenge: hotspots during inserts


The challenge here is that some companies may have a lot more users (orders of magnitude more) than others, resulting in a very skewed distribution of log entries. The challenge is particularly acute during inserts, as described in the opening paragraph above. And even if Cloud Spanner helps out by creating additional splits, the nodes that serve the new splits become hotspots due to the uneven key distribution.

    The above diagram depicts a scenario where Company B has three times more users than Company A or Company C. Therefore, log entries corresponding to Company B grow at a higher rate, resulting in the hotspotting of nodes that service the splits where Company B’s log entries are being inserted.

    Hotspot mitigation

    There are multiple aspects to our hotspot mitigation strategy: schema design, index design and querying. Let’s look at each of these below.

    Schema and index design 

    As described in Anti-pattern: timestamp ordering, we’ll use application-level sharding to distribute data evenly. Let’s look at one particular approach for our scenario: instead of (CompanyId, UserId, Timestamp), we’ll use (UserId, CompanyId, Timestamp).

    Table LogEntries (reorder columns CompanyId and UserId in Primary Key)


UserId (PK)      CompanyId (PK)    Timestamp (PK)                  LogEntry
15b7bd1f-8473    Acme              2018-05-01T15:16:03.386257Z     ...


    By placing UserId before CompanyId in the primary key, we can mitigate the hotspots caused by the non-uniform distribution of log entries across companies.

    Now let’s look at the secondary index on CompanyId and timestamp. Since this index is meant to support queries that specify just CompanyId and timestamp, we cannot address the distribution problem by simply incorporating UserId. Keep in mind that indexes are also susceptible to hotspots and we need to design them so that their distribution is uniform.

    To address this, we’ll add a new column, EntryShardId, where (in pseudo-code):
    entryShardId = hash(CompanyId + timestamp) % num_shards
    
The hash function here could be a simple crc32 operation. Here’s a Python snippet illustrating how to calculate the shard id before a log entry is inserted (note that crc32 operates on bytes, so the string is encoded first):
...
import datetime
import zlib
...
timestamp = datetime.datetime.utcnow()
companyId = 'Acme'
# crc32 expects bytes; mask to a stable unsigned value, then bucket into num_shards = 10
entryShardId = (zlib.crc32((companyId + timestamp.isoformat()).encode('utf-8')) & 0xffffffff) % 10
...
    
    In this case, num_shards = 10. You can adjust this value based on the characteristics of your workload. For instance, if one company in our scenario generates 100 times more log entries on average than the other companies, then we would pick 100 for num_shards in order to achieve a uniform distribution across entries from all companies.

    This hashing approach essentially takes the sequential, timestamp-ordered LogEntriesByCompany index entries for a particular company and distributes them across multiple application (or logical) shards. In this case, we have 10 such shards per company, resulting from the crc32 and modulo operations shown above.

    Table LogEntries (with EntryShardId added)


UserId (PK)      CompanyId (PK)    Timestamp (PK)                  EntryShardId    LogEntry
15b7bd1f-8473    Acme              2018-05-01T15:16:03.386257Z     8               ...


    And the index:
    CREATE INDEX LogEntriesByCompany ON LogEntries(EntryShardId, CompanyId, Timestamp)
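
Putting these pieces together, here is a minimal sketch of writing a log entry along with its computed shard id using the google-cloud-spanner Python client; the instance and database names, and the helper function itself, are illustrative rather than part of the original example:

import datetime
import zlib

from google.cloud import spanner

NUM_SHARDS = 10

def insert_log_entry(database, company_id, user_id, log_entry):
    timestamp = datetime.datetime.now(datetime.timezone.utc)
    # Same hash as above: bucket this company's entries into one of NUM_SHARDS logical shards
    key = (company_id + timestamp.isoformat()).encode('utf-8')
    entry_shard_id = (zlib.crc32(key) & 0xffffffff) % NUM_SHARDS
    # Store EntryShardId with the row so the LogEntriesByCompany index stays evenly distributed
    with database.batch() as batch:
        batch.insert(
            table='LogEntries',
            columns=('UserId', 'CompanyId', 'Timestamp', 'EntryShardId', 'LogEntry'),
            values=[(user_id, company_id, timestamp, entry_shard_id, log_entry)])

# Placeholder resource names
client = spanner.Client()
database = client.instance('my-instance').database('my-database')
insert_log_entry(database, 'Acme', '15b7bd1f-8473', 'user logged in')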
    

    Querying

    Evenly distributing data using a sharding approach is great for inserts but how does it affect retrieval? Application-level sharding is no good to us if we cannot retrieve the data efficiently. Let’s look at how we would query for a list of log entries by CompanyId and time range, but without UserId:

SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries@{FORCE_INDEX=LogEntriesByCompany}
   WHERE CompanyId = 'Acme'
   AND EntryShardId BETWEEN 0 AND 9
   AND Timestamp > '2018-05-01T15:14:10.386257Z'
   AND Timestamp < '2018-05-01T15:16:10.386257Z'
ORDER BY Timestamp DESC;
    

The above query illustrates how to perform a timestamp range retrieval while taking sharding into account. By including EntryShardId in the query, we tell Cloud Spanner to ‘look’ in all 10 logical shards to retrieve the timestamp entries for CompanyId ‘Acme’ for a particular range.
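
For completeness, here is a sketch of issuing that query with the google-cloud-spanner Python client (the resource names are again placeholders):

from google.cloud import spanner

client = spanner.Client()
database = client.instance('my-instance').database('my-database')

query = """
SELECT CompanyId, UserId, Timestamp, LogEntry
FROM LogEntries@{FORCE_INDEX=LogEntriesByCompany}
WHERE CompanyId = 'Acme'
  AND EntryShardId BETWEEN 0 AND 9
  AND Timestamp > '2018-05-01T15:14:10.386257Z'
  AND Timestamp < '2018-05-01T15:16:10.386257Z'
ORDER BY Timestamp DESC
"""

# Each row is one matching log entry; the shard predicate lets Cloud Spanner
# read the relevant index entries from each of the 10 logical shards
with database.snapshot() as snapshot:
    for company_id, user_id, timestamp, log_entry in snapshot.execute_sql(query):
        print(company_id, user_id, timestamp, log_entry)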

    Cloud Spanner is a full-featured relational database service that relieves you of most—but not all—database management tasks. For more information on Cloud Spanner management best practices, check out the recommended reading.

    Anti-pattern: timestamp ordering
    Optimizing Schema Design for Cloud Spanner
    Best Practices for Schema Design

    Dev Channel Update for Chrome OS


    The Dev channel has been updated to 68.0.3431.0 (Platform version: 10682.0.0) for most Chrome OS devices. This build contains a number of bug fixes, security updates and feature enhancements.


    If you find new issues, please let us know by visiting our forum or filing a bug. Interested in switching channels? Find out how. You can submit feedback using ‘Report an issue...’ in the Chrome menu (3 vertical dots in the upper right corner of the browser).


Cindy Bayless
    Google Chrome

    Watch Prince Harry and Meghan Markle tie the knot this Saturday

    Cross-posted from the Official Google Blog

Tomorrow, people from all over the world will tune in to watch the wedding of Britain's Prince Harry and Meghan Markle.

    To give people everywhere a chance to join together and celebrate this royal union, on Saturday, May 19, the ceremony will be live streamed on the Royal Family's official YouTube channel.

     The live stream will follow the wedding procession, marriage ceremony at Windsor Castle, and wedding day happenings along the way. Afterwards, the footage will be reshown so that people can enjoy this wonderful event no matter their location or time zone.

    Whether you’re from Blighty or anywhere else on the globe, all eyes will be on St. George’s Chapel in Windsor Castle this Saturday to see this next chapter of royal history.

    Source: YouTube Blog