Tag Archives: Customers

Out of one, many: Using Jenkins, GCP projects and service accounts at Catalant



[Editor’s Note: Today we hear from Catalant, an on-demand staffing provider that connects consultants and enterprises with a Software as a Service (SaaS) application that’s built on Google Cloud Platform (GCP). Using Jenkins, Google projects and service accounts, Catalant was able to build a single shared environment across production, development and sales that was easy to manage and that satisfied its compliance and regulatory requirements.]

If your organization provides a SaaS application, you probably have multiple environments: production of course, but also demo, test, staging and integration environments to support various use cases. From a management perspective, you want to share resources across all those environments so that you have the fewest moving parts. But ease of management and robust security are often at odds; for security purposes, the best practice is a separate project for each environment, where nothing is shared and there's complete isolation.

Here at Catalant, we approached this problem by taking a step back and understanding the requirements from different parts of the organization:
  1. Compliance: Each environment and its data needs to be secure and not be shared. 
  2. Sales: We need an environment that lets us control the data, so that we can give a consistent, predictable demo. 
  3. Development: We need an environment where we can test things before putting them into production. 
  4. Engineering Management: We need continuous integration and continuous deployment (CI/CD). Also, developers should not be required to use GCP-specific tools.
Based on these requirements, we elected to go with a single shared Jenkins project to manage CI/CD activities for all the environments (test, demo, prod) that we may bring up, which satisfied developers and engineering management. Google Cloud’s concept of projects, meanwhile, addressed the compliance team’s concerns with fortified boundaries that by default do not allow unauthorized traffic into the environment. Finally, we used service accounts to allow projects to communicate with one another.
Figure 1. Jenkins Pipeline
Figure 2. Projects Layout

We built this environment on Google Compute Engine. And while it’s out of the scope of this article to show how to build this out on Google Kubernetes Engine (formerly Container Engine), these resources can show you how to do it yourself:

Creating a service account


By default, when a developer creates a project, GCP also creates a default Compute Engine service account, which can be granted access to resources in its own project as well as in other projects. We took advantage of this service account to access the Jenkins project's resources.

We store all the images that we build with the Jenkins project in Container Registry. We provided “Storage Object Viewer” access for each project’s default service account so that the images can be deployed (via pull access) into an environment-specific project. In addition, to deploy the containers, we created a Jenkins service account that can authenticate into projects’ Kubernetes clusters for a specific namespace.
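To grant that pull access, it helps to know that Container Registry stores images in a Cloud Storage bucket named artifacts.[PROJECT-ID].appspot.com, so the role can be granted on that bucket. Here’s a minimal sketch, assuming hypothetical project names and the destination project’s default Compute Engine service account address:

# Grant the destination project's default Compute Engine service account read access
# to the Jenkins project's Container Registry bucket (all names here are placeholders).
gsutil iam ch \
  serviceAccount:[email protected]:objectViewer \
  gs://artifacts.jenkins-project.appspot.com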

Here’s how to create a service account based on a namespace:

Step 1 - Create a service account:

kubectl create serviceaccount <sa-name> --namespace <ns-name>

This command creates a service account on the destination project. Jenkins will use this service account to authenticate into the destination cluster.

Step 2 - Verify the new service account:

kubectl get serviceaccount <sa-name> --namespace <ns-name> -o yaml

This checks that the service account was successfully created, and outputs the service account details in YAML format.

Step 3 - Get the secret name:

kubectl get sa <sa-name> -o json --namespace <ns-name> | jq -r '.secrets[].name'

This retrieves the secret name associated with the service account created in Step 1.

Step 4 - Get the certificate:

kubectl get secret <secret-name> -o json --namespace <ns-name> | jq -r '.data["ca.crt"]' | base64 -d > ca.crt

This gets the certificate details from the secret, decodes the certificate data and stores it in a file, ca.crt. The ca.crt certificate will be used to authenticate into the cluster.

Step 5 - Get the token:

kubectl get secret <secret-name> -o json --namespace <ns-name> | jq -r '.data["token"]' | base64 -d

This command gets the token from the secret and decodes it to plain text. The token will be used to authenticate into the cluster.

Step 6 - Get the IP address of the cluster:

kubectl config view -o yaml | grep server

Allowing cross-project access


When Jenkins does a deploy, it needs to authenticate into each project's Kubernetes cluster. In the Jenkins application, we created the service account’s token and certificate as credentials. The steps below show how to authenticate into a different project, known as cross-project access.

Again, let’s explain what each step does:

Step 1 - Set a cluster entry in kubeconfig:

kubectl config set-cluster <cluster-name> --embed-certs=true --server=<cluster-ip> --certificate-authority=<path/to/certificate>

where
  • <cluster-name> can be any name 
  • --embed-certs=true embeds certs for the cluster entry in kubeconfig 
  • --server=<cluster-ip> is the IP of the cluster we’re authenticating into, namely the IP retrieved in Step 6 of the service account creation process 
  • --certificate-authority=<path/to/certificate> is the path to the ca.crt certificate file we generated in Step 4 of the service account creation section above 
Step 2 - Set the user entry in kubeconfig:

kubectl config set-credentials <credentials-name> --token=<token-value>

where
  • <credentials-name> can be any name 
  • --token=<token-value> is the token value that was decoded during Step 5 of the previous section 
Step 3 - Set the context entry in kubeconfig:

kubectl config set-context <context-name> --cluster=<cluster-name> --user=<credentials-name> --namespace=<ns-name>

where
  • <context-name> can be any name 
  • --cluster=<cluster-name> is the cluster name set up in Step 1 above 
  • --user=<credentials-name> is the credentials name set up in Step 2 above 
  • --namespace=<ns-name> is the namespace we’d like to interact with 
Step 4 - Set the current-context in a kubeconfig file:
kubectl config use-context <context-name>

Where <context-name> is the context name that we created in Step 3 above. 

After setting up the context, we’re ready to access the destination project cluster. All the kubectl commands will be executed against the destination project cluster. A simple test to verify that we're accessing the destination project cluster successfully is to check for pods.

kubectl get pods -n <ns-name>

If the pods listed belong to the destination project, the configuration is set up correctly, and all subsequent kubectl commands will run against the destination project's cluster.
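Inside a Jenkins deploy stage, the four kubeconfig commands above end up chained together. Here’s a minimal sketch, assuming the token and ca.crt from the previous section are exposed to the job through placeholder environment variables and file paths:

# Hypothetical values; in practice Jenkins injects these from stored credentials.
CLUSTER_IP="https://203.0.113.10"
CA_CRT_PATH="ca.crt"
SA_TOKEN="$(cat /var/run/secrets/destination-token)"

kubectl config set-cluster destination --embed-certs=true --server="$CLUSTER_IP" --certificate-authority="$CA_CRT_PATH"
kubectl config set-credentials jenkins-deployer --token="$SA_TOKEN"
kubectl config set-context destination-deploy --cluster=destination --user=jenkins-deployer --namespace=app
kubectl config use-context destination-deploy
kubectl get pods -n app    # sanity check that we're talking to the destination cluster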

In this setup, bringing up new environments is quick and easy since the Jenkins environment doesn’t have to be re-created or copied for each Google project. Of course, it does create a single point of failure and a shared resource. It’s important to configure Jenkins correctly, so that work from a single environment can’t starve out the rest. Make sure you have enough resources for the workers and limit the number of builds per branch to one; that way, multiple commits in quick succession to a branch can’t overload the infrastructure.

All told, the combination of Jenkins for CI/CD plus Google Cloud projects and service accounts gives us the best of both worlds: a single shared environment that uses resources efficiently and is easy to manage, plus the security and isolation that our compliance and sales teams demanded. If you have questions or comments about this environment, reach out to us. And for more information, visit GoCatalant.com.

How to get real-time, actionable insights from your Fastly logs with Looker and BigQuery



Editor’s note: Fastly, whose edge cloud platform offers content delivery, streaming, security and load-balancing, recently integrated its platform with Looker, a business intelligence tool. Using Google BigQuery as its analytics engine, you can use Fastly plus Looker to do things like improve your operations, analyze the effectiveness of marketing programs — even identify attack trends.

This past August we announced a deeper integration between Google Cloud Platform (GCP) and Fastly’s edge cloud. In addition to using Fastly to improve response times for applications built on GCP, Fastly customers can stream Fastly logs in real-time from the edge to a number of third parties for deeper analysis, including Google Cloud Storage and BigQuery. We're now expanding upon this partnership by integrating Looker, a powerful business intelligence tool, into our offering.

Looker can analyze Fastly log data on its own or combine it with other data sources in BigQuery such as Google Analytics, Google Ads data or security and firewall logs, allowing customers to run queries against these data sets and present findings in dashboards to facilitate better business decisions.

As part of this collaboration, we created a “Looker Block” for Fastly Log Analytics in BigQuery, to help you get up and running quickly with key visualizations and metrics. Think of Looker Blocks as analytical patterns that can be used as a starting point for modeling a data source. They include dashboards and key metrics that can be explored ad-hoc to build new customized reports. The Fastly Looker Block can be extended to account for specific Fastly logging use cases while also connecting to other data sources in BigQuery for more comprehensive analysis.

Looker runs all analytics in BigQuery — data is never moved from the source — leveraging BigQuery’s performance and features directly. This functionality is made possible via Looker’s modeling layer, LookML, which serves as an abstraction of SQL.

Here are some common use cases for GCP customers who wish to take advantage of both Fastly and Looker:

DevOps - Fastly streams 100% of logs from the edge to BigQuery in real time, providing insights into web and app usage. Using Looker dashboards, you can correlate the most popular URLs, website and app activity by country, and activity by client device. You can then use this information to see which content is gaining the most traction where, and what devices it’s being consumed on.

Leveraging BigQuery analytics, Looker can also analyze Fastly log data and create dashboards to use for troubleshooting. Here, Looker can illustrate failed requests by geo, data center and country, or surface the slowest URLs. You can also use these dashboards to troubleshoot connectivity issues, pinpoint configuration areas that need tuning, and identify the cause of service disruptions.

Looker dashboard, troubleshooting using Fastly log data
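As a rough illustration of the kind of query Looker generates behind such a dashboard, a slowest-URLs report over Fastly logs in BigQuery might look like the sketch below. The table and field names are hypothetical placeholders, not Fastly’s actual log schema:

bq query --use_legacy_sql=false '
SELECT url, AVG(time_elapsed_ms) AS avg_latency_ms, COUNT(*) AS requests
FROM `my_project.fastly_logs.requests`
GROUP BY url
ORDER BY avg_latency_ms DESC
LIMIT 20'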
Marketing/Digital Advertising - Looker can cross-reference Fastly log data with other data sources for broader insights. For example, by combining Fastly app activity by country with Google Ad data, marketers can discover where engagement is higher and which users are more likely to consume their ads.

Looker dashboard, analysis of user engagement with Google Ad data
Security - You can also use Looker to help visualize Fastly’s real-time logs for insights into live attack trends. Fastly’s Web Application Firewall (WAF) logs can be fed into Google BigQuery. Looker then pulls that data to create dashboards illustrating trends in attacks, breakdown of attacks over time, spikes in attacks from a given attacker, and more.

Looker dashboard, Fastly's WAF top offenders

Getting started with Fastly and Looker on GCP


If you haven’t yet signed up for Fastly, setting up a trial account is quick and easy. Once your applications are up and running, you can set up Google Cloud Storage for your Fastly streaming logs and establish BigQuery as a logging endpoint.

If you need to get started with Looker, you can request a demo. Once you’re using Looker, follow the documentation to connect BigQuery to your Looker instance. Make sure Looker has access to your Fastly data and any other data sources you’d like to explore (e.g., Google Analytics, Google Ads data, security or firewall log data).

Another way to get started with Looker and Fastly is to use the Log Analytics by Fastly Block. You can either download the entire block into Looker by following the directions, or selectively migrate pieces of the block by simply copying and pasting the block LookML into your Looker instance. Then customize your LookML model to account for any custom metrics relevant to your business within the Fastly logs data (or any other data you’ve made available to Looker in BigQuery).

Now that you’re set up with Fastly, BigQuery and Looker, you’re ready to get real-time insights into how your web and mobile traffic is performing and better understand users’ interactions with your applications. Have questions? Please contact us.

How Qubit and GCP helped Ubisoft create personalized customer experiences



Editor’s note: Today’s blog post comes from Alex Olivier, product manager at Qubit. He’ll be taking us through the solution Qubit provided for Ubisoft, one of the world’s largest gaming companies, to help them personalize customer experiences through data analysis.

Our platform helps brands across a range of sectors — from retail and gaming to travel and hospitality — deliver a personalized digital experience for users. To do so, we analyze thousands of data points throughout a customer’s journey, taking the processing burden away from our clients. This insight prompts our platform to make a decision — for example, including a customer in a VIP segment, or identifying a customer’s interest in a certain product — and adapts the visitor’s experience accordingly.

As one of the world's largest gaming companies, Ubisoft faced a problem that challenges many enterprises: a data store so big it was difficult and time-consuming to analyze. “Data took between fifteen and thirty minutes to process,” explained Maxime Bosvieux, EMEA Ecommerce Director at Ubisoft. “This doesn’t sound like much, but the modern customer darts from website to website, and if you’re unable to provide them with the experience they’re looking for, when they’re looking for it, they’ll choose the competitor who can.” That’s when they turned to Qubit and Google Cloud Platform.

A cloud native approach.


From early on, we made the decision to be an open ecosystem so as to provide our clients and partners with flexibility across technologies. When designing our system, we saw that the rise of cloud computing could transform not only how platform companies like ours process data, but also how they interface with customers. By providing Cloud-native APIs across the stack, our clients could seamlessly use open source tools and utilities with Qubit’s systems that run on GCP. Many of these tools interface with gsutil via the command-line, call BigQuery, or even upload to Cloud Storage buckets via CyberDuck.

We provision and provide our clients access to their own GCP project. The project contains all data processed and stored from their websites, apps and back-end data sources. Clients can then access both batch and streaming data, be it a user's predicted preferred category, a real-time calculation of lifetime value, or which customer segment the user belongs to. A client can access this data within seconds, regardless of their site’s traffic volume at that moment.


Bringing it all together for Ubisoft.


One of the first things Ubisoft realized is that they needed access to all of their data, regardless of the source. Qubit Live Tap gave Ubisoft access to the full take of their data via BigQuery (and through BI tools like Google Analytics and Looker). Our system manages all data processing and schema management, and reports out actionable next steps. This helps speed up the process of understanding the customer in order to provide better personalization. Using BigQuery’s scaling abilities, Live Tap generates machine learning and AI driven insights for clients like Ubisoft. This same system also lets them access their data in other BI and analytics tools such as Google Data Studio.

We grant access to clients like Ubisoft through a series of views in their project that point back to their master data store. The BigQuery IAM model (permissions provisioning for shared datasets) allows views to be authorized across multiple projects, removing the need to do batch copies between instances, which might cause some data to become stale. As Qubit streams data into the master tables, the views have direct access to it: analysts who perform queries in their own BigQuery project get access to the latest, real-time data.

Additionally, because the project provided is a complete GCP environment, clients like Ubisoft can also provision additional resources. We have clients who create their own Dataproc clusters, or import data provided by Qubit in BigQuery or via a PubSub topic to perform additional analysis and machine learning in a single environment. This process avoids the data wrangling problems commonly encountered in closed systems.

By combining Google Cloud Dataflow, Bigtable and BigQuery, we’re able to process vast amounts of data quickly and at petabyte scale. During a typical month, Qubit’s platform will provide personalized experiences for more than 100 million users, surface 28 billion individual visitor experiences from ML-derived conclusions on customer data and use AI to simulate more than 2.3 billion customer journeys.

All of this made a lot of sense to Ubisoft. “We’re a company famous for innovating quickly and pushing the limits of what can be done,” Maxime Bosvieux told us. “That requires stable and robust technology that leverages the latest in artificial intelligence to build our segmentation and personalization strategies.”

Helping more companies move to the cloud with effective and efficient migrations.


We’re thrilled that the infrastructure we built with GCP has helped clients like Ubisoft scale data processing far beyond previous capabilities. Our integration into the GCP ecosystem is making this scalability even more attractive to organizations switching to the cloud. While porting data to a new provider can be daunting, we’re helping our clients make a more manageable leap to GCP.

Looking back on our migration from bare metal to GCP: Sentry



[Editor’s note: Is the thought of migrating to Google Cloud Platform (GCP) simultaneously exciting and daunting? You’re not alone. This summer, after months of planning, Sentry took the plunge and moved its hosted open-source error tracking service from a bare-metal provider to GCP. Read on to learn about why it decided to switch and how it settled on its migration game-plan.]

It was the first weekend in July. And because we’re in San Francisco, it was so foggy and windy that we may as well have been huddled inside an Antarctic research station trying to avoid The Thing.

The Sentry operations team had gathered at HQ in SOMA to finish migrating our infrastructure to GCP. The previous two and a half months had been tireless (and tiring), but we finished the day by switching over sentry.io’s DNS records and watched as traffic slowly moved from our colo provider in Texas to us-central1 in Iowa.

We’ve now gone several months without dedicated hardware, with no downtime along the way, and we feel good about having made the switch from bare metal to a cloud provider. Here’s what we learned along the way.

It’s all about meeting unpredictable demand

As an error tracking service, Sentry’s traffic is naturally unpredictable, as there’s simply no way to foresee when a user’s next influx of events will be. On bare metal, we handled this by preparing for the worst(ish) and over-provisioning machines in case of a spike. We’re hardly the only company to do this; it’s a popular practice for anyone running on dedicated hardware. And since providers often compete on price, users like us reap the benefits of cheap computing power.

Unfortunately, as demand grew, our window for procuring new machines shrank. We demanded more from our provider, requesting machines before we really needed them and keeping them idle for days on end. This was exacerbated when we needed bespoke machines, since databases and firewalls took even more time to piece together than commodity boxes.

But even in the best case, you still had an onsite engineer sprinting down the floor clutching a machine like Marshawn Lynch clutches a football. It was too nerve-wracking. We made the decision to switch to GCP because the machines are already there, they turn on in seconds after we request them, and we only pay for them when they’re on.

Building the bridge is harder than crossing it

We decided that migrating to GCP was possible in April, and the operations team spent the next two months working diligently to make it happen. Our first order of business: weed out all the single data center assumptions that we’d made. Sentry was originally constructed for internal services communicating across the room from each other. Increasing that distance to hundreds of miles during the migration would change behaviour in ways that we never planned for. At the same time, we wanted to make certain that we could sustain the same throughput between two providers during the migration that we previously sustained inside of only one.

The first fork in the road that we came to was the literal network bridge. We had two options: Maintain our own IPsec VPN or encrypt arbitrary connections between providers. Weighing the options, we agreed that public end-to-end latency was low enough that we could rely on stunnel to protect our private data across the public wire. Funneling this private traffic through machines acting as pseudo-NATs yielded surprisingly solid results. For two providers that were roughly 650 miles apart, we saw latencies of around 15 milliseconds for established connections.

The rest of our time was spent simulating worst-case scenarios, like “What happens if this specific machine disappears?” and “How do we point traffic back the other way if something goes wrong?” After a few days of back-and-forth on disaster scenarios, we ultimately determined that we could successfully migrate with the caveat that the more time we spent straddled between two providers, the less resilient we would be.

Every change to infrastructure extends your timeline

A lot of conversations about migrating to the cloud weigh the pros and cons of doing a “lift and shift” vs. re-architecting for the cloud. We chose the former. If we were going to be able to migrate quickly, it was because we were treating GCP as a hardware provider. We gave them money, they gave us machines to connect to and configure. Our entire migration plan was focused around moving off of our current provider, not adopting a cloud-based architecture.

Sure, there were solid arguments for adopting GCP services as we started moving, but we cast those arguments aside and reminded ourselves that the primary reason for our change was not architecture; it was infrastructure. Minimizing infrastructure changes not only reduced the work required, but also reduced the possibility of changes in application behavior. Our focus was on building the bridge, not rebuilding Sentry.

Migrate like you’re stealing a base

Once we agreed that we’d built the bridge correctly, we sought out to divert our traffic the safest way we could think of: slow and thorough testing, followed by quick and confident migrating. We spent a week diverting our L4 traffic to GCP in short bursts, which helped us build confidence that we could process data in one provider and store it in the other.

Then the migration really got underway. It started with failing over our single busiest database, just to be extra certain that Google Compute Engine could actually keep up with our IO. Those requirements met, it was a race to get everything else into GCP: the other databases, the workers writing to them and the web machines reading from them. We did everything we could to rid ourselves of hesitation. Like stealing a base, successful migrations are the result of careful planning and confident execution.


Dust doesn’t settle, you settle dust

The fateful and foggy July day when we switched over finally came. After a few days, we deleted our old provider’s dashboard from our Bookmarks and set out to get comfortable in GCP: we hung a few pictures, removed the shims we put in place for the migration and checked what time the bar across the street had happy hour. More to the point, now that we had a clearer picture of resource requirements, we could start resizing our instances.

No matter how long we spent projecting resource usage within Compute Engine, we never would have predicted our increased throughput. Due to GCP’s default microarchitecture, Haswell, we noticed an immediate performance increase across our CPU-intensive workloads, namely source map processing. The operations team spent the next few weeks making conservative reductions in our infrastructure, and still managed to cut our infrastructure costs by roughly 20%. No fancy cloud technology, no giant infrastructure undertaking; just new rocks that were better at math.

Now that we’ve finished our apples-to-apples migration, we can finally explore all of the features that GCP provides in hopes of adding even more resilience to Sentry. We’ll talk about these features, as well as the techniques we use to cut costs, in future blog posts.

I’d like to send out a special thanks to Matt Robenolt and Evan Ralston for their contributions to this project; you are both irreplaceable parts of the Sentry team. We would love to have another person on our team to help us break ground on our next infrastructure build-out. Maybe that person is you?

Guest post: Using GCP for massive drug discovery virtual screening



[Editor’s note: Today we hear from Boston, MA-based Silicon Therapeutics, which is applying computational methods in the context of complex biochemical problems relevant in human biology.]

As an integrated computational drug discovery firm, we recently deployed our INSITE Screening platform on Google Cloud Platform (GCP) to analyze over 10 million commercially available molecular compounds as potential starting materials for next-generation medicines. In one week, we performed over 500 million docking computations to evaluate how a protein responds to a given molecule. Each computation involved a docking program that predicted the preferred orientation of a small molecule to a protein and the associated energetics so we could assess whether or not it will bind and alter the function of the target protein.

With a combination of Google Compute Engine standard and Preemptible VMs, we used up to 16,000 cores, for a total of 3 million core-hours and a cost of about $30,000. While this might sound like a lot of time and money, it's a lot less expensive and a lot faster than experimentally screening all compounds. Using a physics-based approach such as our INSITE platform is much more computationally expensive than some other computational screening approaches, but it allows us to find novel binders without the use of any prior information about active compounds (this particular target has no drug-like compounds known to bind). In a final stage of the calculations we performed all-atom molecular dynamics (MD) simulations on the top 1,000 molecules to determine which ones to purchase and experimentally assay for activity.

The bottom line: We successfully completed the screen using our INSITE platform on GCP and found several molecules that have recently been experimentally verified to have on-target and cell-based activity.

We chose to run this high-performance computing (HPC) job on GCP over other public cloud providers for a number of reasons:
  • Availability of high-performance compute infrastructure. Compute Engine has a good inventory of high-performance processors that can be configured with large numbers of cores and lots of memory. It also offers GPUs, a great fit for some of our computations, such as molecular dynamics and free energy calculations. SSD made a big difference in performance, as our total I/O for this screen exceeded 40 TB of raw data. Fast connectivity between the front-end and the compute nodes was also a big factor, as the front-end disk was NFS-mounted on the compute nodes. 
  • Support for industry standard tools. As a startup, we value the ability to run our workloads wherever we see fit. Our priorities can change rapidly based on project challenges (chemistry and biology), competition, opportunities and the availability of compute resources. Our INSITE platform is built on a combination of open-source and proprietary in-house software, so portability and repeatability across in-house and public clouds is essential.
  • An attractive pricing model. Preemptible VMs are a great combination of cost-effective and predictable, offering up to 80% off standard instances, with no bidding and no surprises. That means we don't have to worry about jobs being killed due to a bidding war, which can create significant delays in completing our screens and require unnecessary human overhead to manage the jobs. 
We initialized multiple clusters for the screening; specifically, our cluster’s front-end consisted of three full-priced n1-highmem-32 VM instances with 208GB of RAM that ran the queuing system, and that connected to a 2TB SSD NFS filestore that housed the compound library. Each of these front-end nodes then spawned up to 128 compute nodes configured as n1-highcpu-32 Preemptible VMs, each with 28.8GB of memory. Those compute nodes performed the actual molecular compound screens, and wrote their results back to the filestore. Preemptible VMs run for a maximum of 24 hours; when that time elapsed, the front-end nodes drained any jobs remaining on the compute nodes and re-spawned a new set of nodes until all 10 million compounds had been successfully run.

To manage compute jobs, we enlisted the help of two popular open-source tools: Slurm, a workload manager used by 60% of the world’s TOP500 clusters, and ElastiCluster, which provides a command-line tool to create, manage and set up compute clusters hosted on a variety of cloud infrastructures. Using these open-source packages is economical, provides the lion’s share of the functionality of paid software solutions and ensures we can run our workloads in-house or elsewhere.
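For a sense of how the screening work gets farmed out, a Slurm array job along these lines could submit one docking task per chunk of the compound library. This is only a sketch; the script and program names are hypothetical, not our actual INSITE tooling:

#!/bin/bash
#SBATCH --job-name=insite-dock
#SBATCH --array=1-10000
#SBATCH --cpus-per-task=1
# Each array task docks one pre-split chunk of the library stored on the NFS filestore.
CHUNK=/nfs/compounds/chunk_${SLURM_ARRAY_TASK_ID}.sdf
./dock_compounds --input "$CHUNK" --output /nfs/results/chunk_${SLURM_ARRAY_TASK_ID}.out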

More compute = better results

But ultimately, the biggest benefit of using GCP was being able to more thoroughly screen compounds than we could have done with in-house resources. The target protein in this particular study was highly flexible, and having access to massive amounts of compute power allowed us to more accurately model the underlying physics of the system by accounting for protein flexibility. This yielded more active compounds than we would have found without the GCP resources.

The reality is that all proteins are flexible, and undergo some form of induced fit upon ligand binding, so treating protein flexibility is always important in virtual screening if you want the best results. Most molecular docking programs only account for ligand flexibility, so if the receptor structure is not quite right then active compounds might not fit and therefore be missed, no matter how good the docking program is. Our INSITE screening platform incorporates protein flexibility in a novel way that can greatly improve the hit rate in virtual screening, even as it requires a lot of computational resources when screening millions of commercially available compounds.

Example of the dynamic nature of the protein target (Interleukin-18, IL-18)
From the initial 10 million compounds, we prioritized 250 promising compounds for experimental validation in our lab. As a small company, we don't have the capabilities to experimentally screen millions of compounds, and there's no need to do so with an accurate virtual screening approach like we have in our INSITE platform. We're excited to report that at least five of these compounds have shown activity in human cells, suggesting them as promising starting points for new medicines. To our knowledge, there are no drug-like small molecule activators of this important and challenging immune-oncology target.

To learn more about the science at Silicon Therapeutics, please visit our website. And if you’re an engineer with expertise in high performance computing, GPUs and/or molecular simulations, be sure to visit our job listings.

Guest post: Loot Crate unboxes Google Container Engine for new Sports Crate venture



[Editor’s note: Gamers and superfans know Loot Crate, which delivers boxes of themed swag to 650,000 subscribers every month. Loot Crate built its back-end on Heroku, but for its next venture, Sports Crate, the company decided to containerize its Rails app with Google Container Engine, and added continuous deployment with Jenkins. Read on to learn how they did it.]

Founded in 2012, Loot Crate is the worldwide leader in fan subscription boxes, partnering with entertainment, gaming and pop culture creators to deliver monthly themed crates, produce interactive experiences and digital content and film original video productions. In our first five years, we’ve delivered over 14 million crates to fans in 35 territories across the globe.
In early 2017 we were tasked with launching an offering to Major League Baseball fans called Sports Crate. There were only a couple of months until the 2017 MLB season started on April 2nd, so we needed the site to be up and capturing emails from interested parties as fast as possible. Other items on our wish list included the ability to scale the site as traffic increased, automated zero-downtime deployments, effective secret management and the benefits of Docker images. Our other Loot Crate properties are built on Heroku, but for Sports Crate we decided to try Container Engine, which we suspected would allow our app to scale better during peak traffic, let us manage our resources using a single Google login and better manage our costs.


Continuous deployment with Jenkins

Our goal was to be able to successfully deploy an application to Container Engine with a simple git push command. We created an auto-scaling, dual-zone Kubernetes cluster on Container Engine, and tackled how to do automated deployments to the cluster. After a lot of research and a conversation with Google Cloud Solutions Architect Vic Iglesias, we decided to go with Jenkins Multibranch Pipelines. We followed this guide on continuous deployment on Kubernetes and soon had a working Jenkins deployment running in our cluster ready to handle deploys.

Our next task was to create a Dockerfile of our Rails app to deploy to Container Engine. To speed up build time, we created our own base image with Ruby and our gems already installed, as well as a rake task to precompile assets and upload them to Google Cloud Storage when Jenkins builds the Docker image.

Dockerfile in hand, we set up the Jenkins Pipeline to build the Docker image, push it to Google Container Registry and deploy Kubernetes and its services to our environment. We put a Jenkinsfile in our GitHub repo that uses a switch statement based on the GitHub branch name to choose which Kubernetes namespace to deploy to. (We have three QA environments, a staging environment and production environment).
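In shell terms, the branch-to-namespace mapping inside that switch statement boils down to something like the following sketch (the actual Jenkinsfile does this in Groovy, and the branch, namespace and image names here are placeholders):

case "$BRANCH_NAME" in
  master)       NAMESPACE=production ;;
  staging)      NAMESPACE=staging ;;
  qa1|qa2|qa3)  NAMESPACE="$BRANCH_NAME" ;;
  *)            echo "No deploy target for branch $BRANCH_NAME"; exit 0 ;;
esac
# Roll the freshly built image out to the chosen namespace.
kubectl --namespace "$NAMESPACE" set image deployment/web web=gcr.io/sports-crate/web:"$BUILD_TAG"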

The Jenkinsfile checks out our code from GitHub, builds the Docker image, pushes the image to Container Registry, runs a Kubernetes job that performs any database migrations (checking for success or failure) and runs tests. It then deploys the updated Docker image to Container Engine and reports the status of the deploy to Slack. The entire process takes under 3 minutes.

Improving secret management in the local development environment

Next, we focused on making local development easier and more secure. We do our development locally, and with our Heroku-based applications, we deploy using environment variables that we add in the Heroku config or in the UI. That means that anyone with the Heroku login and permission can see them. For Sports Crate, we wanted to make the environment variables more secure; we put them in a Kubernetes secret that the applications can easily consume, which also keeps the secrets out of the codebase and off developer laptops.

The local development environment consumes those environmental variables using a railtie that goes out to Kubernetes, retrieves the secrets for the development environment, parses them and puts them into the Rails environment. This allows our developers to "cd" into a repo and run "rails server" or "rails console" with the Kubernetes secrets pulled down before the app starts.
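For illustration, creating and consuming such a secret from the command line might look roughly like this; the secret name, namespace and variables are placeholders rather than our real configuration:

# Store the development environment variables as a Kubernetes secret.
kubectl create secret generic app-env --namespace development \
  --from-literal=DATABASE_URL=postgres://dev-db/sportscrate \
  --from-literal=STRIPE_KEY=sk_test_123

# Conceptually what the railtie does: fetch and decode the secret before Rails boots.
kubectl get secret app-env --namespace development -o json \
  | jq -r '.data | to_entries[] | "\(.key)=\(.value | @base64d)"'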

TLS termination and load balancing

Another requirement was to set up effective TLS termination and load balancing. We used a Kubernetes Ingress resource with an Nginx Ingress controller, which provides automatic HTTP-to-HTTPS redirects, functionality that isn’t available from Google Cloud Platform's (GCP) built-in Ingress controller. Once we had the Ingress resource configured with our certificate and our Nginx Ingress controller running behind a service with a static IP, we were able to reach our application from the outside world. Things were starting to come together!
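For reference, the certificate the Ingress terminates is stored as a standard Kubernetes TLS secret; a minimal sketch with placeholder names and paths:

kubectl create secret tls sportscrate-tls \
  --cert=/path/to/fullchain.pem --key=/path/to/privkey.pem --namespace production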

Auto-scaling and monitoring

With all of the basic pieces of our infrastructure on GCP in place, we looked towards auto-scaling, monitoring and educating our QA team on deployment practices and logging. For pod auto-scaling, we implemented a Kubernetes Horizontal Pod Autoscaler on our deployment. This checks CPU utilization and scales the pods up if we start getting a lot of traffic to our app. For monitoring, we implemented Datadog’s Kubernetes agent and set up metrics to check for any critical issues and send alerts to PagerDuty. We use Stackdriver for logging and educated our team on how to use the Stackdriver Logging console to properly drill down to the app, namespace and pod they want information on.
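The autoscaler itself is a one-liner; here’s a sketch with placeholder thresholds rather than our production values:

# Keep the web deployment between 3 and 20 pods, targeting 70% CPU utilization.
kubectl autoscale deployment web --min=3 --max=20 --cpu-percent=70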

Net-net

With launch day around the corner, we ran load tests on our new app and were amazed at how well it handled large amounts of traffic. The pods auto-scaled exactly as we needed them to and our QA team fell in love with continuous deployment with Jenkins Multibranch Pipelines. All told, Container Engine met all of our requirements, and we were up and running within a month.
Our next project is to move our other monolithic Rails apps off of Heroku and onto Container Engine as decoupled microservices that can take advantage of the newest Kubernetes features. We look forward to improving on what has already been an extremely powerful tool.

Top 12 Google Cloud Platform posts of 2016


From product news to behind-the-scenes stories to tips and tricks, we covered a lot of ground on the Google Cloud Platform (GCP) blog this year. Here are the most popular posts from 2016.

  1. Google supercharges machine learning tasks with TPU custom chip - A look inside our custom ASIC built specifically for machine learning. This chip fast-forwards technology seven years into the future. 
    Tensor Processing Unit board
  2. Bringing Pokemon Go to life - Niantic’s augmented reality game uses more than a dozen Google Cloud services to delight and physically exert millions of Pokemon chasers across the globe.


  3. New undersea cable expands capacity for Google APAC customers and users - Together with Facebook, Pacific Light Data Communication and TE SubCom, we’re building the first direct submarine cable system between Los Angeles and Hong Kong.
  4. Introducing Cloud Natural Language API, Speech API open beta and our West Coast Region expansion - Now anyone can use machine learning models to process unstructured data or to convert speech to text. We also announced the opening of our Oregon Cloud Region (us-west1).


  5. Google to acquire Apigee - Apigee, an API management provider, helps developers integrate with outside apps and services. (Our acquisition of cloud-based software buyer and seller, Orbitera, also made big news this year.)


  6. Top 5 GCP NEXT breakout sessions on YouTube (so far) - From Site Reliability Engineering (SRE) and container management to building smart apps and analyzing 25 billion stock market events in an hour, Google presenters kept the NEXT reel rolling. (Don’t forget to sign up for Google Cloud Next 2017, which is just around the corner!)


  7. Advancing enterprise database workloads on Google Cloud Platform - Announcing that our fully managed database services Cloud SQL, Cloud Bigtable and Cloud Datastore are all generally available, plus Microsoft SQL Server images for Google Compute Engine.


  8. Google Cloud machine learning family grows with new API, editions and pricing - The new Cloud Jobs API makes it easier to fill open positions, and GPUs spike compute power for certain jobs. Also included: custom TPUs in Cloud Vision API, Cloud Translation API premium and general availability of Cloud Natural Language API.


  9. Google Cloud Platform sets a course for new horizons - In one day, we announced eight new Google Cloud regions, BigQuery support for Standard SQL and Customer Reliability Engineering (CRE), a support model in which Google engineers work directly with customer operations teams.


  10. Finding Pete’s Dragon with Cloud Vision API - Learn how Disney used machine learning to create a “digital experience” that lets kids search for Pete’s friend Elliot on their mobile and desktop screens.
  11. Top 10 GCP sessions from Google I/O 2016 - How do you develop a Node.js backend for an iOS and Android based game? What about a real-time game with Firebase? How do you build a smart RasPI bot with Cloud Vision API? You'll find the answers to these and many other burning questions.


  12. Spotify chooses Google Cloud Platform to power its data infrastructure - As Spotify’s user base grew to more than 75 million, it moved its backend from a homegrown infrastructure to a scalable and reliable public cloud.

Thank you for staying up to speed on GCP happenings on our blog. We look forward to much more activity in 2017, and invite you to join in on the action if you haven't already. Happy holidays!

Google Cloud, HEPCloud and probing the nature of Nature



Understanding the nature of the universe isn't a game for the resource-constrained. Today, we probe the very structure of matter using multi-billion dollar experimental machinery, hundreds of thousands of computing cores and exabytes of data storage. Together, the European Organization for Nuclear Research (CERN) and partners such as Fermilab built the Large Hadron Collider (LHC), the world's largest particle collider, to recreate and observe the first moments of the universe.

Today, we're excited to announce that Google Cloud Platform (GCP) is now a supported provider for HEPCloud, a project launched in June 2015 by Fermilab’s Scientific Computing Division to develop a virtual facility providing a common interface to local clusters, grids, high-performance computers and community and commercial clouds. Following the recommendations from a 2014 report by the Particle Physics Project Prioritization Panel to the national funding agencies, the HEPCloud project demonstrates the value of the elastic provisioning model using commercial clouds.

The need for compute resources by the high-energy physics (HEP) community is not constant. It follows cycles of peaks and valleys driven by experiment schedules and other constraints. However, the conventional method of building data centers is to provide all the capacity needed to meet peak loads, which can lead to overprovisioned resources. To help mitigate this, Grid federations such as the Open Science Grid offer opportunistic access to compute resources across a number of partner facilities. With the appetite for compute power expected to increase over 100-fold over the next decade, so too will the need to improve cost efficiency with an “elastic” model for dynamically provisioned resources.

With Virtual Machines (VMs) that boot within seconds and per-minute billing, Google Compute Engine lets HEPCloud pay for only the compute it uses. Because the simulations that Fermilab needs to perform are fully independent and parallelizable, this workload is appropriate for Preemptible Virtual Machines. Without the need for bidding, Preemptible VMs can be up to 80% cheaper compared to regular VMs. Combined with Custom Machine Types, Fermilab is able to double the computing power of the Compact Muon Solenoid (CMS) experiment by adding 160,000 virtual cores and 320 TB of memory in a single region, for about $1400 per hour.
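As an illustration of that provisioning model (the machine shape and flags here are our own example, not necessarily Fermilab's exact configuration), a preemptible, custom-machine-type instance is a single gcloud call:

gcloud compute instances create hepcloud-worker-001 \
  --zone us-central1-b \
  --preemptible \
  --custom-cpu 32 --custom-memory 64GB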

At SC16 this week, Google and Fermilab will demonstrate how high-energy physics workflows can benefit from the elastic Google Cloud infrastructure. The demonstration involves computations that simulate billions of particles from the CMS detector (see fig. 1) at the LHC. Using Fermilab’s HEPCloud facility, the goal is to burst CMS workflows to Compute Engine instances for one week.
Fig. 1: The CMS detector before closure (credit: 2008 CERN, photo: Maximilien Brice, Michael Hoch, Joseph Gobin)

The demonstration also leverages HTCondor, a specialized workload management system for compute-intensive jobs, to manage resource provisioning and job scheduling. HTCondor manages VMs natively using the Compute Engine API. In conjunction with the HEPCloud Decision Engine component, it enables the use of the remote resources at scale at an affordable rate (fig. 2). With half a petabyte of input data in Google Cloud Storage, each task reads from the bucket via gcsfuse, performs its computation on Preemptible VMs, then transports the resulting output back to Fermilab through the US Department of Energy Office of Science's Energy Sciences Network (ESNet), a high-performance, unclassified network built to support scientific research.
Fig. 2: The flow of data from the CMS detector to scientific results through the CMS, HEPCloud and Google Cloud layers. Image of CMS event display © CERN by McCauley, Thomas; Taylor, Lucas; the CMS Collaboration is licensed under CC BY-SA 4.0.
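The gcsfuse mount each worker uses to read its share of the input data is itself a one-liner; the bucket and mount point below are placeholders:

# Mount the Cloud Storage bucket holding the input dataset, read-only, on the worker.
gcsfuse --implicit-dirs -o ro cms-input-data /mnt/cms-input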

The demonstration shows that HTCondor, HEPCloud and GCP all work together to enable real HEP science to be conducted in a cost-effective burst mode at a scale that effectively doubles the current capability. The Fermilab project plans to transition the HEPCloud facility into production use by the HEP community in 2018.
“Every year we have to plan to provision computing resources for our High-Energy Physics experiments based on their overall computing needs for performing their science. Unfortunately, the computing utilization patterns of these experiments typically exhibit peaks and valleys during the year, which makes cost-effective provisioning difficult. To achieve this cost effectiveness we need our computing facility to be able to add and remove resources to track the demand of the experiments as a function of time. Our collaboration with commercial clouds is an important component of our strategy for achieving this elasticity of resources, as we aim to demonstrate with Google Cloud for the CMS experiment via the HEPCloud facility at SC16.” 
- Panagiotis Spentzouris, Head of the Scientific Computing Division at Fermilab
If you're at SC16, stop by the Google booth and speak with experts on scalable high performance computing or spin up your own HTCondor cluster on Google Cloud Platform for your workloads.

Snapchat shares security best practices for running on GCP

Snapchat security engineer, Subhash Sankuratripati, took the stage at GCP NEXT in San Francisco this week, to share his company’s best practices for running securely at scale on Google Cloud Platform. And when we say at scale, we mean at scale!

Snapchat has over 100 million daily users and supports 8 billion video views daily. The company runs about 100 separate GCP projects, each requiring different permissions for who at the company can do what on which GCP resources.

Until recently, Snapchat engineers used viewer/editor roles and built their own stopgaps to manage resources on the platform, but not anymore. With the launch of IAM Roles in beta, Snapchat now uses this service to set the fine-grained permissions it needs to help secure its users’ data.

Essentially, Snapchat operates on the principle of least privilege to the extreme. The principle of least privilege means giving each user the minimal privileges their job requires: you get the least access and authority necessary to perform your work. It sounds a bit restrictive, but it reduces the attack surface considerably.

In Snapchat’s case, the company is working on using our new iam.SetPolicy feature to create what it calls Access Control List leases, or “ACL leases.” These leases temporarily grant access to resources only when someone needs them; the policy then tears them down when the lease is over. For example:
  • AccessControlService can iam.SetPolicy
  • When bob@ needs access, AccessControlService adds bob@ to policy
  • AccessControlService removes bob@ after 1 hour
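In gcloud terms, the grant and revocation that AccessControlService performs would look roughly like this; the project, user and role are placeholders for illustration:

# Grant bob temporary access to the project...
gcloud projects add-iam-policy-binding snap-example-project \
  --member user:[email protected] --role roles/viewer

# ...and revoke it when the one-hour lease expires.
gcloud projects remove-iam-policy-binding snap-example-project \
  --member user:[email protected] --role roles/viewer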
Like the nature of Snapchat itself, the company treats its cloud resources as ephemeral for maximum security. In a steady state in Snapchat’s GCP environment, nobody would have access to anything.

Snapchat’s using the new Organizational Node, which sits above projects and manages GCP resources. This prevents shadow projects from being created, giving the company more control over all projects and the permissions of members associated with those projects. Subhash said he’s also doing data siloing based on role using IAM Roles and is testing the IAM Service Account API, which programs can use to authenticate to Google and make API calls.

The possibilities this opens up are endless, according to Subhash. He said microservice to microservice authentication would mean an even larger reduction in what his engineers can manage directly, locking down access to resources even further. Snapchat's strategy is essentially to ensure its developers have enough freedom to get their job done, but not enough to get themselves into trouble.

Stay tuned for more resources coming soon on using IAM on Cloud Platform and as you check out these services, please share your feedback with us at [email protected].

- Posted by Jo Maitland, Managing Editor, Google Cloud Platform