How to use App Engine Blobstore (Module 15)

Posted by Wesley Chun (@wescpy), Developer Advocate, Google Cloud

Introduction and background

In our ongoing Serverless Migration Station mini-series aimed at helping developers modernize their serverless applications, one of the key objectives for Google App Engine developers is to upgrade to the latest language runtimes, such as from Python 2 to 3 or Java 8 to 17. Another goal is to demonstrate how to move away from App Engine legacy APIs (now referred to as "bundled services") to Cloud standalone replacement services. Once this has been accomplished, apps are much more portable, making them flexible enough to:

Developers building web apps that provide for user uploads or serve large files like videos or audio clips can benefit from convenient "blob" storage backing such functionality, and App Engine's Blobstore serves this specific purpose. As mentioned above, moving away from proprietary App Engine services like Blobstore makes user apps more portable. The original underlying Blobstore infrastructure eventually merged with the Cloud Storage service anyway, so it's logical to move completely to Cloud Storage when convenient, and this content is meant to inform you about that process.

Showing App Engine users how to use its Blobstore service
In today's Module 15 video, we begin this journey by showing users how to add Blobstore usage to a sample app, setting us up for our next move to Cloud Storage in Module 16. Similar videos in this series adding use of an App Engine bundled service start with a Python 2 sample app that has already migrated web frameworks from webapp2 to Flask, but not this time.

Blobstore for Python 2 has a dependency on webapp, the original App Engine micro framework replaced by webapp2 when the Python 2.5 runtime was deprecated in favor of 2.7. Because the Blobstore handlers were left "stuck" in webapp, it's better to start with a more generic webapp2 app prior to a Flask migration. This isn't an issue because we modernize this app completely in Module 16 by:

  • Migrating from webapp2 (and webapp) to Flask
  • Migrating from App Engine NDB to Cloud NDB
  • Migrating from App Engine Blobstore to Cloud Storage
  • Migrating from Python 2 to Python (2 and) 3

We'll go into more detail in Module 16, but it suffices to say that once those migrations are complete, the resulting app becomes portable enough for all the possibilities mentioned at the top.

Adding use of Blobstore

The original sample app registers individual web page "visits," storing visitor information such as the IP address and user agent, then displaying the most recent visits to the end-user. In today's video, we add one additional feature: allowing visitors to optionally augment their visits with a file artifact, like an image. Instead of registering a visit immediately, the visitor is first prompted to provide the artifact, as illustrated below.

The updated sample app's new artifact prompt page

The end-user can choose to do so or click a "Skip" button to opt-out. Once this process is complete, the same most recent visits page is then rendered, with one difference: an additional link to view a visit artifact if one's available.

The sample app's updated most recent visits page

Below is pseudocode representing the core part of the app that was altered to add Blobstore usage, namely new upload and download handlers as well as the changes required of the main handler. Upon the initial GET request, the artifact form is presented. When the user submits an artifact or skips, the upload handler POSTs back to home ("/") via an HTTP 307 to preserve the verb, and then the most recent visits page is rendered as expected. There, if the end-user wishes to view a visit artifact, they can click a "view" link, which invokes the download handler. It fetches and returns the corresponding artifact from the Blobstore service, or returns an HTTP 404 if the artifact wasn't found. The bolded lines represent the new or altered code.

Adding App Engine Blobstore usage to sample app
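
To make the request flow concrete, here is a small model of it in plain Python. It is not the sample app's code and does not use the Blobstore API: the `BLOBS` dict stands in for the Blobstore service, the `VISITS` list for Datastore, and the handler names, the `/view/<key>` link shape, and the `Response` tuple are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins: a dict plays the role of Blobstore, a list of Datastore.
Response = namedtuple("Response", "status body location")
BLOBS = {}    # blob key -> file bytes
VISITS = []   # each visit: {"visitor": str, "blob_key": str or None}


def upload_handler(visitor, file_bytes=None):
    """Store the artifact (if any), record the visit, redirect to "/".

    The 307 status asks the client to repeat the request with the same
    verb, so the POST arrives at the main handler as a POST.
    """
    blob_key = None
    if file_bytes is not None:              # the user did not click "Skip"
        blob_key = "blob-%d" % (len(BLOBS) + 1)
        BLOBS[blob_key] = file_bytes
    VISITS.append({"visitor": visitor, "blob_key": blob_key})
    return Response(307, "", "/")


def main_handler(method):
    """GET shows the artifact form; POST renders the most recent visits."""
    if method == "GET":
        return Response(200, "artifact form", None)
    lines = []
    for visit in VISITS[-10:][::-1]:        # up to ten visits, newest first
        key = visit["blob_key"]
        link = " [view: /view/%s]" % key if key else ""
        lines.append(visit["visitor"] + link)
    return Response(200, "\n".join(lines), None)


def download_handler(blob_key):
    """Return the stored artifact, or a 404 if the key is unknown."""
    if blob_key not in BLOBS:
        return Response(404, "not found", None)
    return Response(200, BLOBS[blob_key], None)
```

In the real app the upload and download handlers would build on the Blobstore handler classes rather than a dict, but the order of events (form, upload, 307 back to "/", optional view) is the same as described above.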


In this "migration," we added Blobstore usage to support visit artifacts to the Module 0 baseline sample app and arrived at the finish line with the Module 15 sample app. To get hands-on experience doing it yourself, do the codelab by hand and follow along with the video. Then you'll be ready to upgrade to Cloud Storage should you choose to do so. 

In Fall 2021, the App Engine team extended support of many of the bundled services to 2nd generation runtimes (that have a 1st generation runtime), meaning you are no longer required to migrate to Cloud Storage when porting your app to Python 3. You can continue using Blobstore in your Python 3 app so long as you retrofit the code to access bundled services from next-generation runtimes.

If you do want to move to Cloud Storage, Module 16 is next. You can also try its codelab to get a head start. All Serverless Migration Station content (codelabs, videos, source code [when available]) can be accessed at its open source repo. While our content initially focuses on Python users, the Cloud team is working on covering other language runtimes, so stay tuned. For additional video content, check out our broader Serverless Expeditions series.

Build a mobile gaming analytics platform

Popular mobile games can attract millions of players and generate terabytes of game-related data in a short burst of time. This places extraordinary pressure on the infrastructure powering these games and requires scalable data analytics services to provide timely, actionable insights in a cost-effective way.

To address these needs, a growing number of successful gaming companies use Google’s web-scale analytics services to create personalized experiences for their players. They use telemetry and smart instrumentation to gain insight into how players engage with the game and to answer questions like: At what game level are players stuck? What virtual goods did they buy? And what's the best way to tailor the game to appeal to both casual and hardcore players?

A new reference architecture describes how you can collect, archive and analyze vast amounts of gaming telemetry data using Google Cloud Platform’s data analytics products. The architecture demonstrates two patterns for analyzing mobile game events:

  • Batch processing: This pattern helps you process game logs and other large files in a fast, parallelized manner. For example, leading mobile gaming company DeNA moved to BigQuery from Hadoop to get faster query responses for their log file analytics pipeline. In this GDC Lightning Talk video they explain the speed benefits of Google’s analytics tools and how the team was able to process large gaming datasets without the need to manage any infrastructure.
  • Real-time processing: Use this pattern when you want to understand what's happening in the game right now. Cloud Pub/Sub and Cloud Dataflow provide a fully managed way to perform a number of data-processing tasks like data cleansing and fraud detection in real-time. For example, you can highlight a player with maximum hit-points outside the valid range. Real-time processing is also a great way to continuously update dashboards of key game metrics, like how many active users are currently logged in or which in-game items are most popular.
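
As a rough illustration of the hit-points check mentioned in the real-time pattern, the filter below flags events whose value falls outside a valid range. It is a plain-Python sketch, not Cloud Dataflow code; the event field name `hit_points` and the default range are assumptions.

```python
def find_suspicious_events(events, min_hp=0, max_hp=1000):
    """Yield events whose hit-points fall outside [min_hp, max_hp].

    `events` can be any iterable of dicts, so the same function can
    consume a finished log (a list) or a live feed (a generator).
    The field name and the default range are hypothetical.
    """
    for event in events:
        if not (min_hp <= event["hit_points"] <= max_hp):
            yield event
```

Because the function is a generator, it emits each suspicious event as soon as it is seen instead of waiting for the input to end, which is the property a real-time check needs.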

Some Cloud Dataflow features are especially useful in a mobile context since messages may be delayed from the source due to mobile Internet connection issues or batteries running out. Cloud Dataflow's built-in session windowing functionality and triggers aggregate events based on the actual time they occurred (event time) as opposed to the time they're processed so that you can still group events together by user session even if there's a delay from the source.
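
The idea behind event-time session windows can be sketched in a few lines of plain Python. This is not the Cloud Dataflow API, and it works on a complete list rather than incrementally with triggers as Dataflow does; the function name, the `(user, event_time)` tuple shape, and the ten-minute gap are assumptions chosen for the example.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=600):
    """Group (user, event_time) pairs into per-user sessions.

    A new session starts when two consecutive events from the same user
    are more than `gap_seconds` apart in event time. The order in which
    events arrive is ignored, so late events still land in the session
    they belong to.
    """
    by_user = defaultdict(list)
    for user, event_time in events:
        by_user[user].append(event_time)

    sessions = {}
    for user, times in by_user.items():
        times.sort()                        # order by when events happened
        user_sessions = [[times[0]]]
        for t in times[1:]:
            if t - user_sessions[-1][-1] > gap_seconds:
                user_sessions.append([t])   # gap exceeded: open a new session
            else:
                user_sessions[-1].append(t)
        sessions[user] = user_sessions
    return sessions
```

For example, if one player's events with timestamps 100, 2000, and 160 arrive in that order, the delayed event at 160 is still grouped with the event at 100, and the event at 2000 forms its own session.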

But why choose between one or the other pattern? A key benefit of this architecture is that you can write your data pipeline processing once and execute it in either batch or streaming mode without modifying your codebase. So if you start processing your logs in batch mode, you can easily move to real-time processing in the future. This is an advantage of the high-level Cloud Dataflow model that was released as open source by Google.

Cloud Dataflow loads the processed data into one or more BigQuery tables. BigQuery is built for very large scale, and allows you to run aggregation queries against petabyte-scale datasets with fast response times. This is great for interactive analysis and data exploration, like the example screenshot above, where a simple BigQuery SQL query dynamically creates a Daily Active Users (DAU) graph using Google Cloud Datalab.
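
A Daily Active Users query is essentially a distinct count grouped by day. The sketch below shows that shape using Python's built-in `sqlite3` module as a local stand-in for BigQuery; the `events` table and its `user_id` and `event_day` columns are hypothetical, and the real reference architecture's schema may differ.

```python
import sqlite3


def daily_active_users(rows):
    """Return [(day, distinct_user_count), ...] ordered by day.

    `rows` is an iterable of (user_id, event_day) tuples, with event_day
    as a 'YYYY-MM-DD' string. SQLite is used here only so the query can
    run locally; the table and column names are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id TEXT, event_day TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        query = """
            SELECT event_day, COUNT(DISTINCT user_id) AS dau
            FROM events
            GROUP BY event_day
            ORDER BY event_day
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The `COUNT(DISTINCT ...)` with `GROUP BY` pattern is standard SQL, so a query of the same shape should carry over to a BigQuery table that has equivalent columns.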

And what about player engagement and in-game dynamics? The BigQuery example above shows a bar chart of the ten toughest game bosses. It looks like boss10 killed players more than 75% of the time, much more than the next toughest. Perhaps it would make sense to lower the strength of this boss? Or maybe give the player some more powerful weapons? The choice is yours, but with this reference architecture you'll see the results of your changes straight away. Review the new reference architecture to jumpstart your data-driven quest to engage your players and make your games more successful, contact us, or sign up for a free trial of Google Cloud Platform to get started.

Further Reading and Additional Resources

- Posted by Oyvind Roti, Solutions Architect

Containerizing in the real world . . . of Minecraft

Containers are all the rage right now. There are scores of best practices papers and tutorials out there, and "Intro to Containers" sessions at just about every conference even tangentially related to cloud computing. You may have read through the Docker docs, launched an NGINX Docker container, and read through Miles Ward’s Introduction to containers and Kubernetes piece. Still, containers can be a hard concept to internalize, especially if you have an existing application that you’re considering containerizing.

To help you through this conceptual hurdle, I’ve written a four-part series of blog posts that gives you a hands-on introduction to building, updating, and using containers for something familiar: running a Minecraft server. You can check them out here:

In the first part of the series, you’ll learn how to create a container image that includes everything a Minecraft server needs, use that image on Google Compute Engine to run the server, and make it accessible from your Minecraft client. You’ll use the Docker command-line tools to build, test, and run the container, as well as to push the image up into the Google Container Registry for use with a container-optimized instance.

Next, you'll work through the steps needed to separate out storage from the container and learn how to make regular backups of your game. If you’ve ever made a mistake in Minecraft, you know how critical being able to restore world state can be! As Minecraft is always more fun when it’s customized, you'll also learn how to update the container image with modifications you make to the server.properties file.

Finally, you’ll take the skills that you’ve learned and apply them to making something fun and slightly absurd: Minecraft Roulette. This application allows you to randomly connect to one of several different Minecraft worlds using a single IP as your entry point. As you work through this tutorial, you’ll learn the basics of Kubernetes, an open source container orchestrator.

By the end of the series, you’ll have grasped the basics of containers and Kubernetes, and will be set to go out and containerize your own application. Plus, you’ll have had the excuse to play a little Minecraft. Enjoy!

This blog post is not approved by or associated with Mojang or Minecraft.

Posted by Julia Ferraioli, Senior Developer Advocate, Google Cloud Platform

Processing logs at scale using Cloud Dataflow

Logs generated by applications and services can provide an immense amount of information about how your deployment is running and the experiences your users are having as they interact with the products and services. But as deployments grow more complex, gleaning insights from this data becomes more challenging. Logs come from an increasing number of sources, so they can be hard to collate and query for useful information. And building, operating and maintaining your own infrastructure to analyze log data at scale requires extensive expertise in running distributed systems and storage. Today, we’re introducing a new solution paper and reference implementation that will show how you can process logs from multiple sources and extract meaningful information by using Google Cloud Platform and Google Cloud Dataflow.

Log processing typically involves some combination of the following activities:

  • Configuring applications and services
  • Collecting and capturing log files
  • Storing and managing log data
  • Processing and extracting data
  • Persisting insights

Each of those components has its own scaling and management challenges, often using different approaches at different times. These sorts of challenges can slow down the generation of meaningful, actionable information from your log data.

Cloud Platform provides a number of services that can help you to address these challenges. You can use Cloud Logging to collect logs from applications and services, and then store them in Google Cloud Storage buckets or stream them to Pub/Sub topics. Dataflow can read from Cloud Storage or Pub/Sub (and many more), process log data, extract and transform metadata and compute aggregations. You can persist the output from Dataflow in BigQuery, where it can be analyzed or reviewed anytime. These mechanisms are offered as managed services—meaning they can scale when needed. That also means that you don't need to worry about provisioning resources up front.
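
To give a feel for the "process, extract, and aggregate" step, here is a minimal plain-Python sketch. It is not Dataflow code and not taken from the reference implementation; the log line format, the field names, and the choice of per-path request counts and average latency are all assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical log format: '<timestamp> <METHOD> <path> <status> <latency_ms>'
LOG_LINE = re.compile(r"^(\S+) (GET|POST|PUT|DELETE) (\S+) (\d{3}) (\d+)$")


def parse_line(line):
    """Extract structured fields from one log line, or None if malformed."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, method, path, status, latency = match.groups()
    return {
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "status": int(status),
        "latency_ms": int(latency),
    }


def aggregate(lines):
    """Compute request count and average latency per path."""
    counts = Counter()
    total_latency = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue                      # skip lines that do not parse
        counts[record["path"]] += 1
        total_latency[record["path"]] += record["latency_ms"]
    return {
        path: {
            "requests": counts[path],
            "avg_latency_ms": total_latency[path] / counts[path],
        }
        for path in counts
    }
```

In a managed pipeline the parse step and the aggregation step would run as separate, parallel stages, and the resulting rows would be written to a table rather than returned as a dict.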

The solution paper and reference implementation describe how you can use Dataflow to process log data from multiple sources and persist findings directly in BigQuery. You’ll learn how to configure Cloud Logging to collect logs from applications running in Container Engine, how to export those logs to Cloud Storage, and how to execute the Dataflow processing job. In addition, the solution shows you how to reconfigure Cloud Logging to use Pub/Sub to stream data directly to Dataflow, so you can process logs in real-time.

Check out the Processing Logs at Scale using Cloud Dataflow solution to learn how to combine logging, storage, processing and persistence into a scalable log processing approach. Then take a look at the reference implementation tutorial on GitHub to deploy a complete end-to-end working example. Feedback is welcome and appreciated; comment here, submit a pull request, create an issue, or find me on Twitter @crcsmnky and let me know how I can help.

- Posted by Sandeep Parikh, Google Solutions Architect