
Migrating from App Engine pull tasks to Cloud Pub/Sub (Module 19)

Posted by Wesley Chun (@wescpy), Developer Advocate, Google Cloud

Introduction and background

The Serverless Migration Station series is aimed at helping developers modernize their apps running on one of Google Cloud's serverless platforms. The preceding (Migration Module 18) video demonstrates how to add use of App Engine's Task Queue pull tasks service to a Python 2 App Engine sample app. Today's Module 19 video picks up where that leaves off, migrating that pull task usage to Cloud Pub/Sub.

Moving away from proprietary App Engine services like Task Queue makes apps more portable and more flexible.


    Understanding the migrations

    Module 19 consists of implementing three different migrations on the Module 18 sample app:

    • Migrate from App Engine NDB to Cloud NDB
    • Migrate from App Engine Task Queue pull tasks to Cloud Pub/Sub
    • Migrate from Python 2 to Python (2 and) 3

    The NDB to Cloud NDB migration is identical to the Module 2 migration content, so it's not covered in-depth in Module 19. The original app was designed to be Python 2 and 3 compatible, so there's no work there either. Module 19 boils down to three key updates:

    • Setup: Enable APIs and create Pub/Sub Topic & Subscription
    • How work is created: Publish Pub/Sub messages instead of adding pull tasks
    • How work is processed: Pull messages instead of leasing tasks
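The setup step can also be scripted. The following is a minimal Python sketch, assuming the google-cloud-pubsub client library with its 2.x request-style calls. The helper name `create_topic_and_subscription` and the IDs `pullq` and `worker` are placeholders chosen for illustration, not names confirmed by this post. The clients are passed in as arguments so the helper can be exercised without Cloud credentials.

```python
def create_topic_and_subscription(publisher, subscriber, project_id,
                                  topic_id='pullq', sub_id='worker'):
    """Create a Pub/Sub topic and a pull subscription attached to it.

    `publisher` and `subscriber` are expected to behave like
    google.cloud.pubsub_v1.PublisherClient and SubscriberClient.
    The default IDs are placeholders for this sketch.
    """
    # Fully qualified resource names, e.g. projects/<project>/topics/<topic>.
    topic_path = publisher.topic_path(project_id, topic_id)
    sub_path = subscriber.subscription_path(project_id, sub_id)

    # The topic must exist before a subscription can be attached to it.
    publisher.create_topic(request={'name': topic_path})
    subscriber.create_subscription(
        request={'name': sub_path, 'topic': topic_path})
    return topic_path, sub_path
```

With the real library, the clients would be `pubsub_v1.PublisherClient()` and `pubsub_v1.SubscriberClient()` from `google.cloud`. Both create calls raise an error if the resource already exists, so a production script would likely catch that case. Enabling the Pub/Sub API itself still happens in the Cloud console or with the gcloud tool.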

    Aside from these physical changes, a key hurdle to overcome is understanding the differences in terminology between pull tasks and Pub/Sub. The following chart attempts to demystify this so developers can more easily grasp how they differ:
    Figure: Terminology differences between App Engine pull tasks and Cloud Pub/Sub

    Reflecting the chart, these differences can be summarized like this:
    1. With Task Queue, work is created in pull queues; with Pub/Sub, work is sent to topics
    2. Task Queue pull tasks are called messages in Pub/Sub
    3. With Task Queue, workers access pull tasks; with Pub/Sub, subscribers receive messages
    4. Leasing a pull task is the same as pulling a message from a Pub/Sub topic via a subscription
    5. Deleting a task from a pull queue when you're done is analogous to successfully acknowledging a Pub/Sub message
    The video walks developers through the terminology as well as the code changes described above. Below is pseudocode implementing the key changes to the main application (new or updated lines of code bolded):
    Figure: Migration from App Engine Task Queue pull tasks to Cloud Pub/Sub, with Before (Module 18) on the left and After (Module 19) on the right
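As a rough illustration of the two Pub/Sub-side changes (publishing instead of adding a task, and pulling plus acknowledging instead of leasing plus deleting), here is a minimal Python sketch. It assumes the google-cloud-pubsub client library with 2.x request-style calls. The function names `publish_visit` and `tally_visitors` are hypothetical and are not taken from the Module 19 repo. The clients are injected so the logic can run without Cloud credentials.

```python
from collections import Counter


def publish_visit(publisher, topic_path, remote_addr):
    """Publish the visitor's IP address as a Pub/Sub message.

    This takes the place of adding a pull task to a queue.
    """
    # Pub/Sub payloads are bytes, so encode the string first.
    return publisher.publish(topic_path, remote_addr.encode('utf-8'))


def tally_visitors(subscriber, sub_path, max_messages=100):
    """Pull pending messages, count visits per IP, then acknowledge them.

    Pulling stands in for leasing tasks; acknowledging stands in for
    deleting them once the work is done.
    """
    rsp = subscriber.pull(
        request={'subscription': sub_path, 'max_messages': max_messages})
    tallies = Counter()
    ack_ids = []
    for rcvd in rsp.received_messages:
        ack_ids.append(rcvd.ack_id)
        tallies[rcvd.message.data.decode('utf-8')] += 1
    if ack_ids:
        # Unacknowledged messages would be redelivered later.
        subscriber.acknowledge(
            request={'subscription': sub_path, 'ack_ids': ack_ids})
    return dict(tallies)
```

In a real app, `publisher` and `subscriber` would be `PublisherClient` and `SubscriberClient` instances from `google.cloud.pubsub_v1`, and the returned tallies would then be added to the stored visitor counts in Cloud NDB.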

    Observe how most of the code, especially app operations and data models, is left relatively unchanged. The only visible changes are switching from App Engine NDB and Task Queue to Cloud NDB and Pub/Sub. Complete versions of the app before and after making the changes can be found in the Module 18 and Module 19 repo folders, respectively. In addition to the video, be sure to check out the Module 19 codelab, which leads you step-by-step through the migrations discussed.


    Module 19 features a migration of App Engine pull tasks to Cloud Pub/Sub, but developers should note that Pub/Sub itself is not based on pull tasks. It is a fully-featured, asynchronous, scalable messaging service that offers much more than the pull functionality provided by Task Queue. For example, Pub/Sub also supports streaming to BigQuery and push functionality. Pub/Sub push operates differently than Task Queue push tasks, which is why we recommend that push tasks be migrated to Cloud Tasks instead (see Module 8). Because Cloud Tasks doesn't support pull functionality, we turn to Pub/Sub for pull task users. For more information on all of its features, see the Pub/Sub documentation.

    While we recommend users move to the latest offerings from Google Cloud, neither of those migrations is required, and should you opt to migrate, you can do so on your own timeline. In Fall 2021, the App Engine team extended support of many of the bundled services to 2nd generation runtimes (those that have a 1st generation counterpart), meaning you don't have to migrate to standalone Cloud services before porting your app to Python 3. You can continue using Task Queue in Python 3 so long as you retrofit your code to access bundled services from next-generation runtimes.

    If you're using other App Engine legacy services, be sure to check out the other Migration Modules in this series. All Serverless Migration Station content (codelabs, videos, source code [when available]) can be accessed at its open source repo. While our content initially focuses on Python users, the Cloud team is working on covering other language runtimes, so stay tuned. For additional video content, check out our broader Serverless Expeditions series.

    How to use App Engine pull tasks (Module 18)

    Posted by Wesley Chun (@wescpy), Developer Advocate, Google Cloud

    Introduction and background

    The Serverless Migration Station mini-series helps App Engine developers modernize their apps to the latest language runtimes, such as from Python 2 to 3 or Java 8 to 17, or to sister serverless platforms Cloud Functions and Cloud Run. Another goal of this series is to demonstrate how to move away from App Engine's original APIs (now referred to as legacy bundled services) to Cloud standalone replacement services. Once no longer dependent on these proprietary services, apps become much more portable and flexible.

    App Engine's Task Queue service provides infrastructure for executing tasks outside of the standard request-response workflow. Tasks may consist of workloads exceeding request timeouts or periodic tangential work. The Task Queue service provides two different queue types, push and pull, for developers to perform auxiliary work.

    Push queues are covered in Migration Modules 7-9, which demonstrate how to add use of push tasks to an existing baseline app, followed by steps to migrate that functionality to Cloud Tasks, the standalone successor to the Task Queue push service. We turn to pull queues in today's video, where Module 18 demonstrates how to add use of pull tasks to the same baseline sample app. Module 19 follows, showing how to migrate that usage to Cloud Pub/Sub.

    Adding use of pull queues

    In addition to registering page visits, the sample app needs to be modified to track visitors. Each visit consists of a timestamp and visitor information such as the IP address and user agent. We'll modify the app to use the IP address and track how many visits come from each address seen. The home page is modified to show the top visitors in addition to the most recent visits:

    Figure: The sample app's updated home page tracking visits and visitors

    When visits are registered, pull tasks are created to track the visitors. The pull tasks sit patiently in the queue until they are processed in aggregate periodically. Until that happens, the top visitors table stays static. These tasks can be processed in a number of ways: periodically by a cron or Cloud Scheduler job, by a separate App Engine backend service, explicitly by a user (via browser or command-line HTTP request), by an event-triggered Cloud Function, etc. In the tutorial, we issue a curl request to the app's endpoint to process the enqueued tasks. When all tasks have completed, the table then reflects any changes to the current top visitors and their visit counts:

    Figure: Processed pull tasks update the top visitors table

    Below is some pseudocode representing the core part of the app that was altered to add Task Queue pull task usage: a new data model class, VisitorCount, to track visitor counts; a (pull) task enqueued in store_visit() to update visitor counts when registering individual visits; and, most importantly, a new function, fetch_counts(), accessible via /log, to process enqueued tasks and update overall visitor counts. The bolded lines represent the new or altered code.

    Figure: Adding App Engine Task Queue pull task usage to the sample app, with 'Before' (Module 1) on the left and 'After' (Module 18) with altered code on the right
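For readers who want something more concrete than the figure, here is a minimal Python sketch of the enqueue-then-lease pattern using the App Engine `taskqueue` API (`Task`, `Queue.add`, `Queue.lease_tasks`, `Queue.delete_tasks`). The helper names `enqueue_visitor` and `tally_leased_tasks` and the default values are assumptions made for illustration, not the code from the Module 18 repo. The queue object is passed in so the tallying logic can be tried outside App Engine.

```python
def enqueue_visitor(queue, remote_addr):
    """Add a pull task whose payload is the visitor's IP address."""
    # Imported lazily: this module is only available in the App Engine runtime.
    from google.appengine.api import taskqueue
    queue.add(taskqueue.Task(payload=remote_addr, method='PULL'))


def tally_leased_tasks(queue, lease_seconds=3600, max_tasks=1000):
    """Lease pending tasks, count visits per IP address, then delete the tasks."""
    # Leased tasks are hidden from other workers until the lease expires.
    tasks = queue.lease_tasks(lease_seconds, max_tasks)
    tallies = {}
    for task in tasks:
        tallies[task.payload] = tallies.get(task.payload, 0) + 1
    if tasks:
        # Deleting marks the work as done so it is not leased again.
        queue.delete_tasks(tasks)
    return tallies
```

On App Engine, `queue` would be something like `taskqueue.Queue('pullq')`, where the queue name is a placeholder and the queue is declared with `mode: pull` in `queue.yaml`. The returned tallies would then be added to the stored VisitorCount entities.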


    This "migration" consists of adding Task Queue pull task usage to the Module 1 baseline app to support tracking visitor counts, and arrives at the finish line with the Module 18 app. To get hands-on experience, work through the codelab and follow along with the video. Then you'll be ready to upgrade to Cloud Pub/Sub should you choose to do so.

    In Fall 2021, the App Engine team extended support of many of the bundled services to 2nd generation runtimes (those that have a 1st generation counterpart), meaning you are no longer required to migrate pull tasks to Pub/Sub when porting your app to Python 3. You can continue using Task Queue in your Python 3 app so long as you retrofit the code to access bundled services from next-generation runtimes.

    If you do want to move to Pub/Sub, see Module 19, including its codelab. All Serverless Migration Station content (codelabs, videos, and source code) is available at its open source repo. While we're initially focusing on Python users, the Cloud team plans to cover other runtimes soon, so stay tuned. Also check out other videos in the broader Serverless Expeditions series.

    Build a mobile gaming analytics platform

    Popular mobile games can attract millions of players and generate terabytes of game-related data in a short burst of time. This places extraordinary pressure on the infrastructure powering these games and requires scalable data analytics services to provide timely, actionable insights in a cost-effective way.

    To address these needs, a growing number of successful gaming companies use Google’s web-scale analytics services to create personalized experiences for their players. They use telemetry and smart instrumentation to gain insight into how players engage with the game and to answer questions like: At what game level are players stuck? What virtual goods did they buy? And what's the best way to tailor the game to appeal to both casual and hardcore players?

    A new reference architecture describes how you can collect, archive and analyze vast amounts of gaming telemetry data using Google Cloud Platform’s data analytics products. The architecture demonstrates two patterns for analyzing mobile game events:

    • Batch processing: This pattern helps you process game logs and other large files in a fast, parallelized manner. For example, leading mobile gaming company DeNA moved to BigQuery from Hadoop to get faster query responses for their log file analytics pipeline. In this GDC Lightning Talk video they explain the speed benefits of Google’s analytics tools and how the team was able to process large gaming datasets without the need to manage any infrastructure.
    • Real-time processing: Use this pattern when you want to understand what's happening in the game right now. Cloud Pub/Sub and Cloud Dataflow provide a fully managed way to perform a number of data-processing tasks like data cleansing and fraud detection in real-time. For example, you can highlight a player with maximum hit-points outside the valid range. Real-time processing is also a great way to continuously update dashboards of key game metrics, like how many active users are currently logged in or which in-game items are most popular.
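The hit-points example above amounts to a per-event range check. A minimal sketch in plain Python follows; it is not Cloud Dataflow code, and the field name `hit_points` and the valid range are hypothetical.

```python
def flag_invalid_hit_points(events, min_hp=0, max_hp=100):
    """Return the events whose hit_points fall outside [min_hp, max_hp].

    The field name and the default range are assumptions for this sketch.
    """
    return [e for e in events
            if not (min_hp <= e['hit_points'] <= max_hp)]
```

In a streaming pipeline, the same predicate could be applied to each incoming event as it arrives rather than to a list.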

    Some Cloud Dataflow features are especially useful in a mobile context, since messages may be delayed from the source due to mobile Internet connection issues or batteries running out. Cloud Dataflow's built-in session windowing functionality and triggers aggregate events based on the actual time they occurred (event time), as opposed to the time they're processed, so that you can still group events together by user session even if there's a delay from the source.
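To make the idea of event-time sessions concrete, here is a small plain-Python sketch. It is not the Cloud Dataflow API, and it works on a complete list rather than an unbounded stream. The function name `sessionize` and the 30-minute default gap are assumptions for illustration.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=1800):
    """Group (user, event_time) pairs into per-user sessions.

    A new session starts when two consecutive events from the same user
    are gap_seconds or more apart in event time. Arrival order is ignored,
    so late-arriving events still land in the right session.
    """
    by_user = defaultdict(list)
    for user, event_time in events:
        by_user[user].append(event_time)

    sessions = {}
    for user, times in by_user.items():
        times.sort()  # order by when events happened, not when they arrived
        user_sessions = [[times[0]]]
        for t in times[1:]:
            if t - user_sessions[-1][-1] >= gap_seconds:
                user_sessions.append([t])
            else:
                user_sessions[-1].append(t)
        sessions[user] = user_sessions
    return sessions
```

For example, with a 600-second gap, an event from `ann` at time 160 that arrives after her event at time 5000 is still grouped with her earlier event at time 100. A streaming system does this grouping incrementally and uses triggers to decide when to emit results.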

    But why choose between one pattern and the other? A key benefit of this architecture is that you can write your data pipeline processing once and execute it in either batch or streaming mode without modifying your codebase. So if you start processing your logs in batch mode, you can easily move to real-time processing in the future. This is an advantage of the high-level Cloud Dataflow model that was released as open source by Google.

    Cloud Dataflow loads the processed data into one or more BigQuery tables. BigQuery is built for very large scale, and allows you to run aggregation queries against petabyte-scale datasets with fast response times. This is great for interactive analysis and data exploration, like the example screenshot above, where a simple BigQuery SQL query dynamically creates a Daily Active Users (DAU) graph using Google Cloud Datalab.

    And what about player engagement and in-game dynamics? The BigQuery example above shows a bar chart of the ten toughest game bosses. It looks like boss10 killed players more than 75% of the time, much more than the next toughest. Perhaps it would make sense to lower the strength of this boss? Or maybe give the player some more powerful weapons? The choice is yours, but with this reference architecture you'll see the results of your changes straight away. Review the new reference architecture to jumpstart your data-driven quest to engage your players and make your games more successful, contact us, or sign up for a free trial of Google Cloud Platform to get started.


    - Posted by Oyvind Roti, Solutions Architect

    Processing logs at scale using Cloud Dataflow

    Logs generated by applications and services can provide an immense amount of information about how your deployment is running and the experiences your users are having as they interact with the products and services. But as deployments grow more complex, gleaning insights from this data becomes more challenging. Logs come from an increasing number of sources, so they can be hard to collate and query for useful information. And building, operating and maintaining your own infrastructure to analyze log data at scale requires extensive expertise in running distributed systems and storage. Today, we’re introducing a new solution paper and reference implementation that will show how you can process logs from multiple sources and extract meaningful information by using Google Cloud Platform and Google Cloud Dataflow.

    Log processing typically involves some combination of the following activities:

    • Configuring applications and services
    • Collecting and capturing log files
    • Storing and managing log data
    • Processing and extracting data
    • Persisting insights

    Each of those components has its own scaling and management challenges, often using different approaches at different times. These sorts of challenges can slow down the generation of meaningful, actionable information from your log data.

    Cloud Platform provides a number of services that can help you to address these challenges. You can use Cloud Logging to collect logs from applications and services, and then store them in Google Cloud Storage buckets or stream them to Pub/Sub topics. Dataflow can read from Cloud Storage or Pub/Sub (and many other sources), process log data, extract and transform metadata and compute aggregations. You can persist the output from Dataflow in BigQuery, where it can be analyzed or reviewed anytime. These mechanisms are offered as managed services, meaning they can scale when needed. That also means that you don't need to worry about provisioning resources up front.

    The solution paper and reference implementation describe how you can use Dataflow to process log data from multiple sources and persist findings directly in BigQuery. You’ll learn how to configure Cloud Logging to collect logs from applications running in Container Engine, how to export those logs to Cloud Storage, and how to execute the Dataflow processing job. In addition, the solution shows you how to reconfigure Cloud Logging to use Pub/Sub to stream data directly to Dataflow, so you can process logs in real-time.
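The extract-and-aggregate step can be pictured with a small plain-Python sketch. This is not Dataflow code and not the reference implementation. The log line format (`timestamp method path status latency_ms`, separated by spaces) and the function name `summarize_logs` are assumptions made purely for illustration.

```python
from collections import Counter


def summarize_logs(lines):
    """Count responses per HTTP status and average latency per path.

    Assumes each line looks like: "<timestamp> <method> <path> <status> <latency_ms>".
    """
    status_counts = Counter()
    latency_totals = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed lines rather than fail the whole batch
        _timestamp, _method, path, status, latency_ms = parts
        status_counts[status] += 1
        total, count = latency_totals.get(path, (0.0, 0))
        latency_totals[path] = (total + float(latency_ms), count + 1)
    avg_latency = {p: t / c for p, (t, c) in latency_totals.items()}
    return dict(status_counts), avg_latency
```

In a managed pipeline, the parsing would run as a per-element transform and the counts and averages as grouped aggregations, with the results written to BigQuery tables.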

    Check out the Processing Logs at Scale using Cloud Dataflow solution to learn how to combine logging, storage, processing and persistence into a scalable log processing approach. Then take a look at the reference implementation tutorial on GitHub to deploy a complete end-to-end working example. Feedback is welcome and appreciated; comment here, submit a pull request, create an issue, or find me on Twitter @crcsmnky and let me know how I can help.

    - Posted by Sandeep Parikh, Google Solutions Architect