
Processing logs at scale using Cloud Dataflow

Logs generated by applications and services can provide an immense amount of information about how your deployment is running and what your users experience as they interact with your products and services. But as deployments grow more complex, gleaning insights from this data becomes more challenging. Logs come from an increasing number of sources, so they can be hard to collate and query for useful information. And building, operating and maintaining your own infrastructure to analyze log data at scale requires extensive expertise in running distributed systems and storage. Today, we’re introducing a new solution paper and reference implementation that show how you can process logs from multiple sources and extract meaningful information by using Google Cloud Platform and Google Cloud Dataflow.

Log processing typically involves some combination of the following activities:

  • Configuring applications and services
  • Collecting and capturing log files
  • Storing and managing log data
  • Processing and extracting data
  • Persisting insights

Each of these activities has its own scaling and management challenges, and each is often handled with a different approach at a different time. These challenges can slow down the generation of meaningful, actionable information from your log data.

Cloud Platform provides a number of services that can help you address these challenges. You can use Cloud Logging to collect logs from applications and services, and then store them in Google Cloud Storage buckets or stream them to Pub/Sub topics. Dataflow can read from Cloud Storage, Pub/Sub and many other sources, process log data, extract and transform metadata, and compute aggregations. You can persist the output from Dataflow in BigQuery, where it can be analyzed or reviewed at any time. All of these are offered as managed services, so they scale as needed and you don't need to provision resources up front.
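To make the "extract and aggregate" step more concrete, here is a plain-Java sketch that runs in a single JVM on an in-memory list rather than on Dataflow. It assumes a hypothetical space-separated log format (timestamp, service, HTTP status, latency in milliseconds); the class and method names are illustrative only and are not taken from the reference implementation, and nothing is written to BigQuery.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

// One parsed line in a HYPOTHETICAL format: "timestamp service httpStatus latencyMs".
final class LogEntry {
  final String service;
  final int status;
  final long latencyMs;

  LogEntry(String service, int status, long latencyMs) {
    this.service = service;
    this.status = status;
    this.latencyMs = latencyMs;
  }

  // Returns empty for malformed lines so one bad line does not fail the whole batch.
  static Optional<LogEntry> parse(String line) {
    String[] fields = line.trim().split("\\s+");
    if (fields.length != 4) {
      return Optional.empty();
    }
    try {
      return Optional.of(new LogEntry(
          fields[1], Integer.parseInt(fields[2]), Long.parseLong(fields[3])));
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }
}

final class LogAggregator {
  // Extract: parse every line and keep only the well-formed ones.
  static List<LogEntry> extract(List<String> lines) {
    return lines.stream()
        .map(LogEntry::parse)
        .flatMap(Optional::stream)
        .collect(Collectors.toList());
  }

  // Aggregate: number of requests per service, sorted by service name.
  static Map<String, Long> countByService(List<LogEntry> entries) {
    return entries.stream().collect(Collectors.groupingBy(
        (LogEntry e) -> e.service, TreeMap::new, Collectors.counting()));
  }

  // Aggregate: number of server errors (HTTP 5xx) per service.
  static Map<String, Long> serverErrorsByService(List<LogEntry> entries) {
    return entries.stream()
        .filter((LogEntry e) -> e.status >= 500)
        .collect(Collectors.groupingBy(
            (LogEntry e) -> e.service, TreeMap::new, Collectors.counting()));
  }

  // Aggregate: mean latency per service, in milliseconds.
  static Map<String, Double> averageLatencyByService(List<LogEntry> entries) {
    return entries.stream().collect(Collectors.groupingBy(
        (LogEntry e) -> e.service, TreeMap::new,
        Collectors.averagingLong((LogEntry e) -> e.latencyMs)));
  }

  public static void main(String[] args) {
    List<LogEntry> entries = extract(List.of(
        "2015-11-20T10:00:00Z frontend 200 120",
        "2015-11-20T10:00:01Z frontend 500 340",
        "2015-11-20T10:00:02Z backend 200 80",
        "malformed line"));
    System.out.println(countByService(entries));          // {backend=1, frontend=2}
    System.out.println(serverErrorsByService(entries));   // {frontend=1}
    System.out.println(averageLatencyByService(entries)); // {backend=80.0, frontend=230.0}
  }
}
```

In a Dataflow pipeline the same parse and group-by steps would be expressed as transforms over a distributed collection, so they can run in parallel across many workers instead of in one process.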

The solution paper and reference implementation describe how you can use Dataflow to process log data from multiple sources and persist findings directly in BigQuery. You’ll learn how to configure Cloud Logging to collect logs from applications running in Google Container Engine, how to export those logs to Cloud Storage, and how to execute the Dataflow processing job. In addition, the solution shows you how to reconfigure Cloud Logging to use Pub/Sub to stream data directly to Dataflow, so you can process logs in real time.


Check out the Processing Logs at Scale using Cloud Dataflow solution to learn how to combine logging, storage, processing and persistence into a scalable log processing approach. Then take a look at the reference implementation tutorial on GitHub to deploy a complete end-to-end working example. Feedback is welcome and appreciated; comment here, submit a pull request, create an issue, or find me on Twitter @crcsmnky and let me know how I can help.

- Posted by Sandeep Parikh, Google Solutions Architect

Faster builds for Java developers with Maven Central mirror

The Maven Central Repository is a key host of Java dependencies and is used by many popular build systems and dependency managers, such as Apache Maven, Gradle, Ivy, Grape and Bazel. Jason van Zyl, founder of Apache Maven, is hosting a complete mirror of the Maven Central Repository on Google Cloud Storage, meaning faster builds on Google Cloud Platform.

When you build a Maven project, Maven will check your pom.xml file for dependencies. If the dependency isn’t available locally, it needs to be pulled from an online repository. With a simple change to your settings.xml configuration file, a build system running on Cloud Platform – for example, Jenkins on Google Compute Engine or Google Cloud Shell – can now fetch your project’s dependencies from Cloud Storage, increasing the speed of your builds.
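As a rough illustration of that lookup, the sketch below maps a dependency coordinate to its path under the standard Maven 2 repository layout (groupId with dots turned into slashes, then artifactId, version, and artifactId-version.jar), checks the local repository (by default ~/.m2/repository), and otherwise builds the URL it would be downloaded from. The class and method names are hypothetical, and Maven's real resolver also handles POMs, checksums, snapshots and classifiers, which this sketch ignores.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ArtifactLocator {
  // Relative path of a JAR in the Maven 2 layout:
  // groupId (dots become slashes)/artifactId/version/artifactId-version.jar
  static String relativePath(String groupId, String artifactId, String version) {
    return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
        + artifactId + "-" + version + ".jar";
  }

  // Returns the cached local file if present, otherwise the remote URL to fetch.
  static String locate(Path localRepo, String remoteBase,
                       String groupId, String artifactId, String version) {
    String rel = relativePath(groupId, artifactId, version);
    Path local = localRepo.resolve(rel);
    if (Files.exists(local)) {
      return local.toString();
    }
    String base = remoteBase.endsWith("/") ? remoteBase : remoteBase + "/";
    return base + rel;
  }

  public static void main(String[] args) {
    Path localRepo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
    // Maven Central's default URL; a mirror serves the same layout under its own base URL.
    System.out.println(locate(localRepo, "https://repo.maven.apache.org/maven2",
        "junit", "junit", "4.12"));
  }
}
```

Because a mirror exposes the same layout, pointing a build at it only changes the base URL; the relative paths stay identical.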

To use the Cloud Storage Maven Central mirror, add a mirror entry that points to it in the mirrors section of your settings.xml configuration file.
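A minimal sketch of such an entry follows. The mirror, mirrorOf and url elements are standard Maven settings; the id and name are arbitrary labels, and the URL is an assumption based on the public "maven-central" bucket named later in this post, so check Jason van Zyl's post for the exact address.

```xml
<settings>
  <mirrors>
    <mirror>
      <!-- Free-form labels; any unique id works -->
      <id>google-maven-central</id>
      <name>Google Maven Central</name>
      <!-- ASSUMED URL for the public "maven-central" Cloud Storage bucket -->
      <url>https://maven-central.storage.googleapis.com</url>
      <!-- Send requests for the default "central" repository to this mirror -->
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```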


Access the Maven Central Repository via API 

Google provides API libraries for accessing Cloud Storage in Java, Python, Node.js and Ruby, and you can use them to access the Maven repository bucket, for example to list all the top-level entries in the “maven-central” storage bucket.
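Rather than one of the client libraries mentioned above, the sketch below calls the Cloud Storage JSON API's objects.list endpoint directly with java.net.http.HttpClient (Java 11+), passing delimiter=/ so that only top-level entries come back: directory-like prefixes in "prefixes" and root-level objects in "items". It assumes the maven-central bucket is still publicly readable, ignores pagination (nextPageToken), and extracts names with a deliberately naive regular expression; real code should use a JSON parser or the official library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BucketLister {
  private static final Pattern PREFIXES =
      Pattern.compile("\"prefixes\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
  private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");
  private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]*)\"");

  // objects.list request; delimiter=/ folds deeper object names into "prefixes".
  static URI listUri(String bucket) {
    return URI.create("https://storage.googleapis.com/storage/v1/b/"
        + URLEncoder.encode(bucket, StandardCharsets.UTF_8)
        + "/o?delimiter=" + URLEncoder.encode("/", StandardCharsets.UTF_8));
  }

  // Naive extraction of prefixes, then root-level object names.
  // Adequate for this response shape only; escaped quotes are not handled.
  static List<String> topLevelEntries(String json) {
    List<String> names = new ArrayList<>();
    Matcher block = PREFIXES.matcher(json);
    if (block.find()) {
      Matcher quoted = QUOTED.matcher(block.group(1));
      while (quoted.find()) {
        names.add(quoted.group(1));
      }
    }
    Matcher name = NAME.matcher(json);
    while (name.find()) {
      names.add(name.group(1));
    }
    return names;
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(listUri("maven-central")).GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() != 200) {
      System.err.println("Request failed: HTTP " + response.statusCode());
      return;
    }
    topLevelEntries(response.body()).forEach(System.out::println);
  }
}
```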


If you want to learn more about Maven Central and its mirror on Cloud Platform, check out the post by Jason van Zyl, founder of Apache Maven.

**Java is a registered trademark of Oracle Corporation and/or its affiliates.

- Posted by Ludovic Champenois, Google Software Engineer

Add backend logic to real-time data with Firebase and Google App Engine

Firebase is a platform for building Android, iOS and web-based mobile apps, offering real-time data storage and automatic synchronization across devices. But what about when you need to run backend processes on the data?

By connecting an App Engine application to your Firebase database, you can perform complex logic on the data without having to manage synchronization and updates; Firebase handles that for you.

Updates in version 2.4.0 of the Firebase Android client make it easy to access a Firebase database from an App Engine application.



The tutorial, Use Firebase and Google App Engine in an Android App, walks you through the steps to create an Android app that stores a to-do list in Firebase, and uses backend logic running on App Engine to send daily reminder emails.
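The tutorial's own backend code is not reproduced here. As a rough sketch of the kind of logic such a daily job involves, the plain-Java class below computes how long to wait until the next daily send time and turns a to-do snapshot into an email body. The data model (item text mapped to a completed flag) and all names are hypothetical; reading the list through the Firebase client and sending the email from App Engine are deliberately left out.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DailyReminder {
  // Time until the next occurrence of sendAt: later today if still ahead, otherwise tomorrow.
  static Duration delayUntilNext(ZonedDateTime now, LocalTime sendAt) {
    ZonedDateTime next = now.with(sendAt);
    if (!next.isAfter(now)) {
      next = next.plusDays(1);
    }
    return Duration.between(now, next);
  }

  // HYPOTHETICAL data model: to-do item text -> completed flag.
  // Lists unfinished items alphabetically in the reminder body.
  static String composeBody(Map<String, Boolean> todos) {
    List<String> open = todos.entrySet().stream()
        .filter(e -> !e.getValue())
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
    if (open.isEmpty()) {
      return "Nothing left on your to-do list today.";
    }
    return "You still have " + open.size() + " item(s) to do:\n- " + String.join("\n- ", open);
  }
}
```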

In the process of working through the tutorial, you’ll learn how to use the following technologies:

  • Firebase: a platform for building mobile apps, offering real-time data storage and synchronization, user authentication, and more.
  • Android Studio: an Android development environment based on IntelliJ IDEA.
  • Cloud Tools for Android Studio: a set of tools included with Android Studio that integrate with Google Cloud Platform services.
  • Google App Engine: a platform as a service (PaaS) environment for running application code in the cloud.

- Posted by Benjamin Wulfe, Firebase