Google Developer Group Spotlight: A conversation with Cloud Architect, Ilias Papachristos

Posted by Jennifer Kohl, Global Program Manager, Google Developer Communities

The Google Developer Groups Spotlight series interviews inspiring leaders of community meetup groups around the world. Our goal is to learn more about what developers are working on, how they’ve grown their skills with the Google Developer Group community, and what tips they might have for us all.

We recently spoke with Ilias Papachristos, Google Developer Group Cloud Thessaloniki Lead in Greece. Check out our conversation with Ilias on Cloud architecture, reading official documentation, and suggested resources to help developers grow professionally.

Tell us a little about yourself.

I’m a family man, ex-army helicopter pilot, Kendo sensei, beta tester at Coursera, Lead of the Google Developer Group Cloud Thessaloniki community, Google Cloud Professional Architect, and a Cloud Board Moderator on the Google Developers Community Leads Platform (CLP).

I love outdoor activities, reading books, listening to music, and cooking for my family and friends!

Can you explain your work in Cloud technologies?

Over my career, I have used Compute Engine for an e-shop, AutoML Tables for an HR company, and have architected the migration of a company in Mumbai. Now I’m consulting for a company on two of their projects: one that uses Cloud Run and another that uses Kubernetes.

Both of them use Cloud SQL, and the Kubernetes project will also use the AI Platform. We might even end up using Dataflow with BigQuery for streaming, plus Scheduler or Manager, but I’m still working out the details.

I love the chance to share knowledge with the developer community. Many days, I open my PC, read the official Google Cloud blog, and share interesting articles on the CLP Cloud Board and GDG Cloud Thessaloniki’s social media accounts. Then, I check Google Cloud’s Medium publication for extra articles. Read, comment, share, repeat!

How did the Google Developer Group community help your Cloud career?

My overall knowledge of Google Cloud comes from my involvement with Google Developer Groups. It is not just one thing. It’s about everything! At the first European GDG Leads Summit, I met so many people who were sharing their knowledge and offering their help. For a newbie like me, it was, and still is, something I keep in my heart as a treasure.

I’ve also received so many informative lessons on public speaking from Google Developer Group and Google Developer Student Club Leads. They always motivate me to continue talking about the things I love!

What has been the most inspiring part of being a part of your local Google Developer Group?

Collaboration with the rest of the DevFest Hellas Team! For this event, I was part of a small group of 12 organizers, none of whom had ever hosted a large meetup before. With the help of Google Developer Groups, we had so much fun while creating a successful DevFest learning program for 360 people.

What are some technical resources you have found the most helpful for your professional development?

Besides all of the amazing tips and tricks you can learn from the Google Cloud training team and the courses on the official YouTube channel, I had the chance to hear a talk by Wietse Venema on Cloud Run. I have also learned so much about AI from Dale Markowitz’s videos on Applied AI. And of course, I can’t leave out Priyanka Vergadia’s posts, articles, and comic videos!

Official documentation has also been a super important part of my career. Here are five links that I am using right now as an Architect:

  1. Google Cloud Samples
  2. Cloud Architecture Center
  3. Solve with Google Cloud
  4. Google Cloud Solutions
  5. 13 sample architectures to kickstart your Google Cloud journey

How did you become a Google Developer Group Lead?

I am a member of the Digital Analytics community in Thessaloniki, Greece. Their organizer asked me to write articles to help motivate young people. I translated one of them into English and published it on Medium. The Lead of GDG Thessaloniki read it and asked me to become a facilitator for a Cloud Study Jams (CSJ) workshop. I accepted and then traveled to Athens to train three people so that they could also become CSJ facilitators. At the end of the CSJ, I was asked if I wanted to lead a Google Developer Group chapter. I agreed. Maria Encinar and Katharina Lindenthal interviewed me, and I got it!

What would be one piece of advice you have for someone looking to learn more about a specific technology?

Learning has to be an amusing and fun process. And that’s how it’s done with Google Developer Groups all over the world. Join mine, here. It’s the best one. (Wink, wink.)

Want to start growing your career and coding knowledge with developers like Ilias? Then join a Google Developer Group near you, here.

Sports Authority handles 2,000 transactions per second with Google Cloud Platform

(Cross-posted on the Google Cloud Platform Blog.)

Athletic gear, like the rest of the apparel business, is quickly shifting to online sales. Sports Authority, seeing the agility and speed the cloud could offer, turned to Google Cloud Platform to help it respond to its customers faster.

In 2014, Sports Authority’s technical team was asked to build a solution that would expose all in-store product inventory to its ecommerce site, sportsauthority.com, allowing customers to see local store availability of products as they were shopping online. That’s nearly half a million products to choose from in over 460 stores across the U.S. and Puerto Rico.

This use case posed a major challenge for the company. Its in-store inventory data was “locked” deep inside a mainframe. Exposing millions of products to thousands of customers, 24 hours a day, seven days a week would not be possible using this system.

The requirements for a new solution included determining the customer’s location, searching the 90-million-record inventory system, and returning product availability for just the handful of stores nearest that customer. On top of that, the API would need to serve at least 50 customers per second while returning results in under 200 milliseconds.
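The core of that lookup, locating the customer and then returning availability for only the nearest stores, can be sketched in a few lines. This is a toy illustration with hypothetical store coordinates, SKUs, and function names; the real system queried a 90-million-record index in Cloud Datastore, not an in-memory dict.

```python
import math

# Hypothetical stand-ins for the real store and inventory data.
STORES = {
    "denver":  (39.74, -104.99),
    "boulder": (40.01, -105.27),
    "nyc":     (40.71, -74.01),
}
INVENTORY = {("sku-123", "denver"): 4, ("sku-123", "boulder"): 0}

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3959 * 2 * math.asin(math.sqrt(a))

def availability_near(sku, lat, lon, n=2):
    """Return (store, units on hand) for the n stores closest to the customer."""
    nearest = sorted(STORES, key=lambda s: haversine_miles(lat, lon, *STORES[s]))[:n]
    return [(s, INVENTORY.get((sku, s), 0)) for s in nearest]

# A customer near Denver sees local availability first.
print(availability_near("sku-123", 39.70, -105.00))
```

The 200ms latency target is what rules out a brute-force scan in practice: filtering by the customer’s location first keeps each request to a handful of store lookups.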

Choosing the right cloud provider

At the time this project began, Sports Authority had already been a Google Apps for Work (Gmail, Google Sites, Docs) customer since 2011. However, it had never built any custom applications on Google Cloud Platform.

After a period of due diligence checking out competing cloud provider options, Sports Authority decided that Google App Engine and Google Cloud Datastore had the right combination of attributes — elastic scaling, resiliency and simplicity of deployment — to support this new solution.

Through the combined efforts of a dedicated project team, business partners, and three or four talented developers, Sports Authority was able to build a comprehensive solution on Cloud Platform in about five months. It consisted of three modules: 1) batch processes that use Informatica to push millions of product changes from its IBM mainframe to Google Cloud Storage each night; 2) load processes (Python code running on App Engine that spawns task queue jobs to load Cloud Datastore); and 3) a series of SOAP and REST APIs that expose the search functionality to its ecommerce website.
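The load-process module, App Engine code fanning a nightly feed out to task queue jobs, might look roughly like this. The batch size, record shape, and enqueue call are illustrative assumptions, not Sports Authority’s actual code.

```python
def batch_records(records, batch_size=500):
    """Split a nightly product-change feed into fixed-size batches,
    each of which would become one task-queue job that writes its
    rows to Cloud Datastore."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

# Illustrative feed: 1,200 changed products for one store.
changes = [{"sku": f"sku-{n}", "store": "denver", "qty": n % 5} for n in range(1200)]
batches = list(batch_records(changes))

# In the real pipeline each batch would be enqueued for a worker,
# e.g. something like taskqueue.add(url="/load", payload=batch);
# here we just count the resulting jobs.
print(len(batches), len(batches[-1]))  # 3 batches; the last holds 200 records
```

Fanning the load out across many small task-queue jobs is what lets App Engine absorb millions of nightly changes without any single request running long.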

Sports Authority used tools including SoapUI and LoadUI to simulate thousands of virtual users and measure the scalability of the SOAP and REST APIs. It found that as traffic grew past 2,000 transactions per second, App Engine and Cloud Datastore continued to scale seamlessly, easily meeting its target response times.

The company implemented the inventory locator solution just in time for the 2014 holiday season. It performed admirably during that peak selling period and continues to do so today.
This screenshot shows what customers see when they shop on the website: a list of local stores showing the availability of a given product in each store.

When a customer finds a product she's interested in buying, the website requests inventory availability from Sports Authority’s cloud API, which provides a list of stores and product availability to the customer, as exhibited in the running shoe example above.

In-store kiosk

As Sports Authority became comfortable building solutions on Cloud Platform, it began to see other possibilities for new solutions to better serve its customers.

For example, it recently developed an in-store kiosk, which allows customers to search for products that may not be available in that particular store. It also lets them enroll in the loyalty program and purchase gift cards. This kiosk is implemented on a Google Chromebox, connected to a web application running on App Engine.
This image shows the in-store kiosk that customers use to locate products available in other stores. 

Internal store portal

Additionally, it built a store portal and task management system, which facilitates communication between the corporate office and its stores. This helps the store team members plan and execute their work more efficiently, allowing them to serve customers better when needs arise. This solution utilizes App Engine, Cloud Datastore and Google Custom Search, and was built with the help of a local Google partner, Tempus Nova.
This screenshot shows the internal store portal that employees use to monitor daily tasks.

Learning how to build software in any new environment, Cloud Platform included, takes time, dedication, and a willingness to learn. Once the team was up to speed, the productivity and power of Google Cloud Platform let Sports Authority work like a software company and build new solutions quickly.

What it looks like to process 3.5 million books in Google’s cloud

Today’s guest blog comes from Kalev Leetaru, founder of The GDELT Project, which monitors the world’s news media in nearly every country in over 100 languages to identify the events and narratives driving our global society.

This past September I published into Google BigQuery a massive new public dataset of metadata from 3.5 million digitized English-language books dating back more than two centuries (1800-2015), along with the full text of 1 million of these books. The archive, which draws from the English-language public domain book collections of the Internet Archive and HathiTrust, includes full publication details for every book, along with a wide array of computed content-based data. The entire archive is available as two public BigQuery datasets, and there’s a growing collection of sample queries to help users get started with the collection. You can even map two centuries of books with a single line of SQL.

What did it look like to process 3.5 million books? Data-mining 3.5 million books and creating a public archive of them is an application perfectly suited to the cloud, in which a large amount of specialized processing power is needed for only a brief period of time. Here are the five main steps I took to make the knowledge locked in millions of books more easily and speedily accessible in the cloud:
  1. The project began with a single 8-core Google Compute Engine (GCE) instance with a 2TB SSD persistent disk, which I used to download the 3.5 million books. I downloaded the books to the instance’s local disk, unzipped them, converted them into a standardized file format, and then uploaded them to Google Cloud Storage (GCS) in large batches, using the composite objects and parallel upload capability of GCS. Unlike with traditional UNIX file systems, GCS performance does not degrade with large numbers of small files in a single directory, so I could upload all 3.5 million files into a common set of directories.
    Figure 1: Visualization of two centuries of books
  2. Once all of the books had been downloaded and stored in GCS, I launched ten 16-core High Mem (100GB RAM) GCE instances (160 cores total) to process them, each with a 50GB persistent SSD root disk for faster IO than traditional persistent disks. To launch all ten instances quickly, I configured the first with all of the necessary software libraries and tools, then created a disk snapshot and used it to clone the other nine with just a few clicks. Each of the ten compute instances downloaded a batch of 100 books at a time from GCS for processing.
  3. Once the books had been processed, I uploaded back into GCS all of the computed metadata. In this way, GCS served as a central storage fabric connecting the compute nodes. Remarkably, even in worst-case scenarios when all 160 processors were either downloading new batches of books from GCS or uploading output files back to GCS in parallel, there was no measurable performance degradation.
  4. With the books processed, I deleted the ten compute instances and launched a single 32-core instance with 200GB of RAM, a 10TB persistent SSD disk, and four 375GB direct-attached Local SSD Disks. I used this to reassemble the 3.5 million per-book output files into single output files, tab-delimited with data available for each year, merging in publication metadata and other information about each book. Disk IO of more than 750MB/s was observed on this machine.
  5. I then uploaded the final per-year output files to a public GCS directory with web downloading enabled, allowing the public to download the files.
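The fan-out in steps 2 and 3 (workers pulling fixed-size batches of books from shared storage, computing metadata, and writing results back) can be mimicked locally. In this toy sketch, a thread pool stands in for the ten GCE instances, a list of strings for the books in GCS, and a simple token count for the real computed metadata; none of these names come from the actual pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins: a list of strings plays the role of the books in GCS.
BOOKS = [f"book-{i} some digitized text" for i in range(1000)]
BATCH = 100  # the per-worker download batch size described in step 2

def process_batch(batch):
    """'Process' one batch of books; the real pipeline computed
    content-based metadata and uploaded it back to GCS (step 3)."""
    return sum(len(text.split()) for text in batch)

# Carve the corpus into batches and fan them out to a worker pool
# (4 local threads standing in for the ten 16-core GCE instances).
batches = [BOOKS[i:i + BATCH] for i in range(0, len(BOOKS), BATCH)]
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(process_batch, batches))
print(len(batches), sum(totals))  # 10 batches, 4000 tokens
```

The pattern is the same at both scales: because workers only ever touch their own batch and write independent outputs, adding more of them (threads locally, instances in the cloud) scales the job without coordination.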
Since very few researchers have the bandwidth, local storage or computing power to process even just the metadata of 3.5 million books, the entire collection was uploaded into Google BigQuery as a public dataset. Using standard SQL queries, you can explore the entire collection in tens of seconds at speeds of up to 45.5GB/s and perform complex analyses entirely in-database.

The entire project, from start to finish, took less than two weeks, a good portion of which consisted of human verification for issues with the publication metadata. This is significant because previous attempts to process even a subset of the collection on a modern HPC supercluster had taken over one month and completed only a fraction of the number of books examined here. The limiting factor was always the movement of data: transferring terabytes of books and their computed metadata across hundreds of processors.

This is where Google’s cloud offerings shine, seemingly purpose-built for data-first computing. In just two weeks, I was able to process 3.5 million books, spinning up a cluster of 160 cores and 1TB of RAM, followed by a single machine with 32 cores, 200GB of RAM, 10TB of SSD disk, and 1.5TB of direct-attached scratch SSD disk. I was able to make the final results publicly accessible through BigQuery at query speeds of over 45.5GB/s.

You can access the entire collection today in BigQuery, explore sample queries, and read more technical detail about the processing pipeline on the GDELT Blog.

I’d like to thank Google, Clemson University, the Internet Archive, HathiTrust, and OCLC for making this project possible, along with all of the contributing libraries and digitization sponsors that have made these digitized books available.

- Posted by Kalev Leetaru, founder of The GDELT Project