Tag Archives: Research Awards

Google Research, 2022 & beyond: Research community engagement


(This is Part 9 in our series of posts covering different topical areas of research at Google. You can find other posts in the series here.)

Sharing knowledge is essential to Google’s research philosophy — it accelerates technological progress and expands capabilities community-wide. Solving complex problems requires bringing together diverse minds and resources collaboratively. This can be accomplished through building local and global connections with multidisciplinary experts and impacted communities. In partnership with these stakeholders, we bring our technical leadership, product footprint, and resources to make progress against some of society's greatest opportunities and challenges.

We at Google see it as our responsibility to disseminate our work as contributing members of the scientific community and to help train the next generation of researchers. To do this well, collaborating with experts and researchers outside of Google is essential. In fact, just over half of our scientific publications highlight work done jointly with authors outside of Google. We are grateful to work collaboratively across the globe and have only increased our efforts with the broader research community over the past year. In this post, we will talk about some of the opportunities afforded by such partnerships, including:


Addressing social challenges together

Engaging the wider community helps us progress on seemingly intractable problems. For example, access to timely, accurate health information is a significant challenge among women in rural and densely populated urban areas across India. To solve this challenge, ARMMAN developed mMitra, a free mobile service that sends preventive care information to expectant and new mothers. Adherence to such public health programs is a prevalent challenge, so researchers from Google Research and the Indian Institute of Technology, Madras worked with ARMMAN to design an ML system that alerts healthcare providers about participants at risk of dropping out of the health information program. This early identification helps ARMMAN provide better-targeted support, improving maternal health outcomes.

Google Research worked with ARMMAN to design a system to alert healthcare providers about participants at risk for dropping out of their preventative care information program for expectant mothers. This plot shows the cumulative engagement drops prevented using our restless multi-armed bandit model (RMAB) compared to the control group (Round Robin).

We also support Responsible AI projects directly for other organizations — including our commitment of $3M to fund the new INSAIT research center based in Bulgaria. Further, to help build a foundation of fairness, interpretability, privacy, and security, we are supporting the establishment of a first-of-its-kind multidisciplinary Center for Responsible AI with a grant of $1M to the Indian Institute of Technology, Madras.

Top


Training the next generation of researchers

Part of our responsibility in guiding how technology affects society is to help train the next generation of researchers. For example, supporting equitable student persistence in computing research through our Computer Science Research Mentorship Program, where Googlers have mentored over one thousand students since 2018 — 86% of whom identify as part of a historically marginalized group.

We work towards inclusive goals and work across the globe to achieve them. In 2022, we expanded our research interactions and programs to faculty and students across Latin America, which included grants to women in computer science in Ecuador. We partnered with ENS, a university in France, to help fund scholarships for students to train through research. Another example is our collaboration with the Computing Alliance of Hispanic-Serving Institutions (CAHSI) to provide $4.8 million to support more than 30 collaborative research projects and over 3,000 Hispanic students and faculty across a network of Hispanic-serving institutions.

Efforts like these foster the research ecosystem and help the community give back. Through exploreCSR, we partner with universities to provide students with introductory experiences in research, such as Rice University’s regional workshop on applications and research in data science (ReWARDS), which was delivered in rural Peru by faculty from Rice. Similarly, one of our Awards for Inclusion Research led to a faculty member helping startups in Africa use AI.

The funding we provide is most often unrestricted and leads to inspiring results. Last year, for example, Kean University was one of 53 institutions to receive an exploreCSR award. It used the funding to create the Research Recruits Program, a two-semester program designed to give undergraduates an introductory opportunity to participate in research with a faculty mentor. A student at Kean with a chronic condition that requires him to take different medications every day, a struggle that affects so many, decided to pursue research on the topic with a peer. Their research, set to be published this year, demonstrates an ML solution, built with Google's TensorFlow, that can identify pills with 99.8% certainty when used correctly. Results like these are why we continue to invest in younger generations, further demonstrated by our long-term commitment to funding PhD Fellows every year across the globe.

Building an inclusive ecosystem is imperative. To this end, we've also partnered with the non-profit Black in Robotics (BiR), formed to address the systemic inequities in the robotics community. Together, we established doctoral student awards that help financially support graduate students and to support BiR’s newly established Bay Area Robotics lab. We also help make global conferences accessible to more researchers around the world, for example, by funding 24 students this year to attend Deep Learning Indaba in Tunisia.

Top


Collaborating to advance scientific innovations

In 2022 Google sponsored over 150 research conferences and even more workshops, which leads to invaluable engagements with the broader research community. At research conferences, Googlers serve on program committees and organize workshops, tutorials and numerous other activities to collectively advance the field. Additionally, last year, we hosted over 14 dedicated workshops to bring together researchers, such as the 2022 Quantum Symposium, which generates new ideas and directions for the research field, further advancing research initiatives. In 2022, we authored 2400 papers, many of which were presented at leading research conferences, such as NeurIPS, EMNLP, ECCV, Interspeech, ICML, CVPR, ICLR, and many others. More than 50% of these papers were authored in collaboration with researchers beyond Google.

Over the past year, we've expanded our engagement models to facilitate students, faculty, and Google's research scientists coming together across schools to form constructive research triads. One such project, undertaken in partnership with faculty and students from Georgia Tech, aims to develop a robot guide dog with human behavior modeling and safe reinforcement learning. Throughout 2022, we gave over 224 grants to researchers and over $10M in Google Cloud Platform credits for topics ranging from the improvement of algorithms for post-quantum cryptography with collaborators at CNRS in France to fostering cybersecurity research at TU Munich and Fraunhofer AISEC in Germany.

In 2022, we made 22 new multi-year commitments totaling over ~$80M to 65 institutions across nine countries, where each year we will host workshops to select over 100 research projects of mutual interest. For example, in a growing partnership, we are supporting the new Max Planck VIA-Center in Germany to work together on robotics. Another large area of investment is a close partnership with four universities in Taiwan (NTU, NCKU, NYCU, NTHU) to increase innovation in silicon chip design and improve competitiveness in semiconductor design and manufacturing. We aim to collaborate by default and were proud to be recently named one of Australia's top collaborating companies.

Top


Fueling innovation in products and engineering

The community fuels innovation at Google. For example, by facilitating student researchers to work with us on defined research projects, we've experienced both incremental and more dramatic improvements. Together with visiting researchers, we combine information, compute power, and a great deal of expertise to bring about breakthroughs, such as leveraging our undersea internet cables to detect earthquakes. Visiting Researchers also worked hand-in-hand with us to develop Minerva, a state-of-the-art solution that came about by training a deep learning model on a dataset that contains quantitative reasoning with symbolic expressions.

Minerva incorporates recent prompting and evaluation techniques to better solve mathematical questions. It then employs majority voting, in which it generates multiple solutions to each question and chooses the most common answer as the solution, thus improving performance significantly.

Top


Open-sourcing datasets and tools

Engaging with the broader research community is a core part of our efforts to build a more collaborative ecosystem. We support the general advancement of ML and related research through the release of open-source code and datasets. We continued to grow open source datasets in 2022, for example, in natural language processing and vision, and expanded our global index of available datasets in Google Dataset Search. We also continued to release sustainability data via Data Commons and invite others to use it for their research. See some of the datasets and tools we released in 2022 listed below.


Dataset Description
  
Auto-Arborist A multiview urban tree classification dataset that consists of ~2.6M trees covering >320 genera, which can aid in the development of models for urban forest monitoring.
  
Bazel GitHub Metrics A dataset with GitHub download counts of release artifacts from selected bazelbuild repositories.
  
BC-Z demonstration Episodes of a robotic arm performing 100 different manipulation tasks. Data for each episode includes the RGB video, the robot's end-effector positions, and the natural language embedding.
  
BEGIN V2 A benchmark dataset for evaluating dialog systems and natural language generation metrics.
  
CLSE: Corpus of Linguistically Significant Entities A dataset of named entities annotated by linguistic experts. It includes 34 languages and 74 different semantic types to support various applications from airline ticketing to video games.
  
CocoChorales A dataset consisting of over 1,400 hours of audio mixtures containing four-part chorales performed by 13 instruments, all synthesized with realistic-sounding generative models.
  
Crossmodal-3600 A geographically diverse dataset of 3,600 images, each annotated with human-generated reference captions in 36 languages.
  
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus A Common Voice-based Speech-to-Speech translation corpus that includes 2,657 hours of speech-to-speech translation sentence pairs from 21 languages into English.
  
DSTC11 Challenge Task This challenge evaluates task-oriented dialog systems end-to-end, from users' spoken utterances to inferred slot values.
  
EditBench A comprehensive diagnostic and evaluation dataset for text-guided image editing.
  
Few-shot Regional Machine Translation FRMT is a few-shot evaluation dataset containing en-pt and en-zh bitexts translated from Wikipedia, in two regional varieties for each non-English language (pt-BR and pt-PT; zh-CN and zh-TW).
  
Google Patent Phrase Similarity A human-rated contextual phrase-to-phrase matching dataset focused on technical terms from patents.
  
Hinglish-TOP Hinglish-TOP is the largest code-switched semantic parsing dataset with 10k entries annotated by humans, and 170K generated utterances using the CST5 augmentation technique introduced in the paper.
  
ImPaKT A dataset that contains semantic parsing annotations for 2,489 sentences from shopping web pages in the C4 corpus, corresponding to annotations of 3,719 expressed implication relationships and 6,117 typed and summarized attributes.
  
InFormal A formality style transfer dataset for four Indic Languages, made up of a pair of sentences and a corresponding gold label identifying the more formal and semantic similarity.
  
MAVERICS A suite of test-only visual question answering datasets, created from Visual Question Answering image captions with question answering validation and manual verification.
  
MetaPose A dataset with 3D human poses and camera estimates predicted by the MetaPose model for a subset of the public Human36M dataset with input files necessary to reproduce these results from scratch.
  
MGnify proteins A 2.4B-sequence protein database with annotations.
  
MiQA: Metaphorical Inference Questions and Answers MiQA assesses the capability of language models to reason with conventional metaphors. It combines the previously isolated topics of metaphor detection and commonsense reasoning into a single task that requires a model to make inferences by selecting between the literal and metaphorical register.
  
MT-Opt A dataset of task episodes collected across a fleet of real robots, following the RLDS format to represent steps and episodes.
  
MultiBERTs Predictions on Winogender Predictions of BERT on Winogender before and after several different interventions.
  
Natural Language Understanding Uncertainty Evaluation NaLUE is a relabelled and aggregated version of three large NLU corpuses CLINC150, Banks77 and HWU64. It contains 50k utterances spanning 18 verticals, 77 domains, and ~260 intents.
  
NewsStories A collection of url links to publicly available news articles with their associated images and videos.
  
Open Images V7 Open Images V7 expands the Open Images dataset with new point-level label annotations, which provide localization information for 5.8k classes, and a new all-in-one visualization tool for better data exploration.
  
Pfam-NUniProt2 A set of 6.8 million new protein sequence annotations.
  
Re-contextualizing Fairness in NLP for India A dataset of region and religion-based societal stereotypes in India, with a list of identity terms and templates for reproducing the results from the "Re-contextualizing Fairness in NLP" paper.
  
Scanned Objects A dataset with 1,000 common household objects that have been 3D scanned for use in robotic simulation and synthetic perception research.
  
Specialized Rater Pools This dataset comes from a study designed to understand whether annotators with different self-described identities interpret toxicity differently. It contains the unaggregated toxicity annotations of 25,500 comments from pools of raters who self-identify as African American, LGBTQ, or neither.
  
UGIF A multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone.
  
UniProt Protein Names Data release of ~49M protein name annotations predicted from their amino acid sequence.
  
upwelling irradiance from GOES-16 Climate researchers can use the 4 years of outgoing longwave radiation and reflected shortwave radiation data to analyze important climate forcers, such as aircraft condensation trails.
  
UserLibri The UserLibri dataset reorganizes the existing popular LibriSpeech dataset into individual “user” datasets consisting of paired audio-transcript examples and domain-matching text-only data for each user. This dataset can be used for research in speech personalization or other language processing fields.
  
VideoCC A dataset containing (video-URL, caption) pairs for training video-text machine learning models.
  
Wiki-conciseness A manually curated evaluation set in English for concise rewrites of 2,000 Wikipedia sentences.
  
Wikipedia Translated Clusters Introductions to English Wikipedia articles and their parallel versions in 10 other languages, with machine translations to English. Also includes synthetic corruptions to the English versions, to be identified with NLI models.
  
Workload Traces 2022 A dataset with traces that aim to help system designers better understand warehouse-scale computing workloads and develop new solutions for front-end and data-access bottlenecks.


Tool Description
  
Differential Privacy Open Source Library An open-source library to enable developers to use analytic techniques based on DP.
  
Mood Board Search The result of collaborative work with artists, photographers, and image researchers to demonstrate how ML can enable people to visually explore subjective concepts in image datasets.
  
Project Relate An Android beta app that uses ML to help people with non-standard speech make their voices heard.
  
TensorStore TensorStore is an open-source C++ and Python library designed for storage and manipulation of n-dimensional data, which can address key engineering challenges in scientific computing through better management and processing of large datasets.
  
The Data Cards Playbook A Toolkit for Transparency in Dataset Documentation.

Top

Conclusion

Research is an amplifier, an accelerator, and an enabler — and we are grateful to partner with so many incredible people to harness it for the good of humanity. Even when investing in research that advances our products and engineering, we recognize that, ultimately, this fuels what we can offer our users. We welcome more partners to engage with us and maximize the benefits of AI for the world.


Acknowledgements

Thank you to our many research partners across the globe, including academics, universities, NGOs, and research organizations, for continuing to engage and work with Google on exciting research efforts. There are many teams within GoogIe who make this work possible, including Google’s research teams and community, research partnerships, education, and policy teams. Finally, I would especially like to thank those who provided helpful feedback in the development of this post, including Sepi Hejazi Moghadam, Jill Alvidrez, Melanie Saldaña, Ashwani Sharma, Adriana Budura Skobeltsyn, Aimin Zhu, Michelle Hurtado, Salil Banerjee and Esmeralda Cardenas.

Top


Google Research, 2022 & beyond

This was the ninth and final blog post in the “Google Research, 2022 & Beyond” series. Other posts in this series are listed in the table below:


Source: Google AI Blog


Announcing the 2019 Google Faculty Research Award Recipients



In Fall 2019, we opened our annual call for the Google Faculty Research Awards, a program focused on supporting the world-class technical research in Computer Science, Engineering and related fields performed at academic institutions around the world. These awards give Google researchers the opportunity to partner with faculty who are doing impactful research, additionally covering tuition for a student.

This year we received 917 proposals from ~50 countries and over 330 universities, and had the opportunity to increase our investment in several research areas related to Health, Accessibility, AI for Social Good, and ML Fairness. All proposals went through an extensive review process involving 1100 expert reviewers across Google who assessed the proposals on merit, innovation, connection to Google’s products/services and alignment with our overall research philosophy.

As a result of these reviews, Google is funding 150 promising proposals across a wide range of research areas, from Machine Learning, Systems, Human Computer Interaction and many more, with 26% of the funding awarded to universities outside the United States. Additionally, 27% of our recipients this year identified as a historically underrepresented group within technology. This is just the beginning of a larger investment in underrepresented communities and we are looking forward to sharing our 2020 initiatives soon.

Congratulations to the well-deserving recipients of this round's awards. More information on our faculty funding programs can be found on our website.

Source: Google AI Blog


Google Faculty Research Awards 2018



We just completed another round of the Google Faculty Research Awards, our annual open call for proposals on computer science and related topics, such as quantum computing, machine learning, algorithms and theory, natural language processing and more. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 910 proposals covering 40 countries and over 320 universities. After expert reviews and committee discussions, we decided to fund 158 projects. The subject areas that received the most support this year were human computer interaction, machine learning, machine perception, and systems.

Congratulations to the well-deserving recipients of this round's awards. More information on how to apply for the next round will be available at the end of the summer on our website. You can find award recipients from previous years here.

Source: Google AI Blog


Announcing the Google Cloud Platform Research Credits Program



Scientists across nearly every discipline are researching ever larger and more complex data sets, using tremendous amounts of compute power to learn, make discoveries and build new tools that few could have imagined only a few years ago. Traditionally, this kind of research has been limited by the availability of resources, with only the largest universities or industry partners able to successfully pursue these endeavors. However, the power of cloud computing has been removing obstacles that many researchers used to face, enabling projects that use machine learning tools to understand and address student questions and that study robotic interactions with humans, among many more.

In order to ensure that more researchers have access to powerful cloud tools, we’re launching Google Cloud Platform (GCP) research credits, a new program aimed to support faculty in qualified regions who want to take advantage of GCP’s compute, analytics, and machine-learning capabilities for research. Higher education researchers can use GCP research credits in a multitude of ways — below are just three examples to illustrate how GCP can help propel your research forward.

Andrew V. Sutherland, a computational number theorist and Principal Research Scientist at the Massachusetts Institute of Technology, is one of a growing number of academic researchers who has already made the transition and benefited from GCP. His team moved his extremely large database to GCP because “we are mathematicians who want to focus on our research, and not have to worry about hardware failures or scaling issues with the website.”

Ryan Abernathey, Assistant Professor of Earth and Environmental Sciences, Ocean and Climate Physics at the Lamont-Doherty Earth Observatory at Columbia University, used Google Cloud credits through an NSF partnership and, with his team, developed an open-source platform to manage the complex data sets of climate science. The platform, called Pangeo, can run Earth System Modeling simulations on petabytes of high-resolution, three-dimensional data. “This is the future of what day-to-day science research computing will look like,” he predicts.

At the Stanford Center for Genomics and Personalized Medicine (SCGPM), researchers using GCP and BigQuery can now run hundreds of genomes through a variant analysis pipeline and get query results quickly. Mike Snyder, director of SCGPM, notes, “We’re entering an era where people are working with thousands or tens of thousands or even million genome projects, and you’re never going to do that on a local cluster very easily. Cloud computing is where the field is going.”

The GCP research credits program is open to faculty doing cutting-edge research in eligible countries. We’re eager to hear how we can help accelerate your progress. If you’re interested, you can learn more on our website or apply now.

Announcing the Google Cloud Platform Research Credits Program



Scientists across nearly every discipline are researching ever larger and more complex data sets, using tremendous amounts of compute power to learn, make discoveries and build new tools that few could have imagined only a few years ago. Traditionally, this kind of research has been limited by the availability of resources, with only the largest universities or industry partners able to successfully pursue these endeavors. However, the power of cloud computing has been removing obstacles that many researchers used to face, enabling projects that use machine learning tools to understand and address student questions and that study robotic interactions with humans, among many more.

In order to ensure that more researchers have access to powerful cloud tools, we’re launching Google Cloud Platform (GCP) research credits, a new program aimed to support faculty in qualified regions who want to take advantage of GCP’s compute, analytics, and machine-learning capabilities for research. Higher education researchers can use GCP research credits in a multitude of ways — below are just three examples to illustrate how GCP can help propel your research forward.

Andrew V. Sutherland, a computational number theorist and Principal Research Scientist at the Massachusetts Institute of Technology, is one of a growing number of academic researchers who has already made the transition and benefited from GCP. His team moved his extremely large database to GCP because “we are mathematicians who want to focus on our research, and not have to worry about hardware failures or scaling issues with the website.”

Ryan Abernathey, Assistant Professor of Earth and Environmental Sciences, Ocean and Climate Physics at the Lamont-Doherty Earth Observatory at Columbia University, used Google Cloud credits through an NSF partnership and, with his team, developed an open-source platform to manage the complex data sets of climate science. The platform, called Pangeo, can run Earth System Modeling simulations on petabytes of high-resolution, three-dimensional data. “This is the future of what day-to-day science research computing will look like,” he predicts.

At the Stanford Center for Genomics and Personalized Medicine (SCGPM), researchers using GCP and BigQuery can now run hundreds of genomes through a variant analysis pipeline and get query results quickly. Mike Snyder, director of SCGPM, notes, “We’re entering an era where people are working with thousands or tens of thousands or even million genome projects, and you’re never going to do that on a local cluster very easily. Cloud computing is where the field is going.”

The GCP research credits program is open to faculty doing cutting-edge research in eligible countries. We’re eager to hear how we can help accelerate your progress. If you’re interested, you can learn more on our website or apply now.

Source: Google AI Blog


Google Faculty Research Awards 2017



We’ve just completed another round of the Google Faculty Research Awards, our annual open call for proposals on computer science and related topics such as machine learning, machine perception, natural language processing, and quantum computing. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 1033 proposals covering 46 countries and over 360 universities. After expert reviews and committee discussions, we decided to fund 152 projects. The subject areas that received the most support this year were human computer interaction, machine learning, machine perception, and systems. Here are a few observations from this round:
  • There was a 17% increase in the total number of proposals received
  • There was a 87% increase in the number of proposals from Asia Pacific universities
  • Proposals focused on Computational Neuroscience increased 53%
  • Proposals focused on Quantum Computing more than doubled this round
Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (September 2018 deadline), please visit our website for more information. You can find award recipients from previous years here.

Google Faculty Research Awards 2017



We’ve just completed another round of the Google Faculty Research Awards, our annual open call for proposals on computer science and related topics such as machine learning, machine perception, natural language processing, and quantum computing. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 1033 proposals covering 46 countries and over 360 universities. After expert reviews and committee discussions, we decided to fund 152 projects. The subject areas that received the most support this year were human computer interaction, machine learning, machine perception, and systems. Here are a few observations from this round:
  • There was a 17% increase in the total number of proposals received
  • There was a 87% increase in the number of proposals from Asia Pacific universities
  • Proposals focused on Computational Neuroscience increased 53%
  • Proposals focused on Quantum Computing more than doubled this round
Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (September 2018 deadline), please visit our website for more information. You can find award recipients from previous years here.

Source: Google AI Blog


Google Research Awards 2016



We’ve just completed another round of the Google Research Awards, our annual open call for proposals on computer science and related topics including machine learning, machine perception, natural language processing, and security. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 876 proposals covering 44 countries and over 300 universities. After expert reviews and committee discussions, we decided to fund 143 projects. Here are a few observations from this round:


Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (deadline is September 30th), please visit our website for more information.

An Update on fast Transit Routing with Transfer Patterns



What is the best way to get from A to B by public transit? Google Maps is answering such queries for over 20,000 cities and towns in over 70 countries around the world, including large metro areas like New York, São Paulo or Moscow, and some complete countries, such as Japan or Great Britain.
Since its beginnings in 2005 with the single city of Portland, Oregon, the number of cities and countries served by Google’s public transit directions has been growing rapidly. With more and larger regions, the amount of data we need to search in order to provide optimal directions has grown as well. In 2010, the search speed of transit directions made a leap ahead of that growth and became fast enough to update the result while you drag the endpoints. The technique behind that speed-up is the Transfer Patterns algorithm [1], which was created at Google’s engineering office in Zurich, Switzerland, by visiting researcher Hannah Bast and a number of Google engineers.

I am happy to report that this research collaboration has continued and expanded with the Google Focused Research Award on Next-Generation Route Planning. Over the past three years, this grant has supported Hannah Bast’s research group at the University of Freiburg, as well as the research groups of Peter Sanders and Dorothea Wagner at the Karlsruhe Institute of Technology (KIT).

From the project’s numerous outcomes, I’d like to highlight two recent ones that re-examine the Transfer Patterns approach and massively improve it for continent-sized networks: Scalable Transfer Patterns [2] and Frequency-Based Search for Public Transit [3] by Hannah Bast, Sabine Storandt and Matthias Hertel. This blogpost presents the results from these publications.

The notion of a transfer pattern is easy to understand. Suppose you are at a transit stop downtown, call it A, and want to go to some stop B as quickly as possible. Suppose further you brought a printed schedule book but no smartphone. (This sounded plausible only a few years ago!) As a local, you might know that there are only two reasonable options:
  1. Take a tram from A to C, then transfer at C to a bus to B.
  2. Take the direct bus from A to B, which only runs infrequently.
We say the first option has transfer pattern A-C-B, and the second option has transfer pattern A-B. Notice that no in-between stops are mentioned. This is very compact information, much less than the actual schedules, but it makes looking up the schedules significantly faster: Knowing that all optimal trips follow one of these patterns, you only need to look at those lines in the schedule book that provide direct connections from A to C, C to B and A to B. All other lines can safely be ignored: you know you will not miss a better option.

While the basic idea of transfer patterns is indeed that simple, it takes more to make it work in practice. The transfer patterns of all optimal trips have to be computed ahead of time and stored, so that they are available to answer queries. Conceptually, we need transfer patterns for every pair of stops, because any pair could come up in a query. It is perfectly reasonable to compute them for all pairs within one city, or even one metro area that is densely criss-crossed by a transit network comprising, say, a thousand stops, yielding a million of pairs to consider.

As the scale of the problem increases from one metro area to an entire country or continent, this “all pairs” approach rapidly becomes expensive: ten thousand stops (10x more than above) already yield a hundred million pairs (100x more than above), and so on. Also, the individual transfer patterns become quite repetitive: For example, from any stop in Paris, France to any stop in Cologne, Germany, all optimal connections end up using the same few long-distance train lines in the middle, only the local connections to the railway stations depend on the specific pair of stops considered.

However, designated long-distance connections are not the only way to travel between different local networks – they also overlap and connect to each other. For mid-range trips, there is no universally correct rule when to choose a long-distance train or intercity bus, short of actually comparing options with local or regional transit, too.

The Scalable Transfer Patterns algorithm [2] does just that, but in a smart way. For starters, it uses what is known as graph clustering to cut the network into pieces, called clusters, that have a lot of connections inside but relatively few to the outside. As an example, the figure below (kindly provided by the authors) shows a partitioning of Germany into clusters. The stops highlighted in red are border stops: They connect directly to stops outside the cluster. Notice how they are a small fraction of the total network.
The public transit network of Germany (dots and lines), split into clusters (shown in various colors). Of all 251,763 stops, only 10,886 (4.32%) are boundary stops, highlighted as red boxes. Click here to view the full resolution image.[source: S. Storandt, 2016]
Based on the clustering, the transfer patterns of all optimal connections are computed in two steps.

In step 1, transfer patterns are computed for optimal connections inside each cluster. They are stored for query processing later on, but they also accelerate the search through a cluster in the following step: between the stops on its border, we only need to consider the connections captured in the transfer patterns.

The next figure sketches how the transit network in the cluster around Berlin gets reduced to much fewer connections between border stations. (The central station stands out as a hub, as expected. It is a border station itself, because it has direct connections out of the cluster.)
The cluster of public transit connections around Berlin (shown as dots and lines in light blue), its border stops (highlighted as red boxes), and the transfer patterns of optimal connections between border stops (thick black lines; only the most important 111 of 592 are shown to keep the image legible). This cuts out 96.15% of the network (especially a lot of the high-frequency inner city trips, which makes the time savings even bigger). Click here to view the full resolution image. [source: S. Storandt, 2016]
In step 2, transfer patterns can be computed for the entire network, that is, between any pair of clusters. This is done with the following twists:

  • It suffices to consider trips from and to boundary stops of any cluster; the local transfer patterns from step 1 will supply the missing pieces later on.
  • The per-cluster transfer patterns from step 1 greatly accelerate the search across other clusters.
  • The search stops exploring any possible connection between two boundary stops as soon as it gets worse than a connection that sticks to long-distance transit between clusters (which may not always be optimal, but is always quick to compute).

The results of steps 1 and 2 are stored and used to answer queries. For any given query from some A to some B, one can now easily stitch together a network of transfer patterns that covers all optimal connections from A to B. Looking up the direct connections on that small network (like in the introductory example) and finding the best one for the queried time is very fast, even if A and B are far away.

The total storage space needed for this is much smaller than the space that would be needed for all pairs of stops, all the more the larger the network gets. Extrapolating from their experiments, the researchers estimate [2] that Scalable Transfer Patterns for the whole world could be stored in 30 GB, cutting their estimate for the basic Transfer Patterns by a thousand(!). This is considerably more powerful than the “hub station” idea from the original Transfer Patterns paper [1].

The time needed to compute Scalable Transfer Patterns is also estimated to shrink by three orders of magnitude: At a high level, the earlier phases of the algorithm accelerate the later ones, as described above. At a low level, a second optimization technique kicks in: exploiting the repetitiveness of schedules in time. Recall that finding transfer patterns is all about finding the optimal connections between pairs of stops at any possible departure time.

Frequency-based schedules (e.g., one bus every 10 minutes) cause a lot of similarity during the day, although it often doesn’t match up between lines (e.g., said bus runs every 10 minutes before 6pm and every 20 minutes after, and we seek connections to a train that runs every 12 minutes before 8pm and every 30 minutes after). Moreover, this similarity also exists from one day to the next, and we need to consider all permissible departure dates.

The Frequency-Based Search for Public Transit [3] is carefully designed to find and take advantage of repetitive schedules while representing all one-off cases exactly. Comparing to the set-up from the original Transfer Patterns paper [1], the authors estimate a whopping 60x acceleration of finding transfer patterns from this part alone.

I am excited to see that the scalability questions originally left open by [1] have been answered so convincingly as part of this Focused Research Award. Please see the list of publications on the project’s website for more outcomes of this award. Besides more on transfer patterns, they contain a wealth of other results about routing on road networks, transit networks, and with combinations of travel modes.

References:

[1] Fast Routing in Very Large Public Transportation Networks Using Transfer Patterns
by H. Bast, E. Carlsson, A. Eigenwillig, R. Geisberger, C. Harrelson, V. Raychev and F. Viger
(ESA 2010). [doi]

[2] Scalable Transfer Patterns
by H. Bast, M. Hertel and S. Storandt (ALENEX 2016). [doi]

[3] Frequency-based Search for Public Transit
by H. Bast and S. Storandt (SIGSPATIAL 2014). [doi]

Google Research Awards: Fall 2015



We have just completed another round of the Google Research Awards, our annual open call for proposals on computer science and related topics including machine learning, speech recognition, natural language processing, and computational neuroscience. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google researchers and engineers.

This round we received 950 proposals, an increase of 18% over last round, covering 55 countries and over 350 universities. After expert reviews and committee discussions, we decided to fund 151 projects. This round we increased our support of machine learning projects increased by 71% from last round. Physical interfaces and immersive experiences, a relatively new area for the Google Research Awards, saw a 19% increase in the number of submitted proposals.

Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (deadline is October 15), please visit our website for more information. Please note that we are now moving to an annual cycle.