Off-Policy Estimation for Infinite-Horizon Reinforcement Learning
In conventional reinforcement learning (RL) settings, an agent interacts with an environment in an online fashion, meaning that it collects data from its interaction with the environment that is then used to inform changes to the policy governing its behavior. In contrast, offline RL refers to the setting where historical data are used to either learn good policies for acting in an environment, or to evaluate the performance of new policies. As RL is increasingly applied to crucial real-life problems like robotics and recommendation systems, evaluating new policies in the offline setting (estimating the expected reward of a target policy given historical data generated from actions that are based on a behavior policy) becomes more critical. However, despite its importance, evaluating a target policy from data generated by a behavior policy is difficult, both because high-fidelity simulators are hard to build and because the distribution of the historical data differs from the distribution the target policy would induce.
Agent-environment interaction in reinforcement learning. At each step, an agent takes an action based on a policy, receives a reward and makes a transition to a new state.
In “Black-Box Off-Policy Estimation for Infinite-Horizon Reinforcement Learning”, accepted at ICLR 2020, we propose a new approach to evaluate a given policy from offline data based on estimating the expected reward of the target policy as a weighted average of rewards in off-policy data. Since meaningful weights for the off-policy data are not known a priori, we propose a novel way of learning them. Unlike most previous work, our method is particularly suitable when the historical data consist of very long trajectories or have infinite horizons. We empirically demonstrate the effectiveness of this approach using a number of classical control benchmarks.
Background
In general, one approach to solve the off-policy evaluation problem is to build a simulator that mimics the interaction of the agent with the environment, and then evaluate the target policy against the simulation. While the idea is natural, building a high-fidelity simulator for many domains can be extremely challenging, particularly those that involve human interactions.
An alternative approach is to use the weighted average of rewards from the off-policy data as an estimate of the average reward of the target policy. This approach can be more robust than using a simulator as it does not require modeling assumptions about real world dynamics. Indeed, most previous efforts using this approach have found success on short-horizon problems where the number of time steps (i.e., the length of data trajectory) is limited. However, as the horizon is extended, the variance in predictions made by most of the previous estimators often grows exponentially, necessitating novel solutions for long-horizon problems, and even more so in the extreme case of the infinite-horizon problem.
Our Approach for Infinite-Horizon RL
Our method of off-policy evaluation (OPE) leverages a well-known statistical technique called importance sampling, through which one can estimate the properties of a particular distribution (e.g., the mean) from samples generated by another distribution. In particular, we estimate the long-term average reward of the target policy using the weighted average of rewards from the behavior policy data. The difficulty in this approach is how to choose the weights so that they correct for the mismatch between the off-policy data distribution and that of the target policy while achieving the best estimate of the target policy’s average reward.
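To make the idea of importance sampling concrete, here is a minimal sketch on a one-step problem with two actions. The policies, the logged data, and the function names are all hypothetical and chosen only for illustration; this is the textbook self-normalized estimator, not the estimator proposed in the paper.

```python
def naive_mean(samples):
    """Plain average of logged rewards; ignores the policy mismatch."""
    return sum(reward for _, reward in samples) / len(samples)


def importance_weighted_mean(samples, target_prob, behavior_prob):
    """Self-normalized importance sampling estimate of the target policy's mean reward."""
    # Each sample is reweighted by how much more (or less) likely the
    # target policy is to take the logged action than the behavior policy.
    weights = [target_prob[action] / behavior_prob[action] for action, _ in samples]
    weighted_sum = sum(w * reward for w, (_, reward) in zip(weights, samples))
    return weighted_sum / sum(weights)


# Hypothetical action probabilities for the two policies.
BEHAVIOR = {"a": 0.8, "b": 0.2}
TARGET = {"a": 0.3, "b": 0.7}

# Hypothetical logged (action, reward) pairs collected under the behavior policy.
LOGGED = [("a", 1.0)] * 8 + [("b", 2.0)] * 2

print(naive_mean(LOGGED))
print(importance_weighted_mean(LOGGED, TARGET, BEHAVIOR))
```

On this toy data the naive average comes out near 1.2, while the weighted average comes out near 1.7, which is the mean reward the target policy would actually obtain (0.3 × 1.0 + 0.7 × 2.0).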
One important point is that if the weights are normalized to be positive and sum up to one, then they define a probability distribution over the set of possible states and actions of the agent. On the other hand, an individual policy defines a distribution on how often an agent visits a particular state or performs a particular action. In other words, it defines a unique distribution on states and actions. Under reasonable assumptions, this distribution does not change over time, and is called a stationary distribution. Since we are using importance sampling, we naturally want to optimize weights of the estimator such that the stationary distribution of the target policy matches the distribution induced by the weights of our estimator. However, the problem remains that we do not know the stationary distribution of the target policy, since we do not have any data generated by that policy.
One way to overcome this problem is to make sure that the distribution of weights satisfies properties that the target policy distribution has, without actually knowing what this distribution is. Luckily, we can take advantage of some mathematical "trickery" to solve this. While the full details are found in our paper, the upshot is that while we do not know the stationary distribution of the target policy (since we have no data collected from it) we can determine that distribution by solving an optimization problem involving a backward operator, which describes how an agent transitions from other states and actions to a particular state and action using probability distributions as both input and output. Once we are done, the weighted average of rewards from historic data gives us an estimate of the expected reward of the target policy.
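The fixed-point property behind this idea can be sketched on a tiny example. The sketch below assumes a hypothetical three-state chain whose transition probabilities are known, which is exactly what is not available in the off-policy setting; it only illustrates that a stationary distribution is left unchanged by the backward operator, and that weights equal to the ratio of the two stationary distributions recover the target policy's average reward. It does not implement the optimization described in the paper.

```python
def apply_backward(dist, transition):
    """Push a state distribution one step forward in time:
    new_dist[s_next] = sum over s of dist[s] * P(s_next | s)."""
    n = len(dist)
    return [sum(dist[s] * transition[s][s_next] for s in range(n))
            for s_next in range(n)]


def stationary_distribution(transition, iters=200):
    """Repeatedly apply the operator until the distribution stops changing."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = apply_backward(dist, transition)
    return dist


# Hypothetical state-to-state transition matrices induced by each policy.
P_TARGET = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
P_BEHAVIOR = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
REWARD = [0.0, 1.0, 4.0]  # hypothetical reward received in each state

d_target = stationary_distribution(P_TARGET)
d_behavior = stationary_distribution(P_BEHAVIOR)

# Ideal weights: ratio of the two stationary distributions.
weights = [t / b for t, b in zip(d_target, d_behavior)]

# Weighted average of rewards under the behavior distribution ...
weighted_estimate = sum(b * w * r for b, w, r in zip(d_behavior, weights, REWARD))
# ... agrees with the average reward under the target distribution.
target_average = sum(t * r for t, r in zip(d_target, REWARD))

print(d_target, weighted_estimate, target_average)
```

In the actual method the transition matrices are unknown, so the weights have to be learned from logged transitions by requiring that the weighted data distribution satisfy this same fixed-point condition.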
Experimental Results
Using a toy environment called ModelWin that has three states and two actions, we compare our work with a previous state-of-the-art approach (labeled “IPS”), along with a naive method in which we simply average rewards from the behavior policy data. The figure below shows the log of the root-mean-square error (RMSE) with respect to the target policy reward as we change the number of steps collected by the behavior policy. The naive method suffers from a large bias and its error does not change even with more data collected by increasing the length of the episode. The estimation error of the IPS method decreases with increasing horizon length. On the other hand, the error exhibited by our method is small, even for short horizon length.
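For readers who want to reproduce this kind of error metric, the following is a minimal sketch of a log-RMSE computation over repeated runs. The function name and the choice of the natural logarithm are assumptions; the post does not state which base was used.

```python
import math


def log_rmse(estimates, true_value):
    """Log of the root-mean-square error of repeated estimates of one true value.

    Uses the natural logarithm (an assumption). Raises ValueError if every
    estimate is exactly correct, since log(0) is undefined.
    """
    squared_errors = [(estimate - true_value) ** 2 for estimate in estimates]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return math.log(rmse)


# Hypothetical estimates from two runs, each off by 1.0 from the true reward.
print(log_rmse([1.0, 3.0], 2.0))
```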
CartPole, Pendulum, and MountainCar.
Comparison of different methods on three environments: CartPole, Pendulum, and MountainCar. The left column shows the environments. The right column shows the log of the RMSE with respect to the target policy reward as the number of trajectories collected by the behavior policy changes. The results are based on 50 runs.
Acknowledgements.
Special thanks to Qiang Liu and Denny Zhou for contributing to this project.
Source: Google AI Blog
Why Ana Corrales loves ending meetings early
Many people remember a time in their young lives when they loved or at least knew about scrunchies. The colorful hair accessories were a staple both on wrists and around ponytails. While Ana Corrales loved them, too, she took things a step further. “When I was 15, I ran a little business selling scrunchies,” she says. “I think that was the first time where I was like, oh wow, I really love this stuff!” Creating products and organizing her business felt like “freedom” to Ana. “I knew I was really happy when I was in that environment.”
Ana continues to find that same happiness today at Google, where she is Chief Operating Officer for consumer hardware, managing the detail-oriented process of developing and delivering products like Pixel 4 phones and Nest Minis as efficiently as possible. More recently, she is also supporting many of Google’s community efforts in response to COVID-19.
At Google, Ana is in charge of not only managing numerous large-scale projects simultaneously, but also organizing her time as well as her team, who work in different offices around the world and have transitioned to working from home. Her team manages thousands of people all over the globe, and oversees an entire portfolio of products at once. All of this means she has to master her inbox, which can fill up with hundreds of new emails overnight. Since business never sleeps, here’s how she keeps her day-to-day life organized regardless of whether she’s in the Google office or at home.
Expect the unexpected.
Ana starts each day with a carefully curated calendar of meetings to attend with her team. But she knows it will end up a lot different than it looks. “In my typical day, I am 100 percent sure things will never happen the way they are scheduled,” Ana says. “I am not exaggerating, I don’t think there’s been a day in three years when it went as planned!” Because her schedule frequently gets interrupted, she’s learned to go with the flow and adapt to whatever last-minute issue comes up.
But when there’s something important that can’t be moved (say, her child’s birthday party at school), that’s when she doesn’t budge. “You have to be really disciplined, because otherwise your calendar ends up running you rather than you running it,” she says. This can be especially true when working from home, where the separation between work and life can easily blur. “It’s important to really prioritize and create windows you can dedicate to each area, you need to honor your boundaries.”
Be strict about meeting times.
If you’re in a meeting with Ana, and it starts at noon, expect it to start precisely at noon, not 12:05, or 12:06, or whenever the last person sits down. And if you get things done efficiently in that meeting, expect the team to get out early, rather than fill up the remaining time with other topics. That’s because a few minutes of free time can be crucial during the workday. “Everyone gets to walk slower to their next meeting, or breathe, or get a coffee. Or, if you're working from home, get in a quick workout or take the dog out,” Ana says. She also loves a quick 15-minute, one-on-one walk to chat (more recently done via phone due to COVID-19), which can accomplish more than you may think.
Get ruthless with your inbox.
When Ana wakes up each morning, she’s greeted with hundreds of new emails in her inbox. And those keep coming throughout the day and into the night given the global nature of the team. Since she’s in meetings all day, there’s no way she can read every incoming message, let alone respond to it. So she’s not afraid to hit delete. “I try to extinguish email as much as possible,” she says.
When a big product launch is coming up and she’s busier than usual, she has to prioritize, and won’t check emails that are related to different topics. “If it’s launch mode and I know it’s not related to launch, it’s out of my zone,” she says. There are some exceptions, including work related to her role as executive sponsor of HOLA, Google’s employee resource group for the Latino community and allies, as well as her work as a board member of Women@Google, a global network committed to empowering all women at Google.
Even during quieter times of the year, she still makes tackling inbox clutter a priority. She will rely on other members of her team to respond to an email, especially if it’s a topic more related to their expertise, and she’ll urge colleagues to not copy her on emails unless it’s necessary. And when she comes back from vacation, she deletes any email that's not urgent (in Ana's case, with help from an administrative assistant).
Take time away from your phone.
When Ana’s work day is over (whether she’s at the office, or more recently, in her home office), she jumps into a packed evening at home with her family. But she makes sure to take some time away from her busy schedule, whether it’s taking a walk with her husband or going swimming. “When I swim for 30 minutes, it’s great, because you can’t have your phone anywhere near you when you’re swimming,” she says. “I think that quietness really helps me.”
The Takeaway:
Start your meetings and video calls on time, always. But don’t be afraid to end your meetings early.
Getting too much irrelevant email? Just ask to be taken off the list.
Work still on your mind when you get home? Put your phone down and go for a swim—or a walk.
Source: The Official Google Blog
Check in on emotional well-being during distance learning
Without the consistent routine of the school day, reliable Wi-Fi or even a quiet place to work, students may be struggling to adjust to learning outside the classroom. While nothing can replace teachers’ in-person interactions with their students, the same digital tools used for teaching can be used to check in on a student’s well-being. Educators have shared creative ideas to check in on students through video calls, phone calls, or drive-by visits, which inspired these four ways to use technology to emotionally support your students.
Create emotional check-in opportunities
Even though distance learning tools allow teachers to see and talk to students, they can miss out on observing body language and behavior that indicate when a student needs help. Use Google Forms to reach out to students with an emotional-health questionnaire. Keep the question list short with a few high-level questions: How do you feel today? Why do you feel this way? What is your goal for today?
Provide a space for students to write longer responses. The responses, which are collected in a Google Sheet, can guide teachers in identifying which students need more support. Encouraging students to write out their emotions is a good tool for allowing students to become self-aware about what they’re feeling and why.
Organization is key to emotional well-being for students
Teachers are always helping students to keep classwork organized, follow schedules, and complete tasks. These self-management skills are even more critical to students’ emotional well-being now that they’re studying from home. Encourage students to create to-do lists in Google Docs, checklists in Google Keep or to schedule deadlines and meetings in Google Calendar. Teachers can use Google Classroom to send wellness reminders to students—everything from a quick message like “Take a quick movement break!” to sharing a mindfulness activity from YouTube.
Inspire students to express themselves
Teachers can often get a sense of a student’s emotional well-being from a quick conversation after the bell rings or checking in at the end of a challenging week. With distance learning, it’s just as important to create a comfortable and safe space for students to express emotions. Using Blogger, teachers can ask students to create a “reflection journal,” and write about their distance learning experiences. They can also invite students to record vlogs (video blog posts) and insert them into a shared deck in Google Slides. This way students can not only share their thoughts, but also see one another, hear each other’s voices, and comment on messages. It’s important to note that neither YouTube nor Blogger is a core service of G Suite for Education; both are additional services that can be enabled by your administrator.
Additional tools on Chromebooks
There are many apps that work well on Chromebooks and are integrated with Google for Education tools that were specifically developed to support social emotional learning. With ClassDojo, students can share their daily learning on a digital portfolio, teachers can give students feedback aligned with school or classroom values, and families are brought into the classroom experience through teacher posted photos and videos. Classcraft blends students’ physical and virtual learning and reframes their progress in school as a game they play together throughout the year. And Wisdom - Kingdom of Anger empowers pre-K to 2nd grade students to practice social emotional learning through weekly lessons and hands-on activities. Students learn to identify, label, and communicate emotions to develop effective coping tools to healthily manage emotions.
More ideas for social emotional learning
If you’re looking for more ideas for improving student SEL, visit the website for CASEL (Collaborative for Academic, Social, and Emotional Learning), which offers resources such as webinars. You can watch our webinar on SEL, and visit Teach from Home, Google’s new hub of information and tools to help teachers during COVID-19.
Source: The Official Google Blog
Family and history inspired this Googler’s photo series
Editor’s Note: Welcome to Passion Projects, a series where we highlight Googlers’ unexpected, fascinating and often inspiring interests outside of the office. In our latest installment, we’re focusing on a recent project Sarah Torney, a Googler from the Chrome Enterprise product marketing team, put together during her time sheltering at home in San Francisco. Over to Sarah...
I’m a fifth-generation San Franciscan and fourteenth-generation American. Recently, to fill my time as I shelter in place, I’ve been sifting through old family photos. I discovered a series of photos my great-grandfather took in the days after the 1906 San Francisco earthquake. My great-grandfather, Edward “Ned” Johnston Torney Sr., hit the streets with his camera to document the devastation caused by fires following “The Great Quake” on April 18, 1906. He was able to continue shooting for days after, documenting the path of destruction caused by the fires.
After sharing my great-grandfather’s photos with close friends during a virtual happy hour, an idea hit me: I decided to create a “then and now” photo series, heading out to the same locations and street corners my great-grandfather had photographed (all while following social distancing guidelines, of course). Not only has San Francisco's shelter in place emptied the streets of many cars and people, similar to the impact of the fires, but the timing is also significant. It’s been 114 years to the month since the 1906 earthquake.

Present-day photo of Market at 6th facing west; a cable car on the streets of San Francisco in 1906.

Geary at Powell facing east towards the Palace Hotel on Market, in 1906 and 2020.

Hibernia Bank, Jones at McAllister, facing north, in 1906 and 2020.
Some things have clearly changed: New, modern buildings have replaced many of the ones that stood in the early twentieth century. Our methods are different, too; to document the crisis of 1906, my great-grandfather used the trendiest equipment available at the time, a Kodak “Premo” camera. I recreated his photos with my camera of choice, Pixel 3a. It’s interesting to see the aftermath of the 1906 earthquake juxtaposed against the ghost town that is downtown San Francisco amid the COVID-19 crisis today.
Working on this project has been a fascinating history lesson on San Francisco—and better yet, it’s surfaced family photos and stories that I will be able to remember and share for generations.
Special thanks and gratitude to Warren Finke, Richard Torney and Eric Torney for photo preservation and publishing permission.

Sarah Torney and her great-grandfather, Edward “Ned” Johnston Torney Sr.
Source: The Official Google Blog
Developer tools to debug WebView in Beta
Since 2014, Android WebView has paved the way as an updateable system component, delivering stability and performance improvements, modern web platform features, and security patches to Android apps and users. However, updates can be a double-edged sword: as much as we strive for stability and backward compatibility, new crashes and breaking changes occasionally slip through. To solve these issues faster, today we're announcing WebView DevTools, a new set of on-device debugging tools to diagnose WebView-caused crashes and misbehaving web platform features.
For your convenience, WebView DevTools comes included as part of WebView itself. The easiest way to launch WebView DevTools is to try out WebView Beta. WebView's beta program is a way for app developers to get WebView updates several weeks before they reach users, for extra lead time to report compatibility bugs to our team. Starting with today's release (M83), WebView Beta includes a launcher icon for WebView DevTools. Just look for the blue and gray WebView gear icon to get started debugging WebView in your app.
Inspecting a crash in WebView DevTools.
No software is bug-free and loading web content can be challenging, so it's no surprise WebView crashes are a pain point for apps. Worse yet, these crashes are difficult to debug because WebView's Java and C++ stack traces are obfuscated (to minimize APK size for Android users). To help make these crashes more actionable, we're exposing first-class access to WebView's built-in crash reporter. Just open WebView DevTools, tap on "crashes," and you'll see a list of recent WebView-caused crashes from apps on your device. You can use this tool to see if the crash report has been uploaded to our servers, force-upload it if necessary, and subsequently file a bug. This ensures our team has all the information we need to swiftly resolve these crashes and ensure a smoother user experience in your app.
Using flags to highlight WebView usage in Android apps.
However, not all bugs cause crashes. A handful of past WebView releases have broken Android apps due to behavior changes caused by new features. While our team's policy is to roll back features which break compatibility, the Chromium team launches several features for WebView in each release, and we often need time to identify the offending feature. WebView DevTools can help here too. Inspired by Google Chrome's chrome://flags tool, which enables compatibility testing with web platform features, we're offering app developers similar controls for experimental features. To get started, open WebView DevTools, tap on "flags," enable or disable any available features, then kill and restart the WebView-based app you're testing. Using WebView DevTools will help us work together to pin down the culprit so we can roll it back. We've also included flags for features slated for upcoming releases, so you can test compatibility even earlier by enabling these features on your test device.
We hope you find WebView DevTools helpful for reporting crashes and testing against new WebView features. Install WebView Beta today to get started with WebView DevTools, and check out the user guide for more tips and tricks.
Source: Android Developers Blog
Helping our communities connect in a time of isolation
Finding ways to connect with our families, coworkers, classmates, and friends from a distance has become essential for most of us. Google Fiber is grateful to get to be a part of facilitating that connection for our customers, and we take that responsibility very seriously. We wanted to share how we’re dealing with the COVID-19 crisis as we continue to bring high-speed, high-bandwidth internet to our customers and to our communities to keep even more people connected.
Serving our customers
Internet connections have become the foundation on which we build all our other connections, from work and studies to information and entertainment. A reliable Internet connection with the speed and capacity to meet our ever-growing needs is no longer something that’s just nice to have — it’s a necessity.
That’s why Google Fiber is continuing construction, installations and network maintenance. While most of our team members are working from home, we've made numerous process and equipment changes to protect the health and safety of our field teams, whose jobs require them to be out in the community, connecting customers or maintaining our network.
These include, but are not limited to:
- Personal protective equipment for our field teams
- Regular handwashing and sanitizing
- Following social distancing practices
- Restrictions on certain types of construction methods
In addition to these enhanced safety measures to protect our customers and crews, we’re coordinating with local governments and engaging communities within each city we serve to make sure we're charting the right local approach.
For the most up-to-date information on our health and safety precautions, please visit our help center. And although our retail Fiber Spaces are closed, we are standing by 24/7 to help with anything you need to make your internet work for you, so please reach out to our team if you need anything.
Serving our communities
Advancing digital inclusion is a central tenet of Google Fiber. In each of our communities, we partner with local organizations doing great work to help build digital literacy and increase internet access for residents.
Over the past two years, Google Fiber has supported nearly 1 million digital literacy training hours, helped provide more than 10,000 free or affordable devices to residents in need, connected 275,000 people to STEM programs, and empowered 7,000 aspiring entrepreneurs with training programs. In fact, in the last two years, our partner organizations were able to reach over 1.3 million unique participants across the country.
But there is much more work to be done. COVID-19 has sharpened that need, drawing clear lines between the digital haves and have-nots. Google Fiber and Webpass are investing in efforts across each of our cities to help more people connect during this difficult time, supporting organizations to help them meet the enormous technology demands for students and workers.
In several cities, including Austin, San Antonio, San Francisco, Irvine, Provo, Salt Lake City, and Chicago, we’ve partnered with the local public school district or their foundation to help students and their families as they adjust to schooling from home — targeting those families most impacted by the digital divide. In other places, we’ve also funded the efforts of incredible organizations to better serve their communities’ increased needs and help provide devices and hot spots to their clients:
- Connecting for Good in Kansas City
- PCs for People in Denver
- Craft Lake City in Salt Lake City
- Nashville Public Library Foundation
- United Way of Utah County in Provo
- Boys and Girls Club of North Alabama in Huntsville
- Latinitas, E4 Youth, and AVANCE in Austin
We don’t know what’s going to happen next. Things are changing on a daily basis, and, like all of you, we’re working to meet the challenges and opportunities of this new normal. We do know that what you need and want from your internet — speed, reliability, great customer service — isn’t changing. We want to help you with that goal, both to help meet today’s challenges and to help take advantage of the opportunities we hope tomorrow presents.
Source: Google Fiber Blog
Announcing the 2020 first quarter Google Open Source Peer Bonus winners
The Google Open Source Peer Bonus rewards external open source contributors nominated by Googlers for their exceptional contributions to open source. Historically, the program was primarily focused on rewarding developers. Over the years the program has evolved—rewarding not just software engineers but all types of contributors—including technical writers, user experience and graphic designers, community managers and marketers, mentors and educators, ops and security experts.
In support of diversity, equity and inclusion initiatives worldwide, we decided to devote this cycle to amazing women in open source, especially since it coincided with celebrating International Women’s Day on March 8. We are very excited and pleased to share the following statistics with you.
We have 56 winners this cycle representing 17 countries all over the world: Australia, Belgium, Canada, Estonia, France, Germany, India, Italy, Japan, Republic of Korea, Netherlands, Russia, Sweden, Switzerland, Ukraine, United Kingdom, and the United States.
Even though the cycle was open to ALL contributors, the number of female nominees went up from 8% to 25% in comparison to the previous cycle. That’s an amazing number celebrating amazing women!
Also, we are very pleased to see the number of docs contributors increase from 7% to 15%. Documentation is the #1 factor for project adoption, so this shift is very important and encouraging. To strengthen this trend and emphasize the importance of documentation in open source, the next cycle will be devoted (but not limited!) to docs contributors.
Below is the list of current winners who gave us permission to thank them publicly:
By Maria Tabak, Google Open Source
Source: Google Open Source Blog
Two weeks of Doodles to thank essential workers
Essential workers keeping our world safe and running during this global pandemic deserve a standing ovation. To show our appreciation, we created a two-week Google Doodle series to honor and recognize all who have stepped up in unprecedented ways, including putting their own lives at risk, to provide services that keep our society moving forward.
Google Doodles usually take months (sometimes years!) of planning and development. This one came together in a matter of days. Though in the past we’ve moved quickly to create a Doodle in response to a major world—or even outer world—event, this is our first real-time Doodle series focusing on one theme.
In this series of animated GIF Doodles, the big "G" represents communities around the world sending our love to the other letters, which represent the essential workers. Fun fact: we purposely used the first and last letters of our logo to ensure characters in every Doodle were practicing social distancing.
Beyond the efforts of essential workers, “help” has become more than a concept, a desire or an unusual action. Help has become part of our day-to-day lives. We notice it in small actions—like going to the supermarket for your elderly neighbor, or donating homemade face masks to healthcare workers—and in what people are searching for around the world. One thing has become clear: people want to help.
As with all of our Doodles, we hope the series allows for helpers everywhere to feel seen, heard, and valued and for everyone to remember there will be a light at the end of what could feel like a long tunnel. Because where there’s help, there’s hope.