What’s next for Google Fiber

If you’ve been following Google Fiber, you know we’ve been pretty busy lately.

We’ve been steadily building out our network in all of our cities and surrounding regions, from North Carolina to Utah. We’re connecting customers in West Des Moines, making Iowa our first new state in five years, and will soon start construction in neighboring Des Moines. And of course, we recently announced that we’ll build a network in Mesa, Arizona.

And that's just the stuff we've been talking about. For the past several years, we’ve been even busier behind the scenes, focusing on our vision of providing the best possible gigabit internet service to our customers through relentless refinements to our service delivery and products.

That has never been more important in Google Fiber’s history than it is today. The world has finally caught up to the idea that high-speed, reliable internet at gigabit speeds is no longer a bold ambition or a “nice to have.” The experience of the last couple of years has certainly taught us that.

As communities across the country look to expand access to gigabit internet, I’m happy to say that we’re ready to grow alongside them. Our team has spent many months traveling across the country, having conversations with cities looking for the best way to get better internet to their residents and business owners as quickly as possible.

So, yeah, it’s about to get even busier at Google Fiber. We’re talking to city leaders in the following states, with the objective of bringing Google Fiber’s fiber-to-the-home service to their communities:

These states will be the main focus for our growth for the next several years, along with continued expansion in our current metro areas.

In addition, we'd also love to talk to communities that want to build their own fiber networks. We’ve seen this model work effectively in Huntsville and in West Des Moines, and we’ll continue to look for ways to support similar efforts.

We're thrilled to be expanding our geographic reach once again — bringing better internet to more people in more places. Stay tuned in the coming months as we fill in this picture with more details about our new cities, even faster speeds and redefined customer service.

Posted by Dinni Jain, CEO




El Carro for Oracle: Data migration and improved backups

In May 2021, we released El Carro to make it easier to run Oracle databases on Kubernetes. A follow-up blog post dove deeper into El Carro’s features, announced support for Oracle 19c, and detailed more flexibility for building database images. Today we’re excited to open source two new features that make it easier to manage your Oracle deployments with El Carro: Data Migration and Point-in-Time Recovery. Automated Data Migration makes it much easier to re-platform to El Carro, and Point-in-Time Recovery is a standard feature that database professionals have come to expect because it drastically reduces your recovery point objective (RPO) and lets you worry less about backup frequency.

Data Migration

The Data Migration feature of El Carro enables users to migrate data from their existing database to an El Carro database running on Kubernetes, allowing them to re-platform to El Carro with minimal disruption. The two most common pathways, shown in the image below, are 1) modernizing in place by migrating your database to Kubernetes so you can leverage El Carro’s automation, and 2) migrating to Kubernetes in the cloud.
Migrate your database in place or to the cloud

Typical migration sources include AWS (RDS, EKS, EC2), Azure (AKS, VMs), GCP (GKE, GCE, BMS) or on-premises deployments. Typical migration targets include any Kubernetes installation on GCP (GKE), AWS (EKS), Azure (AKS), or on-premises.

The Data Migration feature offers two automated migration flows and two manual ones.

| Category  | Options                          | Migration Downtime | Complexity |
|-----------|----------------------------------|--------------------|------------|
| Automated | Data Guard with physical standby | minimal            | lowest     |
| Automated | Data pump                        | long               | low        |
| Manual    | RMAN-based migration             | long               | medium     |
| Manual    | Data pump                        | long               | medium     |

  • Migration Downtime: the downtime required to migrate the source database into El Carro without data loss. Minimal means less downtime; long means more downtime.
  • Complexity: summarizes the difficulty and complexity of the migration journey.
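
For the manual Data Pump path, the flow is roughly: export the schema from the source database, copy the dump file to the target, and import it into the El Carro-managed database. The sketch below illustrates that flow with Python driving the standard expdp/impdp tools and kubectl; the pod name, connect strings, schema, and paths are placeholders for illustration, not El Carro-specific tooling.

```python
# Rough sketch of a manual Data Pump migration into an El Carro-managed pod.
# Pod name, connect strings, schema, and paths are placeholders; adapt them
# to your environment and consult the El Carro and Data Pump documentation.
import subprocess

SOURCE_CONNECT = "system/password@source-db"  # placeholder source connect string
TARGET_POD = "mydb-sts-0"                     # placeholder El Carro database pod
SCHEMA = "HR"                                 # schema being migrated
DUMPFILE = "hr.dmp"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1) Export the schema from the source database with Data Pump.
run(["expdp", SOURCE_CONNECT,
     f"schemas={SCHEMA}", "directory=DATA_PUMP_DIR",
     f"dumpfile={DUMPFILE}", "logfile=exp.log"])

# 2) Copy the dump file into the target pod (assumes the DATA_PUMP_DIR
#    directory object maps to /tmp inside the pod; adjust as needed).
run(["kubectl", "cp", DUMPFILE, f"{TARGET_POD}:/tmp/{DUMPFILE}"])

# 3) Import the schema inside the target database pod.
run(["kubectl", "exec", TARGET_POD, "--",
     "impdp", "system/password",
     f"schemas={SCHEMA}", "directory=DATA_PUMP_DIR",
     f"dumpfile={DUMPFILE}", "logfile=imp.log"])
```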

Point-in-Time Recovery

Since its release, El Carro has given users the ability to take backups and restore via RMAN or storage-based snapshots. Today we’re excited to release a new Point-in-Time Recovery feature that enhances El Carro’s backup functionality by automatically backing up archived redo logs to a GCS (Google Cloud Storage) bucket and allowing users to seamlessly restore their databases to any point in time within a user-configurable window. This optional feature provides an additional layer of protection and enhanced restore granularity without interfering with manual backups or affecting database performance.
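
For illustration only, a point-in-time restore would be requested by creating a restore resource with a target timestamp; below is a minimal sketch using the Kubernetes Python client. The API group, version, kind, and spec fields are hypothetical stand-ins rather than El Carro’s exact CRD schema, so check the project’s samples for the real resource definitions.

```python
# Minimal sketch: request a point-in-time restore by creating a custom
# resource with a target timestamp. The group/version/kind and spec fields
# are hypothetical placeholders, not El Carro's exact CRD schema.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

restore_request = {
    "apiVersion": "oracle.example.com/v1alpha1",  # placeholder group/version
    "kind": "Restore",                            # placeholder kind
    "metadata": {"name": "restore-to-noon", "namespace": "db"},
    "spec": {
        "instance": "mydb",
        # Any timestamp inside the user-configurable recovery window.
        "pointInTime": "2022-08-10T12:00:00Z",
    },
}

api.create_namespaced_custom_object(
    group="oracle.example.com",
    version="v1alpha1",
    namespace="db",
    plural="restores",
    body=restore_request,
)
```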

The diagram below contrasts the new versus old functionality. Previously, there were discrete restore points (shown in green on the top arrow) which represented limited opportunities to restore. With Point-in-Time Recovery, the entire arrow is green, meaning the recovery functionality is continuous, with restore points at any time along the green arrow.

With Point-in-Time Recovery you can restore to any point after the first backup

Conclusion

As always, you can try the open source El Carro operator for free (Apache 2.0 license) wherever you run Oracle databases. Follow the quick start guide and try out provisioning instances, databases, and users. Import data via Data Pump, manage instance parameters, choose between different methods for backups, and try out a restore. Have a look at how we integrate with external logging and monitoring solutions. Reach out via our Google group and leave feedback on which features you would like to see next, or even create your own patch, issue, or pull request on GitHub.

By Kyle Meggs, Product Manager and Ash Gbadamassi, Software Engineer – Cloud Databases

Beta Channel Update for ChromeOS

 The Beta channel is being updated to 105.0.5195.24 (Platform version: 14989.36.0) for most ChromeOS devices.

If you find new issues, please let us know in one of the following ways:

  1. File a bug
  2. Visit our ChromeOS communities:
    1. General: Chromebook Help Community
    2. Beta Specific: ChromeOS Beta Help Community
  3. Report an issue or send feedback on Chrome

Interested in switching channels? Find out how.

Matt Nelson,

Google ChromeOS

Stronger protection for sensitive Google Workspace account actions

What’s changing 

We’re introducing stronger safeguards for sensitive actions taken in your Google Workspace account. These apply to actions that, when performed by hijackers, can have far-reaching consequences for the account owner or the organization it belongs to.


Google will evaluate the session attempting the action, and if it’s deemed risky, it will be challenged with a “Verify it’s you” prompt. Through a second, trusted factor, such as a 2-Step Verification code, users can confirm the validity of the action. For example, if a malicious actor gains access to your account and attempts to change the name on your account, the action will be blocked until the true account owner can verify that it was intentional.


Note that this feature only supports users that use Google as their identity provider and actions taken within Google products. SAML users are not supported at this time. See below for more information. 



Who’s impacted 

Admins and end users 


Why it matters 

This added layer of security helps to intercept bad actors who have gained access to a user’s account, further protecting their data and your organization’s sensitive information. Additionally, these challenge attempts will be logged as audit events, allowing for further admin investigation.

Additional details 

In the Admin console under Users > “UserName” > Security, admins can toggle login challenges OFF for ten minutes if a user gets stuck behind a “Verify it’s you” prompt. We strongly recommend using this option only if contact with the user is credibly established, for example via a video call.

Getting started 


Rollout pace 


Availability 

  • Available to all Google Workspace customers, as well as legacy G Suite Basic and Business customers 

Resources 

Improving data privacy with Client-side encryption for Google Meet

What’s changing 

We are adding Workspace Client-side encryption to Google Meet, giving customers increased control over their data. Meet already encrypts all of your data at rest and in transit between our facilities — client-side encryption gives users direct control of their encryption keys and the identity service that they choose to authenticate for those keys. Additionally, this guarantees that Google cannot access audio and video content under any circumstances and helps you meet regulatory compliance in many regions. 


Bringing Client-side encryption to Meet is another significant milestone in Google Workspace’s industry-leading encryption work, offering our users the highest degree of protection and control over their data. 


Workspace Client-side encryption for Meet will be available first on the web, with support for meeting rooms and mobile devices coming later. 


Important note: At this stage, only participants within your Workspace organization can be invited to client-side encrypted calls — guest access will be introduced in the future. 


Why it’s important 

Client-side encryption uses keys supplied by the customer to add another layer of encryption to video and audio, on top of the default encryption that Google Meet provides. This makes the media indecipherable even to Google and is intended for calls that need an extra level of confidentiality, such as calls regarding sensitive intellectual property or calls that must meet compliance requirements in highly regulated industries.


Additional details 

Notes about using client-side encryption: 
  • The organizer needs to join for the call to start when client-side encryption is turned on. If participants join early, they will need to wait for the organizer to join before communicating with others. 
  • Some functions that require server-side processing or parsing of call media will not work, e.g. cloud-based noise cancellation or closed captions. 
  • Client-side encryption does not support dialing in or out.

Getting started 

  • Admins: An administrator needs to configure how Meet connects to a key service and identity provider before turning on client-side encryption for users. Learn more about configuring client-side encryption here.
  • End users: 
    • Organizing calls: 
      • In a calendar event with Meet video conferencing, navigate to Settings (cog-wheel icon) > Security and select “Add encryption”
      • Note: All participants must be invited to the call, either via the Calendar event or within the meeting. 
    • Participating in calls: 
      • Client-side encrypted meetings will start once the meeting organizer arrives — there are no other restrictions or changes for meeting participants. 

Rollout pace 

  • Users on supported Google Workspace editions can create Client-side encrypted calls. 

Availability 

  • Available to Google Workspace Enterprise Plus, Education Standard, and Education Plus customers hosting client-side encrypted calls 
  • Not available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Education Fundamentals, The Teaching and Learning Upgrade, Frontline, and Nonprofits, as well as legacy G Suite Basic and Business customers 

Resources 

Bringing our first cloud region to Aotearoa New Zealand

Image: Google New Zealand's office in Auckland


We have now marked 15 years on the ground in New Zealand, and we’re continuing to increase our commitment to helping businesses, communities and educators prepare for a digital future. We know that cloud has a huge role to play in helping our customers harness the full potential of digital transformation, and today, I’m pleased to share that we’re building on our infrastructure investments by bringing our very first cloud region to New Zealand. 


With this region, our Kiwi customers will benefit from high-performance, low-latency products and services, as well as three zones to help protect against service disruptions. Importantly, it’ll give our customers the choice to keep their data onshore and retain data sovereignty if they wish.


Last year, we announced a Dedicated Cloud Interconnect in Auckland to support local customers. Collectively, the Dedicated Interconnect locations and the cloud region will deliver geographically distributed, secure infrastructure to customers across the country, which we know is especially important for those in regulated industries such as financial services and the public sector.


We’re looking forward to partnering even more deeply with our customers, and delivering on our unique ability to bring enterprise and consumer ecosystems closer together across Search, YouTube and Cloud to deliver more powerful customer experiences, quickly and securely.


What our Kiwi customers are saying 


Kiwi businesses from a range of industries are choosing Google Cloud as their trusted innovation partner, and we’ll continue to work with our customers to ensure the new cloud region can meet their evolving needs.


“Kami was born out of the digital native era, where in order to scale globally we needed a partner like Google Cloud who could support us on our ongoing innovation journey. We have since delivered an engaging and dependable experience for millions of teachers and students around the world, so it’s incredibly exciting to hear about the new region coming to New Zealand. This investment from Google Cloud will enable us to deliver services with lower latency to our Kiwi users, which will further elevate and optimise our free premium offering to all New Zealand schools,” said Jordan Thoms, Chief Technology Officer at Kami. 


“Our customers are at the heart of our business, and helping Kiwis find what they are looking for, faster than ever before, is our key priority. Our collaboration with Google Cloud has been pivotal in ensuring the stability and resilience of our infrastructure, allowing us to deliver world-class experiences to the 650,000 Kiwis who visit our site every day. We welcome Google Cloud’s investment in New Zealand, and are looking forward to more opportunities to partner closely on our technology transformation journey,” said Anders Skoe, CEO at Trade Me.


“Digital transformation plays a key role in helping Vodafone deliver better customer experiences and connect all Kiwis. We welcome Google Cloud’s investment in New Zealand and look forward to working together to offer more enriched experiences for local businesses, and the communities we serve,” said Jason Paris, CEO at Vodafone New Zealand.


The New Zealand cloud region will be Google Cloud’s third region in Australasia, joining Sydney and Melbourne. Google Cloud already has 11 cloud regions in operation in JAPAC. You can find out more about our global cloud infrastructure, including new and upcoming regions, here.


Efficient Video-Text Learning with Iterative Co-tokenization

Video is a ubiquitous source of media content that touches on many aspects of people’s day-to-day lives. Increasingly, real-world video applications, such as video captioning, video content analysis, and video question-answering (VideoQA), rely on models that can connect video content with text or natural language. VideoQA is particularly challenging, however, as it requires grasping both semantic information, such as objects in a scene, and temporal information, e.g., how things move and interact, both of which must be taken in the context of a natural-language question that holds specific intent. In addition, because videos have many frames, processing all of them to learn spatio-temporal information can be computationally expensive. Nonetheless, understanding all this information enables models to answer complex questions — for example, in the video below, a question about the second ingredient poured in the bowl requires identifying objects (the ingredients), actions (pouring), and temporal ordering (second).

An example input question for the VideoQA task “What is the second ingredient poured into the bowl?” which requires deeper understanding of both the visual and text inputs. The video is an example from the 50 Salads dataset, used under the Creative Commons license.

To address this, in “Video Question Answering with Iterative Video-Text Co-Tokenization”, we introduce a new approach to video-text learning called iterative co-tokenization, which is able to efficiently fuse spatial, temporal and language information for VideoQA. This approach is multi-stream, processing different scale videos with independent backbone models for each to produce video representations that capture different features, e.g., those of high spatial resolution or long temporal durations. The model then applies the co-tokenization module to learn efficient representations from fusing the video streams with the text. This model is highly efficient, using only 67 giga-FLOPs (GFLOPs), which is at least 50% fewer than previous approaches, while giving better performance than alternative state-of-the-art models.

Video-Text Iterative Co-tokenization
The main goal of the model is to produce features from both videos and text (i.e., the user question), jointly allowing their corresponding inputs to interact. A second goal is to do so in an efficient manner, which is highly important for videos since they contain tens to hundreds of frames as input.

The model learns to tokenize the joint video-language inputs into a smaller set of tokens that jointly and efficiently represent both modalities. When tokenizing, we use both modalities to produce a joint compact representation, which is fed to a transformer layer to produce the next level representation. A challenge here, which is also typical in cross-modal learning, is that often the video frame does not correspond directly to the associated text. We address this by adding two learnable linear layers which unify the visual and text feature dimensions before tokenization. This way we enable both video and text to condition how video tokens are learned.

Moreover, a single tokenization step does not allow for further interaction between the two modalities. For that, we use this new feature representation to interact with the video input features and produce another set of tokenized features, which are then fed into the next transformer layer. This iterative process allows the creation of new features, or tokens, which represent a continual refinement of the joint representation from both modalities. At the last step the features are input to a decoder that generates the text output.
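
As a rough sketch of the shape of this computation, the PyTorch-style module below projects video and text features into a shared dimension, pools them into a small set of tokens, and iterates so that the current tokens condition the next round of tokenization. The layer sizes, token count, and the attention-style pooling are our simplifications for illustration, not the exact architecture from the paper.

```python
# Simplified sketch of iterative video-text co-tokenization (illustrative
# shapes and layers only; not the paper's exact architecture).
import torch
import torch.nn as nn

class IterativeCoTokenizer(nn.Module):
    def __init__(self, video_dim=1024, text_dim=768, d_model=512,
                 num_tokens=16, num_iters=3):
        super().__init__()
        # Two learnable linear layers unify the visual and text feature dims.
        self.video_proj = nn.Linear(video_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        # A small set of learned queries pools the fused inputs into tokens.
        self.token_queries = nn.Parameter(torch.randn(num_tokens, d_model))
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            for _ in range(num_iters)
        ])

    def tokenize(self, queries, features):
        # Attention-style pooling: each query selects a compact summary token.
        scores = queries @ features.transpose(1, 2) / features.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ features  # (B, num_tokens, d_model)

    def forward(self, video_feats, text_feats):
        v = self.video_proj(video_feats)            # (B, Nv, d_model)
        t = self.text_proj(text_feats)              # (B, Nt, d_model)
        fused = torch.cat([v, t], dim=1)            # joint video-language input
        tokens = self.token_queries.expand(v.shape[0], -1, -1)
        for layer in self.layers:
            # The current tokens condition the next round of tokenization
            # against the fused inputs, iteratively refining the joint tokens.
            tokens = layer(self.tokenize(tokens, fused))
        return tokens  # compact representation passed on to a text decoder

# Example with random features standing in for backbone and text-encoder outputs.
model = IterativeCoTokenizer()
out = model(torch.randn(2, 256, 1024), torch.randn(2, 20, 768))
print(out.shape)  # torch.Size([2, 16, 512])
```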

As is customary for VideoQA, we pre-train the model before fine-tuning it on the individual VideoQA datasets. In this work, instead of pre-training on a large VideoQA dataset, we use videos automatically annotated with text from speech recognition, drawn from the HowTo100M dataset. This weaker pre-training data still enables our model to learn video-text features.

Visualization of the video-text iterative co-tokenization approach. Multi-stream video inputs, which are versions of the same video input (e.g., a high resolution, low frame-rate video and a low resolution, high frame-rate video), are efficiently fused together with the text input to produce a text-based answer by the decoder. Instead of processing the inputs directly, the video-text iterative co-tokenization model learns a reduced number of useful tokens from the fused video-language inputs. This process is done iteratively, allowing the current feature tokenization to affect the selection of tokens at the next iteration, thus refining the selection.

Efficient Video Question-Answering
We apply the video-language iterative co-tokenization algorithm to three main VideoQA benchmarks, MSRVTT-QA, MSVD-QA and IVQA, and demonstrate that this approach achieves better results than other state-of-the-art models, while having a modest size. Furthermore, iterative co-tokenization learning yields significant compute savings for video-text learning tasks. The method uses only 67 GFLOPs, one sixth of the 360 GFLOPs needed when using the popular 3D-ResNet video model jointly with text, and is more than twice as efficient as the X3D model. All the while, it produces highly accurate results, outperforming state-of-the-art methods.

Comparison of our iterative co-tokenization approach to previous methods such as MERLOT and VQA-T, as well as baselines using a single ResNet-3D or X3D-XL.

Multi-stream Video Inputs
For VideoQA, or any of a number of other tasks that involve video inputs, we find that multi-stream input is important to more accurately answer questions about both spatial and temporal relationships. Our approach utilizes three video streams at different resolutions and frame-rates: a low-resolution, high frame-rate input video stream (with 32 frames per second and spatial resolution 64x64, which we denote as 32x64x64); a high-resolution, low frame-rate video (8x224x224); and one in between (16x112x112). Despite the apparently more voluminous information to process with three streams, we obtain very efficient models thanks to the iterative co-tokenization approach. At the same time, these additional streams allow extraction of the most pertinent information. For example, as shown in the figure below, questions related to a specific activity in time produce higher activations in the lower-resolution but high frame-rate video input, whereas questions related to the general activity can be answered from the high-resolution input with very few frames. Another benefit of this algorithm is that the tokenization changes depending on the questions asked.

Visualization of the attention maps learned per layer during the video-text co-tokenization. The attention maps differ depending on the question asked for the same video. For example, if the question is related to the general activity (e.g., surfing in the figure above), then the attention maps of the higher-resolution, low frame-rate inputs are more active and seem to consider more global information. If the question is more specific, e.g., asking about what happens after an event, the feature maps are more localized and tend to be active in the high frame-rate video input. Furthermore, we see that the low-resolution, high frame-rate video inputs provide more information related to activities in the video.
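
For reference, the three streams described above can be pictured as tensors with (batch, frames, channels, height, width) shapes, as in the small sketch below. This is only an illustration of the inputs; the per-stream backbones and preprocessing are stand-ins and are not shown.

```python
# Illustrative tensor shapes for the three video streams described above.
import torch

batch = 2
streams = {
    "low_res_high_fps": torch.randn(batch, 32, 3, 64, 64),    # 32x64x64
    "mid_res_mid_fps":  torch.randn(batch, 16, 3, 112, 112),  # 16x112x112
    "high_res_low_fps": torch.randn(batch, 8, 3, 224, 224),   # 8x224x224
}

for name, clip in streams.items():
    _, frames, _, height, width = clip.shape
    print(f"{name}: {frames} frames at {height}x{width}")
# Each stream goes through its own backbone; the resulting features are fused
# with the question text by the co-tokenization module sketched earlier.
```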

Conclusion
We present a new approach to video-language learning that focuses on joint learning across video-text modalities, and apply it to the important and challenging task of video question-answering. Our approach is both accurate and highly efficient, outperforming current state-of-the-art models while using less compute. It results in modest model sizes and can gain further improvements with larger models and data. We hope this work provokes more research in vision-language learning to enable more seamless interaction with vision-based media.

Acknowledgements
This work was conducted by AJ Piergiovanni, Kairo Morton, Weicheng Kuo, Michael Ryoo and Anelia Angelova. We thank our collaborators in this research, Soravit Changpinyo for valuable comments and suggestions, and Claire Cui for suggestions and support. We also thank Tom Small for the visualizations.

Source: Google AI Blog


Google Dev Library Letters — 12th Issue

Posted by Garima Mehra, Program Manager

‘Google Dev Library Letters’ is curated to bring you some of the latest projects built with Google tech and submitted to the Google Dev Library platform. We hope this brings you the inspiration you need for your next project!


Android

Shape your Image: Circle, Rounded Square, or Cuts at the corner in Android by Sriyank Siddhartha

Using the MDC library, shape images in just a few lines of code by using ShapeableImageView.


Foso/Ktorfit by Jens Klingenberg

An HTTP client and Kotlin Symbol Processor for Kotlin Multiplatform (JS, JVM, Android, Native, iOS) that uses KSP and Ktor clients, inspired by Retrofit.

Help kids learn to read with Read Along, now available on the web

Over the past three years, more than 30 million kids have read more than 120 million stories on Read Along. The app, which was first released as Bolo in India in 2019 and released globally as Read Along the following year, helps kids learn to read independently with the help of a reading assistant, Diya.

As kids read stories aloud, Diya listens and gives both corrective and encouraging feedback to help them develop their reading skills. Read Along has so far been an Android app, and to make it accessible to more users, we have launched a public beta of the website version. The website contains the same magic: Diya’s help and hundreds of well-illustrated stories across several languages.

With the web version, parents can let their children use Read Along on bigger screens by simply opening readalong.google.com in a browser on a laptop or PC. Just like in the Android app, all the speech recognition happens in the browser, so children’s voice data remains private and is not sent to any servers. You can learn more about data processing in the website version by reading our privacy policy.

The website also opens up new opportunities for teachers and education leaders around the world, who can use Read Along as a reading practice tool for students in schools. The product supports popular browsers such as Chrome, Firefox and Edge, with support for iOS and more browsers, such as Safari, coming soon. With the sign-in option, you can log in with a unique account for each child on the same device. We recommend using Google Workspace for Education accounts in schools and Google accounts with Family Link at home.

In addition to the website launch, we are also adding some brand-new stories. We have partnered with two well-known YouTube content creators, ChuChu TV and USP Studios, to adapt some of their popular videos into a storybook format. Our partnership with Kutuki continues as we adapt their excellent collection of English and Hindi alphabet books and phonics books for early readers; those titles will be available later this year.

Reading is a critical skill to develop at a young age, and with Read Along Web, we are taking another step towards ensuring each kid has that option. Join us by visiting readalong.google.com and help kids learn to read with the power of their voice.