AI-Powered Fuzzing: Breaking the Bug Hunting Barrier



Since 2016, OSS-Fuzz has been at the forefront of automated vulnerability discovery for open source projects. Vulnerability discovery is an important part of keeping software supply chains secure, so our team is constantly working to improve OSS-Fuzz. For the last few months, we’ve tested whether we could boost OSS-Fuzz’s performance using Google’s Large Language Models (LLMs).




This blog post shares our experience of successfully applying the generative power of LLMs to improve the automated vulnerability detection technique known as fuzz testing (“fuzzing”). By using LLMs, we’re able to increase the code coverage for critical projects using our OSS-Fuzz service without manually writing additional code. Using LLMs is a promising new way to scale security improvements across the over 1,000 projects currently fuzzed by OSS-Fuzz and to remove barriers to future projects adopting fuzzing. 




LLM-aided fuzzing

We created the OSS-Fuzz service to help open source developers find bugs in their code at scale—especially bugs that indicate security vulnerabilities. After more than six years of running OSS-Fuzz, we now support over 1,000 open source projects with continuous fuzzing, free of charge. As the Heartbleed vulnerability showed us, bugs that could be easily found with automated fuzzing can have devastating effects. For most open source developers, setting up their own fuzzing solution could cost time and resources. With OSS-Fuzz, developers are able to integrate their project for free, automated bug discovery at scale.  




Since 2016, we’ve found and verified a fix for over 10,000 security vulnerabilities. We also believe that OSS-Fuzz could likely find even more bugs with increased code coverage. The fuzzing service covers only around 30% of an open source project’s code on average, meaning that a large portion of our users’ code remains untouched by fuzzing. Recent research suggests that the most effective way to increase this is by adding additional fuzz targets for every project—one of the few parts of the fuzzing workflow that isn’t yet automated.




When an open source project onboards to OSS-Fuzz, maintainers make an initial time investment to integrate their projects into the infrastructure and then add fuzz targets. The fuzz targets are functions that use randomized input to test the targeted code. Writing fuzz targets is a project-specific and manual process that is similar to writing unit tests. The ongoing security benefits from fuzzing make this initial investment of time worth it for maintainers, but writing a comprehensive set of fuzz targets is a tough expectation for project maintainers, who are often volunteers.
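To make the idea of a fuzz target concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `parse_length_prefixed` is a toy parser standing in for project code, and `run_random_inputs` is a simple stand-in for a fuzzing engine. Coverage-guided engines mutate inputs using coverage feedback rather than drawing them uniformly at random.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Toy parser (hypothetical target): the first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


def fuzz_target(data: bytes) -> None:
    """Feed arbitrary bytes to the parser; only unexpected failures count as bugs."""
    try:
        payload = parse_length_prefixed(data)
    except ValueError:
        return  # rejecting malformed input is expected behavior
    # Property check: an accepted payload must match its declared length.
    assert len(payload) == data[0]


def run_random_inputs(target, iterations: int = 1000, seed: int = 0) -> int:
    """Stand-in for a fuzzing engine: call the target with random byte strings."""
    rng = random.Random(seed)
    for _ in range(iterations):
        size = rng.randint(0, 16)
        target(bytes(rng.randrange(256) for _ in range(size)))
    return iterations
```

Like a unit test, the target is specific to the code it exercises, which is why each under-covered part of a project typically needs its own target.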




But what if LLMs could write additional fuzz targets for maintainers?



“Hey LLM, fuzz this project for me”

To discover whether an LLM could successfully write new fuzz targets, we built an evaluation framework that connects OSS-Fuzz to the LLM, conducts the experiment, and evaluates the results. The steps look like this:  




  1. OSS-Fuzz’s Fuzz Introspector tool identifies an under-fuzzed, high-potential portion of the sample project’s code and passes the code to the evaluation framework. 

  2. The evaluation framework creates a prompt that the LLM will use to write the new fuzz target. The prompt includes project-specific information.

  3. The evaluation framework takes the fuzz target generated by the LLM and runs the new target. 

  4. The evaluation framework observes the run for any change in code coverage.

  5. In the event that the fuzz target fails to compile, the evaluation framework prompts the LLM to write a revised fuzz target that addresses the compilation errors.
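The steps above can be sketched as a small loop. This is an illustrative outline under stated assumptions, not the actual evaluation framework: the callables `ask_llm`, `build`, and `measure_coverage`, the `BuildResult` type, the prompt wording, and the `max_attempts` limit are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class BuildResult:
    ok: bool
    errors: str = ""


def generate_fuzz_target(
    function_source: str,      # step 1: under-fuzzed code chosen upstream
    project_info: str,
    ask_llm: Callable[[str], str],
    build: Callable[[str], BuildResult],
    measure_coverage: Callable[[str], float],
    baseline_coverage: float,
    max_attempts: int = 3,     # assumed retry budget
) -> Optional[Tuple[str, float]]:
    """Return (fuzz target, coverage change), or None if no attempt compiled."""
    # Step 2: build a prompt that includes project-specific information.
    prompt = (
        "Write a fuzz target for the following code.\n"
        f"Project information:\n{project_info}\n"
        f"Code:\n{function_source}\n"
    )
    for _ in range(max_attempts):
        candidate = ask_llm(prompt)
        result = build(candidate)
        if result.ok:
            # Steps 3 and 4: run the target and observe the coverage change.
            return candidate, measure_coverage(candidate) - baseline_coverage
        # Step 5: feed the compilation errors back and ask for a revision.
        prompt = (
            "The fuzz target below failed to compile.\n"
            f"Target:\n{candidate}\n"
            f"Compiler errors:\n{result.errors}\n"
            "Write a revised fuzz target that fixes these errors.\n"
        )
    return None
```

Passing the model, build, and coverage steps in as callables keeps the loop testable with fakes, so it can be exercised without a compiler or an LLM.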





Experiment overview: The experiment pictured above is a fully automated process, from identifying target code to evaluating the change in code coverage.






At first, the code generated from our prompts wouldn’t compile. However, after several rounds of prompt engineering and trying out the new fuzz targets, we saw projects gain between 1.5% and 31% code coverage. One of our sample projects, tinyxml2, went from 38% line coverage to 69% without any interventions from our team. The case of tinyxml2 taught us: when LLM-generated fuzz targets are added, tinyxml2 has the majority of its code covered.









Example fuzz targets for tinyxml2: Each of the five fuzz targets shown is associated with a different part of the code and adds to the overall coverage improvement. 






To replicate tinyxml2’s results manually would have required at least a day’s worth of work—which would mean several years of work to manually cover all OSS-Fuzz projects. Given tinyxml2’s promising results, we want to implement them in production and to extend similar, automatic coverage to other OSS-Fuzz projects. 




Additionally, in the OpenSSL project, our LLM was able to automatically generate a working target that rediscovered CVE-2022-3602, which was in an area of code that previously did not have fuzzing coverage. Though this is not a new vulnerability, it suggests that as code coverage increases, we will find more vulnerabilities that are currently missed by fuzzing. 




Learn more about our results through our example prompts and outputs or through our experiment report. 




The goal: fully automated fuzzing

In the next few months, we’ll open source our evaluation framework to allow researchers to test their own automatic fuzz target generation. We’ll continue to optimize our use of LLMs for fuzzing target generation through more model finetuning, prompt engineering, and improvements to our infrastructure. We’re also collaborating closely with the Assured OSS team on this research in order to secure even more open source software used by Google Cloud customers.   




Our longer term goals include:



  • Adding LLM fuzz target generation as a fully integrated feature in OSS-Fuzz, with continuous generation of new targets for OSS-Fuzz projects and zero manual involvement.

  • Extending support from C/C++ projects to additional language ecosystems, like Python and Java. 

  • Automating the process of onboarding a project into OSS-Fuzz to eliminate any need to write even initial fuzz targets. 




We’re working towards a future of personalized vulnerability detection with little manual effort from developers. With the addition of LLM-generated fuzz targets, OSS-Fuzz can help improve open source security for everyone.

Privacy Sandbox Developer Preview 9: Custom Audience Delegation

Posted by Jon Markoff, Privacy Sandbox Developer Relations

Earlier this year we released the first Privacy Sandbox Beta on Android, with the goal of bringing real-world testing of our private advertising solutions to users' devices.

Since then, we’ve launched several additional Privacy Sandbox releases, each with new features and improvements, in Developer Preview and Beta. This is part of our ongoing commitment to helping developers create privacy-focused apps and tools that keep content open and accessible to everyone. Your feedback has helped us refine and improve these releases and new design proposals, and is greatly appreciated.

Today, we’re announcing Developer Preview 9 for the Privacy Sandbox on Android, including:

  • Protected Audience API: The first release of Custom Audience Delegation, which supports the creation of custom audiences for buyers that do not have an on-device SDK presence. Bidding and Auction services integrations are available to support more complex ad auctions.
  • Attribution Reporting API: Enrollment is no longer required for development and testing purposes. Improvements to debug reporting include support for additional verbose debug reports and app-to-web debug reports.
  • SDK Runtime: With some limitations, SDK Runtime can now launch intents to other apps, and can bind to an allowlist of services.
  • For the full list of released features, see the release notes.

Alongside Developer Preview 9, we’re also announcing Project Flight: a collection of sample apps that demonstrate how the Privacy Sandbox APIs can be used together in end-to-end user journeys. Project Flight includes the following:

  • Advertiser app, to demonstrate a conversion by booking a travel experience
  • Publisher app, to show a relevant ad and register an event
  • SSP library, to demonstrate running ad selection and registering a source
  • MMP library, to demonstrate joining a custom audience and registering a trigger
  • A mock server backend as a companion to the Protected Audience and Attribution Reporting APIs using Firebase

As with all of our releases, we highly encourage developers to share feedback as they continue their journey into the Privacy Sandbox on Android. To get started, read the instructions to set up the SDK and system images on an emulator or supported Pixel device.

For more information on the Privacy Sandbox on Android, visit the developer site, and sign up for our newsletter to receive regular updates.

Programmatically read and write working locations with the Calendar API, now generally available

What’s changing 

Previously available in beta through our Developer Preview Program, the ability to read and write a user’s working location using the Calendar API is now generally available. 


Reading a user’s working location helps better understand the flow and volume of people through physical campuses. Using this information, you can better adapt on-site resources and update other third-party surfaces, such as hot desk booking tools. 


Writing a user’s working location makes it easier to update a user's working location in their calendar based on when and where they’ve booked a hot desk, or if they’ve scheduled a trip via a travel booking tool, and more. 


Getting started 



Rollout pace


Availability 

All developers can use the API; however, the working location feature is only available for eligible Workspace editions:
  • Available to Google Workspace Business Standard, Business Plus, Enterprise Standard, Enterprise Plus, Education Fundamentals, Education Plus, Education Standard, the Teaching and Learning Upgrade and Nonprofits customers, as well as legacy G Suite Business customers 
  • Not available to Google Workspace Essentials, Business Starter, Enterprise Essentials, Frontline, G Suite Basic customers 

Resources 

Chrome Beta for Desktop Update

The Chrome team is excited to announce the promotion of Chrome 117 to the Beta channel for Windows, Mac and Linux. Chrome 117.0.5938.11 contains our usual under-the-hood performance and stability tweaks, but there are also some cool new features to explore - please head to the Chromium blog to learn more!

A partial list of changes is available in the Git log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Prudhvi Bommana
Google Chrome

Toward Quantum Resilient Security Keys



As part of our effort to deploy quantum resistant cryptography, we are happy to announce the release of the first quantum resilient FIDO2 security key implementation as part of OpenSK, our open source security key firmware. This open-source, hardware-optimized implementation uses a novel ECC/Dilithium hybrid signature scheme that benefits from the security of ECC against standard attacks and Dilithium’s resilience against quantum attacks. This scheme was co-developed in partnership with ETH Zürich and won best paper at the ACNS secure cryptographic implementation workshop.




Quantum processor





As progress toward practical quantum computers accelerates, preparing for their advent is becoming a more pressing issue. In particular, standard public key cryptography, which was designed to protect against traditional computers, will not be able to withstand quantum attacks. Fortunately, with the recent standardization of public key quantum resilient cryptography, including the Dilithium algorithm, we now have a clear path to secure security keys against quantum attacks.




While quantum attacks are still in the distant future, deploying cryptography at Internet scale is a massive undertaking, which is why doing it as early as possible is vital. In particular, for security keys this process is expected to be gradual, as users will have to acquire new ones once FIDO has standardized post-quantum resilient cryptography and this new standard is supported by major browser vendors.



Hybrid signature scheme

Hybrid signature: Strong nesting with classical and PQC scheme




Our proposed implementation relies on a hybrid approach that combines the battle-tested ECDSA signature algorithm and the recently standardized quantum resistant signature algorithm, Dilithium. In collaboration with ETH, we developed this novel hybrid signature scheme that offers the best of both worlds. Relying on a hybrid signature is critical because Dilithium and other recently standardized quantum resistant algorithms haven’t yet stood the test of time, and recent attacks on Rainbow (another quantum resilient algorithm) demonstrate the need for caution. This cautiousness is particularly warranted for security keys as most can’t be upgraded – although we are working toward it for OpenSK. The hybrid approach is also used in other post-quantum efforts like Chrome’s support for TLS.
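As a rough illustration of why a hybrid signature stays secure as long as either component does, the sketch below nests two signatures and accepts only if both verify. It assumes the nesting order is classical first, then post-quantum over the message plus the classical signature, which the figure caption suggests but this post does not spell out. `toy_sign` and `toy_verify` are hypothetical HMAC-based stand-ins, not ECDSA or Dilithium and not real public-key signatures; the function names are not taken from OpenSK.

```python
import hashlib
import hmac
from typing import Tuple


def toy_sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for a signature scheme (HMAC, for illustration only)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def toy_verify(key: bytes, message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, message), signature)


def hybrid_sign(classical_key: bytes, pq_key: bytes, message: bytes) -> Tuple[bytes, bytes]:
    # Inner signature: the classical scheme signs the message.
    classical_sig = toy_sign(classical_key, message)
    # Outer signature: the post-quantum scheme signs message plus inner signature.
    pq_sig = toy_sign(pq_key, message + classical_sig)
    return classical_sig, pq_sig


def hybrid_verify(classical_key: bytes, pq_key: bytes, message: bytes,
                  signature: Tuple[bytes, bytes]) -> bool:
    classical_sig, pq_sig = signature
    # Both checks must pass, so a forger has to break both schemes.
    return (toy_verify(classical_key, message, classical_sig)
            and toy_verify(pq_key, message + classical_sig, pq_sig))
```

Because verification requires both components, a break in only one of the two underlying schemes is not enough to forge a hybrid signature.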




On the technical side, a large challenge was to create a Dilithium implementation small enough to run on security keys’ constrained hardware. Through careful optimization, we were able to develop a memory-optimized Rust implementation that required only 20 KB of memory, which was small enough. We also spent time ensuring that our implementation’s signature speed was well within the expected security key specification. That said, we believe improving signature speed further by leveraging hardware acceleration would allow for keys to be more responsive.




Moving forward, we are hoping to see this implementation (or a variant of it) standardized as part of the FIDO2 key specification and supported by major web browsers so that users' credentials can be protected against quantum attacks. If you are interested in testing this algorithm or contributing to security key research, head to our open source implementation OpenSK.

Developers Share How They Build with Google Tools and Bard

Posted by Lyanne Alfaro, DevRel Program Manager, Google Developer Studio

Developer Journey is a monthly series highlighting diverse and global developers sharing relatable challenges, opportunities, and wins in their journey. Every month, we will spotlight developers around the world, the Google tools they leverage, and the kind of products they are building.

This month, we spoke with several Google Developer Experts to learn more about their path.


Eslam Medhat Fathy

Giza, Egypt
Google Developer Expert, Firebase
Technical and Design Mentor at Google for Startups Accelerator Program
Google Developer Group Organizer
Senior Flutter Developer at Sarmad

What Google tools have you used to build?

I have used many tools like Firebase, Flutter, Android, Kotlin, Dart, Assistant, and Bard, of course.

Which tool has been your favorite to use? Why?

My favorite tool is Firebase, because of how easy it is to set up and use. It also provides a serverless architecture, easy-to-use services, real-time synchronization, and cross-platform support, among other features. These benefits can help you build robust and scalable applications quickly and easily.

Tell us about something you've built in the past using Google tools.

I have more than 10 apps in the store created in Android native with Kotlin, Flutter and Dart. A few examples are Rehlatech and AzkarApp.

What will you create with Google Bard?

I use Bard every day for generating, debugging, explaining, learning code, and more.

What advice would you give someone starting in their developer journey?

I advise everyone about to start their developer journey to:

  • Start with the basics: It's important to have a solid foundation in programming fundamentals. Learn the basics of a programming language, such as syntax, data types, control structures, and functions.
  • Practice coding: Practice makes perfect. The more you practice coding, the better you'll become. Start with small projects and gradually move on to more complex projects.
  • Learn from others: Join online communities, attend meetups, and participate in forums. Learning from others can help you improve your skills.
  • Read the documentation: Documentation is your friend. Read the documentation of the programming language or tools you're using. It can help you understand how to use them properly and solve problems.
  • Be patient: Learning to code takes time and patience. Don't get discouraged if you don't understand something right away. Keep practicing and asking questions.
  • Build projects: Building projects is a great way to learn new skills and apply what you've learned. Start small and gradually build more complex projects.
  • Stay up-to-date: Technology is constantly evolving. Stay up-to-date on the latest trends and updates in the programming world. Attend conferences, read blogs, and follow experts on social media.
  • Have fun: Coding should be fun. Don't take it too seriously and enjoy the process of learning and building new things.

Carmen Ansio

Barcelona, Spain
Google Developer Expert, Firebase
Google Developer Expert, Web Technologies
UX Engineer

What Google tools have you used to build?

I have used various Google tools to build projects including Angular, Dart, and Firebase.

Which tool has been your favorite to use? Why?

My favorite tool has been Chrome DevTools because of its versatile suite of debugging tools and its network panel, which I often use to optimize web performance. DevTools is an essential part of my daily development process as it allows me to test, experiment, and debug code directly in the browser.

What will you create with Google Bard?

With Google Bard, I plan to develop a Figma plugin for creating dynamic design prototypes. Leveraging the natural language processing and understanding capabilities of Google Bard, the plugin will allow designers to quickly convert textual descriptions into visual design elements. This can significantly streamline the design process, bridging the gap between ideation and visual representation, while enabling non-designers to contribute effectively to the design process.

What advice would you give someone starting in their developer journey?

For those beginning their developer journey, my advice would be: Always stay curious and never stop learning. Technology evolves quickly, and it's important to be adaptable. Also, never undervalue the importance of good UI/UX design. It's not only about writing code, but also about creating a great user experience.


Stéphanie Walter

Luxembourg, Luxembourg
Google Developer Expert, Web Technologies
Women Techmakers
UX Researcher & Designer

What Google tools have you used to build?

The main tools I use are the Chrome inspect tool and Lighthouse. I’m using Material UI a lot and the M3 design kit for Figma is a great time saver.

Which tool has been your favorite to use? Why?

Performance is important where I work, so Lighthouse is definitely in my favorite list. The function to get a quick report, which also shows main accessibility issues, is very nice. Of course it won’t show all accessibility issues, but it’s a good place to start improving a website.

Please share with us about something you’ve built in the past using Google tools.

Both Lighthouse and the Chrome inspect tool are lifesavers when building websites like my blog. There’s still improvement to be made on some pages on performance, but it’s getting there.

What will you create with Google Bard?

To be honest, it only has been recently made available for my country, so I haven’t had time to really play with it. For now, I use AI chatbots as glorified assistants. English isn’t my native language, so asking such tools to help translate some things and improve grammar in some sentences is very helpful. I might use it to help me with sharing knowledge: to improve my articles, conference slides, and training material.

What advice would you give someone starting in their developer journey?

Start with a project you are passionate about, something that would help you, or something you wish existed. It doesn’t have to be perfect. It also doesn’t have to be something that will bring money. And remember, you also don’t have to finish it. It’s nice if you can share it with peers to get feedback but you can also share unfinished projects. It’s all about learning while working on something that you like. But remember to also step away from the computer. Developing should not be your whole life - otherwise, you will burn out really fast.

STUDY: Socially aware temporally causal decoder recommender systems

Reading has many benefits for young students, such as better linguistic and life skills, and reading for pleasure has been shown to correlate with academic success. Furthermore, students have reported improved emotional wellbeing from reading, as well as better general knowledge and better understanding of other cultures. With the vast amount of reading material both online and off, finding age-appropriate, relevant and engaging content can be a challenging task, but helping students do so is a necessary step to engage them in reading. Effective recommendations that present students with relevant reading material help keep students reading, and this is where machine learning (ML) can help.

ML has been widely used in building recommender systems for various types of digital content, ranging from videos to books to e-commerce items. Recommender systems are used across a range of digital platforms to help surface relevant and engaging content to users. In these systems, ML models are trained to suggest items to each user individually based on user preferences, user engagement, and the items under recommendation. These data provide a strong learning signal for models to be able to recommend items that are likely to be of interest, thereby improving user experience.

In “STUDY: Socially Aware Temporally Causal Decoder Recommender Systems”, we present a content recommender system for audiobooks in an educational setting taking into account the social nature of reading. We developed the STUDY algorithm in partnership with Learning Ally, an educational nonprofit, aimed at promoting reading in dyslexic students, that provides audiobooks to students through a school-wide subscription program. Leveraging the wide range of audiobooks in the Learning Ally library, our goal is to help students find the right content to help boost their reading experience and engagement. Motivated by the fact that what a person’s peers are currently reading has significant effects on what they would find interesting to read, we jointly process the reading engagement history of students who are in the same classroom. This allows our model to benefit from live information about what is currently trending within the student’s localized social group, in this case, their classroom.


Data

Learning Ally has a large digital library of curated audiobooks targeted at students, making it well-suited for building a social recommendation model to help improve student learning outcomes. We received two years of anonymized audiobook consumption data. All students, schools and groupings in the data were anonymized, only identified by a randomly generated ID not traceable back to real entities by Google. Furthermore, all potentially identifiable metadata was only shared in an aggregated form, to protect students and institutions from being re-identified. The data consisted of time-stamped records of students’ interactions with audiobooks. For each interaction we have an anonymized student ID (which includes the student’s grade level and anonymized school ID), an audiobook identifier and a date. While many schools distribute students in a single grade across several classrooms, we leverage this metadata to make the simplifying assumption that all students in the same school and in the same grade level are in the same classroom. While this provides the foundation needed to build a better social recommender model, it's important to note that this does not enable us to re-identify individuals, class groups or schools.


The STUDY algorithm

We framed the recommendation problem as a click-through rate prediction problem, where we model the conditional probability of a user interacting with each specific item conditioned on both 1) user and item characteristics and 2) the item interaction history sequence for the user at hand. Previous work suggests Transformer-based models, a widely used model class developed by Google Research, are well suited for modeling this problem. When each user is processed individually this becomes an autoregressive sequence modeling problem. We use this conceptual framework to model our data and then extend this framework to create the STUDY approach.

While this approach for click-through rate prediction can model dependencies between past and future item preferences for an individual user and can learn patterns of similarity across users at train time, it cannot model dependencies across different users at inference time. To recognise the social nature of reading and remediate this shortcoming we developed the STUDY model, which concatenates multiple sequences of books read by each student into a single sequence that collects data from multiple students in a single classroom.

However, this data representation requires care if it is to be modeled by transformers. In transformers, the attention mask is the matrix that controls which inputs can be used to inform the predictions of which outputs. The pattern of using all prior tokens in a sequence to inform the prediction of an output leads to the upper triangular attention matrix traditionally found in causal decoders. However, since the sequence fed into the STUDY model is not temporally ordered, even though each of its constituent subsequences is, a standard causal decoder is no longer a good fit for this sequence. When trying to predict each token, the model must not be allowed to attend to every token that precedes it in the sequence, because some of these tokens might have later timestamps and contain information that would not be available at deployment time.

In this figure we show the attention mask typically used in causal decoders. Each row represents an input and each column represents an output. A value of 1 (shown as blue) for a matrix entry at a particular position denotes that the model can observe the input of that row when predicting the output of the corresponding column, whereas a value of 0 (shown as white) denotes the opposite.
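As a small illustration of the mask described in this caption, the sketch below builds a causal mask with rows as inputs and columns as outputs. The function name `causal_mask` is hypothetical, and the choice that an output may see the input at its own position (in addition to all earlier ones) is an assumption of this example.

```python
def causal_mask(n):
    """Return an n x n mask where mask[i][j] == 1 means the input at
    row i is visible when predicting the output at column j."""
    # An input is visible to an output at the same or a later position.
    return [[1 if i <= j else 0 for j in range(n)] for i in range(n)]

mask = causal_mask(4)
for row in mask:
    print(row)
# [1, 1, 1, 1]
# [0, 1, 1, 1]
# [0, 0, 1, 1]
# [0, 0, 0, 1]
```

With this row and column convention the ones fill the upper triangle, which matches the upper triangular matrix described in the text.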

The STUDY model builds on causal transformers by replacing the triangular attention mask with a flexible attention mask whose values are based on timestamps, which allows attention across different subsequences. A regular transformer would use a triangular mask within each sequence and would not allow attention across different subsequences. STUDY keeps the causal triangular attention matrix within a sequence and sets the values across sequences according to timestamps. Hence, predictions at any output point in the sequence are informed by all input points that occurred in the past relative to the current time point, regardless of whether they appear before or after the current input in the sequence. This causal constraint is important because if it is not enforced at train time, the model could potentially learn to make predictions using information from the future, which would not be available for a real world deployment.

In (a) we show a sequential autoregressive transformer with causal attention that processes each user individually; in (b) we show an equivalent joint forward pass that results in the same computation as (a); and finally, in (c) we show that by introducing new nonzero values (shown in purple) to the attention mask we allow information to flow across users. We do this by allowing a prediction to condition on all interactions with an earlier timestamp, irrespective of whether the interaction came from the same user or not.
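The timestamp-based mask in (c) can be sketched as follows. This is an illustrative reading of the description, not the published implementation: the function name `study_mask` and the input layout are hypothetical, and treating cross-student entries as visible only when the timestamp is strictly earlier (so ties are masked out) is an assumption. Rows are inputs and columns are outputs, as in the causal mask figure.

```python
def study_mask(student_ids, timestamps):
    """Build a STUDY-style attention mask for one concatenated sequence.

    student_ids[k] and timestamps[k] describe the k-th token. Each
    student's tokens are assumed to be contiguous and time-ordered.
    mask[i][j] == 1 means input i may inform the prediction at output j.
    """
    n = len(student_ids)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):          # input position (row)
        for j in range(n):      # output position (column)
            if student_ids[i] == student_ids[j]:
                # Within a student: ordinary causal (triangular) pattern.
                mask[i][j] = 1 if i <= j else 0
            else:
                # Across students: visible only if it happened earlier in time.
                mask[i][j] = 1 if timestamps[i] < timestamps[j] else 0
    return mask

# Student A's two interactions followed by student B's two interactions.
student_ids = ["A", "A", "B", "B"]
timestamps = [1, 4, 2, 3]
mask = study_mask(student_ids, timestamps)
for row in mask:
    print(row)
# [1, 1, 1, 1]
# [0, 1, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 0, 1]
```

In this example, B's interactions at timestamps 2 and 3 inform A's prediction at timestamp 4 even though they appear later in the concatenated sequence, while A's interaction at timestamp 4 is hidden from both of B's predictions.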

Experiments

We used the Learning Ally dataset to train the STUDY model along with multiple baselines for comparison. We implemented an autoregressive click-through rate transformer decoder, which we refer to as “Individual”, a k-nearest neighbor baseline (KNN), and a comparable social baseline, social attention memory network (SAMN). We used the data from the first school year for training and we used the data from the second school year for validation and testing.

We evaluated these models by measuring the percentage of the time the next item the user actually interacted with was in the model’s top n recommendations, i.e., hits@n, for different values of n. In addition to evaluating the models on the entire test set we also report the models’ scores on two subsets of the test set that are more challenging than the whole data set. We observed that students will typically interact with an audiobook over multiple sessions, so simply recommending the last book read by the user would be a strong trivial recommendation. Hence, the first test subset, which we refer to as “non-continuation”, is where we only look at each model’s performance on recommendations when the students interact with books that are different from the previous interaction. We also observe that students revisit books they have read in the past, so strong performance on the test set can be achieved by restricting the recommendations made for each student to only the books they have read in the past. Although there might be value in recommending old favorites to students, much value from recommender systems comes from surfacing content that is new and unknown to the user. To measure this we evaluate the models on the subset of the test set where the students interact with a title for the first time. We name this evaluation subset “novel”.
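The metric and the two subsets can be sketched as follows. This is a minimal illustration with made-up data. The event layout and the function names are hypothetical, and counting an event with no prior history as non-continuation is an assumption of this example.

```python
def hits_at_n(events, n):
    """Fraction of events whose actual next item is in the top-n recommendations.

    events: list of (history, actual_next, ranked_recs), where history is the
    student's earlier items (oldest first) and ranked_recs is best-first.
    Returns None for an empty list of events.
    """
    if not events:
        return None
    hits = sum(1 for _, actual, recs in events if actual in recs[:n])
    return hits / len(events)

def non_continuation(events):
    """Keep events where the next item differs from the previous interaction."""
    return [e for e in events if not e[0] or e[0][-1] != e[1]]

def novel(events):
    """Keep events where the student interacts with the item for the first time."""
    return [e for e in events if e[1] not in e[0]]

# Toy data for illustration only.
events = [
    (["b1", "b2"], "b2", ["b2", "b3", "b1"]),  # continuation of the last book
    (["b1", "b2"], "b1", ["b2", "b1", "b4"]),  # revisits an older book
    (["b1", "b2"], "b5", ["b2", "b1", "b4"]),  # first-time title, missed
    (["b1"], "b3", ["b3", "b1", "b2"]),        # first-time title, hit
]
print(hits_at_n(events, 2))                    # 0.75
print(hits_at_n(non_continuation(events), 2))  # 2 of 3 events hit
print(hits_at_n(novel(events), 2))             # 0.5
```

The toy data shows why the subsets are harder: a recommender that repeats the last book or past favorites scores well on the full set but gets no credit on first-time titles.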

We find that STUDY outperforms all other tested models across almost every single slice we evaluated against.

In this figure we compare the performance of four models: STUDY, Individual, KNN and SAMN. We measure the performance with hits@5, i.e., how likely the model is to suggest the next title the user read within the model’s top 5 recommendations. We evaluate the model on the entire test set (all) as well as the novel and non-continuation splits. We see STUDY consistently outperforms the other three models presented across all splits.

Importance of appropriate grouping

At the heart of the STUDY algorithm is organizing users into groups and doing joint inference over multiple users who are in the same group in a single forward pass of the model. We conducted an ablation study where we looked at the importance of the actual groupings used on the performance of the model. In our presented model we group together all students who are in the same grade level and school. We then experiment with groups defined by all students in the same grade level and district and also place all students in a single group with a random subset used for each forward pass. We also compare these models against the Individual model for reference.

We found that using groups that were more localized was more effective, with the school and grade level grouping outperforming the district and grade level grouping. This supports the hypothesis that the STUDY model is successful because of the social nature of activities such as reading — people’s reading choices are likely to correlate with the reading choices of those around them. Both of these models outperformed the other two models (single group and Individual) where grade level is not used to group students. This suggests that data from users with similar reading levels and interests is beneficial for performance.


Future work

This work is limited to modeling recommendations for user populations where the social connections are assumed to be homogeneous. In the future it would be beneficial to model a user population where relationships are not homogeneous, i.e., where categorically different types of relationships exist or where the relative strength or influence of different relationships is known.


Acknowledgements

This work involved collaborative efforts from a multidisciplinary team of researchers, software engineers and educational subject matter experts. We thank our co-authors: Diana Mincu, Lauren Harrell, and Katherine Heller from Google. We also thank our colleagues at Learning Ally, Jeff Ho, Akshat Shah, Erin Walker, and Tyler Bastian, and our collaborators at Google, Marc Repnyek, Aki Estrella, Fernando Diaz, Scott Sanner, Emily Salkey and Lev Proleev.

Source: Google AI Blog


Dev Channel Update for ChromeOS / ChromeOS Flex

The Dev channel is being updated to OS version: 15572.4.0 Browser version: 117.0.5938.4 for most ChromeOS devices.

If you find new issues, please let us know in one of the following ways:

  1. File a bug
  2. Visit our ChromeOS communities
    1. General: Chromebook Help Community
    2. Beta Specific: ChromeOS Beta Help Community
  3. Report an issue or send feedback on Chrome

Interested in switching channels? Find out how.

Matt Nelson,
Google ChromeOS

Chrome for Android Update

Hi, everyone! We've just released Chrome 116 (116.0.5845.92) for Android: it'll become available on Google Play over the next few days.

This release includes stability and performance improvements. You can see a full list of the changes in the Git log. If you find a new issue, please let us know by filing a bug.

Android releases contain the same security fixes as their corresponding Desktop release (Windows: 116.0.5845.96/.97, Mac & Linux: 116.0.5845.96), unless otherwise noted.

Erhu Akpobaro
Google Chrome