Dev Channel Update for Chrome OS

The Dev channel has been updated to 72.0.3609.3 (Platform version: 11264.0.0) for most Chrome OS devices. This build contains a number of bug fixes, security updates and feature enhancements.  A list of changes can be found here.

If you find new issues, please let us know by visiting our forum or filing a bug. Interested in switching channels? Find out how. You can submit feedback using ‘Report an issue...’ in the Chrome menu (3 vertical dots in the upper right corner of the browser).


Bernie Thompson

Google Chrome

Improved Grading of Prostate Cancer Using Deep Learning



Approximately 1 in 9 men in the United States will develop prostate cancer in their lifetime, making it the most common cancer in males. Despite being common, prostate cancers are frequently non-aggressive, making it challenging to determine if the cancer poses a significant enough risk to the patient to warrant treatment such as surgical removal of the prostate (prostatectomy) or radiation therapy. A key factor that helps in the “risk stratification” of prostate cancer patients is the Gleason grade, which classifies the cancer cells based on how closely they resemble normal prostate glands when viewed on a slide under a microscope.

However, despite its widely recognized clinical importance, Gleason grading of prostate cancer is complex and subjective, as evidenced by studies reporting inter-pathologist disagreements ranging from 30-53% [1][2]. Furthermore, there are not enough specialty-trained pathologists to meet the global demand for prostate cancer pathology, especially outside the United States. Recent guidelines also recommend that pathologists report the percentage of tumor of different Gleason patterns in their final report, which adds to the workload and is yet another subjective challenge for the pathologist [3]. Overall, these issues suggest an opportunity to improve the diagnosis and clinical management of prostate cancer using deep learning–based models, similar to how Google and others used such techniques to demonstrate the potential to improve metastatic breast cancer detection.

In “Development and Validation of a Deep Learning Algorithm for Improving Gleason Scoring of Prostate Cancer”, we explore whether deep learning could improve the accuracy and objectivity of Gleason grading of prostate cancer in prostatectomy specimens. We developed a deep learning system (DLS) that mirrors a pathologist’s workflow by first categorizing each region in a slide into a Gleason pattern, with lower patterns corresponding to tumors that more closely resemble normal prostate glands. The DLS then summarizes an overall Gleason grade group based on the two most common Gleason patterns present. The higher the grade group, the greater the risk of further cancer progression and the more likely the patient is to benefit from treatment.
Visual examples of Gleason patterns, which are used in the Gleason system for grading prostate cancer. Individual cancer patches are assigned a Gleason pattern based on how closely the cancer resembles normal prostate tissue, with lower numbers corresponding to more well differentiated tumors. Image Source: National Institutes of Health.
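As a rough illustration of the summarization step described above, the sketch below takes per-region Gleason patterns, picks the two most common, and applies the conventional clinical mapping from primary and secondary pattern to grade group. The function names, the list-of-patterns input and the tie-breaking rule are assumptions made for this example; it is not the DLS's actual second stage, and it ignores reporting rules such as minimum-percentage thresholds.

```python
from collections import Counter


def grade_group(primary: int, secondary: int) -> int:
    """Map primary and secondary Gleason patterns to a grade group (1-5)."""
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # 3+4 is grade group 2, 4+3 is grade group 3.
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5  # Gleason score 9 or 10


def summarize_slide(region_patterns):
    """Summarize per-region patterns (3, 4 or 5) into one grade group.

    Returns None when no tumor regions are present.
    """
    counts = Counter(p for p in region_patterns if p in (3, 4, 5))
    if not counts:
        return None
    # Assumption: ties in frequency are broken in favor of the higher pattern.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -kv[0]))
    primary = ranked[0][0]
    # With a single pattern present, it is used as both primary and secondary.
    secondary = ranked[1][0] if len(ranked) > 1 else primary
    return grade_group(primary, secondary)
```

For example, a slide whose tumor regions are 60% pattern 3 and 40% pattern 4 would be summarized as grade group 2, while the reverse proportions would give grade group 3.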
To develop and validate the DLS, we collected de-identified images of prostatectomy samples, which contain a greater amount and diversity of prostate cancer than needle core biopsies, even though the latter is the more common clinical procedure. On the training data, a cohort of 32 pathologists provided detailed annotations of Gleason patterns (resulting in over 112 million annotated image patches) and an overall Gleason grade group for each image. To overcome the previously referenced variability in Gleason grading, each slide in the validation set was independently graded by 3 to 5 general pathologists (selected from a cohort of 29 pathologists) and had a final Gleason grade assigned by a genitourinary-specialist pathologist to obtain the ground-truth label for that slide.

In the paper, we show that our DLS achieved an overall accuracy of 70%, compared to an average accuracy of 61% achieved by US board-certified general pathologists in our study. Of 10 high-performing individual general pathologists who graded every slide in the validation set, the DLS was more accurate than 8. The DLS was also more accurate than the average pathologist at Gleason pattern quantitation. These improvements in Gleason grading translated into better clinical risk stratification: the DLS better identified patients at higher risk for disease recurrence after surgery than the average general pathologist, potentially enabling doctors to use this information to better match patients to therapy.
Comparison of scoring performance of the DLS with pathologists. a: Accuracy of the DLS (in red) compared with the mean accuracy among a cohort-of-29 pathologists (in green). Error bars indicate 95% confidence intervals. b: Comparison of risk stratification provided by the DLS, the cohort-of-29 pathologists, and the genitourinary specialist pathologists. Patients are divided into low and high risk groups based on their Gleason grade group, where a larger separation between the Kaplan-Meier curves of these risk groups indicates better stratification.
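For readers unfamiliar with Kaplan-Meier curves, the following is a minimal sketch of the standard product-limit estimator, which could be run separately on the low-risk and high-risk groups to compare their curves. The function name and input layout are assumptions for illustration; the paper's own analysis code is not shown in this post.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: True if recurrence was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # recurrences at time t
        leaving = 0   # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += int(data[i][1])
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: multiply by the fraction that did not recur.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

A grading method that stratifies risk well would yield a high-risk curve that drops much faster than the low-risk curve.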
We also found that the DLS was able to characterize tissue morphology that appeared to lie at the cusp of two Gleason patterns, which is one reason for the disagreements in Gleason grading observed between pathologists, suggesting the possibility of creating finer grained “precision grading” of prostate cancer. While the clinical significance of these intermediate patterns (e.g. Gleason pattern 3.3 or 3.7) is not known, the increased precision of the DLS will enable further research into this interesting question.
Assessing the region-level classification of the DLS. a: Annotations from 3 pathologists compared to DLS predictions. The pathologists show general concordance on the location and the extent of tumor areas, but poor agreement in classifying Gleason patterns. The DLS’s precision Gleason pattern for each region is represented by interpolating between the DLS’s prediction patterns for Gleason patterns 3 (green), 4 (yellow), and 5 (red). b: DLS prediction patterns compared to the distribution of pathologists’ Gleason pattern classifications on 41 million annotated image patches from the test dataset. On patches where pathologists are discordant, where the tissue is more likely to be on the cusp of two patterns, the DLS reflects this ambiguity in its prediction scores.
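One simple way to arrive at an intermediate value such as 3.3 or 3.7 is to take the probability-weighted average of the discrete patterns. The sketch below assumes the model emits one score per Gleason pattern for each region; the function name and input format are hypothetical, and the paper may compute its interpolation differently.

```python
def precision_pattern(scores):
    """Interpolate a continuous Gleason pattern from per-pattern scores.

    scores: dict mapping Gleason pattern (3, 4, 5) to a non-negative score.
    """
    patterns = (3, 4, 5)
    total = sum(scores.get(p, 0.0) for p in patterns)
    if total <= 0:
        raise ValueError("at least one pattern score must be positive")
    # Weighted average of the pattern values, normalized by the total score.
    return sum(p * scores.get(p, 0.0) for p in patterns) / total
```

Under this assumption, a region scored 0.7 for pattern 3 and 0.3 for pattern 4 would be reported as roughly 3.3.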
While these initial results are encouraging, there is much more work to be done before systems like our DLS can be used to improve the care of prostate cancer patients. First, the accuracy of the model can be further improved with additional training data and should be validated on independent cohorts containing a larger number and more diverse group of patients. In addition, we are actively working on refining our DLS to work on diagnostic needle core biopsies, which occur prior to the decision to undergo surgery and where Gleason grading therefore has a significantly greater impact on clinical decision-making. Further work will be needed to assess how to best integrate our DLS into the pathologist’s diagnostic workflow and the impact of such artificial intelligence-based assistance on the overall efficiency, accuracy, and prognostic ability of Gleason grading in clinical practice. Nonetheless, we are excited about the potential of technologies like this to significantly improve cancer diagnostics and patient care.

Acknowledgements
This work involved the efforts of a multidisciplinary team of software engineers, researchers, clinicians and logistics support staff. Key contributors to this project include Kunal Nagpal, Davis Foote, Yun Liu, Po-Hsuan (Cameron) Chen, Ellery Wulczyn, Fraser Tan, Niels Olson, Jenny L. Smith, Arash Mohtashamian, James H. Wren, Greg S. Corrado, Robert MacDonald, Lily H. Peng, Mahul B. Amin, Andrew J. Evans, Ankur R. Sangoi, Craig H. Mermel, Jason D. Hipp and Martin C. Stumpe. We would also like to thank Tim Hesterberg, Michael Howell, David Miller, Alvin Rajkomar, Benny Ayalew, Robert Nagle, Melissa Moran, Krishna Gadepalli, Aleksey Boyko, and Christopher Gammage. Lastly, this work would not have been possible without the aid of the pathologists who annotated data for this study.

References
  1. Interobserver Variability in Histologic Evaluation of Radical Prostatectomy Between Central and Local Pathologists: Findings of TAX 3501 Multinational Clinical Trial, Netto, G. J., Eisenberger, M., Epstein, J. I. & TAX 3501 Trial Investigators, Urology 77, 1155–1160 (2011).
  2. Phase 3 Study of Adjuvant Radiotherapy Versus Wait and See in pT3 Prostate Cancer: Impact of Pathology Review on Analysis, Bottke, D., Golz, R., Störkel, S., Hinke, A., Siegmann, A., Hertle, L., Miller, K., Hinkelbein, W., Wiegel, T., Eur. Urol. 64, 193–198 (2013).
  3. Utility of Quantitative Gleason Grading in Prostate Biopsies and Prostatectomy Specimens, Sauter, G., Steurer, S., Clauditz, T. S., Krech, T., Wittmer, C., Lutz, F., Lennartz, M., Janssen, T., Hakimi, N., Simon, R., von Petersdorff-Campen, M., Jacobsen, F., von Loga, K., Wilczak, W., Minner, S., Tsourlakis, M. C., Chirico, V., Haese, A., Heinzer, H., Beyer, B., Graefen, M., Michl, U., Salomon, G., Steuber, T., Budäus, L. H., Hekeler, E., Malsy-Mink, J., Kutzera, S., Fraune, C., Göbel, C., Huland, H., Schlomm, T., Eur. Urol. 69, 592–598 (2016).

Source: Google AI Blog


Additional Google Ads scripts workshop event

We had a great turnout at our round of Google Ads scripts workshops over the past few months. We're pleased to announce that due to high demand, we will host one more event in London on December 7, 2018. Please join us for some informative talks and interactive codelabs.

The additional London session will present the same advanced track from the previous set of workshops, which caters to experienced Google Ads scripts users who want to explore advanced scenarios and keep up with the latest developments.

Please visit the event site for full details and to register for this additional session.

We hope to see you there!

Friendsgiving? More like #trendsgiving

Across America, people celebrate next week’s day of thanks with their own unique traditions, and many come to Search for help to pull off Turkey Day like a pro. (In fact, you can simply search “Thanksgiving” and find video tips from expert chefs on how to master mashed potatoes or ideas to elevate a pumpkin pie!)

But there’s a cornucopia of searches you can do, and we took a look at some of the top and trending ideas that people are gobbling up to hone their side dish skills and perfect their pie game.

Pardon me, but how do you plan to prep your turkey?

There are many ways to toast a turkey, but here are the preferred cooking methods across the U.S. this year. While the Eastern Seaboard and some of the Southwest are fond of frying, smoked birds are booming in nearly half the country. Hawaii joins New England in sticking to the classic roasted turkey.

Top turkey cooking methods in each state

Party fowl

As people gear up for one of the year’s biggest get-togethers for family and friends, questions arise about how to make sure that their fowl doesn’t flop and, this year, how to ensure that the bird they select is safe to eat. Here are some top turkey searches trending right now:

Let’s dish on sides

For many, Thanksgiving meals are more about the sweet and salty sides and the gravy train that goes along with them. But making a dinner that satisfies the dietary needs of all of your guests can be a challenge—and many of you are coming to search for ideas to tailor your menus accordingly. Here are a few insights around these diet-specific dish trends:

  • “Keto Thanksgiving” searches are hitting an all-time high in 2018, with “keto Thanksgiving sides” up 70% YoY.

  • “Vegan Thanksgiving” has the highest interest among diet-specific searches, more than double the search interest of “keto Thanksgiving” and “vegetarian Thanksgiving” searches.

  • “Gluten free Thanksgiving” is also on the rise, with searches for “gluten free thanksgiving sides” (+300% YoY), desserts (+100% YoY) and stuffing (+90% YoY) all trending up since last year.

Pie, in charts

Everyone has their own festive favorite when it comes to Thanksgiving Day dessert. Here’s a slice of the top pies people are searching for:


pie chart

Many of these pie preferences can be regional, too. Here are the places where the top pies are poppin’ on Search.

thanksgiving pies

No matter how you celebrate and give thanks, you can find all the ideas you need on Search, get to your friends and family quickly and safely with help from Google Maps, and get to gobbling up that grub.

Source: Search


Update to Customer Match Requirements

We announced an update to Customer Match requirements in October 2018 that affects member uploads and the usage of Customer Match for campaigns on Search, YouTube, and Gmail.
  • Creating a CrmBasedUserList through AdWordsUserListService now requires whitelisting. If the account is not whitelisted, your request returns an ADVERTISER_NOT_WHITELISTED_FOR_USING_UPLOADED_DATA error.
  • Uploading members to a CrmBasedUserList results in a CAN_NOT_MUTATE_SENSITIVE_USERLIST error if the account is not whitelisted.
  • Targeting CrmBasedUserLists in accounts that are not whitelisted using CampaignCriterionService or AdGroupCriterionService results in an INVALID_ID error.
  • Serving campaigns can be affected if they are using Customer Match and the account is not whitelisted. If a campaign is only targeting Customer Match, then the campaign stops serving. If the campaign has other audiences, then the campaign continues to serve with the other audiences.
Please see the requirements for whitelist eligibility. To apply for the whitelist, reach out to your account manager. Once whitelisted, Customer Match no longer results in errors and serves normally without any further changes needed.
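The behavior listed above can be summarized in a small, library-independent sketch. The error reason strings are taken from this announcement; the function names, the dictionary layout and the boolean parameters are hypothetical helpers for illustration and are not part of the AdWords API client libraries. Note that INVALID_ID can also be returned for unrelated reasons, so the message for it is only a hint.

```python
# Error reasons named in the announcement, mapped to the operation that
# triggers them when the account is not whitelisted.
NOT_WHITELISTED_ERRORS = {
    "ADVERTISER_NOT_WHITELISTED_FOR_USING_UPLOADED_DATA":
        "creating a CrmBasedUserList",
    "CAN_NOT_MUTATE_SENSITIVE_USERLIST":
        "uploading members to a CrmBasedUserList",
    "INVALID_ID":
        "targeting a CrmBasedUserList",
}


def explain_error(reason: str) -> str:
    """Return a human-readable hint for an error reason string."""
    operation = NOT_WHITELISTED_ERRORS.get(reason)
    if operation is None:
        return "Error is not one of the Customer Match whitelisting errors."
    return (f"Possible whitelisting issue while {operation}; "
            "check whitelist eligibility with your account manager.")


def campaign_keeps_serving(account_whitelisted: bool,
                           uses_customer_match: bool,
                           has_other_audiences: bool) -> bool:
    """Model the serving rule described in the announcement."""
    if account_whitelisted or not uses_customer_match:
        return True
    # Not whitelisted: only the other audiences, if any, continue to serve.
    return has_other_audiences
```

A caller might pass the reason string from a failed request to `explain_error` before deciding whether to retry or to contact an account manager.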

If you have questions, please reach out to us on the AdWords API forum.

Hispanic Heritage Month Pay It Forward Challenge: Recognizing students making a difference (Part 3 of 3)

In honor of Hispanic Heritage Month (Sept 15 - Oct 15), Google hosted a Pay It Forward Challenge to recognize Latinx/Hispanic student leaders who are advancing opportunities for their local communities. We ended up receiving so many great submissions that we decided to make this a three-part blog series. This is the final piece. We’re excited to share the work of the students below and hope you’ll be inspired by their stories.

ICYMI, be sure to check out Part 1 and Part 2 of this post.

Diana Lee Guzman
Diana Lee Guzman is a recent graduate from New York University with a B.S. in Computer Science. She grew up in Phoenix, AZ in a primarily Latinx community. She is currently the Founder/CEO of Coding in Color and a Software Engineer at Boeing. 
Diana started her non-profit, Coding in Color, with the purpose of providing educational resources to underrepresented students in computing. “Over the past 9 months, I had the pleasure of working alongside two of my amazing high school colleagues, Lirio and Robert, to create a Summer Coding Camp, specifically for our community. We worked alongside our high school administration (Carl Hayden Community High School) where they provided us with a classroom and computers. The course was sponsored by individual members of the community who helped with supplies and providing stipends for students. I taught the course for three weeks where we covered topics such as Web Development, Object Orientated Programming, Robotics, and Artificial Intelligence.”

After the course ended, Diana continued mentoring, and along with her mentee, created websites for two local Latina business owners with businesses catering towards the Spanish speaking community.

What inspires Diana about Hispanic Heritage Month
"What I enjoy the most about Hispanic Heritage month is being able to see all these amazing opportunities being acted on by people just like me, people who speak like me, eat some of the same food as me and listen to the same music as me. I enjoy seeing the celebration of our cultures and accomplishments and it always makes me hopeful that the next generation, next graduating class, next wave of us will be able to accomplish more than we ever have."

Katerina Alvarez
Katerina Alvarez is a Posse Foundation Scholar at Mount Holyoke College studying Statistics and Sociology. Katerina is a Latina civic leader and STEM advocate committed to “engaging purposefully in mutually-beneficial community partnerships to advance social justice, education and community development with tech.”
Through her work as a Mount Holyoke Community-Based Learning (CBL) STEM Fellow for The Care Center, a transformative education program, Katerina helps support and empower young Latinx mothers to complete their high school equivalency exams and pursue higher education and successful careers in tech.

“For the past year, I've recruited over 50 Spanish-speaking tutors and developed an innovative partnership with Makerspace – a laboratory on campus which aims at inspiring and educating underrepresented women in STEM by blending the arts and sciences together to create fun and engaging workshops. As I continue collaborating with The Care Center, I am also mentoring and supporting 30 other CBL Fellows, to help them build successful and sustainable partnerships with their community partners.”

Katerina’s advice to others
"Remember: Never assume, be transparent, and complete a '360 review' frequently, so you may learn from the organization and volunteers about what is and what isn't working."

What inspires Katerina about Hispanic Heritage Month
"Hispanic Heritage Month inspires me because it reminds me of my Cuban grandparents who taught me the importance of perseverance, determination, and believing in the unbelievable. I remember my grandfather, a 90-year old tennis player and painter, vividly sharing their story of love and sacrifice as they immigrated to the US while my wise and practical grandmother fact-checked him along the way. They've empowered me to embrace my roots and live my best life with passion and resilience. This month, and every month, I celebrate Hispanic Heritage Month for my grandparents, Aba y Abu."

Angel Ortega
Angel is a graduate student at The University of Texas at El Paso. He was born and raised in Mexico City before moving to El Paso to seek an education. He is an avid learner with interests in technology, culture, foreign languages, and education. He is also a big Harry Potter fan.
Angel has been a long-time member of the Sol y Agua Project. The Sol y Agua Project aims to attract middle school students, specifically minorities, from the Rio Grande Region into STEM fields and careers with a focus on water sustainability, biodiversity, and the human-impact on the environment. “I combined my passion for technology and education with my background as a minority, international student, and Computer Science major to teach children in El Paso about computing, computational thinking, and water.” His goal is to help and inspire young students to pursue a higher education, ideally in STEM.

Angel is also very active in the Computing Alliance of Hispanic Serving Institutions (CAHSI). He was recently selected as a CAHSI Scholar and currently acts as the CAHSI Student Coordinator for the Google TechExchange Program.

Angel’s advice to others
“Sometimes the things with the most impact are those that seem the least significant. You'll never know how impactful you can be, until you try. Go out there and be the change you want to see in your community.”

Orlando Gil 
Orlando lives in Harlem, New York. He is graduating soon from Baruch College with a concentration in Data Analytics.
On campus, Orlando helped over 30 undocumented students share their stories in a university publication. “My mission is to uplift the contributions of immigrants in American society, and positively shape the rhetoric towards undocumented immigrants. It has taught me the value of owning one's story and using it to combat stereotypes.”

As an intern at the U.S. House of Representatives, Orlando helped lobby on behalf of undocumented students such as himself and “bring light to the various issues faced by those of us who are currently DACA recipients.”

“Although the efforts to pass the DREAM Act, a legislative solution, were not successful, I see far more value in the self-determination of immigrants as natural entrepreneurs. For that reason, I am passionate about helping undocumented entrepreneurs bridge gaps in business and technical expertise.” This is why he has launched his new initiative – Dream Ventures NYC.  “Dream Ventures is a springboard for innovation, education and communal entrepreneurship within the immigrant community. We help ‘UndocuPreneurs’ finance their bold ideas, and pair them with experienced advisors.”

Through advocacy, Orlando has been able to share and perform his writing at various magazine launch events, festivals, and television. 

Orlando’s advice to others
“Know yourself and understand your ties to your own community. It will reinforce your passion for helping and persevering through constant challenges. Also, analyze your available network and see how you can create value. Value is not always determined by the structure of power—one may not have the power to bring about overnight change, but one can gradually and creatively find the resources to do so.”

Andreina Martinez 
Andreina Martinez is currently a senior at The City College of New York majoring in Psychology with an interest in public service. She is from the Dominican Republic and came to the United States in 2010.
Andreina volunteers as a High School Educator with Peer Health Exchange, an organization that wants to provide young kids with the right tools and information to make smart decisions about their health. She previously interned with the New York State Senate, where she focused on constituent casework ranging from housing issues to military benefits. After spending last summer in Washington, D.C., “interning and learning more about the legislative process our nation goes through,” Andreina took on another internship in the New York City Council, where she works to help low-income communities and immigrants.

Andreina’s advice to others
"We don't know who we can impact with our actions and even with our words. It's extremely important to know the value of your voice and your story, when you are able to share those things with the world you will see change in your communities. Give it a try!"


Itzel Tapia
Itzel is currently a junior attending the University of Texas at Dallas full-time on a full scholarship. She is majoring in Software Engineering and Artificial Intelligence with a minor in Cognitive Science. She was born in Dallas, Texas, after her parents immigrated from Mexico. She is the first person in her family to attend college, and has a three-year-old daughter.

When Itzel returned to school in 2016, she began volunteering with Phi Theta Kappa Honor Society in the form of scholarship fundraisers, food drives, and mentorship. This is where she found her passion – helping other students.

Itzel began mentoring classmates on the abundance of resources available to help them succeed. This led to helping with scholarship applications, class registration, major exploration, university admissions, and even tutoring. Eventually she began reaching out to high school juniors and seniors who desperately needed help navigating their last years of high school, in preparation for college.

“It is so common to encounter Hispanic students who are intimidated and thus unsure of whether they should attend college. There is so little help offered in the advising offices, and so many resources that go unused. I’ve always loved research, so collecting a growing list of resources, scholarship opportunities, and the like came naturally – once I knew where to look.”

“First-generation college students cannot count on the experience of our parents to help guide us in our journeys, we rely solely on our own grit, and the few generous mentors we encounter along the way. I felt personally responsible to be that mentor to every student I met who needed help.”

Itzel is also passionate about increasing the interest of girls and women in STEM. She has begun mentoring girls who cannot afford coding/robotics camps, and hopes to inspire them and give them the self-confidence to become the engineers, scientists, and doctors of the future.

What inspires Itzel about Hispanic Heritage Month?
“I am inspired by the stories of other Latinx who come from humble backgrounds and still find their own ways to help our community. It’s a powerful thing to see a mother of four, raising money for scholarships by throwing a Tamalada. It gives great comfort knowing that I’ll never be alone, and that no matter what, someone will always step-in to offer me their help with a few Tamales in tow.”

Keep up with us on social (Twitter, Instagram, Facebook, G+, YouTube) to hear more about our initiatives!

Changes to the Google sign-in interface coming soon

Starting November 27th, 2018, we’ll make some small changes to the appearance of the Google sign-in page. These follow changes made earlier this year, which updated the sign-in page to match the Material Design principles used in other Google products.

Specifically, you might notice outlines around some entry fields, and changes to the spacing and styling of other text on both the web and mobile screens. The changes will start to take effect on November 27th and may take up to two weeks to reach all users.

See the new sign-in UI 

Sign-in page that will start rolling out on November 27, 2018

Sign-in page prior to November 27, 2018


Launch Details 
Release track:
Launching to both Rapid Release and Scheduled Release 

Editions: 
Available to all G Suite editions

Rollout pace: 
Gradual rollout (up to 15 days for feature visibility)

Impact: 
All end users

Action: 
Change management suggested/FYI


Say G’day to the new Google My Business app, and say hello to your next customer

Every month, in Australia, Google drives tens of millions of direct connections between businesses and their customers including calls, online reservations and direction requests.
Enabling these interactions was our goal when we first introduced Google My Business in Australia four years ago, a free tool that helps small business owners reach more people online and connect with their customers through Google, so they can grow their business and spend more of their time doing what they do best—running it.
As a next step in that journey, today we are excited to announce the release of the new Google My Business app, an even easier way for small businesses to turn those direct connections into customers. Let’s look at how one small business, Khamsa, gets new customers with the app, and read on for details about what’s new:


  • A simple, free way to stand out on Google and get new customers with a great Business Profile: With a press of the new Post button in the app, you can upload a photo, create an offer or event and add it right to your Business Profile on Google. You can also manage your business information on Google from the Profile tab and watch your edits appear seamlessly across Search and Maps. 
  • Your customers on Google – all in one place: When people find you on Google, they can connect with you in a number of ways -- calling, messaging, or even leaving a review -- right from your Business Profile. You’ll be able to see all of these customers on Google in one place from the app’s new Customers tab. From here, you can easily respond to customer reviews and post offers to your followers to keep them coming back in the door, and soon you’ll be able to respond to messages right from the app. And because Google My Business is always on the clock, we’ll be sure to notify you when you get a new customer connection. 
  • Keep track of the results that matter: See how many people are finding and connecting with you from your Business Profile on Google. We’ve put your profile results front-and-center on the home screen so you’re always in the know. 
No matter if you’re a dog grooming business in Sydney’s Eastern suburbs, or an Australian Teddy Bear shop in Tambo, people are searching for your business on Google. Turn those searchers into customers with the new, free Google My Business app. Available for download in Google Play or the App Store today.

Combating Potentially Harmful Applications with Machine Learning at Google: Datasets and Models



[Cross-posted from the Android Developers Blog]

In a previous blog post, we talked about using machine learning to combat Potentially Harmful Applications (PHAs). This blog post covers how Google uses machine learning techniques to detect and classify PHAs. We'll discuss the challenges in the PHA detection space, including the scale of data, the correct identification of PHA behaviors, and the evolution of PHA families. Next, we will introduce two of the datasets that make the training and implementation of machine learning models possible: app analysis data and Google Play data. Finally, we will present some of the approaches we use, including logistic regression and deep neural networks.

Using Machine Learning to Scale

Detecting PHAs is challenging and requires a lot of resources. Our security experts need to understand how apps interact with the system and the user, analyze complex signals to find PHA behavior, and evolve their tactics to stay ahead of PHA authors. Every day, Google Play Protect (GPP) analyzes over half a million apps, which makes a lot of new data for our security experts to process.

Leveraging machine learning helps us detect PHAs faster and at a larger scale. We can detect more PHAs just by adding computing resources. In many cases, machine learning can find PHA signals in the training data without human intervention. Sometimes, those signals are different from the signals found by security experts. Machine learning can take better advantage of this data, and discover hidden relationships between signals more effectively.

There are two major parts of Google Play Protect's machine learning protections: the data and the machine learning models.

Data Sources

The quality and quantity of the data used to create a model are crucial to the success of the system. For the purpose of PHA detection and classification, our system mainly uses two anonymous data sources: data from analyzing apps and data from how users experience apps.

App Data

Google Play Protect analyzes every app that it can find on the internet. We created a dataset by decomposing each app's APK and extracting PHA signals with deep analysis. We execute various processes on each app to find particular features and behaviors that are relevant to the PHA categories in scope (for example, SMS fraud, phishing, privilege escalation). Static analysis examines the different resources inside an APK file while dynamic analysis checks the behavior of the app when it's actually running. These two approaches complement each other. For example, dynamic analysis requires the execution of the app regardless of how obfuscated its code is (obfuscation hinders static analysis), and static analysis can help detect cloaking attempts in the code that may in practice bypass dynamic analysis-based detection. In the end, this analysis produces information about the app's characteristics, which serve as a fundamental data source for machine learning algorithms.

Google Play Data

In addition to analyzing each app, we also try to understand how users perceive that app. User feedback (such as the number of installs, uninstalls, user ratings, and comments) collected from Google Play can help us identify problematic apps. Similarly, information about the developer (such as the certificates they use and their history of published apps) contributes valuable knowledge that can be used to identify PHAs. All these metrics are generated when developers submit a new app (or new version of an app) and by millions of Google Play users every day. This information helps us to understand the quality, behavior, and purpose of an app so that we can identify new PHA behaviors or identify similar apps.

In general, our data sources yield raw signals, which then need to be transformed into machine learning features for use by our algorithms. Some signals, such as the permissions that an app requests, have a clear semantic meaning and can be directly used. In other cases, we need to engineer our data to make new, more powerful features. For example, we can aggregate the ratings of all apps that a particular developer owns, so we can calculate a rating per developer and use it to validate future apps. We also employ several techniques to focus on interesting data. To create compact representations for sparse data, we use embedding. To help streamline the data to make it more useful to models, we use feature selection. Depending on the target, feature selection helps us keep the most relevant signals and remove irrelevant ones.
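The per-developer rating described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the sample app and developer identifiers are all hypothetical, and a simple mean stands in for whatever aggregation is used in practice.

```python
from collections import defaultdict
from statistics import mean


def developer_rating_feature(app_ratings, app_developer):
    """Average the ratings of every app owned by each developer.

    Both arguments are hypothetical inputs for this sketch:
    app_ratings maps app id -> rating, app_developer maps app id -> developer id.
    """
    ratings_by_developer = defaultdict(list)
    for app_id, rating in app_ratings.items():
        ratings_by_developer[app_developer[app_id]].append(rating)
    # One engineered feature per developer: the mean rating across their apps.
    return {dev: mean(ratings) for dev, ratings in ratings_by_developer.items()}


# Made-up sample data, not real Google Play signals.
app_ratings = {"app_a": 4.0, "app_b": 2.0, "app_c": 5.0}
app_developer = {"app_a": "dev_1", "app_b": "dev_1", "app_c": "dev_2"}

dev_rating = developer_rating_feature(app_ratings, app_developer)
```

A new app from "dev_1" could then carry the developer's aggregated rating as an input feature before the app has any ratings of its own.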

By combining our different datasets and investing in feature engineering and feature selection, we improve the quality of the data that can be fed to various types of machine learning models.

Models

Building a good machine learning model is like building a skyscraper: quality materials are important, but a great design is also essential. Like the materials in a skyscraper, good datasets and features are important to machine learning, but a great algorithm is essential to identify PHA behaviors effectively and efficiently.

We train models to identify PHAs that belong to a specific category, such as SMS-fraud or phishing. Such categories are quite broad and contain a large number of samples given the number of PHA families that fit the definition. Alternatively, we also have models focusing on a much smaller scale, such as a family, which is composed of a group of apps that are part of the same PHA campaign and that share similar source code and behaviors. On the one hand, having a single model to tackle an entire PHA category may be attractive in terms of simplicity but precision may be an issue as the model will have to generalize the behaviors of a large number of PHAs believed to have something in common. On the other hand, developing multiple PHA models may require additional engineering efforts, but may result in better precision at the cost of reduced scope.

We use a variety of modeling techniques to modify our machine learning approach, including supervised and unsupervised ones.

One supervised technique we use is logistic regression, which has been widely adopted in the industry. These models have a simple structure and can be trained quickly. Logistic regression models can be analyzed to understand the importance of the different PHA and app features they are built with, allowing us to improve our feature engineering process. After a few cycles of training, evaluation, and improvement, we can launch the best models in production and monitor their performance.
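To illustrate why logistic regression is easy to inspect, here is a minimal sketch that trains a model with plain gradient descent and then reads its weights. The two binary features, the toy labels, and the hyperparameters are assumptions made for this example, not the features or settings used in production.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            prediction = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = prediction - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features: [sends_sms_in_background, requests_internet].
# In this toy data only the first feature separates PHAs (1) from benign apps (0).
rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 1]]
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic_regression(rows, labels)
feature_names = ["sends_sms_in_background", "requests_internet"]
# Rank features by the magnitude of their learned weight.
ranked = sorted(zip(feature_names, weights), key=lambda pair: abs(pair[1]), reverse=True)
```

Because each feature has exactly one weight, sorting by magnitude gives a rough view of which signals the model relies on, which is the kind of feedback that can guide feature engineering.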

For more complex cases, we employ deep learning. Compared to logistic regression, deep learning is good at capturing complicated interactions between different features and extracting hidden patterns. The millions of apps in Google Play provide a rich dataset, which is advantageous to deep learning.

In addition to our targeted feature engineering efforts, we experiment with many aspects of deep neural networks. For example, a deep neural network can have multiple layers and each layer has several neurons to process signals. We can experiment with the number of layers and neurons per layer to change model behaviors.
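The effect of changing the number of layers and neurons can be seen in a small, untrained feed-forward network. The sketch below is written in plain Python with random weights. The layer sizes, the ReLU and sigmoid activations, and the helper names are assumptions chosen for illustration, not a description of the networks Google uses.

```python
import math
import random


def build_network(layer_sizes, seed=0):
    """Create random weights for a fully connected network.

    layer_sizes lists the neuron count per layer, input layer first.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, features):
    """Run one forward pass: ReLU in hidden layers, sigmoid at the output."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        sums = [b + sum(w * a for w, a in zip(row, activations))
                for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            activations = [1.0 / (1.0 + math.exp(-s)) for s in sums]
        else:
            activations = [max(0.0, s) for s in sums]
    return activations


def count_parameters(layers):
    return sum(len(weights) * len(weights[0]) + len(biases) for weights, biases in layers)


# Two hypothetical architectures over the same 4 input features.
small = build_network([4, 8, 1])
large = build_network([4, 16, 16, 1])

score = forward(large, [1.0, 0.0, 1.0, 0.0])[0]
```

Here the smaller network has 49 trainable parameters and the larger one has 369, which shows how quickly capacity grows as layers and neurons are added.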

We also adopt unsupervised machine learning methods. Many PHAs use similar abuse techniques and tricks, so they look almost identical to each other. An unsupervised approach helps define clusters of apps that look or behave similarly, which allows us to mitigate and identify PHAs more effectively. We can automate the process of categorizing that type of app if we are confident in the model or can request help from a human expert to validate what the model found.
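One simple way to group apps that look alike is to compare their signal sets and link any pair that is similar enough. The sketch below uses Jaccard similarity with a fixed threshold and a union-find structure. The technique, the threshold of 0.6, and the sample app names and permissions are illustrative assumptions, not the clustering method used by Google Play Protect.

```python
def jaccard(first, second):
    """Share of signals two apps have in common, from 0.0 to 1.0."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0


def cluster_apps(app_signals, threshold=0.6):
    """Group apps whose signal sets are at least `threshold` similar."""
    names = list(app_signals)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(app_signals[first], app_signals[second]) >= threshold:
                parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda cluster: sorted(cluster))


# Made-up apps and permission sets for the example.
apps = {
    "torch_a": {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    "torch_b": {"SEND_SMS", "READ_CONTACTS", "INTERNET", "VIBRATE"},
    "notes": {"INTERNET", "WRITE_STORAGE"},
}

clusters = cluster_apps(apps)
```

The two "torch" apps share three of their four distinct signals and end up in one cluster, while "notes" stays on its own. A cluster that contains a known PHA could then be queued for automated handling or expert review.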

PHAs are constantly evolving, so our models need constant updating and monitoring. In production, models are fed with data from recent apps, which helps them stay relevant. However, new abuse techniques and behaviors need to be continuously detected and fed into our machine learning models to be able to catch new PHAs and stay on top of recent trends. This is a continuous cycle of model creation and updating that also requires tuning to ensure that the precision and coverage of the system as a whole matches our detection goals.
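As a rough sketch of the monitoring step, precision and coverage can be computed from the set of apps a model flags and the set of apps known to be PHAs. Treating coverage as recall over known PHAs is an assumption made here, and the function name and sample identifiers are hypothetical.

```python
def precision_and_coverage(flagged, known_phas):
    """Return (precision, coverage) for one batch of model decisions.

    precision: share of flagged apps that really are PHAs.
    coverage:  share of known PHAs that the model flagged (assumed to mean recall).
    """
    true_positives = len(flagged & known_phas)
    precision = true_positives / len(flagged) if flagged else 0.0
    coverage = true_positives / len(known_phas) if known_phas else 0.0
    return precision, coverage


# Made-up app identifiers for the example.
flagged = {"a", "b", "c", "d"}
known_phas = {"a", "b", "e"}

precision, coverage = precision_and_coverage(flagged, known_phas)
```

Tracking both numbers over time makes the trade-off visible: a model tuned to flag more apps may raise coverage while lowering precision.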

Looking forward

As part of Google's AI-first strategy, our work leverages many machine learning resources across the company, such as tools and infrastructures developed by Google Brain and Google Research. In 2017, our machine learning models successfully detected 60.3% of PHAs identified by Google Play Protect, covering over 2 billion Android devices. We continue to research and invest in machine learning to scale and simplify the detection of PHAs in the Android ecosystem.

Acknowledgements

This work was developed in joint collaboration with Google Play Protect, Safe Browsing and Play Abuse teams with contributions from Andrew Ahn, Hrishikesh Aradhye, Daniel Bali, Hongji Bao, Yajie Hu, Arthur Kaiser, Elena Kovakina, Salvador Mandujano, Melinda Miller, Rahul Mishra, Damien Octeau, Sebastian Porst, Chuangang Ren, Monirul Sharif, Sri Somanchi, Sai Deep Tetali, Zhikun Wang, and Mo Yu.