Tag Archives: google play protect

Google Play Protect in 2018: New updates to keep Android users secure

Posted by Rahul Mishra and Tom Watkins, Android Security & Privacy Team

In 2018, Google Play Protect made Android devices running Google Play some of the most secure smartphones available, scanning over 50 billion apps everyday for harmful behaviour.

Android devices can genuinely improve people's lives through our accessibility features, Google Assistant, digital wellbeing, Family Link, and more — but we can only do this if they are safe and secure enough to earn users' long term trust. This is Google Play Protect's charter and we're encouraged by this past year's advancements.

Google Play Protect, a refresher

Google Play Protect is the technology we use to ensure that any device shipping with the Google Play Store is secured against potentially harmful applications (PHA). It is made up of a giant backend scanning engine to aid our analysts in sourcing and vetting applications made available on the Play Store, and built-in protection that scans apps on users' devices, immobilizing PHA and warning users.

This technology protects over 2 billion devices in the Android ecosystem every day.

What's new

On by default

We strongly believe that security should be a built-in feature of every device, not something a user needs to find and enable. When security features function at their best, most users do not need to be aware of them. To this end, we are pleased to announce that Google Play Protect is now enabled by default to secure all new devices, right out of the box. The user is notified that Google Play Protect is running, and has the option to turn it off whenever desired.

New and rare apps

Android is deployed in many diverse ways across many different users. We know that the ecosystem would not be as powerful and vibrant as it is today without an equally diverse array of apps to choose from. But installing new apps, especially from unknown sources, can carry risk.

Last year we launched a new feature that notifies users when they are installing new or rare apps that are rarely installed in the ecosystem. In these scenarios, the feature shows a warning, giving users pause to consider whether they want to trust this app, and advising them to take additional care and check the source of installation. Once Google has fully analyzed the app and determined that it is not harmful, the notification will no longer display. In 2018, this warning showed around 100,000 times per day

Context is everything: warning users on launch

It's easy to misunderstand alerts when presented out of context. We're trained to click through notifications without reading them and get back to what we were doing as quickly as possible. We know that providing timely and context-sensitive alerts to users is critical for them to be of value. We recently enabled a security feature first introduced in Android Oreo which warns users when they are about to launch a potentially harmful app on their device.

This new warning dialog provides in-context information about which app the user is about to launch, why we think it may be harmful and what might happen if they open the app. We also provide clear guidance on what to do next. These in-context dialogs ensure users are protected even if they accidentally missed an alert.

Auto-disabling apps

Google Play Protect has long been able to disable the most harmful categories of apps on users devices automatically, providing robust protection where we believe harm will be done.

In 2018, we extended this coverage to apps installed from Play that were later found to have violated Google Play's policies, e.g. on privacy, deceptive behavior or content. These apps have been suspended and removed from the Google Play Store.

This does not remove the app from user device, but it does notify the user and prevents them from opening the app accidentally. The notification gives the option to remove the app entirely.

Keeping the Android ecosystem secure is no easy task, but we firmly believe that Google Play Protect is an important security layer that's used to protect users devices and their data while maintaining the freedom, diversity and openness that makes Android, well, Android.

Acknowledgements: This post leveraged contributions from Meghan Kelly and William Luh.

How we fought bad apps and malicious developers in 2018

Posted by Andrew Ahn, Product Manager, Google Play

Google Play is committed to providing a secure and safe platform for billions of Android users on their journey discovering and experiencing the apps they love and enjoy. To deliver against this commitment, we worked last year to improve our abuse detection technologies and systems, and significantly increased our team of product managers, engineers, policy experts, and operations leaders to fight against bad actors.

In 2018, we introduced a series of new policies to protect users from new abuse trends, detected and removed malicious developers faster, and stopped more malicious apps from entering the Google Play Store than ever before. The number of rejected app submissions increased by more than 55 percent, and we increased app suspensions by more than 66 percent. These increases can be attributed to our continued efforts to tighten policies to reduce the number of harmful apps on the Play Store, as well as our investments in automated protections and human review processes that play critical roles in identifying and enforcing on bad apps.

In addition to identifying and stopping bad apps from entering the Play Store, our Google Play Protect system now scans over 50 billion apps on users' devices each day to make sure apps installed on the device aren't behaving in harmful ways. With such protection, apps from Google Play are eight times less likely to harm a user's device than Android apps from other sources.

Here are some areas we've been focusing on in the last year and that will continue to be a priority for us in 2019:

Protecting User Privacy

Protecting users' data and privacy is a critical factor in building user trust. We've long required developers to limit their device permission requests to what's necessary to provide the features of an app. Also, to help users understand how their data is being used, we've required developers to provide prominent disclosures about the collection and use of sensitive user data. Last year, we rejected or removed tens of thousands of apps that weren't in compliance with Play's policies related to user data and privacy.

In October 2018, we announced a new policy restricting the use of the SMS and Call Log permissions to a limited number of cases, such as where an app has been selected as the user's default app for making calls or sending text messages. We've recently started to remove apps from Google Play that violate this policy. We plan to introduce additional policies for device permissions and user data throughout 2019.

Developer integrity

We find that over 80% of severe policy violations are conducted by repeat offenders and abusive developer networks. When malicious developers are banned, they often create new accounts or buy developer accounts on the black market in order to come back to Google Play. We've further enhanced our clustering and account matching technologies, and by combining these technologies with the expertise of our human reviewers, we've made it more difficult for spammy developer networks to gain installs by blocking their apps from being published in the first place.

Harmful app contents and behaviors

As mentioned in last year's blog post, we fought against hundreds of thousands of impersonators, apps with inappropriate content, and Potentially Harmful Applications (PHAs). In a continued fight against these types of apps, not only do we apply advanced machine learning models to spot suspicious apps, we also conduct static and dynamic analyses, intelligently use user engagement and feedback data, and leverage skilled human reviews, which have helped in finding more bad apps with higher accuracy and efficiency.

Despite our enhanced and added layers of defense against bad apps, we know bad actors will continue to try to evade our systems by changing their tactics and cloaking bad behaviors. We will continue to enhance our capabilities to counter such adversarial behavior, and work relentlessly to provide our users with a secure and safe app store.

How useful did you find this blog post?

Combating Potentially Harmful Applications with Machine Learning at Google: Datasets and Models

Posted by Mo Yu, Android Security & Privacy Team

In a previous blog post, we talked about using machine learning to combat Potentially Harmful Applications (PHAs). This blog post covers how Google uses machine learning techniques to detect and classify PHAs. We'll discuss the challenges in the PHA detection space, including the scale of data, the correct identification of PHA behaviors, and the evolution of PHA families. Next, we will introduce two of the datasets that make the training and implementation of machine learning models possible, such as app analysis data and Google Play data. Finally, we will present some of the approaches we use, including logistic regression and deep neural networks.

Using machine learning to scale

Detecting PHAs is challenging and requires a lot of resources. Our security experts need to understand how apps interact with the system and the user, analyze complex signals to find PHA behavior, and evolve their tactics to stay ahead of PHA authors. Every day, Google Play Protect (GPP) analyzes over half a million apps, which makes a lot of new data for our security experts to process.

Leveraging machine learning helps us detect PHAs faster and at a larger scale. We can detect more PHAs just by adding additional computing resources. In many cases, machine learning can find PHA signals in the training data without human intervention. Sometimes, those signals are different than signals found by security experts. Machine learning can take better advantage of this data, and discover hidden relationships between signals more effectively.

There are two major parts of Google Play Protect's machine learning protections: the data and the machine learning models.

Data sources

The quality and quantity of the data used to create a model are crucial to the success of the system. For the purpose of PHA detection and classification, our system mainly uses two anonymous data sources: data from analyzing apps and data from how users experience apps.

App data

Google Play Protect analyzes every app that it can find on the internet. We created a dataset by decomposing each app's APK and extracting PHA signals with deep analysis. We execute various processes on each app to find particular features and behaviors that are relevant to the PHA categories in scope (for example, SMS fraud, phishing, privilege escalation). Static analysis examines the different resources inside an APK file while dynamic analysis checks the behavior of the app when it's actually running. These two approaches complement each other. For example, dynamic analysis requires the execution of the app regardless of how obfuscated its code is (obfuscation hinders static analysis), and static analysis can help detect cloaking attempts in the code that may in practice bypass dynamic analysis-based detection. In the end, this analysis produces information about the app's characteristics, which serve as a fundamental data source for machine learning algorithms.

Google Play data

In addition to analyzing each app, we also try to understand how users perceive that app. User feedback (such as the number of installs, uninstalls, user ratings, and comments) collected from Google Play can help us identify problematic apps. Similarly, information about the developer (such as the certificates they use and their history of published apps) contribute valuable knowledge that can be used to identify PHAs. All these metrics are generated when developers submit a new app (or new version of an app) and by millions of Google Play users every day. This information helps us to understand the quality, behavior, and purpose of an app so that we can identify new PHA behaviors or identify similar apps.

In general, our data sources yield raw signals, which then need to be transformed into machine learning features for use by our algorithms. Some signals, such as the permissions that an app requests, have a clear semantic meaning and can be directly used. In other cases, we need to engineer our data to make new, more powerful features. For example, we can aggregate the ratings of all apps that a particular developer owns, so we can calculate a rating per developer and use it to validate future apps. We also employ several techniques to focus in on interesting data.To create compact representations for sparse data, we use embedding. To help streamline the data to make it more useful to models, we use feature selection. Depending on the target, feature selection helps us keep the most relevant signals and remove irrelevant ones.

By combining our different datasets and investing in feature engineering and feature selection, we improve the quality of the data that can be fed to various types of machine learning models.

Models

Building a good machine learning model is like building a skyscraper: quality materials are important, but a great design is also essential. Like the materials in a skyscraper, good datasets and features are important to machine learning, but a great algorithm is essential to identify PHA behaviors effectively and efficiently.

We train models to identify PHAs that belong to a specific category, such as SMS-fraud or phishing. Such categories are quite broad and contain a large number of samples given the number of PHA families that fit the definition. Alternatively, we also have models focusing on a much smaller scale, such as a family, which is composed of a group of apps that are part of the same PHA campaign and that share similar source code and behaviors. On the one hand, having a single model to tackle an entire PHA category may be attractive in terms of simplicity but precision may be an issue as the model will have to generalize the behaviors of a large number of PHAs believed to have something in common. On the other hand, developing multiple PHA models may require additional engineering efforts, but may result in better precision at the cost of reduced scope.

We use a variety of modeling techniques to modify our machine learning approach, including supervised and unsupervised ones.

One supervised technique we use is logistic regression, which has been widely adopted in the industry. These models have a simple structure and can be trained quickly. Logistic regression models can be analyzed to understand the importance of the different PHA and app features they are built with, allowing us to improve our feature engineering process. After a few cycles of training, evaluation, and improvement, we can launch the best models in production and monitor their performance.

For more complex cases, we employ deep learning. Compared to logistic regression, deep learning is good at capturing complicated interactions between different features and extracting hidden patterns. The millions of apps in Google Play provide a rich dataset, which is advantageous to deep learning.

In addition to our targeted feature engineering efforts, we experiment with many aspects of deep neural networks. For example, a deep neural network can have multiple layers and each layer has several neurons to process signals. We can experiment with the number of layers and neurons per layer to change model behaviors.

We also adopt unsupervised machine learning methods. Many PHAs use similar abuse techniques and tricks, so they look almost identical to each other. An unsupervised approach helps define clusters of apps that look or behave similarly, which allows us to mitigate and identify PHAs more effectively. We can automate the process of categorizing that type of app if we are confident in the model or can request help from a human expert to validate what the model found.

PHAs are constantly evolving, so our models need constant updating and monitoring. In production, models are fed with data from recent apps, which help them stay relevant. However, new abuse techniques and behaviors need to be continuously detected and fed into our machine learning models to be able to catch new PHAs and stay on top of recent trends. This is a continuous cycle of model creation and updating that also requires tuning to ensure that the precision and coverage of the system as a whole matches our detection goals.

Looking forward

As part of Google's AI-first strategy, our work leverages many machine learning resources across the company, such as tools and infrastructures developed by Google Brain and Google Research. In 2017, our machine learning models successfully detected 60.3% of PHAs identified by Google Play Protect, covering over 2 billion Android devices. We continue to research and invest in machine learning to scale and simplify the detection of PHAs in the Android ecosystem.

Acknowledgements

This work was developed in joint collaboration with Google Play Protect, Safe Browsing and Play Abuse teams with contributions from Andrew Ahn, Hrishikesh Aradhye, Daniel Bali, Hongji Bao, Yajie Hu, Arthur Kaiser, Elena Kovakina, Salvador Mandujano, Melinda Miller, Rahul Mishra, Damien Octeau, Sebastian Porst, Chuangang Ren, Monirul Sharif, Sri Somanchi, Sai Deep Tetali, Zhikun Wang, and Mo Yu.

Keeping 2 billion Android devices safe with machine learning

Posted by Sai Deep Tetali, Software Engineer, Google Play Protect

At Google I/O 2017, we introduced Google Play Protect, our comprehensive set of security services for Android. While the name is new, the smarts powering Play Protect have protected Android users for years.

Google Play Protect's suite of mobile threat protections are built into more than 2 billion Android devices, automatically taking action in the background. We're constantly updating these protections so you don't have to think about security: it just happens. Our protections have been made even smarter by adding machine learning elements to Google Play Protect.

Security at scale

Google Play Protect provides in-the-moment protection from potentially harmful apps (PHAs), but Google's protections start earlier.

Before they're published in Google Play, all apps are rigorously analyzed by our security systems and Android security experts. Thanks to this process, Android devices that only download apps from Google Play are 9 times less likely to get a PHA than devices that download apps from other sources.

After you install an app, Google Play Protect continues its quest to keep your device safe by regularly scanning your device to make sure all apps are behaving properly. If it finds an app that is misbehaving, Google Play Protect either notifies you, or simply removes the harmful app to keep your device safe.

Our systems scan over 50 billion apps every day. To keep on the cutting edge of security, we look for new risks in a variety of ways, such as identifying specific code paths that signify bad behavior, investigating behavior patterns to correlate bad apps, and reviewing possible PHAs with our security experts.

In 2016, we added machine learning as a new detection mechanism and it soon became a critical part of our systems and tools.

Training our machines

In the most basic terms, machine learning means training a computer algorithm to recognize a behavior. To train the algorithm, we give it hundreds of thousands of examples of that behavior.

In the case of Google Play Protect, we are developing algorithms that learn which apps are "potentially harmful" and which are "safe." To learn about PHAs, the machine learning algorithms analyze our entire catalog of applications. Then our algorithms look at hundreds of signals combined with anonymized data to compare app behavior across the Android ecosystem to find PHAs. They look for behavior common to PHAs, such as apps that attempt to interact with other apps on the device, access or share your personal data, download something without your knowledge, connect to phishing websites, or bypass built-in security features.

When we find apps exhibit similar malicious behavior, we group them into families. Visualizing these PHA families helps us uncover apps that share similarities to known bad apps, but have yet remained under our radar.

After we identify a new PHA, we confirm our findings with expert security reviews. If the app in question is a PHA, Google Play Protect takes action on the app and then we feed information about that PHA back into our algorithms to help find more PHAs.

Doubling down on security

So far, our machine learning systems have successfully detected 60.3% of the malware identified by Google Play Protect in 2017.

In 2018, we're devoting a massive amount of computing power and talent to create, maintain and improve these machine learning algorithms. We're constantly leveraging artificial intelligence and our highly skilled researchers and engineers from all across Google to find new ways to keep Android devices safe and secure. In addition to our talented team, we work with the foremost security experts and researchers from around the world. These researchers contribute even more data and insights to keep Google Play Protect on the cutting edge of mobile security.

To check out Google Play Protect, open the Google Play app and tap Play Protect in the left panel.

Acknowledgements: This work was developed in joint collaboration with Google Play Protect, Safe Browsing and Play Abuse teams with contributions from Andrew Ahn, Hrishikesh Aradhye, Daniel Bali, Hongji Bao, Yajie Hu, Arthur Kaiser, Elena Kovakina, Salvador Mandujano, Melinda Miller, Rahul Mishra, Damien Octeau, Sebastian Porst, Chuangang Ren, Monirul Sharif, Sri Somanchi, Sai Deep Tetali, Zhikun Wang, and Mo Yu.

Protecting WebView with Safe Browsing

Posted by Nate Fischer, Software Engineer

Since 2007, Google Safe Browsing has been protecting users across the web from phishing and malware attacks. It protects over three billion devices from an increasing number of threats, now also including unwanted software across desktop and mobile platforms. Today, we're announcing that Google Play Protect is bringing Safe Browsing to WebView by default, starting in April 2018 with the release of WebView 66.

Developers of Android apps using WebView no longer have to make any changes to benefit from this protection. Safe Browsing in WebView has been available since Android 8.0 (API level 26), using the same underlying technology as Chrome on Android. When Safe Browsing is triggered, the app will present a warning and receive a network error. Apps built for API level 27 and above can customize this behavior with new APIs for Safe Browsing.

An example of a warning shown when Safe Browsing detects a dangerous site. The style and content of the warning will vary depending on the size of the WebView.

You can learn more about customizing and controlling Safe Browsing in the Android API documentation, and you can test your application today by visiting the Safe Browsing test URL (chrome://safe-browsing/match?type=malware) while using the current WebView beta.

SafetyNet Verify Apps API, Google Play Protect at your fingertips

Posted by William Luh, Software Engineer

Google Play Protect, which includes the Verify Apps security feature, helps keep users safe from harmful apps. Google Play Protect is available on all Android devices with Google Play installed and provides users with peace of mind and insights into the state of their device security.

App developers can get similar security insights into the installed apps landscape on user devices from the SafetyNet Verify Apps API. This new suite of APIs lets developers determine whether a user's device is protected by Google Play Protect, encourage users not already using Google Play Protect to enable it, and identify any known potentially harmful apps (PHAs) that are installed on the device.

These APIs are especially useful for developers of apps that may be impacted by installed PHAs on the same device as their app. Determining that Google Play Protect is enabled with isVerifyAppsEnabled() gives developers additional assurance that a device is more likely to be clean. If a device doesn't have Google Play Protect enabled, developers can request that the user enable Google Play Protect with enableVerifyApps(). With Google Play Protect enabled, developers can use the listHarmfulApps() method to determine whether there are any potentially harmful apps installed on a user's device. This easy-to-use suite of features does not require API keys and requesting quota.

Enterprise-focused apps in particular may benefit from using the Verify Apps API. Enterprise apps are designed to safeguard a company's data from the outside world. These apps often implement strict enforcements, such as ensuring the mobile device is approved by the enterprise and requiring a strong password for lockscreens. If any of the criteria are not satisfied, the enterprise may revoke credentials and remove sensitive data from the device. Having a mechanism to enforce Google Play Protect and scan for PHAs is another tool to help enterprise app developers keep enterprise data and devices safe.

For better protection, developers should use the attestation API along with the new Verify Apps API. Use the attestation API first to establish that the device has not been modified from a known state. Once the Android system can be trusted, the results from the Verify Apps API can be trusted. Existing attestation API users may find additional benefits in using the Verify Apps API as it may be able to detect on-device PHAs. In general, using multiple signals for anti-abuse detection is encouraged.

To learn how to use this API in your app, check out the developer docs.