Tag Archives: PyTorch

PyTorch machine learning models on Android

Posted by Paul Ruiz – Senior Developer Relations Engineer

Earlier this year we launched Google AI Edge, a suite of tools that gives you easy access to ready-to-use ML tasks, frameworks for building ML pipelines, and the ability to run popular LLMs and custom models, all on-device. For AI on Android Spotlight Week, the Google team is highlighting various ways that Android developers can use machine learning to help improve their applications.

In this post, we'll dive into Google AI Edge Torch, which enables you to convert PyTorch models to run locally on Android and other platforms, using the Google AI Edge LiteRT (formerly TensorFlow Lite) and MediaPipe Tasks libraries. For insights on other powerful tools, be sure to explore the rest of the AI on Android Spotlight Week content.

To make it easier to get started with Google AI Edge, we've provided samples on GitHub as an executable codelab. They demonstrate how to convert the MobileViT model for image classification (compatible with MediaPipe Tasks) and the DIS model for segmentation (compatible with LiteRT).

Figure: DIS model output. A red Android figurine is shown next to a black and white silhouette of the same figure, labeled 'Original Image' and 'PT Mask' respectively, demonstrating image segmentation.

This post walks you through using the MobileViT model with MediaPipe Tasks. Keep in mind that the LiteRT runtime provides similar capabilities, enabling you to build custom pipelines and features.

Convert MobileViT model for image classification compatible with MediaPipe Tasks

Once you've installed the necessary dependencies and utilities for your app, the first step is to retrieve the PyTorch model you wish to convert, along with any other MobileViT components you might need (such as an image processor for testing).

from transformers import MobileViTImageProcessor, MobileViTForImageClassification

hf_model_path = 'apple/mobilevit-small'
processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
pt_model = MobileViTForImageClassification.from_pretrained(hf_model_path)

Since the end result of this tutorial should work with MediaPipe Tasks, take an extra step: wrap the model so that its input and output shapes match what the MediaPipe image classification Task expects.

import torch
from torch import nn

class HF2MP_ImageClassificationModelWrapper(nn.Module):

  def __init__(self, hf_image_classification_model, hf_processor):
    super().__init__()
    self.model = hf_image_classification_model
    if hf_processor.do_rescale:
      self.rescale_factor = hf_processor.rescale_factor
    else:
      self.rescale_factor = 1.0

  def forward(self, image: torch.Tensor):
    # BHWC -> BCHW.
    image = image.permute(0, 3, 1, 2)
    # RGB -> BGR.
    image = image.flip(dims=(1,))
    # Scale [0, 255] -> [0, 1].
    image = image * self.rescale_factor
    logits = self.model(pixel_values=image).logits  # [B, 1000] float32.
    # Softmax is required for MediaPipe classification model.
    probabilities = torch.nn.functional.softmax(logits, dim=-1)

    return probabilities

hf_model_path = 'apple/mobilevit-small'
hf_mobile_vit_processor = MobileViTImageProcessor.from_pretrained(hf_model_path)
hf_mobile_vit_model = MobileViTForImageClassification.from_pretrained(hf_model_path)
wrapped_pt_model = HF2MP_ImageClassificationModelWrapper(
    hf_mobile_vit_model, hf_mobile_vit_processor).eval()
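
To make the wrapper's preprocessing concrete, here is a minimal pure-Python sketch that mirrors the three tensor steps in forward (BHWC to BCHW, channel flip, rescale) and the final softmax, using nested lists instead of tensors. The helper names (bhwc_to_bchw, flip_channels, rescale, softmax) and the default factor of 1/255 are illustrative assumptions; the real wrapper reads the factor from the Hugging Face processor.

```python
import math

def bhwc_to_bchw(batch):
    """Reorder [B][H][W][C] nested lists into [B][C][H][W]."""
    out = []
    for image in batch:
        height, width, channels = len(image), len(image[0]), len(image[0][0])
        out.append([
            [[image[h][w][c] for w in range(width)] for h in range(height)]
            for c in range(channels)
        ])
    return out

def flip_channels(batch_bchw):
    """Reverse the channel axis (RGB -> BGR), like image.flip(dims=(1,))."""
    return [image[::-1] for image in batch_bchw]

def rescale(batch_bchw, factor=1 / 255):
    """Scale pixel values, e.g. [0, 255] -> [0, 1]. The default factor is an assumption."""
    return [[[[v * factor for v in row] for row in plane] for plane in image]
            for image in batch_bchw]

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# One 1x2 RGB image: a pure red pixel followed by a pure blue pixel.
batch = [[[[255, 0, 0], [0, 0, 255]]]]
prepared = rescale(flip_channels(bhwc_to_bchw(batch)))
probs = softmax([2.0, 1.0, 0.1])
```

After these steps the first channel plane of prepared holds the blue values, and probs sums to 1 while preserving the ordering of the logits.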

Whether you plan to use the converted MobileViT model with MediaPipe Tasks or LiteRT, the next step is to convert the model to the .tflite format.

First, match the input shape. In this example, the input shape is (1, 256, 256, 3): a batch of one 256x256-pixel, three-channel RGB image in BHWC order.

Then, call AI Edge Torch's convert function to complete the conversion process.

import ai_edge_torch

sample_args = (torch.rand((1, 256, 256, 3)),)
edge_model = ai_edge_torch.convert(wrapped_pt_model, sample_args)
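
Before exporting, it can be worth checking that the converted model reproduces the PyTorch outputs within a tolerance. The sketch below shows only the comparison logic on plain Python lists. The helpers outputs_close and max_abs_diff, their tolerances, and the sample values are illustrative assumptions; in practice you would pass in the flattened outputs of the wrapped PyTorch model and of the converted model for the same sample_args.

```python
def outputs_close(reference, candidate, rtol=1e-5, atol=1e-5):
    """Return True if every pair satisfies |a - b| <= atol + rtol * |a|."""
    if len(reference) != len(candidate):
        return False
    return all(
        abs(a - b) <= atol + rtol * abs(a)
        for a, b in zip(reference, candidate)
    )

def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference, useful for reporting."""
    return max(abs(a - b) for a, b in zip(reference, candidate))

# Hypothetical class probabilities from the two runtimes.
torch_probs = [0.70, 0.20, 0.10]
edge_probs = [0.700001, 0.199999, 0.10]

matches = outputs_close(torch_probs, edge_probs)
worst = max_abs_diff(torch_probs, edge_probs)
```

Reporting the worst-case difference alongside the pass/fail result makes it easier to pick tolerances that suit your model.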

After converting the model, you can further refine it by incorporating metadata for the image classification labels. MediaPipe Tasks uses this metadata to display or return the matching labels after classification.

from mediapipe.tasks.python.metadata.metadata_writers import image_classifier
from mediapipe.tasks.python.metadata.metadata_writers import metadata_writer
from mediapipe.tasks.python.vision.image_classifier import ImageClassifier
from pathlib import Path

flatbuffer_file = Path('hf_mobile_vit_mp_image_classification_raw.tflite')
edge_model.export(flatbuffer_file)
tflite_model_buffer = flatbuffer_file.read_bytes()

# Extract the image classification labels from the HF model for later integration into the TFLite model.
labels = list(hf_mobile_vit_model.config.id2label.values())

writer = image_classifier.MetadataWriter.create(
    tflite_model_buffer,
    input_norm_mean=[0.0],  # Normalization is not needed for this model.
    input_norm_std=[1.0],
    labels=metadata_writer.Labels().add(labels),
)
tflite_model_buffer, _ = writer.populate()
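
To illustrate how label metadata gets used, the sketch below pairs a probability vector with a label list and returns the top results. It is only an illustration of the idea, not MediaPipe's implementation; labels_in_index_order, top_k, and the three-entry id2label mapping are hypothetical. Sorting by class index makes the label order explicit rather than relying on dictionary insertion order.

```python
def labels_in_index_order(id2label):
    """Return labels sorted by integer class index."""
    return [id2label[i] for i in sorted(id2label)]

def top_k(probabilities, labels, k=3):
    """Pair each score with its label and keep the k highest-scoring pairs."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one label per output score")
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical mapping in the style of a Hugging Face config's id2label.
id2label = {2: "tabby cat", 0: "goldfish", 1: "tree frog"}
labels = labels_in_index_order(id2label)
results = top_k([0.05, 0.15, 0.80], labels, k=2)
```

The length check matters in practice: a label list that does not match the model's output size would silently mislabel every prediction.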

With all of that completed, it's time to integrate your model into an Android app. If you're following the official Colab notebook, this involves saving the model locally. For an example of image classification with MediaPipe Tasks, explore the GitHub repository. You can find more information in the official Google AI Edge documentation.

Newly converted ViT model with MediaPipe Tasks

After understanding how to convert a simple image classification model, you can use the same techniques to adapt various PyTorch models for Google AI Edge LiteRT or MediaPipe Tasks tooling on Android.

For further model optimization, consider methods like quantizing during conversion. Check out the GitHub example to learn more about how to convert a PyTorch image segmentation model to LiteRT and quantize it.
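
As a rough picture of what 8-bit quantization does to a tensor, the sketch below applies affine (scale and zero-point) quantization to a few weights and maps them back to floats. This is a generic illustration, not the AI Edge Torch quantization API; the function names and sample weights are assumptions.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Compute a scale and zero point that map [min, max] onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # extend the range so it always includes 0.0
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integer codes clamped to the int8 range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate floats."""
    return [(q - zero_point) * scale for q in codes]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quantization_params(weights)
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error bounded by roughly one quantization step (scale).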

What's Next

To keep up to date on Google AI Edge developments, look for announcements on the Google for Developers YouTube channel and blog.

We look forward to hearing about how you're using these features in your projects. Use the #AndroidAI hashtag to share your feedback or what you've built on social media, and check out other content in AI on Android Spotlight Week!

Source: Android Developers Blog

PJRT: Simplifying ML Hardware and Framework Integration

Infrastructure fragmentation in Machine Learning (ML) across frameworks, compilers, and runtimes makes developing new hardware and toolchains challenging. This inhibits the industry’s ability to quickly productionize ML-driven advancements. To simplify the growing complexity of ML workload execution across hardware and frameworks, we are excited to introduce PJRT and open source it as part of the recently available OpenXLA Project.

PJRT (used in conjunction with OpenXLA’s StableHLO) provides a hardware- and framework-independent interface for compilers and runtimes. It simplifies the integration of hardware with frameworks, accelerating framework coverage for the hardware, and thus hardware targetability for workload execution.

PJRT is the primary interface for TensorFlow and JAX and fully supported for PyTorch, and is well integrated with the OpenXLA ecosystem to execute workloads on TPU, GPU, and CPU. It is also the default runtime execution path for most of Google’s internal production workloads. The toolchain-independent architecture of PJRT allows it to be leveraged by any hardware, framework, or compiler, with extensibility for unique features. With this open-source release, we're excited to allow anyone to begin leveraging PJRT for their own devices.

If you’re developing an ML hardware accelerator or developing your own compiler and runtime, check out the PJRT source code on GitHub and sign up for the OpenXLA mailing list to quickly bootstrap your work.

Vision: Simplifying ML Hardware and Framework Integration

We are entering a world of ambient experiences where intelligent apps and devices surround us, from edge to the cloud, in a range of environments and scales. ML workload execution currently supports a combinatorial matrix of hardware, frameworks, and workflows, mostly through tight vertical integrations. Examples of such vertical integrations include specific kernels for TPU versus GPU, and specific toolchains to train and serve in TensorFlow versus PyTorch. These bespoke 1:1 integrations are perfectly valid solutions but promote lock-in, inhibit innovation, and are expensive to maintain. This problem of a fragmented software stack is compounded over time as different computing hardware needs to be supported.

A variety of ML hardware exists today and hardware diversity is expected to increase in the future. ML users have options and they want to exercise them seamlessly: users want to train a large language model (LLM) on TPU in the Cloud, batch infer on GPU or even CPU, distill, quantize, and finally serve them on mobile processors. Our goal is to solve the challenge of making ML workloads portable across hardware by making it easy to integrate the hardware into the ML infrastructure (framework, compiler, runtime).

Portability: Seamless Execution

The workflow to enable this vision with PJRT is as follows (shown in Figure 1):

1. The hardware-specific compiler and runtime provider implements the PJRT API, packages it as a plugin containing the compiler and runtime hooks, and registers it with the frameworks. The implementation can be opaque to the frameworks.
2. The frameworks discover and load one or more PJRT plugins as dynamic libraries targeting the hardware on which to execute the workload.
3. That’s it! Execute the workload from the framework onto the target hardware.

The PJRT API will be backward compatible, so a plugin should not need to change often, and it can check API versions to detect which features are available.
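
The three steps and the version check can be pictured with a small plugin-registry sketch. This is a conceptual analogy in Python only: the actual PJRT plugin interface is a C API, and every name here (Plugin, ToyCpuPlugin, Framework, register, run, API_VERSION) is hypothetical.

```python
from typing import Callable, Dict, List, Protocol, Tuple

API_VERSION = (1, 0)  # hypothetical (major, minor) version the framework speaks

class Plugin(Protocol):
    """What a hardware provider implements: a compiler hook and a runtime hook."""
    api_version: Tuple[int, int]

    def compile(self, program: str) -> Callable[[List[float]], List[float]]: ...

    def execute(self, executable: Callable[[List[float]], List[float]],
                inputs: List[float]) -> List[float]: ...

class ToyCpuPlugin:
    """Toy backend whose 'compiler' only understands the program 'double'."""
    api_version = (1, 0)

    def compile(self, program):
        if program != "double":
            raise ValueError(f"unsupported program: {program}")
        return lambda xs: [2 * x for x in xs]

    def execute(self, executable, inputs):
        return executable(inputs)

class Framework:
    """Holds registered plugins and dispatches work without knowing their internals."""

    def __init__(self):
        self._plugins: Dict[str, Plugin] = {}

    def register(self, device: str, plugin: Plugin) -> None:
        # Version check: accept only plugins with the same major version.
        if plugin.api_version[0] != API_VERSION[0]:
            raise RuntimeError(f"incompatible plugin version for {device}")
        self._plugins[device] = plugin

    def run(self, device: str, program: str, inputs: List[float]) -> List[float]:
        plugin = self._plugins[device]
        return plugin.execute(plugin.compile(program), inputs)

framework = Framework()
framework.register("toy_cpu", ToyCpuPlugin())
result = framework.run("toy_cpu", "double", [1.0, 2.5])
```

The framework code never inspects how ToyCpuPlugin compiles or executes, which is the sense in which a plugin implementation can stay opaque.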

Figure 1: To target specific hardware, provide an implementation of the PJRT API to package a compiler and runtime plugin that can be called by the framework.

Cohesive Ecosystem

As a foundational pillar of the OpenXLA Project, PJRT is well-integrated with projects within the OpenXLA Project including StableHLO and the OpenXLA compilers (XLA, IREE). It is the primary interface for TensorFlow and JAX and fully supported for PyTorch through PyTorch/XLA. It provides the hardware interface layer in solving the combinatorial framework x hardware ML infrastructure fragmentation (see Figure 2).

Diagram of PJRT hardware interface layer

Figure 2: PJRT provides the hardware interface layer in solving the combinatorial framework x hardware ML infrastructure fragmentation, well-integrated with OpenXLA.
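
The fragmentation argument comes down to simple counting: with bespoke 1:1 integrations, every framework-hardware pair needs its own glue, while a shared interface needs one adapter per framework and one plugin per hardware target. The sketch below makes that arithmetic explicit. The function names and the framework and hardware lists are illustrative, not an inventory of real integrations.

```python
from itertools import product

def bespoke_integrations(frameworks, hardware):
    """One integration per (framework, hardware) pair."""
    return list(product(frameworks, hardware))

def shared_interface_integrations(frameworks, hardware):
    """One adapter per framework plus one plugin per hardware target."""
    return [f"{f} adapter" for f in frameworks] + [f"{h} plugin" for h in hardware]

frameworks = ["TensorFlow", "JAX", "PyTorch"]
hardware = ["TPU", "GPU", "CPU", "new accelerator"]

pairwise = bespoke_integrations(frameworks, hardware)         # 3 * 4 pairs
shared = shared_interface_integrations(frameworks, hardware)  # 3 + 4 components
```

Adding one more hardware target grows the pairwise list by one entry per framework, but grows the shared list by a single plugin.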

Toolchain Independent

PJRT is hardware and framework independent. With framework integration through the self-contained IR StableHLO, PJRT is not coupled with a specific compiler, and can be used outside of the OpenXLA ecosystem, including with other proprietary compilers. The public availability and toolchain-independent architecture allows it to be used by any hardware, framework or compiler, with extensibility for unique features. If you are developing an ML hardware accelerator, compiler, or runtime targeting any hardware, or converging siloed toolchains to solve infrastructure fragmentation, PJRT can minimize bespoke hardware and framework integration, providing greater coverage and improving time-to-market at lower development cost.

Driving Impact with Collaboration

Industry partners, including Intel, have already adopted PJRT.

Intel

Intel is leveraging PJRT in Intel® Extension for TensorFlow to provide the Intel GPU backend for TensorFlow and JAX. This implementation is based on the PJRT plugin mechanism (see RFC). Check out how this greatly simplifies the framework and hardware integration with this example of executing a JAX program on Intel GPU.

"At Intel, we share Google's vision of modular interfaces to make integration easier and enable faster, framework-independent development. Similar in design to the PluggableDevice mechanism, PJRT is a pluggable interface that allows us to easily compile and execute XLA's High Level Operations on Intel devices. Its simple design allowed us to quickly integrate it into our systems and start running JAX workloads on Intel® GPUs within just a few months. PJRT enables us to more efficiently deliver hardware acceleration and oneAPI-powered AI software optimizations to developers using a wide range of AI Frameworks." - Wei Li, VP and GM, Artificial Intelligence and Analytics, Intel.

Technology Leader

We’re also working with a technology leader to leverage PJRT to provide the backend targeting their proprietary processor for JAX. More details on this to follow soon.

Get Involved

PJRT is available on GitHub: source code for the API, a reference openxla-pjrt-plugin, and integration guides. If you develop ML frameworks, compilers, or runtimes, or are interested in improving portability of workloads across hardware, we want your feedback. We encourage you to contribute code, design ideas, and feature suggestions. We also invite you to join the OpenXLA mailing list to stay updated with the latest product and community announcements and to help shape the future of an interoperable ML infrastructure.

Acknowledgements

Allen Hutchison, Andrew Leaver, Chuanhao Zhuge, Jack Cao, Jacques Pienaar, Jieying Luo, Penporn Koanantakool, Peter Hawkins, Robert Hundt, Russell Power, Sagarika Chalasani, Skye Wanderman-Milne, Stella Laurenzo, Will Cromar, Xiao Yu.

By Aman Verma, Product Manager, Machine Learning Infrastructure

Source: Google Open Source Blog

Accelerate your models to production with Google Cloud and PyTorch

We believe in the power of choice for Machine Learning development, and continue to invest resources to make it easy for ML practitioners to train, deploy, and orchestrate models from a single unified data and AI cloud platform. We’re excited to announce our role as a founding member of the newly formed PyTorch Foundation, which will better position Google Cloud to make meaningful contributions to the PyTorch community. As a member of the board, we will deepen our open source investment to deliver on the Foundation’s mission to drive adoption of AI tooling by building an ecosystem of open source projects with PyTorch. We strongly believe in choice and will continue to invest in frameworks such as JAX and TensorFlow and support integrations with other OSS projects including Spark, Airflow, XGBoost, and others.

In this blog, we provide an overview of existing resources to help you get started with PyTorch on Google Cloud. We also talk about how ML practitioners can leverage our end-to-end ML platform to train, tune, and deploy PyTorch models.

PyTorch on Google Cloud

Open source in the cloud is important because it gives you flexibility and control over where you train and deploy your ML workloads. PyTorch is extensively used in the research space, and in recent years it has gained immense traction in the industry due to its ease of use and deployment. In fact, according to a survey of Kaggle users, PyTorch is the fastest-growing ML framework today.

ML practitioners using PyTorch tell us that it can be challenging to advance their ML project past experimentation. This is why Google Cloud has built integrations with PyTorch that make it easier to train, deploy, and orchestrate models in production. Some examples are:

PyTorch integrates directly with Vertex AI, a fully managed ML platform that provides the tools you need to take a model from PyTorch to production, like the PyTorch DL containers or the Vertex AI Workbench PyTorch one-click JupyterLab environment.
PyTorch/XLA, an open source library, uses the XLA deep learning compiler to enable PyTorch to run on Cloud TPUs. Cloud TPUs are custom accelerators designed by Google, optimized for perf/TCO on large-scale ML workloads. PyTorch/XLA also enables XLA-driven optimizations on GPUs.
TorchX provides an adapter to run and orchestrate TorchX components as part of Kubeflow Pipelines that you can easily scale on Vertex AI Pipelines.
With our OSS contributions to Apache Beam, we have made PyTorch models easy to deploy in batch or streaming data processing pipelines. Running on Google Dataflow, these pipelines will scale to very large workloads in a fully managed and simple-to-maintain environment.

To learn more and start using PyTorch on Google Cloud, check out the resources below:

PyTorch on Vertex AI Resources

How to train and tune PyTorch models on Vertex AI: Learn how to use Vertex AI Training to build and train a sentiment text classification model using PyTorch and Vertex AI Hyperparameter Tuning to tune hyperparameters of PyTorch models.
How to deploy PyTorch models on Vertex AI: Walk through the deployment of a PyTorch model using TorchServe as a custom container, by deploying the model artifacts to a Vertex Prediction service.
Orchestrating PyTorch ML Workflows on Vertex AI Pipelines: See how to build and orchestrate ML pipelines for training and deploying PyTorch models on Google Cloud Vertex AI using Vertex AI Pipelines.
Scalable ML Workflows using PyTorch on Kubeflow Pipelines and Vertex Pipelines: Take a look at examples of PyTorch-based ML workflows on two pipelines frameworks: OSS Kubeflow Pipelines, part of the Kubeflow project, and Vertex AI Pipelines. We share new PyTorch built-in components added to the Kubeflow Pipelines.

PyTorch/XLA and Cloud TPU/GPU

Scaling deep learning workloads with PyTorch / XLA and Cloud TPU VM: Describes the challenges of scaling deep learning jobs to distributed training settings using the Cloud TPU VM, and shows how to stream training data from Google Cloud Storage (GCS) to PyTorch / XLA models running on Cloud TPU Pod slices.
PyTorch/XLA: Performance debugging on Cloud TPU VM: Part I: In the first part of the performance debugging series on Cloud TPU, we lay out the conceptual framework for PyTorch/XLA in the context of training performance. We introduce a case study to make sense of preliminary profiler logs and identify the corrective actions.
PyTorch/XLA: Performance debugging on Cloud TPU VM: Part II: In the second part, we deep dive into further analysis of the performance debugging to discover more performance improvement opportunities.
PyTorch/XLA: Performance debugging on Cloud TPU VM: Part III: In the final part of the performance debugging series, we introduce user defined code annotation and visualize these annotations in the form of a trace.
Train ML models with PyTorch Lightning on TPUs: Learn how easy it is to start training models with PyTorch Lightning on TPUs with its built-in TPU support.

Other resources

Increase your productivity using PyTorch Lightning: Learn how to use PyTorch Lightning on Vertex AI Workbench (previously Notebooks).

By Erwin Huizing and Grace Reed – Cloud AI and ML

googblogs.com

All Google blogs and Press in one site

Source: Google Open Source Blog