Tag Archives: transformer

Common media processing operations with Jetpack Media3 Transformer

Posted by Nevin Mital – Developer Relations Engineer, and Kristina Simakova – Engineering Manager

Android users have demonstrated an increasing desire to create, personalize, and share video content online, whether to preserve their memories or to make people laugh. As such, media editing is a cornerstone of many engaging Android apps, and historically developers have often relied on external libraries to handle operations such as Trimming and Resizing. While these solutions are powerful, integrating and managing external library dependencies can introduce complexity and lead to challenges with managing performance and quality.

The Jetpack Media3 Transformer APIs offer a native Android solution that streamline media editing with fast performance, extensive customizability, and broad device compatibility. In this blog post, we’ll walk through some of the most common editing operations with Transformer and discuss its performance.

Getting set up with Transformer

To get started with Transformer, check out our Getting Started documentation for details on how to add the dependency to your project and a basic understanding of the workflow when using Transformer. In a nutshell, you’ll:

    • Create one or many MediaItem instances from your video file(s), then
    • Apply item-specific edits to them by building an EditedMediaItem for each MediaItem,
    • Create a Transformer instance configured with settings applicable to the whole exported video,
    • and finally start the export to save your applied edits to a file.
Aside: You can also use a CompositionPlayer to preview your edits before exporting, but this is out of scope for this blog post, as this API is still a work in progress. Please stay tuned for a future post!

Here’s what this looks like in code:

val mediaItem = MediaItem.Builder().setUri(mediaItemUri).build()
val editedMediaItem = EditedMediaItem.Builder(mediaItem).build()
val transformer = 
  Transformer.Builder(context)
    .addListener(/* Add a Transformer.Listener instance here for completion events */)
    .build()
transformer.start(editedMediaItem, outputFilePath)

Transcoding, Trimming, Muting, and Resizing with the Transformer API

Let’s now take a look at four of the most common single-asset media editing operations, starting with Transcoding.

Transcoding is the process of re-encoding an input file into a specified output format. For this example, we’ll request the output to have video in HEVC (H265) and audio in AAC. Starting with the code above, here are the lines that change:

val transformer = 
  Transformer.Builder(context)
    .addListener(...)
    .setVideoMimeType(MimeTypes.VIDEO_H265)
    .setAudioMimeType(MimeTypes.AUDIO_AAC)
    .build()

Many of you may already be familiar with FFmpeg, a popular open-source library for processing media files, so we’ll also include FFmpeg commands for each example to serve as a helpful reference. Here’s how you can perform the same transcoding with FFmpeg:

$ ffmpeg -i $inputVideoPath -c:v libx265 -c:a aac $outputFilePath

The next operation we’ll try is Trimming.

Specifically, we’ll set Transformer up to trim the input video from the 3 second mark to the 8 second mark, resulting in a 5 second output video. Starting again from the code in the “Getting set up” section above, here are the lines that change:

// Configure the trim operation by adding a ClippingConfiguration to
// the media item
val clippingConfiguration =
   MediaItem.ClippingConfiguration.Builder()
     .setStartPositionMs(3000)
     .setEndPositionMs(8000)
     .build()
val mediaItem =
   MediaItem.Builder()
     .setUri(mediaItemUri)
     .setClippingConfiguration(clippingConfiguration)
     .build()

// Transformer also has a trim optimization feature we can enable.
// This will prioritize Transmuxing over Transcoding where possible.
// See more about Transmuxing further down in this post.
val transformer = 
  Transformer.Builder(context)
    .addListener(...)
    .experimentalSetTrimOptimizationEnabled(true)
    .build()

With FFmpeg:

$ ffmpeg -ss 00:00:03 -i $inputVideoPath -t 00:00:05 $outputFilePath

Next, we can mute the audio in the exported video file.

val editedMediaItem = 
  EditedMediaItem.Builder(mediaItem)
    .setRemoveAudio(true)
    .build()

The corresponding FFmpeg command:

$ ffmpeg -i $inputVideoPath -c copy -an $outputFilePath

And for our final example, we’ll try resizing the input video by scaling it down to half its original height and width.

val scaleEffect = 
  ScaleAndRotateTransformation.Builder()
    .setScale(0.5f, 0.5f)
    .build()
val editedMediaItem =
  EditedMediaItem.Builder(mediaItem)
    .setEffects(
      /* audio */ Effects(emptyList(), 
      /* video */ listOf(scaleEffect))
    )
    .build()

An FFmpeg command could look like this:

$ ffmpeg -i $inputVideoPath -filter:v scale=w=trunc(iw/4)*2:h=trunc(ih/4)*2 $outputFilePath

Of course, you can also combine these operations to apply multiple edits on the same video, but hopefully these examples serve to demonstrate that the Transformer APIs make configuring these edits simple.

Transformer API Performance results

Here are some benchmarking measurements for each of the 4 operations taken with the Stopwatch API, running on a Pixel 9 Pro XL device:

(Note that performance for operations like these can depend on a variety of reasons, such as the current load the device is under, so the numbers below should be taken as rough estimates.)

Input video format: 10s 720p H264 video with AAC audio

  • Transcoding to H265 video and AAC audio: ~1300ms
  • Trimming video to 00:03-00:08: ~2300ms
  • Muting audio: ~200ms
  • Resizing video to half height and width: ~1200ms

Input video format: 25s 360p VP8 video with Vorbis audio

  • Transcoding to H265 video and AAC audio: ~3400ms
  • Trimming video to 00:03-00:08: ~1700ms
  • Muting audio: ~1600ms
  • Resizing video to half height and width: ~4800ms

Input video format: 4s 8k H265 video with AAC audio

  • Transcoding to H265 video and AAC audio: ~2300ms
  • Trimming video to 00:03-00:08: ~1800ms
  • Muting audio: ~2000ms
  • Resizing video to half height and width: ~3700ms

One technique Transformer uses to speed up editing operations is by prioritizing transmuxing for basic video edits where possible. Transmuxing refers to the process of repackaging video streams without re-encoding, which ensures high-quality output and significantly faster processing times.

When not possible, Transformer falls back to transcoding, a process that involves first decoding video samples into raw data, then re-encoding them for storage in a new container. Here are some of these differences:

Transmuxing

    • Transformer’s preferred approach when possible - a quick transformation that preserves elementary streams.
    • Only applicable to basic operations, such as rotating, trimming, or container conversion.
    • No quality loss or bitrate change.
Transmux

Transcoding

    • Transformer's fallback approach in cases when Transmuxing isn't possible - Involves decoding and re-encoding elementary streams.
    • More extensive modifications to the input video are possible.
    • Loss in quality due to re-encoding, but can achieve a desired bitrate target.
Transcode

We are continuously implementing further optimizations, such as the recently introduced experimentalSetTrimOptimizationEnabled setting that we used in the Trimming example above.

A trim is usually performed by re-encoding all the samples in the file, but since encoded media samples are stored chronologically in their container, we can improve efficiency by only re-encoding the group of pictures (GOP) between the start point of the trim and the first keyframes at/after the start point, then stream-copying the rest.

Since we only decode and encode a fixed portion of any file, the encoding latency is roughly constant, regardless of what the input video duration is. For long videos, this improved latency is dramatic. The optimization relies on being able to stitch part of the input file with newly-encoded output, which means that the encoder's output format and the input format must be compatible.

If the optimization fails, Transformer automatically falls back to normal export.

What’s next?

As part of Media3, Transformer is a native solution with low integration complexity, is tested on and ensures compatibility with a wide variety of devices, and is customizable to fit your specific needs.

To dive deeper, you can explore Media3 Transformer documentation, run our sample apps, or learn how to complement your media editing pipeline with Jetpack Media3. We’ve already seen app developers benefit greatly from adopting Transformer, so we encourage you to try them out yourself to streamline your media editing workflows and enhance your app’s performance!

Apps adopt Transformer to support more reliable and performant media editing use cases

Posted by Caren Chang – Developer Relations Engineer

The Jetpack Media3 library enables Android apps to build high quality media apps. As part of the Media3 library, the Transformer module aims to provide easy to use, reliable, and performant APIs for transcoding and editing media.

For example, apps can use Transformer to apply editing operations such as trimming a long piece of media file, or applying effects to video tracks. Transformer can also be used to convert media files from one format to another, such as adjusting the resolution or encoding of the media file.

Developing Transformer APIs

As part of the process to introduce new APIs, our engineering team works closely with Google apps such as Google Photos to test and experiment the new APIs. Experimental flags are first introduced to enable performance improvements. Once the results are successful and conclusive, these experimental features are then built into the default API implementations or promoted to public APIs for all apps to use. This approach allows Transformer APIs to be tested on a wide variety of devices.

Transformer Adoption in apps

Apps that have been using Transformer in production observed in-app performance improvements, less code to maintain, and better developer experience. Let’s take a closer look at how Transformer has helped apps for their media-editing use cases.

One of users’ favorite features in Google Photos is memory sharing, where snippets of your life story that are curated and presented as Google Photos memories can now be shared as videos to social media and chat apps. However, the process of combining media items to create a video on device is resource intensive and subject to significant latency, especially on low-end devices. To reduce this latency and enable the feature on a wider range of devices, Photos adopted Transformer in their media creation pipeline. Along with other improvements made, the team found that Transformer played a part in reducing the median user latency for creating memory videos by 41% on high-end devices and 27% on mid-range devices.

The Photos app also enables users to perform media edits such as trimming or rotating a video. By adopting Transformer APIs for rotating videos, median save latency was reduced by 79% for applicable videos. The app also adopted Transformer’s API for optimizing video trimming, and observed video save latency decrease by 64%.

1 Second Everyday is a personal video journal that helps you create captivating montages and timelapses. One of the app’s main user journeys is sequentially combining short videos to create a meaningful movie. After adopting Transformer for this use case, the app observed that video encoding performance was up to 5x faster, allowing them to explore enabling 4k and HDR support. The Transformer adoption also helped decrease relevant code by 30%, making it easier for the developers to maintain the code base.

BandLab is the next-generation music creation platform used by millions around the world to make and share their music. The app originally used MediaCodecs for their video creation use cases, but found that the low level implementation resulted in native crashes that were difficult to debug. After researching more on Transformer, the team made the decision to migrate from MediaCodecs to Transformer. Overall, it only took the team 12 working days for the migration, and this resulted in a simpler codebase and more maintainable pipeline for their media creation use cases. In addition, the app observed that all previously observed native crashes were no longer occurring anymore.

What’s next for Transformers?

We’re excited to see Transformer’s adoption in the developer community, and will continue adding new features to support more media-editing use cases for the Android ecosystem including:

    • Better support for previewing media edits
    • Improving the performance and developer experience for video frame extraction
    • Easier integration with AI effects
    • and much more

Keep an eye out on what we’re working on in the Media3 Github, and file feature requests to help shape the future of Transformer!

Media transcoding and editing, transform and roll out!

Posted by Andrew Lewis - Software Engineer, Android Media Solutions

The creation of user-generated content is on the rise, and users are looking for more ways to personalize and add uniqueness to their creations. These creations are then shared to a vast network of devices, each with its own capabilities. The Jetpack Media3 1.0 release includes new functionality in the Transformer module for converting media files between formats, or transcoding, and applying editing operations. For example, you can trim a clip from a longer piece of media and apply effects to the video track to share over social media, or transcode media into a more efficient codec for upload to a server.

The overall goal of Transformer is to provide an easy to use, reliable and performant API for transcoding and editing media, including support for customizing functionality, following the same API design principles to ExoPlayer. The library is supported on devices running Android 5.0 Lollipop (API 21) onwards and includes device-specific optimizations, giving developers a strong foundation to build on. This post gives an introduction to the new functionality and describes some of the many features we're planning for upcoming releases!


Getting Started

Most operations with Transformer will follow the same general pattern:

  1. Configure a TransformationRequest with settings like your desired output format
  2. Create a Transformer and pass it your TransformationRequest
  3. Apply additional effects and edits
  4. Attach a listener to react to completion events
  5. Start the transformation

Of course, depending on your desired transformations, you may not need every step. Here's an example of transcoding an input video to the H.265/HEVC video format and removing the audio track.

// Create a TransformationRequest and set the output format to H.265 val transformationRequest = TransformationRequest.Builder().setVideoMimeType(MimeTypes.VIDEO_H265).build() // Create a Transformer val transformer = Transformer.Builder(context) .setTransformationRequest(transformationRequest) // Pass in TransformationRequest .setRemoveAudio(true) // Remove audio track .addListener(transformerListener) // transformerListener is an implementation of Transformer.Listener .build() // Start the transformation val inputMediaItem = MediaItem.fromUri("path_to_input_file") transformer.startTransformation(inputMediaItem, outputPath)

During transformation you can get progress updates with Transformer.getProgress. When the transformation completes the listener is notified in its onTransformationCompleted or onTransformationError callback, and you can process the output media as needed.

Check out our documentation to learn about further capabilities in the Transformer APIs. You can also find details about using Transformer to accurately convert 10-bit HDR content to 8-bit SDR in the "Dealing with color washout" blog post to ensure your video's colors remain as vibrant as possible in the case that your app or the device doesn't support HDR content.


Edits, effects, and extensions

Media3 includes a set of core video effects for simple edits, such as scaling, cropping, and color filters, which you can use with Transformer. For example, you can create a Presentation effect to scale the input to 480p resolution while maintaining the original aspect ratio, and apply it with setVideoEffects:

Transformer.Builder(context) .setVideoEffects(listOf(Presentation.createForHeight(480))) .build()

You can also chain multiple effects to create more complex results. This example converts the input video to grayscale and rotates it by 30 degrees:

Transformer.Builder(context) .setVideoEffects(listOf( RgbFilter.createGrayscaleFilter(), ScaleToFitTransformation.Builder() .setRotationDegrees(30f) .build())) .build()

It's also possible to extend Transformer’s functionality by implementing custom effects that build on existing ones. Here is an example of subclassing MatrixTransformation, where we start zoomed in by 2 times, then zoom out gradually as the frame presentation time increases:

val zoomOutEffect = MatrixTransformation { presentationTimeUs -> val transformationMatrix = Matrix() val scale = 2 - min(1f, presentationTimeUs / 1_000_000f) // Video will zoom from 2x to 1x in the first second transformationMatrix.postScale(/* sx= */ scale, /* sy= */ scale) transformationMatrix // The calculated transformations will be applied each frame in turn } Transformer.Builder(context) .setVideoEffects(listOf(zoomOutEffect)) .build()

Here's a screen recording that shows this effect being applied in the Transformer demo app:

moving image showing what subclassing matrix transformation looks like in the Transformer demo app

For even more advanced use cases, you can wrap your own OpenGL code or other processing libraries in a custom GL texture processor and plug those into Transformer as custom effects. See the demo app for some examples of custom effects. The README also has instructions for trying a demo of MediaPipe integration with Transformer.


Coming soon

Transformer is actively under development but ready to use, so please give it a try and share your feedback! The Media3 development branch includes a sneak peek into several new features building on the 1.0 release described here, including support for tone-mapping HDR videos to SDR using OpenGL, previewing video effects using ExoPlayer.setVideoEffects, and custom audio processing. We are also working on support for editing multiple videos in more flexible compositions, with export from Transformer and playback through ExoPlayer, making Media3 an end-to-end solution for transforming media.

We hope you'll find Transformer an easy-to-use and powerful tool for implementing fantastic media editing experiences on Android! You can send us feature requests and bug reports in the Media3 GitHub issue tracker, and follow this blog to get updates on new features. Stay tuned for our upcoming talk “High quality Android media experiences” at Google I/O.