Variable speed playback on mobile

Variable speed playback was launched on the web several years ago and is one of our most highly requested features on mobile. Now, it’s here! You can speed up or slow down videos in the YouTube app on iOS and on Android devices running Android 5.0+. Playback speed can be adjusted from 0.25x (quarter speed) to 2x (double speed) in the overflow menu of the player controls.

The most commonly used speed setting on the web is 1.25x, closely followed by 1.5x. Speed watching is the new speed listening, which was the new speed reading, especially when consuming long lectures or interviews. But variable speed isn’t just useful for skimming through content to save time; it can also be an important tool for investigating finer details. For example, you might want to slow down a tutorial to learn some new choreography or figure out a guitar strumming pattern.

To speed up or slow down audio while retaining its comprehensibility, our main challenge was to efficiently change the duration of the audio signal without affecting the pitch or introducing distortion. This process is called time stretching. Without time stretching, an audio signal that was originally at 100 Hz becomes 200 Hz at double speed, causing that chipmunk effect. Similarly, slowing down the speed will lower the pitch. Time stretching can be achieved using a phase vocoder, which transforms the signal into its frequency-domain representation to make phase adjustments before producing a lengthened or shortened version. Time stretching can also be done in the time domain by carefully selecting windows from the original signal to be assembled into the new one. On Android, we used the Sonic library for our audio manipulation in ExoPlayer. Sonic uses PICOLA, a time-domain algorithm. On iOS, AVPlayer has a built-in playback rate feature with configurable time stretching. Here, we have chosen to use the spectral (frequency-domain) algorithm.
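To make the difference concrete, here is a minimal Python sketch that speeds up a 100 Hz tone in two ways: by naively skipping samples, and by cross-fading adjacent pitch periods in the time domain. The second approach is only in the spirit of PICOLA; it is not Sonic's code. All function names, the 8000 Hz sample rate, the pitch search range, and the restriction to speeds between 1x and 2x are assumptions made for illustration.

```python
import math

RATE = 8000  # samples per second (illustrative value, not from the post)

def sine(freq_hz, seconds, rate=RATE):
    """Generate a pure tone as a list of float samples."""
    return [math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]

def estimate_frequency(samples, rate=RATE):
    """Estimate pitch from the spacing of upward zero crossings."""
    ups = [i for i in range(1, len(samples))
           if samples[i - 1] < 0 <= samples[i]]
    if len(ups) < 2:
        return 0.0
    return (len(ups) - 1) * rate / (ups[-1] - ups[0])

def naive_speed_change(samples, speed):
    """Skip or repeat samples: the duration changes, and so does the pitch."""
    return [samples[int(i * speed)] for i in range(int(len(samples) / speed))]

def find_period(samples, pos, min_p, max_p):
    """Pick the lag with the smallest summed magnitude difference (AMDF)."""
    def amdf(p):
        return sum(abs(samples[pos + i] - samples[pos + i + p])
                   for i in range(max_p))
    return min(range(min_p, max_p + 1), key=amdf)

def stretch_speed_up(samples, speed, rate=RATE):
    """Shorten audio for 1 < speed <= 2 while keeping the pitch.

    Hypothetical simplification: two adjacent pitch periods are cross-faded
    into one, then enough samples are copied unchanged to hit the target rate.
    """
    if not 1.0 < speed <= 2.0:
        raise ValueError("this sketch only handles 1 < speed <= 2")
    min_p, max_p = rate // 400, rate // 65  # search roughly 65 Hz to 400 Hz
    out, pos = [], 0
    while pos + 2 * max_p <= len(samples):
        p = find_period(samples, pos, min_p, max_p)
        for i in range(p):
            w = i / p  # fade from the first period into the second
            out.append((1 - w) * samples[pos + i] + w * samples[pos + p + i])
        # Consuming 2p + keep samples while emitting p + keep gives `speed`.
        keep = int(p * (2 - speed) / (speed - 1))
        out.extend(samples[pos + 2 * p: pos + 2 * p + keep])
        pos += 2 * p + keep
    return out  # a short tail that cannot be analysed is dropped

if __name__ == "__main__":
    tone = sine(100.0, 1.0)
    print(estimate_frequency(tone))                           # about 100 Hz
    print(estimate_frequency(naive_speed_change(tone, 2.0)))  # about 200 Hz
    print(estimate_frequency(stretch_speed_up(tone, 2.0)))    # about 100 Hz
```

Both sped-up versions are roughly half as long as the original, but only the naive one doubles the pitch. Real speech and music are not perfectly periodic, so a production implementation needs far more careful period detection and handling of unvoiced audio than this sketch attempts.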

To speed up or slow down video, we render the video frames in alignment with the modified audio timestamps. Video frames are not necessarily encoded chronologically, so for the video to stay in sync with the audio playback, the video decoder needs to work faster than the rate at which the video frames need to be rendered. This is especially pertinent at higher playback speeds. On mobile, there are also often more network and hardware constraints than on desktop that limit our ability to decode video as fast as necessary. For example, less reliable wireless links will affect how quickly and accurately we can download video data, and then battery, CPU speed, and memory size will limit the processing power we can spend on decoding it. To address these issues, we adapt the video quality to be only as high as we can download dependably. The video decoder can also skip forward to the next key frame if it has fallen behind the renderer, or the renderer can drop already decoded frames to catch up to the audio track.
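The catch-up behavior described above can be sketched as a small decision function driven by the audio clock. This is a hypothetical illustration in Python, not the player's actual logic: the `Frame` type, the function names, and the 30 ms and 10 ms thresholds are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    pts_ms: float          # presentation timestamp, in media time
    is_key: bool = False   # True if the frame can be decoded on its own

def required_decode_fps(source_fps, speed):
    """Frames per second of wall-clock time the decoder must sustain."""
    return source_fps * speed

def render_decision(frame, audio_media_ms, late_ms=30.0, early_ms=10.0):
    """Compare a decoded frame with the audio position (both in media time).

    The thresholds are illustrative guesses, not values from any real player.
    """
    delta = frame.pts_ms - audio_media_ms
    if delta < -late_ms:
        return "drop"      # too far behind the audio: discard and catch up
    if delta > early_ms:
        return "wait"      # ahead of the audio: hold until it is due
    return "render"

def next_key_frame_index(frames, start, audio_media_ms):
    """Decoder-side skip: first key frame at or after the audio position."""
    for i in range(start, len(frames)):
        if frames[i].is_key and frames[i].pts_ms >= audio_media_ms:
            return i
    return None            # no usable key frame buffered yet

if __name__ == "__main__":
    print(required_decode_fps(30, 2.0))             # 60.0
    print(render_decision(Frame(1000.0), 1100.0))   # drop
```

Because the comparison uses media timestamps, the same logic applies at any playback speed. What changes is how quickly the audio position advances, which is why a 30 fps video at 2x needs the decoder to sustain about 60 frames per second.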

If you want to check out the feature, try this: turn up your volume and play the classic dramatic chipmunk at 0.5x to see an EVEN MORE dramatic chipmunk. Enjoy!


Posted by Pallavi Powale, Software Engineer, recently watched “Dramatic Chipmunk” at 0.5x speed.