Avoiding Priority Inversion

This article explains how Android's audio system attempts to avoid priority inversion, as of the Android 4.1 (Jellybean) release, and highlights techniques that you can use too.

These techniques may be useful to developers of high-performance audio apps, OEMs, and SoC providers who are implementing an audio HAL. Please note that implementing these techniques is not guaranteed to prevent glitches or other failures, particularly if used outside of the audio context. Your results may vary and you should conduct your own evaluation and testing.

Background

The Android audio server "AudioFlinger" and AudioTrack/AudioRecord client implementation are being re-architected to reduce latency. This work started in Android 4.1 (Jellybean), continued in 4.2 (Jellybean MR1), and more improvements are likely in "K".

Achieving lower latency required many changes throughout the system. One important change was to assign CPU resources to time-critical threads with a more predictable scheduling policy. Reliable scheduling allows the audio buffer sizes and counts to be reduced, while still avoiding artifacts due to underruns.

Priority Inversion

Priority inversion is a classic failure mode of real-time systems, where a higher-priority task is blocked for an unbounded time waiting for a lower-priority task to release a resource, such as a mutex or the shared state that the mutex protects.

In an audio system, priority inversion typically manifests as a glitch (click, pop, dropout), repeated audio when circular buffers are used, or delay in responding to a command.

In the Android audio implementation, priority inversion is most likely to occur in these places, and so we focus attention here:

As of this writing, reduced latency for AudioRecord is planned but not yet implemented. The likely priority inversion spots will be similar to those for AudioTrack.

Common Solutions

The typical solutions listed in the Wikipedia article include:

Disabling interrupts is not feasible in Linux user space, and does not work for SMP.

Priority inheritance futexes (fast user-space mutexes) are available in the Linux kernel, but are not currently exposed by Bionic, the Android C runtime library. We chose not to use them in the audio system because they are relatively heavyweight, and because they rely on a trusted client.

Techniques used by Android

We started with "try lock" and lock with timeout. These are non-blocking and bounded blocking variants of the mutex lock operation. Try lock and lock with timeout worked fairly well for us, but were susceptible to a couple of obscure failure modes: the server was not guaranteed to be able to access the shared state if the client happened to be busy, and the cumulative timeout could be too long if there was a long sequence of unrelated locks that all timed out.

We also use atomic operations such as:

All of these return the previous value, and include the necessary SMP barriers. The disadvantage is they can require unbounded retries. In practice, we've found that the retries are not a problem.

Note: atomic operations and their interactions with memory barriers are notoriously misunderstood and misused. We include these here for completeness, but recommend you also read the article SMP Primer for Android for further information.
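To make the "returns the previous value" and "unbounded retries" points concrete, here is a minimal sketch using C++11 `std::atomic` as a stand-in for the Android atomic functions the article refers to. The function names and the saturating-add scenario are assumptions made for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Atomically sets the given bits and returns the previous flags value.
// The default memory order (sequentially consistent) supplies the barriers.
uint32_t setFlags(std::atomic<uint32_t>& flags, uint32_t bits) {
    return flags.fetch_or(bits);
}

// Atomically adds n and returns the previous value.
int32_t addFrames(std::atomic<int32_t>& frames, int32_t n) {
    return frames.fetch_add(n);
}

// Adds n but never exceeds limit, and returns the previous value.
// Assumes 0 <= n <= limit. There is no single hardware operation for this,
// so it uses a compare-and-swap loop: if another thread changes the value
// between the load and the swap, the loop retries. The number of retries
// has no fixed upper bound.
int32_t addFramesSaturating(std::atomic<int32_t>& frames, int32_t n, int32_t limit) {
    int32_t old = frames.load();
    int32_t desired;
    do {
        desired = (old > limit - n) ? limit : old + n;
        // On failure, compare_exchange_weak reloads `old` with the current value.
    } while (!frames.compare_exchange_weak(old, desired));
    return old;
}
```

The retry loop only repeats when another thread modified the value in the meantime, which is why contention between just a few threads rarely produces long retry sequences.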

We still have and use most of the above tools, and have recently added these techniques:

Non-Blocking Algorithms

Non-blocking algorithms have been a subject of much recent study. But with the exception of single-reader single-writer FIFO queues, we've found them to be complex and error-prone.

In Android 4.2 (Jellybean MR1), you can find our non-blocking, single-reader/writer classes in these locations:

These were designed specifically for AudioFlinger and are not general-purpose. Non-blocking algorithms are notorious for being difficult to debug. You can look at this code as a model, but be aware there may be bugs, and the classes are not guaranteed to be suitable for other purposes.
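For readers unfamiliar with the single-reader single-writer FIFO pattern, the following is a minimal sketch in standard C++11. It is not the AudioFlinger code and makes no claim to match it: the class name `SpscFifo`, the power-of-two capacity requirement, and the method names are all assumptions chosen for this example.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-reader, single-writer FIFO.
// Exactly one thread may call push() and exactly one thread may call pop().
template <typename T, size_t Capacity>
class SpscFifo {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Writer side. Never blocks; returns false if the FIFO is full.
    bool push(const T& item) {
        const size_t w = mWrite.load(std::memory_order_relaxed);
        const size_t r = mRead.load(std::memory_order_acquire);
        if (w - r == Capacity) {
            return false;
        }
        mBuf[w & (Capacity - 1)] = item;
        // Release: the slot contents become visible before the new index.
        mWrite.store(w + 1, std::memory_order_release);
        return true;
    }

    // Reader side. Never blocks; returns false if the FIFO is empty.
    bool pop(T& out) {
        const size_t r = mRead.load(std::memory_order_relaxed);
        const size_t w = mWrite.load(std::memory_order_acquire);
        if (r == w) {
            return false;
        }
        out = mBuf[r & (Capacity - 1)];
        // Release: the slot is fully read before the writer may reuse it.
        mRead.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    T mBuf[Capacity];
    std::atomic<size_t> mWrite{0};  // total items ever written
    std::atomic<size_t> mRead{0};   // total items ever read
};
```

Each index is written by only one thread, so no compare-and-swap loop is needed and neither side can be held up by the other. A full or empty FIFO is reported to the caller instead of causing a wait.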

For developers, we may update some of the sample OpenSL ES application code to use non-blocking algorithms, or reference a non-Android open source library.

Tools

To the best of our knowledge, there are no automatic tools for finding priority inversion, especially before it happens. Some research static code analysis tools can find priority inversions if they have access to the entire codebase. Of course, if arbitrary user code is involved (as it is here for the application) or the codebase is large (as for the Linux kernel and device drivers), static analysis may be impractical. The most important thing is to read the code very carefully and get a good grasp of the entire system and its interactions. Tools such as systrace and ps -t -p are useful for seeing priority inversion after it occurs, but do not warn you in advance.

A Final Word

After all of this discussion, don't be afraid of mutexes. Mutexes are your friend when used and implemented correctly in ordinary, non-time-critical use cases. But between high- and low-priority tasks, and in time-sensitive systems, mutexes are more likely to cause trouble.