Diffstat (limited to 'en/devices/tech/debug/eval_perf.html')
-rw-r--r-- | en/devices/tech/debug/eval_perf.html | 266 |
1 files changed, 266 insertions, 0 deletions
diff --git a/en/devices/tech/debug/eval_perf.html b/en/devices/tech/debug/eval_perf.html new file mode 100644 index 00000000..44173fe0 --- /dev/null +++ b/en/devices/tech/debug/eval_perf.html @@ -0,0 +1,266 @@ +<html devsite> + <head> + <title>Evaluating Performance</title> + <meta name="project_path" value="/_project.yaml" /> + <meta name="book_path" value="/_book.yaml" /> + </head> + <body> + <!-- + Copyright 2017 The Android Open Source Project + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + --> + + +<p>There are two user-visible indicators of performance:</p> + +<ul> +<li><strong>Predictable, perceptible performance</strong>. Does the user +interface (UI) drop frames or consistently render at 60FPS? Does audio play +without artifacts or popping? How long is the delay between the user touching +the screen and the effect showing on the display?</li> +<li><strong>Length of time required for longer operations</strong> (such as +opening applications).</li> +</ul> + +<p>The first is more noticeable than the second. Users typically notice jank +but they won't be able to tell 500ms vs 600ms application startup time unless +they are looking at two devices side-by-side. Touch latency is immediately +noticeable and significantly contributes to the perception of a device.</p> + +<p>As a result, in a fast device, the UI pipeline is the most important thing in +the system other than what is necessary to keep the UI pipeline functional. 
This +means that the UI pipeline should preempt any other work that is not necessary +for fluid UI. To maintain a fluid UI, background syncing, notification delivery, +and similar work must all be delayed if UI work can be run. It is +acceptable to trade the performance of longer operations (HDR+ runtime, +application startup, etc.) to maintain a fluid UI.</p> + +<h2 id="capacity_vs_jitter">Capacity vs jitter</h2> +<p>When considering device performance, <em>capacity</em> and <em>jitter</em> +are two meaningful metrics.</p> + +<h3 id="capacity">Capacity</h3> +<p>Capacity is the total amount of some resource that the device possesses over +some amount of time. This can be CPU resources, GPU resources, I/O resources, +network resources, memory bandwidth, or any similar metric. When examining +whole-system performance, it can be useful to abstract the individual components +and assume a single metric that determines performance (especially when tuning a +new device because the workloads run on that device are likely fixed).</p> + +<p>The capacity of a system varies based on the computing resources online. +Changing CPU/GPU frequency is the primary means of changing capacity, but there +are others such as changing the number of CPU cores online. Accordingly, the +capacity of a system corresponds with power consumption; <strong>changing +capacity always results in a similar change in power consumption.</strong></p> + +<p>The capacity required at a given time is overwhelmingly determined by the +running application. As a result, the platform can do little to adjust the +capacity required for a given workload, and the means to do so are limited to +runtime improvements (Android framework, ART, Bionic, GPU compiler/drivers, +kernel).</p> + +<h3 id="jitter">Jitter</h3> +<p>While the required capacity for a workload is easy to see, jitter is a more +nebulous concept. 
For a good introduction to jitter as an impediment to fast +systems, refer to +<em><a href="http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-03-3116">THE +CASE OF THE MISSING SUPERCOMPUTER PERFORMANCE: ACHIEVING OPTIMAL PERFORMANCE ON +THE 8,192 PROCESSORS OF ASCI Q</a></em>. (It's an investigation of why the ASCI +Q supercomputer did not achieve its expected performance and is a great +introduction to optimizing large systems.)</p> + +<p>This page uses the term jitter to describe what the ASCI Q paper calls +<em>noise</em>. Jitter is the random system behavior that prevents perceptible +work from running. It is often work that must be run, but it may not have strict +timing requirements that cause it to run at any particular time. Because it is +random, it is extremely difficult to disprove the existence of jitter for a +given workload. It is also extremely difficult to prove that a known source of +jitter was the cause of a particular performance issue. The tools most commonly +used for diagnosing causes of jitter (such as tracing or logging) can introduce +their own jitter.</p> + +<p>Sources of jitter experienced in real-world implementations of Android +include:</p> +<ul> +<li>Scheduler delay</li> +<li>Interrupt handlers</li> +<li>Driver code running for too long with preemption or interrupts disabled</li> +<li>Long-running softirqs</li> +<li>Lock contention (application, framework, kernel driver, binder lock, mmap +lock)</li> +<li>File descriptor contention where a low-priority thread holds the lock on a +file, preventing a high-priority thread from running</li> +<li>Running UI-critical code in workqueues where it could be delayed</li> +<li>CPU idle transitions</li> +<li>Logging</li> +<li>I/O delays</li> +<li>Unnecessary process creation (e.g., CONNECTIVITY_CHANGE broadcasts)</li> +<li>Page cache thrashing caused by insufficient free memory</li> +</ul> + +<p>The required amount of time for a given period of jitter may or may not +decrease
as capacity increases. For example, if a driver leaves interrupts +disabled while waiting for a read from across an I2C bus, it will take a fixed +amount of time regardless of whether the CPU is at 384MHz or 2GHz. Increasing +capacity is not a feasible solution to improve performance when jitter is +involved. As a result, <strong>faster processors will not usually improve +performance in jitter-constrained situations.</strong></p> + +<p>Finally, unlike capacity, jitter is almost entirely within the domain of the +system vendor.</p> + +<h3 id="memory_consumption">Memory consumption</h3> +<p>Memory consumption is traditionally blamed for poor performance. While +consumption itself is not a performance issue, it can cause jitter via +lowmemorykiller overhead, service restarts, and page cache thrashing. Reducing +memory consumption can avoid the direct causes of poor performance, but there +may be other targeted improvements that avoid those causes as well (for example, +pinning the framework to prevent it from being paged out when it will be paged +in soon after).</p> + +<h2 id="analyze_initial">Analyzing initial device performance</h2> +<p>Starting from a functional but poorly performing system and attempting to fix +the system's behavior by looking at individual cases of user-visible poor +performance is <strong>not</strong> a sound strategy. Because poor performance +is usually either not easily reproducible (i.e., caused by jitter) or an +application issue, too +many variables in the full system prevent this strategy from being effective.
As +a result, it's very easy to misidentify causes and make minor improvements while +missing systemic opportunities for fixing performance across the system.</p> + +<p>Instead, use the following general approach when bringing up a new +device:</p> +<ol> +<li>Get the system booting to UI with all drivers running and some basic +frequency governor settings (if you change the frequency governor settings, +repeat all steps below).</li> +<li>Ensure the kernel supports the <code>sched_blocked_reason</code> tracepoint +as well as other tracepoints in the display pipeline that denote when the frame +is delivered to the display.</li> +<li>Take long traces of the entire UI pipeline (from receiving input via an IRQ +to final scanout) while running a lightweight and consistent workload (e.g., +<a href="https://android.googlesource.com/platform/frameworks/base.git/+/master/tests/UiBench/">UiBench</a> +or the ball test in <a href="#touchlatency">TouchLatency</a>).</li> +<li>Fix the frame drops detected in the lightweight and consistent +workload.</li> +<li>Repeat steps 3-4 until you can run with zero dropped frames for 20+ seconds +at a time.</li> +<li>Move on to other user-visible sources of jank.</li> +</ol> + +<p>Other simple things you can do early on in device bringup include:</p> + +<ul> +<li>Ensure your kernel has the +<a href="https://android.googlesource.com/kernel/msm/+/c9f00aa0e25e397533c198a0fcf6246715f99a7b%5E!/">sched_blocked_reason +tracepoint patch</a>. This tracepoint is enabled with the sched trace category +in systrace and provides the function responsible for sleeping when a +thread enters uninterruptible sleep. It is critical for performance analysis +because uninterruptible sleep is a very common indicator of jitter.</li> +<li>Ensure you have sufficient tracing for the GPU and display pipelines.
On +recent Qualcomm SOCs, tracepoints are enabled using: +<pre>$ adb shell "echo 1 > /d/tracing/events/kgsl/enable" +$ adb shell "echo 1 > /d/tracing/events/mdss/enable"</pre> + +<p>These events remain enabled when you run systrace so you can see additional +information in the trace about the display pipeline (MDSS) in the +<code>mdss_fb0</code> section. On Qualcomm SOCs, you won't see any additional +information about the GPU in the standard systrace view, but the results are +present in the trace itself (for details, see +<a href="/devices/tech/debug/systrace.html">Understanding +systrace</a>).</p> + +<p>What you want from this kind of display tracing is a single event that +directly indicates a frame has been delivered to the display. From there, you +can determine if you've hit your frame time successfully; if event X<em>n</em> +occurs less than 16.7ms after event X<em>n-1</em> (assuming a 60Hz display), +then you know you did not jank. If your SOC does not provide such signals, work +with your vendor to get them. Debugging jitter is extremely difficult without a +definitive signal of frame completion.</p></li></ul> + +<h3 id="synthetic_benchmarks">Using synthetic benchmarks</h3> +<p>Synthetic benchmarks are useful for ensuring a device's basic functionality +is present. However, treating benchmarks as a proxy for perceived device +performance is not useful.</p> + +<p>Based on experiences with SOCs, differences in synthetic benchmark +performance between SOCs are not correlated with a similar difference in +perceptible UI performance (number of dropped frames, 99th percentile frame +time, etc.). Synthetic benchmarks are capacity-only benchmarks; jitter impacts +the measured performance of these benchmarks only by stealing time from the bulk +operation of the benchmark.
As a result, synthetic benchmark scores are mostly +irrelevant as a metric of user-perceived performance.</p> + +<p>Consider two SOCs running Benchmark X that renders 1000 frames of UI and +reports the total rendering time in milliseconds (lower score is better).</p> + +<ul> +<li>SOC 1 renders each frame of Benchmark X in 10ms and scores 10,000.</li> +<li>SOC 2 renders 99% of frames in 1ms but 1% of frames in 100ms and scores +1,990 (990 x 1ms + 10 x 100ms), a dramatically better score.</li> +</ul> + +<p>If the frame times in the benchmark are indicative of actual UI performance, +SOC 2 would be +unusable. Assuming a 60Hz refresh rate, SOC 2 would have a janky frame roughly +every 1.7s of operation (one frame in every 100). Meanwhile, SOC 1 (the slower SOC according to Benchmark X) +would be perfectly fluid.</p> + +<h3 id="bug_reports">Using bug reports</h3> +<p>Bug reports are sometimes useful for performance analysis, but because they +are so heavyweight, they are rarely useful for debugging sporadic jank issues. +They may provide some hints on what the system was doing at a given time, +especially if the jank was around an application transition (which is logged in +a bug report). Bug reports can also indicate when something is more broadly +wrong with the system that could reduce its effective capacity (such as thermal +throttling or memory fragmentation).</p> + +<h3 id="touchlatency">Using TouchLatency</h3> +<p>Several examples of bad behavior come from TouchLatency, which is the +preferred periodic workload used for the Pixel and Pixel XL. It's available at +<code>frameworks/base/tests/TouchLatency</code> and has two modes: touch latency +and bouncing ball (to switch modes, click the button in the upper-right +corner).</p> + +<p>The bouncing ball test is exactly as simple as it appears: A ball bounces +around the screen forever, regardless of user input. It is usually also +<strong>by far</strong> the hardest test to run perfectly, but the closer it +comes to running without any dropped frames, the better your device will be.
The +bouncing ball test is difficult because it is a trivial but perfectly consistent +workload that runs at a very low clock (this assumes the device has a frequency +governor; if the device is instead running with fixed clocks, downclock the +CPU/GPU to near-minimum when running the bouncing ball test for the first time). +As the system quiesces and the clocks drop closer to idle, the required CPU/GPU +time per frame increases. You can watch the ball and see things jank, and you'll +be able to see missed frames in systrace as well.</p> + +<p>Because the workload is so consistent, you can identify most sources of +jitter much more easily than in most user-visible workloads by tracking what +exactly is running on the system during each missed frame instead of the UI +pipeline. <strong>The lower clocks amplify the effects of jitter by making it +more likely that any jitter causes a dropped frame.</strong> As a result, the +closer TouchLatency is to 60FPS, the less likely you are to have bad system +behaviors that cause sporadic, hard-to-reproduce jank in larger +applications.</p> + +<p>As jitter is often (but not always) clockspeed-invariant, use a test that +runs at very low clocks to diagnose jitter for the following reasons:</p> +<ul> +<li>Not all jitter is clockspeed-invariant; many sources just consume CPU +time.</li> +<li>The governor should get the average frame time close to the deadline by +clocking down, so time spent running non-UI work can push it over the edge to +dropping a frame.</li> +</ul> + +</body> +</html>