A/B system updates, also known as seamless updates, ensure a workable booting system remains on the disk during an over-the-air (OTA) update. This approach reduces the likelihood of an inactive device after an update, which means fewer device replacements and device reflashes at repair and warranty centers. Other commercial-grade operating systems such as ChromeOS also use A/B updates successfully.
A/B system updates provide the following benefits:
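Updates can be streamed to the device, so the user doesn't need enough free space to store the update package on /data or /cache.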
A/B system updates affect the following:
The update_engine daemon and bootloader interactions (described below)
A/B system updates use two sets of partitions referred to as slots (normally slot A and slot B). The system runs from the current slot, while the partitions in the unused slot are not accessed by the running system during normal operation. This approach makes updates fault-resistant by keeping the unused slot as a fallback: if an error occurs during or immediately after an update, the system can roll back to the old slot and continue to have a working system. To achieve this goal, no partition used by the current slot should be updated as part of the OTA update (including partitions for which there is only one copy).
Each slot has a bootable attribute that states whether the slot contains a correct system from which the device can boot. The current slot is bootable when the system is running, but the other slot may have an old (still correct) version of the system, a newer version, or invalid data. Regardless of which slot is current, exactly one slot is the active slot (the one the bootloader will boot from on the next boot), also called the preferred slot.
Each slot also has a successful attribute set by user space, which is relevant only if the slot is also bootable. A successful slot should be able to boot, run, and update itself. A bootable slot that was not marked as successful (after several attempts were made to boot from it) should be marked as unbootable by the bootloader, which should also change the active slot to another bootable slot (normally the slot that was running immediately before the attempt to boot into the new, active one). The specific details of the interface are defined in boot_control.h.
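The fallback behavior described above can be sketched as a small model. This is only an illustration, not the boot_control.h interface: the Slot class, the tries_remaining counter, and choose_boot_slot() are hypothetical names, and the retry budget of three boots is an assumed value that a real bootloader chooses for itself.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    bootable: bool = True
    successful: bool = False
    tries_remaining: int = 3  # hypothetical retry budget picked by the bootloader


def choose_boot_slot(slots, active):
    """Return the index of the slot to boot, falling back if the active slot keeps failing."""
    slot = slots[active]
    if slot.bootable and (slot.successful or slot.tries_remaining > 0):
        if not slot.successful:
            # Each attempt on a not-yet-successful slot uses up one try.
            slot.tries_remaining -= 1
        return active
    # The active slot was never marked successful and has no tries left:
    # mark it unbootable and fall back to another bootable slot.
    slot.bootable = False
    for index, other in enumerate(slots):
        if index != active and other.bootable:
            return index
    raise RuntimeError("no bootable slot left")
```

In this model, a new slot that user space never marks as successful is tried until its budget is exhausted, after which the previously running slot is selected again.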
A/B system updates use a background daemon called update_engine to prepare the system to boot into a new, updated version. This daemon can perform the following actions:
Call the boot_control interface in a pre-defined workflow.
As the update_engine daemon is not involved in the boot process itself, it is limited in what it can do during an update by the SELinux policies and features in the current slot (such policies and features can't be updated until the system boots into a new version). To maintain a robust system, the update process should not modify the partition table, the contents of partitions in the current slot, or the contents of non-A/B partitions that can't be wiped with a factory reset.
The update_engine source is located in system/update_engine.
The A/B OTA dexopt files are split between installd and a package manager:
frameworks/native/cmds/installd/ota* includes the postinstall script, the binary for chroot, the installd clone that calls dex2oat, the post-OTA move-artifacts script, and the rc file for the move script.
frameworks/base/services/core/java/com/android/server/pm/OtaDexoptService.java (plus OtaDexoptShellCommand) is the package manager that prepares dex2oat commands for applications.
For a working example, refer to /device/google/marlin/device-common.mk.
The boot_control HAL is used by update_engine (and possibly other daemons) to instruct the bootloader what to boot from. Common example scenarios and their associated states include the following:
update_verifier should mark slot A as successful after some checks are made.
User devices don't always have enough space on /data to download the update package. As neither OEMs nor users want to waste space on a /cache partition, some users go without updates because the device has nowhere to store the update package. To address this issue, Android 8.0 added support for streaming A/B updates that write blocks directly to the B partition as they are downloaded, without having to store the blocks on /data. Streaming A/B updates need almost no temporary storage and require just enough storage for roughly 100 KiB of metadata.
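The idea behind streaming can be sketched in a few lines: each downloaded chunk is written to the target as soon as it arrives, so only one chunk is held in memory at a time. This is a minimal illustration, not update_engine code; stream_to_partition() and fake_download() are hypothetical names, and an in-memory buffer stands in for the B partition.

```python
import io


def fake_download(payload, chunk_size):
    """Yield the payload in chunks, standing in for a network download."""
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]


def stream_to_partition(chunks, target):
    """Write each chunk to the target as it arrives; nothing is staged on disk."""
    total = 0
    peak = 0
    for chunk in chunks:
        target.write(chunk)
        total += len(chunk)
        peak = max(peak, len(chunk))  # largest amount of data held at once
    return total, peak


payload = bytes(range(256)) * 64          # 16 KiB of sample data
partition_b = io.BytesIO()                # stand-in for the unused slot
total, peak = stream_to_partition(fake_download(payload, 4096), partition_b)
```

The temporary storage needed is bounded by the chunk size rather than by the size of the whole package.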
To enable streaming updates in Android 7.1, cherrypick the following patches:
These patches are required to support streaming A/B updates in Android 7.1 and later whether using Google Mobile Services (GMS) or any other update client.
The update process starts when an OTA package (referred to in code as a payload) is available for downloading. Policies in the device may defer the payload download and application based on battery level, user activity, charging status, or other policies. In addition, because the update runs in the background, users might not know an update is in progress. All of this means the update process might be interrupted at any point due to policies, unexpected reboots, or user actions.
Optionally, metadata in the OTA package itself indicates the update can be streamed; the same package can also be used for non-streaming installation. The server may use the metadata to tell the client it's streaming so the client will hand off the OTA to update_engine correctly. Device manufacturers with their own server and client can enable streaming updates by ensuring the server identifies the update is streaming (or assumes all updates are streaming) and the client makes the correct call to update_engine for streaming. Manufacturers can use the fact that the package is of the streaming variant to send a flag to the client to trigger hand off to the framework side as streaming.
After a payload is available, the update process is as follows:
Step | Activities |
---|---|
1 | The current slot (or "source slot") is marked as successful (if not already marked) with markBootSuccessful(). |
2 | The unused slot (or "target slot") is marked as unbootable by calling the function setSlotAsUnbootable(). The current slot is always marked as successful at the beginning of the update to prevent the bootloader from falling back to the unused slot, which will soon have invalid data. If the system has reached the point where it can start applying an update, the current slot is marked as successful even if other major components are broken (such as the UI in a crash loop) as it is possible to push new software to fix these problems. The update payload is an opaque blob with the instructions to update to the new version. It consists of metadata, which defines the operations to perform, and the data associated with those operations. |
3 | The payload metadata is downloaded. |
4 | For each operation defined in the metadata, in order, the associated data (if any) is downloaded to memory, the operation is applied, and the associated memory is discarded. |
5 | The whole partitions are re-read and verified against the expected hash. |
6 | The post-install step (if any) is run. In the case of an error during the execution of any step, the update fails and is re-attempted, possibly with a different payload. If all the steps so far have succeeded, the update succeeds and the last step is executed. |
7 | The unused slot is marked as active by calling setActiveBootSlot(). Marking the unused slot as active doesn't mean it will finish booting. The bootloader (or system itself) can switch the active slot back if it doesn't read a successful state. |
8 | Post-installation (described below) involves running a program from the "new update" version while still running in the old version. If defined in the OTA package, this step is mandatory and the program must return with exit code 0; otherwise, the update fails. |
9 | After the system successfully boots far enough into the new slot and finishes the post-reboot checks, the now current slot (formerly the "target slot") is marked as successful by calling markBootSuccessful(). |
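The pre-reboot part of this sequence can be sketched as follows. This is a simplified model under stated assumptions, not the update_engine implementation: FakeBootControl is a hypothetical in-memory stand-in whose method names only mirror the calls named in the table, operations are reduced to byte strings appended to an image, and SHA-256 is assumed as the verification hash.

```python
import hashlib


class FakeBootControl:
    """Hypothetical in-memory stand-in for the boot_control interface."""

    def __init__(self):
        self.current = 0
        self.active = 0
        self.bootable = [True, True]
        self.successful = [False, False]

    def mark_boot_successful(self):
        self.successful[self.current] = True

    def set_slot_as_unbootable(self, slot):
        self.bootable[slot] = False
        self.successful[slot] = False

    def set_active_boot_slot(self, slot):
        # Assumption of this model: activating a slot makes it bootable again.
        self.bootable[slot] = True
        self.active = slot


def apply_update(boot_control, operations, expected_sha256, postinstall=None):
    """Return True if the target slot was written, verified, and made active."""
    target = 1 - boot_control.current
    boot_control.mark_boot_successful()          # step 1
    boot_control.set_slot_as_unbootable(target)  # step 2
    image = bytearray()
    for data in operations:                      # steps 3-4: apply operations in order
        image += data
    if hashlib.sha256(image).hexdigest() != expected_sha256:
        return False                             # step 5: verification failed
    if postinstall is not None and postinstall() != 0:
        return False                             # post-install must exit with 0
    boot_control.set_active_boot_slot(target)    # step 7
    return True
```

A failed verification or post-install step leaves the current slot active and successful, so the running system is unaffected and the update can be retried.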
For every partition where a post-install step is defined, update_engine mounts the new partition into a specific location and executes the program specified in the OTA relative to the mounted partition. For example, if the post-install program is defined as usr/bin/postinstall in the system partition, this partition from the unused slot will be mounted in a fixed location (such as /postinstall_mount) and the /postinstall_mount/usr/bin/postinstall command is executed.
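The path resolution in this example can be sketched as below. The helper postinstall_command() is a hypothetical name, and the checks that reject absolute paths and paths escaping the mount point are assumptions added for illustration, not documented update_engine behavior.

```python
import posixpath


def postinstall_command(mount_point, program):
    """Resolve the post-install program relative to the mounted partition."""
    if posixpath.isabs(program):
        raise ValueError("program path must be relative to the partition root")
    path = posixpath.normpath(posixpath.join(mount_point, program))
    # Assumed safety check: the resolved path must stay under the mount point.
    if not path.startswith(mount_point.rstrip("/") + "/"):
        raise ValueError("program path escapes the mount point")
    return path


command = postinstall_command("/postinstall_mount", "usr/bin/postinstall")
```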
For post-installation to succeed, the old kernel must be able to:
Run the post-install program. Unless the loader (ld) is instructed to use other paths or you build a static binary, libraries will be loaded from the old system image and not the new one.
For example, you could use a shell script as a post-install program (interpreted by the old system's shell binary with a #! marker at the top), then set up library paths from the new environment for executing a more complex binary post-install program. Alternatively, you could run the post-install step from a dedicated smaller partition to enable the filesystem format in the main system partition to be updated without incurring backward compatibility issues or stepping-stone updates; this would allow users to update directly to the latest version from a factory image.
The new post-install program is limited by the SELinux policies defined in the old system. As such, the post-install step is suitable for performing tasks required by design on a given device or other best-effort tasks (for example, updating the A/B-capable firmware or bootloader, or preparing copies of databases for the new version). The post-install step is not suitable for one-off bug fixes before reboot that require unforeseen permissions.
The selected post-install program runs in the postinstall SELinux context. All the files in the new mounted partition will be tagged with postinstall_file, regardless of what their attributes are after rebooting into that new system. Changes to the SELinux attributes in the new system won't impact the post-install step. If the post-install program needs extra permissions, those must be added to the post-install context.
After rebooting, update_verifier triggers the integrity check using dm-verity. This check starts before zygote to prevent Java services from making any irreversible changes that would prevent a safe rollback. During this process, the bootloader and kernel may also trigger a reboot if verified boot or dm-verity detects any corruption. After the check completes, update_verifier marks the boot successful.
update_verifier will read only the blocks listed in /data/ota_package/care_map.txt, which is included in an A/B OTA package when using the AOSP code. The Java system update client, such as GmsCore, extracts care_map.txt, sets up the access permission before rebooting the device, and deletes the extracted file after the system successfully boots into the new version.
Yes. The marketing name for A/B updates is seamless updates.
Pixel and Pixel XL phones from October 2016 shipped with A/B, and
all Chromebooks use the same update_engine
implementation of A/B. The necessary platform code implementation is
public in Android 7.1 and higher.
A/B OTAs provide a better user experience when taking updates. Measurements from monthly security updates show this feature has already proven a success: As of May 2017, 95% of Pixel owners are running the latest security update after a month compared to 87% of Nexus users, and Pixel users update sooner than Nexus users. Failures to update blocks during an OTA no longer result in a device that won't boot; until the new system image has successfully booted, Android retains the ability to fall back to the previous working system image.
The following table contains details on the shipping A/B configuration versus the internally-tested non-A/B configuration:
Pixel partition sizes (MiB) | A/B | Non-A/B |
---|---|---|
Bootloader | 50*2 | 50 |
Boot | 32*2 | 32 |
Recovery | 0 | 32 |
Cache | 0 | 100 |
Radio | 70*2 | 70 |
Vendor | 300*2 | 300 |
System | 2048*2 | 4096 |
Total | 5000 | 4680 |
A/B updates require an increase of only 320MiB in flash. Removing the recovery partition saves 32MiB and removing the cache partition saves another 100MiB, which roughly offsets the cost of the B partitions for the bootloader, the boot partition, and the radio partition. The vendor partition doubled in size (the vast majority of the size increase). Pixel's A/B system image is half the size of the original non-A/B system image.
For the Pixel A/B and non-A/B variants tested internally (only A/B shipped), the space used differed by only 320MiB. On a 32GiB device, this is just under 1%. For a 16GiB device this would be less than 2%, and for an 8GiB device almost 4% (assuming all three devices had the same system image).
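The figures above can be checked with a short calculation. The sizes are taken from the partition table (in MiB); the dictionary and function names are only illustrative.

```python
# Partition sizes in MiB, copied from the table above.
AB = {"bootloader": 50 * 2, "boot": 32 * 2, "recovery": 0, "cache": 0,
      "radio": 70 * 2, "vendor": 300 * 2, "system": 2048 * 2}
NON_AB = {"bootloader": 50, "boot": 32, "recovery": 32, "cache": 100,
          "radio": 70, "vendor": 300, "system": 4096}

overhead_mib = sum(AB.values()) - sum(NON_AB.values())


def overhead_percent(flash_gib):
    """Share of total flash taken by the A/B overhead, in percent."""
    return 100 * overhead_mib / (flash_gib * 1024)


for flash_gib in (32, 16, 8):
    print(f"{flash_gib}GiB device: {overhead_percent(flash_gib):.2f}%")
```

The totals come to 5000MiB and 4680MiB, a difference of 320MiB, which matches the "just under 1%", "less than 2%", and "almost 4%" figures for 32GiB, 16GiB, and 8GiB devices.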
We experimented with SquashFS but weren't able to achieve the performance desired for a high-end device. We don't use or recommend SquashFS for handheld devices.
More specifically, SquashFS provided about 50% size savings on the system partition, but the overwhelming majority of the files that compressed well were the precompiled .odex files. Those files had very high compression ratios (approaching 80%), but the compression ratio for the rest of the system partition was much lower. In addition, SquashFS in Android 7.0 raised the following performance concerns:
As SquashFS matures and adds features to reduce CPU impact (such as a whitelist of commonly-accessed files that shouldn't be compressed), we will continue to evaluate it and offer recommendations to device manufacturers.
Applications are stored in .apk files, which are actually ZIP archives. Each .apk file has inside it one or more .dex files containing portable Dalvik bytecode. An .odex file (optimized .dex) lives separately from the .apk file and can contain machine code specific to the device. If an .odex file is available, Android can run applications at ahead-of-time compiled speeds without having to wait for the code to be compiled each time the application is launched. An .odex file isn't strictly necessary: Android can actually run the .dex code directly via interpretation or Just-In-Time (JIT) compilation, but an .odex file provides the best combination of launch speed and run-time speed if space is available.
Example: For the installed-files.txt from a Nexus 6P running Android 7.1 with a total system image size of 2628MiB (2755792836 bytes), the breakdown of the largest contributors to overall system image size by file type is as follows:
File type | Size | Share of system image |
---|---|---|
.odex | 1391770312 bytes | 50.5% |
.apk | 846878259 bytes | 30.7% |
.so (native C/C++ code) | 202162479 bytes | 7.3% |
.oat files/.art images | 163892188 bytes | 5.9% |
Fonts | 38952361 bytes | 1.4% |
icu locale data | 27468687 bytes | 0.9% |
These figures are similar for other devices too, so on Nexus/Pixel devices, .odex files take up approximately half the system partition. This meant we could continue to use ext4 but write the .odex files to the B partition at the factory and then copy them to /data on first boot. The actual storage used with ext4 A/B is identical to SquashFS A/B, because if we had used SquashFS we would have shipped the preopted .odex files on system_a instead of system_b.
Not exactly. On Pixel, most of the space taken by .odex files is for apps, which typically exist on /data. These apps take Google Play updates, so the .apk and .odex files on the system image are unused for most of the life of the device. Such files can be excluded entirely and replaced by small, profile-driven .odex files when the user actually uses each app (thus requiring no space for apps the user doesn't use). For details, refer to the Google I/O 2016 talk The Evolution of ART.
The comparison is difficult for a few key reasons:
Apps updated by Google Play have their .odex files on /data as soon as they receive their first update.
For details on the tuning options available to OEMs, see Configuring ART.
It's a little more complicated. After the new system image has been written, the new version of dex2oat is run against the new .dex files to generate the new .odex files. This occurs while the old system is still running, so the old and new .odex files are both on /data at the same time.
The code in OtaDexoptService (frameworks/base/+/nougat-mr1-release/services/core/java/com/android/server/pm/OtaDexoptService.java#200) calls getAvailableSpace before optimizing each package to avoid over-filling /data. Note that "available" here is still conservative: it's the amount of space left before hitting the usual system low space threshold (measured as both a percentage and a byte count). So if /data is full, there won't be two copies of every .odex file.
The same code also has a BULK_DELETE_THRESHOLD: If the device gets that close
to filling the available space (as just described), the .odex files belonging to
apps that aren't used are removed. That's another case without two copies of
every .odex file.
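The two safeguards just described can be modeled loosely as follows. This is a sketch and not the OtaDexoptService logic: plan_ota_dexopt(), the package dictionaries, and the size units are all hypothetical, and the real code's thresholds and ordering may differ.

```python
def plan_ota_dexopt(packages, available, bulk_delete_threshold):
    """Decide which packages get new .odex files without over-filling /data.

    packages: list of dicts with "name", "new_odex", "old_odex", and "used".
    available: space left before the low-space threshold (arbitrary units).
    Returns (optimized names, names whose old .odex files were removed, space left).
    """
    optimized, removed = [], []
    for pkg in packages:
        if available < bulk_delete_threshold:
            # Close to the limit: reclaim space from apps that aren't used.
            for other in packages:
                if not other["used"] and other["name"] not in removed:
                    removed.append(other["name"])
                    available += other["old_odex"]
        if pkg["name"] in removed:
            continue  # unused app: no second copy is created
        if pkg["new_odex"] > available:
            continue  # not enough room: left for after the reboot
        optimized.append(pkg["name"])
        available -= pkg["new_odex"]
    return optimized, removed, available


packages = [
    {"name": "mail", "new_odex": 40, "old_odex": 40, "used": True},
    {"name": "maps", "new_odex": 50, "old_odex": 50, "used": True},
    {"name": "game", "new_odex": 30, "old_odex": 30, "used": False},
]
result = plan_ota_dexopt(packages, available=100, bulk_delete_threshold=25)
```

With no space available at all, the same function optimizes nothing, which corresponds to the case where no second copy of any .odex file is written before the reboot.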
In the worst case where /data is completely full, the update waits until the device has rebooted into the new system and no longer needs the old system's .odex files. The PackageManager handles this (frameworks/base/+/nougat-mr1-release/services/core/java/com/android/server/pm/PackageManagerService.java#7215). After the new system has successfully booted, installd (frameworks/native/+/nougat-mr1-release/cmds/installd/commands.cpp#2192) can remove the .odex files that were used by the old system, returning the device back to the steady state where there's only one copy.
So, while it is possible that /data contains two copies of all the .odex files, (a) this is temporary and (b) only occurs if you had plenty of free space on /data anyway. Except during an update, there's only one copy. And as part of ART's general robustness features, it will never fill /data with .odex files anyway (because that would be a problem on a non-A/B system too).
Only a small portion of flash is rewritten: a full Pixel system update writes about 2.3GiB. (Apps are also recompiled, but that's true of non-A/B too.) Traditionally, block-based full OTAs wrote a similar amount of data, so flash wear rates should be similar.
No. Pixel didn't increase in system image size (it merely divided the space across two partitions).
Yes. If you've actually used a device, taken an OTA, and performed a factory data reset, the first reboot will be slower than it would otherwise be (1m40s vs 40s on a Pixel XL) because the .odex files will have been lost from B after the first OTA and so can't be copied to /data. That's the trade-off.
Factory data reset should be a rare operation when compared to regular boot so the time taken is less important. (This doesn't affect users or reviewers who get their device from the factory, because in that case the B partition is available.) Use of the JIT compiler means we don't need to recompile everything, so it's not as bad as you might think. It's also possible to mark apps as requiring ahead-of-time compilation using coreApp="true" in the manifest (frameworks/base/+/nougat-mr1-release/packages/SystemUI/AndroidManifest.xml#23). This is currently used by system_server because it's not allowed to JIT for security reasons.
No. As explained above, the new dex2oat is run while the old system image is still running to generate the files that will be needed by the new system. The update isn't considered available until that work has been done.
32GiB works well, as proven on Pixel. 320MiB out of 16GiB is a reduction of about 2%, and 320MiB out of 8GiB is a reduction of about 4%. A/B would not be the recommended choice on devices with 4GiB, as the 320MiB overhead is nearly 8% of the total space.
No. Android Verified Boot has always required block-based updates, but not necessarily A/B updates.
No.
No. There's some confusion here because if an A/B system fails to boot into the new system image it will (after some number of retries determined by your bootloader) automatically revert to the "previous" system image. The key point here though is that "previous" in the A/B sense is actually still the "current" system image. As soon as the device successfully boots a new image, rollback protection kicks in and ensures that you can't go back. But until you've actually successfully booted the new image, rollback protection doesn't consider it to be the current system image.
With non-A/B updates, the aim is to install the update as quickly as possible because the user is waiting and unable to use their device while the update is applied. With A/B updates, the opposite is true; because the user is still using their device, as little impact as possible is the goal, so the update is deliberately slow. Via logic in the Java system update client (which for Google is GmsCore, the core package provided by GMS), Android also attempts to choose a time when the users aren't using their devices at all. The platform supports pausing/resuming the update, and the client can use that to pause the update if the user starts to use the device and resume it when the device is idle again.
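The pause/resume policy can be sketched as a small state machine. This is a hypothetical model of the client-side policy, not GmsCore or update_engine code: BackgroundUpdate and tick() are illustrative names, and writing one block per idle tick is an arbitrary simplification.

```python
class BackgroundUpdate:
    """Toy model: the update only makes progress while the device is idle."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.written = 0
        self.paused = False

    @property
    def done(self):
        return self.written >= self.total_blocks

    def tick(self, user_active):
        """Advance one time step; pause while the user is active, resume when idle."""
        if self.done:
            return
        self.paused = user_active
        if not self.paused:
            self.written += 1


update = BackgroundUpdate(total_blocks=3)
for user_active in (False, True, True, False, False):
    update.tick(user_active)
```

In this run the update writes one block, pauses for two ticks while the user is active, and finishes during the next two idle ticks.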
There are two phases while taking an OTA, shown clearly in the UI as Step 1 of 2 and Step 2 of 2 under the progress bar. Step 1 corresponds with writing the data blocks, while step 2 is pre-compiling the .dex files. These two phases are quite different in terms of performance impact. The first phase is simple I/O. This requires little in the way of resources (RAM, CPU, I/O) because it's just slowly copying blocks around.
The second phase runs dex2oat to precompile the new system image. This obviously has less clear bounds on its requirements because it compiles actual apps. And there's obviously much more work involved in compiling a large and complex app than a small and simple app; whereas in phase 1 there are no disk blocks that are larger or more complex than others.
The process is similar to when Google Play installs an app update in the background before showing the "5 apps updated" notification, as has been done for years.
The current implementation in GmsCore doesn't distinguish between background updates and user-initiated updates but may do so in the future. In the case where the user explicitly asked for the update to be installed or is watching the update progress screen, we'll prioritize the update work on the assumption that they're actively waiting for it to finish.
With non-A/B updates, if an update failed to apply, the user was usually left with an unusable device. The only exception was if the failure occurred before the update had even started to be applied (because the package failed to verify, say). With A/B updates, a failure to apply an update does not affect the currently running system. The update can simply be retried later.
In Google's A/B implementation, the platform APIs and update_engine provide the mechanism while GmsCore provides the policy. That is, the platform knows how to apply an A/B update and all that code is in AOSP (as mentioned above); but it's GmsCore that decides what and when to apply.
If you’re not using GmsCore, you can write your own replacement using the same platform APIs. The platform Java API for controlling update_engine is android.os.UpdateEngine: frameworks/base/core/java/android/os/UpdateEngine.java.
Callers can provide an UpdateEngineCallback to be notified of status updates: frameworks/base/+/master/core/java/android/os/UpdateEngineCallback.java.
Refer to the reference files for the core classes to use the interface.
As of 2017-03-15, we have the following information:
SoC vendor | Android 7.x release | Android 8.x release |
---|---|---|
Qualcomm | Depending on OEM requests | All chipsets will get support |
MediaTek | Depending on OEM requests | All chipsets will get support |
For details on schedules, check with your SoC contacts. For SoCs not listed above, reach out to your SoC vendor directly.