diff options
author | Joshua Haberman <jhaberman@gmail.com> | 2020-05-24 21:25:20 -0700 |
---|---|---|
committer | Joshua Haberman <jhaberman@gmail.com> | 2020-05-24 21:25:20 -0700 |
commit | 4936ee3c076f7761a177d833783097bb4a06ac45 (patch) | |
tree | 824de55dce77195199cdccb63e047933af481b4f | |
parent | a54c5763ae32fe5c2d8d6edcaec922e96f1070ee (diff) | |
parent | 5b1bc5e29188e12b55389a9c60e21d543e70ee47 (diff) | |
download | bloaty-4936ee3c076f7761a177d833783097bb4a06ac45.tar.gz |
Merge branch 'master' of github.com:google/bloaty
-rw-r--r-- | CMakeLists.txt | 2 | ||||
-rw-r--r-- | README.md | 17 | ||||
-rw-r--r-- | doc/how-bloaty-works.md | 311 | ||||
-rw-r--r-- | how-bloaty-works.md | 159 |
4 files changed, 329 insertions, 160 deletions
diff --git a/CMakeLists.txt b/CMakeLists.txt index 8b14d73..8309cff 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -249,7 +249,7 @@ else() enable_testing() if(BUILD_TESTING) - set(INSTALL_GTEST OFF) + option(INSTALL_GTEST "" OFF) add_subdirectory(third_party/googletest) include_directories(third_party/googletest/googletest/include) include_directories(third_party/googletest/googlemock/include) @@ -7,6 +7,14 @@ Ever wondered what's making your binary big? Bloaty McBloatface will show you a size profile of the binary so you can understand what's taking up space inside. +Bloaty performs a deep analysis of the binary. Using custom +ELF, DWARF, and Mach-O parsers, Bloaty aims to accurately +attribute every byte of the binary to the symbol or +compileunit that produced it. It will even disassemble the +binary looking for references to anonymous data. For more +information about the analysis performed by Bloaty, please +see [doc/how-bloaty-works.md](doc/how-bloaty-works.md). + Bloaty works on binaries, shared objects, object files, and static libraries (`.a` files). The following file formats are supported: @@ -498,6 +506,15 @@ $ ./bloaty -d sections bloaty 100.0% 29.5Mi 100.0% 6.69Mi TOTAL ``` +Sections are regions of the binary that are the linker +treats as atomic when linking. The linker will never break +apart or rearrange the data within a section. This is why it +is necessary to compile with `-ffunction-sections` and +`-fdata-sections` if you want the linker to strip out +individual functions or variables that have no references. +However the linker will often combine many input sections +into a single output section. + ## Symbols Symbols come from the symbol table, and represent individual diff --git a/doc/how-bloaty-works.md b/doc/how-bloaty-works.md new file mode 100644 index 0000000..9196f73 --- /dev/null +++ b/doc/how-bloaty-works.md @@ -0,0 +1,311 @@ + +# How Bloaty Works + +At a high level, Bloaty's goal is to create a map of the binary where every +byte has a label attached to it. Every byte starts out as unknown +(unattributed). As we scan the binary we assign labels to different ranges of +the file. For example, if the user selected the "symbols" data source we scan +the symbol table and use the symbol name as the label for each range. + +Ideally these labeled ranges will cover the entire file by the time we are +done. In practice we usually can't achieve perfect 100% coverage. To +compensate for this, we have various kinds of "fallback" labels we attach to +mystery regions of the file. This is how we guarantee an important invariant +of Bloaty: the totals given in Bloaty's output will always match the total size +of the file. This ensures that we always account for the entire file, even if +we don't have detailed information for every byte. + +The ELF/Mach-O/etc. data structures we are traversing were not designed to +enable size profiling. They were designed to assist linkers, loaders, +debuggers, stack unwinders, etc. to run and debug the binary. This means that +Bloaty's size analysis is inherently an unconventional use of ELF/Mach-O +metadata. Bloaty has to be clever about how to use the available information to +achieve its goal. This can pose a challenge, but also makes Bloaty fun to work +on. Getting the coverage close to 100% requires a lot of ingenuity (and some +heuristics). + +## Range Map + +RangeMap (as defined in [range_map.h](https://github.com/google/bloaty/blob/master/src/range_map.h)) +is the core data structure of Bloaty. It is a sparse map of +`[start, end) -> std::string` that associates regions of VM or file space to +a label. + +Sometimes we know a region's start but not its end. For example, Mach-O +symbols have an address but not a size (whereas ELF symbols have both). +To support this case, `RangeMap` supports adding an address with `kUnknownSize`. +A range with unknown size will automatically extend to the beginning of the +next region, even if the next region is added later. + +If we try to add a label to a range of the binary that has already been assigned +a label, the first label assigned takes precedence. This means that the order +in which we scan data structures is significant. So our general strategy is to +scan our most granular and detailed information first. We scan generic +information as a last resort, to give at least some information for parts of the +binary that we couldn't find any more specific information about. + +## VM Space and File Space + +Loadable binaries have two fundamental domains of space we are trying to map: +*VM space* and *file space*. File space is the bytes of the input file. VM +space is the bytes of memory when the executable is loaded at runtime. Some +regions of the binary exist only in file space (like debug info) and some +regions exist only in VM space (like `.bss`, zero-initialized data). Even +entities that exist in both spaces can have different sizes in each. + +We create two separate `RangeMap` structures for these two domains. For +convenience, we put them together into a single structure called `DualMap`: + +```cpp +struct DualMap { + RangeMap vm_map; + RangeMap file_map; +}; +``` + +We populate these two maps simultaneously as we scan the file. We must populate +both maps even if we only care about one of them, because most of the metadata we're +scanning gives us VM addresses *or* file offsets, not both. For example, +debug info always refers to VM addresses, because it's intended for debugging at +runtime. Even if we only care about file size, we still have to scan VM addresses +and translate them to file offsets. + +Bloaty's overall analysis algorithm (in pseudo-code) is: + +```c++ +for (auto f : files) { + // Always start by creating the base map. + DualMap base_map = ScanBaseMap(f); + + // Scan once for every data source the user selected with '-d'. + std::vector<DualMap> maps; + for (auto s : data_sources) { + maps.push_back(ScanDataSource(f, s)); + } +} +``` + +## Base Map + +To translate between VM and file space, we always begin by creating a "base +map." The base map is just a `DualMap` like any other, but we give it special +meaning: + +* It defines what ranges of file and VM space constitute "the entire binary" + (ie. the "TOTALS" row of the final report). +* We use it to translate between VM space and File space. + +This means that the base map must be exhaustive, and must also provide +translation for any entity that exists in both VM and file space. For example, +suppose we are scanning the "symbols" data source and we see in the symbol +table that address `0x12345` corresponds to symbol `foo`. We will add that to +VM map immediately, but we will also use the base map to translate address +`0x12355` to a file offset so we can add that range to the file map. + +How does the base map store translation info? I left one thing out about +`RangeMap` above. In addition to storing a label for every region, it can also +(optionally) store a member called `other_start`. This is the offset in the +other space where this range maps to. + +We build the base map by scanning either the segments (program headers) or +sections of the binary. These give both VM address and file offset for regions +of the binary that are loaded into memory. To make sure we cover the entire +file space, we use `[Unmapped]` as a last ditch fallback for any regions of the +on-disk binary that didn't have any segment/section data associated with them. +This ensures that Bloaty always accounts for the entire physical binary, even if +we can't find any information about it. + +## Scanning Data Sources + +Once we have built the base map, we can get on to the meat of Bloaty's work. +We can now scan the binary according to whatever data source(s) the user has +selected. + +We must do more than just scan a single table though. For example, if the user +asked us to scan symbols we can't simply scan the symbol table, add all the +symbol table entries to a `RangeMap`, and call it a day. The first prerelease +version of Bloaty did just that and I got very understandable complaints that +Bloaty's coverage was low, and very large parts of the binary would show up as +`[None]`. + +The problem is that the symbol table only describes the actual machine code of +the function. But a function can emit far more artifacts into a binary than +just the machine code. It can also emit: + +* unwind info into `.eh_frame` +* relocations into `.rela.dyn`, etc. +* debugging information into various `.debug_*` sections. +* the symbol table entries themselves (yes, these do take up space!) + +If we want to achieve high coverage of the binary, we have to scan all of these +different sections and attribute them as best we can to individual +symbols/compileunits/etc. + +All of this means that we have to delve deep into ELF, DWARF, Mach-O, etc. to +get good coverage. Bloaty includes custom ELF/DWARF/Mach-O parsers to not only +extract every bit of data we can, but also to track where exactly in the files +each bit of data lives. + +### Segments and Sections + +The `segments` and `sections` data sources are relatively straightforward. For +the most part we can simply scan the segments/sections table and call it a day. + +For ELF, segments and sections have distinct tables in the binary that can be +scanned independently. This means that technically a section could span multiple +segments, but in practice segments/sections form a 1:many relationship, where +each section is contained entirely within a single segment. + +Currently Bloaty only reports `PT_LOAD` and `PT_TLS` segments. We scan `PT_LOAD` +segments first, so if there is overlap with `PT_TLS` the `PT_LOAD` label will +win. In the future It may make sense to scan `PT_TLS` first, as this is more +granular data that can give insight into the per-thread runtime overhead of TLS +variables. It may also make sense to scan other segment types, to give more +granular info. + +ELF segments do not have names. To distinguish between different `PT_LOAD` +segments, we include both a segment offset and the segment flags in the label, +eg. `LOAD #2 [R]`. + +For Mach-O, segments are contained within a file-level table of "load commands." +Each load command has a type, and technically speaking, segments are a subset of +all load commands. However Bloaty can parse non-segment load commands such as +the symbol table (`LC_SYMTAB`, `LC_DYSYMTAB`), code signature +(`LC_CODE_SIGNATURE`), and more. Segments can have zero or more sections, so in +Mach-O files the 1:many nature of segments and sections is enforced by the file +format. + +### Symbols + +The `symbols` data source requires a lot of deep analysis of the binary to give +good coverage. For example, take the following data from running Bloaty on +itself: + +``` +$ ./bloaty bloaty -d symbols,sections + FILE SIZE VM SIZE + -------------- -------------- +[...] + 0.2% 116Ki 1.6% 116Ki AArch64_printInst + 84.9% 98.8Ki 84.9% 98.8Ki .text + 14.9% 17.4Ki 14.9% 17.4Ki .rodata + 0.1% 156 0.1% 156 .eh_frame + 0.0% 24 0.0% 0 .symtab + 0.0% 18 0.0% 0 .strtab + 0.0% 8 0.0% 8 .eh_frame_hdr +[...] + 0.1% 50.1Ki 0.7% 49.8Ki reg_name_maps + 59.6% 29.8Ki 59.8% 29.8Ki .rela.dyn + 40.0% 20.0Ki 40.2% 20.0Ki .data.rel.ro + 0.4% 216 0.0% 0 .symtab + 0.0% 14 0.0% 0 .strtab +``` + +I excerpted two symbols from the report. Between these two symbols, Bloaty has +found seven distinct kinds of sections that contributed to these two symbols. If +you wrote a tool that naively just parsed the symbol table, you would only find +one of those seven. + +Let's take a survey of what Bloaty has found here. + +1. `.text.`/`.data.rel.ro`: this is the data we obtain by simply following the + symbol table entry. This is the primary code or data emitted by the function + or variable. +2. `.eh_frame`: this is the "unwind information" for a function. [It is used for + many things](https://stackoverflow.com/a/26302715/77070), including C++ + exceptions and stack traces when no frame pointer is available. +3. `.eh_frame_hdr`: this is metadata about the `.eh_frame` section. +4. `.symtab`: this is the function/variable's symbol table entry itself. It is a + fixed size for every entry. The fact that `reg_name_maps` above has a + `.symtab` size of 216 indicates that there must actually be 9 different + symbols being represented by this entry. Bloaty has combined them because + they all have the same name. We can break them apart if we want using: + + ``` + $ ./bloaty bloaty -d compileunits,symbols --source-filter=reg_name_maps$ + FILE SIZE VM SIZE + -------------- -------------- + 20.3% 10.2Ki 20.3% 10.1Ki ../third_party/capstone/arch/AArch64/AArch64Mapping.c + 100.0% 10.2Ki 100.0% 10.1Ki reg_name_maps + 18.9% 9.45Ki 18.9% 9.43Ki ../third_party/capstone/arch/X86/X86Mapping.c + 100.0% 9.45Ki 100.0% 9.43Ki reg_name_maps + 16.4% 8.20Ki 16.4% 8.18Ki ../third_party/capstone/arch/PowerPC/PPCMapping.c + 100.0% 8.20Ki 100.0% 8.18Ki reg_name_maps + 10.7% 5.35Ki 10.7% 5.33Ki ../third_party/capstone/arch/Mips/MipsMapping.c + 100.0% 5.35Ki 100.0% 5.33Ki reg_name_maps + 9.1% 4.57Ki 9.1% 4.55Ki ../third_party/capstone/arch/SystemZ/SystemZMapping.c + 100.0% 4.57Ki 100.0% 4.55Ki reg_name_maps + 8.7% 4.35Ki 8.7% 4.31Ki ../third_party/capstone/arch/ARM/ARMMapping.c + 100.0% 4.35Ki 100.0% 4.31Ki reg_name_maps + 7.0% 3.52Ki 7.0% 3.49Ki ../third_party/capstone/arch/TMS320C64x/TMS320C64xMapping.c + 100.0% 3.52Ki 100.0% 3.49Ki reg_name_maps + 6.9% 3.44Ki 6.9% 3.41Ki ../third_party/capstone/arch/Sparc/SparcMapping.c + 100.0% 3.44Ki 100.0% 3.41Ki reg_name_maps + 2.0% 1.02Ki 2.0% 1016 ../third_party/capstone/arch/XCore/XCoreMapping.c + 100.0% 1.02Ki 100.0% 1016 reg_name_maps + 100.0% 50.1Ki 100.0% 49.8Ki TOTAL + Filtering enabled (source_filter); omitted file = 46.7Mi, vm = 7.08Mi of entries + ``` +5. `.strtab`: this is the text of the function/variables's name itself in the + string table. Longer names take up more space in the binary, and Bloaty's + analysis here reflects that (though the symbol table is not loaded at + runtime, so it's not costing RAM). +6. `.rela.dyn`: these are relocations embedded into the executable. Normally we + would associate relocations with `.o` files and not the final linked binary. + However shared objects and position-independent executables must also emit + relocations for any global variable that is initialized to an address of some + other data. These relocations can take up a significant amount of space, + indeed more space than the data itself in this case! Without this deep + analysis of the binary, this cost would be invisible. Bloaty scans all + relocation tables and "charges" each relocation entry to the function/data + that requires the relocation (*not* the function being pointed to). +7. `.rodata`: Bloaty has found some data associated with this function. + Sometimes data doesn't get its own symbol table entry, for whatever reason. + Bloaty can attribute anonymous data to the function that uses it by + disassembling the binary looking for instructions that reference a different + part of the binary. + +Note that this is more granular information than you can get from a linker map +file. A linker map file will break down some of these sections by compile unit, +but the symbol-level granularity is limited to the primary code/data for each +symbol (#1 in the list above). + +### Compile Units + +Like symbols, we can see that Bloaty is capable of breaking down lots of +sections by compile unit: + +``` +$ ./bloaty bloaty -d compileunits,sections + FILE SIZE VM SIZE + -------------- -------------- + 37.9% 17.7Mi 49.4% 3.52Mi [160 Others] + 15.0% 7.04Mi 3.4% 246Ki ../third_party/protobuf/src/google/protobuf/descriptor.cc + 33.9% 2.38Mi 0.0% 0 .debug_info + 32.6% 2.29Mi 0.0% 0 .debug_loc + 17.2% 1.21Mi 0.0% 0 .debug_str + 6.5% 468Ki 0.0% 0 .debug_ranges + 5.3% 381Ki 0.0% 0 .debug_line + 2.8% 204Ki 83.1% 204Ki .text + 1.0% 70.9Ki 0.0% 0 .strtab + 0.4% 25.7Ki 10.4% 25.7Ki .eh_frame + 0.2% 13.3Ki 0.0% 0 .symtab + 0.1% 10.6Ki 4.3% 10.6Ki .rodata + 0.1% 3.97Ki 1.6% 3.97Ki .eh_frame_hdr + 0.0% 1.03Ki 0.4% 1.03Ki .rela.dyn + 0.0% 368 0.1% 368 .data.rel.ro + 0.0% 0 0.0% 81 .bss +[...] +``` + +To attribute all of the different `.debug_*` sections, Bloaty includes parsers +for all of the different DWARF formats that live in these sections. We also use +the DWARF data to find which symbols belong to which compile units. + +The `compileunits` data source contains much of the same data that you could get +from a linker map. Since each compile unit generally comes from a separate `.o` +file, a linker map can often give good data about which parts of the binary came +from which translation units. However Bloaty is able to derive this data without +needing a linker map file, which may be tricky to obtain. The `compileunits` +data source is also useful when combined with other data sources in hierarchical +profiles, as seen above. diff --git a/how-bloaty-works.md b/how-bloaty-works.md deleted file mode 100644 index f246a01..0000000 --- a/how-bloaty-works.md +++ /dev/null @@ -1,159 +0,0 @@ - -# How Bloaty Works - -At a high level, Bloaty's goal is to create a map of the binary where every -byte has a label attached to it. Every byte starts out as unknown -(unattributed). As we scan the binary we assign labels to different ranges of -the file. For example, if the user selected the "symbols" data source we scan -the symbol table and use the symbol name as the label for each range. - -Ideally these labeled ranges will cover the entire file by the time we are -done. In practice we usually can't achieve perfect 100% coverage. To -compensate for this, we have various kinds of "fallback" labels we attach to -mystery regions of the file. This is how we guarantee an important invariant -of Bloaty: the totals given in Bloaty's output will always match the total size -of the file. This ensures that we always account for the entire file, even if -we don't have detailed information for every byte. - -The ELF/Mach-O/etc. data structures we are traversing were not designed to -enable size profiling. They were designed to assist linkers, loaders, -debuggers, stack unwinders, etc. to run and debug the binary. This means that -Bloaty's size analysis is inherently an unconventional use of ELF/Mach-O -metadata. Bloaty has to be clever about how to use the available information to -achieve its goal. This can pose a challenge, but also makes Bloaty fun to work -on. Getting the coverage close to 100% requires a lot of ingenuity (and some -heuristics). - -## Range Map - -RangeMap (as defined in [range_map.h](https://github.com/google/bloaty/blob/master/src/range_map.h)) -is the core data structure of Bloaty. It is a sparse map of -`[start, end) -> std::string` that associates regions of VM or file space to -a label. - -Sometimes we know a region's start but not its end. For example, Mach-O -symbols have an address but not a size (whereas ELF symbols have both). -To support this case, `RangeMap` supports adding an address with `kUnknownSize`. -A range with unknown size will automatically extend to the beginning of the -next region, even if the next region is added later. - -If we try to add a label to a range of the binary that has already been assigned -a label, the first label assigned takes precedence. This means that the order -in which we scan data structures is significant. So our general strategy is to -scan our most granular and detailed information first. We scan generic -information as a last resort, to give at least some information for parts of the -binary that we couldn't find any more specific information about. - -## VM Space and File Space - -Loadable binaries have two fundamental domains of space we are trying to map: -*VM space* and *file space*. File space is the bytes of the input file. VM -space is the bytes of memory when the executable is loaded at runtime. Some -regions of the binary exist only in file space (like debug info) and some -regions exist only in VM space (like `.bss`, zero-initialized data). Even -entities that exist in both spaces can have different sizes in each. - -We create two separate `RangeMap` structures for these two domains. For -convenience, we put them together into a single structure called `DualMap`: - -```cpp -struct DualMap { - RangeMap vm_map; - RangeMap file_map; -}; -``` - -We populate these two maps simultaneously as we scan the file. We must populate -both maps even if we only care about one of them, because most of the metadata we're -scanning gives us VM addresses *or* file offsets, not both. For example, -debug info always refers to VM addresses, because it's intended for debugging at -runtime. Even if we only care about file size, we still have to scan VM addresses -and translate them to file offsets. - -Bloaty's overall analysis algorithm (in pseudo-code) is: - -```c++ -for (auto f : files) { - // Always start by creating the base map. - DualMap base_map = ScanBaseMap(f); - - // Scan once for every data source the user selected with '-d'. - std::vector<DualMap> maps; - for (auto s : data_sources) { - maps.push_back(ScanDataSource(f, s)); - } -} -``` - -## Base Map - -To translate between VM and file space, we always begin by creating a "base -map." The base map is just a `DualMap` like any other, but we give it special -meaning: - -* It defines what ranges of file and VM space constitute "the entire binary" - (ie. the "TOTALS" row of the final report). -* We use it to translate between VM space and File space. - -This means that the base map must be exhaustive, and must also provide -translation for any entity that exists in both VM and file space. For example, -suppose we are scanning the "symbols" data source and we see in the symbol -table that address `0x12345` corresponds to symbol `foo`. We will add that to -VM map immediately, but we will also use the base map to translate address -`0x12355` to a file offset so we can add that range to the file map. - -How does the base map store translation info? I left one thing out about -`RangeMap` above. In addition to storing a label for every region, it can also -(optionally) store a member called `other_start`. This is the offset in the -other space where this range maps to. - -We build the base map by scanning either the segments (program headers) or -sections of the binary. These give both VM address and file offset for regions -of the binary that are loaded into memory. To make sure we cover the entire -file space, we use `[Unmapped]` as a last ditch fallback for any regions of the -on-disk binary that didn't have any segment/section data associated with them. -This ensures that Bloaty always accounts for the entire physical binary, even if -we can't find any information about it. - -## Scanning Data Sources - -Once we have built the base map, we can get on to the meat of Bloaty's work. -We can now scan the binary according to whatever data source(s) the user has -selected. - -We must do more than just scan a single table though. For example, if the user -asked us to scan symbols we can't simply scan the symbol table, add all the -symbol table entries to a `RangeMap`, and call it a day. The first prerelease -version of Bloaty did just that and I got very understandable complaints that -Bloaty's coverage was low, and very large parts of the binary would show up as -`[None]`. - -The problem is that the symbol table only refers to the actual machine code of -the function. But a function can emit far more artifacts into a binary than -just the machine code. It can also emit: - -* unwind info into `.eh_frame` -* relocations into `.rela.dyn`, etc. -* debugging information into various `.debug_*` sections. -* the symbol table entries themselves (yes, these do take up space!) - -If we want to achieve high coverage of the binary, we have to scan all of these -different sections and attribute them as best we can to individual -symbols/compileunits/etc. - -All of this means that we have to delve deep into ELF, DWARF, Mach-O, etc. to -get good coverage. We have to venture into dark arts like `.eh_frame` which is -underspecified and underdocumented, and that few programs ever have any reason -to parse. For Bloaty to work well, we must go deeper. - -## ELF - -TODO - -## DWARF - -TODO - -## Mach-O - -TODO |