Command Line Tools
==================

This directory contains command line tools for RAPPOR analysis.

Analysis Tools
--------------

### decode-dist

Decode a distribution. This requires a "counts" file (summed bits from
reports), a map file, and a params file.  See `test.sh decode-dist` in this dir
for an example.
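This tool's analysis is more than the sketch shown here: it also uses the map file to go from bit estimates to a distribution over candidates. Only the first step is shown: turning summed IRR bits into estimated true bit counts. It assumes the usual RAPPOR reporting model. Each true Bloom filter bit is permanently randomized: kept with probability `1 - f`, otherwise set to 1 or 0 with probability `f/2` each. The result is then reported as 1 with probability `q` if set and `p` if not. So a reported bit is 1 with probability `p* = p + f(q - p)/2` when the true bit is 0, and `p* + (1 - f)(q - p)` when it is 1. Solving `E[c] = t(1 - f)(q - p) + N p*` for `t` gives the estimate below. The function name and the counts layout (column 0 = reports per cohort) are assumptions, and `p`, `q`, `f` are assumed to come from the params file.

```python
from typing import List


def estimate_true_counts(counts: List[List[int]], p: float, q: float,
                         f: float) -> List[List[float]]:
    """Unbias summed IRR bits per cohort and bit (hypothetical helper).

    Assumed layout: counts[j][0] is N_j, the number of reports in cohort j,
    and counts[j][1:] are how many of those reports had each bit set.
    """
    if q == p or f >= 1:
        raise ValueError("need q != p and f < 1")
    # Probability a reported bit is 1 when the true bit is 0.
    p_star = p + 0.5 * f * q - 0.5 * f * p
    # Difference between P(report 1 | true 1) and P(report 1 | true 0).
    scale = (1 - f) * (q - p)
    estimates = []
    for row in counts:
        n = row[0]
        estimates.append([(c - p_star * n) / scale for c in row[1:]])
    return estimates
```

For example, with `p = 0.5`, `q = 0.75`, `f = 0`, a cohort of 100 reports in which 60 had bit 1 set gives an estimate of `(60 - 50) / 0.25 = 40` true occurrences.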

### decode-assoc

Decode a joint distribution between 2 variables ("association analysis").  See
`test.sh decode-assoc-R` or `test.sh decode-assoc-cpp` in this dir for an
example.

Currently it supports only associating a string variable with a boolean
variable.

### Setup

Both of these tools are written in R, and require several R libraries to be
installed (see `../setup.sh r-packages`).

`decode-assoc` also shells out to a native binary written in C++ if
`--em-executable` is passed.  This requires a C++ compiler (see
`analysis/cpp/run.sh`).  You can run `test.sh decode-assoc-cpp` to test it.


Helper Tools
------------

These are simple Python implementations of tools needed for analysis.  At
Google, Chrome uses alternative C++/Go implementations of these tools.

### sum-bits

Given a CSV file with RAPPOR reports (IRRs), produce a "counts" CSV file on
stdout.  This is the `m x (k+1)` matrix that is used in the R analysis (where m
= #cohorts and k = report width in bits).
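To show what "summing bits" means for that matrix, here is a minimal sketch in Python. It is not the actual `sum-bits` implementation. The input format (one `cohort,irr` pair per CSV row, with `irr` a string of `0`/`1` characters) and the bit ordering (character `j` maps to column `j + 1`) are assumptions. The extra first column is also assumed to hold the number of reports per cohort.

```python
import csv
import io
from typing import List


def sum_bits(report_csv: str, num_cohorts: int, num_bits: int) -> List[List[int]]:
    """Build an m x (k+1) counts matrix from IRRs (hypothetical sketch).

    Row i is cohort i. Column 0 counts the reports in that cohort; columns
    1..k count how many of those reports had each bit set.
    """
    counts = [[0] * (num_bits + 1) for _ in range(num_cohorts)]
    for row in csv.reader(io.StringIO(report_csv)):
        if not row:
            continue  # skip blank lines
        cohort, irr = int(row[0]), row[1].strip()
        if len(irr) != num_bits:
            raise ValueError(f"expected {num_bits} bits, got {irr!r}")
        counts[cohort][0] += 1  # one more report in this cohort
        for j, ch in enumerate(irr):
            if ch == "1":
                counts[cohort][j + 1] += 1
    return counts
```

For instance, `sum_bits("0,1010\n0,0011\n1,1000\n", 2, 4)` returns `[[2, 1, 0, 2, 1], [1, 1, 0, 0, 0]]`: cohort 0 has two reports, and bit 2 (column 3) is set in both.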

### hash-candidates

Given a list of candidates on stdin, produce a CSV file of hashes (the "map
file").  Each row has `m x h` cells (where m = #cohorts and h = #hashes).
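The sketch below shows where the `m x h` cells come from. Each candidate is hashed once per cohort, and each cohort yields `h` Bloom filter bit positions. The hashing scheme is a hypothetical stand-in and may not match what the real `hash-candidates` tool or the RAPPOR client uses. It uses MD5 over `"cohort:candidate"`, takes two digest bytes per hash function, and reduces them mod the report width `k`. The row layout (candidate string, then the `m x h` positions) is also an assumption.

```python
import hashlib
from typing import List


def hash_candidate(candidate: str, num_cohorts: int, num_hashes: int,
                   num_bits: int) -> List[int]:
    """Return m*h bit positions for one candidate (hypothetical scheme)."""
    if num_hashes > 8:
        raise ValueError("a 16-byte MD5 digest supports at most 8 two-byte hashes")
    cells = []
    for cohort in range(num_cohorts):
        # Illustrative assumption: salt the candidate with its cohort number.
        digest = hashlib.md5(f"{cohort}:{candidate}".encode("utf-8")).digest()
        for i in range(num_hashes):
            value = int.from_bytes(digest[2 * i:2 * i + 2], "big")
            cells.append(value % num_bits)  # bit position in [0, k)
    return cells


def map_rows(candidates: List[str], num_cohorts: int, num_hashes: int,
             num_bits: int) -> List[list]:
    """One row per candidate: the candidate followed by its m*h cells."""
    return [[c] + hash_candidate(c, num_cohorts, num_hashes, num_bits)
            for c in candidates]
```

Because the hash is deterministic, the analysis can regenerate the same map file from the candidate list whenever the params (`m`, `h`, `k`) are unchanged.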

See the `regtest.sh` script for examples of how these tools are invoked.