Compare Results: Tournaments
Suppose we want to analyze the impact of a performance improvement.
First, we need a baseline measurement. For example:
esrally race --track=pmc --revision=latest --user-tags="intention:baseline_github_1234"
Above, we run the baseline measurement based on the latest source code revision of Elasticsearch. With the command line parameter --user-tags we can provide a key-value pair to document the intent of a race.
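If we want to record more context than just the intent, the same flag can carry additional pairs. A minimal sketch, assuming your Rally version accepts a comma-separated list of key:value pairs (check esrally race --help); the extra env tag below is made up for illustration:
esrally race --track=pmc --revision=latest --user-tags="intention:baseline_github_1234,env:bare_metal"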
Then we implement our change and run another benchmark to see its performance impact. This time we do not want Rally to change our source tree, so we specify the pseudo-revision current:
esrally race --track=pmc --revision=current --user-tags="intention:reduce_alloc_1234"
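With --revision=current Rally builds whatever is currently checked out in the Elasticsearch source directory it is configured to use, so we apply our change there before starting the race. A minimal sketch of that workflow; the path and branch name are hypothetical:
cd ~/src/elasticsearch           # hypothetical location of the configured source directory
git checkout reduce-alloc-1234   # hypothetical branch that contains our change
esrally race --track=pmc --revision=current --user-tags="intention:reduce_alloc_1234"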
After we’ve run both races, we want to know about the performance impact. With Rally we can easily analyze the differences between two races. First, we need to find the two races to compare by issuing esrally list races:
$ esrally list races
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
Recent races:
Race ID                               Race Timestamp    Track    Track Parameters    Challenge            Car       User Tags
------------------------------------  ----------------  -------  ------------------  -------------------  --------  ------------------------------
beb154e4-0a05-4f45-ad9f-e34f9a9e51f7  20160518T122341Z  pmc                          append-no-conflicts  defaults  intention:reduce_alloc_1234
0bfd4542-3821-4c79-81a2-0858636068ce  20160518T112057Z  pmc                          append-no-conflicts  defaults  intention:baseline_github_1234
0cfb3576-3025-4c17-b672-d6c9e811b93e  20160518T101957Z  pmc                          append-no-conflicts  defaults
We can see that the user tags help us to recognize races. We want to compare the two most recent races, so we provide their race IDs in the next step:
$ esrally compare --baseline=0bfd4542-3821-4c79-81a2-0858636068ce --contender=beb154e4-0a05-4f45-ad9f-e34f9a9e51f7
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
Comparing baseline
Race ID: 0bfd4542-3821-4c79-81a2-0858636068ce
Race timestamp: 2016-05-18 11:20:57
Challenge: append-no-conflicts
Car: defaults
with contender
Race ID: beb154e4-0a05-4f45-ad9f-e34f9a9e51f7
Race timestamp: 2016-05-18 12:23:41
Challenge: append-no-conflicts
Car: defaults
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
Metric Baseline Contender Diff
-------------------------------------------------------- ---------- ----------- -----------------
Min Indexing Throughput [docs/s] 19501 19118 -383.00000
Median Indexing Throughput [docs/s] 20232 19927.5 -304.45833
Max Indexing Throughput [docs/s] 21172 20849 -323.00000
Total indexing time [min] 55.7989 56.335 +0.53603
Total merge time [min] 12.9766 13.3115 +0.33495
Total refresh time [min] 5.20067 5.20097 +0.00030
Total flush time [min] 0.0648667 0.0681833 +0.00332
Total merge throttle time [min] 0.796417 0.879267 +0.08285
Query latency term (50.0 percentile) [ms] 2.10049 2.15421 +0.05372
Query latency term (90.0 percentile) [ms] 2.77537 2.84168 +0.06630
Query latency term (100.0 percentile) [ms] 4.52081 5.15368 +0.63287
Query latency country_agg (50.0 percentile) [ms] 112.049 110.385 -1.66392
Query latency country_agg (90.0 percentile) [ms] 128.426 124.005 -4.42138
Query latency country_agg (100.0 percentile) [ms] 155.989 133.797 -22.19185
Query latency scroll (50.0 percentile) [ms] 16.1226 14.4974 -1.62519
Query latency scroll (90.0 percentile) [ms] 17.2383 15.4079 -1.83043
Query latency scroll (100.0 percentile) [ms] 18.8419 18.4241 -0.41784
Query latency country_agg_cached (50.0 percentile) [ms] 1.70223 1.64502 -0.05721
Query latency country_agg_cached (90.0 percentile) [ms] 2.34819 2.04318 -0.30500
Query latency country_agg_cached (100.0 percentile) [ms] 3.42547 2.86814 -0.55732
Query latency default (50.0 percentile) [ms] 5.89058 5.83409 -0.05648
Query latency default (90.0 percentile) [ms] 6.71282 6.64662 -0.06620
Query latency default (100.0 percentile) [ms] 7.65307 7.3701 -0.28297
Query latency phrase (50.0 percentile) [ms] 1.82687 1.83193 +0.00506
Query latency phrase (90.0 percentile) [ms] 2.63714 2.46286 -0.17428
Query latency phrase (100.0 percentile) [ms] 5.39892 4.22367 -1.17525
Median CPU usage (index) [%] 668.025 679.15 +11.12499
Median CPU usage (stats) [%] 143.75 162.4 +18.64999
Median CPU usage (search) [%] 223.1 229.2 +6.10000
Total Young Gen GC time [s] 39.447 40.456 +1.00900
Total Young Gen GC count 10 11 +1.00000
Total Old Gen GC time [s] 7.108 7.703 +0.59500
Total Old Gen GC count 10 11 +1.00000
Index size [GB] 3.25475 3.25098 -0.00377
Total written [GB] 17.8434 18.3143 +0.47083
Heap used for segments [MB] 21.7504 21.5901 -0.16037
Heap used for doc values [MB] 0.16436 0.13905 -0.02531
Heap used for terms [MB] 20.0293 19.9159 -0.11345
Heap used for norms [MB] 0.105469 0.0935669 -0.01190
Heap used for points [MB] 0.773487 0.772155 -0.00133
Heap used for stored fields [MB] 0.677795 0.669426 -0.00837
Segment count 136 121 -15.00000
Indices Stats(90.0 percentile) [ms] 3.16053 3.21023 +0.04969
Indices Stats(99.0 percentile) [ms] 5.29526 3.94132 -1.35393
Indices Stats(100.0 percentile) [ms] 5.64971 7.02374 +1.37403
Nodes Stats(90.0 percentile) [ms] 3.19611 3.15251 -0.04360
Nodes Stats(99.0 percentile) [ms] 4.44111 4.87003 +0.42892
Nodes Stats(100.0 percentile) [ms] 5.22527 5.66977 +0.44450
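If you run this baseline-versus-contender comparison often, the two most recent race IDs can also be picked out of the listing programmatically. A hypothetical shell sketch, not part of Rally itself; it assumes the race IDs are UUIDs printed one per row and that the listing is ordered newest first, as in the output above:
# grab the two most recent race IDs from the listing (newest first)
ids=$(esrally list races | grep -Eo '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | head -n 2)
contender=$(echo "$ids" | sed -n '1p')   # most recent race
baseline=$(echo "$ids" | sed -n '2p')    # the race before it
esrally compare --baseline="$baseline" --contender="$contender"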