Field notes

Blog

Short essays on the parts of pairwise ranking that bite you in production.

  • Stopping rules: when have you compared enough?

    There is no universal answer, but there are three honest tests. We walk through each, with the queries you can run against the compere API.

    stopping-rules ucb operations
  • Reading the Elo output without lying

    Compere ships Elo, not Bradley-Terry. The numbers it produces are easy to misread. A field guide to what an Elo gap actually means.

    elo interpretation communication
  • Why 50 pairwise votes beat 500 ratings

    Rating scales drift, anchor, and lie. Pairwise comparisons survive all three. Here is the math we use, and where it actually breaks down.

    pairwise elo ucb study-design