Skelf-Research / open source

Intelligent pairwise comparisons. Better rankings with fewer votes.

Compere is a Python library and FastAPI service that picks which pair to compare next using a Multi-Armed Bandit, then turns the verdicts into Elo ratings you can read. Built for RLHF data collection, A/B testing, leaderboards, and eval ranking.


What it does

Two algorithms, one honest answer.

Pair selection · UCB1

UCB(i) = win_rate(i) + c · √(ln N / n_i)

c = 1.414 (default), N = total comparisons, n_i = comparisons involving entity i. New entities receive a large initial weight so they get surveyed before they get scored.
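The scoring rule above can be sketched in a few lines of Python. This is an illustration, not compere's actual code: the function names, the `stats` layout, the use of infinity as the "large initial weight", and the choice to pair the two highest-scoring entities are all assumptions made for the example.

```python
import math


def ucb_score(wins, n_i, total, c=1.414, new_entity_weight=float("inf")):
    """UCB(i) = win_rate(i) + c * sqrt(ln N / n_i)."""
    if n_i == 0:
        # Assumption: an entity with no comparisons gets a large weight
        # so it is surveyed before it is scored.
        return new_entity_weight
    return wins / n_i + c * math.sqrt(math.log(total) / n_i)


def pick_pair(stats, total, c=1.414):
    """Return the ids of the two highest-scoring entities.

    stats maps entity id -> (wins, comparisons involving that entity).
    Pairing the top two scores is an assumption of this sketch.
    """
    ranked = sorted(
        stats,
        key=lambda e: ucb_score(*stats[e], total, c),
        reverse=True,
    )
    return ranked[0], ranked[1]


# Hypothetical data: "c" is new, "a" wins more often than "b".
stats = {"a": (8, 10), "b": (3, 10), "c": (0, 0)}
pair = pick_pair(stats, total=10)
```

With this data the new entity "c" is picked first, followed by "a", whose win rate is higher than "b" at the same exploration bonus.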

Rating update · Elo

new_rating = old_rating + K · (actual − expected)
expected = 1 / (1 + 10^((opp − rating) / 400))

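K = 32 by default. Initial rating 1500. Here rating is the entity's rating before the update (old_rating), opp is its opponent's rating, and actual is 1 for a win and 0 for a loss. The same Elo formulation you know from chess; nothing fancier is claimed.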
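As a worked example of the update, here is a minimal Python sketch of the two formulas. The function names are made up for illustration and are not compere's API; the sketch assumes every verdict is a win or a loss, with no draws.

```python
def expected_score(rating, opp):
    """expected = 1 / (1 + 10^((opp - rating) / 400))."""
    return 1.0 / (1.0 + 10 ** ((opp - rating) / 400))


def elo_update(winner, loser, k=32):
    """Return the (winner, loser) ratings after one verdict."""
    new_winner = winner + k * (1 - expected_score(winner, loser))
    new_loser = loser + k * (0 - expected_score(loser, winner))
    return new_winner, new_loser


# Two fresh entities at the initial rating of 1500.
new_ratings = elo_update(1500, 1500)  # (1516.0, 1484.0)

# Expected score of a 1600-rated entity against a 1500-rated one.
favourite = expected_score(1600, 1500)  # about 0.64
```

Between two equally rated entities the expected score is 0.5, so the winner gains K/2 = 16 points and the loser drops by the same amount. A 100-point gap corresponds to an expected score of roughly 0.64 for the higher-rated entity.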

Note on the field: compere does not implement Bradley-Terry, Thurstone, or TrueSkill. UCB plus Elo was chosen because both are interpretable end-to-end: you can explain a rating change to a stakeholder in two sentences.

Who uses it

Eval & RLHF teams

Collect preference data over model outputs without showing annotators every pair. UCB concentrates votes on uncertain comparisons; you ship a reward signal sooner.

A/B and content ranking ops

Rank designs, headlines, or product photos against each other. The Elo board is sortable, replayable, and an honest function of the votes it received.

Taste-graph builders

Turn “A or B?” clicks into a ranked catalog. SQLite by default; PostgreSQL when you outgrow it; same code on either.

Researchers and tinkerers

Import compere as a library and call create_entity / create_comparison / get_ratings directly. No server needed for offline studies.

Quick start

Install, run, vote.

pip install compere
compere --port 8090

# get the next pair to compare (UCB picks it)
curl localhost:8090/mab/next_comparison

# record a verdict
curl -X POST localhost:8090/comparisons/ \
  -H "Content-Type: application/json" \
  -d '{"entity1_id":1,"entity2_id":2,"selected_entity_id":1}'

# read the leaderboard
curl localhost:8090/ratings

Interactive API docs are served at /docs by FastAPI. The full HTTP surface is the eleven endpoints listed in the API reference.
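The same three calls can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are invented for the example, the server is assumed to be running on localhost:8090 as in the commands above, and responses are assumed to be JSON (the exact response fields are not shown here, so check /docs for them).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8090"  # matches `compere --port 8090`


def next_comparison_request(base=BASE_URL):
    """Build the GET request for the next pair to compare."""
    return urllib.request.Request(f"{base}/mab/next_comparison", method="GET")


def verdict_request(entity1_id, entity2_id, selected_entity_id, base=BASE_URL):
    """Build the POST request that records a verdict."""
    body = json.dumps(
        {
            "entity1_id": entity1_id,
            "entity2_id": entity2_id,
            "selected_entity_id": selected_entity_id,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base}/comparisons/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ratings_request(base=BASE_URL):
    """Build the GET request for the leaderboard."""
    return urllib.request.Request(f"{base}/ratings", method="GET")


def send(request):
    """Send a request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building a request does not touch the network; send() does.
req = verdict_request(1, 2, 1)
```

With a server running, `send(next_comparison_request())`, `send(req)`, and `send(ratings_request())` mirror the three curl commands in order.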

From the blog

May 28, 2026

Stopping rules: when have you compared enough?

There is no universal answer, but there are three honest tests. We walk through each, with the queries you can run against the compere API.

May 20, 2026

Reading the Elo output without lying

Compere ships Elo, not Bradley-Terry. The numbers it produces are easy to misread. A field guide to what an Elo gap actually means.

May 12, 2026

Why 50 pairwise votes beat 500 ratings

Rating scales drift, anchor, and lie. Pairwise comparisons survive all three. Here is the math we use, and where it actually breaks down.


How compere compares

Honest, narrow comparisons against tools that overlap on one axis or another.

compere vs. plain Elo libraries

You can get Elo from a 30-line gist. The interesting question is which pair to ask about next.

compere vs. AHP-style ranking SaaS

AHP gives weighted criteria; compere gives a single ranked list from raw pairwise votes.
