evals
other
temp: 0.2

Eval: compare models with paired tests

frosty

4h ago

Paired tests

Run every model on the same inputs and compare the outputs side by side, input by input, so that any difference comes from the models rather than from the test set.
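Here is a minimal sketch of a paired comparison. It assumes each model's output on each input has already been reduced to a numeric score; the scoring step, the function name `paired_sign_test`, and the choice of a sign test are illustrative assumptions, not something the post specifies. Because both score lists are indexed by the same inputs, the test counts per-input wins instead of comparing two separate averages.

```python
from math import comb


def paired_sign_test(scores_a, scores_b):
    """Compare two models scored on the SAME inputs, item by item (hypothetical helper).

    scores_a[i] and scores_b[i] must be the two models' scores for input i.
    Returns (wins_a, wins_b, ties, p_value), where p_value is an exact
    two-sided sign test computed over the non-tied items.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("a paired test needs one score per model per input")
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    ties = len(scores_a) - wins_a - wins_b
    n = wins_a + wins_b  # ties say nothing about which model is better
    if n == 0:
        return wins_a, wins_b, ties, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, ties, min(1.0, 2 * tail)


# Usage: made-up scores for eight shared inputs
a = [0.9, 0.7, 0.8, 0.6, 1.0, 0.5, 0.9, 0.8]
b = [0.8, 0.6, 0.8, 0.4, 0.9, 0.6, 0.7, 0.7]
print(paired_sign_test(a, b))  # → (6, 1, 1, 0.125)
```

A paired test like this is usually more sensitive than comparing two averages, because per-input difficulty affects both models equally and cancels out in each pairwise comparison.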

Tip

Blind the judge to model names, so that verdicts reflect the outputs alone.
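One way to do the blinding is sketched below, assuming exactly two models and a judge that answers with a neutral label. The labels "Response 1"/"Response 2" and the helpers `blind_pairs` and `unblind` are hypothetical. The sketch also shuffles the presentation order for each input, a common guard against a judge that favors whichever answer it sees first. The key that maps labels back to models stays with the experimenter.

```python
import random


def blind_pairs(inputs, outputs_by_model, seed=0):
    """Build judge items with model names replaced by neutral labels (hypothetical helper).

    outputs_by_model maps model name -> list of outputs aligned with inputs.
    Returns (items, key): items go to the judge; key[i] records which model
    is behind "Response 1" and "Response 2" for input i.
    """
    names = list(outputs_by_model)
    if len(names) != 2:
        raise ValueError("expected exactly two models")
    rng = random.Random(seed)  # seeded so the blinding is reproducible
    items, key = [], []
    for i, prompt in enumerate(inputs):
        order = names[:]
        rng.shuffle(order)  # randomize which model is shown first
        items.append({
            "input": prompt,
            "Response 1": outputs_by_model[order[0]][i],
            "Response 2": outputs_by_model[order[1]][i],
        })
        key.append(order)  # order[0] produced "Response 1"
    return items, key


def unblind(verdicts, key):
    """Map judge verdicts back to model names.

    Any verdict other than "Response 1" or "Response 2" is recorded as "tie".
    """
    labels = {"Response 1": 0, "Response 2": 1}
    return [key[i][labels[v]] if v in labels else "tie" for i, v in enumerate(verdicts)]
```

The judge only ever sees `items`; after it returns verdicts, `unblind(verdicts, key)` turns them into per-input winners that can feed a paired analysis.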