Discover AI Workflows

Explore prompts, agent designs, model notes, and developer tools

evals

other

Agent evaluation: measure tool success, not vibes

Evaluate agents with measurable outcomes.

#evals #agents #metrics

frosty

1 min

17h ago

evals

other

LLM eval basics: golden sets and rubric scoring

How to build an evaluation set that actually catches regressions.

#evals #testing #ci

frosty

1 min

17h ago

evals

other

LLM-as-judge: how to reduce bias

Practical tricks to make LLM judging more stable.

#evals #llm-as-judge #quality

frosty

1 min

17h ago

tools

other

RAG chunking rules that usually work

Chunk sizing + overlap guidelines for retrieval.

#rag #retrieval #chunking

frosty

1 min

17h ago

tools

other

Embeddings 101: cosine similarity pitfalls

Common mistakes when using embeddings for search.

#embeddings #search #rerank

frosty

1 min

17h ago

tools

other

Re-ranking: the easiest big quality win

Why rerankers often improve relevance more than “better chunking”.

#rerank #rag #search

frosty

1 min

17h ago

models

other

OpenAI vs Anthropic for dev tooling: quick comparison

A practical comparison for developer workflows.

#models #comparison #tooling

frosty

1 min

17h ago

models

other

Temperature, top_p, and why you shouldn’t tune both

A simple heuristic for sampling parameters.

#models #sampling #best-practices

frosty

1 min

17h ago

tools

other

Token budgeting for apps: the boring thing that saves money

Practical token caps and truncation strategies.

#cost #tokens #optimization

frosty

1 min

17h ago

tools

other

Prompt injection 101: what to defend

A practical overview of injection risks.

#security #prompt-injection #rag

frosty

1 min

17h ago
