# Text Retrieval Benchmark
Run a text retrieval benchmark without generation (no LLM required).
## Download Dataset
```bash
autorag-research data restore beir scifact_openai-small
```
Downloads BEIR SciFact (300 queries, 5,183 documents).
## Create Experiment Config
```yaml
# configs/experiment.yaml
db_name: beir_scifact_test_openai_small
pipelines:
  retrieval:
    - bm25
  generation: []
metrics:
  retrieval:
    - recall
    - precision
    - ndcg
    - mrr
  generation: []
```
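The config's only retrieval pipeline is `bm25`. BM25 is a lexical ranking function: it scores each document by weighted term overlap with the query, so it needs neither embeddings nor an LLM. To give some intuition for what it computes, here is a minimal Okapi BM25 sketch in Python. It is not AutoRAG-Research's implementation. The `BM25` class, the naive whitespace tokenizer, the Lucene-style IDF, and the defaults `k1=1.5` and `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenization (real pipelines usually stem, drop stopwords, etc.)
    return text.lower().split()


class BM25:
    """Hypothetical, minimal Okapi BM25 scorer over an in-memory corpus (illustration only)."""

    def __init__(self, corpus: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        # Term frequencies and lengths per document
        self.doc_tf = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
        self.doc_len = {doc_id: sum(tf.values()) for doc_id, tf in self.doc_tf.items()}
        self.avgdl = sum(self.doc_len.values()) / len(self.doc_len)
        # Document frequency: number of documents containing each term
        df = Counter()
        for tf in self.doc_tf.values():
            df.update(tf.keys())
        n = len(self.doc_tf)
        # Lucene-style IDF, which stays positive even for very common terms
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        tf = self.doc_tf[doc_id]
        # Length normalization: longer-than-average documents are penalized (controlled by b)
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                # Saturating term-frequency component (controlled by k1), weighted by IDF
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 10) -> list[str]:
        # Score every document and return the IDs of the top k
        scores = {doc_id: self.score(query, doc_id) for doc_id in self.doc_tf}
        return sorted(scores, key=scores.get, reverse=True)[:k]


corpus = {
    "d1": "vitamin d deficiency increases fracture risk",
    "d2": "exercise improves cardiovascular health",
    "d3": "vitamin c and the common cold",
}
bm25 = BM25(corpus)
print(bm25.search("vitamin d fracture", k=2))  # ['d1', 'd3']
```

Document `d1` matches all three query terms, `d3` matches only "vitamin", and `d2` matches none, so it scores 0 and falls out of the top 2.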
## Run
```bash
autorag-research run --config-name=experiment
```
## Expected Output
```
Pipeline: bm25
Recall@10: 0.847
Precision@10: 0.085
NDCG@10: 0.712
MRR@10: 0.634
```
## Recommended Datasets
| Dataset | Queries | Documents | Best For |
|---------|---------|-----------|----------|
| BEIR SciFact | 300 | 5,183 | Scientific claims |
| BEIR NFCorpus | 323 | 3,633 | Biomedical |
| MTEB | varies | varies | General text |
See [Text Datasets](../datasets/text/index.md) for all options.
## Recommended Metrics
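| Metric | Measures |
|--------|----------|
| Recall@k | Fraction of the relevant documents that appear in the top k results |
| Precision@k | Fraction of the top k results that are relevant |
| NDCG@k | Ranking quality, rewarding relevant documents placed near the top |
| MRR | Reciprocal rank of the first relevant result, averaged over queries |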
See [Retrieval Metrics](../metrics/retrieval/index.md) for details.
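The metric definitions above can be made concrete with a small, self-contained Python sketch. It assumes binary relevance labels, truncates MRR at k (matching the `MRR@10` line in the expected output), divides Precision@k by k even when fewer than k results are returned, and assumes each reported score is a mean over queries. AutoRAG-Research's own implementation may differ on any of these points. The function names and the toy `qrels`/`run` data are hypothetical.

```python
import math
from statistics import mean


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the k result slots filled by relevant documents."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def mrr_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(run: dict[str, list[str]], qrels: dict[str, set[str]], k: int = 10) -> dict[str, float]:
    """Average each metric over every query that has ground-truth labels."""
    metrics = {"recall": recall_at_k, "precision": precision_at_k, "ndcg": ndcg_at_k, "mrr": mrr_at_k}
    return {
        f"{name}@{k}": mean(fn(run.get(qid, []), relevant, k) for qid, relevant in qrels.items())
        for name, fn in metrics.items()
    }


# Toy ground truth (query -> relevant doc IDs) and ranked results (query -> doc IDs)
qrels = {"q1": {"d1"}, "q2": {"d2", "d8"}}
run = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4", "d2", "d5"]}

scores = evaluate(run, qrels, k=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
# recall@10: 0.750
# precision@10: 0.100
# ndcg@10: 0.469
# mrr@10: 0.417
```

On tiny toy data like this, precision stays low even when recall is perfect, because each query has only one or two relevant documents against k = 10 slots. The same effect explains why Precision@10 is much lower than Recall@10 in the expected output.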
## Next
- [Text RAG](text-rag.md) - Add generation
- [Custom Pipeline](custom-pipeline.md) - Implement your algorithm