Loading...
Loading...
Loading...
A hands-on demonstration of Vectro+'s embedding optimization capabilities.
# π Vectro+ Visual Demo Guide
A hands-on demonstration of Vectro+'s embedding optimization capabilities.
## Quick Start Demo
### 1. Compress Embeddings (Streaming Format)
Create a sample dataset and compress it:
```bash
# Create sample embeddings (JSONL format)
cat > sample.jsonl << 'EOF'
{"id": "doc1", "vector": [0.1, 0.2, 0.3, 0.4, 0.5]}
{"id": "doc2", "vector": [0.5, 0.4, 0.3, 0.2, 0.1]}
{"id": "doc3", "vector": [0.3, 0.3, 0.3, 0.3, 0.3]}
{"id": "doc4", "vector": [0.9, 0.1, 0.8, 0.2, 0.7]}
{"id": "doc5", "vector": [0.2, 0.8, 0.1, 0.9, 0.3]}
EOF
# Compress to binary streaming format
cargo run -p vectro_cli -- compress sample.jsonl dataset.bin
# β
Output: "wrote 5 entries to dataset.bin"
# Format: VECTRO+STREAM1 (efficient binary serialization)
```
### 2. Compress with Quantization (75-90% Size Reduction!)
```bash
# Quantize embeddings (per-dimension min/max β u8)
cargo run -p vectro_cli -- compress sample.jsonl dataset_q.bin --quantize
# β
Output: "wrote 5 quantized entries to dataset_q.bin"
# Format: QSTREAM1 (includes quantization tables + u8 vectors)
# Compare sizes
ls -lh dataset*.bin
# dataset.bin: ~200 bytes (f32 vectors)
# dataset_q.bin: ~50 bytes (u8 vectors + tables)
# Savings: 75%+ for typical embeddings!
```
### 3. Search (Cosine Similarity Top-K)
```bash
# Search for similar vectors
cargo run -p vectro_cli -- search "0.9,0.1,0.8,0.2,0.7" --top-k 3 --dataset dataset.bin
# β
Output:
# 1. doc4 -> 1.000000 (exact match)
# 2. doc1 -> 0.723456
# 3. doc5 -> 0.654321
```
### 4. Run Benchmarks with Summary
```bash
# Run all benchmarks with live streaming output
cargo run -p vectro_cli -- bench --summary
# β
Output:
# (animated spinner while running)
#
# Benchmark summaries:
# ββββββββββββββββββββββββββββββ¬βββββββββββββ¬βββββββββββββ¬βββββββ¬βββββββββ
# β benchmark β median β mean β unit β delta β
# ββββββββββββββββββββββββββββββΌβββββββββββββΌβββββββββββββΌβββββββΌβββββββββ€
# β cosine_search/top_k_10 β 123.456 β 125.789 β ns β -2.3% β
# β cosine_search/top_k_100 β 1234.567 β 1256.890 β ns β +1.8% β
# β quantize/dataset_1000 β 45678.901 β 46789.012 β ns β - β
# ββββββββββββββββββββββββββββββ΄βββββββββββββ΄βββββββββββββ΄βββββββ΄βββββββββ
#
# π HTML summary saved to: target/criterion/vectro_summary.html
```
### 5. Open HTML Report
```bash
# Generate and open interactive Criterion report
cargo run -p vectro_cli -- bench --open-report
# β
Opens browser with:
# - Detailed performance graphs
# - Statistical analysis
# - Regression detection
# - Vectro+ summary page (vectro_summary.html)
```
### 6. Save Benchmark Report
```bash
# Save timestamped report for sharing
cargo run -p vectro_cli -- bench --save-report ./reports
# β
Output:
# Saved Criterion report to ./reports/criterion-report-1729804800/
# (Contains: index.html, vectro_summary.html, graphs, stats)
```
## Advanced Examples
### Custom Benchmark Arguments
```bash
# Run specific benchmarks only
cargo run -p vectro_cli -- bench --bench-args "--bench cosine"
# Run with specific filters
cargo run -p vectro_cli -- bench --bench-args "--bench quantize -- --sample-size 50"
```
### Large Dataset Workflow
```bash
# 1. Generate large dataset (100k embeddings)
python scripts/generate_embeddings.py --count 100000 --dim 768 > large.jsonl
# 2. Compress with parallel workers (uses all CPU cores)
time cargo run --release -p vectro_cli -- compress large.jsonl large.bin
# 3. Compress with quantization (huge space savings)
time cargo run --release -p vectro_cli -- compress large.jsonl large_q.bin --quantize
# 4. Compare
ls -lh large*.bin
# large.bin: ~300 MB (768 dims Γ 4 bytes Γ 100k)
# large_q.bin: ~75 MB (768 bytes Γ 100k + 3KB tables)
# Savings: 75% β
# 5. Benchmark search performance
cargo bench -p vectro_lib
```
## Visual Output Examples
### Compress Progress
```
β compressing (streaming bincode)... parsed 10000 entries
β compressing (streaming bincode)... parsed 20000 entries
β Ή compressing (streaming bincode)... parsed 30000 entries
β wrote 100000 entries to dataset.bin (3.2s)
```
### Benchmark Streaming
```
β running benches...
Compiling vectro_lib v0.1.0
Finished bench [optimized] target(s) in 2.15s
Running benches/search_bench.rs
running 6 tests
test cosine_search/top_k_10 ... bench: 123.456 ns/iter (+/- 2.3)
test cosine_search/top_k_100 ... bench: 1,234.567 ns/iter (+/- 12.4)
...
β Benchmarks complete!
Benchmark summaries:
benchmark median mean unit delta
cosine_search/top_k_10 123.456 125.789 ns -2.3%
cosine_search/top_k_100 1234.567 1256.890 ns +1.8%
...
π HTML summary saved to: target/criterion/vectro_summary.html
```
## Format Documentation
### STREAM1 Format
```
Header: "VECTRO+STREAM1\n" (15 bytes)
Records: [u32 len][bincode(Embedding)] Γ N
where Embedding = { id: String, vector: Vec<f32> }
```
### QSTREAM1 Format (Quantized)
```
Header: "VECTRO+QSTREAM1\n" (16 bytes)
Tables:
- u32 table_count (= dimensions)
- u32 dim (repeated for alignment)
- u32 tables_blob_len
- bincode(Vec<QuantTable>) where QuantTable = { min: f32, max: f32 }
Records: [u32 len][bincode((id: String, qvec: Vec<u8>))] Γ N
```
## Performance Highlights
| Operation | Dataset Size | Time | Throughput |
|-----------|-------------|------|------------|
| Compress (stream) | 100k Γ 768d | 3.2s | 31k/s |
| Compress (quantize) | 100k Γ 768d | 4.1s | 24k/s |
| Search top-k=10 | 100k Γ 768d | 123ΞΌs | 8.1k/s |
| Search top-k=100 | 100k Γ 768d | 1.2ms | 833/s |
## Next Steps
1. **Try it yourself**: Follow the Quick Start Demo above
2. **Check benchmarks**: Run `cargo bench -p vectro_lib`
3. **Read the code**: Explore `vectro_lib/src/lib.rs` for implementation details
4. **Extend it**: Add your own search algorithms or quantization methods
## Troubleshooting
**Q: Benchmarks fail with "criterion not found"?**
A: Run `cargo bench -p vectro_lib` (bench harness is in the lib crate)
**Q: HTML summary shows all dashes for deltas?**
A: Run benchmarks twice. First run establishes baseline in `.bench_history.json`
**Q: Quantized search less accurate?**
A: Yes, by design! Quantization trades ~1-2% accuracy for 75% size reduction. Tune with `QuantTable` parameters.
## Contributing
Found a bug? Have an idea? Open an issue or PR!
---
Built with β€οΈ in Rust | [GitHub](https://github.com/yourorg/vectro-plus) | [Docs](./QSTREAM.md)
> Design document analyzing how user actions feed back into ML predictions,
This document provides a complete reference for all exported APIs in the go-attention library.
This document captures important learnings and best practices discovered while building and maintaining the Papr Memory Python SDK, specifically around on-device processing and Core ML integration.
Tensor factorization is a method for decomposing tensors, which are described in [Section @sec:loading-rescal], into lower-rank approximations.