## Bringing Embeddings to Your Spreadsheets: A Game-Changer for Data Analysis
Embeddings are vector representations of text that capture semantic meaning, making them perfect for tasks like finding similar documents, clustering ideas, or powering recommendation engines. Imagine analyzing customer feedback, matching job descriptions to resumes, or even organizing your movie watchlist by plot themes—all right inside Excel. Microsoft’s Excel ML Tools add-in makes this a reality by integrating Azure OpenAI’s embedding models seamlessly into spreadsheets. This opens up machine learning capabilities to anyone comfortable with Excel formulas, democratizing advanced AI for everyday analysts and business users.
In this guide, we'll dive deep into setting up and using embeddings in Excel. We'll cover installation, key functions, practical examples with movie plots, and real-world applications. By the end, you'll be equipped to perform semantic searches and similarity computations effortlessly. Let's get started!
## Step 1: Install the Excel ML Tools Add-in
First things first, you need the [Excel ML Tools add-in from GitHub](https://github.com/microsoft/excel-ml-tools). This free tool from Microsoft brings ML functions like embeddings into Excel without requiring Python or complex setups.
Here's how to install it:
- Head to the [GitHub repo](https://github.com/microsoft/excel-ml-tools) and download the latest `ExcelMLTools-[version].xll` file.
- Open Excel (it works on Excel for Microsoft 365, version 2403 or later; preview builds also supported).
- To load the downloaded `.xll` file, go to **File > Options > Add-ins**, choose **Excel Add-ins** in the **Manage** dropdown, click **Go**, then **Browse** to the file.
- Alternatively, if the add-in is listed in the store, go to **File > Get Add-ins** (or **Insert > Get Add-ins** in some versions) and search for "Excel ML Tools", or browse **My Add-ins** > **Manage My Add-ins**.
Once installed, you'll see new functions like `EMBEDDING` in the formula autocomplete. Pro tip: Restart Excel after installation to ensure everything loads smoothly.
## Step 2: Set Up Your Azure OpenAI API Key
Embeddings rely on Azure OpenAI's models. You'll need an API key:
- Create an [Azure account](https://azure.microsoft.com/free/) if you don't have one (free tier available).
- In the Azure portal, search for **Azure OpenAI** and create a resource.
- Deploy the `text-embedding-ada-002` model (it produces 1536-dimensional vectors optimized for English text).
- Grab your **API Key**, **Endpoint**, and **Deployment Name** from the resource keys and deployments sections.
Store these securely—Excel functions will reference them. For production use, consider key vaults, but for demos, pasting into cells works fine.
**Real-world tip**: Rotate keys regularly and use environment variables if scripting outside Excel.
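For scripting outside Excel, the tip above might look like the following minimal Python sketch. It reads the three values from environment variables and builds (but does not send) a request to the Azure OpenAI embeddings REST endpoint. The environment variable names and both helper names are assumptions made for illustration, and the add-in itself may handle credentials differently.

```python
import json
import os
import urllib.request


def load_config(env=os.environ):
    """Read connection details from environment variables (names are hypothetical)."""
    return {
        "endpoint": env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),
        "deployment": env["AZURE_OPENAI_DEPLOYMENT"],
        "api_key": env["AZURE_OPENAI_API_KEY"],
    }


def build_embeddings_request(config, text, api_version="2023-05-15"):
    """Build a POST request for the embeddings endpoint without sending it."""
    url = (
        f"{config['endpoint']}/openai/deployments/{config['deployment']}"
        f"/embeddings?api-version={api_version}"
    )
    body = json.dumps({"input": text}).encode("utf-8")
    # Azure OpenAI expects the key in an "api-key" header.
    headers = {"Content-Type": "application/json", "api-key": config["api_key"]}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned request to `urllib.request.urlopen` would send it. Keeping the key in the environment means it never has to be saved in a workbook or script.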
## Step 3: Generate Embeddings with the EMBEDDING Function
The star of the show is `=EMBEDDING(text, model, api_key/version_endpoint, [options])`. It converts text into a vector.
### Quick Example Setup
Let's replicate the article's movie demo for hands-on learning. Download the [demo workbook](https://github.com/microsoft/excel-ml-tools/blob/main/examples/embeddings/EmbeddingsDemo.xlsx) from the repo to follow along.
1. In column A, enter movie titles under a header in A1 (e.g., A2: "The Matrix", A3: "Inception", etc.).
2. In B2:B10, paste plot summaries like: "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers."
3. Label C1 "API details" and keep the values in column C, clear of the spilled embeddings: deployment name in C2 (e.g., "text-embedding-ada-002"), API key/version in C3, endpoint in C4.
4. In D2, use: `=EMBEDDING(B2, C$2, C$4 & "/embeddings?api-version=2023-12-01-preview", "{""dimensions"":1536}")`. Note that Excel escapes a quote inside a string by doubling it, not with a backslash.
This spills a 1x1536 array of floats representing the embedding. Each row gets its vector—Excel handles the arrays natively!
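**Deep Dive**: The `[options]` parameter lets you tweak dimensions. `text-embedding-ada-002` always returns 1536 values; requesting a shorter vector (512, say, for speed) is supported only by the newer text-embedding-3 models. Rate limits apply (e.g., 3K RPM for ada-002, though the exact figure depends on your deployment's quota), so batch wisely.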
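Shortening a vector amounts to keeping its leading values and rescaling the result to unit length. Here is a minimal Python sketch of that idea, assuming a model whose leading dimensions carry most of the information (OpenAI documents this for the text-embedding-3 models, not for ada-002). The helper name is hypothetical.

```python
import math


def truncate_and_normalize(embedding, dimensions):
    """Keep the first `dimensions` values and rescale the result to unit length."""
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    shortened = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in shortened))
    if norm == 0:
        return list(shortened)  # all-zero vector: nothing to rescale
    return [x / norm for x in shortened]


print(truncate_and_normalize([3.0, 4.0, 12.0], 2))  # -> [0.6, 0.8]
```

Renormalizing matters because cosine and dot-product comparisons assume vectors of comparable length.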
## Step 4: Measure Similarity with COSINE_SIMILARITY
Vectors are useless without comparison. That is the job of `=COSINE_SIMILARITY(vec1, vec2)`, which returns a value from -1 (opposite direction) to 1 (same direction); texts with similar meaning score close to 1.
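To make the measure concrete, here is a minimal Python sketch of cosine similarity: the dot product of two vectors divided by the product of their lengths. It illustrates the standard formula and is not the add-in's own implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the result depends only on direction, `[1, 2, 3]` and `[2, 4, 6]` score 1 even though their lengths differ.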
### Example: Movie Plot Matching
- Embed all plots in column D.
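- In F2: `=COSINE_SIMILARITY(D2, D$2:D$10)`. This compares the first plot against every plot, giving one column of similarity scores; repeat it for the other plots to build up a full similarity matrix.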
- Format as a heatmap (Conditional Formatting > Color Scales) for visual similarity clusters.
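**Pro Application**: In sales, embed product descriptions and customer queries to score matches automatically. This catches matches that keyword search misses when the wording differs.
**Enhancement**: Combine with `MMULT` for matrix-wide computations. With one embedding per row, `=MMULT(D2:D10, TRANSPOSE(D2:D10))` (where the range spans the full embedding rows) gives all pairwise dot products; the reverse order would multiply dimensions against dimensions instead. Divide each entry by the lengths of the two vectors to get cosine similarity.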
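The matrix approach can be sketched outside Excel as well. With one embedding per row, the pairwise dot products form a square matrix, and dividing each entry by the two row lengths turns it into cosine similarity. This is a minimal Python sketch; it assumes no row is all zeros, and the function name is hypothetical.

```python
import math


def cosine_matrix(vectors):
    """Return the n x n matrix of pairwise cosine similarities for n row vectors."""
    # Gram matrix: entry [i][j] is the dot product of row i and row j.
    gram = [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]
    # The diagonal holds each row's squared length.
    norms = [math.sqrt(gram[i][i]) for i in range(len(vectors))]
    n = len(vectors)
    return [[gram[i][j] / (norms[i] * norms[j]) for j in range(n)] for i in range(n)]
```

The result is symmetric, with ones on the diagonal, so only the upper triangle needs inspecting.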
## Step 5: Power Up with TOP_K and SEMANTIC_SEARCH
For actionable insights:
- `=TOP_K(array, values, k, [include_ties])`: Ranks top similar items.
  - Example: `=TOP_K(F2:F10, D2, 5)` finds top 5 movies like "The Matrix".
- `=SEMANTIC_SEARCH(search_text, embeddings_array, texts_array, k)`: One-shot semantic search!
  - In G1: Query "hacker story".
  - G2: `=SEMANTIC_SEARCH(G$1, D$2:D$10, A$2:A$10, 3)` returns top matches with scores.
This spills results as `{text; score}` pairs. Magic for querying datasets!
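Conceptually, a top-k semantic search scores every stored embedding against the query embedding and keeps the best `k`. The Python sketch below shows that logic under stated assumptions: the three-dimensional vectors are toy stand-ins invented for illustration (not real embeddings), the query is assumed to be embedded already, and the function names are hypothetical rather than the add-in's internals.

```python
import heapq
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vector, embeddings, texts, k):
    """Return the k (text, score) pairs most similar to the query, best first."""
    scored = [(_cosine(query_vector, emb), text) for emb, text in zip(embeddings, texts)]
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [(text, score) for score, text in best]


# Toy vectors standing in for real 1536-dimensional embeddings.
titles = ["The Matrix", "Inception", "Back to the Future"]
vectors = [[0.9, 0.1, 0.0], [0.6, 0.4, 0.1], [0.0, 0.2, 0.9]]
query = [1.0, 0.0, 0.0]  # pretend this is the embedded search text

results = semantic_search(query, vectors, titles, 2)
for title, score in results:
    print(f"{title}: {score:.2f}")
```

In a real workflow the query text must be embedded with the same model as the stored texts; otherwise the scores are not comparable.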
### Full Movie Demo Walkthrough
| Movie | Plot | Embedding | Similarity to Matrix |
|-------|------|-----------|----------------------|
| The Matrix | ... | [vector] | 1 |
| Inception | ... | [vector] | 0.72 |
Top results for "time travel": Back to the Future (0.68), Interstellar (0.65).
## Advanced Tips and Real-World Use Cases
- **Batch Efficiency**: Embed once, reuse vectors. Store in hidden sheets.
- **Multi-Language**: ada-002 handles languages other than English, and the newer text-embedding-3-small and text-embedding-3-large models generally do better on multilingual text.
- **Clustering**: Use `KMEANS` from the add-in on embeddings for grouping.
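For intuition about what clustering embeddings involves, here is a minimal Python sketch of plain k-means. It is a generic illustration of the algorithm, not the add-in's `KMEANS` implementation. Taking the first `k` points as starting centroids is a simplifying assumption that keeps the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means with squared Euclidean distance; returns a cluster index per point."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(kmeans(points, 2))  # -> [0, 0, 1, 1]
```

Production implementations usually pick starting centroids at random or with k-means++ and stop early once assignments stop changing.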
**Applications**:
- **HR**: Semantic resume screening—query "Python ML engineer" against CVs.
- **Marketing**: Cluster feedback embeddings for themes.
- **E-commerce**: Product recommendations via similarity.
- **Research**: Literature reviews by embedding abstracts.
**Limitations**: API costs (~$0.0001/1K tokens), no local models yet, internet required.
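To put the cost figure in perspective, here is a quick back-of-the-envelope estimate in Python. The default price is the approximate rate quoted above; the workload (10,000 texts of about 50 tokens each) is an assumed example, and the function name is hypothetical.

```python
def estimate_embedding_cost(num_texts, avg_tokens_per_text, price_per_1k_tokens=0.0001):
    """Estimate embedding cost in dollars from a token count and a per-1K-token price."""
    total_tokens = num_texts * avg_tokens_per_text
    return total_tokens / 1000 * price_per_1k_tokens


# Assumed workload: 10,000 short texts averaging 50 tokens each.
print(f"${estimate_embedding_cost(10_000, 50):.2f}")  # -> $0.05
```

Since embeddings only need to be computed once per text, the recurring cost is mostly for embedding new queries.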
## Wrapping Up: Your ML Spreadsheet Journey
With Excel ML Tools, embeddings turn spreadsheets into AI powerhouses. Experiment with the [demo file](https://github.com/microsoft/excel-ml-tools/blob/main/examples/embeddings/EmbeddingsDemo.xlsx), tweak queries, and scale to your data. Check the [full repo](https://github.com/microsoft/excel-ml-tools) for updates and more functions like CHATGPT or CLASSIFY.
This is just Day 22 vibes—keep adventuring in ML!