## Introduction to Softmax Regression
Softmax regression stands as a cornerstone algorithm in machine learning, particularly for tackling multi-class classification tasks. Unlike binary logistic regression, which handles two outcomes, softmax extends this to scenarios with three or more categories. Imagine classifying flowers into species based on petal measurements—that's exactly what we'll do here using the classic Iris dataset.
This guide draws inspiration from machine learning advent calendars, where daily challenges build practical skills. Today, we're diving into creating a fully functional softmax model inside Microsoft Excel. Why Excel? It's ubiquitous, requires no programming setup, and lets you visualize every calculation step-by-step. This approach demystifies the 'black box' of ML models, making it ideal for beginners transitioning to advanced concepts.
By the end, you'll have a working model that predicts class probabilities and minimizes errors via gradient descent. We'll cover theory, setup, training, and evaluation, adding insights on real-world tweaks.
## Core Concepts: From Logits to Probabilities
At its heart, softmax regression transforms input features into class probabilities. Start with **features** (e.g., sepal length, petal width) multiplied by **weights** (learned parameters) plus a **bias**, yielding **logits**—raw, unnormalized scores for each class.
The magic happens in the **softmax function**:
$$ p_k = \frac{e^{z_k}}{\sum_{j=1}^K e^{z_j}} $$
Here, $z_k$ is the logit for class $k$, and $K$ is the total classes (3 for Iris: Setosa, Versicolor, Virginica). This ensures probabilities sum to 1 and are between 0 and 1.
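As a quick sanity check outside Excel, the formula can be sketched in plain Python. The function name `softmax` and the example logits are illustrative assumptions, not values from a trained Iris model.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the three Iris classes
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The largest logit receives the largest probability, and the three outputs always add up to 1.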
### Loss Function: Measuring Errors
To train, we minimize the **cross-entropy loss**:
$$ L = -\frac{1}{N} \sum_{i=1}^N \sum_{k=1}^K y_{i,k} \log(p_{i,k}) $$
$y_{i,k}$ is 1 if sample $i$ belongs to class $k$, else 0. This penalizes confident wrong predictions heavily.
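To see how heavily a confident wrong prediction is penalized, here is a minimal Python sketch of the loss. The helper name `cross_entropy` and the probabilities are assumptions chosen for illustration.

```python
import math

def cross_entropy(one_hot_targets, probs):
    """Average cross-entropy over N samples; each row holds K values."""
    total = 0.0
    for y_row, p_row in zip(one_hot_targets, probs):
        # Only the true class (y == 1) contributes to the sum
        total -= sum(y * math.log(p) for y, p in zip(y_row, p_row) if y)
    return total / len(probs)

y = [[1, 0, 0]]  # one sample whose true class is 0
confident_right = cross_entropy(y, [[0.9, 0.05, 0.05]])
confident_wrong = cross_entropy(y, [[0.05, 0.9, 0.05]])
print(confident_right, confident_wrong)
```

The second loss is far larger than the first, because the model put only 5% of its probability on the true class.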
### Training with Gradient Descent
Update weights using gradients:
$$ w \leftarrow w - \eta \frac{\partial L}{\partial w} $$
Where $\eta$ is the learning rate. Gradients for softmax involve $(p_k - y_k)$ scaled by features: simple yet powerful.
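For reference, the gradient can be written out explicitly. Assuming $x_{i,j}$ denotes feature $j$ of sample $i$, $w_{k,j}$ the weight connecting that feature to class $k$, and $b_k$ the bias of class $k$ (symbols introduced here for illustration), differentiating the cross-entropy loss through the softmax gives:

$$ \frac{\partial L}{\partial w_{k,j}} = \frac{1}{N} \sum_{i=1}^N (p_{i,k} - y_{i,k})\, x_{i,j}, \qquad \frac{\partial L}{\partial b_k} = \frac{1}{N} \sum_{i=1}^N (p_{i,k} - y_{i,k}) $$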
For context, this mirrors what frameworks like TensorFlow compute, but exposes the math. Real-world tip: numerical stability matters, because exponentiating large logits can overflow. Excel has no built-in safeguard for this, so we will handle it manually by subtracting the largest logit before exponentiating.
## Preparing Your Excel Workspace
Download the Iris dataset (150 samples, 4 features, 3 classes) or use a subset for speed. For the full implementation, grab the template from [this GitHub repo](https://github.com/everett-lindquist/excel-softmax-regression).
### Step 1: Data Layout
Set up columns A-F for the first 10 rows (expand later):
| A | B | C | D | E | F |
|---|---|---|---|---|---|
| Sample | Sepal Length | Sepal Width | Petal Length | Petal Width | Target Class |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | 0 (Setosa) |
- Targets as one-hot: columns G-I for classes 0, 1, 2 (1 in the sample's class column, 0 elsewhere).
- Normalizing features is optional, but subtracting the mean and dividing by the standard deviation (in new columns) makes training more stable.
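The optional normalization step can be sketched in Python as follows. The function name `standardize` is an assumption, and the five numbers are only sample input; the population standard deviation is used here, which corresponds to Excel's `STDEV.P`.

```python
import statistics

def standardize(column):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population std dev, like STDEV.P
    return [(v - mean) / std for v in column]

# A few sepal-length values used purely as sample input
sepal_length = [5.1, 4.9, 7.0, 6.4, 6.3]
z = standardize(sepal_length)
print(z)
```

After this transformation the column has mean 0 and standard deviation 1, so no single feature dominates the logits because of its scale.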
### Step 2: Initialize Weights
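Each class needs 4 feature weights plus a bias, i.e. 5 parameters per class and 15 in total. Stack them in column J, which is where the formulas below look for them:
- J20:J24: Class 0 (four feature weights W0_1 to W0_4, then the bias W0_5 in J24), random between -0.5 and 0.5.
- J25:J29: Class 1.
- J30:J34: Class 2.
Fill each cell with `=RAND()-0.5`, then paste the block as values so the weights stop changing on every recalculation. Rows 20-34 are free only for the 10-row subset; before expanding to more than 18 samples, move this block (and update the references to it), since the logits also live in column J.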
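The same initialization can be sketched in Python. The names `N_CLASSES`, `N_INPUTS` and `weights`, and the fixed seed, are assumptions made so the example is repeatable.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_CLASSES = 3
N_INPUTS = 5  # 4 features + 1 bias

# weights[k] holds the 4 feature weights followed by the bias for class k,
# each drawn uniformly from [-0.5, 0.5) like RAND() - 0.5
weights = [[random.random() - 0.5 for _ in range(N_INPUTS)]
           for _ in range(N_CLASSES)]
print(weights)
```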
## Computing Logits and Probabilities
### Logits Calculation
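For sample 1 (row 2), the class 0 logit goes in cell J2. In matrix form it is the 1×4 feature row times the 4×1 weight column, plus the bias:
```excel
=MMULT(B2:E2, $J$20:$J$23) + $J$24
```
`SUMPRODUCT` also works, but it needs both ranges to have the same shape, so transpose the weight column first (older Excel versions require Ctrl+Shift+Enter for this form):
```excel
=SUMPRODUCT(B2:E2, TRANSPOSE($J$20:$J$23)) + $J$24
```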
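Repeat for the other classes by pointing at their weight blocks: K2 uses `$J$25:$J$28` and `$J$29` (class 1), L2 uses `$J$30:$J$33` and `$J$34` (class 2). Then fill down for all samples.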
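For comparison, the logit step looks like this in plain Python. The function name `logits_for_sample` is an assumption; the features are the first Iris row from the table above, while the weights and biases are made up for the example.

```python
def logits_for_sample(features, weights, biases):
    """One logit per class: dot(features, class weights) + class bias."""
    return [sum(x * w for x, w in zip(features, w_k)) + b_k
            for w_k, b_k in zip(weights, biases)]

x = [5.1, 3.5, 1.4, 0.2]           # first Iris row
W = [[0.1, 0.2, -0.3, -0.1],       # made-up weights, one row per class
     [0.0, -0.1, 0.2, 0.1],
     [-0.2, 0.1, 0.3, 0.4]]
b = [0.5, 0.0, -0.5]               # made-up biases
z = logits_for_sample(x, W, b)
print(z)
```

Each entry of `z` plays the role of one of the cells J2, K2 and L2.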
### Softmax Probabilities
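For the class 0 probability (M2):
```excel
=EXP(J2) / (EXP($J2) + EXP($K2) + EXP($L2))
```
Fill right to N2 and O2 for the other classes, then down for all samples. Pro tip: for large logits, subtract the row maximum inside every `EXP`, e.g. `EXP(J2-MAX($J2:$L2))`, which prevents #NUM! errors without changing the probabilities.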
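The max-subtraction trick can be checked in Python, where `math.exp(1000.0)` on its own raises `OverflowError`. The function name `stable_softmax` and the logits are assumptions for illustration.

```python
import math

def stable_softmax(logits):
    """Softmax with the max logit subtracted to avoid overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Huge logits that would overflow a naive implementation
probs = stable_softmax([1000.0, 999.0, 998.0])
# The same logits shifted down by 998 give identical probabilities
small = stable_softmax([2.0, 1.0, 0.0])
print(probs, small)
```

Shifting every logit by the same constant cancels out in the ratio, which is why the result is unchanged.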
## Loss and Gradients
### Total Loss
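Per-sample cross-entropy goes in a free column (R2 here, since M to O hold the probabilities):
```excel
= - (G2*LN(M2) + H2*LN(N2) + I2*LN(O2))
```
Average loss over all 150 samples: `=AVERAGE(R2:R151)` (shrink the range if you use a subset).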
### Gradients
Weight gradient for feature 1, class 0 (P20):
```excel
=SUMPRODUCT(M2:M151 - G2:G151, B2:B151) / ROWS(B2:B151)
```
This sums `(p_k - y_k) * x_j` over all samples and divides by the sample count exactly once.
For each weight:
```excel
= SUMPRODUCT( (probs_class0 - targets_class0), features ) / row_count
```
Bias gradients follow the same pattern without the feature term, i.e. just the average error: for class 0 (P24), `=AVERAGE(M2:M151) - AVERAGE(G2:G151)`.
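The gradient formulas can be cross-checked with a short Python sketch. The function name `gradients` and the toy matrices are assumptions; `X`, `Y` and `P` stand for the features, one-hot targets and predicted probabilities.

```python
def gradients(X, Y, P):
    """Return (dW, db) with dW[k][j] = mean((P[i][k] - Y[i][k]) * X[i][j])."""
    n, n_classes, n_features = len(X), len(Y[0]), len(X[0])
    dW = [[sum((P[i][k] - Y[i][k]) * X[i][j] for i in range(n)) / n
           for j in range(n_features)]
          for k in range(n_classes)]
    # Bias gradient: the same average error without the feature term
    db = [sum(P[i][k] - Y[i][k] for i in range(n)) / n
          for k in range(n_classes)]
    return dW, db

# Two toy samples with two features and two classes (illustrative values)
X = [[1.0, 2.0], [3.0, 4.0]]
Y = [[1, 0], [0, 1]]
P = [[0.7, 0.3], [0.4, 0.6]]
dW, db = gradients(X, Y, P)
print(dW, db)
```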
## Training Loop: Iterative Updates
In a 'Training' section:
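- Learning rate η: 0.01 (cell A40).
- Epochs: step through them by hand, or automate the copy-paste with a VBA macro.
Each epoch is one update. Compute the new weight next to its gradient, then fill down to Q34:
New weights (Q20): `=J20 - $A$40 * P20`
Copy Q20:Q34, paste it as values over J20:J34, and let the sheet recalculate. Repeat 1000+ times and watch the loss drop!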
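The manual copy-paste loop corresponds to ordinary batch gradient descent. The sketch below runs it on a tiny made-up dataset (one feature, three classes, not the Iris data); the function name `train`, the zero initialization and the learning rate of 0.1 are assumptions picked so the example is short and repeatable.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train(X, Y, lr=0.1, epochs=500):
    """Batch gradient descent; returns weights, biases and loss per epoch."""
    n, d, k = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * d for _ in range(k)]  # zeros instead of random values
    b = [0.0] * k
    history = []
    for _ in range(epochs):
        # Forward pass: probabilities for every sample with current weights
        P = [softmax([sum(x * w for x, w in zip(xi, W[c])) + b[c]
                      for c in range(k)]) for xi in X]
        loss = -sum(math.log(P[i][c])
                    for i in range(n) for c in range(k) if Y[i][c]) / n
        history.append(loss)
        # Update step: w <- w - lr * gradient, like Q20 = J20 - $A$40 * P20
        for c in range(k):
            for j in range(d):
                grad = sum((P[i][c] - Y[i][c]) * X[i][j] for i in range(n)) / n
                W[c][j] -= lr * grad
            b[c] -= lr * sum(P[i][c] - Y[i][c] for i in range(n)) / n
    return W, b, history

# Toy, well-separated data (not the Iris set)
X = [[-2.0], [-1.8], [0.0], [0.2], [2.0], [1.8]]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
W, b, history = train(X, Y)
print(history[0], history[-1])
```

With all-zero weights every class starts at probability 1/3, so the first recorded loss equals ln 3; it then falls as the updates accumulate.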
Advanced: add momentum for faster convergence. Keep a velocity `v = 0.9*v + 0.1*grad` for each weight and step with `w = w - η*v` instead of the raw gradient.
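A single momentum update, using the exponential-average form quoted above, might look like this in Python. The function name `momentum_step` and the starting values are assumptions for illustration.

```python
def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """One update with the exponential-average form of momentum."""
    v = beta * v + (1 - beta) * grad  # smoothed gradient
    w = w - lr * v                    # step along the smoothed direction
    return w, v

w, v = 1.0, 0.0
w, v = momentum_step(w, v, grad=2.0)
print(w, v)
```

In the spreadsheet this means one extra column holding `v` next to each gradient, pasted back as values each epoch just like the weights.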
## Evaluating the Model
Post-training, predictions: `=MATCH(MAX(M2:O2), M2:O2, 0)-1`
Accuracy: Count correct vs total.
For Iris subset, expect 90%+ accuracy. Visualize: Plot loss vs epochs, confusion matrix.
Confusion matrix setup: rows are true classes, columns are predicted classes. Each cell counts the matching samples:
```excel
=COUNTIFS(pred_range, col_class, true_range, row_class)
```
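The evaluation steps (arg-max prediction, accuracy, confusion matrix) can be sketched in Python as well. The function names and the four probability rows are made up for the example.

```python
def predict(probs_row):
    """Index of the largest probability (0-based, like MATCH(...) - 1)."""
    return probs_row.index(max(probs_row))

def confusion_matrix(true_labels, predicted, n_classes=3):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        matrix[t][p] += 1
    return matrix

# Made-up probabilities for four samples
probs = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.1, 0.6, 0.3]]
true_labels = [0, 1, 2, 2]
predicted = [predict(p) for p in probs]
matrix = confusion_matrix(true_labels, predicted)
accuracy = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
print(predicted, matrix, accuracy)
```

Off-diagonal entries show which classes get confused with each other; here one class 2 sample is mistaken for class 1.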
## Real-World Applications and Extensions
- **Image Classification**: Scale to MNIST (784 features)—Excel strains, but proves concept.
- **NLP**: Word embeddings as features.
- **Limitations**: Excel slow for big data (use Python export). No regularization built-in; add L2 penalty to loss.
- **Enhancements**: One-vs-all logistic, or embed Solver add-in for optimization.
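The L2 penalty mentioned under limitations can be sketched as follows. The names `l2_penalty`, `l2_gradient` and the strength parameter `lam` are assumptions; a common convention, used here, is to penalize the weights but not the biases.

```python
def l2_penalty(weights, lam):
    """Penalty term lam/2 * sum of squared weights, added to the loss."""
    return 0.5 * lam * sum(w * w for row in weights for w in row)

def l2_gradient(weights, lam):
    """Extra gradient contributed by the penalty: lam * w per weight."""
    return [[lam * w for w in row] for row in weights]

W = [[1.0, -2.0], [0.5, 0.0]]  # made-up weights
penalty = l2_penalty(W, lam=0.1)
extra = l2_gradient(W, lam=0.1)
print(penalty, extra)
```

In the spreadsheet, this amounts to adding the penalty to the average loss cell and adding `lam * weight` to each weight's gradient cell.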
This Excel model bridges theory and practice. To validate it, paste the trained weights into Python as a starting point for a comparison against scikit-learn:
```python
import numpy as np
w_excel = np.array([...]) # Paste from sheet
```
## Key Takeaways
- Softmax: Logits → normalized probs.
- Cross-entropy + GD: Universal training recipe.
- Excel empowers: Visualize gradients, tweak η live.
Experiment: Try wine dataset. For production files, check [GitHub repo](https://github.com/everett-lindquist/excel-softmax-regression). Happy ML adventuring!