## Why Time Series Anomaly Detection Matters
Time series data is everywhere—from sensor readings in manufacturing to website traffic logs and financial metrics. Anomalies in these sequences can signal critical issues like equipment failures, cyber attacks, or market crashes. Detecting them early saves time and money, but it's tricky due to trends, seasonality, and noise.
Traditional stats methods often fall short on complex patterns, so machine learning steps in. This guide walks you through four battle-tested Python libraries: ADTK, TSFresh, Luminos, and PyOD. You'll get installation steps, core concepts, code snippets, and tips for production use. All examples use synthetic data for reproducibility, but they scale to real datasets.
We'll use a sample dataset: daily sales with injected anomalies. Grab the notebooks from [this GitHub repo](https://github.com/philiphendren/time-series-anomaly-detection-python) to follow along.
## Getting Started: Setup and Data Prep
First, install dependencies. Use a virtual environment:
```bash
pip install pandas numpy matplotlib scikit-learn adtk tsfresh luminos pyod
```
Load and visualize data:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # fixed seed so the synthetic series is reproducible

dates = pd.date_range('2020-01-01', periods=1000, freq='D')
data = pd.DataFrame({
    'timestamp': dates,
    # slow random-walk drift plus a sine-shaped seasonal component
    'value': 100 + np.cumsum(np.random.randn(1000) * 0.1) + np.sin(np.arange(1000) / 30) * 10
}, index=dates)

# Inject point anomalies at three known positions
anomaly_indices = [100, 200, 500]
data.iloc[anomaly_indices, data.columns.get_loc('value')] += 50

plt.plot(data.index, data['value'])
plt.title('Sample Time Series with Anomalies')
plt.show()
```
This creates a trending, seasonal series with point anomalies. Now, dive into each toolkit.
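For scoring the detectors later, it helps to keep the injected positions as a boolean label series. The following is a minimal sketch, assuming the same 1000-day range and the three injected indices used above; `make_labels` is a hypothetical helper name, not part of any of the libraries covered here.

```python
import pandas as pd

def make_labels(index, anomaly_positions):
    """Return a boolean Series that is True at the injected anomaly positions."""
    labels = pd.Series(False, index=index, name='is_anomaly')
    labels.iloc[list(anomaly_positions)] = True
    return labels

dates = pd.date_range('2020-01-01', periods=1000, freq='D')
labels = make_labels(dates, [100, 200, 500])

print(labels.sum())          # number of labelled anomalies
print(labels[labels].index)  # timestamps of the injected anomalies
```

Keeping the labels on the same `DatetimeIndex` as the data makes it straightforward to compare them with any detector output that is also indexed by timestamp.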
## ADTK: Specialized for Time Series Anomalies
[ADTK](https://github.com/arundo/adtk) (Anomaly Detection Toolkit) shines for univariate and multivariate time series. It offers detectors, transformers, and aggregators to build custom pipelines.
### Key Components
- **Detectors**: Threshold, outliers, seasonal, level shift, etc.
- **Transformers**: Clean data, decompose trends/seasonality.
- **Aggregators**: Combine the results of several detectors (e.g., `AndAggregator`, `OrAggregator`).
### Step-by-Step Implementation
1. **Prep data**: ADTK expects a pandas Series or DataFrame with a `DatetimeIndex`, which the sample data already has.
2. **Apply transformers**:
```python
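from adtk.data import validate_series
from adtk.transformer import DoubleRollingAggregate

# validate_series returns a checked copy of the series, so keep the result
series = validate_series(data['value'])

# Difference between the means of two adjacent 30-day windows;
# large absolute values mark abrupt changes in level
roller = DoubleRollingAggregate(agg='mean', window=30, diff='diff', center=True)
data_trend = roller.transform(series)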
```
3. **Detect anomalies**:
```python
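from adtk.detector import ThresholdAD, PersistAD, LevelShiftAD
from adtk.aggregator import OrAggregator
from adtk.visualization import plot

# An ADTK Pipeline chains steps sequentially, so detectors cannot be stacked in one.
# Instead, run each detector on its own and merge the results with an aggregator.
trend = data_trend.squeeze()  # single-column DataFrame -> Series (a Series is unchanged)
results = {
    'threshold': ThresholdAD(high=3, low=-3).detect(trend),
    'persist': PersistAD(c=3, side='both').fit_detect(trend),
    # LevelShiftAD requires a window; 5 is an illustrative choice, not a tuned value
    'levelshift': LevelShiftAD(c=6, side='both', window=5).fit_detect(trend),
}
anomalies = OrAggregator().aggregate(results)
plot(trend, anomaly=anomalies, anomaly_color='orange', anomaly_tag='marker')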
```
ADTK excels at interpretable, rule-based detection in domains like IoT monitoring. Tune thresholds and window sizes by validating against historical, labeled anomalies. Pro: Built for time series. Con: Rule-based detectors can miss subtle patterns that learned models pick up.
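To make the double rolling transform less of a black box, here is a plain pandas sketch of the underlying idea: compare the mean of the window just before each point with the mean of the window starting at it. This is an illustration under that assumption, not ADTK's implementation, and `double_rolling_mean_diff` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def double_rolling_mean_diff(series, window):
    """Mean of the next `window` points minus mean of the previous `window` points."""
    rolling_mean = series.rolling(window).mean()
    left = rolling_mean.shift(1)               # window covering t-window .. t-1
    right = rolling_mean.shift(-(window - 1))  # window covering t .. t+window-1
    return right - left

# A flat series that jumps from 0 to 10 at position 50
s = pd.Series(np.r_[np.zeros(50), np.full(50, 10.0)])
diff = double_rolling_mean_diff(s, window=5)

print(diff.idxmax())  # position where the two windows differ the most
```

The difference peaks exactly where the level changes, which is why thresholding this transformed series is a reasonable way to catch level shifts.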
## TSFresh: Feature Extraction Powerhouse
[TSFresh](https://github.com/blue-yonder/tsfresh) extracts hundreds of features from time series, then pairs with scikit-learn for anomaly detection. Ideal when you need engineered features for ML models.
### Workflow
1. **Extract features**:
```python
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

# Split the series into consecutive 30-day segments; each segment gets its own id
ts = data.reset_index(drop=True)  # integer index, 'timestamp' stays as a column
ts['id'] = ts.index // 30

# One row of features per segment id
extracted_features = extract_features(ts, column_id='id', column_sort='timestamp', column_value='value')
impute(extracted_features)  # replaces NaN/inf in place
```
2. **Train Isolation Forest**:
```python
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Each row of extracted_features describes one 30-day segment
X_train, X_test = train_test_split(extracted_features, test_size=0.2, random_state=42)
clf = IsolationForest(contamination=0.1, random_state=42)
clf.fit(X_train)
preds = clf.predict(X_test)  # -1 = anomalous segment, 1 = normal
```
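`extract_features` returns every feature it can compute, typically several hundred columns. TSFresh can prune them with `select_features`, but that step needs a target label, so it only applies once you have labeled anomalies. The approach suits high-volume data like logs. Tip: Combine the extracted features with domain features (e.g., holiday flags). Pro: Scalable feature engineering. Con: Compute-intensive on long series.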
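The "one row of features per window id" layout is easier to picture with a hand-rolled version. The sketch below uses a pandas `groupby` with four simple statistics as a stand-in for the much larger feature set TSFresh computes; the data and the choice of features are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = rng.normal(100, 1, size=120)
values[45] += 50                    # one injected spike, inside window 1

ts = pd.DataFrame({'value': values})
ts['id'] = ts.index // 30           # four windows of 30 points each

# One row per window, one column per feature
features = ts.groupby('id')['value'].agg(['mean', 'std', 'min', 'max'])

print(features.shape)               # (number of windows, number of features)
print(features['max'].idxmax())     # id of the window that contains the spike
```

A model such as Isolation Forest then works on this feature table, so any anomaly it reports refers to a whole window rather than a single timestamp.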
## Luminos: End-to-End from Salesforce
[Luminos](https://github.com/salesforce/luminos) is an all-in-one library for forecasting and anomaly detection. It handles preprocessing, modeling, and visualization seamlessly.
### Core Pipeline
1. **Initialize and preprocess**:
```python
from luminos import Luminos
lumi = Luminos(dataframe=data, timestamp_col='timestamp', target_col='value')
lumi.preprocess()
```
2. **Detect anomalies**:
```python
lumi.detect_anomalies(method='iqr', window=30) # or 'zscore', 'mad'
# Visualize
lumi.plot_anomalies()
```
3. **Advanced: Ensemble models**:
Luminos supports Prophet, ARIMA hybrids under the hood. Tune with `forecast_horizon` for predictions.
Great for quick prototypes in business analytics. Real-world: Salesforce uses it for SaaS metrics. Pro: Minimal code. Con: Less customizable than others.
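For readers who want to see what a windowed IQR rule does, here is a plain pandas sketch. It does not reproduce Luminos internals, which are not shown in this guide; `rolling_iqr_flags` and the multiplier `k=1.5` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def rolling_iqr_flags(series, window=30, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] of the preceding `window` points."""
    prev = series.shift(1)                    # keep the current point out of its own baseline
    q1 = prev.rolling(window).quantile(0.25)
    q3 = prev.rolling(window).quantile(0.75)
    iqr = q3 - q1
    # Comparisons against NaN bounds evaluate to False, so the warm-up period is never flagged
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

rng = np.random.default_rng(0)
s = pd.Series(rng.normal(100, 1, size=200))
s.iloc[150] += 50                             # injected spike

flags = rolling_iqr_flags(s, window=30)
print(bool(flags.iloc[150]))                  # whether the spike was flagged
```

Because quartiles are robust to single extreme values, the spike barely distorts the baseline for the points that follow it, which is the main appeal of IQR over a mean and standard deviation rule.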
## PyOD: Unsupervised Outlier Detection
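[PyOD](https://github.com/yzhao062/pyod) is a general-purpose outlier detection library with 45+ algorithms, ranging from classical detectors such as KNN and Isolation Forest to neural models such as AutoEncoder. Its detectors have no notion of time order, so a series has to be turned into fixed-length windows or features first.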
### Time Series Adaptation
1. **Flatten or window**:
Turn the series into a matrix of overlapping windows, one row per window:
```python
from pyod.utils.utility import standardizer

# Sliding windows: row i holds series[i : i + window]
def create_windows(series, window=30):
    X = []
    for i in range(len(series) - window + 1):  # +1 so the final window is included
        X.append(series[i:i + window])
    return np.array(X)

X = create_windows(data['value'].values)
X = standardizer(X)  # zero mean, unit variance per column
```
2. **Fit detector**:
```python
from pyod.models.knn import KNN

# KNN scores each window by its distance to its nearest neighbouring windows
clf = KNN(contamination=0.1)
clf.fit(X)
scores = clf.decision_scores_  # higher = more anomalous
labels = clf.labels_  # 1 = anomalous window; row i covers positions i .. i + window - 1
```
3. **Deep learning option**:
```python
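from pyod.models.auto_encoder import AutoEncoder

# Constructor arguments differ between PyOD releases: the epoch count is `epochs`
# in older TensorFlow-based versions and `epoch_num` in the PyTorch-based 2.x line,
# so check the signature of your installed version before setting it.
ae = AutoEncoder(contamination=0.1)
ae.fit(X)
ae_labels = ae.labels_  # 1 = anomalous window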
```
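PyOD's documentation includes benchmark comparisons of its algorithms for speed and accuracy. Use a GPU for the neural network models. Pro: Wide choice of algorithms behind a uniform API. Con: Not time-series aware, so you have to window the series or engineer features yourself.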
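The windowing and nearest-neighbour scoring steps above can be sketched in plain NumPy. This is an illustration of the general idea (score each window by the distance to its k-th nearest neighbour), not PyOD's code; `knn_scores` is a hypothetical helper and the series is synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def knn_scores(X, k=5):
    """Outlier score per row: Euclidean distance to its k-th nearest neighbour."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    d2 = np.maximum(d2, 0.0)                        # clip tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)                    # a row is not its own neighbour
    return np.sqrt(np.sort(d2, axis=1)[:, k - 1])

rng = np.random.default_rng(0)
series = rng.normal(0.0, 1.0, size=200)
series[100] += 50.0                                 # one injected spike

window = 10
X = sliding_window_view(series, window)             # row i == series[i : i + window]
scores = knn_scores(X, k=5)

top = np.argsort(scores)[-window:]                  # the `window` highest-scoring rows
print(sorted(top.tolist()))                         # rows whose span includes position 100
```

Note that a single spike lifts the score of every window that contains it, so one point anomaly shows up as a run of `window` flagged rows. Mapping row `i` back to positions `i` through `i + window - 1` is needed before alerting on a timestamp.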
## Comparing the Toolkits
| Library | Best For | Ease of Use | Speed | Customization |
|---------|----------|-------------|-------|---------------|
| ADTK | Rule-based time series detection | High | Fast | Medium |
| TSFresh | Feature-based ML | Medium | Slow | High |
| Luminos | Quick end-to-end prototypes | High | Fast | Low |
| PyOD | Advanced outlier detection | Medium | Varies by algorithm | High |
Pick based on data volume and expertise. Ensemble them: Use ADTK for alerts, PyOD for confirmation.
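One simple way to combine two detectors is to keep a rule-based alert only when a second model flags something nearby. The sketch below assumes both detectors produce boolean Series on the same index; `confirm_alerts` and the `tolerance` parameter are hypothetical, and the two flag series stand in for real ADTK and PyOD output.

```python
import pandas as pd

def confirm_alerts(rule_flags, model_flags, tolerance=1):
    """Keep a rule-based alert only if the model flags a point within `tolerance` steps."""
    width = 2 * tolerance + 1
    # Centered rolling max spreads each model flag to its neighbours
    near_model = (
        model_flags.astype(float)
        .rolling(width, center=True, min_periods=1)
        .max()
        .astype(bool)
    )
    return rule_flags & near_model

idx = range(10)
rule_flags = pd.Series([i in (2, 5, 8) for i in idx], index=idx)   # stand-in for rule-based alerts
model_flags = pd.Series([i in (3, 8) for i in idx], index=idx)     # stand-in for model labels

confirmed = confirm_alerts(rule_flags, model_flags, tolerance=1)
print(confirmed[confirmed].index.tolist())
```

The tolerance matters because window-based models rarely agree with point detectors on the exact timestamp; a small slack avoids discarding alerts that both methods effectively caught.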
## Production Tips
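- **Scalability**: For streaming data (e.g., from Kafka), score incoming windows with a pre-fitted detector. PyOD models are fitted in batch and expose `decision_function` for scoring new samples, so refit on a schedule rather than on every event.
- **Evaluation**: Anomalies are rare and the classes are heavily imbalanced, so prefer Precision@K and PR-AUC over accuracy.
- **Real-World Example**: When monitoring server CPU, ADTK can catch sudden spikes while TSFresh features capture slower trend changes.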
- **Next Steps**: Tune hyperparameters with Optuna, deploy via FastAPI.
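The two evaluation metrics mentioned above are short enough to write out. This is a minimal NumPy sketch on made-up labels and scores; the average precision here is the step-wise (non-interpolated) area under the precision-recall curve, which is one common definition of PR-AUC.

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Fraction of true anomalies among the k highest-scoring points."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(y_true)[top_k]))

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (no interpolation)."""
    order = np.argsort(scores)[::-1]
    hits = np.asarray(y_true)[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision * hits).sum() / hits.sum())

# Illustrative labels (1 = anomaly) and detector scores (higher = more anomalous)
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])
scores = np.array([0.10, 0.20, 0.90, 0.80, 0.70, 0.30, 0.15, 0.05])

print(precision_at_k(y_true, scores, k=2))
print(average_precision(y_true, scores))
```

Precision@K answers "how many of the alerts an analyst actually looks at are real?", while average precision summarizes ranking quality over all thresholds. For larger experiments, `sklearn.metrics.average_precision_score` computes the same quantity.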
Together, these libraries cover most time series anomaly detection needs, from simple rules to learned models. Experiment with the [demo repo](https://github.com/philiphendren/time-series-anomaly-detection-python) and adapt the examples to your data.