
Hands-On Python Toolkit for Time Series Anomaly Detection: Step-by-Step Guide

Claude Directory December 30, 2025


Build robust anomaly detection pipelines for time series data using Python libraries like ADTK, TSFresh, Luminos, and PyOD. This guide provides code examples, practical tips, and real-world applications to spot outliers effectively.

## Why Time Series Anomaly Detection Matters

Time series data is everywhere, from sensor readings in manufacturing to website traffic logs and financial metrics. Anomalies in these sequences can signal critical issues like equipment failures, cyber attacks, or market crashes. Detecting them early saves time and money, but trends, seasonality, and noise make it tricky. Simple statistical rules often miss complex patterns, which is where machine learning helps.

This guide walks you through four Python libraries: ADTK, TSFresh, Luminos, and PyOD. You'll get installation steps, core concepts, code snippets, and tips for production use. All examples use synthetic data with a fixed random seed for reproducibility, but the same steps apply to real datasets.

We'll use a sample dataset: daily sales with injected anomalies. Grab the notebooks from [this GitHub repo](https://github.com/philiphendren/time-series-anomaly-detection-python) to follow along.

## Getting Started: Setup and Data Prep

First, install dependencies. Use a virtual environment:

```bash
pip install pandas numpy matplotlib scikit-learn adtk tsfresh luminos pyod
```

Load and visualize data:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)  # fixed seed so the series is the same on every run

dates = pd.date_range('2020-01-01', periods=1000, freq='D')
data = pd.DataFrame({
    'timestamp': dates,
    'value': 100 + np.cumsum(np.random.randn(1000) * 0.1) + np.sin(np.arange(1000) / 30) * 10
}, index=dates)

# Inject point anomalies into the 'value' column
anomaly_indices = [100, 200, 500]
data.iloc[anomaly_indices, data.columns.get_loc('value')] += 50

plt.plot(data.index, data['value'])
plt.title('Sample Time Series with Anomalies')
plt.show()
```

This creates a slowly drifting series with a sinusoidal cycle (a period of roughly 188 days) and three point anomalies. Now, dive into each toolkit.

## ADTK: Specialized for Time Series Anomalies

[ADTK](https://github.com/arundo/adtk) (Anomaly Detection Toolkit) shines for univariate and multivariate time series. It offers detectors, transformers, and aggregators to build custom pipelines.
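Before wiring up ADTK's detectors, it can help to see what a rule-based detector does under the hood. The sketch below is plain pandas, not ADTK code: it rebuilds the synthetic series and flags points that sit far from a centered rolling median, measured in units of the rolling median absolute deviation (MAD). The function name `rolling_mad_flags`, the seed, the 30-day window, and the threshold of 5 are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic series (seed 42 is an assumption made for this sketch).
rng = np.random.default_rng(42)
dates = pd.date_range("2020-01-01", periods=1000, freq="D")
values = (
    100
    + np.cumsum(rng.standard_normal(1000) * 0.1)
    + np.sin(np.arange(1000) / 30) * 10
)
series = pd.Series(values, index=dates)
injected = [100, 200, 500]
series.iloc[injected] += 50


def rolling_mad_flags(s, window=30, threshold=5.0):
    """Flag points far from the centered rolling median, in units of scaled MAD."""
    median = s.rolling(window, center=True, min_periods=1).median()
    residual = (s - median).abs()
    mad = residual.rolling(window, center=True, min_periods=1).median()
    # 1.4826 rescales MAD so it is comparable to a standard deviation
    # when the residuals are roughly normal.
    return residual > threshold * 1.4826 * mad


flags = rolling_mad_flags(series)
print(series[flags].index.tolist())
```

A baseline like this is useful as a yardstick: if a library detector cannot beat it on your labeled history, the extra machinery is probably not earning its keep.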
### Key Components

- **Detectors**: Threshold, outliers, seasonal, level shift, etc.
- **Transformers**: Clean data, compute rolling statistics, decompose trends/seasonality.
- **Aggregators**: Combine the output of several detectors (for example, `OrAggregate` and `AndAggregate`).

### Step-by-Step Implementation

1. **Prep data**: ADTK expects a Series or DataFrame with a DatetimeIndex.
2. **Apply transformers**:

```python
from adtk.data import validate_series
from adtk.transformer import DoubleRollingAggregate

# validate_series returns a checked copy of the series
series = validate_series(data['value'])

# Difference between the means of two adjacent 30-day windows;
# large absolute values mark abrupt changes in level
roller = DoubleRollingAggregate(agg='mean', window=30, diff='diff', center=True)
data_trend = roller.transform(series)
```

3. **Detect anomalies**:

```python
from adtk.detector import ThresholdAD, PersistAD, LevelShiftAD
from adtk.aggregator import OrAggregate
from adtk.visualization import plot

# A Pipeline chains steps one after another, so it cannot run several
# detectors side by side. Run each detector on its own and flag a point
# when any of them fires.
detections = pd.DataFrame({
    'threshold': ThresholdAD(high=3, low=-3).detect(data_trend),
    'persist': PersistAD(c=3, side='both').fit_detect(series),
    # window is required; 30 days is an illustrative choice
    'levelshift': LevelShiftAD(window=30, c=6).fit_detect(series),
})
anomalies = OrAggregate().aggregate(detections)

plot(series, anomaly=anomalies, anomaly_color='orange', anomaly_tag='marker')
```

ADTK excels at interpretable rules for domains like IoT monitoring. The thresholds above are starting points; tune them by validating against labeled historical anomalies.

Pro: Time-series native. Con: Less ML-heavy for subtle patterns.

## TSFresh: Feature Extraction Powerhouse

[TSFresh](https://github.com/blue-yonder/tsfresh) extracts hundreds of features from time series, then pairs with scikit-learn for anomaly detection. Ideal when you need engineered features for ML models.

### Workflow

1. **Extract features**:

```python
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

# Cut the series into consecutive 30-day segments; each segment gets its own id
segments = data.reset_index(drop=True)
segments['id'] = segments.index // 30

extracted_features = extract_features(
    segments, column_id='id', column_sort='timestamp', column_value='value'
)
impute(extracted_features)  # replaces NaN and inf values in place
```

2. **Train Isolation Forest**:

```python
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

X_train, X_test = train_test_split(extracted_features, test_size=0.2, random_state=42)
clf = IsolationForest(contamination=0.1, random_state=42)
clf.fit(X_train)
preds = clf.predict(X_test)  # -1 = anomaly, 1 = normal
```

`extract_features` returns several hundred columns per segment. If you have labels, TSFresh's `select_features` can filter them by statistical relevance and reduce dimensionality; without labels, every extracted feature goes into the model. Use this approach for high-volume data like logs, and combine the extracted features with domain features (e.g., holidays) for extra signal.

Pro: Scalable feature engineering. Con: Compute-intensive on long series.

## Luminos: End-to-End from Salesforce

[Luminos](https://github.com/salesforce/luminos) is an all-in-one library for forecasting and anomaly detection. It handles preprocessing, modeling, and visualization in a single object. Confirm the class and method names against the repository README for the version you install.

### Core Pipeline

1. **Initialize and preprocess**:

```python
from luminos import Luminos

lumi = Luminos(dataframe=data, timestamp_col='timestamp', target_col='value')
lumi.preprocess()
```

2. **Detect anomalies**:

```python
lumi.detect_anomalies(method='iqr', window=30)  # or 'zscore', 'mad'

# Visualize
lumi.plot_anomalies()
```

3. **Advanced: Ensemble models**: Luminos supports Prophet and ARIMA hybrids under the hood. Tune with `forecast_horizon` for predictions.

Great for quick prototypes in business analytics. Real-world: Salesforce uses it for SaaS metrics.

Pro: Minimal code. Con: Less customizable than others.

## PyOD: Unsupervised Outlier Detection

[PyOD](https://github.com/yzhao062/pyod) is a general-purpose outlier library with 45+ algorithms. It has no time-series-specific detectors, but models such as KNN, Isolation Forest, and AutoEncoder work well once the series is converted into fixed-length windows.
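Because PyOD scores rows of a feature matrix rather than timestamps, a windowed series needs a mapping from window-level scores back to individual points. The sketch below shows one way to do that with NumPy only. The helper names `window_matrix` and `point_scores` are hypothetical, and the deviation-from-median score is a stand-in for a fitted detector's `decision_scores_`, not a PyOD call.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_matrix(values, window=30):
    """Return one row per overlapping window plus the position of each
    window's last observation in the original series."""
    values = np.asarray(values, dtype=float)
    X = sliding_window_view(values, window)  # shape: (n - window + 1, window)
    end_positions = np.arange(window - 1, len(values))
    return X, end_positions


def point_scores(window_scores, end_positions, n):
    """Spread window scores back onto points: each point receives the
    highest score of any window that contains it."""
    window = n - len(window_scores) + 1
    scores = np.full(n, -np.inf)
    for score, end in zip(window_scores, end_positions):
        start = end - window + 1
        scores[start:end + 1] = np.maximum(scores[start:end + 1], score)
    return scores


values = np.sin(np.arange(200) / 10.0)
values[120] += 5.0  # one injected spike

X, ends = window_matrix(values, window=30)

# Stand-in score (an assumption for this sketch): largest absolute
# deviation from each window's median.
window_scores = np.abs(X - np.median(X, axis=1, keepdims=True)).max(axis=1)
scores = point_scores(window_scores, ends, len(values))
```

Taking the maximum over covering windows is a conservative choice: a single spike raises the score of every point that shares a window with it, so pair it with a point-level check if you need exact timestamps.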
### Time Series Adaptation

1. **Flatten or window**: Convert the series into a matrix with one row per sliding window:

```python
import numpy as np
from pyod.utils.utility import standardizer

# Sliding windows: each row holds `window` consecutive observations
def create_windows(series, window=30):
    X = []
    for i in range(len(series) - window + 1):
        X.append(series[i:i + window])
    return np.array(X)

X = create_windows(data['value'].values)
X = standardizer(X)  # zero mean, unit variance per column
```

2. **Fit detector**:

```python
from pyod.models.knn import KNN

# KNN for local outliers
clf = KNN(contamination=0.1)
clf.fit(X)
scores = clf.decision_scores_  # higher = more anomalous
labels = clf.labels_           # 1 = anomaly, 0 = normal
```

3. **Deep learning option**:

```python
from pyod.models.auto_encoder import AutoEncoder

# Keyword names differ across PyOD releases (`epochs` in the older
# Keras-based class, `epoch_num` in 2.x); check your installed version
ae = AutoEncoder(epochs=50, contamination=0.1)
ae.fit(X)
```

PyOD's documentation includes benchmarks that compare detectors on speed and accuracy. Use a GPU for the neural network models.

Pro: Vast choices, production-ready. Con: Requires feature engineering for time series.

## Comparing the Toolkits

| Library | Best For | Ease of Use | Speed | Customization |
|---------|----------|-------------|-------|---------------|
| ADTK | Rule-based time series detection | High | Fast | Medium |
| TSFresh | Feature-based ML | Medium | Slow | High |
| Luminos | Quick end-to-end prototypes | High | Fast | Low |
| PyOD | General outlier detection | Medium | Varies | High |

Pick based on data volume and expertise. You can also combine them: use ADTK for alerts and PyOD for confirmation.

## Production Tips

- **Scalability**: Stream records through Kafka and score each new window with an already-fitted detector's `decision_function`. PyOD models are fitted in batch, so refit them on a schedule.
- **Evaluation**: Labeled anomalies are rare, so prefer ranking metrics such as Precision@K and PR-AUC over accuracy.
- **Real-World Example**: Monitor server CPU: ADTK catches spikes, while TSFresh features capture slower trend changes.
- **Next Steps**: Tune hyperparameters with Optuna, deploy via FastAPI.

This toolkit covers most time series anomaly detection tasks you are likely to meet. Experiment with the [demo repo](https://github.com/philiphendren/time-series-anomaly-detection-python) and adapt to your data.
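The evaluation tip above can be made concrete with a few lines of NumPy. This is a minimal sketch with made-up labels and scores, and the function names are illustrative. `average_precision` summarizes the precision-recall curve step by step, which is the same quantity scikit-learn reports as `average_precision_score` when there are no tied scores.

```python
import numpy as np


def precision_at_k(y_true, scores, k):
    """Fraction of true anomalies among the k highest-scoring points."""
    y_true = np.asarray(y_true)
    top_k = np.argsort(scores)[::-1][:k]
    return float(y_true[top_k].mean())


def average_precision(y_true, scores):
    """Mean of precision-at-rank, taken over the ranks of the true anomalies."""
    y_true = np.asarray(y_true)
    order = np.argsort(scores)[::-1]  # highest score first
    hits = y_true[order]
    if hits.sum() == 0:
        return 0.0
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_rank * hits).sum() / hits.sum())


# Toy example: 1 marks a labeled anomaly, scores come from any detector.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])
scores = np.array([0.1, 0.4, 0.9, 0.2, 0.3, 0.8, 0.05, 0.15])

print(precision_at_k(y_true, scores, k=2))
print(average_precision(y_true, scores))
```

Both metrics depend only on the ranking of the scores, so they let you compare detectors whose raw score scales differ, such as ADTK rule outputs and PyOD `decision_scores_`.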
--- <div style="text-align: center; margin-top: 2rem;"> <a href="https://towardsdatascience.com/a-practical-toolkit-for-time-series-anomaly-detection-using-python/" target="_blank" rel="noopener noreferrer" class="view-full-resource-btn" style="display: inline-block; background-color: #f97316; color: white; padding: 12px 24px; border-radius: 8px; text-decoration: none; font-weight: 600; transition: background-color 0.2s;">View Full Resource</a> </div>
