Anomaly detection with autoencoders

In this Answer, we’ll explore the fascinating field of anomaly detection using PyTorch.

Introduction to anomaly detection

Anomaly detection, also known as outlier detection, is a crucial aspect of machine learning where the goal is to identify instances or patterns in a dataset that deviate significantly from the norm or expected behavior. This deviation from the norm is often called an anomaly or outlier. Anomalies can represent rare events, errors, fraud, faults, or any other unusual occurrences that are not typical in the dataset.
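
To make this concrete before we turn to autoencoders, here is a toy illustration (our own addition, not part of the original walkthrough): flagging values that sit far from the rest of a dataset using a simple standard-deviation rule.

import numpy as np

# A mostly well-behaved dataset with one injected outlier
values = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 45.0, 10.2])

# Flag points more than two standard deviations from the mean
mean, std = values.mean(), values.std()
outliers = np.abs(values - mean) > 2 * std
print(values[outliers])  # prints [45.]

Real detectors are rarely this simple, which is where learned models such as autoencoders come in.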

Leveraging autoencoders for anomaly detection

Autoencoders, a type of neural network, offer a powerful technique for anomaly detection. The fundamental idea behind using autoencoders for this task is their ability to learn a compact representation of the input data. An autoencoder consists of an encoder that compresses input data into a lower-dimensional representation (encoding) and a decoder that reconstructs the input data from this encoding. Because the network is trained to reconstruct typical data accurately, anomalous inputs tend to be reconstructed poorly, and this elevated reconstruction error is what flags them as outliers.
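
The scripts later in this Answer import an Autoencoder class from Autoencoder.py, whose contents are not reproduced here. As a rough guide, a minimal fully connected implementation consistent with the Autoencoder(encoder_dim, decoder_dim) constructor used below might look like this (a sketch under that assumption, not the Answer's exact code):

import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, encoder_dim, decoder_dim):
        super().__init__()
        in_dim, hidden_dim, latent_dim = encoder_dim     # e.g., (1, 64, 32)
        latent_dim_, hidden_dim_, out_dim = decoder_dim  # e.g., (32, 64, 1)
        # Encoder: compress the input into a lower-dimensional encoding
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        # Decoder: reconstruct the input from the encoding
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim_, hidden_dim_),
            nn.ReLU(),
            nn.Linear(hidden_dim_, out_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))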

Dataset visualization

We’ll be working with a synthetic time series dataset. Visualization is a crucial first step as it helps us understand the structure of our data and any patterns that may exist. Our goal is to identify anomalies in this time series data.

import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic time series data
timesteps = 1000
time = np.arange(0, timesteps, 1)
regular_data = np.sin(0.02 * time) # Regular sine wave pattern
# Introduce an anomaly at a single time step
anomaly_index = 800
regular_data[anomaly_index] = 2.5
# Plot the synthetic time series data
plt.plot(time, regular_data)
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

In the generated time series data, we can observe a regular sine wave pattern, which represents the expected behavior. The anomaly, introduced at time step 800, appears as a sharp deviation from this regular pattern.
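
The scripts below also import generate_synthetic_data, time, and anomaly_index from synthetic_data_generator.py, which isn't shown in full. A sketch that simply packages the generation code above into that module could be:

# synthetic_data_generator.py (assumed contents, mirroring the code above)
import numpy as np

timesteps = 1000
time = np.arange(0, timesteps, 1)
anomaly_index = 800

def generate_synthetic_data():
    # Regular sine wave with a single anomalous spike at anomaly_index
    data = np.sin(0.02 * time)
    data[anomaly_index] = 2.5
    return data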

Designing the autoencoder and training

Now, let’s dive into the process of training the autoencoder. If you’re interested in a detailed understanding of how autoencoders are designed, feel free to read Introduction to autoencoders using PyTorch.

main.py
import torch
import torch.nn as nn
import torch.optim as optim
from Autoencoder import Autoencoder # Assume that the autoencoder class is defined in Autoencoder.py
from synthetic_data_generator import generate_synthetic_data

# Load the synthetic dataset and convert it to a (timesteps, 1) float tensor
regular_data = generate_synthetic_data()  # assumed to return a 1-D NumPy array
regular_data = torch.tensor(regular_data, dtype=torch.float32).unsqueeze(1)

# Define the autoencoder architecture
input_dim = 1
hidden_dim = 64
latent_dim = 32
encoder_dim = (input_dim, hidden_dim, latent_dim)
decoder_dim = (latent_dim, hidden_dim, input_dim)
autoencoder = Autoencoder(encoder_dim, decoder_dim)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=0.001)

# Train the autoencoder
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = autoencoder(regular_data)
    loss = criterion(outputs, regular_data)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

# Save the trained model
torch.save(autoencoder.state_dict(), "anomaly_detector.pth")

Code explanation

  • Lines 1–5: Import necessary libraries and modules.

  • Lines 8–9: Load the synthetic time series dataset and convert it into a (timesteps, 1) float tensor.

  • Lines 12–17: Define the autoencoder architecture and instantiate the Autoencoder class.

  • Lines 20–21: Define mean squared error (MSE) loss and set up the Adam optimizer.

  • Lines 24–39: Train the autoencoder for the specified number of epochs, update the parameters, monitor the loss, and save the trained weights.

Testing the autoencoder and visualizing the results

Let’s put our trained autoencoder to the test by feeding it the regular time series data (which contains the injected anomaly) and visualizing the reconstruction errors. Anomalies will likely produce higher reconstruction errors, allowing us to identify and visualize them.

main.py
import torch
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.optim as optim
from Autoencoder import Autoencoder
from synthetic_data_generator import generate_synthetic_data
from synthetic_data_generator import time
from synthetic_data_generator import anomaly_index

# Load the synthetic dataset as a (timesteps, 1) float tensor
regular_data = torch.tensor(generate_synthetic_data(), dtype=torch.float32).unsqueeze(1)

# Define the autoencoder architecture (must match the trained model)
input_dim = 1
hidden_dim = 64
latent_dim = 32
encoder_dim = (input_dim, hidden_dim, latent_dim)
decoder_dim = (latent_dim, hidden_dim, input_dim)
autoencoder = Autoencoder(encoder_dim, decoder_dim)

# Load the trained model
autoencoder.load_state_dict(torch.load("anomaly_detector.pth"))
autoencoder.eval()

# Predict on the regular data
predicted_data = autoencoder(regular_data)

# Calculate the per-timestep reconstruction error
reconstruction_error = torch.mean((regular_data - predicted_data)**2, dim=1).detach().numpy()

# Plot the regular data and reconstruction errors
plt.figure(figsize=(10, 6))

plt.subplot(2, 1, 1)
plt.plot(time, regular_data.numpy(), label='Regular Data')
plt.scatter(anomaly_index, regular_data[anomaly_index].item(), color='red', label='Anomaly')
plt.title('Regular Time-Series Data with Anomaly')
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()

plt.subplot(2, 1, 2)
plt.plot(time, reconstruction_error, label='Reconstruction Error', color='orange')
plt.axvline(x=anomaly_index, color='red', linestyle='--', label='Anomaly Index')
plt.title('Reconstruction Error over Time')
plt.xlabel('Time')
plt.ylabel('Reconstruction Error')
plt.legend()

plt.tight_layout()
plt.show()

Note: In the visualizations, we’ll see the original time series data, marked with the introduced anomaly. The second subplot displays the reconstruction errors over time. Anomalies, such as the one introduced, will likely exhibit higher reconstruction errors, making them stand out in the plot.

Code explanation

  • Lines 7–8: Import time and anomaly_index from the synthetic_data_generator module.

  • Line 22: Load pretrained autoencoder weights, enabling model reusability.

  • Line 23: Set autoencoder to evaluation mode for consistent inference behavior.

  • Line 26: Predict on regular time series data using the trained autoencoder.

  • Line 29: Calculate reconstruction error by comparing original data with predictions.

  • Lines 32–51: Plot regular time series data and reconstruction errors, marking anomalies.
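
The subplots above rely on visual inspection. To turn the reconstruction error into an automatic decision, a common heuristic (our addition, not part of the original walkthrough) is to flag any time step whose error exceeds the mean error by a few standard deviations:

import numpy as np

# Continues from the script above: reconstruction_error is a 1-D NumPy array
threshold = reconstruction_error.mean() + 3 * reconstruction_error.std()
anomalies = np.where(reconstruction_error > threshold)[0]
print(f'Threshold: {threshold:.6f}')
print(f'Anomalous time steps: {anomalies}')

The multiplier (3 here) controls the trade-off between sensitivity and false alarms; in practice, it's tuned on held-out data known to be normal.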

Applications of anomaly detection

Anomaly detection has widespread applications beyond our example. Here are a few real-world scenarios:

  • Healthcare: Implementing anomaly detection in medical data can help identify unusual patterns in patient records, facilitating early disease detection.

  • Educational apps: In kids' learning apps, this approach can be used to detect anomalies in writing. If a child writes a letter incorrectly, the system can generate an alert, promoting effective learning.

  • Daily life security: Imagine using anomaly detection in your home security system. If someone unknown to the household enters, the system can trigger an alert.

In conclusion, anomaly detection with autoencoders is a powerful tool with diverse applications, contributing to enhanced security, healthcare, and educational experiences. Dive into the code, explore, and unleash the potential of anomaly detection in your projects!

