- Supervised Anomaly Detection
- labels available for both normal data and anomalies
- similar to rare class mining
- Semi-supervised Anomaly Detection
- labels available only for normal data
- Unsupervised Anomaly Detection
- No labels assumed
- Based on the assumption that anomalies are very rare compared to normal data
2. Type of Anomalies
- Point Anomalies
- An individual data instance is anomalous w.r.t. the data
- Contextual Anomalies
- An individual data instance is anomalous within a context
- requires a notion of context
- also referred to as conditional anomalies
- Collective Anomalies
- A collection of related data instances is anomalous
- Requires a relationship among data instances
- sequential data
- spatial data
- graph data
- The individual instances within a collective anomaly are not anomalous by themselves
3. Evaluation of Anomaly Detection
- Accuracy is not a sufficient metric for evaluation
- Recall $= \frac{TP}{TP+FN}$
- Precision $= \frac{TP}{TP+FP}$
- F-measure $= \frac{2 \cdot P \cdot R}{P+R}$
- Recall (detection rate)
- ratio between the number of correctly detected anomalies and the total number of anomalies
- False alarm (false positive)
- ratio between the number of data records from normal class that are misclassified as anomalies and the total number of data records from normal class.
- ROC curve: a tradeoff between detection rate and false alarm rate.
- Area under the ROC curve (AUC) is computed using the trapezoid rule (a minimal computation sketch follows this list).
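A minimal sketch of these metrics, assuming binary labels (1 = anomaly, 0 = normal) and real-valued anomaly scores; the AUC is integrated with NumPy's trapezoid rule and the toy data is invented for illustration:

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    # y_true, y_pred: arrays of 0 (normal) and 1 (anomaly)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # detection rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    # Sweep thresholds over the anomaly scores and integrate
    # detection rate (TPR) vs. false alarm rate (FPR) with the trapezoid rule
    thresholds = np.sort(np.unique(scores))[::-1]
    tpr, fpr = [0.0], [0.0]
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        tpr.append(tp / max(np.sum(y_true == 1), 1))     # detection rate
        fpr.append(fp / max(np.sum(y_true == 0), 1))     # false alarm rate
    return np.trapz(tpr, fpr)

# Toy example: 1 = anomaly, 0 = normal
y_true = np.array([0, 0, 0, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.2, 0.9, 0.4, 0.8])
print(precision_recall_f1(y_true, (scores > 0.5).astype(int)))
print(roc_auc(y_true, scores))
```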
5. Classification Based Techniques
- Main idea: build a classification model for normal (and anomalous (rare)) events based on labeled training data, and use it to classify each new unseen event.
- Classification models must be able to handle skewed (imbalanced) class distribution
- Categories
- supervised learning techniques
- require knowledge of both normal and anomaly class
- build classifier to distinguish between normal and known anomalies
- semi-supervised classification techniques
- require knowledge of normal class only
- use a modified classification model (e.g., a one-class classifier) to learn the normal behavior and then detect any deviation from it as anomalous (see the sketch after this section)
- Advantage
- supervised learning techniques
- models that can be easily understood
- high accuracy in detecting many kinds of known anomalies
- semi-supervised classification techniques
- models that can be easily understood
- normal behavior can be accurately learned
- Drawbacks
- supervised learning techniques
- require labels from both the normal and anomaly classes
- cannot detect unknown and emerging anomalies
- semi-supervised classification techniques
- require labels from normal class
- possible high false alarm rate
- previously unseen normal data records may be recognized as anomalies
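A minimal semi-supervised sketch, assuming scikit-learn is available and using a one-class SVM as the modified classification model; the Gaussian training data, the `nu` value, and the test points are illustrative assumptions, not prescribed by the notes:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# "Normal" training data only, as in the semi-supervised setting
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Learn a model of normal behavior; nu roughly bounds the fraction of
# training points treated as outliers (an assumed hyperparameter choice)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_normal)

# New, unseen data: a few typical points plus two far from the normal region
X_test = np.vstack([rng.normal(size=(5, 2)),
                    np.array([[6.0, 6.0], [-7.0, 5.0]])])
pred = model.predict(X_test)          # +1 = normal, -1 = anomaly
print(pred)
```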
6. Nearest Neighbor Based
- Key assumption
- normal points have close neighbors while anomalies are located far from other points
- Two-step approach
- compute a neighborhood for each data record
- analyze the neighborhood to determine whether the data record is an anomaly or not
- Categories
- distance-based methods
- anomalies are data points most distant from other points
- density-based methods
- anomalies are data points in low density regions
- Advantage
- can be used in an unsupervised or semi-supervised setting (makes no assumptions about the data distribution)
- Drawbacks
- if normal points do not have a sufficient number of neighbors, the techniques may fail
- computationally expensive
- in high dimensional spaces, data is sparse and the concept of similarity may not be meaningful anymore
- due to the sparseness, distances between any two data records may become quite similar
- each data record may be considered a potential outlier (a distance-based kNN sketch follows this section)
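A small distance-based sketch, assuming the anomaly score of a point is its distance to its k-th nearest neighbor (one common choice); the synthetic data and k = 5 are illustrative:

```python
import numpy as np

def knn_anomaly_scores(X, k=5):
    """Distance-based score: distance to the k-th nearest neighbor.
    Points located far from other points receive high scores."""
    # Pairwise Euclidean distances (O(n^2) memory -- the computational
    # cost mentioned above)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)            # ignore self-distance
    return np.sort(dist, axis=1)[:, k - 1]    # distance to k-th neighbor

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 2)),               # dense normal cluster
               np.array([[8.0, 8.0], [-9.0, 7.0]])])    # two isolated points
scores = knn_anomaly_scores(X, k=5)
print(np.argsort(scores)[-2:])   # indices of the two highest-scoring points
```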
7. Clustering-based Techniques
- Key assumption
- normal data records belong to large and dense clusters, while anomalies do not belong to any of the clusters or form very small clusters
- Categorization according to labels
- semi-supervised
- cluster the normal data to create models of normal behavior; if a new instance does not belong to any cluster, or is not close to any cluster, it is an anomaly
- unsupervised
- post-processing is needed after the clustering step: the sizes of the clusters and each point's distance from its cluster determine whether the point is an anomaly (see the sketch after this section)
- Advantage
- No need to be supervised
- Easily adaptable to an online/incremental mode, suitable for anomaly detection in temporal data
- Drawbacks
- computationally expensive
- using indexing structures may alleviate this problem
- if normal points do not create any clusters the techniques may fail
- in high dimensional spaces, data is sparse and distances between any two data records may become quite similar
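An unsupervised clustering-based sketch, assuming scikit-learn's KMeans and using the distance to the nearest centroid as the post-processing score; the number of clusters and the 99th-percentile cutoff are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=[0, 0], size=(150, 2)),
               rng.normal(loc=[6, 6], size=(150, 2)),
               np.array([[12.0, -3.0]])])            # an isolated point

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dist_to_centroid = km.transform(X).min(axis=1)       # distance to nearest center

# Post-processing: flag points unusually far from every cluster center
threshold = np.percentile(dist_to_centroid, 99)      # assumed cutoff
anomalies = np.where(dist_to_centroid > threshold)[0]
print(anomalies)
```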
8. Statistics Based Techniques
- Data points are modeled using stochastic distribution
- points are determined to be outliers depending on their relationship with this model
- Advantage
- utilize existing statistical modeling techniques to model various types of distributions
- Drawback
- with high dimensions, difficult to estimate distributions
- parametric assumptions often do not hold for real data sets
- Parametric Techniques
- assume that the normal (and possibly anomalous) data is generated from an underlying parametric distribution
- learn the parameters from the normal sample
- determine the likelihood of a test instance to be generated from this distribution to detect anomalies
- Non-parametric Techniques
- do not assume any knowledge of parameters
- use non-parametric techniques to learn a distribution
- e.g., Parzen window estimation (a sketch follows this section)
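A short sketch of both flavours, assuming SciPy and scikit-learn are available: a Gaussian fitted to the normal sample (parametric) and a Parzen-window / kernel density estimate (non-parametric); the bandwidth and test points are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
X_normal = rng.normal(size=(300, 2))              # sample of normal data
X_test = np.array([[0.1, -0.2], [5.0, 5.0]])      # one typical, one far-away point

# Parametric: fit a Gaussian to the normal sample and score test points
# by their likelihood under that distribution
mu, cov = X_normal.mean(axis=0), np.cov(X_normal, rowvar=False)
log_lik = multivariate_normal(mean=mu, cov=cov).logpdf(X_test)
print("Gaussian log-likelihoods:", log_lik)       # low value => likely anomaly

# Non-parametric: Parzen-window / kernel density estimate of the same sample
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_normal)
print("KDE log-densities:", kde.score_samples(X_test))
```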
9. Information Theory Based Techniques
- Compute the information content of the data using information-theoretic measures
- e.g., entropy, relative entropy, etc.
- Key idea: outliers significantly alter the information content in a dataset
- Approach: detect data instances that significantly alter the information content
- require an information theoretic measure
- Advantage
- operate in an unsupervised mode
- Drawback
- require an information theoretic measure sensitive enough to detect irregularity induced by few outliers
- Example
- Kolmogorov complexity
- detect the smallest data subset whose removal leads to the maximum reduction in Kolmogorov complexity
- Entropy-based approaches
- find a k-sized subset whose removal leads to the maximal decrease in entropy (a greedy sketch follows this section)
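A greedy sketch of the entropy-based idea for a categorical attribute: instead of searching over all k-sized subsets (which is combinatorial), it removes one value at a time, each time the one whose removal gives the largest entropy drop; the toy data is illustrative:

```python
import numpy as np
from collections import Counter

def entropy(values):
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def entropy_based_outliers(values, k):
    """Greedy approximation: remove k values one at a time, each time the
    value whose removal causes the largest decrease in entropy."""
    data = list(values)
    removed = []
    for _ in range(k):
        base = entropy(data)
        best_val, best_gain = None, -np.inf
        for v in set(data):                 # try removing one occurrence of each value
            trial = data.copy()
            trial.remove(v)
            gain = base - entropy(trial)
            if gain > best_gain:
                best_val, best_gain = v, gain
        data.remove(best_val)
        removed.append(best_val)
    return removed

# A mostly balanced categorical attribute with two rare values
values = ["a"] * 50 + ["b"] * 48 + ["c", "d"]     # "c" and "d" are rare
print(entropy_based_outliers(values, k=2))        # the rare values are removed first
```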
10. Spectral Techniques
- Analysis based on eigen decomposition of data
- key idea
- find the combination of attributes that captures the bulk of the variability
- reduced set of attributes can explain normal data well, but not necessarily the outliers
- Advantage
- can operate in an unsupervised mode
- Drawback
- based on the assumption that anomalies and normal instances are distinguishable in the reduced space
- Several methods use Principal Component Analysis
- top few principal components capture variability in normal data
- smallest principal component should have constant values
- outliers have variability in the smallest components (a PCA-based sketch follows this section)
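A PCA-based sketch, assuming scikit-learn: the data is projected onto its principal components and the deviation along the smallest component is used as the anomaly score; the correlated synthetic data and the injected point are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Correlated 2-D data: most variability lies along one direction
x = rng.normal(size=400)
X_normal = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=400)])
X = np.vstack([X_normal, np.array([[0.0, 5.0]])])   # point violating the correlation

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)

# Deviation along the smallest principal component: large values indicate
# points that the dominant "normal" directions cannot explain
score = np.abs(Z[:, -1])
print(np.argmax(score))    # likely the injected point (index 400)
```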
11. Collective Anomaly Detection
- Detect collective anomalies
- Exploit the relationship among data instances
- Sequential anomaly detection
- detect anomalous sequence
- Spatial anomaly detection
- detect anomalous sub-regions within a spatial data set
- Graph anomaly detection
- detect anomalous sub-graphs in graph data