Wednesday, May 6, 2015

Partitional Clustering


1. Partitional Clustering vs. Hierarchical Clustering
  • Partitional clustering: A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
  • Hierarchical clustering: A set of nested clusters organized as a hierarchical tree.
2. Types of Clusters
  • Well-Separated Clusters: a cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
  • Center-based Clusters: A cluster is a set of objects such that any point in a cluster is closer (more similar) to the "center" of that cluster than to the center of any other cluster.
    • The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most "representative" point of a cluster.
  • Contiguous Clusters (Nearest Neighbor or Transitive): A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster.

  • Density-based: A cluster is a dense region of points that is separated from other high-density regions by low-density regions.
    • Used when clusters are irregular or intertwined.

  • Shared Property or Conceptual Clusters: Finds clusters that share some common property or represent a particular concept.

  • Clusters Defined by an Objective Function:
    • Finds clusters that minimize or maximize an objective function
    • Enumerate all possible ways of dividing the points into clusters and evaluate the "goodness" of each potential set of clusters using the given objective function (NP-hard).
    • Can have global or local objectives
      • hierarchical clustering algorithms typically have local objectives
      • partitional algorithms typically have global objectives
    • A variant of the global objective function approach is to fit the data to a parameterized model
      • parameters for the model are determined from the data
      • mixture models assume that the data is a "mixture" of a number of statistical distributions. 
    • Map the clustering problem to a different domain and solve a related problem in that domain
      • The proximity matrix defines a weighted graph, where the nodes are the points being clustered and the weighted edges represent the proximities between points. 
      • Clustering is then equivalent to breaking the graph into connected components, one for each cluster (a small sketch of this view follows the list).
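To make the graph view concrete, here is a small Python sketch (my own illustration, not from the original notes): it thresholds a pairwise distance matrix into a graph and reports the connected components as clusters. The threshold value is an arbitrary assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import connected_components

def graph_clusters(points, threshold):
    """Link every pair of points closer than `threshold`, then take the
    connected components of the resulting graph as clusters."""
    proximity = cdist(points, points)                 # pairwise distance matrix
    adjacency = (proximity < threshold).astype(float) # keep only "close" edges
    n_clusters, labels = connected_components(adjacency, directed=False)
    return n_clusters, labels

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
print(graph_clusters(points, threshold=2.0))          # two components: {0,1} and {2,3}
```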

3. K-means Clustering

  • Code: a minimal sketch is given after this list.
  • Initial centroids are often chosen randomly. 
    • The clusters produced vary from one run to another.
  • The centroid is (typically) the mean of the points in the cluster.
  • "Closeness" is measured by Euclidean distance, cosine similarity, correlation, etc.
  • K-means will converge for common similarity measures mentioned above.
  • Most of the convergence happens in the first few iterations.
    • often the stopping condition is changed to "until relatively few points change clusters"
  • Complexity is O(n*k*i*d)
    • n: number of points
    • k: number of clusters
    • i: number of iterations
    • d: number of attributes
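To make the algorithm concrete, here is a minimal K-means sketch in Python/NumPy (my own illustration, not code from the original post): random initial centroids, assignment by Euclidean distance, and mean-based centroid updates until no point changes clusters.

```python
import numpy as np

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means: random initial centroids, Euclidean assignment,
    mean-based update, stop when no point changes clusters."""
    rng = np.random.default_rng(seed)
    # Initial centroids chosen randomly from the data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # converged: no point changed clusters
        labels = new_labels
        # Update step: each centroid becomes the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):                # leave empty clusters untouched
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels
```
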
4. Evaluating K-means Clustering
  • Most common measure is Sum of Squared Error (SSE)
    • for each point, the error is the distance to the nearest cluster's centroid
    • to get SSE, we square these errors and sum them
      • SSE = $\sum_{i=1}^{k} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$
      • where $x$ is a data point in cluster $C_i$ and $m_i$ is the representative point (typically the centroid) of cluster $C_i$
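Given the centroids and labels from a run like the sketch above, the SSE formula translates directly into a couple of NumPy lines (again my own illustration):

```python
import numpy as np

def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its own cluster centroid."""
    diffs = points - centroids[labels]     # x - m_i, using each point's assigned centroid
    return float(np.sum(diffs ** 2))
```
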
5. Problem with Selecting Initial Points
  • If there are K "real" clusters, then the chance of selecting one centroid from each cluster is small
    • The chance is especially small when K is large.
    • If the clusters are the same size, n, then
        • P = $\frac{ \mbox{number of ways to select one centroid from each cluster}}{\mbox{number of ways to select K centroids}} = \frac{K!n^K}{(Kn)^K} = \frac{K!}{K^K}$
        • For example, if K = 10, then P = $\frac{10!}{10^{10}} \approx 0.00036$.
  • Solutions to Initial Centroids Problem
    • Multiple runs: helps, but probability is not on your side
    • Sample and use hierarchical clustering to determine initial centroids
    • Select more than k initial centroids and then select among these initial centroids
      • select the most widely separated ones (a possible sketch follows this list)
    • Postprocessing
    • Bisecting K-means
      • Not as susceptible to initialization issues
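One way to flesh out the "select more than k candidates, then keep the most widely separated ones" idea is a greedy farthest-point pass over the candidates; the sketch below is my own reading of that bullet, not a method prescribed in the notes.

```python
import numpy as np

def widely_separated(candidates, k, seed=0):
    """From a pool of candidate centroids (more than k of them), greedily keep
    k that are far apart: each pick is the candidate farthest from those kept so far."""
    rng = np.random.default_rng(seed)
    chosen = [candidates[rng.integers(len(candidates))]]   # start from a random candidate
    while len(chosen) < k:
        # Distance from every candidate to its nearest already-chosen centroid.
        d = np.linalg.norm(candidates[:, None, :] - np.array(chosen)[None, :, :], axis=2).min(axis=1)
        chosen.append(candidates[int(np.argmax(d))])        # keep the farthest one
    return np.array(chosen)
```
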
6. Handling Empty Clusters
  • Basic K-means algorithm can yield empty clusters
  • Several strategies
    • Choose the point that contributes most to SSE as the replacement centroid (sketched after this list)
    • Choose a point from the cluster with the highest SSE
    • If there are several empty clusters, the above can be repeated several times
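As a sketch of the first strategy (my own illustration): when a cluster ends up empty, its centroid can be replaced by the point that currently contributes the most to the SSE.

```python
import numpy as np

def fix_empty_cluster(points, centroids, labels, empty_j):
    """Replace the centroid of empty cluster `empty_j` with the point that
    contributes the most to the current SSE, and reassign that point to it."""
    per_point_error = np.sum((points - centroids[labels]) ** 2, axis=1)
    worst = int(np.argmax(per_point_error))        # largest contribution to SSE
    centroids[empty_j] = points[worst]
    labels[worst] = empty_j
    return centroids, labels
```
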
7. Updating Centers Incrementally
  • In the basic K-means algorithm, centroids are updated after all points are assigned to a centroid
  • An alternative is to update the centroids after each assignment (incremental approach), as sketched after this list
    • each assignment updates zero or two centroids
    • more expensive
    • introduces an order dependency
    • never get an empty cluster
    • can use "weights" to change the impact
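A sketch of the incremental bookkeeping (an assumption about how it could be implemented, not code from the post): when a point moves from one cluster to another, both centroids are adjusted as running means; if it stays put, nothing changes.

```python
def reassign_incremental(point, old_j, new_j, centroids, counts):
    """Move one point from cluster old_j to cluster new_j, updating both
    centroids as running means ("zero or two centroids" per assignment).
    `centroids` is a float array of shape (k, d); `counts` holds cluster sizes."""
    if old_j == new_j:
        return                                           # zero centroids change
    # Remove the point's contribution from its old centroid.
    counts[old_j] -= 1
    if counts[old_j] > 0:
        centroids[old_j] += (centroids[old_j] - point) / counts[old_j]
    # Add the point's contribution to its new centroid.
    counts[new_j] += 1
    centroids[new_j] += (point - centroids[new_j]) / counts[new_j]
```
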
8. Pre-processing and Post-processing
  • Pre-processing
    • normalize the data
    • eliminate outliers
  • Post-processing
    • Eliminate small clusters that may represent outliers
    • Split "loose" clusters, i.e., clusters with relatively high SSE
    • Merge clusters that are "close" and that have relatively low SSE
    • Can use these steps during the clustering process
      • ISODATA
9. Bisecting K-means
  • Bisecting K-means algorithm
    • variant of K-means that can produce a partitional or a hierarchical clustering
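A rough bisecting-K-means sketch, reusing the `kmeans` helper from the sketch in section 3 (my own illustration; it assumes each bisection produces two non-empty halves): start with all points in one cluster and repeatedly split the cluster with the largest SSE using 2-means.

```python
import numpy as np

def bisecting_kmeans(points, K):
    """Repeatedly bisect the cluster with the highest SSE until K clusters remain."""
    clusters = [points]                                # start with everything in one cluster
    while len(clusters) < K:
        # Pick the cluster with the largest SSE around its own mean.
        errors = [np.sum((c - c.mean(axis=0)) ** 2) for c in clusters]
        worst = clusters.pop(int(np.argmax(errors)))
        # Split it in two with a 2-means run (see the K-means sketch in section 3).
        _, labels = kmeans(worst, 2)
        clusters.extend([worst[labels == 0], worst[labels == 1]])
    return clusters
```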

10. Limitations of K-means
  • K-means has problems when clusters are of differing
    • sizes
    • densities
    • non-globular shapes
  • K-means has problems when the data contains outliers.
  • One solution is to use many clusters
    • this finds parts of clusters, which then need to be put together

