1. The Canopies Approach
- Two distance metrics
  - cheap & expensive
- First pass
  - use a very inexpensive distance metric
  - create overlapping canopies
- Second pass
  - use an expensive, accurate distance metric
  - canopies determine which distances are calculated
    - calculate expensive distances only between points in the same canopy
    - all other distances default to infinity
  - iteratively merge the closest clusters, using only the finite distances
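The two passes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: `cheap_distance` and `expensive_distance` are hypothetical stand-ins (first-coordinate difference and full Euclidean distance), canopies are built with a common two-threshold scheme (a loose threshold `t1` for joining a canopy, a tight threshold `t2` for removing a point from further consideration), and "merge closest" is interpreted as single-link agglomerative merging that stops at a hypothetical target of `k` clusters.

```python
import math
from itertools import combinations

def cheap_distance(a, b):
    # Hypothetical cheap metric: compare only the first coordinate.
    return abs(a[0] - b[0])

def expensive_distance(a, b):
    # Hypothetical expensive metric: full Euclidean distance.
    return math.dist(a, b)

def make_canopies(points, t1, t2):
    """First pass: overlapping canopies from the cheap metric (assumes t1 > t2)."""
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining.pop(0)  # any order works; take the first for determinism
        canopy = {center}
        still_remaining = []
        for i in remaining:
            d = cheap_distance(points[center], points[i])
            if d < t1:
                canopy.add(i)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # not tightly close: may start or join later canopies
        remaining = still_remaining
        canopies.append(canopy)
    return canopies

def canopy_distances(points, canopies):
    """Second pass: expensive distances only for pairs that share a canopy."""
    dist = {}
    for canopy in canopies:
        for i, j in combinations(sorted(canopy), 2):
            if (i, j) not in dist:  # a pair may share several canopies; compute it once
                dist[(i, j)] = expensive_distance(points[i], points[j])
    return dist  # any pair missing from dist is treated as infinitely far apart

def merge_closest(n, dist, k):
    """Single-link agglomerative merging over finite distances, down to k clusters."""
    clusters = [{i} for i in range(n)]

    def link(a, b):
        # Closest finite pair between two clusters; inf if none was computed.
        return min((dist.get((min(i, j), max(i, j)), math.inf)
                    for i in a for j in b), default=math.inf)

    while len(clusters) > k:
        best, x, y = min(((link(clusters[p], clusters[q]), p, q)
                          for p, q in combinations(range(len(clusters)), 2)),
                         default=(math.inf, -1, -1))
        if best == math.inf:
            break  # no finite distances remain between any two clusters
        clusters[x] |= clusters[y]
        del clusters[y]  # safe because x < y
    return clusters
```

For example, with the points `[(0.0, 0.0), (0.5, 0.2), (0.9, 0.1), (10.0, 0.0), (10.4, 0.3)]`, `t1=3.0` and `t2=1.0`, the first pass yields the canopies `{0, 1, 2}` and `{3, 4}`, so only 4 of the 10 possible expensive distances are computed. Merging then stops at two clusters even if `k=1` is requested, because every distance across the canopies defaults to infinity.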
3. Preserve Good Clustering
- Small, disjoint canopies
  - big time savings
- Large, overlapping canopies
  - same result as the original, accurate clustering
- Goal: fast and accurate
  - for every cluster, there exists a canopy such that all points in the cluster are in the canopy
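The condition above is easy to check directly. The sketch below assumes clusters and canopies are both given as sets of point indices; the function name `canopies_preserve_clustering` and the example sets are hypothetical.

```python
def canopies_preserve_clustering(clusters, canopies):
    """True if every cluster lies entirely inside at least one canopy."""
    return all(any(set(cluster) <= set(canopy) for canopy in canopies)
               for cluster in clusters)

clusters = [{0, 1, 2}, {3, 4}]
print(canopies_preserve_clustering(clusters, [{0, 1, 2, 3, 4}]))  # True: one large canopy
print(canopies_preserve_clustering(clusters, [{0, 1}, {2, 3, 4}]))  # False: cluster {0, 1, 2} is split
```

The two calls mirror the trade-off in the notes: one large canopy trivially satisfies the condition but saves no work, while smaller canopies save work only as long as no cluster is split across them.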