Part I Metric Searching in a Nutshell
Overview 3
1. FOUNDATIONS OF METRIC SPACE SEARCHING 5
1 The Distance Searching Problem 6
2 The Metric Space 8
3 Distance Measures 9
3.1 Minkowski Distances 10
3.2 Quadratic Form Distance 11
3.3 Edit Distance 12
3.4 Tree Edit Distance 13
3.5 Jaccard’s Coefficient 13
3.6 Hausdorff Distance 14
3.7 Time Complexity 14
4 Similarity Queries 15
4.1 Range Query 15
4.2 Nearest Neighbor Query 16
4.3 Reverse Nearest Neighbor Query 17
4.4 Similarity Join 17
4.5 Combinations of Queries 18
4.6 Complex Similarity Queries 18
5 Basic Partitioning Principles 20
5.1 Ball Partitioning 20
5.2 Generalized Hyperplane Partitioning 21
5.3 Excluded Middle Partitioning 21
5.4 Extensions 21
6 Principles of Similarity Query Execution 22
6.1 Basic Strategies 22
6.2 Incremental Similarity Search 25
7 Policies for Avoiding Distance Computations 26
7.1 Explanatory Example 27
7.2 Object-Pivot Distance Constraint 28
7.3 Range-Pivot Distance Constraint 30
7.4 Pivot-Pivot Distance Constraint 31
7.5 Double-Pivot Distance Constraint 33
7.6 Pivot Filtering 34
8 Metric Space Transformations 35
8.1 Metric Hierarchies 36
8.1.1 Lower-Bounding Functions 36
8.2 User-Defined Metric Functions 38
8.2.1 Searching Using Lower-Bounding Functions 38
8.3 Embedding Metric Space 39
8.3.1 Embedding Examples 39
8.3.2 Reducing Dimensionality 40
9 Approximate Similarity Search 41
9.1 Principles 41
9.2 Generic Algorithms 44
9.3 Measures of Performance 46
9.3.1 Improvement in Efficiency 46
9.3.2 Precision and Recall 46
9.3.3 Relative Error on Distances 48
9.3.4 Position Error 49
10 Advanced Issues 50
10.1 Statistics on Metric Datasets 51
10.1.1 Distribution and Density Functions 51
10.1.2 Distance Distribution and Density 52
10.1.3 Homogeneity of Viewpoints 54
10.2 Proximity of Ball Regions 55
10.3 Performance Prediction 58
10.4 Tree Quality Measures 60
10.5 Choosing Reference Points 63
2. SURVEY OF EXISTING APPROACHES 67
1 Ball Partitioning Methods 67
1.1 Burkhard-Keller Tree 6
1