Quasi-metrics, Similarities and Searches: Aspects of Geometry of Protein Datasets

Aleksandar Stojmirovic. arXiv 2008


A quasi-metric is a distance function that satisfies the triangle inequality but need not be symmetric: it can be thought of as an asymmetric metric. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures on biological (nucleotide or protein) sequences and quasi-metrics. Chapter 2 presents basic concepts of the theory of quasi-metric spaces and introduces new examples of them: the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space. Chapter 4 develops the notion of a quasi-metric space with a Borel probability measure, or pq-space. The main result of this chapter indicates that 'a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search in the context of quasi-metrics. The results about pq-spaces are used to produce novel theoretical bounds on the performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. Chapter 7 presents a prototype of PFMFind, a system for the discovery of short functional protein motifs that relies on FSIndex for similarity searches.
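To make the definition concrete, here is a minimal Python sketch that checks the quasi-metric axioms on a toy example. The abstract does not spell out how a similarity measure is turned into a quasi-metric, so the conversion d(x, y) = s(x, x) - s(x, y) used here is an assumption made for illustration. The three-letter alphabet and the scores in `S` are hypothetical and are not taken from any real substitution matrix.

```python
from itertools import product

# Hypothetical toy similarity scores on a three-letter alphabet.
# These values are illustrative only, not real substitution-matrix entries.
S = {
    ("A", "A"): 4, ("B", "B"): 6, ("C", "C"): 9,
    ("A", "B"): 1, ("A", "C"): -1, ("B", "C"): 2,
}
ALPHABET = "ABC"


def sim(x, y):
    """Symmetric similarity lookup."""
    return S[(x, y)] if (x, y) in S else S[(y, x)]


def dist(x, y):
    """Assumed conversion: d(x, y) = s(x, x) - s(x, y).

    Asymmetry arises because self-similarities s(x, x) differ between letters.
    """
    return sim(x, x) - sim(x, y)


def is_quasi_metric(points, d):
    """Check the quasi-metric axioms on a finite set of points."""
    for x in points:
        if d(x, x) != 0:
            return False
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            return False
        # Separation: d(x, y) = d(y, x) = 0 must imply x = y.
        if x != y and d(x, y) == 0 and d(y, x) == 0:
            return False
    for x, y, z in product(points, repeat=3):
        # Triangle inequality; the order of the arguments matters.
        if d(x, z) > d(x, y) + d(y, z):
            return False
    return True


def is_symmetric(points, d):
    """True if d(x, y) == d(y, x) for every pair, i.e. d is a metric."""
    return all(d(x, y) == d(y, x) for x, y in product(points, repeat=2))


if __name__ == "__main__":
    print("quasi-metric:", is_quasi_metric(ALPHABET, dist))
    print("symmetric:", is_symmetric(ALPHABET, dist))
    print("d(A, B) =", dist("A", "B"), " d(B, A) =", dist("B", "A"))
```

With these toy scores the triangle inequality holds for every ordered triple while d(A, B) and d(B, A) differ, which is the situation the abstract describes as an asymmetric metric.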
