Presentation

ABSTRACT

We often need to quantify the similarity or distance between trees that share the same set of leaves but differ in topology. The problem is not new, and although many metrics exist, they suffer from numerous limitations and do not scale to large trees. This thesis introduces a novel information-theoretic metric for trees, called Tree Mutual Information (TMI), which can be interpreted as the information shared by two trees. The metric is based on the best-aligned partitions of the leaf set induced by the two trees. It can be used to evaluate the quality of a hierarchical clustering and to interpret its results.
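To make the idea of "information shared by partitions" concrete, the sketch below computes the plain mutual information between two flat partitions of the same leaf set. It is only an illustration of the underlying quantity, not the TMI definition from the thesis; the function name `partition_mutual_information` and the example labels are hypothetical.

```python
from collections import Counter
from math import log


def partition_mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two partitions of the same leaves.

    labels_a[i] and labels_b[i] are the block labels that the two partitions
    assign to leaf i. This is a hypothetical helper for illustration only.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same set of leaves")
    n = len(labels_a)
    count_a = Counter(labels_a)               # block sizes in partition A
    count_b = Counter(labels_b)               # block sizes in partition B
    joint = Counter(zip(labels_a, labels_b))  # sizes of block intersections
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
    return mi


# Two partitions of four leaves that agree up to relabelling: MI = log 2.
same = partition_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
# Two partitions that cut the leaves in unrelated ways: MI = 0.
unrelated = partition_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Mutual information depends only on how the blocks overlap, not on the label names, which is why the relabelled partition in the first call still scores the maximum value.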

In addition to the new metric, this thesis proposes a technique for evaluating adjusted mutual information based on pairwise permutations. The computation is much faster and can therefore be used to compare large trees. Experiments were conducted on both synthetic and real datasets to illustrate the approach's efficiency in different settings. The proposed metric outperforms existing metrics in both quality and running time.

Keywords: Hierarchical clustering, Dendrogram, Tree, Adjusted Mutual Information, Information Theory.