Alejandro Ribeiro | Research / Circles of trust: Hierarchical clustering in asymmetric networks

Miranda trusts Billy who trusts Ariel who trusts Miranda, but there has not been enough interactions in the opposite direction to establish trust. When these three people meet, shall they trust each other? I.e., are they part of a circle of trust? The objective of this project is to develop an axiomatic theory to provide an answer to this question. In general, we start with a network were nodes represent individuals and directed edges represent a trust dissimilarity from the originating node to the end node. Small values of this dissimilarity signify large amounts of trust of the edge's source node on the edge's destination. Our goal is the study of the formation of trust groups in the network. I.e., the determination of the level of trust at which two individuals are integrated in a trust cluster given not only their direct interactions but their indirect interactions through other members of the network. It may make sense for Miranda, Billy, and Ariel to trust each other, because they all either trust each other directly, or have trust on someone that trusts the person they don't know.

Figure 1. Trust network. Arcs denote trust dissimilarity, e.g., $a$ has substantial trusts in $b$, but $b$ has little trust in $a$. Clustering intuition is precarious. Nodes seem close clockwise but far apart counterclockwise.

Once the problem is written in this language it is clear that determining circles of trust is akin to finding clusters in an asymmetric network for a given resolution level. The determination of a family of clusterings indexed by this resolution parameter is a problem known as hierarchical clustering. Simple as this sounds, the problem is that clustering in general and clustering using asymmetric data in particular is a poorly understood problem. There are plethora of methods that can be chosen to perform clustering, but these methods are based on heuristic intuition, not fundamental principles. Beyond purist concerns, lack of theoretical understanding is also a practical problem for clustering of asymmetric data because the intuition backing clustering methods is drawn from geometric point clouds. This intuition does not carry when the given data is not metric as in the case of asymmetric trust dissimilarities. E.g., in the network in Fig. 1 nodes $a$ and $b$ are closest to each other clockwise, but farthest apart counterclockwise, $c$ and $d$ seem to be closest on average, yet, it seems that all nodes are relatively close as it is possible to circle the graph clockwise without encountering a dissimilarity larger than 3.

Even though asymmetric clustering intuition is difficult in general, there are some particular specks of intuition that we can exploit to gain insight into the general problem. These intuitive statements can be postulated as axioms that restrict the space of allowable asymmetric hierarchical clustering methods. To the extent that the axioms are true, the properties of this reduced space of methods are fundamental properties of asymmetric hierarchical clustering and by extension fundamental properties of the formation of circles of trust. This is the approach advocated in this project.

Axioms of value and transformation

In our preliminary investigations we have postulated three axioms that we call the axioms of value, influence, and transformation. These axioms are stated formally in our published work but they correspond to the following intuitions:

Figure 2. Axiom of Value. For a two node network nodes are clustered together at the resolution at which both can influence each other.

(A1) Axiom of Value. For a network with two nodes the nodes are clustered together at the resolution at which both trust each other, namely, the maximum of the two trust dissimilarities between them.
(A2) Axiom of Transformation. If we consider a network and reduce all pairwise trust dissimilarities, the level at which two nodes become part of the same circle of trust is not larger than the level at which they were clustered together in the original network.

Axiom (A1) says that in a network with two nodes $p$ and $q$ and dissimilarities $A_X(p,q)=\alpha$ and $A_X(q,p)=\beta$, the nodes are reported as separate clusters for resolutions $\delta<\max(\alpha,\beta)$ and as a single cluster for resolutions $\delta\geq\max(\alpha,\beta)$. This is reasonable because at resolutions $\delta<\max(\alpha,\beta)$ one node can influence the other but not vice versa, which in most situations means that the nodes are not alike since trust flows in a single direction. Axiom (A2) states that increasing the level of trust between some nodes may result in the formation of additional circles of trust but cannot result in the dissolution of existing circles. This is erasable because a reduction in trust dissimilarities is expected to generate more trust in the network.

Figure 3. Axiom of Transformation. A dissimilarity reducing map $\phi:X\to Y$ produces dendrograms where clusters in the original network may be combined in the transformed network but cannot be separated.

Despite their apparent weakness, axioms (A1)-(A2) are a source of strong structure. Our first preliminary result is the derivation of two asymmetric hierarchical clustering methods that abide to these axioms. The first method insists that trust propagate only through arcs in which there is bidirectional trust and is therefore termed reciprocal clustering. The second method allows trust to propagate unidirectionally and is thus termed nonreciprocal clustering. That these methods comply with (A1)-(A2) is not particularly surprising. However, we have proved that any clustering method that satisfies axioms (A1)-(A2) lies between reciprocal and nonreciprocal clustering in a well defined sense. Specifically, any clustering method that satisfies axioms (A1)-(A2) forms circles of trust at resolutions larger than the resolutions at which they are formed with nonreciprocal clustering, and smaller than the resolutions at which they are formed with reciprocal clustering. These preliminary result endows reciprocal and nonreciprocal clustering with special meaning. For a given resolution level, nodes that {\it do not} cluster together with nonreciprocal clustering cannot be part of a circle of trust. Nodes that {\it do} cluster together with reciprocal clustering are definitely part of a circle of trust. In between, the answer depends on the extent to which reciprocal trust propagation is required or nonreciprocal trust propagation is acceptable.

Ongoing work

These preliminary results are enticing but far from a complete axiomatic exploration on the formation of circles of trust. To enrich this exploration we are currently pursuing four interrelated research thrusts that we preview in the following.

Circles of trust. Nonreciprocal and reciprocal clustering set lower and upper bounds in the formation of circles of trust providing the basis for their empirical study in social networks. This thrust encompasses a data collection effort and corresponding data analysis. Questions of interest include the symmetry and segmentation of trust. We associate symmetry of trust with the difference between reciprocal and nonreciprocal clusters. We identify segmentation of trust as the number of separate clusters as a function of the resolution level.
Further axiomatic constructions. Reasonable though they are, axioms (A1)-(A2) are a particular selection. This research thrust leverages the developed techniques to study additional axiomatic constructions. We consider different versions of axioms (A1) and (A2) that could lead to more general, more restricted, or simply different sets of admissible clustering methods. We are also studying additional axioms that can be added on top of (A1)-(A2) with the objective of finding a set of axioms leading to uniqueness results.
Stability. Two networks that are close to each other shall yield hierarchical clusters that are also close to each other. While this sense of stability looks like, and should be, a precondition for practical applicability, many (symmetric) clustering algorithms used in practice are not stable in this sense. Studying stability in asymmetric clustering is further complicated by the nonexistence of suitable tools to formalize the proto-continuity statement of ``networks close to each other.'' The first objective of this research thrust is to define generalized versions of Gromov-Hausdorff and Gromov-Wasserstein distances that can be used to formalize a notion of continuity in the space of networks. With this tool in hand we derive conditions to guarantee that networks arbitrarily close to each other yield hierarchical clusters that are also arbitrarily close to each other.
Intermediate clustering methods. Nonreciprocal and reciprocal clustering lower and upper bound the range of hierarchical clustering methods. A first question that arises is if there really are clustering methods lying between these two. We believe the answer to that question is probably positive. The reason for this belief is that the outputs of hierarchical clustering algorithms are trees. If we have two trees that satisfy axioms (A1)-(A2) we can combine them with each other by pruning branches that we then graft in the other tree. It seems reasonable that we could accommodate this grafting process so that axioms (A1)-(A2) are satisfied in the rearranged tree. We are currently studying this and other techniques that may lead to intermediate clustering methods.

Acknowledgments and references

This is the Ph.D. work of Santiago Segarra. Also part of this project are Prof. Facundo Memoli and Prof. Gunnar Carlsson. This work builds on their prior work on hierarchical clustering for metric data and would not be possible without their contribution. Some preliminary results have appeared in the following paper:

bibtexsummary:[02_my_conferences_2013.bib,cCarlssonEtal13]