Ir Arriba Ir abajo

Neighborhood Graphs with HITS and SALSA Seth J. Chandler

A particularly helpful search of a network such as the Internet or a citation network not only finds nodes that satisfy some criteria but also ranks those nodes for importance to create what amounts to a "reading list". Traditionally, this ranking was independent of the particular search performed. In part for reasons of speed, the nodes were ranked on the basis of their pre-computed importance within the entire network. More recently, however, relatively rapid algorithms have been developed to permit search-specific rankings. Thus, when these newer methodologies are used, if web pages or nodes in a citation network cover multiple topics and a particular document is important with respect to Topic A and far less important on Topic B, a search for documents based on the presence of Topic B will select this node but will not rank it highly. These newer methodologies essentially create a ranked and topic-specific "reading list" to explore the information revealed by the search. The key to these new methodologies is the creation of a "neighborhood graph", often containing far fewer nodes and edges than the full network to which various algorithms for computing node centrality can be rapidly applied.

This Demonstration shows how these new methods can create search-specific rankings of nodes selected by a search. It does so by the creation of a "neighborhood graph". You select from among six networks that are intended to resemble citation networks. Each node of the each network has an "attribute set" consisting of five Boolean values. You then filter the nodes (conduct a search) by choosing whether you want each member of the five attributes associated with the node to be True or False, or whether you don’t want to exclude any nodes based on the value of that attribute. These choices create a "result set" of nodes matching your selection. The Demonstration colors these nodes in red. These nodes are then augmented into a "base set" by adding a sample of the parents of each member of the result set and a sample of the children of each member of the result set. You specify the maximum cardinality of each selection of parents and the maximum cardinality of each selection of children. The base set thus establishes a partial "buffer" around the result set. The Demonstration colors these buffering nodes in green.

It is now time to determine how the nodes in the result set are to be ranked and the nodes are to be correlatively sized. Although you have the option of ranking the nodes based on their global importance in the entire network (using the traditional PageRanks method), in general, users will want to select a method for prioritizing the nodes in the result set based on the neighborhood graph. You can choose to prioritize the nodes in the neighborhood graph through the PageRanks method promoted by Google, the "HITS" (Hyperlink-Induced Topic Search) algorithm or the more recent "SALSA" (Stochastic Approach for Link Structure Analysis) algorithm. The latter two algorithms require you to specify whether you care about the nodes' importance as an "authority" relied on by other nodes or a "hub" that connects to an authoritative node.

Two additional controls let you customize the scale and the size of the vertices in the result set and permit you either to see the entire network or just the neighborhood graph.

No hay comentarios:

Publicar un comentario

Bienvenido a Avibert.
Deja habilitado el acceso a tu perfil o indica un enlace a tu blog o sitio, para que la comunicación sea mas fluida.
Saludos y gracias por comentar!