Tuesday, April 1, 2014

From Random Walks to Personalized PageRank

In graph theory (and its applications) it is often required to model how information spreads within a given graph. This is interesting for many applications, such as attack predictionsybil detection, and recommender systems - just to name a few. The question now becomes, can the spread of information be mathematically modeled and generalized, such that we can reuse the same model for any kind of information?

Random Walks

Let's consider the graph below. And let's assume that we drop a piece of information on the red vertex. Then we'd like to know a couple of things. For example, where does it spread first? How far does it spread? Will the vertices, close to the red vertex obtain more information then those far away? And finally, will the information continue to go back and forth forever or will we reach a stable distribution eventually?
Surprisingly, the solution lies on the shoulders of small processes randomly walking through the graph. I say "surprisingly" because at first sight it might seem counter-intuitive to expect a deterministic answer from a random process. But randomness becomes deterministic when applied infinitely often: If you toss a coin, nobody can predict which side will be up. But if you toss a coin for an infinite number of times, we know that 50% of the time, head will be up - given the coin is fair. Of course, in the case of graphs, we can't throw coins. But we can walk along the graph's edges. And well, we can do so randomly. For this purpose, we introduce the random walker. He starts at the red vertex, chooses an edge at random and goes to the vertex on the other side of the edge. If we repeat that experiment an infinite number of times, we obtain a deterministic distribution of random walkers, given that they started at the red vertex and were allowed to travel one hop away from it.
Well that's boring. Of course we knew it: The random walkers had no choice but to walk along a single edge. But what happens if they are allowed to travel along two or three or four consecutive edges? It becomes already more difficult to see how this ends. The good news is that we don't even have to run this experiment infinitely often. In fact, we can use the matrix of transition probabilities A and multiply it with the vector containing the initial distribution of random walkers x: x' = Ax. That's for one hop - the second hop can be computed with x'' = Ax' and so on. Below, you see the distribution of random walkers after different lengths of random walks.
As we see, the random walkers go back and forth and in fact, they will continue to do so forever. This is because the graph is bipartite - meaning that vertices can be grouped in such a way that edges never connect any two vertices from the same group. For such graphs, the distribution of random walkers will never become stable. There is a solution, though: Lazy random walks.

Lazy Random Walks

The idea of lazy random walks is that we allow the random walkers to remain on a vertex with probability 1/2. Hence, our formula becomes x' = 1/2(A + I)*x. In this formula, I is the identity matrix and A is the original matrix of transition probabilities. In the animation below, the thickness of an edge corresponds to the geometric mean of the amount of information of its adjacent vertices. The color of the edge is determined by the "information current": The difference in the amount of information between the adjacent vertices. In other words: The thicker the edge, the more information flows along the edge. Edges with an adjacent vertex that has a high degree tend to be colored red. These are the vertices which contain a significantly larger part of information than the rest of the network.
As we can see, the distribution converges. And not only that: It also becomes less and less apparent where the information was originating from. In fact, for the vertices which are well connected, the distribution of information (or random walkers - whatever you want to call it) approximates their degree distribution. This means that high-degree vertices will contain proportionally more information than low-degree vertices.

But what if we actually wanted the vertex that contains the information initially to play an important role? One way to model that is to not stall on any vertex, but let the random walkers jump back to this specific vertex with a given probability (i.e. the teleport probability, alpha).

Personalized PageRank

It turns out that this is exactly what "Personalized PageRank" is all about. It models the distribution of  rank, given that the distance random walkers (the paper calls them random surfers) can travel from their source (the source is often referred to as "seed") is determined by alpha. In fact the expected walk-length is 1/alpha. The formula now becomes x' = (1-alpha)*Ax + alpha*E. Here, alpha is a constant between 0 and 1 and E is the vector containing the source of information - i.e. in our case it is all zero, except for the red vertex where our information starts to spread.
In this animation, alpha is fixed to 1/2 in order to being able to allow for comparison with the lazy random walks. This is pretty high, though - with the result that our information indeed remains close to the seed vertex. In many cases we will want the random walkers to travel farther. Below, is an animation for alpha = 0.1

Conclusion

Now what is the verdict? First, any diffusion of information in a graph can be modeled with random walks. The most interesting parameter of this algorithm is the length of the random walk. The longer it is, the farther the information spreads. In sybil detection, for example, this is used to detect users which have only very few 'real' friends. The idea is that if we trust some real users of a social network, we can diffuse this trust among close friends of these trusted users by using random walks. If these walks are short enough, the trust does not diffuse to far and sparsely connected users which might by sybils but remains within the well connected group of honest users. Look at the animation depicting the lazy random walks: The information indeed spreads well among the well connected vertices but it does not spread to the cluster on the left which is only connected through a single edge.

For well connected networks it becomes irrelevant quickly (i.e. for short random walks) where the information started to spread - in the case of honest users this is a good property, because we can choose any honest user - no matter which user we choose - the diffusion of trust will converge to the same distribution among these well connected honest users. Sometimes, however, we want the initial source of information to play an important role. The ranking of web-pages is such an example: If you set a homepage in your browser or visit the same set of webpages frequently, search engines use this fact and rank web-pages higher which are closer to the set of web-pages you visit often. This closes the circle to the Personalized PageRank algorithm which was designed to model exactly that. People, however, have applied it to many different domains, such as predicting future targets of cyber attacks or even community detection.

I hope this clarifies some of the parts of Personalized PageRank and how it relates to random walks. While this information is spread across several research papers and lectures I thought it would be good to condense it in a single blog-post. And even better: with animations of what's happening and why. If you want to alter some of the parameters and try for yourself: checkout the source-code for the animations above on my BitBucket repository.