Handling Cases in Page Rank: Part 2

PUBLISHED ON JAN 6, 2018 — DEVELOPMENT, MATHEMATICS, TUTORIAL

In the last post here I discussed the basics of how Google’s PageRank algorithm works, but only in the context of a simple network. The point of this article is to discuss some of the problems you run into when the network gets more complicated.

Dangling Nodes within U2

In the last post, I introduced U1, a simple network of connected pages. We now need to introduce another universe, U2. Why? Because the network I provided before was so simple that it failed to capture most of the dynamics of the real web. For one, every page had at least one outgoing link to another page. In the real world, that isn’t realistic: plenty of pages don’t link out at all. If a page doesn’t link to anything else, it’s an endpoint, or dangling node.

Why is this a problem? Because when Google crawls through the web it will keep following links until it hits a dangling node. At that point it gets stuck, so a dangling node acts like a “black hole” for a web crawler.

Let’s look at a new network: instead of D linking to A, this time D doesn’t link to anything.

D is hanging on its own. Uh oh!

Let’s look at the table as a matrix now:

As you can see here, there is no way to make the right-hand column sum to 1 because D has no outlinks. This dangling node breaks our Markov chain and is a real problem for us: left alone, it will cause the PageRank vector to converge to zero.

What can you do?

Well, there are a couple of approaches to fixing this problem.

   1. We can artificially insert $\frac{1}{N}$ into each entry of the dangling column, and then run the PageRank algorithm to convergence. This is the equivalent of a web crawler randomly opening a new tab and choosing a new webpage.

   2. We can separate the two networks and tackle the dangling nodes and non-dangling nodes separately. This is called lumping. If we lump the network, we take all the dangling nodes and analyze them separately from all the non-dangling nodes.

In this case, we can analyze the dangling and non-dangling networks separately.

Let’s work through a simple example using the dangling network and the first approach.

Let’s represent the table we just introduced in proper matrix form.

$$ \begin{bmatrix} 0 & 1 & .5 & 0\\ .33 & 0 & .5 & 0 \\ .33 & 0 & 0 & 0\\ .33 & 0 & 0 & 0\end{bmatrix} \cdot \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} = \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} $$
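A quick way to see the problem before iterating is to check the column sums; a proper Markov matrix needs every column to sum to 1. A minimal check with numpy (using exact 1/3 for the rounded .33 entries):

import numpy as np

# Link matrix for U2: columns are the "from" pages A, B, C, D
A = np.array([[0,   1, 0.5, 0],
              [1/3, 0, 0.5, 0],
              [1/3, 0, 0,   0],
              [1/3, 0, 0,   0]])

# Columns A, B, and C each sum to 1, but the dangling column D sums to 0,
# so probability mass leaks out of the chain on every iteration.
print(A.sum(axis=0))  # -> [1. 1. 1. 0.]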

If we compute the eigenvector with the power method, over time the model either converges to zero or fails to converge entirely. Let’s take a look at this example:

Iteration PageA PageB PageC PageD
0 1.500 0.833 0.333 0.333
1 1.000 0.666 0.500 0.500
2 0.916 0.583 0.333 0.333
3 0.750 0.472 0.305 0.305
4 0.625 0.402 0.250 0.250
5 0.527 0.333 0.208 0.208
10 0.214 0.136 0.085 0.085
15 0.087 0.055 0.035 0.035
20 0.035 0.022 0.014 0.014
30 0.006 0.003 0.002 0.002
40 0.001 0.000 0.000 0.000
50 0.000 0.000 0.000 0.000
99 0.000 0.000 0.000 0.000

With this particular example you can see the scores quickly head toward 0, effectively hitting zero by around 40 iterations, which is not useful for us.

Note: If you’re interested in testing the convergence yourself, the code is really easy. Here’s an example in Python (requires numpy):

import numpy as np

# create the Markov matrix and the initial PageRank vector
A = np.array([[0, 1, 0.5, 0],
              [1/3, 0, 0.5, 0],
              [1/3, 0, 0, 0],
              [1/3, 0, 0, 0]])
p = np.array([[1.0], [1.0], [1.0], [1.0]])

# repeatedly multiply the matrix by the vector and print each iteration
for x in range(0, 100):
    p = A @ p
    print('|Iteration {:2d}| {:f} | {:f} | {:f} | {:f} |'.format(x, p[0][0], p[1][0], p[2][0], p[3][0]))

Initialize the dangling node to achieve convergence

We are going to try the first idea for handling the dangling node, which is to give the dangling column equal weight toward every other node. Let’s try it.

New Matrix:

$$ \begin{bmatrix} 0 & 1 & .5 & .33\\ .33 & 0 & .5 & .33 \\ .33 & 0 & 0 & .33\\ .33 & 0 & 0 & 0\end{bmatrix} \cdot \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} = \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} $$

Notice we set every other page’s entry in column D to .33, so a crawler stuck at D jumps to A, B, or C with equal probability. Let’s see how it converges:

Iterations PageA PageB PageC PageD
0 1.830 1.163 0.663 0.333
1 1.605 1.051 0.720 0.610
2 1.612 1.096 0.736 0.535
3 1.641 1.082 0.714 0.537
4 1.616 1.081 0.724 0.547
5 1.624 1.081 0.719 0.538
6 1.619 1.079 0.719 0.541
7 1.617 1.078 0.718 0.539
8 1.615 1.076 0.717 0.539
9 1.612 1.074 0.716 0.538
10 1.610 1.073 0.715 0.537
15 1.599 1.066 0.710 0.534
20 1.588 1.058 0.705 0.530
30 1.567 1.044 0.696 0.523
40 1.546 1.030 0.686 0.516
50 1.525 1.016 0.677 0.509
99 1.426 0.950 0.633 0.476

Fantastic. Now our network converges, and D, which only receives a single link (from A), ends up with the lowest PageRank. The new rank is this:

New Rank

PageA PageB PageC PageD
Rank 1 2 3 4

Looks pretty good. We’ve now successfully handled dangling nodes in our network.
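Note: to reproduce this yourself, only the matrix from the earlier snippet needs to change. Here is a minimal sketch, using exact 1/3 instead of the rounded .33:

import numpy as np

# same network as before, but column D now spreads its weight equally
# across A, B, and C instead of being all zeros
A = np.array([[0,   1, 0.5, 1/3],
              [1/3, 0, 0.5, 1/3],
              [1/3, 0, 0,   1/3],
              [1/3, 0, 0,   0  ]])
p = np.array([[1.0], [1.0], [1.0], [1.0]])

for x in range(100):
    p = A @ p   # one power-iteration step

print(p.ravel())  # roughly [1.64, 1.09, 0.73, 0.55] -- A first, D last

Because the column sums are exactly 1 here, the totals hold steady instead of shrinking slightly the way the rounded .33 entries do in the table above.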

Damping Factor and Reducible Networks

Let’s take one more network, but this time with something I’m calling a locked network: a network in which a web crawler gets locked into one part of the system. Imagine, for example, that A links to B, B to C, and C to A. Any web crawler that lands on one of these three pages will be forced to keep moving between pages A, B, and C and will never be able to reach D by following links.

Here's an example of lockout. D is unreachable from A, B, or C but still has an inlink. Over time its score will converge to 0 because the probability of reaching it is very low.

We need something that will prevent this “lockout”. If we don’t, D’s score will eventually converge to 0 (not cool, because D has an inlink and Ds think they should matter #PageEquality).

Larry Page and Sergey Brin realized this issue too, and so they introduced something called the damping factor, $\alpha$, which is supposed to simulate a user randomly opening a new tab and entering the network from a different angle. The damping factor, as they describe it, is a value between 0 and 1 that mixes in a random jump to any page in the network. Based upon empirical studies, they settled on a damping factor of .85.

This is the adjusted formula:

$$ \hat{A} = \alpha A + \Big( \frac{1-\alpha}{N} \Big) \Gamma $$

$$ \vec{p}_{t+1} = \Big( \alpha A + \Big( \frac{1-\alpha}{N} \Big) \Gamma \Big) \cdot \vec{p}_t $$

$$ \vec{p}_{t+1} = \hat{A} \cdot \vec{p}_t $$

where $\alpha A$ = the damped link network
$ \frac{1-\alpha}{N} $ = the probability of randomly jumping to any one of the $N$ pages
$ \Gamma $ = an $N \times N$ matrix of all ones (rank 1)
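In code, building $\hat{A}$ is a one-liner, and the damped power method drops in right next to it. Here’s a sketch as a reusable function, applied to the 4-page matrix from the dangling-node section since that one is written out in full (the 5-page closed network from the figure works exactly the same way):

import numpy as np

def damped_pagerank(A, alpha=0.85, iterations=100):
    """Power iteration on A_hat = alpha*A + ((1 - alpha)/N) * Gamma."""
    N = A.shape[0]
    Gamma = np.ones((N, N))                # N x N matrix of ones (rank 1)
    A_hat = alpha * A + ((1 - alpha) / N) * Gamma
    p = np.full((N, 1), 1 / N)             # start with equal weights
    for _ in range(iterations):
        p = A_hat @ p
    return p.ravel()

# 4-page network from the dangling-node section (exact 1/3 for .33)
A = np.array([[0,   1, 0.5, 1/3],
              [1/3, 0, 0.5, 1/3],
              [1/3, 0, 0,   1/3],
              [1/3, 0, 0,   0  ]])
print(damped_pagerank(A))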

Let’s apply it to the network and see what happens:

Closed Network

Apply the formula with $\alpha = .85 $, starting with equal weights in the PageRank vector:

$$ \vec{p}_{t+1} = \Bigg( .85 A + \Big( \frac{(1-.85)}{5} \Big) \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} \Bigg) \cdot \vec{p}_t $$

Ex. $ \vec{p}_1 $

$$ \vec{p}_{1} = \Bigg( .85 A + \Big( \frac{(1-.85)}{5} \Big) \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} \Bigg) \cdot \begin{bmatrix} \frac{1}{5} \\ \frac{1}{5} \\ \frac{1}{5} \\ \frac{1}{5} \\ \frac{1}{5} \end{bmatrix} $$

Let’s iterate to convergence using the same steps as before, first without the damping formula and then with it, and compare:

Without Damping Formula

Iterations PageA PageB PageC PageD PageE
0 0.366 0.266 0.266 0.100 0.000
1 0.299 0.349 0.349 0.000 0.000
2 0.349 0.324 0.324 0.000 0.000
3 0.324 0.336 0.336 0.000 0.000
4 0.336 0.330 0.330 0.000 0.000
5 0.330 0.333 0.333 0.000 0.000
6 0.333 0.331 0.331 0.000 0.000
7 0.331 0.332 0.332 0.000 0.000
8 0.332 0.332 0.332 0.000 0.000
9 0.332 0.332 0.332 0.000 0.000
10 0.332 0.332 0.332 0.000 0.000
15 0.332 0.332 0.332 0.000 0.000
20 0.332 0.332 0.332 0.000 0.000
30 0.332 0.332 0.332 0.000 0.000
40 0.332 0.332 0.332 0.000 0.000
50 0.332 0.332 0.332 0.000 0.000
99 0.332 0.332 0.332 0.000 0.000

As you can see, pages D and E have a score of 0. But D does have an inlink (E points to D), so D shouldn’t have a zero score; maybe lower than A, B, or C, but certainly not zero. Also, E links to A, so A should have a slightly higher score than B or C, yet upon convergence A, B, and C all have the same score. That’s not what we want.

Let’s try this again with the damping formula at $ \alpha = .85 $.

With Damping Formula

Iterations PageA PageB PageC PageD PageE
0 0.341100 0.256100 0.256100 0.115000 0.030000
1 0.292642 0.316017 0.316017 0.042699 0.029949
2 0.323239 0.300576 0.300576 0.042648 0.029920
3 0.310077 0.306993 0.306993 0.042625 0.029909
4 0.315510 0.304109 0.304109 0.042609 0.029898
5 0.313038 0.305177 0.305177 0.042594 0.029887
6 0.313926 0.304565 0.304565 0.042578 0.029876
7 0.313386 0.304667 0.304667 0.042563 0.029865
8 0.313453 0.304466 0.304466 0.042547 0.029854
9 0.313262 0.304394 0.304394 0.042532 0.029844
10 0.313181 0.304267 0.304267 0.042516 0.029833
15 0.312602 0.303719 0.303719 0.042439 0.029779
30 0.310902 0.302067 0.302067 0.042208 0.029617
40 0.309774 0.300971 0.300971 0.042055 0.029509
50 0.308649 0.299878 0.299878 0.041902 0.029402
99 0.303199 0.294583 0.294583 0.041162 0.028883

Now that looks way better! A has the highest score, B and C are equal, D has a non-zero score (albeit low), and E has a very low score, so it will rank last.

Here are the final rankings for the network:

PageA PageB PageC PageD PageE
Rank 1 2 2 3 4

Note: We aren’t going to handle ties in scores for the purposes of this article.

Let’s just stop here for a second and say YAY! Hopefully you can appreciate how awesome that is, and if you managed to get through all of this reading you deserve a pat on the back.

Exploiting the Algorithm: Weaknesses of PageRank

At this point, I think we can move on to discussing where PageRank fails. As with any algorithm, PageRank (particularly the older versions) has a number of weaknesses that we can exploit now that we know how it works.

I'll list 2 obvious weaknesses:

Older is better

One of the more obvious weaknesses is that older is better. Because it takes time for the scores to converge to the optimal vector, the network is biased toward web pages that have had longer to accumulate links and converge. New pages are probably initialized at small values, so they will take some time to move up the PageRank rankings.

Is Centrality better?

Is centrality really better when we are talking about the relevancy of a search? PageRank favors more central nodes over less central ones, and that has a couple of impacts:

   1. It exacerbates the scoring bias. More central pages get more links, which in turn bootstraps their rank.
   2. Pages that are potentially more relevant to the user may lose out due to their lack of centrality.

One Larger Network As An Example

We’ve only been working in a 4-5 node universe. For the final section, let’s do one example where we kick the number of nodes up to a much larger amount and see what happens. For kicks, let’s do 15 nodes and 30 connections. Note: this data is random, so you can refresh it if you want.

A random graph of 15 nodes and 30 connections


Here’s the Markov Matrix related to it:

Hopefully you have at least one dangling node in this graph. If not, just regenerate the data and you should get some dangling nodes eventually.

The Result

Using the Power Method, the result should be:

Huzzah! I can’t comment on your results because the data is random, but if you order the scores and rank them you should get a clear ordering of all 15 nodes.
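If you want to reproduce something like this offline rather than in the interactive demo, here is a rough sketch that generates a random 15-node, 30-link network with numpy, applies the 1/N fix to any dangling columns, and runs the damped power method from earlier (your graph is random, so your ranking will differ):

import numpy as np

rng = np.random.default_rng()
N, links = 15, 30

# build a random link matrix: columns are "from" pages, rows are "to" pages
A = np.zeros((N, N))
for pair in rng.choice(N * N, size=links, replace=False):
    src, dst = divmod(int(pair), N)
    if src != dst:    # self-links are skipped, so you may get slightly fewer than 30
        A[dst, src] = 1.0

# normalize each column; dangling columns (no outlinks) get 1/N everywhere
for j in range(N):
    total = A[:, j].sum()
    A[:, j] = A[:, j] / total if total > 0 else 1 / N

# damped power method
alpha = 0.85
A_hat = alpha * A + ((1 - alpha) / N) * np.ones((N, N))
p = np.full(N, 1 / N)
for _ in range(100):
    p = A_hat @ p

# order the pages from highest to lowest score
for rank, node in enumerate(np.argsort(-p), start=1):
    print('Rank {:2d}: node {:2d} (score {:.4f})'.format(rank, int(node), p[node]))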

Things we haven’t addressed

There’s a lot more that factors into the PageRank algorithm that we haven’t discussed. For example, we didn’t discuss adding a personalization vector, which is used to combat things like spamming. It is the same size as the PageRank vector $\pi$ but applies a preference toward certain pages over others. Think of it as a preference vector.

We also could go into more detail about lumping the state space, that is, lumping all the non-dangling nodes together and all the dangling nodes together, each into a single node.

Follow the links below for further research.

Helpful Sources