In the last post here I discussed the basics of how Google’s PageRank algorithm works, but only in the context of a simple network. The point of this article is to discuss some of the problems you run into when the network gets more complicated.

In the last post, I introduced **U1**, a simple network of connected pages. We are going to need to introduce another universe, **U2**, now. Why? Because the network I provided before was so simple that it failed to capture most of the dynamics of the real web. For one, *all the pages had at least one connection with another page*. In the real world, that isn’t realistic: pages can be completely isolated. If a page **doesn’t link to anything else**, it’s an *endpoint*, or **dangling** node.

Why is this a problem? Because when Google crawls the web by following links, it will keep going until it hits a dangling node. At that point it gets stuck, so a dangling node acts like a “black hole” for a web crawler.

Let’s look at a new table. Instead of D linking back to A, this time D doesn’t link to anything.

D is hanging on its own. Uh oh!

Let’s look at the table matrix now:

As you can see here, there is no way to make the right-hand column sum to 1 because there are no outlinks at D. This dangling node breaks our Markov chain and is a real problem for us: eventually, it will drive the PageRank vector to converge to zero.

What can you do?

Well, there are a couple of approaches to fixing this problem.

1. We can artificially insert $\frac{1}{N}$ into each entry of the dangling column, and then run the PageRank algorithm to convergence. This is the equivalent of a web crawler randomly opening a new tab and choosing a new webpage.

2. We can separate the two networks and tackle the dangling nodes and non-dangling nodes independently. This is called *lumping*: if we lump the network, we take all the dangling nodes and analyze them separately from all the non-dangling nodes.
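As a quick sketch of the first approach, here's how you might detect and patch dangling columns in numpy (the matrix values are illustrative; the detection logic is the point):

```python
import numpy as np

# A small link matrix with a dangling node: the last column is all
# zeros because that page has no outlinks (illustrative values)
A = np.array([[0,   1, 0.5, 0],
              [1/3, 0, 0.5, 0],
              [1/3, 0, 0,   0],
              [1/3, 0, 0,   0]])

N = A.shape[0]
dangling = A.sum(axis=0) == 0      # columns that sum to 0 have no outlinks
A_fixed = A.copy()
A_fixed[:, dangling] = 1.0 / N     # give the random surfer a uniform 1/N jump

print(A_fixed.sum(axis=0))         # every column now sums to 1
```

After the patch, every column is a proper probability distribution, so the Markov chain interpretation is restored.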

Let’s work through a simple example using the dangling network and the first approach.

Let’s represent the table we introduced above in proper matrix form.

$$ \begin{bmatrix} 0 & 1 & .5 & 0\\ .33 & 0 & .5 & 0 \\ .33 & 0 & 0 & 0\\ .33 & 0 & 0 & 0\end{bmatrix} \cdot \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} = \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} $$

If we computed the eigenvector using the power method, over time the model would converge to zero or fail to converge entirely. Let’s take a look at this example:

| Iteration | v1 | v2 | v3 | v4 |
|---|---|---|---|---|
| 0 | 1.500 | 0.833 | 0.333 | 0.333 |
| 1 | 1.000 | 0.666 | 0.500 | 0.500 |
| 2 | 0.916 | 0.583 | 0.333 | 0.333 |
| 3 | 0.750 | 0.472 | 0.305 | 0.305 |
| 4 | 0.625 | 0.402 | 0.250 | 0.250 |
| 5 | 0.527 | 0.333 | 0.208 | 0.208 |
| 10 | 0.214 | 0.136 | 0.085 | 0.085 |
| 15 | 0.087 | 0.055 | 0.035 | 0.035 |
| 20 | 0.035 | 0.022 | 0.014 | 0.014 |
| 30 | 0.006 | 0.003 | 0.002 | 0.002 |
| 40 | 0.001 | 0.000 | 0.000 | 0.000 |
| 50 | 0.000 | 0.000 | 0.000 | 0.000 |
| 99 | 0.000 | 0.000 | 0.000 | 0.000 |

With this particular example you can see the vector quickly decays toward 0, effectively converging by ~40 iterations, which is not useful for us.

*Note: If you’re interested in testing the convergence yourself, the code is really easy. Here’s an example in Python (requires numpy):*

```python
import numpy as np

# Create the Markov matrix and the initial PageRank vector
A = np.array([[0,   1, 0.5, 0],
              [1/3, 0, 0.5, 0],
              [1/3, 0, 0,   0],
              [1/3, 0, 0,   0]])
p = np.array([[1.0], [1.0], [1.0], [1.0]])

# Repeatedly multiply and print each iteration to watch the convergence
for x in range(100):
    p = A @ p
    print('|Iteration {:2d}| {:f} | {:f} | {:f} | {:f} |'.format(
        x, p[0][0], p[1][0], p[2][0], p[3][0]))
```

**Initialize the dangling node to achieve convergence**

We are going to try the first idea for handling dangling nodes, which is to fill the dangling column with equal weight toward the other nodes. Let’s try it.

New Matrix:

$$ \begin{bmatrix} 0 & 1 & .5 & .33\\ .33 & 0 & .5 & .33 \\ .33 & 0 & 0 & .33\\ .33 & 0 & 0 & 0\end{bmatrix} \cdot \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} = \begin{bmatrix} P_A \\ P_B \\ P_C \\ P_D \end{bmatrix} $$

Notice we set each entry in column D to .33, spreading D’s weight evenly over the other three pages. Let’s see how it converges:

| Iteration | PageA | PageB | PageC | PageD |
|---|---|---|---|---|
| 0 | 1.830 | 1.163 | 0.663 | 0.333 |
| 1 | 1.605 | 1.051 | 0.720 | 0.610 |
| 2 | 1.612 | 1.096 | 0.736 | 0.535 |
| 3 | 1.641 | 1.082 | 0.714 | 0.537 |
| 4 | 1.616 | 1.081 | 0.724 | 0.547 |
| 5 | 1.624 | 1.081 | 0.719 | 0.538 |
| 6 | 1.619 | 1.079 | 0.719 | 0.541 |
| 7 | 1.617 | 1.078 | 0.718 | 0.539 |
| 8 | 1.615 | 1.076 | 0.717 | 0.539 |
| 9 | 1.612 | 1.074 | 0.716 | 0.538 |
| 10 | 1.610 | 1.073 | 0.715 | 0.537 |
| 15 | 1.599 | 1.066 | 0.710 | 0.534 |
| 20 | 1.588 | 1.058 | 0.705 | 0.530 |
| 30 | 1.567 | 1.044 | 0.696 | 0.523 |
| 40 | 1.546 | 1.030 | 0.686 | 0.516 |
| 50 | 1.525 | 1.016 | 0.677 | 0.509 |
| 99 | 1.426 | 0.950 | 0.633 | 0.476 |

Fantastic. Now our network converges, and D, which links to nothing, ends up with the worst PageRank. The new rank is this:

**New Rank**

| | PageA | PageB | PageC | PageD |
|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 |

Looks pretty good. We’ve now successfully handled dangling nodes in our network.
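As a sanity check, here's a small sketch that runs the power method on the patched matrix (the patched column follows the matrix above, spreading D's weight as 1/3 over the other three pages), normalizing at each step so the scores stay comparable:

```python
import numpy as np

# Patched matrix: column D now spreads its weight 1/3 over A, B, C
A = np.array([[0,   1, 0.5, 1/3],
              [1/3, 0, 0.5, 1/3],
              [1/3, 0, 0,   1/3],
              [1/3, 0, 0,   0]])

p = np.ones(4)
for _ in range(100):
    p = A @ p
    p = p / p.sum()    # normalize so the vector doesn't slowly decay

# Highest score first: indices 0..3 correspond to pages A..D
ranking = np.argsort(-p)
```

The resulting ordering matches the table: A first, then B, C, and D last.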

Let’s take one more network, but this time with something I’m calling a *locked network*: a network in which a web crawler is locked into a subsystem. Imagine, for example, A links to B, B to C, and C to A. Any web crawler that lands on one of these 3 webpages will be forced to keep moving between pages A, B, and C and will *not* be able to reach D by following links.

Here's an example of lockout. D is unreachable from A, B, or C but still has an inlink. Over time its score will converge to 0 because the probability of reaching it is very low.

We need something that will prevent this “lockout”. If we don’t, D will eventually converge to a score of 0 (not cool, because D has an inlink, and Ds think they should matter `#PageEquality`).

Larry Page and Sergey Brin also realized this issue, so they introduced something called the damping factor, $\alpha$, which simulates a user randomly opening a new tab and entering the network from a different angle. The damping factor is a value between 0 and 1 that controls how often the surfer follows links versus jumping to a random page. Based upon empirical studies, they settled on a damping factor of **.85**.

This is the adjusted formula:

$$ \hat{A} = \alpha A + \Big( \frac{1-\alpha}{N} \Big) \Gamma $$

$$ \vec{p}_{t+1} = \Big( \alpha A + \Big( \frac{1-\alpha}{N} \Big) \Gamma \Big) \cdot \vec{p}_t $$

$$ \vec{p}_{t+1} = \hat{A} \cdot \vec{p}_t $$

where $\alpha A$ = the damped link network,

$ \frac{1-\alpha}{N} $ = the uniform random-jump weight, and

$ \Gamma $ = the $N \times N$ matrix of all ones (a rank-1 matrix).
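As a sketch, building $\hat{A}$ is one line of numpy (the helper name `damped` is mine; the 3-page cycle below is just a toy check):

```python
import numpy as np

def damped(A, alpha=0.85):
    """Build A_hat = alpha*A + ((1 - alpha)/N) * Gamma,
    where Gamma is the N x N matrix of ones."""
    N = A.shape[0]
    return alpha * A + ((1 - alpha) / N) * np.ones((N, N))

# Tiny 3-page cycle as a check: every column of the damped matrix
# still sums to 1, and every entry is at least (1 - alpha)/N,
# so no page is ever completely unreachable
A = np.array([[0,   0,   1.0],
              [1.0, 0,   0],
              [0,   1.0, 0]])
A_hat = damped(A)
```

Because every entry of $\hat{A}$ is strictly positive, the chain can no longer get trapped in a locked subnetwork.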

Let’s apply it to the network and see what happens:

**Closed Network**

Apply the formula with $\alpha = .85$, starting with equal weights in the PageRank vector:

$$ \vec{p}_{t+1} = \Bigg( .85 A + \Big( \frac{(1-.85)}{5} \Big) \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} \Bigg) \cdot \vec{p}_t $$

For example, the first iteration $ \vec{p}_1 $:

$$ \vec{p}_{1} = \Bigg( .85 A + \Big( \frac{(1-.85)}{5} \Big) \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} \Bigg) \cdot \begin{bmatrix} \frac{1}{5} \\ \frac{1}{5} \\ \frac{1}{5} \\ \frac{1}{5} \\ \frac{1}{5} \end{bmatrix} $$

Let’s run until convergence using the same steps as before, and compare the results with and without the damping formula:

**Without Damping Formula**

| Iteration | PageA | PageB | PageC | PageD | PageE |
|---|---|---|---|---|---|
| 0 | 0.366 | 0.266 | 0.266 | 0.100 | 0.000 |
| 1 | 0.299 | 0.349 | 0.349 | 0.000 | 0.000 |
| 2 | 0.349 | 0.324 | 0.324 | 0.000 | 0.000 |
| 3 | 0.324 | 0.336 | 0.336 | 0.000 | 0.000 |
| 4 | 0.336 | 0.330 | 0.330 | 0.000 | 0.000 |
| 5 | 0.330 | 0.333 | 0.333 | 0.000 | 0.000 |
| 6 | 0.333 | 0.331 | 0.331 | 0.000 | 0.000 |
| 7 | 0.331 | 0.332 | 0.332 | 0.000 | 0.000 |
| 8 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 9 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 10 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 15 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 20 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 30 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 40 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 50 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |
| 99 | 0.332 | 0.332 | 0.332 | 0.000 | 0.000 |

As you can see, pages D and E have a score of 0, but D does have an inlink (E points to D), so D shouldn’t have a zero score. Maybe lower than A, B, or C, but certainly not zero. Also, E links to A, so A should have a slightly higher score than B or C; yet upon convergence, A, B, and C have the same score. That’s not what we want.

Let’s try this again with the damping formula at $ \alpha = .85 $.

**With Damping Formula**

| Iteration | PageA | PageB | PageC | PageD | PageE |
|---|---|---|---|---|---|
| 0 | 0.341100 | 0.256100 | 0.256100 | 0.115000 | 0.030000 |
| 1 | 0.292642 | 0.316017 | 0.316017 | 0.042699 | 0.029949 |
| 2 | 0.323239 | 0.300576 | 0.300576 | 0.042648 | 0.029920 |
| 3 | 0.310077 | 0.306993 | 0.306993 | 0.042625 | 0.029909 |
| 4 | 0.315510 | 0.304109 | 0.304109 | 0.042609 | 0.029898 |
| 5 | 0.313038 | 0.305177 | 0.305177 | 0.042594 | 0.029887 |
| 6 | 0.313926 | 0.304565 | 0.304565 | 0.042578 | 0.029876 |
| 7 | 0.313386 | 0.304667 | 0.304667 | 0.042563 | 0.029865 |
| 8 | 0.313453 | 0.304466 | 0.304466 | 0.042547 | 0.029854 |
| 9 | 0.313262 | 0.304394 | 0.304394 | 0.042532 | 0.029844 |
| 10 | 0.313181 | 0.304267 | 0.304267 | 0.042516 | 0.029833 |
| 15 | 0.312602 | 0.303719 | 0.303719 | 0.042439 | 0.029779 |
| 30 | 0.310902 | 0.302067 | 0.302067 | 0.042208 | 0.029617 |
| 40 | 0.309774 | 0.300971 | 0.300971 | 0.042055 | 0.029509 |
| 50 | 0.308649 | 0.299878 | 0.299878 | 0.041902 | 0.029402 |
| 99 | 0.303199 | 0.294583 | 0.294583 | 0.041162 | 0.028883 |

**Now that looks way better!** A has the highest score, B and C are equal, D has a non-zero (albeit low) score, and E, with the lowest score, ranks last.

Here’s the final scores of the network:

| | PageA | PageB | PageC | PageD | PageE |
|---|---|---|---|---|---|
| Rank | 1 | 2 | 2 | 3 | 4 |

*Note: We aren’t going to handle ties in scores for the purposes of this article.*

Let’s just stop here for a second and say `YAY!` Hopefully you can appreciate how awesome that is, and if you managed to read through all of this you deserve a pat on the back.

At this point, I think we can move on to discussing **where PageRank fails**. As with any algorithm, PageRank (particularly the older versions) has a number of weaknesses, which are easier to exploit now that we know how it works.

I'll list two obvious weaknesses.

One of the more obvious weaknesses is that older is better. Because it takes some time to converge to the optimal vector, the network is biased toward web pages that have had a longer time to accumulate rank. New pages start out with small values, so they will take some time to move up the PageRank rankings.

Is centrality really better when we are talking about the relevancy of a search? PageRank favors more central nodes over less central ones, and that has a couple of impacts:

1. It exacerbates the scoring bias. More central pages get more links, which in turn bootstraps their rank.

2. Pages that are potentially more relevant to the user may lose out due to their lack of centrality.

We’ve only been working in a 4-5 node universe. As the final material section, let’s do one example where we kick the number of nodes up to a much larger amount and see what happens. For kicks, let’s do 15 nodes and 30 connections. *Note: this data is random, so you can refresh if you want.*

A random graph of 15 nodes and 30 connections


Here’s the Markov Matrix related to it:

Hopefully you have at least one dangling node in this graph. If not, just regenerate the data and you should get some dangling nodes eventually.
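If you want to reproduce this offline, here's a rough sketch (variable names are my own) that builds a random 15-node, 30-link graph, column-normalizes it into a Markov matrix, and patches any dangling columns with the uniform 1/N fix from earlier:

```python
import numpy as np

rng = np.random.default_rng()
N, links = 15, 30

# Pick random directed links (no self-links) until we have 30 distinct ones
M = np.zeros((N, N))
while int(M.sum()) < links:
    src, dst = rng.integers(N, size=2)
    if src != dst:
        M[dst, src] = 1.0

# Column-normalize to get the Markov matrix; columns that sum to zero
# are dangling nodes and get the uniform 1/N patch
col = M.sum(axis=0)
A = np.where(col > 0, M / np.where(col > 0, col, 1), 1.0 / N)
```

From here the same power-method loop as before produces the scores for the random graph.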

**The Result**

Using the Power Method, the result should be:

Huzzah! I can’t comment on your exact results because the data is random, but if you order and rank them you should get something like this:

There’s a lot more that we haven’t discussed that factors into the PageRank algorithm. For example, we didn’t discuss **adding a personalization vector**, which is used to combat things like spamming. It is the same size as the PageRank vector $\pi$ but applies a preference toward certain pages over others. Think of it as a preference vector.
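As a rough sketch of the idea (the function and variable names here are mine, not notation from the original papers), the uniform $1/N$ jump is replaced by a preference vector $\vec{v}$:

```python
import numpy as np

def personalized_pagerank(A, v, alpha=0.85, iters=100):
    # Random jumps land according to preference vector v instead of uniformly
    p = v.copy()
    for _ in range(iters):
        p = alpha * (A @ p) + (1 - alpha) * v
    return p

# 3-page cycle again; v biases the random jump toward the first page
A = np.array([[0,   0,   1.0],
              [1.0, 0,   0],
              [0,   1.0, 0]])
v = np.array([0.6, 0.2, 0.2])
p = personalized_pagerank(A, v)
```

With a uniform $\vec{v}$ this reduces to ordinary damped PageRank; a biased $\vec{v}$ pulls rank toward the preferred pages.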

We also could go into more detail about lumping, that is, collapsing all the dangling nodes into a single node and analyzing them separately from the non-dangling nodes.

Follow the links below for further research.

- http://infolab.stanford.edu/~backrub/google.html - The original paper Sergey Brin and Larry Page submitted in 1998
- http://home.ie.cuhk.edu.hk/~wkshum/papers/pagerank.pdf - Really good primer
- http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html - Good high level overview of different approaches toward convergence
- http://www4.ncsu.edu/~ipsen/ps/slides_dagstuhl07071.pdf - Really good for understanding the linear algebra applications
- https://uu.diva-portal.org/smash/get/diva2:536076/FULLTEXT01.pdf - More about convergence speeds
- http://www.cems.uvm.edu/~tlakoba/AppliedUGMath/other_Google/Wills.pdf - Another primer