
Python and Google PageRank

While searching for a Python script that computes PageRank, I came across two pages that I wanted to share.

PageRank-checking script in Python:

This script fetches the Google PageRank value of the site passed as an argument. That comes in handy for all sorts of things in the SEO world. The page in question is here:

http://blogmag.net/blog/read/91/Python_code_to_check_your_Google_PageRank

Tested today (July 1, 2009), the script works perfectly. The PHP versions I found did not work (Google takes me for an attacker, you know). If you have a working one in PHP, I would be glad to have it.

PageRank computation script (Google-like)

The following page is based on an article published by the AMS, although the principle it describes is not exactly how Google works.

The script, written in Python, tries to reproduce how a PageRank algorithm works. I have not tested it, but it may interest some people:

http://www.eioba.com/a69792/the_google_pagerank_algorithm_in_126_lines_of_python
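
To give an idea of what such a script does, here is a minimal sketch of PageRank computed by power iteration. It is not the code from the linked page and it does not claim to match what Google runs; the function name `pagerank`, the damping factor of 0.85, the stopping tolerance and the four-page example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank (illustrative sketch, not Google's code).

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | set(t for ts in links.values() for t in ts))
    n = len(pages)
    rank = dict((p, 1.0 / n) for p in pages)

    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict((p, base) for p in pages)

        # Each page shares its current rank equally among its outgoing links.
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: D links out but nobody links to D.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items()):
        print("%s %.4f" % (page, score))
```

In this toy graph, page C collects links from every other page and ends up with the highest score, while D, which nobody links to, keeps only the baseline share `(1 - damping) / n`.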

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2008 Fred Cirera
# ported in Python from the Ruby code by Vsevolod S. Balashov
# http://snippets.dzone.com/posts/show/3284

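# Python 2 only: urllib2 and urllib.urlencode live in
# urllib.request and urllib.parse in Python 3.
import urllib2
import re
import time
import sys

from urllib import urlencode
from pprint import pprint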

HOST = "toolbarqueries.google.com"

def mix(a, b, c):
    M = lambda v: v % 0x100000000  # int32 modulo
    a, b, c = (M(a), M(b), M(c))

    a = M(a-b-c) ^ (c >> 13)
    b = M(b-c-a) ^ (a <<  8)
    c = M(c-a-b) ^ (b >> 13)

    a = M(a-b-c) ^ (c >> 12)
    b = M(b-c-a) ^ (a << 16)
    c = M(c-a-b) ^ (b >> 5)

    a = M(a-b-c) ^ (c >>  3)
    b = M(b-c-a) ^ (a << 10)
    c = M(c-a-b) ^ (b >> 15)

    return a, b, c

def checksum(iurl):
    # C2I packs up to four byte values into one little-endian integer.
    C2I = lambda s: sum(c << 8*i for i, c in enumerate(s[:4]))
    a, b, c = 0x9e3779b9, 0x9e3779b9, 0xe6359a60
    lg = len(iurl)
    k = 0
    # Consume the input twelve bytes at a time.
    while k <= lg-12:
        a = a + C2I(iurl[k:k+4])
        b = b + C2I(iurl[k+4:k+8])
        c = c + C2I(iurl[k+8:k+12])
        a, b, c = mix(a, b, c)
        k += 12

    # Fold in the remaining bytes and the total length.
    a = a + C2I(iurl[k:k+4])
    b = b + C2I(iurl[k+4:k+8])
    c = c + (C2I(iurl[k+8:])<<8) + lg
    a, b, c = mix(a, b, c)
    return c

def GoogleHash(value):
    # I2C splits an integer into its four low-order bytes, least significant first.
    I2C = lambda i: [i & 0xff, i >> 8 & 0xff, i >> 16 & 0xff, i >> 24 & 0xff]
    ch = checksum([ord(c) for c in value])
    ch = ((ch % 0x0d) & 7) | ((ch/7) << 2)
    return "6%s" % checksum(sum((I2C(ch-9*i) for i in range(20)), []))

def make_url(host, site_url):
    url = "info:" + site_url
    params = dict(client="navclient-auto", ch="%s" % GoogleHash(url),
                  ie="UTF-8", oe="UTF-8", features="Rank", q=url)
    return "http://%s/search?%s" % (host, urlencode(params))

# Where the fun begins

if __name__ == "__main__":
    if len(sys.argv) != 2:
        url = 'http://www.google.com/'
    else:
        url = sys.argv[1]

    if not url.startswith('http://'):
        url = 'http://%s' % url

    # print make_url(HOST, url)
    req = urllib2.Request(make_url(HOST, url))
    try:
        f = urllib2.urlopen(req)
        response = f.readline()
    except Exception, err:
        print err
        # print err.read()
        sys.exit(1)

    try:
        rank = re.match(r'^Rank_\d+:\d+:(\d+)', response.strip()).group(1)
    except AttributeError:
        print "This page is not ranked"
        rank = -1

    print "PageRank: %d\tURL: %s" % (int(rank), url)
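
The last step of the script relies only on the shape of the first line of the reply, as captured by its regular expression: `Rank_`, two colon-separated numbers, then the rank itself. The sketch below isolates that parsing step so it can be tried without any network access. The helper name `parse_rank` and the sample line `Rank_1:1:9` are hypothetical, built from the regular expression rather than from a recorded reply.

```python
import re

# Same pattern as in the script above: the rank is the third numeric field.
RANK_LINE = re.compile(r"^Rank_\d+:\d+:(\d+)")


def parse_rank(response):
    """Return the rank as an int, or None when the line does not match."""
    match = RANK_LINE.match(response.strip())
    if match is None:
        return None
    return int(match.group(1))


if __name__ == "__main__":
    # Hypothetical reply line, shaped to fit the pattern.
    print(parse_rank("Rank_1:1:9\n"))
    # An empty reply stands for a page with no rank.
    print(parse_rank(""))
```

Returning `None` instead of relying on `AttributeError` keeps the "not ranked" case explicit, which may be easier to reuse in a larger SEO script.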