This article explains how to set up an offline copy of Wikipedia that supports keyword search.

Before you start, you should read the great tutorial provided by Thanassis Tsiodras:

http://users.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html

Read it carefully to make sure you know what’s going on. Then read this supplement:

http://jsomers.net/blog/offline-wikipedia

Now let’s move on!

  1. Download official .xml.bz2 dumps:

    The address is:

    http://dumps.wikimedia.org/enwiki/latest/

    According to the official notice, enwiki-latest-pages-articles.xml.bz2 alone should suffice. Still, I’ve put together a list of the other dump files, named enwikidl.txt:

    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-externallinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-flaggedpages.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-flaggedrevs.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-image.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-imagelinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-interwiki.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-iwlinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-langlinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-md5sums.txt
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-oldimage.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page_props.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page_restrictions.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-logging.xml.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-protected_titles.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-redirect.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-site_stats.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-stub-articles.xml.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-templatelinks.sql.gz
    http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-user_groups.sql.gz
    

    If you need the other files, pass the list to wget:

        wget -c -i enwikidl.txt
    

    Note: some very large files are not included in the list.

    Don’t decompress the .xml.bz2 file! The decompressed file is never needed.
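
Since the list above already includes enwiki-latest-md5sums.txt, it is cheap to check the multi-gigabyte download before building on it. The following is only a sketch: it assumes the checksum file uses the usual md5sum layout (a hex digest, whitespace, a file name) and that the listed names may carry a dump date instead of "latest", which is why it matches on a file-name suffix. The helper names are made up for this example. A mismatch can also mean that the checksum file belongs to a different dump run than the file you downloaded.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5sums_text, suffix):
    """Return the digest of the first listed file whose name ends with suffix.

    Assumption: each line looks like "<hex digest>  <file name>".
    Returns None when no listed file matches.
    """
    for line in md5sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].endswith(suffix):
            return parts[0].lower()
    return None


if __name__ == "__main__":
    dump = Path("enwiki-latest-pages-articles.xml.bz2")
    sums = Path("enwiki-latest-md5sums.txt")
    if dump.exists() and sums.exists():
        want = expected_md5(sums.read_text(), "-pages-articles.xml.bz2")
        print("OK" if want == md5_of_file(dump) else "MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for a file of this size.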

  2. Install Xapian and Django. On Arch Linux, run:

    pacman -S xapian-core django
    

    Other distros should offer similar packages. You can also compile both from source.
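
Before moving on, it can save time to confirm that the tools the later steps rely on are actually installed. This is a rough sketch, not part of the original package: the helper names are invented, and the list of commands (make for the build, php for rendering, wget for downloads) is my assumption based on the steps in this article, so adjust it to your system.

```python
import importlib.util
import shutil


def missing_commands(names):
    """Return the commands from names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level Python modules from names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed lists; edit them to match what your setup really needs.
    print("missing commands:", missing_commands(["make", "php", "wget"]))
    print("missing modules:", missing_modules(["django"]))
```

Two empty lists mean everything in the assumed set was found.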

  3. Download this package:

    http://users.softlab.ntua.gr/~ttsiod/offline.wikipedia.tar.bz2

    Decompress it and enter the resulting folder. The svn server referenced in the Makefile no longer works, so download the required file manually from:

    http://users.softlab.ntua.gr/~ttsiod/mediawiki_sa.tar.7z

    Put it in the offline.wikipedia folder and decompress it there. You should end up with the folder structure offline.wikipedia/mediawiki_sa/.

    Then edit the Makefile by hand and remove this line:

    @svn co http://fslab.de/svn/wpofflineclient/trunk/mediawiki_sa/ mediawiki_sa || echo Failed to get from svn...
    
  4. Move the .xml.bz2 file downloaded in Step 1 to offline.wikipedia/wiki-splits/. Then change the first line of the Makefile to:

    XMLBZ2 = enwiki-latest-pages-articles.xml.bz2
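
If you rebuild often, the two Makefile edits from Steps 3 and 4 can be scripted. The sketch below works on the Makefile text as a string. The function name is hypothetical, and the sample Makefile in the demo is a made-up stand-in (including the target name), not the real file from the package.

```python
import re


def patch_makefile(text, dump_name):
    """Drop the dead 'svn co' line and point XMLBZ2 at dump_name."""
    kept = [line for line in text.splitlines() if "svn co " not in line]
    patched = "\n".join(kept) + "\n"
    # Replace the whole right-hand side of the first XMLBZ2 assignment.
    # A function is used as the replacement so dump_name is taken literally.
    return re.sub(
        r"^XMLBZ2\s*=.*$",
        lambda _match: "XMLBZ2 = " + dump_name,
        patched,
        count=1,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    # Made-up stand-in for the real Makefile, for illustration only.
    sample = (
        "XMLBZ2 = some-old-dump.xml.bz2\n"
        "\n"
        "mediawiki_sa:\n"
        "\t@svn co http://fslab.de/svn/wpofflineclient/trunk/mediawiki_sa/ "
        "mediawiki_sa || echo Failed to get from svn...\n"
    )
    print(patch_makefile(sample, "enwiki-latest-pages-articles.xml.bz2"))
```

To apply it for real, read the Makefile, keep a backup copy, and write the returned text back.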
    
  5. Run ‘make’ and wait 5 to 6 hours. Have a cup of coffee.
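
Because the build takes hours, a quick look at the dump before running make can catch an obviously wrong file. This sketch streams the first lines straight out of the compressed file, so nothing is unpacked to disk. The function name and the path in the demo are my own choices. Note that it only shows the file starts as valid bzip2 data; it will not notice a download that was cut off later on.

```python
import bz2
import itertools
import os


def peek_dump(path, max_lines=5):
    """Return the first max_lines text lines of a .bz2 file, read as a stream."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, max_lines)]


if __name__ == "__main__":
    dump = "wiki-splits/enwiki-latest-pages-articles.xml.bz2"
    if os.path.exists(dump):
        for line in peek_dump(dump):
            print(line)
```

For a MediaWiki XML dump, the output should begin with the opening mediawiki element. A file that is not bzip2 at all raises an error instead.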

  6. Enter the folder offline.wikipedia/mywiki. Run

    python manage.py runserver
    

    Then open http://localhost:8000/ in your browser. If everything is fine, you should be able to use it. I ran into some problems: the browser reported a parser error, and the console showed that PHP could not open /var/tmp/result. If this happens, edit /etc/php/php.ini and append /var/tmp/ to open_basedir, like this:

    open_basedir = /foldera/:/folderb/:/folderc/:/var/tmp/
    

    After this, one PHP Notice remains: ‘Only variable references should be returned by reference’. To fix it, edit offline.wikipedia/mediawiki_sa/includes/DatabaseFunctions.php and change lines 53 to 55 to:

    $ret = null;
    //$ret =& $wgLoadBalancer->getConnection( $db, true, $groups );
    return $ret;
    

    The cross-referenced MediaWiki source may help you here:

    http://phpxref.com/xref/mediawiki/index.html
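
The open_basedir change in this step can also be expressed as a small text transformation, which is handy if you manage php.ini from a script. This is a sketch under a few assumptions: the function name is invented, the file already contains an uncommented open_basedir line, and directories are separated by colons as in the example above. Commented-out lines (starting with a semicolon) are left untouched.

```python
def add_open_basedir(ini_text, new_dir):
    """Append new_dir to an existing, uncommented open_basedir line."""
    out = []
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "open_basedir":
            dirs = [d for d in value.strip().split(":") if d]
            if new_dir not in dirs:  # keeps the edit idempotent
                dirs.append(new_dir)
            line = "open_basedir = " + ":".join(dirs)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "open_basedir = /foldera/:/folderb/:/folderc/\n"
    print(add_open_basedir(sample, "/var/tmp/"), end="")
```

Running it twice on the same text leaves a single /var/tmp/ entry, so it is safe to re-run.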

  7. To make the pages more pleasant to read, add the following lines beneath </form> on line 97 of offline.wikipedia/mywiki/show.pl:

    <style type="text/css">
        body {
            font-family: Verdana;
            font-size: 12.23px;
            line-height: 1.5em;
        }
        a {
            color: #1166bb;
            text-decoration: none;
        }
        a:hover {
            border-bottom: 1px solid #1166bb;
        }
        .reference {
            font-size: 7.12px;
        }
        .references {
            font-size: 10.12px;
        }
    </style>
    
  8. Download images (I have not done this yet):

    Use Wikix:

    http://meta.wikimedia.org/wiki/Wikix