This article explains how to set up an offline Wikipedia that supports keyword search.

Before you start, you should read the great tutorial provided by Thanassis Tsiodras:

http://users.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html

Read it carefully to make sure you know what’s going on. Then read this supplement:

http://jsomers.net/blog/offline-wikipedia

Now let’s move on!

  1. Download official .xml.bz2 dumps:

The address is:

http://dumps.wikimedia.org/enwiki/latest/

According to the official notice, downloading enwiki-latest-pages-articles.xml.bz2 alone should suffice. But I’ve also put together a list of the other files, named enwikidl.txt:

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-externallinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-flaggedpages.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-flaggedrevs.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-image.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-imagelinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-interwiki.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-iwlinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-langlinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-md5sums.txt
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-oldimage.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page_props.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page_restrictions.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-logging.xml.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-protected_titles.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-redirect.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-site_stats.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-stub-articles.xml.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-templatelinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-user_groups.sql.gz

The list can be fed to wget as follows, in case you need the other files:

    wget -c -i enwikidl.txt
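Since the list includes enwiki-latest-md5sums.txt, it is worth checking the big dump before spending hours on it. Below is a minimal sketch in Python. The function names are mine, and it assumes each line of the md5sums file looks like `<hash>  <filename>` and that the filenames there may carry the dump date instead of "latest", so it compares names with that prefix stripped.

```python
import hashlib
import os
import re


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-gigabyte dump never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def strip_dump_prefix(name):
    """Drop 'enwiki-latest-' or 'enwiki-<8 digit date>-' so both spellings match."""
    return re.sub(r"^enwiki-(latest|\d{8})-", "", name)


def expected_md5(md5sums_text, filename):
    """Return the hash listed for filename, or None if it is not listed.

    Assumption: every useful line is '<hash>  <filename>'.
    """
    wanted = strip_dump_prefix(filename)
    for line in md5sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and strip_dump_prefix(parts[1]) == wanted:
            return parts[0]
    return None


def verify_dump(dump_path, md5sums_path):
    """True only if the dump is listed and its MD5 matches the listed hash."""
    with open(md5sums_path, encoding="utf-8") as handle:
        expected = expected_md5(handle.read(), os.path.basename(dump_path))
    return expected is not None and expected == md5_of_file(dump_path)
```

Calling `verify_dump("enwiki-latest-pages-articles.xml.bz2", "enwiki-latest-md5sums.txt")` from the download folder should return True for a complete download. If it returns False, rerun wget with -c to resume.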

Note. Some very large files are not included in the list.

Don’t decompress the .xml.bz2 dump! We won’t need the decompressed file!
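If you are curious what is inside the dump, you can still look without unpacking it to disk. Here is a small sketch using Python's standard bz2 module, which decompresses only as much as it reads. The helper name peek_dump is made up for this example.

```python
import bz2
from itertools import islice


def peek_dump(path, max_lines=5):
    """Return the first few lines of a .bz2 file without unpacking it to disk."""
    with bz2.open(path, mode="rt", encoding="utf-8") as stream:
        # islice stops after max_lines, so only the start of the file is read.
        return [line.rstrip("\n") for line in islice(stream, max_lines)]
```

For example, `peek_dump("enwiki-latest-pages-articles.xml.bz2", 20)` returns the first 20 lines of the XML as a list of strings.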

  1. Install Xapian and Django. On Arch Linux, run:

    pacman -S xapian-core django

Other distros should have similar packages. You can also compile both from source.

  1. Download this package:

http://users.softlab.ntua.gr/~ttsiod/offline.wikipedia.tar.bz2

Decompress it and enter the decompressed folder. The svn server referenced in the Makefile no longer works, so we need to download the required file manually from:

http://users.softlab.ntua.gr/~ttsiod/mediawiki_sa.tar.7z

Put it in the offline.wikipedia folder and decompress it there. You should end up with the folder structure offline.wikipedia/mediawiki_sa/

Then manually change the Makefile. Specifically, remove the line:

    @svn co http://fslab.de/svn/wpofflineclient/trunk/mediawiki_sa/ mediawiki_sa || echo Failed to get from svn...

  1. Move the .xml.bz2 downloaded in Step 1 to offline.wikipedia/wiki-splits/. Then change the first line of the Makefile to

    XMLBZ2 = enwiki-latest-pages-articles.xml.bz2
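If you would rather script the two Makefile changes (dropping the dead svn line and setting XMLBZ2), here is a rough sketch. The function patch_makefile is my own name. It assumes, as described above, that XMLBZ2 is assigned on the very first line, and it refuses to touch anything otherwise.

```python
import re

DUMP_NAME = "enwiki-latest-pages-articles.xml.bz2"


def patch_makefile(text, dump_name=DUMP_NAME):
    """Return Makefile text with XMLBZ2 set and the 'svn co' line removed."""
    lines = text.splitlines()
    # Assumption: the first line is the XMLBZ2 assignment. Bail out if not.
    if not lines or not re.match(r"\s*XMLBZ2\s*[:?]?=", lines[0]):
        raise ValueError("first line does not set XMLBZ2; edit the Makefile by hand")
    lines[0] = "XMLBZ2 = " + dump_name
    # Drop the checkout from the dead svn server. Tabs on other lines are kept.
    kept = [line for line in lines if "svn co " not in line]
    return "\n".join(kept) + "\n"
```

To use it, read the Makefile into a string, pass it through patch_makefile, and write the result back. Keep a backup copy of the original Makefile first.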

  2. Type ‘make’ and wait for 5 to 6 hours. Have a cup of coffee.

  3. Enter the folder offline.wikipedia/mywiki. Run

    python manage.py runserver

Then open your browser at http://localhost:8000/. If everything is fine, you should be able to use it.

I ran into some problems. The browser reported a parser error, and the console showed that PHP was unable to open /var/tmp/result. If this happens, edit /etc/php/php.ini and append /var/tmp/ to open_basedir, like this:

    open_basedir = /foldera/:/folderb/:/folderc/:/var/tmp/
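The same edit can be sketched in Python if you prefer not to touch php.ini by hand. The helper add_to_open_basedir is a made-up name. It assumes open_basedir is already set on an uncommented line and that the paths are separated by colons, as in the example above.

```python
def add_to_open_basedir(ini_text, extra="/var/tmp/"):
    """Return php.ini text with `extra` appended to open_basedir if missing."""
    out = []
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        # Commented lines start with ';', so their key never equals 'open_basedir'.
        if sep and key.strip() == "open_basedir":
            paths = [p for p in value.strip().split(":") if p]
            if extra not in paths:
                paths.append(extra)
            line = "open_basedir = " + ":".join(paths)
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it twice gives the same result as running it once, so it is safe to reapply. Writing the result back to /etc/php/php.ini needs root.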

With that fixed, there is still a PHP Notice: ‘Only variable references should be returned by reference’. This can be fixed by editing offline.wikipedia/mediawiki_sa/includes/DatabaseFunctions.php. Change lines 53 to 55 to:

        $ret = null;
        //$ret =& $wgLoadBalancer->getConnection( $db, true, $groups );
        return $ret;

The source of MediaWiki may help you:

http://phpxref.com/xref/mediawiki/index.html

  1. To make it more pleasant, add the following lines beneath </form> on line 97 of offline.wikipedia/mywiki/show.pl:

  2. Download images (not done yet):

Use Wikix:

http://meta.wikimedia.org/wiki/Wikix