Offline Wikipedia
This article explains how to set up an offline copy of Wikipedia that supports keyword search.
Before you start, you should read the great tutorial provided by Thanassis Tsiodras:
http://users.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html
Read it carefully to make sure you know what’s going on. Then read this supplement:
http://jsomers.net/blog/offline-wikipedia
Now let’s move on!
-
Download official .xml.bz2 dumps:
The address is:
http://dumps.wikimedia.org/enwiki/latest/
According to the official notice, downloading enwiki-latest-pages-articles.xml.bz2 alone should suffice. But I’ve put together a list named enwikidl.txt:
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-externallinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-flaggedpages.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-flaggedrevs.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-image.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-imagelinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-interwiki.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-iwlinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-langlinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-md5sums.txt
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-oldimage.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page_props.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page_restrictions.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-logging.xml.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-protected_titles.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-redirect.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-site_stats.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-stub-articles.xml.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-templatelinks.sql.gz
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-user_groups.sql.gz
In case you need the other files, the list can be used with wget as follows:
wget -c -i enwikidl.txt
Note: some very large files are not included in the list.
Don’t decompress the .xml.bz2 dump! We won’t need the decompressed file.
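Since the dump is several gigabytes, it may be worth checking the download before spending hours on the build. The list above includes enwiki-latest-md5sums.txt, so here is a minimal Python sketch of such a check. It assumes the checksum file uses the usual md5sum layout (a hex digest, whitespace, then a file name); the function names and the listed_name parameter are my own and not part of any Wikimedia tool. The checksum file may list dated file names rather than the "latest" ones, which is why the name to look up can be passed explicitly.

```python
import hashlib
from pathlib import Path


def parse_md5sums(text):
    """Map file name -> expected MD5 hex digest (assumed md5sum-style lines)."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary-mode entries with a leading '*'
            expected[name.lstrip("*")] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so the whole dump never sits in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(dump_path, md5sums_path, listed_name=None):
    """Return True if the dump's digest matches its entry in the checksum file.

    listed_name is the name to look up in the checksum file; it defaults to
    the dump's own file name (an assumption that may not hold for 'latest').
    """
    expected = parse_md5sums(Path(md5sums_path).read_text())
    name = listed_name or Path(dump_path).name
    if name not in expected:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(dump_path) == expected[name]
```

Calling verify("enwiki-latest-pages-articles.xml.bz2", "enwiki-latest-md5sums.txt") returns True or False, or raises KeyError if the name is not listed, in which case look inside the checksum file and pass the matching entry as listed_name.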
-
Install Xapian and Django. If you use Arch Linux, type
pacman -S xapian-core django
The procedure on other distros should be similar. You can also compile both from source.
-
Download this package:
http://users.softlab.ntua.gr/~ttsiod/offline.wikipedia.tar.bz2
Decompress it and enter the decompressed folder. The svn server referenced in the Makefile no longer works, so we need to download the needed file manually from:
http://users.softlab.ntua.gr/~ttsiod/mediawiki_sa.tar.7z
Put it in the folder offline.wikipedia, then decompress it, so that you have the folder structure offline.wikipedia/mediawiki_sa/. Then change the Makefile manually. Specifically, remove the line:
@svn co http://fslab.de/svn/wpofflineclient/trunk/mediawiki_sa/ mediawiki_sa || echo Failed to get from svn...
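If you would rather not edit the Makefile by hand, the removal can be scripted. This is only a sketch: the function names drop_svn_checkout and patch_makefile are my own, and it assumes the line quoted above is the only one in the Makefile that runs `svn co`.

```python
from pathlib import Path


def drop_svn_checkout(makefile_text):
    """Return the Makefile text without any line that runs `svn co`."""
    kept = [
        line
        for line in makefile_text.splitlines(keepends=True)
        if "svn co " not in line
    ]
    return "".join(kept)


def patch_makefile(path="Makefile"):
    """Rewrite the Makefile in place; keep a backup copy before trying this."""
    makefile = Path(path)
    makefile.write_text(drop_svn_checkout(makefile.read_text()))
```

Run patch_makefile() from inside the offline.wikipedia folder. All other lines, including the tab-indented recipe lines that make requires, are left untouched.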
-
Move the .xml.bz2 downloaded in Step 1 to
offline.wikipedia/wiki-splits/
Then change the first line of the Makefile to:
XMLBZ2 = enwiki-latest-pages-articles.xml.bz2
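The same edit can be sketched in Python. The helper name set_dump_name is hypothetical; it relies on what is stated above, namely that the first line of the Makefile is the XMLBZ2 assignment, and it refuses to touch the text if that is not the case.

```python
def set_dump_name(makefile_text,
                  dump_name="enwiki-latest-pages-articles.xml.bz2"):
    """Replace the first Makefile line with an XMLBZ2 assignment."""
    lines = makefile_text.splitlines(keepends=True)
    # Guard: only proceed if the first line really defines XMLBZ2.
    if not lines or not lines[0].lstrip().startswith("XMLBZ2"):
        raise ValueError("first line does not define XMLBZ2; edit it by hand")
    lines[0] = f"XMLBZ2 = {dump_name}\n"
    return "".join(lines)
```

Read the Makefile into a string, pass it through set_dump_name, and write the result back. A ValueError means the file does not look the way this article describes, so it is safer to check it yourself.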
-
Type ‘make’ and wait for 5 to 6 hours. Have a cup of coffee.
-
Enter the folder offline.wikipedia/mywiki. Run
python manage.py runserver
Then open your browser at http://localhost:8000/. If everything is fine, you should be able to use it. I ran into some problems: the browser reported a parser error, and the console showed that PHP was not able to open /var/tmp/result. If this happens, edit /etc/php/php.ini and append /var/tmp/ to open_basedir, like this:
open_basedir = /foldera/:/folderb/:/folderc/:/var/tmp/
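For illustration, here is how the open_basedir change could be applied to the text of php.ini with Python. It is a sketch under a few assumptions: the directive is active (not commented out with a leading semicolon), the value is an unquoted colon-separated list as in the example above, and the function name add_to_open_basedir is my own.

```python
import re


def add_to_open_basedir(ini_text, new_dir="/var/tmp/"):
    """Append new_dir to every active open_basedir line that lacks it."""
    # [ \t]* instead of \s* so the match never spills onto neighbouring lines.
    pattern = re.compile(
        r"^([ \t]*open_basedir[ \t]*=[ \t]*)(.*?)[ \t]*$", re.MULTILINE
    )

    def extend(match):
        dirs = [d for d in match.group(2).split(":") if d]
        if new_dir not in dirs:
            dirs.append(new_dir)
        return match.group(1) + ":".join(dirs)

    new_text, count = pattern.subn(extend, ini_text)
    if count == 0:
        raise ValueError("no active open_basedir line found")
    return new_text
```

The function is safe to run twice, because a directory that is already listed is not added again. Writing the result back to /etc/php/php.ini needs root privileges, and the PHP setup may need a restart before the change takes effect.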
After this, there is still a PHP Notice: ‘Only variable references should be returned by reference’. This can be fixed by editing offline.wikipedia/mediawiki_sa/includes/DatabaseFunctions.php. Change lines 53 to 55 to:
$ret = null;
//$ret =& $wgLoadBalancer->getConnection( $db, true, $groups );
return $ret;
The source of MediaWiki may help you:
-
To make the pages look more pleasant, add the following lines beneath </form> at line 97 of offline.wikipedia/mywiki/show.pl:
<style type="text/css">
body {
  font-family: Verdana;
  font-size: 12.23px;
  line-height: 1.5em;
}
a {
  color: #1166bb;
  text-decoration: none;
}
a:hover {
  border-bottom: 1px solid #1166bb;
}
.reference {
  font-size: 7.12px;
}
.references {
  font-size: 10.12px;
}
</style>
-
Download images (not done yet):
Use Wikix: