Wikipedia’s XML Dump, MySQL and PHP

Posted on Aug 7th, 2006

For a corpus to use in an as-yet-unnamed project I’m working on, I’ve been struggling with the unwieldy Wikipedia XML dump.

1.4 GB of pure XML wiki content. It’s a huge pain to import, however, since SQL dumps are no longer released directly. I had to install the database structure for MediaWiki (the software that Wikipedia runs on; in the source code it’s in maintenance/tables.sql), then run a Java program called mwdumper to create an enormous SQL file. None of that took very long; what’s taking a while now is actually importing that SQL file.
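If the goal is only to get article text out of the dump, a streaming parse is one alternative to the MySQL route described above. The following is a rough Python sketch of that idea, not the mwdumper workflow from this post. The function names and the tiny sample document are made up for illustration, and it assumes the dump follows the usual MediaWiki export layout of `<page>` elements containing a `<title>` and a revision `<text>`.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that namespaced XML tags carry."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) per <page> without loading the whole file.

    If a page holds several revisions, the text of the last one wins.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


# Hypothetical two-page sample standing in for the real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Apple</title>
    <revision><text>A fruit.</text></revision>
  </page>
  <page>
    <title>Pear</title>
    <revision><text>Another fruit.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, len(page_text))
```

For a real dump, the same function should accept an open file object in place of the `io.BytesIO` sample. A compressed download could be opened with Python's `bz2.open(path, "rb")`, where `path` is wherever the file was saved.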


View Comments to “Wikipedia’s XML Dump, MySQL and PHP”

  1. Netherbound » Blog Archive » a ge(r)m of an idea… Says:

    [...] They provide a handy piece of software called MWDumper–which doesn’t work (on WinXP–apparently it works fine on *nix). [...]

  2. Vasco Says:

    Do you have any old dumps available that I could download? The “official” ones aren’t working and nobody even replies to my many tries… Please?

    Thanks,

    Vasco

  3. Simon Says:

    I tried MWDumper on WinXP about a year ago and if I remember correctly it worked then.

    Also, check here for English Wikipedia data dumps:

    http://download.wikimedia.org/enwiki/

    The 20070527 directory has a valid copy of wikipedia.

    Simon

  4. James Says:

    I downloaded the 5.04G “full” dump and used their xml2sql program to turn the XML file into a SQL dump. So far the file is 13.9G (for just one of the files being created). How large were your SQL dumps when they were completed and which file did you originally download from Wikipedia?

Copyright © 2010 by Dan Zarrella, social media marketing and viral marketing consultant. All rights reserved.
