Wikipedia’s XML Dump, MySQL and PHP

Aug 7th 2006




For a corpus to use in an as-yet-unnamed project I’m working on, I’ve been struggling with the unwieldy Wikipedia XML dump.

It’s 1.4 GB of pure XML wiki content, and a huge pain to import, since SQL dumps are no longer released directly. I had to install the database structure for MediaWiki (the software that Wikipedia runs on), which lives in maintenance/tables.sql in the source code, then run a Java program called mwdumper to create an enormous SQL file. None of that took very long; what’s taking a while now is actually importing that SQL file.
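As a rough sketch of those three steps (load the schema, convert with mwdumper, import the result), here is a small Python helper that only builds the shell commands as strings so you can review them before running anything. The file names, database name, user name, and the `build_import_steps` function itself are placeholders I made up for illustration, and the `--format=sql:1.5` flag is what I recall from mwdumper's usage notes, so check it against the version you have.

```python
import shlex


def build_import_steps(dump_path, database, user,
                       mediawiki_dir="mediawiki", jar="mwdumper.jar"):
    """Return the shell commands for the schema -> mwdumper -> import workflow.

    All arguments are hypothetical placeholders; nothing is executed here.
    """
    q = shlex.quote
    schema = f"{mediawiki_dir}/maintenance/tables.sql"
    return [
        # 1. Create MediaWiki's empty table structure in the target database.
        f"mysql -u {q(user)} -p {q(database)} < {q(schema)}",
        # 2. Convert the XML dump into one large SQL file with mwdumper
        #    (format flag is an assumption; verify it for your version).
        f"java -jar {q(jar)} --format=sql:1.5 {q(dump_path)} > pages.sql",
        # 3. Import the generated SQL file (the slow part).
        f"mysql -u {q(user)} -p {q(database)} < pages.sql",
    ]


if __name__ == "__main__":
    for step in build_import_steps("enwiki-pages.xml", "wikidb", "wikiuser"):
        print(step)
```

Printing the commands rather than running them keeps the sketch harmless; steps 2 and 3 could also be joined with a pipe to avoid writing the intermediate SQL file to disk.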


  • Vasco

    Do you have any old dumps available that I could download? The “official” ones aren’t working and nobody even replies to my many tries… Please?

    Thanks,

    Vasco

  • http://www.qldeye.com/ Simon

    I tried MWDumper on WinXP about a year ago and if I remember correctly it worked then.

    Also, check here for English Wikipedia data dumps:

    http://download.wikimedia.org/enwiki/

    The 20070527 directory has a valid copy of Wikipedia.

    Simon

  • http://www.semanticfocus.com/ James

    I downloaded the 5.04G “full” dump and used their xml2sql program to turn the XML file into a SQL dump. So far the file is 13.9G (for just one of the files being created). How large were your SQL dumps when they were completed and which file did you originally download from Wikipedia?