Wikipedia’s XML Dump, MySQL and PHP


As a corpus for an as-yet-unnamed project I’m working on, I’ve been struggling with the unwieldy Wikipedia XML dump.

It’s 1.4 GB of pure XML wiki content, and a huge pain to import, since SQL dumps are no longer released directly. I had to install the database structure for MediaWiki (the software that Wikipedia runs on; in the source code it’s in maintenance/tables.sql), then run a Java program called mwdumper to create an enormous SQL file. None of that took very long; what’s taking a while now is actually importing that SQL file.
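For a sense of why the dump is so awkward to handle, here is a rough sketch of reading it without loading the whole file into memory. This is not what mwdumper does internally, just a minimal illustration in Python using the standard library's streaming parser. The function names, the sample data, and the namespace URL in the sample are my own stand-ins, and the element names (page, title, text) are assumed from the general shape of the export format.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> element, one page at a time.

    `source` is a filename or a binary file object. If a page holds
    several revisions, only the text of the last one is kept.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Drop the page's children so finished pages do not pile up.
            elem.clear()


# Tiny stand-in for the real dump (hypothetical data and namespace).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision><text>Hello wiki</text></revision>
  </page>
</mediawiki>"""

for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, len(page_text))
```

Pointing iter_pages at the real dump file instead of the sample should work the same way, though at this size it will still take a while to get through.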



COMMENTS / 4 COMMENTS

[...] They provide a handy piece of software called MWDumper–which doesn’t work (on WinXP–apparently it works fine on *nix). [...]

Netherbound » Blog Archive » a ge(r)m of an idea… added these pithy words on Jan 12 07 at 7:07 am

Do you have any old dumps available that I could download? The “official” ones aren’t working and nobody even replies to my many tries… Please?

Thanks,

Vasco

Vasco added these pithy words on May 07 07 at 11:33 am

I tried MWDumper on WinXP about a year ago and if I remember correctly it worked then.

Also, check here for English Wikipedia data dumps:

http://download.wikimedia.org/enwiki/

The 20070527 directory has a valid copy of Wikipedia.

Simon

Simon added these pithy words on Jul 02 07 at 10:47 pm

I downloaded the 5.04G “full” dump and used their xml2sql program to turn the XML file into a SQL dump. So far the file is 13.9G (for just one of the files being created). How large were your SQL dumps when they were completed and which file did you originally download from Wikipedia?

James added these pithy words on Nov 28 07 at 12:49 pm
