Wikipedia’s XML Dump, MySQL and PHP

For a corpa to use for an as-yet-unnamed project I’m working on, I’ve been struggling with the unwieldy wikipedia XML dump.

1.4gb of pure XML wikicontent. A huge pain to import however, since SQL dumps are no longer directly released. I had to install mediawiki’s (the software that wikipedia runs on) database structure (in the source code its in maintentance/tables.sql), then run a java program called mwdumper to create an enourmous SQL file. All of that didn’t take very long, what’s taking a while now is actually importing that SQL file.

Read More