Dive into the archives.
- Wikipedia’s XML Dump, MySQL and PHP
For a corpus to use in an as-yet-unnamed project I’m working on, I’ve been struggling with the unwieldy Wikipedia XML dump.
1.4 GB of pure XML wiki content. It’s a huge pain to import, however, since SQL dumps are no longer released directly. I had to install the database structure for MediaWiki, the software that Wikipedia runs on (in the source […]
- PHP Part-of-Speech Tagger
I ported Jason Wiener’s Python POS tagger into a PHP part-of-speech tagger for use in some stuff I’m working on. It’s still rough around the edges, but check it out.




