Hi,
I am planning to study differences in users' editing styles and their
spelling errors in the English Wikipedia as part of a research project I am
involved in.
So I downloaded some of the Wikipedia partial XML dumps and converted them to
SQL. My understanding is that Wikipedia stores every revision of each page in
the database.
- I cannot see the user table! Is the user table stored in a separate
partial dump?
- Does the user table contain any properties related to the user's country,
preferred Wikipedias, or their skill in different languages?
- I am interested in user edits that add text to articles, as opposed to
edits that modify or delete existing text. My current plan is to diff
consecutive revisions to get this data. Are you aware of any tool or effort
that could help?
- Are you aware of any tools that extract plain text from Wikipedia markup?
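
For the diff step, here is a rough sketch of what I have in mind. It assumes
the two revision texts are already loaded as plain strings and uses Python's
standard difflib module. The function name added_text and the choice to
compare word by word are my own assumptions for illustration, not an
established tool or method.

```python
import difflib


def added_text(old_rev, new_rev):
    """Return runs of words that were inserted in new_rev relative to old_rev.

    Only pure insertions are kept. Runs that replace existing words, and
    deletions, are skipped on purpose.
    """
    old_words = old_rev.split()
    new_words = new_rev.split()
    # autojunk=False avoids difflib's heuristic that ignores very frequent
    # tokens, which could distort results on long articles.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    additions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            additions.append(" ".join(new_words[j1:j2]))
    return additions


if __name__ == "__main__":
    old = "The cat sat on the mat"
    new = "The cat sat quietly on the mat today"
    print(added_text(old, new))
```

This treats a changed word as a replacement rather than an addition, so it
would need refinement if partially rewritten sentences should also count.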
Regards.
--
Rami Al-Rfou'
PhD student at Stony Brook University