
Hacking Wikipedia?

March 28, 2005 | 14 Comments

This is more of a question than a post. I’ve been thinking about the Wikipedia quite a bit lately and I wanted to know if anyone out there knows of any APIs for hooking into or querying it. I realize that you can download the Wikipedia database, but I’m curious to know if it’s possible to run queries or extract data from it in some way.

Any ideas or examples would be greatly appreciated. If you know of any applications or sites that are using the Wikipedia in any way, I’d love to see those as well.

Filed under: News

Comments

1. Jeremy Flint said:

I have been using Trillian on my PC at work for instant messaging, and it will highlight words and show you definitions.

Starting with 3.0, those terms are pulled from Wikipedia.

Posted on March 28, 2005 11:37 AM | #

2. David Barrett said:

What exactly are you looking to do with the Wikipedia, Keith? Ultimately, you’re looking for something to fulfil certain needs, and before we can track down something that’s useful we’ll need to have a better idea of what you’re trying to do.

There is at least one API available for Perl, WWW::Wikipedia, but it is of very limited utility.

The Special:Export stuff (http://en.wikipedia.org/wiki/Special:Export) seems to be what you would want to build an API on top of. Take a look at the Special:Export output for the Train article.

It looks promising… there’s plenty of stuff available to use. But as the Wikipedia site only contains a reference to one Perl API, I suspect one hasn’t been developed yet to meet your needs (unless all you’re looking for is the ability to search the site).
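For what it's worth, here is a minimal Python sketch of reading a Special:Export response. It assumes the export XML wraps each article in a page element that holds a title and a revision with a text element. The helper names (local_name, parse_export) and the SAMPLE document are made up for illustration, and the code ignores the XML namespace rather than pinning a particular schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a Special:Export response.
SAMPLE = (
    '<mediawiki xmlns="http://example.org/export">'
    "<page><title>Train</title>"
    "<revision><text>A train runs on rails.</text></revision>"
    "</page></mediawiki>"
)


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_export(xml_text):
    """Return {title: wikitext} for every page element in the export XML."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, None
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title" and title is None:
                title = node.text
            elif name == "text":
                # If several revisions are present, the last one wins.
                text = node.text or ""
        if title is not None:
            pages[title] = text
    return pages


if __name__ == "__main__":
    print(parse_export(SAMPLE))
```

To try it on live data you would fetch something like http://en.wikipedia.org/wiki/Special:Export/Train yourself and pass the response body to parse_export. The text you get back is raw wiki markup, not HTML.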

Posted on March 28, 2005 12:16 PM | #

3. Keith said:

David – I wish I had a better answer for you, but really I’m not 100% sure what I’m trying to do–if anything really. I’m just curious.

I guess what I’m thinking about is a way to pull Wikipedia information into another Web application or site. Like how I can pull RSS or Amazon information into Asterisk.

Say, for example, I wanted to create an online bookstore and in addition to the information I pulled from Amazon, my own information, or whatever, I wanted to also pull info from the Wikipedia for the author, etc. and have that info displayed within my application.

Something like that…

Posted on March 28, 2005 12:40 PM | #

4. Thomas Baekdal said:

You could use the server-side variation of XMLHttpRequest (…or even client side) to retrieve pages, and then modify/strip the parts you need.

Retrieve the data, take out the innerHTML of the “content” div, then place it somewhere on your site.
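A rough server-side sketch of that idea in Python, using only the standard library's HTML parser. It assumes the article body sits in a div whose id is "content", as described above; the class and function names are invented for this example, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class ContentDivExtractor(HTMLParser):
    """Collects the inner HTML of the first <div> with the given id."""

    def __init__(self, target_id="content"):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # <div> nesting inside the target; 0 means outside
        self.done = False   # True once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            if tag == "div":
                self.depth += 1
            self.parts.append(self.get_starttag_text())
        elif (not self.done and tag == "div"
              and dict(attrs).get("id") == self.target_id):
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through as written.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append("&#{};".format(name))


def extract_content(page_html, target_id="content"):
    """Return the inner HTML of the target div, or '' if it is missing."""
    parser = ContentDivExtractor(target_id)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)
```

You would feed extract_content the HTML of a page you retrieved yourself and drop the result into your own template. This is plain screen scraping, so it breaks as soon as the page layout changes.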

Posted on March 28, 2005 12:50 PM | #

5. Chris Vincent said:

I’m not sure about an API or anything, but http://www.answers.com/ pulls information from Wikipedia when applicable. It’s actually a pretty brilliant idea; I wish I would have come up with it first.

Answers.com is also the new definitions site used by Google. I like it much better than dictionary.com.

Posted on March 28, 2005 01:29 PM | #

6. Keith said:

Chris – Ok, yeah, just like that! So, anyone know how they do that??

Posted on March 28, 2005 01:35 PM | #

7. David Barrett said:

Keith: Well, WWW::Wikipedia should be able to do that for you. The problem, of course, is making sure that the entry in the Wikipedia for that name is actually an entry for that author. It could refer to someone else entirely.

Still, it could be what you’re looking for.

Posted on March 28, 2005 01:38 PM | #

8. Simon Willison said:

I’ve used Special:Export and it works pretty well. You have to be very careful with automated queries of Wikipedia though, as they already have pretty bad performance and scaling problems and you wouldn’t want to do anything that impacted the performance of the site.

Posted on March 28, 2005 01:41 PM | #

9. Daniel said:

I’ve been thinking about something like this, too, given how many times in the past few weeks I’ve referred directly to Wikipedia material. My idea was that I could, as an SOP, earmark terms that I thought might need further explanation I wasn’t intending to provide, and have the appropriate Wikipedia article linked. I even considered an experiment wherein every single word of a post was linked to Wikipedia, which may or may not be useful or engaging, but it might be interesting nonetheless.

However, being of meager technical skill, and with too little time to write let alone explore this sort of thing, I left it to die like so many other pipe dreams. Wouldn’t it be nifty if there were some plugin developed with some simple markup syntax that would take something like “wp:Godel:wp” and turn it into a link to the Kurt Godel entry?
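Something along these lines might do it. This is only a sketch in Python, assuming the markup is exactly wp:Term:wp and that linking to en.wikipedia.org/wiki/Term is good enough; the function name wikipedize and the base_url parameter are made up here, and it relies on Wikipedia itself to redirect a short term like "Godel" to the full article, which may not always happen.

```python
import html
import re
from urllib.parse import quote

# Matches wp:Some Term:wp and captures the term (shortest match first).
WP_MARKUP = re.compile(r"wp:(.+?):wp")


def wikipedize(text, base_url="http://en.wikipedia.org/wiki/"):
    """Replace every wp:Term:wp marker in text with an HTML link."""

    def to_link(match):
        term = match.group(1).strip()
        # Wikipedia article URLs use underscores in place of spaces.
        title = quote(term.replace(" ", "_"))
        return '<a href="{}{}">{}</a>'.format(base_url, title, html.escape(term))

    return WP_MARKUP.sub(to_link, text)


if __name__ == "__main__":
    print(wikipedize("Incompleteness is due to wp:Kurt Godel:wp."))
```

A blogging tool could run each post through a function like this just before publishing, so the shorthand never shows up on the live page.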

Good luck.

Posted on March 28, 2005 08:15 PM | #

10. dan hartung said:

Well, you probably want to start with Wikipedia:Tools, although it chiefly lists browser-side thingamabobs for the ease of editing and tracking. There’s an apparently moribund Wikipedia API for Python project. For Firefox, it would be trivial to write a greasemonkey script to mimic Google’s AutoLink (or M$ Smart Tags), as Mark Pilgrim did with his “Butler” script, only directing to Wikipedia. Getting correct results would be tricky – I wrote in my blog about a site, upto11, that pulls in band articles, except there are often errors – such as Manitoba (Caribou) displaying the article for the province!

Posted on March 28, 2005 09:26 PM | #

11. Keith said:

Daniel – This stuff went quickly over my head also! But at least I know it can be done…

dan – That Upto11 site is very close to what I was thinking as far as Wikipedia use. It was the great entries on music, movies, and books that got me thinking…and I can’t help but think that the Wikipedia could be doing much more…combine it with something like Ourmedia and, well….

Anyway, Upto11 has an interesting take on a similar idea I was tossing about the other day. Not the same, but close. For one thing it seems almost too automated…I like my recommendations with more of a personal touch. But the use of Wikipedia is almost exactly what I was thinking. I can see how it’d need a good editor (or three) to avoid errors like the one you mention.

I wonder if they’d tell me how they did it? ;0)

Posted on March 28, 2005 11:32 PM | #

12. Anil said:

Gina’s made some great progress with her WikipedizeText web service.

Posted on March 29, 2005 01:33 AM | #

13. Jonathan Fenocchi said:

Answers.com queries the Wikipedia site, or so it seems. I’m working on a JavaScript artificial intelligence robot which will basically respond to any text you put into a box, and I had planned on using Ajax to retrieve related information on specific topics to make the robot smarter. Unfortunately, neither Answers.com nor Wikipedia offers an XML query service, so I emailed them and surprisingly got a response. As far as I know, however, they haven’t implemented such a thing. As a result, my JavaScript artificial intelligence robot has been consolidating in the empty virtual reality that is cyberspace.

Posted on March 29, 2005 05:03 PM | #

14. Gina said:

I love WikiWax typeahead search, like Google Suggest for Wikipedia.

Posted on March 31, 2005 06:51 AM | #

You are reading Hacking Wikipedia? posted on March 28, 2005 and filed under News.

About the Author

Keith is a Web designer and developer in Seattle, Washington.

