Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Wiki[mp]edia data sources & the MediaWiki API
Search
Brianna Laugher
November 09, 2009
Technology
0
210
Wiki[mp]edia data sources & the MediaWiki API
Presented at Melbourne Hack Weekend, 2009.
Brianna Laugher
November 09, 2009
Tweet
Share
More Decks by Brianna Laugher
See All by Brianna Laugher
Realities of open source testing: Lessons learned from Adopt Pytest Month
pfctdayelise
0
230
Crowd funded free software
pfctdayelise
0
180
Dynamic visualisation in the IPython Notebook
pfctdayelise
0
260
Funcargs and other fun with pytest
pfctdayelise
0
270
Zookeepr: home grown conference management software
pfctdayelise
0
180
Why "gender" should be a text field
pfctdayelise
0
250
Distributed wikis
pfctdayelise
0
190
Neurosexism
pfctdayelise
0
300
Clash of the encyclopedias: is competition good for sharing?
pfctdayelise
0
190
Other Decks in Technology
See All in Technology
学生・新卒・ジュニアから目指すSRE
hiroyaonoe
2
760
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
2
330
OWASP Top 10:2025 リリースと 少しの日本語化にまつわる裏話
okdt
PRO
3
840
モダンUIでフルサーバーレスなAIエージェントをAmplifyとCDKでサクッとデプロイしよう
minorun365
4
220
20260208_第66回 コンピュータビジョン勉強会
keiichiito1978
0
190
ブロックテーマでサイトをリニューアルした話 / 2026-01-31 Kansai WordPress Meetup
torounit
0
480
量子クラウドサービスの裏側 〜Deep Dive into OQTOPUS〜
oqtopus
0
150
Cosmos World Foundation Model Platform for Physical AI
takmin
0
970
20260204_Midosuji_Tech
takuyay0ne
1
160
Ruby版 JSXのRuxが気になる
sansantech
PRO
0
170
今こそ学びたいKubernetesネットワーク ~CNIが繋ぐNWとプラットフォームの「フラッと」な対話
logica0419
5
400
usermode linux without MMU - fosdem2026 kernel devroom
thehajime
0
240
Featured
See All Featured
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.8k
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
470
YesSQL, Process and Tooling at Scale
rocio
174
15k
Making the Leap to Tech Lead
cromwellryan
135
9.7k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
34k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
Testing 201, or: Great Expectations
jmmastey
46
8.1k
Tell your own story through comics
letsgokoyo
1
810
Color Theory Basics | Prateek | Gurzu
gurzu
0
200
Why Our Code Smells
bkeepers
PRO
340
58k
Amusing Abliteration
ianozsvald
0
100
Optimizing for Happiness
mojombo
379
71k
Transcript
Wiki[mp]edia data sources & the MediaWiki API Brianna Laugher for
#melhack November 2009
...
Wikipedia 13M articles total 3M+ articles in English 240+ languages
Simple English!
{{coord|37|48|49|S|144|57| 47|E|type:city_region:AU-VIC| display=inline,title}} stable.toolserver.org/geohack/ wiki.toolserver.org/view/GeoHack
{{Infobox Company |name = Lonely Planet |logo = |type =
[[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]]) |genre = [[Guide book|Travel guides]] |foundation = 1972 |founder = Tony Wheeler<br /> Maureen Wheeler |location_city = [[Footscray, Victoria]] |location_country = [[Australia]] |location = |origins = |key_people = Matt Goldberg <small>(Global [[CEO]])</small> |area_served = Worldwide |industry = [[Multi media]] |products = Travel [[guidebook, digital applications, online travel community]] |services =
Wikimedia Commons commons.wikimedia.org Multilingual 5M+ files “Self-created”, PD, Flickr Predominantly
photographs, but also diagrams, maps, flags
None
Wiktionary 5M+ entries 170+ languages 13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties: http://bawolff.blogspot.com/2009/10/introducing- wiktionary-lookup-now-for.html http://en.wiktionary.org/wiki/Wiktionary:Parsing
Users Logs Pages, subpages, talk pages
Links, backlinks Templates Categories MediaWiki structure
MediaWiki markup The only thing that completely understands it is
MediaWiki :(
XML download.wikimedia.org OR Amazon Public Data Sets meta.wikimedia.org/wiki/ Data_dumps Database
dumps
DBpedia Community project extracting structured data from Wikipedia and making
it available Can download data sets or query them online Ontology++ e.g. dbpedia.org/page/Lonely_Planet
MediaWiki API mediawiki.org/wiki/API en.wikipedia.org/w/api.php Client libraries!
mwclient Python library for accessing MediaWiki APIs
None
toolserver.org Server for community-developed plugins, addons, extensions, stats and hacks
– tools Tools often explicitly implements implicit editing community standards (“community API”) Toolserver
TemplateTiger toolserver.org/~kolossos/templatetiger/ For a few dozen Wikipedia languages, & Wikimedia
Commons Lets you query templates very much like SQL
identi.ca/pfctdayelise
[email protected]
Thanks! Logos and screenshots may be copyright their
respective owners Slides are otherwise © Brianna Laugher