Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher
November 09, 2009
Presented at Melbourne Hack Weekend, 2009.
Transcript
Wiki[mp]edia data sources & the MediaWiki API
Brianna Laugher for #melhack, November 2009
...
Wikipedia
13M articles total
3M+ articles in English
240+ languages
Simple English!
{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}
stable.toolserver.org/geohack/
wiki.toolserver.org/view/GeoHack
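The {{coord}} call above stores the location as degrees, minutes, seconds and a hemisphere letter. A minimal sketch of turning that form into decimal degrees follows; the function names `dms_to_decimal` and `parse_coord` are illustrative, not part of MediaWiki or GeoHack, and only the degrees/minutes/seconds form shown on the slide is handled.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W into signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def parse_coord(template):
    """Parse a DMS-style {{coord|...}} call into (latitude, longitude).

    Assumption: the first eight parameters are lat d/m/s/hemisphere followed
    by lon d/m/s/hemisphere, as in the slide's example. Other {{coord}}
    forms (e.g. decimal degrees) are not handled here.
    """
    inner = template.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    parts = [p.strip() for p in inner.split("|")]
    if parts[0].lower() != "coord" or len(parts) < 9:
        raise ValueError("expected a degrees/minutes/seconds {{coord}} call")
    latitude = dms_to_decimal(*parts[1:5])
    longitude = dms_to_decimal(*parts[5:9])
    return latitude, longitude


EXAMPLE = "{{coord|37|48|49|S|144|57|47|E|type:city_region:AU-VIC|display=inline,title}}"
latitude, longitude = parse_coord(EXAMPLE)
print(round(latitude, 4), round(longitude, 4))
```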
{{Infobox Company
|name = Lonely Planet
|logo =
|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] (subsidiary of [[BBC Worldwide]])
|genre = [[Guide book|Travel guides]]
|foundation = 1972
|founder = Tony Wheeler<br /> Maureen Wheeler
|location_city = [[Footscray, Victoria]]
|location_country = [[Australia]]
|location =
|origins =
|key_people = Matt Goldberg <small>(Global [[CEO]])</small>
|area_served = Worldwide
|industry = [[Multi media]]
|products = Travel [[guidebook, digital applications, online travel community]]
|services =
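Infobox parameters like these can be pulled out with a small hand-written splitter. The sketch below is deliberately naive: it only tracks `[[...]]` and `{{...}}` nesting so that pipes inside links are not treated as field separators, and it ignores everything else in MediaWiki markup (comments, `<nowiki>`, parser functions). The function name `parse_template_fields` and the shortened sample are illustrative.

```python
def parse_template_fields(wikitext):
    """Split one template call into {field: value}.

    Pipes inside [[...]] or {{...}} are kept as part of the value.
    This is a rough approximation, not a full wikitext parser.
    """
    body = wikitext.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]

    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            # Top-level pipe: end of the current parameter.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Shortened version of the slide's infobox, closed with "}}" for the example.
SAMPLE = (
    "{{Infobox Company |name = Lonely Planet "
    "|type = [[United Kingdom|British]] [[Government-owned company|government-owned]] "
    "|foundation = 1972 |location_city = [[Footscray, Victoria]]}}"
)
fields = parse_template_fields(SAMPLE)
print(fields["name"], fields["foundation"])
```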
Wikimedia Commons
commons.wikimedia.org
Multilingual
5M+ files
“Self-created”, PD, Flickr
Predominantly photographs, but also diagrams, maps, flags
Wiktionary
5M+ entries
170+ languages
13 languages > 100K entries
French biggest at 1.5M (English second at 1.4M)
JavaScript Wiktionary lookup plugin for third parties:
http://bawolff.blogspot.com/2009/10/introducing-wiktionary-lookup-now-for.html
http://en.wiktionary.org/wiki/Wiktionary:Parsing
MediaWiki structure
Users
Logs
Pages, subpages, talk pages
Links, backlinks
Templates
Categories
MediaWiki markup
The only thing that completely understands it is MediaWiki :(
Database dumps
XML
download.wikimedia.org OR Amazon Public Data Sets
meta.wikimedia.org/wiki/Data_dumps
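Because the XML dumps are large, they are usually read as a stream rather than loaded whole. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` to yield one (title, wikitext) pair per page. The element names (`page`, `title`, `text`) and the namespace in the sample are assumptions about the export format and are not stated on the slide; `iter_pages` is an illustrative name.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. Elements are cleared
    after use so memory stays roughly constant on large dumps.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()


# Tiny hand-written sample; the namespace version is an assumption.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Lonely Planet</title>
    <revision><text>{{Infobox Company |name = Lonely Planet}}</text></revision>
  </page>
</mediawiki>"""

for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE_DUMP)):
    print(page_title, len(wikitext))
```

For a real dump, pass the path of the decompressed file (or a `bz2.open(...)` file object) instead of the in-memory sample.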
DBpedia
Community project extracting structured data from Wikipedia and making it available
Can download data sets or query them online
Ontology++
e.g. dbpedia.org/page/Lonely_Planet
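Querying DBpedia online is typically done with SPARQL. The sketch below only builds a query string and a request URL; it does not send anything. The endpoint `http://dbpedia.org/sparql`, the `format` parameter, and the `http://dbpedia.org/resource/...` URI pattern are assumptions that do not appear on the slide, and the helper names are illustrative.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint; not given on the slide.
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"


def describe_query(resource_name, limit=10):
    """Build a SPARQL query listing property/value pairs for one resource.

    Assumption: resources live under http://dbpedia.org/resource/<name>.
    """
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource_name}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )


def sparql_url(query):
    """Return a GET URL for the query, asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


query = describe_query("Lonely_Planet", limit=5)
print(query)
print(sparql_url(query))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`.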
MediaWiki API
mediawiki.org/wiki/API
en.wikipedia.org/w/api.php
Client libraries!
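Requests to `api.php` are ordinary HTTP requests with query parameters. The sketch below builds two such URLs: one asking for a page's current wikitext and one listing backlinks to a page. The `https` scheme and the helper names are assumptions for illustration; nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Host and path are from the slide; the https scheme is an assumption.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def api_url(**params):
    """Build an api.php URL, defaulting to JSON output."""
    query = {"format": "json"}
    query.update(params)
    return API_ENDPOINT + "?" + urlencode(query)


def wikitext_url(title):
    """URL that asks for the latest revision content of one page."""
    return api_url(action="query", prop="revisions", rvprop="content", titles=title)


def backlinks_url(title, limit=10):
    """URL that lists pages linking to `title`."""
    return api_url(action="query", list="backlinks", bltitle=title, bllimit=limit)


print(wikitext_url("Lonely Planet"))
print(backlinks_url("Lonely Planet", limit=5))
```

Fetching either URL with an HTTP client returns a JSON document; a descriptive `User-Agent` header is generally expected by Wikimedia sites.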
mwclient
Python library for accessing MediaWiki APIs
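A minimal sketch of reading a page with mwclient follows. It assumes a recent mwclient release, where `mwclient.Site(host)` creates a connection, `site.pages[title]` looks up a page, and `page.text()` returns its wikitext; older releases differed. The wrapper names `connect` and `fetch_wikitext` are illustrative, and mwclient is imported inside `connect` so that the rest of the block can be read without the library installed.

```python
def connect(host="en.wikipedia.org"):
    """Open a connection to a MediaWiki site.

    Requires the third-party mwclient package (pip install mwclient).
    """
    import mwclient  # imported lazily; only needed when actually connecting

    return mwclient.Site(host)


def fetch_wikitext(site, title):
    """Return the current wikitext of `title` from an mwclient-style site object."""
    return site.pages[title].text()
```

Typical use would be `site = connect()` followed by `fetch_wikitext(site, "Lonely Planet")`, which performs network requests against the chosen wiki.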
Toolserver
toolserver.org
Server for community-developed plugins, addons, extensions, stats and hacks (“tools”)
Tools often explicitly implement implicit editing community standards (“community API”)
TemplateTiger
toolserver.org/~kolossos/templatetiger/
For a few dozen Wikipedia languages, & Wikimedia Commons
Lets you query templates very much like SQL
identi.ca/pfctdayelise
Thanks!
Logos and screenshots may be copyright their respective owners
Slides are otherwise © Brianna Laugher