Big Data Analytics
Matt Wood
August 01, 2012
An introduction to Big Data Analytics in the cloud.
Big Data Analytics with Amazon Web Services
Dr. Matt Wood
An Online Seminar for Partners. Wednesday 1st August.
Hello, and thank you.
Big Data Analytics
An introduction
The story of analytics on AWS
Integrating partners
Partner success stories
INTRODUCING BIG DATA 1
Data for competitive advantage.
Using data for customer segmentation, financial modeling, system analysis, line-of-sight and business intelligence.
The data timeline: Generation, Collection & storage, Analytics & computation, Collaboration & sharing.
Cost of data generation is falling.
Generation: lower cost, increased throughput.
Collection & storage, Analytics & computation, Collaboration & sharing: highly constrained.
Very high barrier to turning data into information.
Move from a data generation challenge to an analytics challenge.
Enter the Cloud.
Remove the constraints.
Enable data-driven innovation.
Move to a distributed data approach.
Maturation of two things: software for distributed storage and analysis, and infrastructure for distributed storage and analysis.
Software: frameworks for data-intensive workloads. Distributed by design.
Infrastructure: a platform for data-intensive workloads. Distributed by design.
Support the data timeline.
The data timeline, no longer highly constrained: Generation, Collection & storage, Analytics & computation, Collaboration & sharing.
Lower the barrier to entry.
Accelerate time to market and increase agility.
Enable new business opportunities.
Washington Post, Pinterest, NASA
“AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly.”
Michael Miller, Pfizer
THE STORY OF ANALYTICS 2
EC2 Utility computing. 6 years young.
Embarrassingly parallel problems. Scale-out systems. Queue-based distribution. Small, medium and high scale.
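A minimal sketch of queue-based distribution for an embarrassingly parallel workload, assuming independent work items: a producer puts items on a shared queue and a pool of workers pulls from it, so scaling out means adding workers. On AWS the queue would typically be a hosted service such as Amazon SQS and the workers separate EC2 instances; here an in-process `queue.Queue` and threads stand in for both, and `process` is a hypothetical placeholder for the real unit of work.

```python
import queue
import threading


def process(item):
    # Hypothetical unit of work: each item is independent of every other item.
    return item * item


def run(items, num_workers=4):
    """Distribute independent items to a pool of workers via a shared queue."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            value = process(item)
            with lock:  # guard the shared results dict
                results[item] = value
            tasks.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker, queued after all real work
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(sorted(run(range(10)).items()))
```

Because no worker depends on another, the same code handles small, medium or high scale by changing `num_workers`; only the queue is shared.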
Cost optimization.
[Figure: Achieving economies of scale. Usage over time against a 100% line, split into reserved capacity, on-demand capacity and unused capacity.]
Spot Instances: bid on unused EC2 capacity. Very large discount. Perfect for batch runs. Balance cost and scale.
$650 per hour
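To make the bidding trade-off concrete, here is a small simulation over an hourly price series. It assumes a simplified pricing model: the instance runs in any hour where the market price is at or below the bid, and each running hour is charged the market price rather than the bid. Current Spot rules may differ. All prices are hypothetical, not real EC2 prices.

```python
from dataclasses import dataclass


@dataclass
class SpotOutcome:
    hours_run: int
    spot_cost: float
    on_demand_cost: float


def simulate_spot(hourly_prices, bid, on_demand_price):
    """Simulate one Spot instance over a series of hourly market prices.

    Assumptions: the instance runs in any hour where the market price is at
    or below the bid, and each running hour is charged the market price.
    The on-demand figure is the cost of the same hours at a fixed rate.
    """
    run_prices = [price for price in hourly_prices if price <= bid]
    hours = len(run_prices)
    return SpotOutcome(
        hours_run=hours,
        spot_cost=round(sum(run_prices), 4),
        on_demand_cost=round(hours * on_demand_price, 4),
    )


if __name__ == "__main__":
    # Hypothetical prices in dollars per instance-hour.
    prices = [0.27, 0.30, 0.45, 0.29, 0.80, 0.31]
    print(simulate_spot(prices, bid=0.35, on_demand_price=0.68))
```

The hours above the bid simply do not run, which is why Spot suits batch runs: a job that can pause and resume trades completion time for a lower price per hour.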
Map/reduce: a pattern for distributed computing, with software frameworks such as Hadoop. Write two functions. Scale up.
Complex cluster configuration and management.
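"Write two functions" refers to the map and reduce functions. The classic word-count example below is a single-process sketch: `mapper` and `reducer` are the two user-written functions, and `map_reduce` stands in for what a framework such as Hadoop does across a cluster (splitting input, grouping intermediate pairs by key, calling the reducer once per key). The function names are illustrative, not a Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def map_reduce(lines):
    # Map phase: in Hadoop, each mapper would process its own input split.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort phase: the framework groups intermediate pairs by key.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(map_reduce(["the cat", "The dog", "cat"]))
```

Neither function knows how many machines exist, which is what lets the same two functions scale up; the cost is the cluster configuration and management the framework needs underneath.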
Amazon Elastic MapReduce: managed Hadoop clusters. Easy to provision and monitor. Write two functions. Scale up. Optimized for S3 access.
[Figure: Elastic MapReduce under the hood. Input data in S3 and your code go to Elastic MapReduce; a name node manages an elastic cluster running HDFS; queries and BI via JDBC, Pig and Hive; output to S3 + SimpleDB.]
Performance
Compute performance
Cluster Compute: Intel Xeon E5-2670, 60.5 GB of memory, 10 Gigabit Ethernet non-blocking network, placement groups. Plus GPU-enabled instances.
IO performance
DynamoDB: NoSQL, unstructured data storage. Predictable, consistent performance. Unlimited storage. No schema for unstructured data. Single-digit millisecond latencies. Backed by solid state drives.
...and SSDs for all. New Hi1 storage instances.
hi1.4xlarge: 2 x 1 TB SSDs, 10 Gigabit Ethernet network. HVM: 90k IOPS read, 9k to 75k write. PV: 120k IOPS read, 10k to 85k write.
Netflix: “The hi1.4xlarge configuration is about half the system cost for the same throughput.” http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
EBS: Elastic Block Store
Provisioned IOPS: provision the required IO performance.
+ EBS-optimized instances with dedicated throughput.
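Provisioning IO performance becomes a sizing question when a workload needs more IOPS than one volume can be provisioned for. A common approach is to stripe several volumes (for example RAID 0) and spread the load across them. The sketch below assumes an idealised model in which provisioned IOPS add up linearly across striped volumes; the per-volume cap is a parameter, and the 1,000 figure in the example is illustrative only, so check current EBS limits before relying on it.

```python
import math


def stripe_plan(target_iops, max_iops_per_volume):
    """Return (volumes, iops_per_volume) for an evenly striped set of volumes.

    Assumes throughput adds up linearly across striped volumes (an idealised
    RAID 0 model) and that each volume can be provisioned up to
    max_iops_per_volume.
    """
    if target_iops <= 0 or max_iops_per_volume <= 0:
        raise ValueError("target and per-volume limit must be positive")
    volumes = math.ceil(target_iops / max_iops_per_volume)
    # Spread the load evenly, rounding up so the stripe still meets the target.
    per_volume = math.ceil(target_iops / volumes)
    return volumes, per_volume


if __name__ == "__main__":
    # Hypothetical figures: 2,500 IOPS needed, 1,000 IOPS cap per volume.
    print(stripe_plan(2500, 1000))
```

Spreading evenly (834 IOPS on each of 3 volumes rather than 1,000 + 1,000 + 500) keeps every member of the stripe equally loaded, which matters because a striped read or write is only as fast as its slowest volume.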
Performance + ease of use across the data timeline: Generation, Collection & storage, Analytics & computation, Collaboration & sharing.
PARTNER INTEGRATION 3
Extend platform with partners
Innovate on behalf of customers
Remove undifferentiated heavy lifting
MapR distribution for EMR: rolled the Amazon Hadoop optimizations into MapR. Choice for EMR customers. Easy deployment for MapR customers.
Hadoop distribution: MapR distribution for EMR, integrated into EMR. NFS and ODBC drivers. High availability and cluster mirroring.
Enterprise data toolchain: Informatica on EMR. A “Swiss army knife” for data formats. Data integration. Available to all on EMR.
AWS Marketplace: Karmasphere, Marketshare, Acunu Cassandra, Metamarkets, Aspera and more.
aws.amazon.com/marketplace
PARTNER SUCCESS STORIES 4
Razorfish
3.5 billion records, 71 million unique cookies, 1.7 million targeted ads per day.
500% improvement in return on ad spend.
Cycle Computing + Schrödinger
30k cores, $4200 an hour (compared to $10+ million)
Marketshare + Ticketmaster: optimize live event pricing.
Reduced developer infrastructure management time by 3 hours a day
Thank you!
Q & A
[email protected]
@mza on Twitter