Scala Spark

The document discusses how functional abstractions from previous Scala courses can be mapped to computations over massive datasets on multiple machines. It notes that functional frameworks like Spark make scaling computations easier than imperative systems. Learners will analyze large data tasks like K-means functionally and see how they can be implemented in Spark. While many data science courses use languages like R, Python, Octave and MATLAB for small datasets, these don't allow scaling to large datasets without reimplementing algorithms from scratch in systems like Hadoop or Spark.

Uploaded by

Mastan

how to map some of the functional abstractions that you've learned in previous Scala courses to computations on multiple machines over massive data sets.

That is, we will see first-hand how the functional abstractions that we've covered in the previous Scala
courses make it easier and more user-friendly to scale computations over large clusters. Easier, that is,
than scaling computations on imperative frameworks, imperative systems for distributed computation.
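To make the idea of mapping functional abstractions onto distributed computation a little more concrete, here is a minimal sketch in plain Scala, with no Spark dependency. It counts words in an ordinary `List` using `flatMap`, `map` and a grouped reduction. The object name `WordCountSketch` and the sample lines are made up for illustration and are not from the course. Spark's RDD API offers similarly named operations (`flatMap`, `map`, `reduceByKey`), so a pipeline of this shape would carry over to a cluster with few changes, though the exact Spark code is not shown here.

```scala
// Plain Scala collections only; this is an illustrative sketch, not Spark code.
object WordCountSketch {
  // Hypothetical sample input standing in for a much larger data set.
  val lines: List[String] = List("spark scales scala", "scala is functional")

  // Split each line into words, pair each word with 1,
  // then group by word and sum the counts per word.
  def wordCounts(data: List[String]): Map[String, Int] =
    data
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

Calling `WordCountSketch.wordCounts(WordCountSketch.lines)` returns a map in which `"scala"` has count 2 and every other word has count 1.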

We're always going to focus on analyzing large data sets. That is, you'll be challenged to think about
common data science tasks like K-means functionally, such that they can be adapted to and
implemented in the context of Spark, a functionally oriented framework for large-scale data processing
that's implemented in Scala.
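As a rough illustration of what thinking about K-means functionally could look like, the sketch below expresses one iteration of the algorithm as a pure function over immutable collections: assign each point to its closest mean, then recompute each mean as the average of its cluster. It is plain Scala on 2D points, not the course's implementation and not Spark code. The names `KMeansSketch`, `Point`, `closest`, `average` and `step` are assumptions chosen for this example.

```scala
// One K-means iteration written with plain Scala collections (illustrative sketch).
object KMeansSketch {
  // Assumption: points are 2D and represented as pairs of doubles.
  type Point = (Double, Double)

  // Squared Euclidean distance; enough for comparing which mean is closer.
  def distSq(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // The mean closest to point p.
  def closest(p: Point, means: Seq[Point]): Point =
    means.minBy(m => distSq(p, m))

  // Component-wise average of a non-empty group of points.
  def average(ps: Seq[Point]): Point =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  // Group points by closest mean, then replace each mean with its
  // cluster's average; a mean with no assigned points is kept unchanged.
  def step(points: Seq[Point], means: Seq[Point]): Seq[Point] = {
    val clusters = points.groupBy(p => closest(p, means))
    means.map(m => clusters.get(m).map(average).getOrElse(m))
  }
}
```

Because `step` has no side effects, repeating it until the means stop moving is a matter of calling it again on its own result. The grouping and averaging are the parts that a framework like Spark would distribute across machines.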

You might be asking: well, if we're going to be focusing on a lightweight data science flavor of the
processing tasks, then why are we bothering with Scala, and why are we bothering with Spark? After all,
if you want to learn data science in the classroom, it's often in a statistics professor's favorite
languages or frameworks, like R or Python or Octave and/or MATLAB.

So then why should one bother running Scala or Spark, which are both arguably very unlike R, Python,
Octave and MATLAB? The answer is that those languages and frameworks are good for data science in
the small: algorithms on data sets that are perhaps just a few hundred megabytes or even a few
gigabytes in size. However, once the data set becomes too large to fit into main memory on one
computer, it suddenly becomes much more difficult to use one of these languages or frameworks alone.

If your small data set grows into a much larger data set, then languages and frameworks like R, Python,
MATLAB, etc. won't allow you to scale. You'll need to start completely from scratch, reimplementing all
of your algorithms using a system like Hadoop or Spark anyway. Or you'll need to manually figure out
how to distribute your problem over many machines without the help of such a framework, which is
kind of a bad idea if you're not already an expert in building distributed systems.

There's also this whole huge, massive industry shift towards data-oriented decision making. Nowadays,
many companies across many different industries have realized that by looking more closely at the data
they're collecting, from device logs to health or genetic data, they can innovate in ways that were
impossible before. For example, now we have all of these devices surrounding us, collecting information
and attempting to provide all kinds of insights to enrich our day-to-day lives.

Instead, imagine hundreds of thousands of users of some device, say a smartphone or some wearable
or something. And imagine that, as part of your job, you're responsible for providing some analysis or
insight behind all of the data that's collected.
