Apache Spark & Scala Course Content
Integrations
What is SBT?
Integration of Scala in Eclipse IDE.
Integration of SBT with Eclipse.
Workers, Cluster Managers, Driver Programs, Executors, Tasks
ELANCERSOFTSOLUTIONS
H.NO: 46/B, I V Reddy Hospital, SR Nagar, Hyderabad-500038.
PH: 040-48540745, +91-9704249988 EMAIL: [email protected] www.online.elancersoft.com
Coding Spark jobs in Scala
Data Sources
Exploring the Spark shell and creating a SparkContext.
RDD Programming
Operations on RDD.
Lazy Operations
Caching
RDD Caching Methods, RDD Caching Is Fault Tolerant, Cache Memory Management
Spark Jobs
Shared Variables, Broadcast Variables, Accumulators
Configuring and running the Spark cluster.
Exploring a multi-node Spark cluster.
Cluster management
Submitting Spark jobs and running in the cluster mode.
Developing Spark applications in Eclipse
Tuning and Debugging Spark.
Two Projects using Core Spark
Application Programming Interface (API)
StreamingContext
Basic Structure of a Spark Streaming Application
Discretized Stream (DStream)
Creating a DStream
Processing a Data Stream
Output Operations
Window Operation
Discretized streams and RDDs.
Applying Transformations and Actions on Streaming Data
Integration with Flume and Kafka.
Integration with Cassandra.
Monitoring streaming jobs.
Use case with Spark Core and Spark Streaming
WEEK-4 -> SPARK SQL
Introduction to Apache Spark SQL
Understanding the Catalyst optimizer
How it works: Analysis, Logical plan optimization, Physical planning, Code generation
Creating HiveContext
Inferring schema using case classes
Programmatically specifying the schema
The SQL context
Importing and saving data
Processing text files, JSON, and Parquet files
Data Frames
Using Hive
Application Programming Interface (API)
Key Abstractions, Creating DataFrames, Processing Data Programmatically with SQL/HiveQL
Processing Data with the DataFrame API
Saving a DataFrame
Built-in Functions
Aggregate, Collection, Date/Time, Math, String, Window
UDFs and UDAFs
Interactive Analysis Example
Interactive Analysis with Spark SQL JDBC Server
Local Hive Metastore server
Loading and saving data using the Parquet format
Loading and saving data using the JSON format
Loading and saving data from relational databases
Loading and saving data from an arbitrary source
Integrating With Hive
Integrating with MySQL.
Using linear regression
Supervised Learning with MLlib – Classification
Doing classification using logistic regression
Doing classification using decision trees
Doing classification using Random Forests
Doing classification using Gradient Boosted Trees
Doing classification with Naïve Bayes
Unsupervised Learning with MLlib
Clustering using k-means
Dimensionality reduction with principal component analysis
Building the Spark server
Updating and Deleting Data.
Two real-time projects covering all the above concepts.