Shark Tank - Web and Social Media Analytics Case Study


A Project on Web and Social Media Analytics

SHYAM KISHORE TRIPATHI
PGP - BABI

Table of Contents

1. Project Objective
2. Defining Business Problem
3. Reading data and performing initial data clean up
4. CART, Logistic Regression and Random Forest Before Ratio
5. CART, Logistic Regression and Random Forest After Ratio
6. Comparing Models and Conclusion

Project Objective
In this exercise we will develop a predictive model to predict deal or no deal using the Shark Tank dataset (from the US-based show). The complete exercise is based on text mining analytics.

Defining Business Problem


Shark Tank is a US-based show in which entrepreneurs and founders pitch their businesses to a panel of investors, who decide whether or not to invest based on multiple parameters.

The dataset contains 495 records of Shark Tank episodes, one for each entrepreneur's pitch to the investors. Using social media analytics (text mining) algorithms, we will predict whether a pitch, given its description, will convert into a deal.

Reading data and performing initial data clean up

1. Loading dataset
Sharktank = read.csv("Shark Tank.csv", stringsAsFactors=FALSE)

2. Loading required libraries


library(wordcloud)
library(tm)
library(SnowballC)
library(rpart)
library(rpart.plot)

3. Performing Initial Clean Up


# Creating corpus
corpus = Corpus(VectorSource(Sharktank$description))

# Convert to lower-case
corpus = tm_map(corpus, content_transformer(tolower))

# Remove punctuation
corpus = tm_map(corpus, removePunctuation)

# Word cloud before removing stopwords


wordcloud(corpus,colors=rainbow(7),max.words=100)

Now we need to normalize the text before proceeding with further analysis. The steps to perform are:
1. Convert all text to lower case.
2. Remove punctuation marks and stop words.
3. Remove extra white spaces.
4. Perform stemming of the documents.

Steps 1 and 2 were already applied above; a sketch of the remaining steps follows the list.
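A minimal sketch of the remaining normalization steps, assuming the corpus created above:

# Remove English stop words (common words that carry little meaning)
corpus = tm_map(corpus, removeWords, stopwords("english"))

# Strip extra white space
corpus = tm_map(corpus, stripWhitespace)

# Stem the documents (reduce words to their root form, e.g. "investing" -> "invest")
corpus = tm_map(corpus, stemDocument)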

We then build a DTM (Document-Term Matrix) for further analysis: each document becomes a row, each term/word becomes a column, and each cell holds the frequency of that term in the document.

This helps us identify the unique words that are used frequently across the corpus.

To reduce the dimensionality of the DTM, we remove infrequent words with removeSparseTerms(), keeping only terms with sparsity below 0.995, as sketched below.
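A sketch of the DTM construction, assuming the normalized corpus from the previous step (object names are assumptions):

# Build the Document-Term Matrix (documents as rows, terms as columns)
dtm = DocumentTermMatrix(corpus)

# Keep only terms appearing in at least ~0.5% of documents (sparsity below 0.995)
sparseDTM = removeSparseTerms(dtm, 0.995)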

As the final data-preparation step, we convert this matrix into a data frame and add the dependent variable "deal" to it, as sketched below.
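A minimal sketch, assuming the sparse DTM and the Sharktank data frame created above (the name SharkDF is an assumption):

# Convert the sparse DTM into a data frame
SharkDF = as.data.frame(as.matrix(sparseDTM))

# Make column names syntactically valid for modelling formulas
colnames(SharkDF) = make.names(colnames(SharkDF))

# Add the dependent variable as a factor
SharkDF$deal = as.factor(Sharktank$deal)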

4. CART, Logistic Regression and Random Forest Before Ratio

To predict whether investors will invest in a business, we use deal as the output variable, build CART, logistic regression and random forest models, and measure the performance and accuracy of each model.

a) Building CART Model
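A minimal sketch of the CART model, assuming the prepared data frame SharkDF from the previous step:

# Build a classification tree predicting deal from all term frequencies
CARTmodel = rpart(deal ~ ., data = SharkDF, method = "class")

# Plot the tree
prp(CARTmodel)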

Evaluating CART Model
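A sketch of the evaluation, computing accuracy from the confusion matrix on the same data (object names are assumptions):

# Predict classes and build the confusion matrix
predictCART = predict(CARTmodel, newdata = SharkDF, type = "class")
confMatCART = table(SharkDF$deal, predictCART)

# Accuracy = correctly classified / total
sum(diag(confMatCART)) / sum(confMatCART)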

b) Building Random Forest Model
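A minimal sketch of the random forest model; the seed value and object names are assumptions:

# randomForest is needed in addition to the libraries loaded earlier
library(randomForest)

set.seed(123)

# Build the random forest; predictions without newdata are out-of-bag
RFmodel = randomForest(deal ~ ., data = SharkDF)
predictRF = predict(RFmodel)
confMatRF = table(SharkDF$deal, predictRF)
sum(diag(confMatRF)) / sum(confMatRF)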

c) Building Logistic Regression Model
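A minimal sketch of the logistic regression model, again with object names assumed:

# Fit a logistic regression on all term frequencies
LogModel = glm(deal ~ ., data = SharkDF, family = binomial)

# Predict probabilities and evaluate at a 0.5 cutoff
predictLog = predict(LogModel, type = "response")
confMatLog = table(SharkDF$deal, predictLog > 0.5)
sum(diag(confMatLog)) / sum(confMatLog)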

5. CART, Logistic Regression and Random Forest After Ratio

Now we add an additional variable, "Ratio", derived as the amount asked for divided by the valuation (ask for / valuation), and then re-run the models to see whether accuracy improves. A sketch of this step follows.
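A sketch of the added variable; the column names askedFor and valuation are assumptions about the dataset layout:

# Ratio of the amount asked for to the company valuation
SharkDF$ratio = Sharktank$askedFor / Sharktank$valuation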

a) Building CART Model

b) Building Random Forest Model

c) Building Logistic Regression Model
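The three models are re-run on the augmented data frame in exactly the same way as before; a brief sketch, reusing the objects defined above:

# Re-fit each model with the ratio column included
CARTmodel2 = rpart(deal ~ ., data = SharkDF, method = "class")
RFmodel2   = randomForest(deal ~ ., data = SharkDF)
LogModel2  = glm(deal ~ ., data = SharkDF, family = binomial)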

6. Comparing Models and Conclusion

Action                   CART Model   Logistic Regression Model   Random Forest Model

Before Ratio Accuracy    65.65%       99.79%                      55.35%
After Ratio Accuracy     66.06%       100%                        55.75%

With the CART model, we were able to predict with around 65.65% and 66.06% accuracy using only the description and description + ratio respectively.

With random forest, we were able to predict with 55.35% and 55.75% accuracy using only the description and description + ratio respectively.

With logistic regression, we obtained 99.79% and 100% accuracy using only the description and description + ratio respectively.

From the above analysis we can conclude that logistic regression is the best model to proceed with for further insight analysis.

