Install Pyspark On Windows, Mac & Linux - DataCamp - 1
Olivia Smith
Senior developer at CMARIX TechnoLabs. Writes about trending technologies like AI & ML
TOPICS
Python
Data Science
Windows Installation
Linux Installation
Mac Installation
Apache Spark is an open-source framework used in the big data industry for both real-time
and batch processing. It supports several languages, including Python, Scala,
Java, and R.
https://www.datacamp.com/tutorial/installation-of-pyspark 1/18
30/05/2023, 18:07 Install Pyspark on Windows, Mac & Linux | DataCamp
Apache Spark was originally written in Scala, a Java Virtual Machine (JVM) language,
whereas PySpark is the Python API for Spark. PySpark relies on a library called Py4J,
which allows Python code to interact dynamically with JVM objects.
Windows Installation
This section covers installation on the Windows operating system. It consists of
installing Java and Apache Spark and setting the environment variables for each.
Java Installation
4. Open "Command Prompt" and type "java -version" to check whether Java is installed
and, if so, which version.
Note: By default, Java is located on the C drive at 'C:\Program Files
(x86)\Java\jdk1.8.0_251', unless you changed the location during installation.
5. Add the Java path.
6. Go to the search bar and search for "EDIT THE ENVIRONMENT VARIABLES".
8. Click "New" to create a new environment variable.
9. Use 'JAVA_HOME' as the Variable Name and 'C:\Program Files
(x86)\Java\jdk1.8.0_251' as the Variable Value. This is the location of your Java
installation. Click 'OK' when you have finished.
10. Under the User variables, select 'Path' and click 'New' to add an entry.
11. Use 'PATH' as the Variable Name and 'C:\Program Files
(x86)\Java\jdk1.8.0_251\bin', the location of your Java bin folder, as the value.
Click 'OK' when you have finished.
Installing Pyspark
1. Head over to the Spark homepage.
2. Select a Spark release and package type and download the .tgz file.
You can make a new folder called 'spark' in the C directory and extract the downloaded
file into it using 'WinRAR', which will be helpful afterward.
Go to Winutils, choose the Hadoop version that matches the package you downloaded, and
download the winutils.exe file from its 'bin' folder. The link to my Hadoop version is:
https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
Make a new folder called 'winutils' and, inside it, another new folder called 'bin'.
Then put the recently downloaded 'winutils.exe' file inside it.
Environment variables
1. Create a new environment variable with the variable name "HADOOP_HOME" and the
variable value set to the location of winutils, which is "C:\winutils", and click "OK".
2. For Spark, create another environment variable with the variable name
"SPARK_HOME" and the variable value set to the location of Spark, which is "C:\spark",
and click "OK".
3. Finally, double-click 'Path', add a new entry "%SPARK_HOME%\bin", and click "OK".
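The Windows setup above can be sanity-checked from Python. The following is a minimal sketch and not part of the original tutorial: the function name find_setup_problems is hypothetical, and it assumes that the Java, winutils, and Spark variables each point to an existing folder and that winutils.exe sits in the 'bin' folder under the winutils location. Windows treats environment variable names case-insensitively, so the capitalization used when creating them should not matter.

```python
import os
from pathlib import Path

# Variable names from the steps above (Windows ignores their case).
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def find_setup_problems(env):
    """Return a list of human-readable problems found in the given environment mapping."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing folder: {value}")

    # winutils.exe is expected inside the 'bin' folder of the winutils location.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe was not found in the HADOOP_HOME bin folder")
    return problems


if __name__ == "__main__":
    # Check the variables of the current process and print one line per finding.
    for line in find_setup_problems(os.environ) or ["No problems found"]:
        print(line)
```

Run it from a newly opened Command Prompt, since environment variable changes are only picked up by processes started after the change.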
Linux Installation
This section covers installation on the Linux operating system. It consists of
installing Java and Apache Spark and setting the environment variables for each.
Java Installation
2. Go to the download section for the Linux operating system and download the package
that matches your system requirements.
3. Save the file and click "OK" to save it to your local machine.
4. Go to your terminal and check for the recently downloaded file using the 'ls' command.
5. Install the package using the following command, which installs the recently
downloaded Debian package of Java.
6. Finally, you can check your Java version using the 'java --version' command (on
Java 8 and earlier, use 'java -version').
7. To configure the environment variables, open the 'gedit' text editor using the
following command.
8. Make the change by providing the following information, where the 'Java' path is
specified.
Installing Spark
2. Select a Spark release and package type and download the .tgz file.
5. Extract the file using the following command.
6. After extracting the file, the new folder is created and can be shown using the
list ('ls') command.
2. Provide the following information according to the suitable paths on your computer.
In my case, these were the paths to my Spark location, Python, and Java. Then press
'Esc' and type ":wq" to save and exit from vim.
3. Save and exit to make the final change. This makes the pyspark command accessible
from any directory.
4. Open pyspark using the 'pyspark' command, and the final message will be shown as
below.
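The two checks in the Linux steps above (is Java installed, and is the pyspark command reachable from any directory) can also be scripted. This is a minimal sketch under stated assumptions rather than part of the original tutorial: the helper names parse_java_version and installed_java_version are hypothetical, and the parser assumes the version banner contains the word "version" followed by a quoted version string. It uses 'java -version', which is accepted by old and new Java releases alike.

```python
import re
import shutil
import subprocess


def parse_java_version(text):
    """Extract the quoted version string from a Java version banner, or return None."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def installed_java_version():
    """Run 'java -version' and return the version string, or None if Java is unavailable."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The banner is normally written to stderr, so both streams are searched.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java version:", installed_java_version())
    # shutil.which returns the full path of the launcher, or None if it is not on PATH.
    print("pyspark launcher:", shutil.which("pyspark"))
```

If the second line prints None, the Spark 'bin' folder is probably not on PATH in the current shell; opening a new terminal after editing the configuration file is usually needed for the change to take effect.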
Mac Installation
This section covers installation on the Mac operating system. It consists of
installing Java and Apache Spark and setting the environment variables for each.
Java Installation
1. Go to Download Java JDK.
Visit Oracle's website for the download of the Java Development Kit (JDK).
2. Go to the download section for your operating system (macOS in this case) and
download the package that matches your system requirements.
2. Select a Spark release and package type and download the .tgz file.
export SPARK_HOME="/Downloads/spark"
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=python3
Open pyspark using 'pyspark' command, and the final message will be shown as below.
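If the 'pyspark' command is not found, it may help to confirm that the three export lines above actually took effect in the current shell. The sketch below is an illustration only, not part of the original tutorial: the function names spark_bin_on_path and resolve_pyspark_python are hypothetical, and the check simply assumes that the 'bin' folder under SPARK_HOME should appear as an entry in PATH and that PYSPARK_PYTHON should resolve to an executable.

```python
import os
import shutil


def spark_bin_on_path(env):
    """Return True if the 'bin' folder under SPARK_HOME is one of the PATH entries."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return False
    spark_bin = os.path.normpath(os.path.join(spark_home, "bin"))
    entries = env.get("PATH", "").split(os.pathsep)
    return any(os.path.normpath(entry) == spark_bin for entry in entries if entry)


def resolve_pyspark_python(env):
    """Return the full path of the interpreter named by PYSPARK_PYTHON, or None."""
    name = env.get("PYSPARK_PYTHON")
    return shutil.which(name, path=env.get("PATH")) if name else None


if __name__ == "__main__":
    print("Spark bin folder on PATH:", spark_bin_on_path(os.environ))
    print("PYSPARK_PYTHON resolves to:", resolve_pyspark_python(os.environ))
```

Both functions take the environment as an argument, so they can be tried against a plain dictionary before being pointed at os.environ.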
Congratulations
Congratulations, you have made it to the end of this tutorial!
In this tutorial, you've learned how to install PySpark, starting with the installation of
Java and Apache Spark and then managing the environment variables on the Windows,
Linux, and Mac operating systems.
If you would like to learn more about PySpark, take DataCamp's Introduction to PySpark.