
NPR COLLEGE OF ENGINEERING & TECHNOLOGY

NATHAM - 624 401

NAME : …………………………………………………….

REGISTER NUMBER : …………………………………………………….

DEPARTMENT : COMPUTER SCIENCE AND ENGINEERING

YEAR & SEM : IV & VII

SUBJECT : CS8711-CLOUD COMPUTING LABORATORY


NPR COLLEGE OF ENGINEERING AND TECHNOLOGY
NATHAM, DINDIGUL -624 401

Name: ………………………………………………………………………………

Year: ………………. …Semester …………………Branch..…………………….

University Register No

CERTIFIED that this is a bonafide record of work done by the above

student in the ............................................................................................... Laboratory

during the year 20      - 20     .

Signature of Lab. In-charge Signature of Head of the Department

Submitted for practical examination held on ………………………………………..

Internal Examiner External Examiner


CONTENTS

S.No    Date of Experiment    Name of the Experiment    Page No    Date of Submission    Remarks

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.
NPR COLLEGE OF ENGINEERING & TECHNOLOGY, NATHAM

VISION

 To develop students with intellectual curiosity and technical expertise to meet global needs.

MISSION

 To achieve academic excellence by offering quality technical education using the best teaching
techniques.
 To improve Industry-Institute interactions and expose students to the industrial atmosphere.
 To develop interpersonal skills along with value-based education in a dynamic learning
environment.
 To explore solutions for real-time problems in the society.

NPR COLLEGE OF ENGINEERING & TECHNOLOGY, NATHAM

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VISION
 To produce globally competent technical professionals for a digitized society.
MISSION
 To establish a conducive academic environment by imparting quality education and value-added
training.
 To encourage students to develop innovative projects to optimally resolve challenging social
problems.

PROGRAM EDUCATIONAL OBJECTIVES

Graduates of Computer Science and Engineering Program will be able to:

 Develop into knowledgeable professionals who pursue higher education and research, or have
successful careers in industry.
 Successfully carry forward domain knowledge in computing and allied areas to solve complex,
real-world engineering problems.
 Meet the technological revolution by continuously upgrading their technical knowledge.

 Serve humanity with social responsibility combined with ethics.

CS8711 CLOUD COMPUTING LABORATORY                                                    L T P C
                                                                                     0 0 4 2
OBJECTIVES:
 To develop web applications in the cloud
 To learn the design and development process involved in creating a cloud-based application
 To learn to implement and use parallel programming using Hadoop

LIST OF EXPERIMENTS:

1. Install VirtualBox/VMware Workstation with different flavours of Linux or Windows OS on top of
Windows 7 or 8.
2. Install a C compiler in the virtual machine created using VirtualBox and execute simple programs.
3. Install Google App Engine. Create a hello world app and other simple web applications using Python/Java.
4. Use the GAE launcher to launch the web applications.
5. Simulate a cloud scenario using CloudSim and run a scheduling algorithm that is not present in CloudSim.
6. Find a procedure to transfer files from one virtual machine to another virtual machine.
7. Find a procedure to launch a virtual machine using trystack (online OpenStack demo version).
8. Install a Hadoop single node cluster and run simple applications like word count.

TOTAL: 60 PERIODS
OUTCOMES:

On completion of this course, the students will be able to:


 Configure various virtualization tools such as Virtual Box and VMware Workstation.
 Design and deploy a web application in a PaaS environment.
 Learn how to simulate a cloud environment to implement new schedulers.
 Install and use a generic cloud environment that can be used as a private cloud.
 Manipulate large data sets in a parallel environment.

CLOUD COMPUTING LABORATORY
Course Outcomes

After completion of the course, students will be able to attain the listed course outcomes.

COs    Course Code    Course Outcomes                                                                Knowledge Level
CO1    C407.1         Configure various virtualization tools such as Virtual Box, VMware Workstation    K2
CO2    C407.2         Design and deploy a web application in a PaaS environment                         K2
CO3    C407.3         Learn how to simulate a cloud environment to implement new schedulers             K2
CO4    C407.4         Demonstrate a generic cloud environment that can be used as a private cloud       K3
CO5    C407.5         Manipulate large data sets in a parallel environment                              K2
CO6    C407.6         Apply Hadoop single node cluster and run simple applications                      K3

List of Experiments with COs, POs and PSOs

Exp. No.    Name of the Experiment                             COs         POs                 PSOs
1.          Virtual box/VMware Workstation                     CO1         1,2,3 & 12          1,2 & 3
2.          C compiler                                         CO1, CO2    1,2,3 & 12          1,2
3.          Google App Engine                                  CO2         1,2,3 & 12          1,2 & 3
4.          GAE launcher                                       CO2, CO3    1,2,3,5 & 12        1,2
5.          Cloudsim                                           CO3, CO4    1,2,3,9 & 12        1,2
6.          Virtual machine operation                          CO4, CO5    1,2,3,10 & 12       1,2
7.          Launching of virtual machine using try stack       CO4, CO5    1,2,3,10,11 & 12    1,2 & 3
8.          Hadoop single node cluster                         CO4, CO5    1,2,3,10,11 & 12    1,2 & 3
Additional Experiments
9.          Cloud for HPC and HTC                              CO5         1,2,3,5 & 12        1,2 & 3
10.         Performance of Cloud                               CO5         1,2,3,5 & 12        1,2 & 3

Program Outcomes

1. Engineering Knowledge
2. Problem Analysis
3. Design/Development of Solutions
4. Conduct Investigations of Complex Problems
5. Modern Tool Usage
6. The Engineer and Society
7. Environment and Sustainability
8. Ethics
9. Individual and Team Work
10. Communication
11. Project Management and Finance
12. Life-long Learning

Program Specific Outcomes

At the end of the program students will be able to


 Deal with real-time problems by understanding the evolutionary changes in computing, applying
standard practices and strategies in software project development using open-ended programming
environments.
 Employ modern computer languages, environments and platforms in creating innovative career paths
by inculcating moral values and ethics.
 Achieve additional expertise through add-on and certificate programs.

Ex.No:01
Date: INSTALL VIRTUALBOX/VMWARE WORKSTATION

Aim:
To install VirtualBox/VMware Workstation with different flavours of Linux or Windows OS on top of
Windows 7 or 8.

Procedure to Install

Step 1- Download Link


The link for downloading the software is https://www.vmware.com/products/workstation-pro/workstation-pro-evaluation.html.
Download the software for Windows. Conveniently, there is no signup process: click and the download begins.
The software is around 541 MB.

Step 2- Download the installer file. It should be in the Downloads folder by default, if you have not
changed the settings in your browser. The file name should be something like
VMware-workstation-full-15.5.1-15018445.exe. This file name can change depending on the version of the
software currently available for download. But for now, until the next version is available, they will all be
VMware Workstation 15 Pro.

Step 3- Locate the downloaded installer file. For demonstration purposes, I have placed the downloaded
installer on my desktop. Find the installer on your system and double click to launch the application.

Step 4- User Access Control (UAC) Warning
Now you should see the User Access Control (UAC) dialog box. Click Yes to continue.

Initial Splash screen will appear. Wait for the process to complete.

Step 5- VMware Workstation Setup wizard
Now you will see VMware Workstation setup wizard dialog box. Click next to continue.

Step 6- End User License Agreement


This time you should see the End User License Agreement dialog box. Check the "I accept the terms in the
License Agreement" box and press Next to continue.

Step 7- Custom Setup options
Select the folder in which you would like to install the application. There is no harm in leaving the
defaults as it is. Also select Enhanced Keyboard Driver check box.

Step 8- User Experience Settings


Next you are asked to select “Check for Updates” and “Help improve VMware Workstation Pro”. Do as you
wish. I normally leave it to defaults that are unchecked.

Step 9- Application Shortcuts preference
Next step is to select the place you want the shortcut icons to be placed on your system to launch the
application. Please select both the options, desktop and start menu and click next.

Step 10- Installation begins


Now you see the begin installation dialog box. Click install to start the installation process.

The below screenshot shows the installation in progress. Wait for this to complete.

At the end you will see installation complete dialog box. Click finish and you are done with the
installation process. You may be asked to restart your computer. Click on Yes to restart.

Step 11- Launch VMware Workstation
After the installation completes, you should see the VMware Workstation icon on the desktop. Double click
on it to launch the application.

Step 12- License Key


If you see the dialog box asking for a license key, click on trial or enter the license key. Then what you have is
VMware Workstation 15 Pro running on your Windows 10 desktop. If you don't have the license key, you will
have a 30-day trial.

Step 13- At some point if you decide to buy
At some point of time if you decide to buy the license key, you can enter it by going to
Help -> Enter a License Key. You can enter the 25-character license key in the dialog box shown below and
click OK. Now you have the licensed version of the software.

Result:

Thus VMware Workstation was installed and a virtual machine was created.

Ex.No:02
Date: INSTALL C COMPILER

Aim:
To install a C compiler in the virtual machine and execute sample programs.

Procedure:
Step 1:
Choose the latest version of Ubuntu (32-bit) and click "Start Download".

Step 2:
Login into the VM of installed OS.

Step 3:
Click "Continue" on the pop-up window.
Type a VM name, select "Linux" for the OS and choose "Ubuntu" for the version.

Choose the amount of memory to allocate (I suggest choosing between 512 MB and 1024 MB).

Click Continue or Next

Choose create a new virtual hard disk


Click Continue or Next

Choose VDI (VirtualBox Disk Image)

Click Continue or Next

Choose "Dynamically Allocated" and click Continue.


This way, the size of your virtual hard disk will grow as you use it.

Click the folder icon and choose the Ubuntu ISO file you downloaded.
Select the size of the virtual disk (I recommend choosing 8 GB) and click Continue.

Click Create

Choose Ubuntu from left column and click Start

Click Install Ubuntu

Open Terminal (Applications-Accessories-Terminal)

Open gedit by typing "gedit &" on the terminal.

Step 4:
Write a sample program, e.g. Welcome.cpp:

#include <iostream>
using namespace std;

int main()
{
    cout << "Hello world" << endl;
    return 0;
}

Step 5:
First we need to compile and link our program. Assuming the source code is saved in a file welcome.cpp, we
can do that using the GNU C++ compiler g++, for example:

g++ -Wall -o welcome welcome.cpp

Run the resulting executable with ./welcome.

Result:

Thus the C compiler was successfully installed and a sample program executed.

Ex.No:03
Date: GOOGLE APP ENGINE

Aim:

To install Google App Engine, create a hello world app and other simple web applications using Python/Java.
Procedure:

Use Eclipse to create a Google App Engine (GAE) Java project (hello world example), run it locally and
deploy it to a Google App Engine account.

Tools used:

JDK 1.6

Eclipse 3.7 + Google Plugin for Eclipse

Google App Engine Java SDK 1.6.3.1

Note

GAE supports Java 1.5 and 1.6.

P.S. Assume JDK 1.6 and Eclipse 3.7 are installed.

1. Install Google Plugin for Eclipse


Read this guide on how to install the Google Plugin for Eclipse. If you installed the Google App Engine
Java SDK together with the "Google Plugin for Eclipse", then go to step 2; otherwise, get the Google App Engine
Java SDK and extract it.

2. Create New Web Application Project


In Eclipse toolbar, click on the Google icon, and select “New Web Application Project…”

Figure – New Web Application Project

Figure – Deselect "Google Web Toolkit", and link your GAE Java SDK via the "configure SDK" link.

Click Finish; the Google Plugin for Eclipse will generate a sample project automatically.

3. Hello World
Review the generated project directory.

Nothing special, a standard Java web project structure.


HelloWorld/
  src/
    ...Java source code...
    META-INF/
      ...other configuration...
  war/
    ...JSPs, images, data files...
    WEB-INF/
      ...app configuration...
      lib/
        ...JARs for libraries...
      classes/
        ...compiled classes...

The extra file is "appengine-web.xml"; Google App Engine needs it to run and deploy
the application.

File : appengine-web.xml

<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application></application>
    <version>1</version>

    <!-- Configure java.util.logging -->
    <system-properties>
        <property name="java.util.logging.config.file" value="WEB-INF/logging.properties"/>
    </system-properties>
</appengine-web-app>

4. Run it locally
Right click on the project and run as "Web Application".

Eclipse console :

//...
INFO: The server is running at http://localhost:8888/
30 Mac 2012 11:13:01 PM com.google.appengine.tools.development.DevAppServerImpl start
INFO: The admin console is running at http://localhost:8888/_ah/admin

Access the URL http://localhost:8888/ to see the output, and also the hello world servlet at
http://localhost:8888/helloworld
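
For reference, the generated hello world servlet is a plain HttpServlet. A minimal sketch follows; the class name HelloWorldServlet and the /helloworld mapping follow the plugin's generated sample but may differ in your project:

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal sketch of the servlet mapped to /helloworld in war/WEB-INF/web.xml.
// The Google Plugin for Eclipse generates an equivalent class automatically.
public class HelloWorldServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello, world");
    }
}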

5. Deploy to Google App Engine
Register an account on https://appengine.google.com/, and create an application ID
for your web application.

In this demonstration, I created an application ID, named “mkyong123”, and put it in


appengine-web.xml.

File : appengine-web.xml

<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application>mkyong123</application>
    <version>1</version>

    <!-- Configure java.util.logging -->
    <system-properties>
        <property name="java.util.logging.config.file" value="WEB-INF/logging.properties"/>
    </system-properties>
</appengine-web-app>

To deploy, follow these steps:

Click on the GAE deploy button on the toolbar.

Sign in with your Google account and click on the Deploy button.

If everything is fine, the hello world web application will be deployed to this URL
– http://mkyong123.appspot.com/

Result:

Thus GAE was installed and the hello world application executed.

Ex.No:04
Date: GAE LAUNCHER

Aim:
To use the GAE launcher to launch web applications.

Procedure:
You can use Google App Engine to host a static website. Static web pages can contain client-side
technologies such as HTML, CSS, and JavaScript. Hosting your static site on App Engine can cost less
than using a traditional hosting provider, as App Engine provides a free tier.
Sites hosted on App Engine are hosted on the REGION_ID.r.appspot.com subdomain, such as
[my-project-id].uc.r.appspot.com. After you deploy your site, you can map your own domain name to your
App Engine-hosted website.
Before you begin
Before you can host your website on Google App Engine:
1. Create a new Cloud Console project or retrieve the project ID of an existing project to use: go to the
Projects page.
Tip: You can retrieve a list of your existing project IDs with the gcloud command line tool.
2. Install and then initialize the Google Cloud SDK: download the SDK.
Creating a website to host on Google App Engine

Basic structure for the project
This guide uses the following structure for the project:
• app.yaml: Configure the settings of your App Engine application.
• www/: Directory to store all of your static files, such as HTML, CSS, images, and JavaScript.
• css/: Directory to store stylesheets.
• style.css: Basic stylesheet that formats the look and feel of your site.
• images/: Optional directory to store images.
• index.html: An HTML file that displays content for your website.
• js/: Optional directory to store JavaScript files.
• Other asset directories.

Creating the app.yaml file
The app.yaml file is a configuration file that tells App Engine how to map URLs to your static files. In the
following steps, you will add handlers that will load www/index.html when someone visits your website,
and all static files will be stored in and called from the www directory.
Create the app.yaml file in your application's root directory:
1. Create a directory that has the same name as your project ID. You can find your project ID in the
Console.
2. In the directory that you just created, create a file named app.yaml.
3. Edit the app.yaml file and add the following code to the file:

runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /
  static_files: www/index.html
  upload: www/index.html

- url: /(.*)
  static_files: www/\1
  upload: www/(.*)
More reference information about the app.yaml file can be found in the app.yaml reference
documentation.
Creating the index.html file

Create an HTML file that will be served when someone navigates to the root page of your
website. Store this file in your www directory.

<html>
  <head>
    <title>Hello, world!</title>
    <link rel="stylesheet" type="text/css" href="/css/style.css">
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p>
      This is a simple static HTML file that will be served from Google
      App Engine.
    </p>
  </body>
</html>

Deploying your application to App Engine
When you deploy your application files, your website will be uploaded to App Engine. To deploy your
app, run the following command from within the root directory of your application where the app.yaml
file is located:
gcloud app deploy

Optional flags:
• Include the --project flag to specify an alternate Cloud Console project ID to what you initialized as the
default in the gcloud tool. Example: --project [YOUR_PROJECT_ID]
• Include the -v flag to specify a version ID, otherwise one is generated for you. Example: -v
[YOUR_VERSION_ID]
To learn more about deploying your app from the command line, see Deploying a Python 2 App.
Viewing your application
To launch your browser and view the app at https://PROJECT_ID.REGION_ID.r.appspot.com, run the
following command:
gcloud app browse

Result:
Thus the GAE launcher was used to launch the web applications.

Ex.No:05
Date: CLOUDSIM

Aim:

To simulate a cloud scenario using CloudSim and run a scheduling algorithm that is not present in CloudSim.

Procedure:

Basics of Scheduling

In computing, scheduling is the process of arranging submitted jobs/tasks into a specific sequence of
execution. It is an essential characteristic of any software operating environment, and is handled by a
special program known as a scheduler.

The scheduler's main objective is to keep the underlying hardware resources (primarily the processor)
used effectively and efficiently. In general, a scheduler may follow either of the following scheduling
approaches:

Space-shared: Here, the requested resources are allocated dedicatedly to the requesting workload for
execution and are released only on completion. Space-shared scheduling is also known as batch process scheduling.

Time-shared: Here, the requested resources are shared among more than one workload (task). The
sharing is done through time-sliced allocation, where each workload is allocated the required resource for a
defined time (e.g., 200 milliseconds). Once the defined time slice is over, the current workload's execution
is paused and the resource is released. The released resource is then allocated to the next workload for the
same defined time slice, and this cycle continues until all the workloads have finished executing. Time-shared
scheduling is also known as round-robin scheduling.
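
As a toy illustration (this is not CloudSim code, and the workload lengths are made up), the following Java snippet steps through round-robin execution with the fixed 200 ms quantum described above and prints when each workload completes:

public class RoundRobinDemo {
    public static void main(String[] args) {
        int[] remaining = {500, 300, 900}; // hypothetical workload lengths in ms
        int quantum = 200;                 // the 200 ms time slice from the text
        int clock = 0;
        boolean allDone = false;
        while (!allDone) {
            allDone = true;
            for (int i = 0; i < remaining.length; i++) {
                if (remaining[i] > 0) {
                    // run the workload for one quantum (or less, if nearly done)
                    int run = Math.min(quantum, remaining[i]);
                    remaining[i] -= run;
                    clock += run;
                    if (remaining[i] == 0) {
                        System.out.println("workload " + i + " completes at " + clock + " ms");
                    } else {
                        allDone = false; // still work left, go around again
                    }
                }
            }
        }
    }
}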

Scheduling in Cloud

Cloud computing is a virtualized operating environment, and virtual machines are the primary
computing components responsible for the execution of the workloads (tasks). The virtual machine(s)
are powered by a physical host machine (i.e. hardware). Depending on the requirements of the virtual
machine (VM), there can be 'one to one' or 'many to one' mapping between VMs and a host machine. That
means that in cloud computing, scheduling is done at both mapping levels, that is:

 Virtual Machine to Host Machines


 Tasks to Virtual Machines

Both the VM-to-host and the workload(task)-to-VM mappings may use space-shared, time-shared,
or any other specialized scheduling algorithm.
Scheduling in Cloudsim
The CloudSim simulation toolkit framework has effectively addressed the scheduling scenario and
implemented it as a set of programmable class hierarchies with the parent classes:
1. VmScheduler
2. CloudletScheduler
Also, virtual machine (VM) and task (cloudlet) scheduling is one of the most important and popular
use cases simulated by researchers using the CloudSim simulation toolkit.
Note: In CloudSim, a task is called a cloudlet; therefore, in the following text, instead of 'task' we will
use 'cloudlet'.
Cloudsim Virtual Machine Scheduling
The VmScheduler is an abstract class that defines and implements the policy used to share processing
power among virtual machines running on a specified host. The hierarchy of the CloudSim virtual machine
scheduler classes is as follows:

Cloudsim Virtual Machine Scheduler Class Hierarchy

These classes can be located in the "org.cloudbus.cloudsim" package of CloudSim. This abstract
class is extended by the following policy implementations:

VmSchedulerTimeShared:

This class implements the VM scheduling policy that allocates one or more processing elements to a single
virtual machine and allows the sharing of processing elements by multiple virtual machines with a specified
time slice. This class also considers the overhead of VM allocation switching (similar to context switching) in
the policy definition. Here, the VM allocation will fail if the number of processing elements requested is not
available; for example, if the VM requests a quad-core processor but the allocated host has only a dual-core
one, the allocation will fail.

o VmSchedulerSpaceShared: This class implements the VM scheduling policy that
allocates one or more processing elements to a single virtual machine, but this policy
implementation does not support sharing of processing elements; (i.e.) all the
requested resources will be used by the allocated VM until the VM is
destroyed. Also, under this allocation policy, if any virtual machine requests a
processing element that is not available at that time, the allocation fails.

o VmSchedulerTimeSharedOverSubscription: This is an extended implementation of
the VmSchedulerTimeShared VM scheduling policy, which allows over-subscription of
processing elements by the virtual machine(s); (i.e.) the scheduler still allows the
allocation of VMs that require more CPU capacity than is available. This
oversubscription results in performance degradation.

The VmScheduler classes are applied while instantiating the host model. Following is
the code snippet used in CloudsimExample1.java from line number 160 to 174:

int hostId = 0;
int ram = 2048;         // host memory (MB)
long storage = 1000000; // host storage
int bw = 10000;

hostList.add(
    new Host(
        hostId,
        new RamProvisionerSimple(ram),
        new BwProvisionerSimple(bw),
        storage,
        peList,
        new VmSchedulerTimeShared(peList)
    )
);

This is where the processing element list is passed as a parameter to the VmSchedulerTimeShared() constructor,
and during the simulation CloudSim will simulate the time-shared behavior for the virtual machines. Also, in
case you want to test another VmScheduler, you may replace the VmSchedulerTimeShared() call with another
one with appropriate parameters; this includes your own custom-designed virtual machine scheduler. A
self-contained version of this swap is sketched below.
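
For example, here is a minimal sketch, assuming the CloudSim 3.x API, that builds one host with the space-shared VM scheduler swapped in. The class name SpaceSharedHostFactory and the MIPS/RAM values are our own illustrative choices; Host, Pe, the provisioners and VmSchedulerSpaceShared are existing CloudSim classes:

import java.util.ArrayList;
import java.util.List;

import org.cloudbus.cloudsim.Host;
import org.cloudbus.cloudsim.Pe;
import org.cloudbus.cloudsim.VmSchedulerSpaceShared;
import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

public class SpaceSharedHostFactory {
    // Builds one host whose PEs are dedicated to a VM until it is destroyed,
    // instead of being time-sliced (swap in VmSchedulerTimeShared to compare).
    public static Host create(int hostId) {
        List<Pe> peList = new ArrayList<Pe>();
        peList.add(new Pe(0, new PeProvisionerSimple(1000))); // one 1000-MIPS core
        int ram = 2048;          // host memory (MB)
        long storage = 1000000;  // host storage
        int bw = 10000;          // host bandwidth
        return new Host(hostId, new RamProvisionerSimple(ram),
                new BwProvisionerSimple(bw), storage, peList,
                new VmSchedulerSpaceShared(peList));
    }
}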

Cloudsim Cloudlet Scheduling


The "CloudletScheduler" is an abstract class that defines the basic skeleton for implementing
the policy used for cloudlet scheduling performed by a virtual machine. The
hierarchy of the CloudSim cloudlet scheduler classes is as follows:

Cloudlet Scheduler Class Hierarchy

These classes again exist in the "org.cloudbus.cloudsim" package of CloudSim. This abstract
class is extended by the following three policy implementations:

• CloudletSchedulerSpaceShared: This class implements a scheduling policy under which a virtual machine
executes cloudlet(s) in a space-shared environment; (i.e.) only one cloudlet is executed on a virtual
machine at a time. Cloudlets share the same queue, and requests are processed one at a time per
computing core. Space-sharing is similar to batch processing.

• CloudletSchedulerTimeShared: This class implements a scheduling policy under which virtual
machines execute cloudlets in a time-shared environment; (i.e.) more than one cloudlet can be submitted
to the virtual machine and each gets its specified share of time. Several requests (cloudlets)
are processed at once, but they must share the computing power of that virtual machine (by simulating
context switching), so they affect each other's processing time. This influences the completion
time of a cloudlet in CloudSim. Time-sharing refers to the concept of sharing execution
power (such as CPU, logical processor, GPU) and is commonly known as round-robin scheduling.

• CloudletSchedulerDynamicWorkload: This implements a special scheduling policy for a virtual
machine, assuming that there is just one cloudlet, which works as an online service with varying
workload requirements as per the peak/off-peak user load at a specified period of time.

The CloudletScheduler classes are applied while instantiating the Vm model. Following is the
code snippet used in CloudsimExample1.java from line number 82 to 91:

int vmid = 0;
int mips = 1000;
long size = 10000;  // image size (MB)
int ram = 512;      // vm memory (MB)
long bw = 1000;
int pesNumber = 1;  // number of cpus
String vmm = "Xen"; // VMM name

Vm vm = new Vm(vmid, brokerId, mips, pesNumber, ram, bw, size, vmm,
        new CloudletSchedulerTimeShared()); // create VM

By instantiating the CloudletSchedulerTimeShared() class, the virtual machine is set to follow the time-shared
(round-robin) approach when scheduling and executing cloudlets during simulation. Also, in case you want to
test another CloudletScheduler, you may replace the CloudletSchedulerTimeShared() call with another one with
appropriate parameters; this includes your own custom-designed cloudlet scheduler.

Now, in case you want to implement your own scheduling policies for virtual machines or
cloudlet(s), you may simply extend the VmScheduler or CloudletScheduler class and implement all the abstract
methods as specified. This gives you the flexibility to design and implement your own set of algorithms and
then test and optimize them during repetitive simulation runs, as in the sketch below.
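
As a concrete starting point, here is a minimal sketch of a custom Shortest-Job-First cloudlet scheduler, assuming the CloudSim 3.x API. The class name CloudletSchedulerSJF is our own; cloudletSubmit, getCloudletWaitingList and getRemainingCloudletLength are existing CloudSim 3.x members:

import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.CloudletSchedulerSpaceShared;
import org.cloudbus.cloudsim.ResCloudlet;

public class CloudletSchedulerSJF extends CloudletSchedulerSpaceShared {

    @Override
    public double cloudletSubmit(Cloudlet cloudlet, double fileTransferTime) {
        // Queue the cloudlet exactly as the space-shared policy would...
        double result = super.cloudletSubmit(cloudlet, fileTransferTime);
        // ...then keep the waiting queue ordered by remaining length, so the
        // shortest job is dispatched next whenever a processing element frees up.
        List<ResCloudlet> waiting = getCloudletWaitingList();
        Collections.sort(waiting, new Comparator<ResCloudlet>() {
            public int compare(ResCloudlet a, ResCloudlet b) {
                return Long.compare(a.getRemainingCloudletLength(),
                        b.getRemainingCloudletLength());
            }
        });
        return result;
    }
}

To try it, create the VM with new CloudletSchedulerSJF() in place of new CloudletSchedulerTimeShared() in the snippet shown earlier.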

Result:
Thus a cloud scenario was simulated using CloudSim and a scheduling algorithm was executed.

Ex.No:06
Date: VIRTUAL MACHINE OPERATION

Aim:
To move files between virtual machines.

Procedure:

You can move files between virtual machines in several ways:


• You can copy files using network utilities as you would between physical computers on your network.
To do this between two virtual machines:
o Both virtual machines must be configured to allow access to your network. Any of the networking
methods (host-only, bridged and NAT) are appropriate.
o With host-only networking, you copy files from the virtual machines to the host and vice versa, since
host-only networking only allows the virtual machines to see your host computer.
o With bridged networking or NAT enabled, you can copy files across your network between the virtual
machines.
• You can create a shared drive, either a virtual disk or a raw partition, and mount the drive in each of the
virtual machines.
How to Enable File sharing in VirtualBox.
Step 1. Install Guest Additions on the Guest machine.
Step 2. Configure File Sharing on VirtualBox.
Step 1. Install Guest Additions on the Guest machine.
1. Start the VirtualBox guest machine (OS).
2. From Oracle's VM VirtualBox main menu, select Devices > Install Guest Additions *

Open Windows Explorer.
Double click at the "CD Drive (X:) VirtualBox Guest additions" to explore its contents.

Right click the "VBoxWindowsAdditions" application and from the pop-up menu, choose "Run as
administrator".

Press Next and then follow the on-screen instructions to complete the Guest Additions installation.

When the setup is completed, choose Finish and restart the VirtualBox guest machine.
Step 2. Setup File Sharing on VirtualBox Guest Machine.
1. From the VirtualBox menu, click Devices and choose Shared Folders -> Shared Folder Settings.

2. Click the Add new shared folder icon.

3. Click the drop-down arrow and select other.

4. Locate and highlight (from the Host OS) the folder that you want to share between the VirtualBox guest
machine and the host, and click Select Folder. *
* Note: To make your life easier, create a new folder for the file sharing on the Host OS and give it a
recognizable name (e.g. "Public").

Now, in the 'Add Share' options, type a name (if you want) in the 'Folder Name' box, check the Auto Mount
and Make Permanent checkboxes, and click OK twice to close the Shared Folder Settings.

5. You're done! To access the shared folder from the Guest OS, open Windows Explorer; under the
'Network locations' you should see a new network drive that corresponds to the shared folder on the Host
OS.

Result:
Thus files were moved from one virtual machine to another.

Ex.No:07
Date: LAUNCHING OF VIRTUAL MACHINE USING TRY STACK

Aim:
To install KVM and OpenStack on Ubuntu 14.04 and create a virtual machine.
Mandatory prerequisite:
1. Linux 64 bit Operating System (The commands mentioned are for Ubuntu Linux Operating System
latest version).
Installing KVM (Hypervisor for Virtualization)
1. Check if the virtualization flag is enabled in the BIOS. Run the command in a terminal:
egrep -c '(vmx|svm)' /proc/cpuinfo
If the result is any value higher than 0, then virtualization is enabled.
If the value is 0, then enable virtualization in the BIOS – consult your system administrator for this step.
2. To check if your OS is 64-bit, run the command in a terminal:
uname -m
If the result is x86_64, your operating system is a 64-bit operating system.
3. A few KVM packages are available with the Linux installation.
To check this, run the command:
ls /lib/modules/{press tab}/kernel/arch/x86/kvm
nagarajan@JBL01:~$ ls /lib/modules/4.4.0-21-generic/kernel/arch/x86/kvm
The three files installed on your system will be displayed:
kvm-amd.ko kvm-intel.ko kvm.ko
4. Install the KVM packages.
1. Switch to the root (administrator) user:
sudo -i
export http_proxy=http://172.16.0.3:8080
2. To install the packages, run the following commands:
apt-get update
apt-get install qemu-kvm
apt-get install libvirt-bin
apt-get install bridge-utils
apt-get install virt-manager
apt-get install qemu-system
5. To verify your installation, run the command:
virsh -c qemu:///system list
It shows the output:

Id Name State

If VMs are running, it shows the names of the VMs. If no VM is running, the system shows blank output,
which means your KVM installation is perfect.
6. Run the command:
virsh --connect qemu:///system list --all
7. Working with KVM: run the command
virsh
version  (this command displays the version of the software tools installed)
nodeinfo (this command displays your system information)
quit     (come out of the system)
8. To test the KVM installation, we could create virtual machines, but these have to be created in manual
mode. Skipping this, directly install OpenStack.
Installation of Openstack
1. Add a new user named stack – this stack user is the administrator of the OpenStack services. To add the
new user, run the command as the root user:
adduser stack
2. Run the command:
apt-get install sudo -y || install -y sudo
3. Be careful in running the following command – please be careful with the syntax. If there is any error in
this command, the system will crash because of permission errors.
echo "stack ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
4. Log out of the system and log in as the stack user.
5. Run the command (this installs the git package). Run as root:
export http_proxy=http://172.16.0.3:8080
sudo apt-get install git
6. Run the commands (this clones the updated version of devstack, which is the binary auto-installer package
of OpenStack):
stack@JBL01:/$ export http_proxy=http://172.16.0.3:8080
stack@JBL01:/$ export https_proxy=http://172.16.0.3:8080
stack@JBL01:/$ git config --global http.proxy $http_proxy
stack@JBL01:/$ git config --global https.proxy $https_proxy
git clone http://git.openstack.org/openstack-dev/devstack
ls (this shows a folder named devstack)
cd devstack (enter into the folder)

7. Create a file called local.conf. To do this, run the command:
nano local.conf
stack@JBL01:/devstack$ sudo nano local.conf
8. In the file, make the following entries (contact your network administrator for doubts about these values):
[[local|localrc]]
FLOATING_RANGE=192.168.1.224/27
FIXED_RANGE=10.11.11.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
ADMIN_PASSWORD=root
DATABASE_PASSWORD=root
RABBIT_PASSWORD=root
SERVICE_PASSWORD=root
SERVICE_TOKEN=root
9. Save this file.
stack@JBL01:/devstack$ sudo gedit stackrc
Save this file.
Change file permissions: stack@JBL01:~$ chown stack * -R
10. Run the command (this installs OpenStack):
./stack.sh
11. If any error occurs, then run the command for uninstallation:
./unstack.sh
1. Update the packages: apt-get update
2. Then reinstall the package:
./stack.sh
12. Open the browser at http://<IP address of your machine>; you will get the OpenStack portal.
13. If you restart the machine, then to start OpenStack again, open a terminal:
su stack
cd devstack
./rejoin.sh
14. Again, you can access the OpenStack services in the browser at http://<IP address of your machine>.
VIRTUAL MACHINE CREATION
Launch an instance
1. Log in to the dashboard
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click Instances category.
The dashboard shows the instances with their names, private and floating IP addresses, size, status, task,
power state, and so on.
4. Click Launch Instance.
5. In the Launch Instance dialog box, specify the following values:
Details tab

Availability Zone
By default, this value is set to the availability zone given by the cloud provider (for example, us-west or
apac-south). For some cases, it could be nova.
Instance Name
Assign a name to the virtual machine.
Note
The name you assign here becomes the initial host name of the server. If the name is longer than 63
characters, the Compute service truncates it automatically to ensure dnsmasq works correctly. After the
server is built, if you change the server name in the API or change the host name directly, the names are
not updated in the dashboard.
Server names are not guaranteed to be unique when created so you could have two instances with the
same host name.
Flavor
Specify the size of the instance to launch.
Note
The flavor is selected based on the size of the image selected for launching an instance. For example,
while creating an image, if you have entered the value in the Minimum RAM (MB) field as 2048, then on
selecting the image, the default flavor is m1.small.
Instance Count
To launch multiple instances, enter a value greater than 1. The default is 1.
Instance Boot Source
Your options are:
Boot from image
If you choose this option, a new field for Image Name displays. You can select the image from the list.
Boot from snapshot
If you choose this option, a new field for Instance Snapshot displays. You can select the snapshot from the
list.
Boot from volume
If you choose this option, a new field for Volume displays. You can select the volume from the list.
Boot from image (creates a new volume)
With this option, you can boot from an image and create a volume by entering the Device Size and Device
Name for your volume. Click the Delete Volume on Instance Delete option to delete the volume on
deleting the instance.
Boot from volume snapshot (creates a new volume)
Using this option, you can boot from a volume snapshot and create a new volume by choosing Volume
Snapshot from a list and adding a Device Name for your volume. Click the Delete Volume on Instance
Delete option to delete the volume on deleting the instance.
Image Name
This field changes based on your previous selection. If you have chosen to launch an instance using an
image, the Image Name field displays. Select the image name from the dropdown list.
Instance Snapshot
This field changes based on your previous selection. If you have chosen to launch an instance using a
snapshot, the Instance Snapshot field displays. Select the snapshot name from the dropdown list.
Volume
This field changes based on your previous selection. If you have chosen to launch an instance using a
volume, the Volume field displays. Select the volume name from the dropdown list. If you want to delete
the volume on instance delete, check the Delete Volume on Instance Delete option.
Access & Security tab
Key Pair
Specify a key pair.
If the image uses a static root password or a static key set (neither is recommended), you do not need to
provide a key pair to launch the instance.
Security Groups
Activate the security groups that you want to assign to the instance.
Security groups are a kind of cloud firewall that define which incoming network traffic is forwarded to
instances.
If you have not created any security groups, you can assign only the default security group to the instance.
Networking tab
Selected Networks
To add a network to the instance, click the + in the Available Networks field.
Network Ports tab
Ports
Activate the ports that you want to assign to the instance.
Post-Creation tab
Customization Script Source
Specify a customization script that runs after your instance launches.
Advanced Options tab
Disk Partition
Select the type of disk partition from the dropdown list:
Automatic
The entire disk is a single partition and automatically resizes.
Manual
Faster build times but requires manual partitioning.
6. Click Launch.
The instance starts on a compute node in the cloud.
Note
If you did not provide a key pair, security groups, or rules, users can access the instance only from inside
the cloud through VNC. Even pinging the instance is not possible without an ICMP rule configured.
You can also launch an instance from the Images or Volumes category when you launch an instance from
an image or a volume respectively.
When you launch an instance from an image, OpenStack creates a local copy of the image on the compute
node where the instance starts.
For details on creating images, see Creating images manually in the OpenStack Virtual Machine Image
Guide.
When you launch an instance from a volume, note the following steps:
• To select the volume from which to launch, launch an instance from an arbitrary image on the volume.
The arbitrary image that you select does not boot. Instead, it is replaced by the image on the volume that
you choose in the next steps.
To boot a Xen image from a volume, the image you launch in must be the same type, fully virtualized or
paravirtualized, as the one on the volume.
• Select the volume or volume snapshot from which to boot. Enter a device name: enter vda for
KVM images or xvda for Xen images.

CREATING USER

(Dashboard screenshots showing the user creation steps.)
Result:

Thus a virtual machine was launched using OpenStack on Ubuntu 14.04.

Ex.No:08
Date: HADOOP SINGLE NODE CLUSTER

Aim:
To set up a one-node Hadoop cluster and execute a word count program.
Procedure:
1)Installing Java:

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware.
Hadoop needs Java 6 or above to work.

Step 1: Download the JDK tar.gz file for Linux 64-bit and extract it into "/opt":
boss@solaiv[]# cd /opt
boss@solaiv[]# sudo tar xvpzf /home/itadmin/Downloads/jdk-8u5-linux-x64.tar.gz
boss@solaiv[]# cd /opt/jdk1.8.0_05

Step 2:
Open the "/etc/profile" file and add the following lines (as per the version) to set the environment for Java.
Use the root user to save /etc/profile, or use gedit instead of vi.
The 'profile' file contains commands that ought to be run for login shells.

boss@solaiv[]# sudo vi /etc/profile

#--insert JAVA_HOME
JAVA_HOME=/opt/jdk1.8.0_05
#--in the PATH variable, just append at the end of the line
PATH=$PATH:$JAVA_HOME/bin
#--append JAVA_HOME at the end of the export statement
export PATH JAVA_HOME

Save the file by pressing the "Esc" key followed by :wq!

Step 3: Source the /etc/profile:
boss@solaiv[]# source /etc/profile

Step 4: Update the Java alternatives.
By default the OS will have OpenJDK. Check with "java -version"; you will be prompted with "openJDK".
If you also have OpenJDK installed, you'll need to update the Java alternatives. If your system has more than
one version of Java, configure which one your system uses by entering the following commands in a terminal
window. Afterwards, "java -version" should report "Java HotSpot(TM) 64-Bit Server".

boss@solaiv[]# update-alternatives --install "/usr/bin/java" java "/opt/jdk1.8.0_05/bin/java" 1
boss@solaiv[]# update-alternatives --config java
--type selection number:
boss@solaiv[]# java -version
2) Configure SSH

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to
use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we
therefore need to configure SSH access to localhost.

The need for password-less SSH key based authentication is so that the master node can log in to slave
nodes (and the secondary node) to start/stop them easily without any delays for authentication.

If you skip this step, then you will have to provide a password each time.

Generate an SSH key for the user, then enable password-less SSH access to your local machine:
sudo apt-get install openssh-server
--You will be asked to enter a password
root@solaiv[]# ssh localhost
root@solaiv[]# ssh-keygen
root@solaiv[]# ssh-copy-id -i localhost
--After the above 2 steps, you will be connected without a password
root@solaiv[]# ssh localhost
root@solaiv[]# exit

3) Hadoop installation

Now download Hadoop from the official Apache site, preferably a stable release version of Hadoop 2.7.x, and
extract the contents of the Hadoop package to a location of your choice.

We chose the location "/opt/".

Step 1: Download the tar.gz file of the latest version of Hadoop (hadoop-2.7.x) from the official site.
Step 2: Extract (untar) the downloaded file with these commands to /opt:

root@solaiv[]# cd /opt
root@solaiv[/opt]# sudo tar xvpzf /home/itadmin/Downloads/hadoop-2.7.0.tar.gz
root@solaiv[/opt]# cd hadoop-2.7.0/

Like Java, update the Hadoop environment variables in /etc/profile:

boss@solaiv[]# sudo vi /etc/profile

#--insert HADOOP_PREFIX
HADOOP_PREFIX=/opt/hadoop-2.7.0
#--in the PATH variable, just append at the end of the line
PATH=$PATH:$HADOOP_PREFIX/bin
#--append HADOOP_PREFIX at the end of the export statement
export PATH JAVA_HOME HADOOP_PREFIX

Save the file by pressing the "Esc" key followed by :wq!


Step 3: Source the /etc/profile:
boss@solaiv[]# source /etc/profile

Verify the Hadoop installation:
boss@solaiv[]# cd $HADOOP_PREFIX
boss@solaiv[]# bin/hadoop version

Modify the Hadoop configuration files

Add the following properties to the various Hadoop configuration files, which are available under
$HADOOP_PREFIX/etc/hadoop/:
core-site.xml, hdfs-site.xml, mapred-site.xml & yarn-site.xml

Update the Java and Hadoop paths in the Hadoop environment file:
boss@solaiv[]# cd $HADOOP_PREFIX/etc/hadoop
boss@solaiv[]# vi hadoop-env.sh

Paste the following lines at the beginning of the file:
export JAVA_HOME=/opt/jdk1.8.0_05
export HADOOP_PREFIX=/opt/hadoop-2.7.0

Modify the core-site.xml:
boss@solaiv[]# cd $HADOOP_PREFIX/etc/hadoop
boss@solaiv[]# vi core-site.xml
Paste the following between the <configuration> tags:
<configuration>

<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Modify the hdfs-site.xml:
boss@solaiv[]# vi hdfs-site.xml
Paste the following between the <configuration> tags:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

YARN configuration - single node. Modify the mapred-site.xml:
boss@solaiv[]# cp mapred-site.xml.template mapred-site.xml
boss@solaiv[]# vi mapred-site.xml
Paste the following between the <configuration> tags:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Modify yarn-site.xml:
boss@solaiv[]# vi yarn-site.xml
Paste the following between the <configuration> tags:
<configuration>
<property><name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value></property>
</configuration>
Formatting the HDFS file-system via the NameNode
The first step to starting up your Hadoop installation is formatting the Hadoop file system, which is
implemented on top of the local file system of our "cluster" (which includes only our local machine). We need
to do this the first time you set up a Hadoop cluster.

Do not format a running Hadoop file system, as you will lose all the data currently in the cluster (in HDFS).

root@solaiv[]# cd $HADOOP_PREFIX
root@solaiv[]# bin/hadoop namenode -format

Start the NameNode daemon and DataNode daemon (port 50070):
root@solaiv[]# sbin/start-dfs.sh

To know the running daemons, just type jps or /usr/local/jdk1.8.0_05/bin/jps

Start the ResourceManager daemon and NodeManager daemon (port 8088):
root@solaiv[]# sbin/start-yarn.sh

To stop the running processes:
root@solaiv[]# sbin/stop-dfs.sh
root@solaiv[]# sbin/stop-yarn.sh

Sample installation session:

Update APT

test@cs-88:~$ sudo apt-get update

E: Unable to lock directory /var/lib/apt/lists/

Check background processes:

test@cs-88:~$ ps -ef |grep apt

root 2292 2285 0 11:11 ? 00:00:00 /bin/sh /etc/cron.daily/apt

root 2612 2292 0 11:30 ? 00:00:00 apt-get -qq -y update

root 2615 2612 0 11:30 ? 00:00:00 /usr/lib/apt/methods/http

root 2616 2612 0 11:30 ? 00:00:00 /usr/lib/apt/methods/http

root 2617 2612 0 11:30 ? 00:00:00 /usr/lib/apt/methods/http

root 2619 2612 0 11:30 ? 00:00:00 /usr/lib/apt/methods/gpgv

root 2627 2612 0 11:30 ? 00:00:01 /usr/lib/apt/methods/bzip2

test 2829 2813 0 11:36 pts/0 00:00:00 grep --color=auto apt

Kill background processes:

test@cs-88:~$ sudo kill -9 2292 2612 2615 2616 2617 2619 2627 2829

Update apt:

test@cs-88:~$ sudo apt-get update

Git installation:

root@cs-88:~# sudo apt-get install git

Clone :

root@cs-88:~# git clone https://git.openstack.org/openstack-dev/devstack

root@cs-88:~# ls devstack

root@cs-88:~# cd devstack

root@cs-88:~/devstack# nano local.conf

[[local|localrc]]
HOST_IP=192.168.4.88
FLOATING_RANGE=192.168.1.224/27
FIXED_RANGE=10.11.12.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
ADMIN_PASSWORD=linux
DATABASE_PASSWORD=linux
RABBIT_PASSWORD=linux
SERVICE_TOKEN=linux

Save the nano file:
Ctrl+X

Hadoop installation:

stack@cs-88:~/Downloads$ sudo scp -r * /opt/
stack@cs-88:~/Downloads$ ls /opt/

test@cs-88:/opt$ ls
hadoop-2.7.0.tar.gz  Hadoop Pseudo-Node.pdf  HDFSCommands.pdf  jdk-8u60-linux-x64.gz

test@cs-88:/opt$ ll
total 383412

drwxr-xr-x 2 root root 4096 May 6 15:41 ./

drwxr-xr-x 23 root root 4096 May 6 16:34 ../

-rw-r--r-- 1 root root 210343364 May 6 15:41 hadoop-2.7.0.tar.gz

-rw-r--r-- 1 root root 159315 May 6 15:41 Hadoop Pseudo-Node.pdf

-rw-r--r-- 1 root root 43496 May 6 15:41 HDFSCommands.pdf

-rw-r--r-- 1 root root 181238643 May 6 15:41 jdk-8u60-linux-x64.gz

-rw-r--r-- 1 root root 402723 May 6 15:41 mrsampledata(1).tar.gz

-rw-r--r-- 1 root root 402723 May 6 15:41 mrsampledata.tar.gz

Change ownership from the root user to the test user:

test@cs-88:/opt$ sudo chown -Rh test:test /opt/
test@cs-88:/opt$ ll    (display the file list with permissions)
total 383412

drwxr-xr-x 2 test test 4096 May 6 15:41 ./

drwxr-xr-x 23 root root 4096 May 6 16:34 ../

-rw-r--r-- 1 test test 210343364 May 6 15:41 hadoop-2.7.0.tar.gz

-rw-r--r-- 1 test test 159315 May 6 15:41 Hadoop Pseudo-Node.pdf

-rw-r--r-- 1 test test 43496 May 6 15:41 HDFSCommands.pdf

-rw-r--r-- 1 test test 181238643 May 6 15:41 jdk-8u60-linux-x64.gz

-rw-r--r-- 1 test test 402723 May 6 15:41 mrsampledata(1).tar.gz

-rw-r--r-- 1 test test 402723 May 6 15:41 mrsampledata.tar.gz
test@cs-88:/opt$

Unzip Java:

test@cs-88:/opt$ tar -zxvf jdk-8u60-linux-x64.gz
test@cs-88:/opt$ cd jdk1.8.0_60
test@cs-88:/opt/jdk1.8.0_60$ pwd
/opt/jdk1.8.0_60

Set the profile for Java:

test@cs-88:/opt/jdk1.8.0_60$ sudo nano /etc/profile

JAVA_HOME=/opt/jdk1.8.0_60
HADOOP_PREFIX=/opt/hadoop-2.7.0
PATH=$PATH:$JAVA_HOME/bin
PATH=$PATH:$HADOOP_PREFIX/bin
export PATH JAVA_HOME HADOOP_PREFIX

Save: Ctrl+X, press Y, press the Enter key.

test@cs-88:/opt/jdk1.8.0_60$ cd ..
test@cs-88:/opt$ pwd
/opt

Unzip the Hadoop file:

test@cs-88:/opt$ tar -zxvf hadoop-2.7.0.tar.gz
test@cs-88:/opt$ source /etc/profile

Java Version

test@cs-88:/opt$ java -version

java version "1.8.0_60"

Java(TM) SE Runtime Environment (build 1.8.0_60-b27)

Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

SSH key generation:

test@cs-88:/opt$ ssh-keygen
Generating public/private rsa key pair.

Enter file in which to save the key (/home/test/.ssh/id_rsa):
Created directory '/home/test/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Your identification has been saved in /home/test/.ssh/id_rsa.

Your public key has been saved in /home/test/.ssh/id_rsa.pub. The key fingerprint is:
c6:f4:33:42:4d:87:fb:3a:72:29:e9:5b:ce:ee:e9:e4 test@cs-88

The key's randomart image is:

+-- [ RSA 2048] --- +

| ... |
| o.. |
| o .. |
| + .. |
| S +. |
| . . o. |
| .oo |
| +*=. |
| .oOE. |
+……….+

test@cs-88:/opt$

Configure SSH:

test@cs-88:/opt$ ssh-copy-id -i localhost


/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

test@cs-88:/opt$ sudo apt-get install openssh-server
test@cs-88:/opt$ ssh-copy-id -i localhost

The authenticity of host 'localhost (127.0.0.1)' can't be established.

ECDSA key fingerprint is 67:12:a1:69:99:ea:b7:b7:96:b1:f5:4a:29:b5:d0:29.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that

are already installed

/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new
keys

test@localhost's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'localhost'" and check to make sure that only the key(s) you
wanted were added.

test@cs-88:/opt$

Configure Hadoop

Verify Hadoop installation

test@cs-88:/opt$ cd $HADOOP_PREFIX
test@cs-88:/opt/hadoop-2.7.0$ pwd
/opt/hadoop-2.7.0

test@cs-88:/opt/hadoop-2.7.0$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

test@cs-88:/opt/hadoop-2.7.0$ bin/hadoop version
Hadoop 2.7.0

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r d4c8d4d4d203c934e8074b31289a28724c0842cf
Compiled by jenkins on 2015-04-10T18:40Z

Compiled with protoc 2.5.0

From source with checksum a9e90912c37a35c3195d23951fd18f

This command was run using /opt/hadoop-2.7.0/share/hadoop/common/hadoop-common- 2.7.0.jar

test@cs-88:/opt/hadoop-2.7.0$

Update the Java and Hadoop paths in the Hadoop environment file:

test@cs-88:/opt/hadoop-2.7.0$ cd $HADOOP_PREFIX/etc/hadoop
test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ pwd
/opt/hadoop-2.7.0/etc/hadoop

test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ nano hadoop-env.sh

Type at the end:
export JAVA_HOME=/opt/jdk1.8.0_60
export HADOOP_PREFIX=/opt/hadoop-2.7.0

test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ nano core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ nano hdfs-site.xml

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml
test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ nano mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

YARN (Yet Another Resource Negotiator):

test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ nano yarn-site.xml

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>

</property>

</configuration>

test@cs-88:/opt/hadoop-2.7.0/etc/hadoop$ cd $HADOOP_PREFIX
test@cs-88:/opt/hadoop-2.7.0$ bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

16/05/07 09:24:13 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************
images with txid >= 0

16/05/07 09:24:14 INFO util.ExitUtil: Exiting with status 0

16/05/07 09:24:14 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cs-88/127.0.1.1

************************************************************/

test@cs-88:/opt/hadoop-2.7.0$

test@cs-88:/opt/hadoop-2.7.0$ sbin/start-dfs.sh
Starting namenodes on [localhost]

localhost: starting namenode, logging to /opt/hadoop-2.7.0/logs/hadoop-test- namenode-cs-88.out

localhost: starting datanode, logging to /opt/hadoop-2.7.0/logs/hadoop-test-datanode- cs-88.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 67:12:a1:69:99:ea:b7:b7:96:b1:f5:4a:29:b5:d0:29.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts. 0.0.0.0: starting
secondarynamenode, logging to /opt/hadoop-2.7.0/logs/hadoop-test- secondarynamenode-cs-88.out

test@cs-88:/opt/hadoop-2.7.0$

jps - Java Virtual Machine Process Status tool:
test@cs-88:/opt/hadoop-2.7.0$ jps
5667 DataNode

5508 NameNode

5863 SecondaryNameNode

5997 Jps

test@cs-88:/opt/hadoop-2.7.0$

test@cs-88:/opt/hadoop-2.7.0$ sbin/start-yarn.sh
starting yarn daemons

starting resourcemanager, logging to /opt/hadoop-2.7.0/logs/yarn-test-resourcemanager- cs-88.out

localhost: starting nodemanager, logging to /opt/hadoop-2.7.0/logs/yarn-test- nodemanager-cs-88.out

test@cs-88:/opt/hadoop-2.7.0$

/* now getting 6 Services */

test@cs-88:/opt/hadoop-2.7.0$ jps
5667 DataNode

6084 ResourceManager

5508 NameNode

5863 SecondaryNameNode

6218 NodeManager

6524 Jps

test@cs-88:/opt/hadoop-2.7.0$

test@cs-88:/opt/hadoop-2.7.0$ bin/hdfs dfs -put /opt/file1.txt /user

test@cs-88:/opt/hadoop-2.7.0$

CREATING DIRECTORY
test@cs-88:/opt/hadoop-2.7.0$ bin/hdfs dfs -mkdir /user
test@cs-88:/opt/hadoop-2.7.0$ bin/hdfs dfs -mkdir /exs

test@cs-88:/opt/hadoop-2.7.0$ cd ..
test@cs-88:/opt$ ls
hadoop-2.7.0  hadoop-2.7.0.tar.gz  Hadoop Pseudo-Node.pdf  HDFSCommands.pdf  jdk1.8.0_60  jdk-8u60-linux-x64.gz  mrsampledata(1).tar.gz  mrsampledata.tar.gz
test@cs-88:/opt$ tar zxvf mrsampledata.tar.gz
file2.txt file5.txt file1.txt file4.txt file3.txt
test@cs-88:/opt$
test@cs-88:/opt/hadoop-2.7.0$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep /user/ /op '(CSE)'

Word count program to demonstrate the use of Map and Reduce tasks

Procedure:
1. Format the path.
2. Start the dfs and check the no. of nodes running.
3. Start the yarn and check the no. of nodes running.
4. Open the browser and check whether the hadoop is installed correctly.
5. Add a file and check whether we can view the file.
6. Implement the grep command for the file added and see the result.
7. Implement the wordcount command for the file added and see the result.
8. After completing the process stop dfs and yarn properly.
Commands:
Install the Hadoop cluster by using the commands:

1. $ sudo chown -Rh gee:gee /opt/

2. $ nano yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

3. $ cd $HADOOP_PREFIX
4. $ bin/hadoop namenode -format
5. $ sbin/start-dfs.sh
6. $ jps

PROGRAM:
package hadoop;

import java.util.*;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ProcessUnits
{
    public static class E_EMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable>
    {
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            String line = value.toString();
            String lasttoken = null;
            StringTokenizer s = new StringTokenizer(line, "\t");
            String year = s.nextToken();
            while (s.hasMoreTokens()) {
                lasttoken = s.nextToken();
            }
            int avgprice = Integer.parseInt(lasttoken);
            output.collect(new Text(year), new IntWritable(avgprice));
        }
    }

    public static class E_EReduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException
        {
            int maxavg = 30;
            int val = Integer.MIN_VALUE;
            while (values.hasNext()) {
                if ((val = values.next().get()) > maxavg) {
                    output.collect(key, new IntWritable(val));
                }
            }
        }
    }

    public static void main(String args[]) throws Exception
    {
        JobConf conf = new JobConf(ProcessUnits.class);
        conf.setJobName("max_electricityunits");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(E_EMapper.class);
        conf.setCombinerClass(E_EReduce.class);
        conf.setReducerClass(E_EReduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
OUTPUT:

Result:
Thus the use of Map and Reduce tasks was successfully demonstrated with the word count program on a single-node Hadoop cluster.
ADDITIONAL PROGRAM

Ex.No:10
Date: CLOUD FOR HPC AND HTC

Aim:
To study the concepts of HPC and HTC in the cloud

High-performance Computing

It is the use of parallel processing for running advanced application programs efficiently, reliably, and
quickly. The term applies especially to systems that function above a teraflop, i.e. 10^12 floating-point
operations per second. High-performance computing is occasionally used as a synonym for supercomputing,
although technically a supercomputer is a system that performs at or near the currently highest operational
rate for computers; some supercomputers work at more than a petaflop, i.e. 10^15 floating-point operations
per second. The most common users of HPC systems are scientific researchers, engineers, and academic
institutions. Some government agencies, particularly the military, also rely on HPC for complex applications.

High-performance Computers:
Processors, memory, disks, and the OS are the elements of high-performance computers. The machines of
interest to small and medium-sized businesses today are really clusters of computers. Each individual
computer in a commonly configured small cluster has between one and four processors, and today's
processors typically have from 2 to 4 cores; HPC people often refer to the individual computers in a cluster
as nodes. A cluster of interest to a small business could have as few as 4 nodes, or 16 cores, and a common
cluster size in many businesses is between 16 and 64 nodes, or from 64 to 256 cores. The main reason to use
a cluster is that its individual nodes can work together to solve a problem larger than any one computer can
easily solve. The nodes are connected so that they can communicate with each other in order to produce
some meaningful work.
There are two popular operating systems for HPC clusters, Linux and Windows. Most installations run
Linux because of its supercomputing heritage, but the choice can be made according to one's requirements.

High-throughput computing
The HTC community is also concerned with robustness and reliability of jobs over a long-time scale.
That is, being able to create a reliable system from unreliable components. This research is similar to
transaction processing, but at a much larger and distributed scale.

Some HTC systems, such as HTCondor and PBS, can run tasks on opportunistic resources. It is a
difficult problem, however, to operate in this environment. On one hand the system needs to provide a reliable
operating environment for the user's jobs, but at the same time the system must not compromise the integrity
of the execute node, and must allow the owner to always have full control of their resources. There are many
differences between high-throughput computing, high-performance computing (HPC), and many-task
computing (MTC).

HPC tasks are characterized as needing large amounts of computing power for short periods of time,
whereas HTC tasks also require large amounts of computing, but for much longer times (months and years,
rather than hours and days).[1] HPC environments are often measured in terms of FLOPS.
The HTC community, however, is not concerned with operations per second, but rather with operations
per month or per year. Therefore, the HTC field is more interested in how many jobs can be completed over a
long period of time than in how fast individual jobs complete.

As an alternative definition, the European Grid Infrastructure defines HTC as “a computing paradigm
that focuses on the efficient execution of a large number of loosely-coupled tasks”,[2] while HPC systems
tend to focus on tightly coupled parallel jobs, and as such they must execute within a particular site with
low-latency interconnects. Conversely, HTC workloads consist of independent, sequential jobs that can be
individually scheduled on many different computing resources across multiple administrative boundaries.
HTC systems achieve this using various grid computing technologies and techniques.

MTC aims to bridge the gap between HTC and HPC. MTC is reminiscent of HTC, but it differs in the
emphasis of using many computing resources over short periods of time to accomplish many computational
tasks (i.e. including both dependent and independent tasks), where the primary metrics are measured in
seconds (e.g. FLOPS, tasks/s, MB/s I/O rates), as opposed to operations (e.g. jobs) per month. MTC denotes
high-performance computations comprising multiple distinct activities, coupled via file system operations.

Modifications to Slurm

Slurm is a workload manager for HPC clusters:
• It manages resources and job scheduling.
• It marks a node DOWN and removes the jobs of an unreachable node.
• It does the same for a suspended virtual node.
Slurm was modified to manage the suspended node and keep the job states intact.

Result:

Thus the concepts of HPC and HTC were studied successfully.

Ex.No:11
Date: PERFORMANCE OF CLOUD

Aim:
To study the concept of Cloud Performance

Cloud Performance

The success of cloud deployments is highly dependent on practicing holistic performance engineering and
capacity management techniques. A majority of the obstacles to the adoption and growth of cloud computing
are related to basic performance aspects such as availability, performance, capacity, and scalability. Please
refer to Table 1 for the details of these obstacles and opportunities.

 Potential cloud solutions to overcome these obstacles need to be carefully assessed for how well they
hold up in real-life situations.
 Performance engineers need to understand the technical workings of the underlying cloud
services before advising cloud computing users and cloud computing providers on those services.
 The degree to which cloud services can meet agreed service-level requirements for availability,
performance, and scalability can be estimated by using performance modeling techniques, so that
potential performance anti-patterns can be detected before they occur.
 In the absence of sophisticated tooling for automated monitoring, the automatic provisioning and
usage-based costing (metering) facilities rely mainly on fine-grained capacity management.
Until more data collection, analysis, and forecasting are in place, capacity management is more
opportune than ever.
 Irrespective of sophisticated tooling for automated monitoring, cloud computing users need to analyze
their demand for capacity and their requirements for performance. In their contracts with cloud
computing providers, users should always take a bottom-line approach to accurately formulate
their service-level requirements.

Performance monitoring can itself be offered as-a-service

With new technologies and middleware platforms such as distributed file systems, MongoDB databases, and
search platforms, as well as very heavy systems processing Big Data, there is a constant need for
performance monitoring and analysis techniques to be developed. These monitoring and analysis
techniques need to ensure that performance metrics can be obtained, analyzed, and understood in the context
of these new technologies.

 New centralized monitoring techniques will be required specifically for these new technologies and
middleware platforms.
 The solution is to devise new as-a-service for monitoring and management of the cloud.

 This means that tool providers will centrally store monitoring data from large numbers of customers'
systems, which also opens new opportunities in terms of data analytics.

Automated Resource Utilization Monitoring and Analysis

A new requirement coming in with cloud technologies and middleware platforms is the availability of
automated utilization metrics. These metrics should be efficiently collected and properly understood (a toy
collection sketch follows the list below).

 A major challenge for cloud providers is to centrally monitor how the hardware is being utilized as the
load on the system changes. This is required to harness the power of existing hardware and maximize
the efficiency of cloud infrastructures.
 Lots of research is being conducted in this area of utilization analysis in the context of different
software workloads. Such analysis can be applied, say, to maximize system utilization through workload
relocation or to increase energy efficiency.
 Analysis and monitoring can dramatically reduce the costs of services by providing more cost-
optimized cloud platforms and services.
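
As a toy illustration of automated utilization collection, the sketch below samples CPU and physical-memory figures at a fixed interval using the JDK's com.sun.management.OperatingSystemMXBean; it is a single-node sketch, and a real cloud monitor would ship such samples to a central service rather than print them.

import java.lang.management.ManagementFactory;

public class UtilizationSampler {
    public static void main(String[] args) throws InterruptedException {
        // Platform MX bean exposing OS-level CPU and memory readings
        com.sun.management.OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
        while (true) {
            double cpu = os.getSystemCpuLoad();            // system CPU load in [0.0, 1.0], or -1.0 if unavailable
            long free = os.getFreePhysicalMemorySize();    // free physical memory, bytes
            long total = os.getTotalPhysicalMemorySize();  // total physical memory, bytes
            double memUsed = 1.0 - (double) free / total;  // fraction of memory in use
            System.out.printf("cpu=%.2f mem=%.2f%n", cpu, memUsed);
            Thread.sleep(5000);                            // sample every 5 seconds
        }
    }
}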

Horizontal Scaling

Horizontal scaling, or scale out, usually refers to clustering multiple independent computers together to
provide more processing power. This type of scaling typically implies multiple instances of operating systems,
residing on separate servers.

 SaaS requires very dynamic horizontal scaling, i.e. the ability to quickly scale out and back in during
times of differing workloads.
 Performance considerations such as scalability and reliability are an important area for SaaS systems.
They can be even more challenging on large-scale SaaS systems with large numbers of components.
 For dynamic scale-out, hardware resources may be immediately available, but at a significant financial
cost, and scaling out may not be an effective or elegant solution if the design is inefficient.
 The financial cost implications become even worse in SaaS systems: in non-SaaS systems the cost
of running inefficient software was capped by the hardware resources available in-house (generally the
CapEx expenses). In the cloud this is no longer the case, and developers and designers are now closer
to the financial costs associated with running their software. Thus software must be designed
responsibly with respect to performance so that efficient usage of the cloud is attained.
 Autonomic management of systems has been a growing area of research over the past decade.
Automatic scaling based on alerting and user-defined thresholds is available today from as-a-service
providers, so that the system scales on demand (a minimal decision sketch follows this list).
 Pre-defined performance non-functional requirements (NFRs) and service-level agreements (SLAs),
workload modeling, user load planning for minimum, average, and maximum users, and scalability
testing are the best ways to proactively handle performance issues. Table 2 below shows typical
cloud SLAs and KPIs that are used to assess SLA attainment.
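
A minimal sketch of the threshold-driven automatic scaling described above is given below; the CloudProvider interface is hypothetical and stands in for a real provider API, and the two thresholds are examples of user-defined values.

/** Hypothetical provider hook; a real system would call a cloud provider API here. */
interface CloudProvider {
    int instanceCount();
    void addInstances(int n);
    void removeInstances(int n);
}

public class ThresholdScaler {
    private final CloudProvider provider;
    private final double scaleOutAt;  // e.g. 0.80: scale out above 80% average utilization
    private final double scaleInAt;   // e.g. 0.30: scale in below 30% average utilization
    private final int minInstances;   // never shrink below this floor

    public ThresholdScaler(CloudProvider provider, double scaleOutAt,
                           double scaleInAt, int minInstances) {
        this.provider = provider;
        this.scaleOutAt = scaleOutAt;
        this.scaleInAt = scaleInAt;
        this.minInstances = minInstances;
    }

    /** Called periodically with the cluster's average utilization in [0, 1]. */
    public void onSample(double avgUtilization) {
        if (avgUtilization > scaleOutAt) {
            provider.addInstances(1);        // scale out one step
        } else if (avgUtilization < scaleInAt
                && provider.instanceCount() > minInstances) {
            provider.removeInstances(1);     // scale in, respecting the floor
        }
    }
}

Real autoscalers typically add a cooldown period between scaling actions so that the system does not oscillate around a threshold.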

Advanced Data Analytics for better performance

An individual enterprise may produce terabytes of log data per month, which can contain millions of
events per second. The techniques for gathering monitoring data have improved greatly through the
development of performance tools for in-house enterprise systems; however, the analysis of the large volume
of data collected is still a major challenge.

 There is a pressing need for efficient log analytics systems in the cloud environment, and new cloud
technologies such as log management as-a-service are surfacing to meet these challenges.
 A log management as-a-service technology handling log analysis for large numbers of enterprises
must be able to manage millions of events per second, performing visualization, analysis, and alerting
in real time to allow for autonomic management of the system (a minimal alerting sketch follows this list).
 Cloud has produced new challenges due to the larger scale of systems and the much larger volumes of
data produced by these systems. Real-time analytics is a growing area and poses the challenge of
analyzing upwards of millions of events per second under real-time constraints.
 Real-time analytics can be a big aid for performance monitoring; this is another emerging rich area of
research. Real-time analytics with time constraints will certainly enhance the performance
management of cloud-based systems.
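
A minimal sketch of such real-time threshold alerting over a stream of log events is given below; it counts events in one-second windows on a single node, whereas a production log-analytics service would run this as a distributed streaming job. The threshold value and class names are illustrative.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class EventRateAlert {
    private final AtomicLong counter = new AtomicLong();
    private final long alertThreshold;  // events per second that triggers an alert

    public EventRateAlert(long alertThreshold) {
        this.alertThreshold = alertThreshold;
        // Check the rate once per second and reset the window counter
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::tick, 1, 1, TimeUnit.SECONDS);
    }

    /** Called by ingest threads for every incoming log event. */
    public void onEvent() {
        counter.incrementAndGet();
    }

    private void tick() {
        long perSecond = counter.getAndSet(0);  // read and reset the one-second window
        if (perSecond > alertThreshold) {
            System.out.println("ALERT: " + perSecond + " events/s");
        }
    }
}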

Result:

Thus the concepts of Cloud Performance were studied successfully.
