Cloud Computing Unit-5

Unit-5

MapReduce in Hadoop: What is MapReduce?

MapReduce is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the heart of Apache Hadoop. The term "MapReduce" refers to two separate and distinct tasks that Hadoop programs perform: the map task, which converts a set of data into intermediate key/value pairs, and the reduce task, which takes the map output and combines those pairs into a smaller, aggregated set of results.

MapReduce programming offers several benefits to help you gain valuable insights from your big data:

 Scalability. Businesses can process petabytes of data stored in the Hadoop Distributed File System (HDFS).
 Flexibility. Hadoop enables easier access to multiple sources of data and multiple types of data.
 Speed. With parallel processing and minimal data movement, Hadoop offers fast processing of massive amounts of data.
 Simplicity. Developers can write code in a choice of languages, including Java, C++ and Python.

An example of MapReduce
This is a very simple example of MapReduce. No matter the amount of data you need
to analyze, the key principles remain the same.

Assume you have five files, and each file contains two columns (a key and a value in
Hadoop terms) that represent a city and the corresponding temperature recorded in
that city for the various measurement days. The city is the key, and the temperature is
the value. For example: (Toronto, 20). Out of all the data we have collected, you want
to find the maximum temperature for each city across the data files (note that each file
might have the same city represented multiple times).

Using the MapReduce framework, you can break this down into five map tasks, where
each mapper works on one of the five files. The mapper task goes through the data and
returns the maximum temperature for each city.

For example, the results produced from one mapper task for the data above would look like this: (Toronto, 20) (Whitby, 25) (New York, 22) (Rome, 33). The outputs of all five map tasks are then fed into the reduce task, which combines the per-file results and outputs a single value for each city, giving the final maximum temperature for every city across all five files.
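
The logic can be sketched in plain Python. This is illustrative only, not the Hadoop API: a real Hadoop job would typically implement Mapper and Reducer classes in Java or use Hadoop Streaming, and the file contents below are made-up sample data.

    # Plain-Python sketch of the map and reduce steps described above.
    def map_max_temperature(lines):
        """Map task: scan one file's "city,temperature" records and
        emit the maximum temperature seen for each city in that file."""
        local_max = {}
        for line in lines:
            city, temp = line.split(",")
            temp = int(temp)
            if city not in local_max or temp > local_max[city]:
                local_max[city] = temp
        return local_max.items()

    def reduce_max_temperature(mapper_outputs):
        """Reduce task: combine the per-file maxima from every mapper
        into a single global maximum per city."""
        global_max = {}
        for pairs in mapper_outputs:
            for city, temp in pairs:
                if city not in global_max or temp > global_max[city]:
                    global_max[city] = temp
        return global_max

    # One simulated "file" per mapper, mirroring the example in the text.
    files = [
        ["Toronto,20", "Whitby,25", "New York,22", "Rome,33"],
        ["Toronto,18", "Whitby,27", "New York,32", "Rome,37"],
    ]
    print(reduce_max_temperature(map_max_temperature(f) for f in files))
    # {'Toronto': 20, 'Whitby': 27, 'New York': 32, 'Rome': 37}
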
VirtualBox:

Oracle VM VirtualBox is defined as a tool for virtualizing x86 and AMD64/Intel64 computing architecture, enabling users to deploy desktops, servers, and operating systems as virtual machines. One can use this solution to deploy as many virtual machines as the host architecture has the resources for.

VirtualBox can extend the technical capabilities of any compatible computer, enabling it to run multiple operating systems in different virtual machines at once. For instance, a user can run Windows 11 and Ubuntu on their MacBook Air without compromising their existing system configuration or applications.

VirtualBox allows any system to install and operate as many virtual machines as its memory and disk space allow. Depending on the host system's configuration, one can use this solution to deploy anything from desktop-class machines and small embedded systems to cloud environments and large datacenter deployments.
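
Creating VMs can also be scripted. The sketch below shells out from Python to VBoxManage, VirtualBox's real command-line front end; the VM name, memory size, and OS type are illustrative choices, and exact flag behavior may vary across VirtualBox versions.

    import subprocess

    VM_NAME = "ubuntu-demo"  # hypothetical VM name

    def run(args):
        """Run one VBoxManage command and fail loudly if it errors."""
        subprocess.run(["VBoxManage", *args], check=True)

    # Create and register a new VM, give it RAM and CPUs, then boot it
    # without opening a GUI window.
    run(["createvm", "--name", VM_NAME, "--ostype", "Ubuntu_64", "--register"])
    run(["modifyvm", VM_NAME, "--memory", "2048", "--cpus", "2"])
    run(["startvm", VM_NAME, "--type", "headless"])
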
Google App Engine in Cloud Computing

App Engine is a fully managed, serverless platform for developing and hosting web applications at scale. You can choose from several popular languages, libraries, and frameworks to develop your apps, and then let App Engine take care of provisioning servers and scaling your app instances based on demand.

Google App Engine (GAE) is a platform-as-a-service product that provides web app developers and enterprises with access to Google's scalable hosting and tier 1 internet service.

Originally, GAE required that applications be written in Java or Python, store data in Google Bigtable, and use the Google query language; noncompliant applications required modification to use GAE.
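
A minimal App Engine Python app, modeled on Google's Flask-based quickstart, looks like the sketch below. Deployment also needs an app.yaml file beside the code naming the runtime (the specific runtime version here is an assumption), and is then typically done with gcloud app deploy.

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        # App Engine routes HTTP requests to this WSGI app and scales
        # instances up and down automatically based on traffic.
        return "Hello from App Engine!"

    if __name__ == "__main__":
        # Local development only; in production App Engine runs the app
        # behind its own serving infrastructure.
        app.run(host="127.0.0.1", port=8080, debug=True)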

Programming Environment for Google App Engine in Cloud Computing

Google App Engine provides four possible runtime environments for applications, one for each of four programming languages: Java, Python, PHP, and Go. The environment you choose depends on the language and related technologies you want to use for developing the application.

When to choose the flexible environment

 Application instances run within Docker containers on Compute Engine virtual machines (VMs).
 Applications that receive consistent traffic, experience regular traffic fluctuations, or meet the parameters for scaling up and down gradually.

The flexible environment is optimal for applications with the following characteristics:

 Source code that is written in a version of any of the supported programming languages: Python, Java, Node.js, Go, Ruby, PHP, or .NET.
 Runs in a Docker container that includes a custom runtime or source code written in other programming languages (see the sketch after this list).
 Uses or depends on frameworks that include native code.
 Accesses the resources or services of your Google Cloud project that reside in the Compute Engine network.
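
Whatever language a flexible-environment container uses, the contract is that it serves HTTP on the port given by the PORT environment variable (8080 by default). A minimal sketch of that contract using only the Python standard library, with an illustrative response body:

    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Any response body works; the platform only requires that
            # the container answer HTTP requests on the expected port.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Hello from a custom runtime\n")

    if __name__ == "__main__":
        # App Engine injects the port via the PORT environment variable.
        port = int(os.environ.get("PORT", "8080"))
        HTTPServer(("0.0.0.0", port), Handler).serve_forever()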

OpenStack
OpenStack is a free, open standard cloud computing platform. It is mostly deployed
as infrastructure-as-a-service (IaaS) in both public and private clouds where virtual servers and other
resources are made available to users. The software platform consists of interrelated components
that control diverse, multi-vendor hardware pools of processing, storage, and networking resources
throughout a data center. Users manage it either through a web-based dashboard,
through command-line tools, or through RESTful web services.
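
As one illustration of the programmatic route, the snippet below is a hedged sketch using the openstacksdk Python library; the cloud name "mycloud" is a hypothetical entry in a local clouds.yaml credentials file.

    import openstack

    # Build a connection from credentials stored in clouds.yaml.
    conn = openstack.connect(cloud="mycloud")

    # List virtual servers managed by the Compute service (Nova).
    for server in conn.compute.servers():
        print(server.name, server.status)

    # List images available from the Image service (Glance).
    for image in conn.image.images():
        print(image.name)
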
OpenStack began in 2010 as a joint project of Rackspace Hosting and NASA. As of 2012, it was
managed by the OpenStack Foundation, a non-profit corporate entity established in September
2012 to promote OpenStack software and its community. By 2018, more than 500 companies had
joined the project. In 2020 the foundation announced it would be renamed the Open Infrastructure
Foundation in 2021.
Federation in the Cloud:

Cloud Federation, also known as Federated Cloud, is the deployment and management of several external and internal cloud computing services to match business needs. It is a multi-national cloud system that integrates private, community, and public clouds into scalable computing platforms. A federated cloud is created by connecting the cloud environments of different cloud providers using a common standard.

The architecture of Federated Cloud:

The architecture of Federated Cloud consists of three basic components:

1. Cloud Exchange
The Cloud Exchange acts as a mediator between the cloud coordinator and the cloud broker. The demands of the cloud broker are mapped by the cloud exchange to the available services provided by the cloud coordinator. The cloud exchange keeps track of the current costs, demand patterns, and available cloud providers, and this information is periodically updated by the cloud coordinator.
2. Cloud Coordinator
The cloud coordinator assigns the resources of the cloud to the remote users based on the quality of service they demand and the credits they have in the cloud bank. The cloud enterprises and their membership are managed by the cloud coordinator.
3. Cloud Broker
The cloud broker interacts with the cloud coordinator and analyzes the service-level agreements and the resources offered by several cloud providers in the cloud exchange. The cloud broker finalizes the most suitable deal for its client.
[Figure: Federated Cloud architecture]
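
The interaction among the three components can be sketched as a toy model. The Python below is purely illustrative, with made-up class and method names and prices; real federation middleware is far more involved.

    class CloudCoordinator:
        """Publishes one provider's available services and current prices."""
        def __init__(self, provider, offers):
            self.provider = provider
            self.offers = offers  # e.g. {"vm.small": 0.05} price per hour

    class CloudExchange:
        """Mediator: maps broker demands onto coordinator offers."""
        def __init__(self):
            self.coordinators = []

        def register(self, coordinator):
            self.coordinators.append(coordinator)

        def cheapest_offer(self, service):
            # Track current costs across providers and pick the lowest.
            candidates = [
                (c.offers[service], c.provider)
                for c in self.coordinators if service in c.offers
            ]
            return min(candidates) if candidates else None

    class CloudBroker:
        """Finds the most suitable deal for its client via the exchange."""
        def __init__(self, exchange):
            self.exchange = exchange

        def best_deal(self, service):
            return self.exchange.cheapest_offer(service)

    exchange = CloudExchange()
    exchange.register(CloudCoordinator("ProviderA", {"vm.small": 0.05}))
    exchange.register(CloudCoordinator("ProviderB", {"vm.small": 0.04}))
    print(CloudBroker(exchange).best_deal("vm.small"))  # (0.04, 'ProviderB')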

Properties of Federated Cloud:

1. In the federated cloud, the users can interact with the architecture
either centrally or in a decentralized manner. In centralized interaction,
the user interacts with a broker to mediate between them and the
organization. Decentralized interaction permits the user to interact
directly with the clouds in the federation.
2. A federated cloud can serve various niches, both commercial and
non-commercial.
3. The visibility of a federated cloud helps the user understand how
the several clouds in the federated environment are organized.
4. Federated cloud can be monitored in two ways. MaaS (Monitoring as a
Service) provides information that aids in tracking contracted services
to the user. Global monitoring aids in maintaining the federated cloud.
5. The providers who participate in the federation publish their offers to a
central entity. The user interacts with this central entity to verify the
prices and propose an offer.
6. Marketed objects such as infrastructure, software, and platform have
to pass through the federation when consumed in the federated cloud.
Benefits of Federated Cloud:

1. It minimizes the consumption of energy.
2. It increases reliability.
3. It minimizes the time and cost of providers due to dynamic scalability.
4. It connects various cloud service providers globally. The providers
may buy and sell services on demand.
5. It provides easy scaling up of resources.

Challenges in Federated Cloud:

1. In cloud federation, it is common to have more than one provider for
processing the incoming demands. In such cases, a scheme is needed to
distribute the incoming demands equally among the cloud service
providers (a small scheduling sketch follows this list).
2. The increasing requests in cloud federation have resulted in a more
heterogeneous infrastructure, making interoperability an area of
concern. It becomes a challenge for cloud users to select relevant
cloud service providers, and this tends to tie them to a particular
cloud service provider.
3. A federated cloud means constructing a seamless cloud environment
that can interact with people, different devices, several application
interfaces, and other entities.
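
The load-distribution scheme mentioned in challenge 1 could take many forms; the sketch below shows one simple possibility, a least-loaded policy that assigns each incoming demand to the provider currently handling the fewest requests. The policy choice and provider names are assumptions for illustration.

    import heapq

    def distribute(demands, providers):
        """Assign each demand to the least-loaded provider."""
        # Heap of (current_load, provider_name), smallest load first.
        heap = [(0, p) for p in providers]
        heapq.heapify(heap)
        assignment = {}
        for demand in demands:
            load, provider = heapq.heappop(heap)
            assignment[demand] = provider
            heapq.heappush(heap, (load + 1, provider))
        return assignment

    demands = [f"request-{i}" for i in range(5)]
    print(distribute(demands, ["ProviderA", "ProviderB", "ProviderC"]))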

Federated Cloud technologies:

The technologies that aid cloud federation and cloud services are:

1. OpenNebula
It is a cloud computing platform for managing heterogeneous distributed data center infrastructures. It emphasizes interoperability, letting organizations leverage existing information technology assets, and it exposes application programming interfaces (APIs) for integration.
2. Aneka coordinator
The Aneka coordinator is a combination of the Aneka services and Aneka peer components (network architectures) that gives a cloud the ability to interact with other cloud services.
3. Eucalyptus
Eucalyptus pools computational, storage, and network resources that can be measured and scaled up or down as application workloads change. It is an open-source framework that provides storage, network, and many other computational resources for building a cloud environment.

Four Levels of Federation:

 Level One: Integration Between Identity Provider and Service Provider.
 Level Two: Multiple Federation Protocols and IDP Chaining.
 Level Three: Multi-Factor Authentication.
 Level Four: Session Management and IDP Proxying.

Federated Services and Applications:

You should properly plan your environment and ensure that the business requirements will be met by your proposed solution. For example, if you want to provide SSO for an extranet application in your perimeter network, you will need to ensure that your design includes an AD forest and ADFS servers in the perimeter network. You will also need to ensure that the applications support claims-based authentication using ADFS. After you document business requirements, you can begin designing your deployment. Figure 4.63 depicts an ADFS deployment with an application installed in the perimeter network. ADFS in this design is providing SSO for corporate users with existing user accounts in an internal AD forest.
ADFS has several prerequisites that must be met prior to deployment.
The prerequisites are:

PKI—ADFS requires certificates to secure communications
between two environments. Self-signed certificates can be used
for testing and lab purposes but should not be used in production
deployments.

Windows Server 2008 R2 Enterprise—ADFS servers require
Windows Server 2008 R2 Enterprise edition or greater.

AD Domains—ADFS requires that an AD domain exists on both
the account and resource side.

FS Web Agent installed on application server—The Web server
hosting the application will need the federation services Web
agent installed.
Future of Federation:

The next big evolution for the internet is Cloud Computing, where everyone from individuals to major corporations and governments moves their data storage and processing into remote data centres. Although Cloud Computing has grown, developed and evolved very rapidly over the last half decade, Cloud Federation remains an open issue in the current cloud market.

Cloud Federation would address many existing limitations in cloud computing:

- Cloud end-users are often tied to a single cloud provider because of the different APIs, image formats, and access methods exposed by different providers, which make it very difficult for an average user to move applications from one cloud to another, leading to a vendor lock-in problem.

- Many SMEs have their own on-premise private cloud infrastructures to support internal computing needs and workloads. These infrastructures are often over-sized to satisfy peak demand periods and avoid performance slow-downs. The hybrid cloud (or cloud bursting) model is a solution that reduces the on-premise infrastructure size, so that it can be dimensioned for an average load and complemented with external resources from a public cloud provider to satisfy peak demands.

- Many big companies (e.g. banks, hosting companies, etc.) and also many large institutions maintain several distributed data-centers or server-farms, for example to serve multiple geographically distributed offices, to implement high availability (HA), or to guarantee server proximity to the end user. Resources and networks in these distributed data-centers are usually configured as non-cooperative separate elements, so that usually every single service or workload is deployed in a unique site or replicated in multiple sites.
