Lab 12 Introduction To Rapidminer/Weka.: Objective
Objective
Introduction to Rapidminer/Weka.
Create a New Process in Rapidminer.
Lab Assessment
The following exercise should be completed by the end of this lab:
2. Use any preexisting dataset from the WEKA library. Choose any classifier and perform all the
steps mentioned in the description and summarize the following:
o Classifier used
o Test Options used
o Confusion Matrix in the form of a table
Lab Requirements
Technical Requirements (Software: RapidMiner, WEKA).
Lab Description
This lab introduces RapidMiner and WEKA, two software tools used for machine learning.
Students should get familiar with:
How to use RapidMiner/WEKA software
How to execute RapidMiner/WEKA Program
How to perform linear regression
After starting RapidMiner Studio, the list in the center shows the typical actions, which you will
perform frequently.
New Process: Opens the design perspective and creates a new analysis process.
Open: Clicking this button opens a repository browser, where you can choose an existing
process and open it in the design perspective.
Design Perspective:
This is the central RapidMiner Studio perspective where all analysis processes are created, edited
and managed.
Figure 1: Design Perspective
Operators View
All work steps (operators) available in RapidMiner Studio are presented here in groups and can
be included in the current process from here. You can navigate within the groups and browse the
available operators freely.
Figure 2: Design Perspective
Repositories View:
The repository is a central component of RapidMiner Studio. It is used to manage and structure
your analysis processes into projects, and it also serves as a source of both data and the
associated meta data.
Process View:
The Process View shows the individual steps within the analysis process as well as their
interconnections. New steps can be added to the current process in several ways.
Figure 3: Process View of RapidMiner
Inserting Operators:
You can insert new operators into the process in different ways:
Via drag & drop from the Operators View as described above,
Via double click on an operator in the Operators View,
Figure 4: Inserting operators
Parameters View:
After an operator offering parameters has been selected in the Process View, its parameters are shown
in the Parameters View.
Each time you select an operator in the Operators View or in the Process View, the help window
within the Help View shows a description of this operator.
Figure 7: Creating New Process
Meta data transformation is the ability to compute the output of an operator or process
beforehand, at design time, without having to load the actual data or run the process.
Meta data are typically much less voluminous than the data itself and give an excellent idea of
the characteristics of a particular data set.
In RapidMiner Studio the meta data are provided at the ports. Go over the output port of the
recently inserted operator with the cursor and see what happens.
If we look at the last two attributes, we see that the amount and the single price of the
objects are given within the transaction, but the associated total turnover is not.
Figure 8: Transforming Meta Data
Therefore, we want to generate a new attribute named “total price”, the values of which
are the product of amount and single price.
Go to the group “Data Transformation” => “Attribute Set Reduction and Transformation” =>
“Generation”.
Drag the operator and connect the output port of the data generator with the input port of the
new operator and connect the output port of the latter with the result output of the total process.
Figure 9: Generating new attribute
Go to the “function descriptions” parameter and enter the desired computation as shown in the
figure.
Open the group “Data Transformation” => “Attribute Set Reduction and Transformation” =>
“Selection” and drag the operator named “Select Attributes”.
Select the new operator and choose the option “subset” for its parameter
“attribute filter type”.
The parameters should be set as shown in the figure.
Figure 11: Selecting parameters
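The two transformation steps above (generating “total price” and then keeping only a subset of attributes) can be sketched outside RapidMiner in plain Python. This is only an illustration of what the operators compute, not RapidMiner's implementation. The sample records, the “transaction id” attribute and the chosen subset are assumptions made for the example.

```python
# Illustrative transaction records. The values and the "transaction id"
# attribute are assumptions, not the data produced in the lab.
transactions = [
    {"transaction id": 1, "amount": 3, "single price": 10.0},
    {"transaction id": 2, "amount": 1, "single price": 25.5},
    {"transaction id": 3, "amount": 4, "single price": 2.25},
]


def generate_total_price(rows):
    """Return copies of the rows with 'total price' = amount * single price."""
    return [
        {**row, "total price": row["amount"] * row["single price"]}
        for row in rows
    ]


def select_attributes(rows, subset):
    """Keep only the attributes listed in subset (filter type 'subset')."""
    return [{name: row[name] for name in subset} for row in rows]


with_total = generate_total_price(transactions)
# The subset chosen here is hypothetical; pick the attributes you need.
result = select_attributes(with_total, ["transaction id", "total price"])

for row in result:
    print(row)
```

Keeping the two steps as separate functions mirrors the process view, where each operator takes a data set on its input port and delivers a new one on its output port.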
• To execute the process press the large play button in the toolbar of RapidMiner.
EX1: Linear Regression: Estimating/predicting the value of GBP with change in USD
• Drag and drop “Read CSV” Operator on Process window.
• Use Import Configuration Wizard to import the data.
Figure 13: Reading CSV file
Figure 14: Linear regression operator selection
• Drag and drop the Linear Regression operator onto the Process window and connect the output
port “out” of the Read CSV operator to the input port of the Linear Regression operator.
• Connect the output port named “mod” of the Linear Regression operator to the “res” port of the Process window.
Figure 15: Linear Regression
• To create the model, simply click the blue play button in the toolbar.
• Now we can use the model to estimate/predict the value of GBP using the value of USD.
• Read the test data from .CSV file.
• Drag and drop Apply Model Operator which has two input ports:
o -mod to provide the model input
o -unl to provide the data to perform prediction.
• Execute the process and you will get the predicted values.
Figure 18: Predicted values
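To make the two stages of EX1 concrete (building a model from training data, then applying it to unseen data), here is a minimal sketch of ordinary least-squares regression with one input in plain Python. It is not RapidMiner's Linear Regression operator, which has additional options. The function names and the USD/GBP numbers are made up for illustration and are not real exchange-rate data.

```python
def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_model(model, xs):
    """Predict a value for each unlabelled input (the role of Apply Model)."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical training data: USD values with matching GBP values.
usd_train = [1.0, 2.0, 3.0, 4.0]
gbp_train = [0.9, 1.7, 2.5, 3.3]

# Hypothetical test data: USD values whose GBP value is to be predicted.
usd_test = [5.0, 6.0]

model = fit_linear_regression(usd_train, gbp_train)
predictions = apply_model(model, usd_test)

print("slope, intercept:", model)
print("predicted GBP:", predictions)
```

The `model` tuple plays the role of the “mod” port and `usd_test` the role of the “unl” port: the model is fitted once and can then be applied to any new data with the same input attribute.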
Introduction to WEKA
Introduction
WEKA stands for Waikato Environment for Knowledge Analysis. It was developed at the
University of Waikato, New Zealand. WEKA supports many data mining tasks such as data
preprocessing, classification, clustering, regression and feature selection, to name a few. The
workflow of WEKA would be as follows:
Figure 19: Weka Interface
WEKA TOOL:
• Preprocessing Filters: The data file needs to be loaded first; an example is shown in the figure.
Figure 21
Figure 20: WEKA Tool
The supported data formats are ARFF, CSV, C4.5 and binary. Alternatively, you
can also import data from a URL or an SQL database.
After loading the data, preprocessing filters can be used for adding/removing attributes,
discretization, sampling, randomizing, etc.
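Since ARFF is WEKA's native format, it helps to see how such a file is laid out: a relation name, one `@attribute` line per attribute, and comma-separated rows after `@data`. The following is a minimal sketch of a reader for simple ARFF text. It is not WEKA's loader and ignores quoted names, sparse rows and other features of the format. The file content is a small illustrative example, not a copy of a dataset shipped with WEKA.

```python
# A small ARFF document written for illustration only.
ARFF_TEXT = """\
% Lines starting with a percent sign are comments
@relation weather_example
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,FALSE,no
overcast,TRUE,yes
rainy,FALSE,yes
"""


def parse_arff(text):
    """Parse simple ARFF text into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type specification) pairs
    rows = []        # list of value lists, one per instance
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, type_spec = line.split(None, 2)
            attributes.append((name, type_spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
print(attributes)
print(rows)
```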
Select attributes: WEKA has a very flexible combination of search and evaluation
methods for the dataset’s attributes. Search methods include Best-first, Ranker,
Genetic-search, etc. Evaluation measures include Information Gain, Gain Ratio
etc.
Clustering: Learning is performed by grouping the data into clusters. Methods include k-means,
Cobweb and FarthestFirst.
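As an illustration of the attribute selection idea mentioned above, the sketch below computes the information gain of nominal attributes with respect to a class and ranks them from most to least informative, which is the basic idea behind pairing an Information Gain evaluator with a Ranker search. It is a plain Python sketch, not WEKA code, and the attribute names and values are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    """Class entropy minus the weighted entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Hypothetical data: two nominal attributes and a class label.
labels = ["yes", "yes", "no", "no"]
columns = {
    "attr_a": ["x", "x", "y", "y"],  # separates the classes perfectly
    "attr_b": ["x", "y", "x", "y"],  # carries no information about the class
}

gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
ranking = sorted(gains, key=gains.get, reverse=True)

print(gains)
print("ranked attributes:", ranking)
```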
1. Click Explorer on the first interface screen and load a dataset from the library. Given
here is an illustration for the dataset ‘weather.arff’.
Figure 22: Loading dataset
2. Click over each attribute to visualize the distribution of the samples for each of them. You
can also visualize all of them at the same time by clicking the ‘Visualize all’ on the right
pane.
3. Under the Classify tab, click ‘Choose’ and select a classifier from the drop-down menu. E.g.:
‘Decision Stump’
Figure 23: Applying classification
4. Once a classifier is chosen, select ‘Percentage split’ and leave it at its default value. The
default ratio is 66% for training and 34% for testing.
5. Click ‘Start’ to train and test the classifier. The interface will now look like this:
Figure 24: Execution of classification algorithm
6. You could also try the ‘Cross-validation’ method to train and test the classifier.
7. The right pane shows the results for training and testing. It also indicates the number of
correctly classified and misclassified samples.
8. You can right-click on the generated model to perform various operations. You can
also save the model if you want. Another performance measure is the ROC curve, which can
be viewed as shown in the next picture. Select ‘no’ in the option to view the curve.
Figure 25: Different operations on the model
9. Click on the ‘Select Attributes’ tab to analyze the attributes. A number of ‘Attribute
Evaluator’ and ‘Search Method’ options can be combined to gain insight into the attributes.
Given below is an example.
Figure 26: Analyzing the attributes
10. Click on the ‘Visualize’ tab to see the pairwise relationships between the attributes.
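The two test options used in the steps above, percentage split and cross-validation, differ only in how the instances are divided between training and testing. The sketch below shows one plausible way to produce both kinds of partition in plain Python. WEKA's own shuffling, rounding and stratification may differ, and the function names, the seed and the instance counts are assumptions made for the example.

```python
import random


def percentage_split(instances, train_percent=66, seed=1):
    """Shuffle the instances, then split them into training and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    train_size = round(len(shuffled) * train_percent / 100)
    return shuffled[:train_size], shuffled[train_size:]


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over n items."""
    folds = [list(range(start, n, k)) for start in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical dataset of 14 instances, identified by their index.
instances = list(range(14))

train, test = percentage_split(instances)
print(len(train), "training instances,", len(test), "test instances")

# With cross-validation every instance is used for testing exactly once.
splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    print("test fold:", test_idx)
```

A percentage split evaluates the classifier once on a held-out part of the data, while k-fold cross-validation repeats training and testing k times so that each instance contributes to the test results.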
Performance Analysis
Once the model has been trained and tested, we need to measure the performance of the model.
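For this purpose, we use three measures: precision, recall and accuracy.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
where tp, fp, tn and fn are the numbers of true positives, false positives, true negatives and false negatives, respectively.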
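The three measures can be computed directly from the four counts of a two-class confusion matrix. The sketch below uses the standard definitions of precision, recall and accuracy. The counts are made-up numbers for illustration, not results from a WEKA run, and returning 0.0 for an empty denominator is a choice made for this example.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all instances that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical confusion matrix counts.
tp, fn = 3, 1  # actual positives: predicted positive / predicted negative
fp, tn = 1, 5  # actual negatives: predicted positive / predicted negative

p = precision(tp, fp)
r = recall(tp, fn)
a = accuracy(tp, fp, tn, fn)

print("precision:", p)
print("recall:   ", r)
print("accuracy: ", a)
```

For the lab assessment, the same functions can be applied to the counts read from the confusion matrix that WEKA prints in the classifier output.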
Nil…