Final LP-VI Lab Manual 23-24


Sinhgad Technical Education Society

Smt. Kashibai Navale College Of Engineering


Department of Computer Engineering

Vision

We are committed to producing not only good engineers but also good human
beings.

Mission
Our mission is to do what it takes to foster, sustain and upgrade the
quality of education by harnessing talent and potential and
optimizing meaningful learning facilities. Our endeavor is to provide
the best learning, conducive environment and equip the students with
effective learning strategies.
STES’s
Smt. Kashibai Navale College of Engineering
Vadgaon Pune.

Laboratory Practice VI

Department of Computer Engineering


BE (2019 Course)
SINHGAD TECHNICAL EDUCATION SOCIETY’S
Smt. Kashibai Navale College of Engineering
Vadgaon, Pune 411041
Department of Computer Engineering

LABORATORY MANUAL
AY: 2023-24

LABORATORY PRACTICE VI
BE Computer Engineering

Semester –II
Subject Code - 410255

TEACHING SCHEME: Practical: 2 Hrs / Week
CREDIT: 01
EXAMINATION SCHEME: TW: 50 Marks

Prepared By:

Prof. R. B. Kale
Prof. D. A. Nevase
Sinhgad Technical Education Society’s
Smt. Kashibai Navale College Of Engineering
Vadgaon,Pune

CERTIFICATE

This is to certify that

Mr./Ms. _______________ of Class ______ Roll No. ______ has completed all the

practical work / term work in the subject _______________ satisfactorily in the

Department of Computer Engineering, as prescribed by Savitribai Phule Pune

University, during the academic year ________.

Staff In-charge Head of Department

Date
Savitribai Phule Pune University
Fourth Year of Computer Engineering (2019 Course)
410256: Laboratory Practice VI
Teaching Scheme: Credit Examination Scheme :
Practical: 2 Hours/Week 01 Term Work: 50 Marks
Companion Course: Elective V (410252), Elective VI( 410253)
Course Objectives:
• To understand the fundamental concepts and techniques of natural language processing (NLP)
• To understand Digital Image Processing Concepts
• To learn the fundamentals of software defined networks
• To explore the concepts of adaptive filtering and multi-rate DSP
• To be familiar with the various application areas of soft computing.
• To introduce the concepts and components of Business Intelligence (BI)
• To study Quantum Algorithms and apply these to develop hybrid solutions

Course Outcomes:
On completion of this course, the students will be able to
CO1: Apply basic principles of elective subjects to problem solving and modeling.
CO2: Use tools and techniques in the area of software development to build mini projects
CO3: Design and develop applications on subjects of their choice.
CO4: Generate and manage deployment, administration & security.
Guidelines for Instructor's Manual
List of recommended programming assignments and sample mini-projects is provided for reference.
Referring to these, Course Teacher or Lab Instructor may frame the assignments/mini-project by
understanding the prerequisites, technological aspects, utility and recent trends related to the
respective courses. Preferably there should be multiple sets of assignments/mini-project and
distributed among batches of students. Real world problems/application based assignments/mini-
projects create interest among learners serving as foundation for future research or startup of business
projects. A mini-project can be completed in a group of 2 to 3 students. A software engineering
approach with proper documentation is to be strictly followed. Use of open-source software is to be encouraged.
Instructor may also set one assignment or mini-project that is suitable to the respective course beyond
the scope of syllabus.
Operating System recommended: 64-bit open source Linux or its derivative
Programming Languages: C++/Java/Python/R
Programming tools recommended: Front End: Java/Perl/PHP/Python/Ruby/.NET; Backend:
MongoDB/MySQL/Oracle; Database Connectivity: ODBC/JDBC; Additional Tools: Octave, Matlab,
WEKA, Power BI
Guidelines for Student's Laboratory Journal
The laboratory assignments are to be submitted by students in the form of a journal. The journal may
consist of a prologue, certificate, table of contents, and a handwritten write-up of each assignment
(Title, Objectives, Problem Statement, Outcomes, software and Hardware requirements, Date of
Completion, Assessment grade/marks and assessor's sign, Theory- Concept in brief,
Algorithm/Database design, test cases, conclusion/analysis). Program codes with sample output of all
performed assignments are to be submitted as softcopy.
As a conscious effort and a little contribution towards Green IT and environment awareness, attaching
printed papers as part of write-ups and program listings to the journal may be avoided. Use of digital
storage media/DVDs containing students' programs, maintained by the lab in-charge, is highly encouraged.
For reference one or two journals may be maintained with program prints at Laboratory.
Guidelines for Laboratory /Term Work Assessment
Continuous assessment of laboratory work is to be done based on overall performance and lab
assignment performance of the student. Each lab
assignment assessment will assign grade/marks based on parameters with appropriate weightage.
Suggested parameters for overall assessment as well as each lab assignment assessment include-
timely completion, performance, innovation, efficient codes, punctuality and neatness reserving
weightage for successful mini-project completion and related documentation.
Guidelines for Practical Examination
It is recommended to conduct examination based on Mini-Project(s) Demonstration and related skill
learned. A team of 2 to 3 students may work on a mini-project. During the assessment, the expert
evaluator should give the maximum weightage to the satisfactory implementation and the software
engineering approach followed. Supplementary and relevant questions may be asked at the time
of evaluation to test the students for advanced learning, understanding, and effective and efficient
implementation and demonstration skills. Encouraging efforts, transparent evaluation and a fair
approach by the evaluator will not create any uncertainty or doubt in the minds of the students.
Adhering to these principles will consummate our team efforts towards a promising start of the
student's academics.
Guidelines for Laboratory Conduction
The instructor’s manual is to be developed as a hands-on resource and as ready reference. The
instructor's manual need to include prologue (about University/program/ institute/
department/foreword/ preface etc), University syllabus, conduction and Assessment guidelines, topics
under consideration-concept, objectives, outcomes, set of typical applications/assignments/ guidelines,
references among others.
A recommended/sample set of assignments and mini-projects is provided for reference for the four
courses offered for Elective III and the four courses offered for Elective IV. Each student has to
complete the laboratory work for electives III and IV that he/she has opted for.
410252(A): Natural Language Processing
Any 5 Assignments and 1 Mini Project are mandatory
Group 1
1. Perform tokenization (Whitespace, Punctuation-based, Treebank, Tweet, MWE) using the
NLTK library. Use Porter stemmer and Snowball stemmer for stemming. Use any technique
for lemmatization.
Input / Dataset: use any sample sentence
2. Perform bag-of-words approach (count occurrence, normalized count occurrence), TF-IDF
on data. Create embeddings using Word2Vec.
Dataset to be used: https://www.kaggle.com/datasets/CooperUnion/cardataset
3. Perform text cleaning, perform lemmatization (any method), remove stop words (any
method), label encoding. Create representations using TF-IDF. Save outputs.
Dataset: https://github.com/PICT-NLP/BE-NLP-Elective/blob/main/3-Preprocessing/News_dataset.pickle
4. Create a transformer from scratch using the PyTorch library.
5. Morphology is the study of the way words are built up from smaller meaning-bearing units.
Study and understand the concepts of morphology by the use of an add-delete table.
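The tokenization and stemming ideas in Assignment 1 can be sketched in plain Python. The assignment itself uses the NLTK library; this dependency-free sketch only illustrates the concepts, and naive_stem is a toy suffix-stripper, not Porter's or Snowball's actual rule set:

```python
import re

def whitespace_tokenize(text):
    # Whitespace tokenization: split on runs of whitespace only
    return text.split()

def punctuation_tokenize(text):
    # Punctuation-based tokenization: punctuation marks become their own tokens
    return re.findall(r"\w+|[^\w\s]", text)

def naive_stem(word):
    # Toy suffix stripping to illustrate stemming (NOT the Porter algorithm)
    for suffix in ("ing", "edly", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

sentence = "The runners were running quickly, dogs barked."
print(whitespace_tokenize(sentence))
print(punctuation_tokenize(sentence))
print([naive_stem(w.lower()) for w in punctuation_tokenize(sentence)])
```

With NLTK installed, `nltk.tokenize.WhitespaceTokenizer` and `nltk.stem.PorterStemmer` replace these toy functions.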
Group 2
6 Mini Project (Fine tune transformers on your preferred task)
Finetune a pretrained transformer for any of the following tasks on any relevant dataset of
your choice:
 Neural Machine Translation
 Classification
 Summarization

7 Mini Project - POS Taggers for Indian Languages

8 Mini Project - Feature Extraction using seven moment variants

9 Mini Project - Feature Extraction using Zernike Moments
Virtual Lab: https://nlp-iiith.vlabs.ac.in/

PART II 410253 : Elective VI


410253(C) : Business Intelligence
Any 5 Assignments and 1 Mini Project are mandatory
Group 1
1 Import the legacy data from different sources such as (Excel , Sql Server, Oracle etc.) and
load in the target system. ( You can download sample database such as Adventure works,
Northwind, foodmart etc.)
2 Perform the Extraction Transformation and Loading (ETL) process to construct the
database in the Sql server.
3 Create the cube with suitable dimension and fact tables based on ROLAP, MOLAP and
HOLAP model.
4 Import the data warehouse data in Microsoft Excel and create the Pivot table and Pivot
Chart
5 Perform the data classification using classification algorithm. Or Perform the data
clustering using clustering algorithm.

Group 2
6 Mini Project: Each group of 4 Students (max) assigned one case study for this;
A BI report must be prepared outlining the following steps:
a) Problem definition, identifying which data mining task is needed.
b) Identify and use a standard data mining dataset available for the problem.
The CO-PO Mapping Matrix

CO/PO  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1     2   -   -   -   2   -   -   -   -   -    -    -
CO2     -   2   -   -   -   -   -   -   -   -    -    -
CO3     -   -   -   2   -   -   -   -   3   -    -    -
CO4     2   -   2   -   -   3   -   -   -   -    -    -
Index

Sr. No. | Date | Experiment Performed | Page No. | Sign | Remark

410253(C) : Business Intelligence


1 Import the legacy data from different sources such as (Excel ,
Sql Server, Oracle etc.) and load in the target system. ( You
can download sample database such as Adventure works,
Northwind, foodmart etc.)
2 Perform the Extraction Transformation and Loading (ETL)
process to construct the database in the Sql server.
3 Create the cube with suitable dimension and fact tables based
on ROLAP, MOLAP and HOLAP model.
4 Import the data warehouse data in Microsoft Excel and create
the Pivot table and Pivot Chart.
5 Perform the data classification using classification algorithm.
Or Perform the data clustering using clustering algorithm.

410252(A): Natural Language Processing


1 Perform tokenization (Whitespace, Punctuation-based,
Treebank, Tweet, MWE) using NLTK library. Use porter
stemmer and snowball stemmer for stemming. Use any
technique for lemmatization.
Input / Dataset –use any sample sentence
2 Perform bag-of-words approach (count occurrence, normalized
count occurrence), TF-IDF on data. Create embeddings using
Word2Vec.
Dataset to be used:
https://www.kaggle.com/datasets/CooperUnion/cardataset
3 Perform text cleaning, perform lemmatization (any method),
remove stop words (any method), label encoding. Create
representations using TF-IDF. Save outputs.
Dataset: https://github.com/PICT-NLP/BE-NLP-Elective/blob/main/3-Preprocessing/News_dataset.pickle
4 Create a transformer from scratch using the Pytorch
library
5 Morphology is the study of the way words are built up
from smaller meaning-bearing units. Study and understand
the concepts of morphology by the use of an add-delete table
Assignment No: 1
Title: Import the legacy data from different sources such as (Excel, Sql Server, Oracle etc.) and load in
the target system. ( You can download sample database such as Adventure works, Northwind, foodmart
etc.)

Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)

Prerequisite:
1. Basics of dataset extensions.
2. Concept of data import

Contents for Theory:


1. Legacy Data
2. Sources of Legacy Data
3. How to import legacy data step by step.

1. What is Legacy Data?

Legacy data, according to Business Dictionary, is "information maintained in an old or out-of-date format
or computer system that is consequently challenging to access or handle."

2. Sources of Legacy Data

Where does legacy data come from? Virtually everywhere. Figure 1 indicates that there are many sources
from which you may obtain legacy data. This includes existing databases, often relational, although non-
RDBs such as hierarchical, network, object, XML, object/relational databases, and NoSQL databases.
Files, such as XML documents or "flat files” such as configuration files and comma-delimited text files,
are also common sources of legacy data. Software, including legacy applications that have been wrapped
(perhaps via CORBA) and legacy services such as web services or CICS transactions, can also provide
access to existing information. The point to be made is that there is often far more to gaining access to
legacy data than simply writing an SQL query against an existing relational database.

3. How to import legacy data step by step

Step 1: Open Power BI

Step 2: Click on Get Data → the following list will be displayed → select Excel
Step 3: Select required file and click on Open, Navigator screen appears

Step 4: Select file and click on edit


Step 5: Power query editor appears
Step 6: Again, go to Get Data and select OData feed

Step 7: Paste url as http://services.odata.org/V3/Northwind/Northwind.svc/ Click on ok


Step 8: Select orders table and click on edit

Note: If you just want to see preview you can just click on table name without clicking on checkbox
Click on edit to view table
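The steps above pull a legacy source through the Power BI GUI; the same idea can be sketched programmatically. A minimal sketch in Python using the standard csv module, with a hypothetical comma-delimited flat file (one of the legacy sources described earlier) standing in for the real data:

```python
import csv
import io

# Hypothetical "flat file" legacy source, inlined here for illustration;
# in practice this would be open("orders.csv") or similar.
legacy_flat_file = io.StringIO(
    "OrderID,Customer,Amount\n"
    "1001,Alfki,350.00\n"
    "1002,Anatr,120.50\n"
)

# Extract each record as a dict keyed by the header row,
# ready to be loaded into a target system.
rows = list(csv.DictReader(legacy_flat_file))
for row in rows:
    print(row["OrderID"], row["Customer"], row["Amount"])
```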
Conclusion: In this way we import the Legacy datasets using the Power BI Tool.

FAQ’s:
1. What is legacy Data?
2. How to import legacy data step by step?
3. What are the sources of legacy data?
4. What are the features of PowerBI?
5. Name few BI Tools?
6. What is the purpose of using BI Tools?
7. What do you mean by Power Query?
8. What is BI?
9. Where does legacy data come from?
10. Name the BI tool used in this Assignment.
Assignment No:2

Title: Perform the Extraction Transformation and Loading (ETL) process to construct the database in the
Sql server.

Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)

Prerequisite:
1. Basics of ETL Tools.

2. Concept of Sql Server.

Theory:
ETL(Extract, Transform and Load)
ETL is a process in Data Warehousing and it stands for Extract, Transform and Load.
It is a process, in which an ETL tool extracts the data from various data source systems, transforms it in
the staging area and then finally, loads it into the Data Warehouse system.
Extraction
1. Identify the Data Sources: The first step in the ETL process is to identify the data
sources. This may include files, databases, or other data repositories.

2. Extract the Data: Once the data sources are identified, we need to extract the data
from them. This may involve writing queries to extract the relevant data or using
tools such as SSIS to extract data from files or databases.

3. Validate the Data: After extracting the data, it's important to validate it to ensure
that it's accurate and complete. This may involve performing data profiling or
data quality checks.

Transformation
1. Clean and Transform the Data: The next step in the ETL process is to clean and
transform the data. This may involve removing duplicates, fixing invalid data, or
converting data types. We can use tools such as SSIS or SQL scripts to perform these
transformations.

2. Map the Data: Once the data is cleaned and transformed, we need to map the data to
the appropriate tables and columns in the database. This may involve creating a data
mapping document or using a tool such as SSIS to perform the mapping.

Loading
1. Create the Database: Before loading the data, we need to create the database and the
appropriate tables. This can be done using SQL Server Management Studio or a SQL
script.

2. Load the Data: Once the database and tables are created, we can load the data into
the database. This may involve using tools such as SSIS or writing SQL scripts to
insert the data into the appropriate tables.

3. Validate the Data: After loading the data, it's important to validate it to ensure that it was
loaded correctly. This may involve performing data profiling or data quality checks to
ensure that the data is accurate and complete.
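The three phases above can be sketched end-to-end in a few lines. This is an illustrative sketch using Python's built-in sqlite3 as a stand-in for SQL Server, with hypothetical source rows; the real assignment uses SSIS and SQL Server:

```python
import sqlite3

# Extract: hypothetical source rows, as if read from a file or source database
source_rows = [
    ("alice", " Pune ", "350"),
    ("bob", "Mumbai", "120"),
    ("alice", " Pune ", "350"),   # duplicate, to be removed in Transform
]

# Transform: trim whitespace, fix casing, convert types, drop duplicates
seen = set()
clean_rows = []
for name, city, amount in source_rows:
    record = (name.title(), city.strip(), float(amount))
    if record not in seen:
        seen.add(record)
        clean_rows.append(record)

# Load: create the target table and insert the cleaned rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

# Validate: check the load completed with the expected row count
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count)  # 2 rows after de-duplication
```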
Perform the Extraction Transformation and Loading (ETL) process to construct the database in the SQL
server.

Software requirements: SQL Server 2012 Full Version (SQLServer2012SP1-FullSlipstream-ENU-x86)

Steps to install SQL Server 2012 Full Version (SQLServer2012SP1-FullSlipstream-ENU-x86) are given in my previous post.

Step 1: Open SQL Server Management Studio to restore backup file

Step 2: Right click on Databases → Restore Database

Step 3: Click on the button towards the end of the Device box

Step 4: Click on Add → select the path of the backup files

Step 5: Select both files at a time

Step 6: Click OK, and in the Select Backup Devices window add both files of AdventureWorks

Step 7: Open SQL Server Data Tools

Select File → New → Project → Business Intelligence → Integration Services Project and give an appropriate project
name.

Step 8: Right click on Connection Managers in solution explorer and click on New Connection Manager.

Add the SSIS connection manager window.

Step 9: Select OLEDB Connection Manager and Click on Add

Step 10: Configure OLE DB Connection Manager window appears Click on New

Step 11: Select Server name(as per your machine) from drop down and database name and click on Test
connection.

If the test connection succeeded, click on OK.

Step 12: Click on OK

Connection is added to connection manager

Step 13: Drag and drop Data Flow Task in Control Flow tab

Step 14: Drag OLE DB Source from Other Sources and drop into Data Flow tab
Step 15: Double click on OLE DB Source → OLE DB Source Editor appears → click on New to add a
connection manager.

Select the [Sales].[Store] table from the drop-down → OK

Step 16: Drag ole db destination in data flow tab and connect both

Step 17: Double click on OLE DB destination

Click on New to run the query to get [OLE DB Destination] in Name of the table or the view.

Click on OK.

Step 18: Click on Start

Step 19: Go to SQL Server Management Studio

In the database tab, under AdventureWorks, right click on [dbo].[OLE DB Destination] → Script Table as →
SELECT To → New Query Editor Window

Step 20: Execute the following query to get output.

USE [AdventureWorks2012]
GO
SELECT [BusinessEntityID]
      ,[Name]
      ,[SalesPersonID]
      ,[Demographics]
      ,[rowguid]
      ,[ModifiedDate]
FROM [dbo].[OLE DB Destination]
GO

Conclusion : In this way we can perform the ETL process to construct a database in SQL Server.

FAQ’s:
1. What is ETL?
2. What are the different components of BI?
3. What is Data Extraction?
4. What do you mean by Data Transformation?
5. What is Data Warehouse?
6. What is Data Mart?
7. Explain ETL process with respect to SQL Server?
8. How many steps are there in an ETL process?
9. What are the steps involved in an ETL process?
10. What are initial load and full load?
Assignment No:3

Title of the Assignment: Create the cube with suitable dimension and fact tables based on
ROLAP, MOLAP and HOLAP model.

Objective of the Assignment: To introduce the concepts and components of Business Intelligence
(BI)

Prerequisite:

1. Basics of OLAP.

2. Concept of Multi Dimensional Cube.

Theory :

What is a Fact Table ?

In Business Intelligence (BI), a Fact Table is a table that stores quantitative data or facts about
a business process or activity. It is a central table in a data warehouse that provides a snapshot of
a business at a specific point in time.
For example - A Fact Table in a retail business might contain sales data for each transaction,
with dimensions such as date, product, store, and customer. Analysts can use the Fact Table to
analyze trends and patterns in sales, such as which products are selling the most, which stores
are performing well, and which customers are buying the most.

1. What are the ROLAP, MOLAP and HOLAP models?

ROLAP, MOLAP, and HOLAP are three types of models used in Business Intelligence (BI)
for organizing and analyzing data:
1. ROLAP (Relational Online Analytical Processing):

In this model, data is stored in a relational database, and the analysis is performed by
joining multiple tables. ROLAP allows for complex queries and is good for handling large
amounts of data, but it may be slower due to the need for frequent joins.
2. MOLAP (Multidimensional Online Analytical Processing):

In this model, data is stored in a multidimensional database, which is optimized for fast query
performance. MOLAP is good for analyzing data in multiple dimensions, such as time,
geography, and product, but may be limited in its ability to handle large amounts of data.
3. HOLAP (Hybrid Online Analytical Processing):

This model combines elements of both ROLAP and MOLAP. It stores data in both a relational
and multidimensional database, allowing for efficient analysis of both large amounts of data and
complex queries. HOLAP is a good compromise between the other two models, offering both
speed and flexibility.
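To make the ROLAP idea concrete, here is a minimal sketch, using Python's built-in sqlite3 as a stand-in relational store with hypothetical fact and dimension rows, of how a cube query reduces to a join plus GROUP BY over a star schema:

```python
import sqlite3

# Build a tiny star schema: one dimension table, one fact table (toy data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_id INTEGER, year INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 2023, 100), (1, 2023, 150),
                              (2, 2023, 80), (1, 2024, 60);
""")

# A ROLAP engine answers a cube query (product x year -> total sales)
# by joining the fact table to its dimensions and aggregating:
query = """
SELECT p.name, f.year, SUM(f.amount) AS total
FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
GROUP BY p.name, f.year
ORDER BY p.name, f.year
"""
results = list(conn.execute(query))
for name, year, total in results:
    print(name, year, total)
```

A MOLAP engine would instead precompute these totals into a multidimensional array; HOLAP mixes the two.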

4. Create the cube with suitable dimension and fact tables based on OLAP

Step 1: Creating Data Warehouse


Let us execute our T-SQL Script to create a data warehouse with fact tables, dimensions and
populate them with appropriate test values.
Download the T-SQL script attached with this article for creation of Sales Data Warehouse or
download from this article “Create First Data Warehouse” and run it in your SQL Server.
Downloading "Data_WareHouse SQLScript.zip" from the article
https://www.codeproject.com/Articles/652108/Create-First-Data-WareHouse

After downloading, extract the file into a folder.
Follow the given steps to run the query in SSMS (SQL Server Management Studio).
1. Open SQL Server Management Studio 2012

2. Connect Database Engine


Password for sa : admin123 (as given during installation) Click Connect.
3. Open New Query editor

4. Copy paste Scripts given below in various steps in new query editor window one by one

5. To run the given SQL Script, press F5

6. It will create and populate “Sales_DW” database on your SQL Server OR

1. Go to the extracted sql file and double click on it.

2. New Sql Query Editor will be opened containing the Sales_DW Database.

3. Click on execute or press F5 by selecting the query one by one or directly click on
Execute.

4. After completing execution save and close SQL Server Management studio & Reopen to see
Sales_DW in Databases Tab.

Step 2: Start the SSDT environment and create a New Data Source

Go to SQL Server Data Tools → right click → Run as administrator

Click on File → New → Project


In Business Intelligence → Analysis Services Multidimensional and Data Mining models → appropriate
project name → click OK

Right click on Data Sources in solution explorer → New Data Source


Data Source Wizard appears

Click on New

Select Server Name → select Use SQL Server Authentication → Select or enter a database name (Sales_DW)
Note : Password for sa : admin123 (as given during installation of SQL 2012 full version)

Click Next

Select Inherit → Next


Click Finish

Sales_DW.ds gets created under Data Sources in Solution Explorer

Step 3: Creating New Data Source View


In Solution explorer right click on Data Source View → Select New Data Source View

Click Next
click Next
select FactProductSales(dbo) from Available objects and put in Includes Objects by clicking

Click Next

Click Finish

Sales DW.dsv appears in Data Source Views in Solution Explorer.

Step 4: Creating new cube


Right click on Cubes → New Cube

Select Use existing tables in Select Creation Method → Next

In Select Measure Group Tables → Select FactProductSales → Click Next


In Select Measures → check all measures → Next

In Select New Dimensions → Check all Dimensions → Next

Click on Finish

Sales_DW.cube is created

Step 5: Dimension Modification


In dimension tab → Double Click Dim Product.dim

Drag and Drop Product Name from Table in Data Source View and Add in Attribute Pane at left side

Step 6: Creating Attribute Hierarchy in Date Dimension


Double click On Dim Date dimension -> Drag and Drop Fields from Table shown in Data Source View to
Attributes-> Drag and Drop attributes from leftmost pane of attributes to middle pane of Hierarchy.
Drag fields in sequence from Attributes to Hierarchy window (Year, Quarter Name, Month Name, Week of
the Month, Full Date UK)

Step 7: Deploy Cube


Right click on Project name → Properties

This window appears

Do following changes and click on Apply & ok

Right click on project name → Deploy


Deployment successful

To process cube right click on Sales_DW.cube → Process

Click run

Browse the cube for analysis in solution explorer

Conclusion: In this way we successfully implemented a cube with suitable dimension and fact tables based
on the ROLAP, MOLAP and HOLAP models.

FAQ’s:

1. What is ROLAP?
2. What is Fact Table?
3. What is MOLAP?
4. What is HOLAP?
5. What is the difference between ROLAP & MOLAP?
6. What is Data Cube?
7. Name the models used in BI for organizing & analysing Data.
8. Which is faster ROLAP or MOLAP?
9. Which model is amalgamation of ROLAP and MOLAP?
10. What common techniques are used in ROLAP and MOLAP?
Assignment No:-4

Title of the Assignment: Import the data warehouse data in Microsoft Excel and create the Pivot table and Pivot
Chart.

Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)

Prerequisite:

1. Basics of Google Sheets.

2. Concept of Table, Chart.

Contents for Theory:

1. What is a Data Warehouse?

2. What is Pivot Table and Pivot Chart?

3. Steps for Creating a Pivot Table in Google Sheets.

4. Steps for Creating a Pivot Chart in Google Sheets.

1. What is a Data Warehouse?

A data warehouse is a centralized repository of integrated and transformed data from multiple sources
within an organization. It is designed to support business intelligence (BI) activities, such as data analysis,
reporting, and decision-making.

2. What is Pivot Table and Pivot Chart?

A pivot table is a powerful tool in spreadsheet software (such as Google Sheets or Microsoft Excel) that
allows you to summarize and analyze large datasets by grouping and summarizing data in different ways.
Pivot tables allow you to quickly create tables that show a summary of data based on specific criteria or
dimensions. For example, you can use a pivot table to summarize sales data by region or by product
category. A pivot chart is a graphical representation of the data in a pivot table. Pivot charts allow you to
visualize the summarized data in a way that is easy to understand and interpret. They can be created based
on the data in a pivot table, and can be customized in a variety of ways to better represent the data being
analyzed. Pivot charts are especially useful when dealing with large amounts of data, as they can help
identify patterns and trends that might not be immediately obvious from the raw data.

3. Steps for Creating a Pivot Table in Google Sheets.

1. Open a Google Sheets document with the data you want to use for the pivot table.
2. Select the range of data you want to use for the pivot table.
3. Click on the "Data" tab in the top menu, then click on "Pivot table."
4. In the "Create Pivot Table" dialog box, select the range of data you want to use for the pivot table and
choose where you want to place the pivot table (in a new sheet or in the same sheet).
5. Click on "Create."
6. In the pivot table editor, drag and drop the columns you want to use for the pivot table into the "Rows,"
"Columns," and "Values" sections.
7. To add a filter to the pivot table, drag a column into the "Filter" section.
8. To customize the values in the pivot table, click on the drop-down menu in the "Values" section and
choose the type of calculation you want to use (such as sum, count, or average).
9. Customize any additional options in the pivot table editor (such as sorting and formatting).
10. Click on "Update" to apply the changes and create the pivot table.
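Under the hood, a pivot table is just grouped aggregation. A minimal Python sketch with hypothetical sales rows, where the pivot rows are regions, the columns are products, and the values are SUM(amount):

```python
from collections import defaultdict

# Hypothetical source data: (region, product, amount) tuples
rows = [
    ("East", "Pens", 100), ("East", "Books", 50),
    ("West", "Pens", 70), ("East", "Pens", 30),
]

# Pivot: group by (region, product) and sum the amounts
pivot = defaultdict(lambda: defaultdict(float))
for region, product, amount in rows:
    pivot[region][product] += amount

for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

Google Sheets and Excel perform the same grouping when you drag fields into the "Rows," "Columns," and "Values" sections.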

4. Steps for Creating a Pivot Chart in Google Sheets.

1. Open a Google Sheets document with the data you want to use for the pivot chart.
2. Select the range of data you want to use for the pivot chart.
3. Click on the "Data" tab in the top menu, then click on "Pivot table."
4. In the "Create Pivot Table" dialog box, select the range of data you want to use for the pivot table and
choose where you want to place the pivot table (in a new sheet or in the same sheet).
5. Click on "Create."
6. In the pivot table editor, drag and drop the columns you want to use for the pivot chart into the "Rows"
and "Values" sections.
7. Click on the "Chart" tab in the pivot table editor.
8. Choose the type of chart you want to use for the pivot chart from the drop-down menu.
9. Customize the chart options (such as chart title, axis labels, and colors) to your liking.
10. Click on "Update" to apply the changes and create the pivot chart.
Conclusion: In this way we created a pivot table and pivot chart using Google Sheets / Excel.

FAQ’s:

1. What is a pivot table in a data warehouse?


2. What is pivot Chart?
3. What is the difference between pivot table and pivot Chart?
4. What is data warehouse?
5. What is data mart?
6. What is pivot data?
7. How do I create a PivotTable from Power Pivot data?
8. How do you create a PivotTable with the data in the database sheet?
9. What is a pivot chart used for?
10. Why is it called a pivot table?
11. What are the advantages of pivot table?
Assignment No:-5
Title of the Assignment: Perform the data classification using classification algorithm. Or perform the data
clustering using a clustering algorithm.

Objective of the Assignment: To introduce the concepts and components of Business Intelligence (BI)

Prerequisite: 1. Basics of Tableau.

Contents for Theory:

1. What is Clustering and classification?

2. Clustering in Tableau:

3. Classification in Tableau:

Theory:

1. What is Clustering and classification?


Clustering and classification are two important techniques used in bioinformatics to analyze biological data.
Clustering is the process of grouping similar objects or data points together based on their similarity or
distance from each other. In bioinformatics, clustering is often used to group genes or proteins based on their
expression patterns or sequences. Clustering can help identify patterns and relationships between different
genes or proteins, which can provide insights into their biological function and interactions. Classification,
on the other hand, is the process of assigning a label or category to a new observation based on its features or
characteristics. In bioinformatics, classification is often used to predict the function or activity of a new gene
or protein based on its sequence or structure. Classification can help identify new drug targets or biomarkers
for disease diagnosis and treatment. Both clustering and classification are important tools for analyzing large
and complex biological datasets and can provide valuable insights into the underlying biological processes.

Clustering in Tableau:

1. Connect to the data: Connect to the data set that you want to cluster in Tableau.

2. Drag and drop the data fields: Drag and drop the data fields into the view, and select the data points that
you want to cluster.

3. Choose the clustering option: Drag "Cluster" from the Analytics pane onto the view. Tableau's built-in
clustering uses the k-means algorithm.

4. Define the number of clusters: Specify the number of clusters manually, or let Tableau automatically
determine a suitable number.

5. Analyze the clusters: Visualize the clusters and analyze them using Tableau's built-in visualizations
and tools.
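For intuition, the same k-means grouping that Tableau performs from the Analytics pane can be sketched outside Tableau. This is an illustrative scikit-learn sketch on made-up toy points, not Tableau's internal code:

```python
# Illustrative k-means clustering sketch using scikit-learn
# (Tableau's Analytics pane performs a similar k-means grouping internally).
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two visually obvious groups of 2-D points.
points = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
                   [8.0, 8.2], [7.9, 8.1], [8.1, 7.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster id assigned to each point
print(kmeans.cluster_centers_)  # one centroid per cluster
```

Changing `n_clusters` plays the same role as the "number of clusters" setting in Tableau's Cluster dialog.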

Classification in Tableau:

1. Connect to the data: Connect to the data set that you want to classify in Tableau.
2. Drag and drop the data fields: Drag and drop the data fields into the view, and select the target variable that you
want to predict.

3. Choose a classification algorithm: Tableau has no native classification models, so connect Tableau to an
external analytics engine such as TabPy (Python) or Rserve (R) and invoke a classification algorithm, such as
a decision tree or random forest, from a script calculated field.

4. Define the model parameters: Define the model parameters in the script, such as the maximum tree depth or
the number of trees to use in the forest.

5. Train the model: Train the model on a subset of the data, using cross-validation in the script to guard
against overfitting.

6. Evaluate the model: Evaluate the accuracy of the model using standard metrics such as the confusion
matrix, precision, recall, and F1 score, and return the results to Tableau for display.

7. Predict the target variable: Use the trained model to predict the target variable for new data.

8. Visualize the results: Create visualizations to communicate the results of the classification analysis using
Tableau's built-in visualization tools.
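The classification workflow above can be sketched in Python with scikit-learn; a TabPy script deployed for Tableau would wrap very similar code. The decision tree model, the Iris dataset, and the chosen parameters here are illustrative assumptions, not part of the assignment's prescribed data:

```python
# Illustrative decision-tree classification sketch with scikit-learn;
# a TabPy deployment for Tableau would call similar code from a script field.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load a small labelled dataset and hold out 30% for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Model parameter: maximum tree depth (step 4 in the workflow above).
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))  # rows: true class, columns: predicted
```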

Conclusion: In this way we implement classification and clustering using Tableau.

FAQ’s:

1. What is an example of a classification algorithm?
2. What is classification in Tableau?
3. What is classification in data visualization?
4. What is the difference between classification and clustering?
5. What is accuracy?
6. Which algorithm is used to perform clustering in Tableau?
7. How do you do clustering in Tableau?
8. What is the use of clusters in Tableau?
9. How do you use clustering algorithms?
10. What is an example of a clustering algorithm?
11. What is the purpose of clustering?
12. What is the importance of clustering?
13. What is a real-life example of clustering?
Assignment No 1

Problem Statement:
Perform tokenization (Whitespace, Punctuation-based, Treebank, Tweet, MWE) using the NLTK library.
Use the Porter stemmer and Snowball stemmer for stemming. Use any technique for lemmatization.
Input / Dataset: use any sample sentence.

Objective:
To understand the fundamental concepts and techniques of natural language processing (NLP).

CO Relevance: CO1

Contents for Theory:

Introduction to NLP (Natural Language Processing)

Computers speak their own language, the binary language. Thus, they are limited in how they can
interact with us humans; expanding their language and understanding ours is crucial to setting them free
from their boundaries.

NLP is an abbreviation for natural language processing, which encompasses a set of tools, routines, and
techniques computers can use to process and understand human communications. Not to be confused
with speech recognition, NLP deals with understanding the meaning of words, rather than with
interpreting audio signals into those words.

If you think NLP is just a futuristic idea, you may be surprised to know that we likely interact with
NLP every day: when we perform queries in Google, when we use online translators, and when we
talk with Google Assistant or Siri. NLP is everywhere, and implementing it in your projects is now very
reachable thanks to libraries such as NLTK, which hide much of the underlying complexity.
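As a minimal sketch of the assignment's tasks, the NLTK tokenizers and stemmers can be exercised on a sample sentence as follows (the sentence itself is an arbitrary example):

```python
# Sketch of the tokenizers and stemmers named in the assignment, using NLTK.
from nltk.tokenize import (WhitespaceTokenizer, WordPunctTokenizer,
                           TreebankWordTokenizer, TweetTokenizer, MWETokenizer)
from nltk.stem import PorterStemmer, SnowballStemmer

sentence = "NLP is fun, isn't it? :) #learning"

print(WhitespaceTokenizer().tokenize(sentence))    # split on whitespace only
print(WordPunctTokenizer().tokenize(sentence))     # punctuation-based split
print(TreebankWordTokenizer().tokenize(sentence))  # Penn Treebank conventions
print(TweetTokenizer().tokenize(sentence))         # keeps emoticons and hashtags

# Multi-word expression (MWE) tokenizer: merge a chosen phrase into one token.
mwe = MWETokenizer([("natural", "language")], separator="_")
print(mwe.tokenize("natural language processing".split()))

porter = PorterStemmer()
snowball = SnowballStemmer("english")
for word in ["running", "studies", "happily"]:
    print(word, "->", porter.stem(word), "/", snowball.stem(word))
```

For lemmatization, the assignment permits any technique; NLTK's `WordNetLemmatizer` (after downloading the `wordnet` corpus) or spaCy's `token.lemma_` attribute are common choices.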
Conclusion: In this way we performed tokenization using NLTK, applied the Porter and Snowball
stemmers, and performed lemmatization using the spaCy library.

FAQ’s:
1. What is the difference between the Porter and Snowball stemmers?
2. What is lemmatization?
3. Differentiate between lemmatization and stemming.
4. What are the different Python libraries used for lemmatization?
5. Why do we need tokenization?
Assignment No:2

Title of the Assignment:

Perform the bag-of-words approach (count occurrence, normalized count occurrence) and TF-IDF on
data. Create embeddings using Word2Vec.
Dataset to be used: https://www.kaggle.com/datasets/CooperUnion/cardataset

Objective of the Assignment: To understand the fundamental concepts and techniques of natural language
processing (NLP).

CO Relevance: CO1

Theory:
Conclusion: In this way we implemented the Bag-of-Words (BoW) and TF-IDF approaches using
Python libraries, and trained a Word2Vec model using the gensim library.

FAQ’s:

1. What is TF-IDF?
2. Differentiate between continuous bag-of-words (CBOW) and skip-gram.
3. What is meant by word embedding? What are the techniques of word embedding?
4. Why is word embedding required?
5. Which libraries are used for word embedding?
Assignment No 3

Title of the Assignment:


Perform text cleaning, lemmatization (any method), stop-word removal (any method), and label
encoding. Create representations using TF-IDF. Save the outputs.

Dataset: https://github.com/PICT-NLP/BE-NLP-Elective/blob/main/3-Preprocessing/News_dataset.pickle

Objective of the Assignment: To understand the fundamental concepts and techniques of natural
language processing (NLP).

CO Relevance: CO1
Conclusion:

Hence, we performed text cleaning, lemmatization, stop-word removal, and label encoding, and created
representations using TF-IDF.

FAQ’s:
1. What is lemmatization?
2. Differentiate between lemmatization and stemming.
3. How is TF-IDF calculated?
4. What is the pickle library?
5. How do you perform text cleaning?
Assignment No 4

Title of the Assignment:


Create a transformer from scratch using the PyTorch library.

Objective of the Assignment: To understand the fundamental concepts and techniques of natural language
processing (NLP).

CO Relevance: CO1
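The heart of any from-scratch transformer is scaled dot-product self-attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. Below is a library-agnostic NumPy sketch of just that operation, with random toy weights; a real PyTorch implementation would wrap the same computation in an nn.Module and add multi-head projection, positional encoding, and feed-forward layers:

```python
# NumPy sketch of scaled dot-product self-attention, the core operation a
# from-scratch PyTorch transformer implements inside each encoder layer.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (seq_len, seq_len), rows sum to 1
    return weights @ V                          # (seq_len, d_v)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # 4 tokens, model dimension 8
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```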
Conclusion:

Implemented a transformer from scratch using the PyTorch library.

FAQ’s:
 What are the types of embeddings in a transformer?
 What are the applications of Transformers?
 What is the purpose of transformers in NLP?
 What is a Transformer?
 What is self-attention?
Assignment No 5

Title of the Assignment:


Morphology is the study of the way words are built up from smaller meaning-bearing
units. Study and understand the concepts of morphology by the use of an add-delete table.

Objective of the Assignment: To understand the fundamental concepts and techniques of
natural language processing (NLP).

CO Relevance: CO1

Morphology:

Morphology is the study of the way words are built up from smaller meaning-bearing units,
i.e., morphemes. A morpheme is the smallest meaningful linguistic unit. For example:

 बच्चों (bachchoM) consists of two morphemes: बच्चा (bachchaa) carries the
information of the root noun "बच्चा" (bachchaa), and ओं (oM) carries the
information of plural number and oblique case.
 played has two morphemes, play and -ed, carrying the information of the verb "play" and
"past tense", so the given word is the past tense form of the verb "play".

Words can be analysed morphologically if we know all variants of a given root word. We can
use an 'Add-Delete' table for this analysis.
Morph Analyser

Definition:

Morphemes are considered the smallest meaningful units of language. A morpheme can
either be a root word (play) or an affix (-ed). Combining morphemes is called a
morphological process; the word "played" is made of the two morphemes "play" and "-ed".
Finding all parts of a word (its morphemes) and thereby describing the properties of the
word is called "morphological analysis". For example, "played" carries the information
verb "play" and "past tense", so the given word is the past tense form of the verb "play".
Analysis of a word:

बच्चों (bachchoM) = बच्चा (bachchaa) (root) + ओं (oM) (suffix; plural, oblique)

A linguistic paradigm is the complete set of variants of a given lexeme. These variants can be
classified according to shared inflectional categories (e.g. number, case) and arranged
into tables.

Paradigm for बच्चा (bachchaa)

Case/Number   Singular           Plural

Direct        बच्चा (bachchaa)     बच्चे (bachche)

Oblique       बच्चे (bachche)      बच्चों (bachchoM)

Algorithm to get बच्चों (bachchoM) from बच्चा (bachchaa)

1. Take the root बच्चा = बच्च् (bachch) + आ (aa)

2. Delete आ (aa)

3. Output बच्च् (bachch)

4. Add ओं (oM) to the output

5. Return बच्चों (bachchoM)

Therefore आ (aa) is deleted and ओं (oM) is added to get बच्चों (bachchoM).

Add-Delete table for बच्चा (bachchaa)

Delete    Add       Number   Case   Variant

आ (aa)    आ (aa)    sing     dr     बच्चा (bachchaa)

आ (aa)    ए (e)     plu      dr     बच्चे (bachche)

आ (aa)    ए (e)     sing     ob     बच्चे (bachche)

आ (aa)    ओं (oM)   plu      ob     बच्चों (bachchoM)
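The add-delete rules in the table are mechanical enough to script. This Python sketch applies them to the transliterated root ("bachchaa" stands in for the Devanagari बच्चा); the helper name `apply_rule` is an invented illustration, not part of any standard library:

```python
# Sketch of applying add-delete rules to a root word, using the transliterated
# forms from the add-delete table ("bachchaa" stands for the Devanagari root).
def apply_rule(root, delete, add):
    """Delete the given suffix from the root, then append the new suffix."""
    assert root.endswith(delete), "rule does not match this root"
    return root[: len(root) - len(delete)] + add

# (delete, add, number, case) rows, mirroring the add-delete table above.
rules = [("aa", "aa", "sing", "dr"),
         ("aa", "e",  "plu",  "dr"),
         ("aa", "e",  "sing", "ob"),
         ("aa", "oM", "plu",  "ob")]

for delete, add, number, case in rules:
    print(number, case, "->", apply_rule("bachchaa", delete, add))
```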

Paradigm Class

Words in the same paradigm class behave similarly: any word that shares the paradigm class of
बच्चा (bachchaa) follows the same add-delete rules, so its variants can be generated in exactly
the same way.

Conclusion: Analyzed the morphology of a word by the use of an add-delete table
successfully.

FAQ’s:

 What is morphology?

 What are the types of morphology?

 Why do we need to do morphological analysis?

 What is the application of morphology in linguistics?

 What is the difference between inflectional and derivational morphology?
