Bits Pilani, Dubai Campus
2020
BITS PILANI, DUBAI CAMPUS
Dubai International Academic City
Second Semester 2019 – 2020
Comprehensive Examination (Closed Book) (5 Pages)
Year: B.E.    Date: 07.06.2020 FN
Course No: CS F469    Max Marks: 40 (40%)
Course Title: Information Retrieval    Duration: 3 Hours
Part A: MCQs (10 × 0.5 = 5 M)
Answer all questions
1) Each posting in a positional index is:
A) a position-ID and a list of bi-words
B) a doc-ID and a list of positions
C) a position-ID and a list of documents
D) a doc-ID and a list of permutation terms
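To make the structure concrete: a positional index maps each term to postings, and each posting holds a doc-ID together with the positions at which the term occurs in that document. A minimal Python sketch, assuming whitespace tokenization, lowercasing and 1-based positions; the two documents are hypothetical:

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; positions are 1-based."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: plain whitespace tokenization with lowercasing
        for position, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


docs = {1: "to be or not to be", 2: "to do is to be"}  # hypothetical collection
index = build_positional_index(docs)
print(index["to"])  # {1: [1, 5], 2: [1, 4]}
```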
2) The Soundex code for the input string BY is given by:
A) B000
B) B001
C) B100
D) None of the above.
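A Soundex code keeps the first letter and encodes the following consonants as digits, dropping vowels, collapsing adjacent equal codes and padding to four characters. The sketch below implements the standard American Soundex rules; it is an illustration, not necessarily the exact variant taught in the course (the Manning et al. textbook variant treats H and W like vowels, which can change some codes, though not the one for BY):

```python
def soundex(word):
    """American Soundex code of a non-empty alphabetic word."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    word = word.upper()
    result = [word[0]]                 # always keep the first letter
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")      # vowels, Y, H and W have no digit
        if digit and digit != prev:    # skip runs of the same code
            result.append(digit)
        if ch not in "HW":             # H and W do not separate equal codes
            prev = digit
    return ("".join(result) + "000")[:4]  # pad with zeros, keep four characters


print(soundex("BY"))  # B000
```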
3)
5) A recommender system that uses one technique to generate an output, which is in turn
used as the input to a second recommendation technique, is called:
A) Parallelized Hybridization B) Switching
C) Feature Augmentation D) None of the above
6) A Topic Model discovers topics across various text documents. It deploys
CS F469 Information Retrieval Question Paper 07.06.2020
A) The data are linearly separable
B) The data are noisy and contain overlapping points
C) The data are clean and ready to use
D) None of the above
9. The Naive Bayes algorithm for text classification uses _____________ to make class
predictions.
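For illustration, multinomial Naive Bayes scores each class by its prior probability times the product of the per-term conditional probabilities (Bayes' rule with the conditional-independence assumption) and predicts the highest-scoring class. A minimal sketch with add-one smoothing and log-probabilities, trained on the toy China/Japan example from Chapter 13 of Manning et al.'s Introduction to Information Retrieval; the function names are hypothetical:

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(training_docs):
    """training_docs: list of (tokens, class_label) pairs."""
    vocab = {t for tokens, _ in training_docs for t in tokens}
    doc_counts = Counter(label for _, label in training_docs)
    term_counts = defaultdict(Counter)
    for tokens, label in training_docs:
        term_counts[label].update(tokens)
    log_prior = {c: math.log(n / len(training_docs)) for c, n in doc_counts.items()}
    log_cond = {}
    for c in doc_counts:
        total = sum(term_counts[c].values())
        # Add-one (Laplace) smoothing over the vocabulary
        log_cond[c] = {t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                       for t in vocab}
    return vocab, log_prior, log_cond


def classify_nb(model, tokens):
    vocab, log_prior, log_cond = model
    # Sum log-probabilities instead of multiplying probabilities (avoids underflow);
    # terms not seen in training are ignored
    scores = {c: log_prior[c] + sum(log_cond[c][t] for t in tokens if t in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)


training = [
    ("Chinese Beijing Chinese".split(), "china"),
    ("Chinese Chinese Shanghai".split(), "china"),
    ("Chinese Macao".split(), "china"),
    ("Tokyo Japan Chinese".split(), "not_china"),
]
model = train_multinomial_nb(training)
print(classify_nb(model, "Chinese Chinese Chinese Tokyo Japan".split()))  # china
```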
10. Consider the following lexicalized subtrees for a query (q4) and a document (d2) in the
context of XML retrieval.
Now, write down the merged postings lists using the Blocked Sort-Based Indexing (BSBI) algorithm.
2. Compute the measures PRECISION and RECALL for the following IR application involving text documents:
Number of Documents    RELEVANT    NON-RELEVANT
RETRIEVED                    60              70
NOT RETRIEVED               160             810
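As a worked check of the definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal Python sketch reading the counts straight from the table; the function name is hypothetical:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# From the table: relevant and retrieved = 60, non-relevant and retrieved = 70,
# relevant but not retrieved = 160 (the 810 true negatives enter neither measure)
precision, recall = precision_recall(tp=60, fp=70, fn=160)
print(f"precision = {precision:.4f}, recall = {recall:.4f}")
# precision = 0.4615, recall = 0.2727
```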
4. In Cross Language Information Retrieval, what are the different types of Bilingual Corpora?
5. Discuss in brief the dimensionality reduction technique “Missing Value Ratio”. Illustrate with an example.
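For illustration, the Missing Value Ratio technique computes, for each feature (column), the fraction of records in which the value is missing, and drops features whose ratio exceeds a chosen threshold. A minimal Python sketch; the dataset, the column names and the 0.5 threshold are all hypothetical (the threshold is a tunable choice):

```python
def missing_value_ratios(rows, columns):
    """Fraction of rows in which each column is missing (None)."""
    return {c: sum(1 for row in rows if row.get(c) is None) / len(rows) for c in columns}


def drop_by_missing_value_ratio(rows, columns, threshold):
    """Keep only the columns whose missing-value ratio does not exceed threshold."""
    ratios = missing_value_ratios(rows, columns)
    kept = [c for c in columns if ratios[c] <= threshold]
    return kept, ratios


rows = [  # hypothetical dataset
    {"age": None, "income": 52000, "city": "Dubai"},
    {"age": 31, "income": None, "city": "Pilani"},
    {"age": None, "income": 61000, "city": "Goa"},
    {"age": None, "income": 47000, "city": "Hyderabad"},
]
kept, ratios = drop_by_missing_value_ratio(rows, ["age", "income", "city"], threshold=0.5)
print(ratios)  # {'age': 0.75, 'income': 0.25, 'city': 0.0}
print(kept)    # ['income', 'city']
```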
1. Draw the inverted index that would be built for the following document collection:
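For illustration, an inverted index is a sorted dictionary of terms, each pointing to a sorted postings list of document IDs (its length is the document frequency). A minimal Python sketch, assuming lowercase alphanumeric tokens with no stemming or stop-word removal; the four-document collection is hypothetical, because the collection for this question is not preserved in this copy:

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Return {term: sorted list of doc IDs}, with terms in sorted order."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase alphanumeric tokens, no stemming or stop-word removal
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {  # hypothetical collection
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
inverted = build_inverted_index(docs)
for term, plist in inverted.items():
    print(f"{term} (df={len(plist)}): {plist}")
```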
It is proposed to develop a Multimedia Information Retrieval System for an International University. The
university is spread over various continents. Each continent has several countries and each country has
several major cities. The university has a branch campus in every major city across the world. The
university intends to make course content accessible to all its students across the world. The course
content spans multiple media types (text, image, video, audio, graphical), and faculty members from all
branch campuses are involved in creating course content. Each branch campus hosts content for around
50 courses (there is no overlap). Each course has around 40 topics. For example, “Vector Space Model” can
be a topic in the course “Information retrieval”. A student from any branch campus can retrieve multimedia
information on any specific topic, from one or more branch campuses, based on his/her query. The query
can involve retrieval of data from one or more media types (like audio, image, video, text and graphical).
4. Write a heterogeneous multimedia query that uses the principle of “Mix and Match data from three
sources – text, image and video”.
It is proposed to develop a Books Recommender System for an online bookstore that allows a customer
to purchase or order books online or over the telephone.
The application administrator is responsible for the design and maintenance of the above system.
The following aspects are to be considered for the above system:
a) Which recommendation approach(es) would you follow for the above Books Recommender System?
Justify your answer.
b) Write down the steps involved in building the above recommender system. (No diagram is needed; just
write the steps in plain English sentences.)
4. Consider a very small collection C that consists of the following three documents:
• d1: “WEST TRY TIN POT”
• d2: “WEST POT TIN FIN”
• d3: “WEST TRY POT TIN”
Given the following query: “POT TIN WEST OIL”.
Compute the cosine similarity values between each document and the query.
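The question does not fix a weighting scheme, so the sketch below assumes raw term-frequency vectors. Under that assumption each document shares three of the four query terms and every vector has length 2, so each cosine is 3 / (2 × 2) = 0.75. (With idf = log(N/df), the shared terms WEST, TIN and POT occur in all three documents, get idf 0, and every similarity would collapse to 0.) The function name is hypothetical:

```python
import math
from collections import Counter


def cosine(a_tokens, b_tokens):
    """Cosine similarity of two raw term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


docs = {"d1": "WEST TRY TIN POT", "d2": "WEST POT TIN FIN", "d3": "WEST TRY POT TIN"}
query = "POT TIN WEST OIL"
for name, text in docs.items():
    print(name, cosine(text.split(), query.split()))  # each prints 0.75
```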
=========================================================
Question paper - CS F469 Information Retrieval
=============================================
BITS Pilani, Dubai Campus, Academic City, Dubai
II Semester 2019-2020
Degree: B.E. Hons. TEST 2 Question Paper
Course No : CS F469 Course Title: Information Retrieval
Date: 08.04.2020 Wednesday Time: 8.30-9.20 am Total Marks: 20 Weightage: 20%
Data provided are complete. Open Book.
This question paper has 5 questions in 4 pages.
===============================================================
Answer all questions.
Term         df_t         idf_t
calpurnia    1
animal       109
sunday       1,009
fly          10,009
under        100,009
the          1,000,009
[3 M]
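The idf column follows from idf_t = log10(N / df_t), the base-10 convention used in Manning et al. The collection size N is not stated in this copy of the question, so N below is a hypothetical placeholder to be replaced with the value given in the paper:

```python
import math


def idf(N, df):
    """Inverse document frequency idf_t = log10(N / df_t)."""
    return math.log10(N / df)


# Document frequencies from the table above
df = {"calpurnia": 1, "animal": 109, "sunday": 1009,
      "fly": 10_009, "under": 100_009, "the": 1_000_009}
N = 10_000_000  # hypothetical: substitute the collection size N given in the question
for term, dft in df.items():
    print(f"{term:10s} df={dft:>9,}  idf={idf(N, dft):.4f}")
```

Rarer terms get larger idf; a term occurring in every document gets idf 0.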
We may imagine these values as defining a vector for each computer; for instance,
A’s vector is [3.15, 500, 7]. We can compute the cosine distance between any two
of the vectors, but if we do not scale the components, then the disk size will
dominate and make the differences in the other components essentially invisible.
Let us use 1 as the scale factor for processor speed, α for the disk size, and β
for the main memory size.
(a) In terms of α and β, compute the cosines of the angles between the
vectors for each pair of the three computers.
(b) What are the cosines of the angles between the vectors if α = 0.01 and
β = 0.5? [3+2 M]
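With scale factors 1, α and β, the cosine between computers u and v is (u1·v1 + α²·u2·v2 + β²·u3·v3) / (√(u1² + α²·u2² + β²·u3²) · √(v1² + α²·v2² + β²·v3²)). A minimal Python sketch of that formula; only A's vector appears in this copy of the question, so the vector for B is a hypothetical placeholder:

```python
import math


def scaled_cosine(u, v, alpha, beta):
    """Cosine of the angle between u and v after scaling components by (1, alpha, beta)."""
    scale = (1.0, alpha, beta)  # processor speed, disk size, main memory
    su = [s * x for s, x in zip(scale, u)]
    sv = [s * x for s, x in zip(scale, v)]
    dot = sum(a * b for a, b in zip(su, sv))
    return dot / (math.sqrt(sum(a * a for a in su)) * math.sqrt(sum(b * b for b in sv)))


A = [3.15, 500, 7]   # from the question
B = [2.80, 320, 4]   # hypothetical: B's and C's values are not preserved here
print(scaled_cosine(A, B, alpha=0.01, beta=0.5))
```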
5. Compute the PageRank for the given scenario iteratively (perform 4 iterations) using
Google's original PageRank algorithm.
A, B, C and D refer to four web pages. Assume that the damping factor d is 0.72. [5 M]
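Google's original formulation is PR(p) = (1 − d) + d · Σ PR(q) / C(q), where the sum runs over the pages q that link to p and C(q) is the number of outgoing links of q. A minimal Python sketch with d = 0.72, all values starting at 1.0 and a synchronous update (each iteration uses only the previous iteration's values). Some course solutions instead overwrite values in place as soon as they are computed, which changes the intermediate numbers. The link structure below is hypothetical because the figure for this question is not preserved:

```python
def pagerank_original(links, d=0.72, iterations=4):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over pages q linking to p.

    links: {page: [pages it links to]}. Returns the PR values after each iteration.
    """
    pages = sorted(links)
    out_degree = {p: len(links[p]) for p in pages}
    in_links = {p: [q for q in pages if p in links[q]] for p in pages}
    pr = {p: 1.0 for p in pages}
    history = []
    for _ in range(iterations):
        # The comprehension reads the previous pr dict before it is rebound
        pr = {p: (1 - d) + d * sum(pr[q] / out_degree[q] for q in in_links[p])
              for p in pages}
        history.append(pr)
    return history


# Hypothetical link structure
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
history = pagerank_original(links)
for i, pr in enumerate(history, start=1):
    print(i, {p: round(v, 4) for p, v in pr.items()})
```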
****************
IR-Question Paper
BITS PILANI, DUBAI CAMPUS
Dubai International Academic City
Second Semester 2019 – 2020
TEST 1 (Closed Book) (Seven Questions)
Year: III/IV    Date: 26.02.2020 W1
Course No: CS F469    Max Marks: 20 (20%)
Course Title: Information Retrieval    Duration: 50 minutes
Answer all questions.
1. What is the difference between 'TOKEN' and 'TYPE' in the context of IR systems? [2 M]
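A token is each occurrence of a character sequence in the text, while a type is the class of all tokens with the same character sequence. Counting both for a short phrase makes the difference visible. This is a minimal sketch that assumes case-folded word-character tokenization:

```python
import re


def count_tokens_and_types(text):
    """Return (number of tokens, number of distinct types) for a piece of text."""
    # Assumption: a token is a maximal run of word characters, case-folded
    tokens = re.findall(r"\w+", text.lower())
    return len(tokens), len(set(tokens))


print(count_tokens_and_types("to sleep perchance to dream"))  # (5, 4)
```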
2. a) What is the basic principle behind the Blocked Sort-Based Indexing (BSBI) scheme?
b) What is the time complexity of the Blocked Sort-Based Indexing scheme?
[2 M]
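As a sketch of the idea: BSBI splits the stream of (term, docID) pairs into blocks that fit in memory, sorts and inverts each block, writes it out as a run, and finally merges all runs into one index. Manning et al. give its time complexity as Θ(T log T) for T pairs, dominated by sorting. In the toy version below, terms stay as strings instead of termIDs and runs stay in memory instead of on disk; the input pairs are hypothetical:

```python
import heapq
from itertools import groupby


def bsbi(pairs, block_size):
    """Toy in-memory BSBI over (term, doc_id) pairs; returns {term: postings list}."""
    # Phase 1: split the pair stream into fixed-size blocks and sort each block
    runs = [sorted(pairs[i:i + block_size]) for i in range(0, len(pairs), block_size)]
    # Phase 2: k-way merge of the sorted runs, then group pairs into postings lists
    merged = heapq.merge(*runs)
    return {term: sorted({doc for _, doc in group})
            for term, group in groupby(merged, key=lambda pair: pair[0])}


pairs = [("caesar", 1), ("brutus", 1), ("caesar", 2), ("brutus", 2),
         ("calpurnia", 2), ("caesar", 4)]  # hypothetical input
print(bsbi(pairs, block_size=2))
# {'brutus': [1, 2], 'caesar': [1, 2, 4], 'calpurnia': [2]}
```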
3. Apply stemming to the following content using the PORTER STEMMER and rewrite the final text (this will be
your output): [3 M]
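The full Porter stemmer applies five groups of suffix rules, many of them conditioned on the "measure" of the stem, which is too long to reproduce here. As a flavour of how the rules work, the sketch below implements only Step 1a (plural handling), using the example words from Porter's original rule list:

```python
def porter_step_1a(word):
    """Step 1a of the Porter stemmer: SSES->SS, IES->I, SS->SS, S->(removed)."""
    # Rules are tried in this order so the longest matching suffix wins
    for suffix, replacement in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```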
5. Draw the inverted index that would be built for the following document collection: [4 M]
6. a) Compute the Levenshtein distance matrix for the edit distance between the following two
strings: GREAT and CREATION. Assume that GREAT is the source string and CREATION is the target string.
[3 M]
6. b) Identify and list the operations (copy, replace, insert, delete, as applicable) by backtracking through the
matrix. [1 M]
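Both parts can be checked mechanically: D[i][j] holds the edit distance between the first i characters of the source and the first j characters of the target, and walking back from the bottom-right cell recovers one optimal operation sequence. In this sketch, ties during the backtrace are broken in the order copy, replace, insert, delete. Other equally cheap sequences may exist for other inputs:

```python
def levenshtein_matrix(source, target):
    """Matrix D where D[i][j] = edit distance of source[:i] -> target[:j]."""
    m, n = len(source), len(target)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i  # delete all i source characters
    for j in range(n + 1):
        D[0][j] = j  # insert all j target characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # delete
                          D[i][j - 1] + 1,         # insert
                          D[i - 1][j - 1] + cost)  # copy (cost 0) or replace (cost 1)
    return D


def backtrace(D, source, target):
    """Recover one optimal operation sequence from the filled matrix."""
    i, j, ops = len(source), len(target), []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and D[i][j] == D[i - 1][j - 1]:
            ops.append(f"copy {source[i - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + 1:
            ops.append(f"replace {source[i - 1]}->{target[j - 1]}")
            i, j = i - 1, j - 1
        elif j > 0 and D[i][j] == D[i][j - 1] + 1:
            ops.append(f"insert {target[j - 1]}")
            j -= 1
        else:
            ops.append(f"delete {source[i - 1]}")
            i -= 1
    return ops[::-1]


D = levenshtein_matrix("GREAT", "CREATION")
for row in D:
    print(row)
print("edit distance:", D[-1][-1])  # edit distance: 4
print(backtrace(D, "GREAT", "CREATION"))
# ['replace G->C', 'copy R', 'copy E', 'copy A', 'copy T', 'insert I', 'insert O', 'insert N']
```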
**********************
BITS PILANI, DUBAI CAMPUS
Dubai International Academic City
Second Semester 2019 – 2020
Quiz (Closed Book) (2 Pages)
Year: IV/III    Date: 20.04.20
Course No: CS F469    Max Marks: 10 (10%)
Course Title: INFORMATION RETRIEVAL    Duration: 20 minutes
ID NO:_____________________ Name:___________________________
1. Two web search engines A and B each generate a large number of pages uniformly at
random from their indexes. 40% of A’s pages are present in B’s index, while 60% of
B’s pages are present in A’s index. What is the number of pages in A’s index relative
to B’s? [1 M]
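Because both samples are uniform, each percentage estimates the size of the overlap A ∩ B as a fraction of one index: |A ∩ B| = 0.40·|A| = 0.60·|B|, so |A| / |B| = 0.60 / 0.40. Exact fractions avoid floating-point noise in the ratio:

```python
from fractions import Fraction

# |A ∩ B| = 0.40 * |A|  and  |A ∩ B| = 0.60 * |B|  =>  0.40 * |A| = 0.60 * |B|
frac_of_A_in_B = Fraction(40, 100)
frac_of_B_in_A = Fraction(60, 100)
size_A_over_B = frac_of_B_in_A / frac_of_A_in_B
print(size_A_over_B)  # 3/2
```

That is, A's index is estimated to be 1.5 times the size of B's.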
4. Why should the host splitter precede the Duplicate URL Eliminator
in a distributed crawler? [2 M]
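The key property is that the host splitter sends every URL of a given host to the same node. That node's Duplicate URL Eliminator therefore sees every occurrence of those URLs and can detect duplicates using only local state. If the DUE ran before the split, each node would have to check URLs against sets held by other nodes. A minimal sketch of that routing; the function and class names and the hash-mod-n assignment are hypothetical choices, not a specific crawler's design:

```python
import hashlib
from urllib.parse import urlsplit


def host_splitter(url, num_nodes):
    """Assign a URL to a crawler node by hashing its host name."""
    host = urlsplit(url).hostname or ""
    # md5 is used only as a stable hash; Python's built-in hash() of str is salted per process
    return int(hashlib.md5(host.encode("utf-8")).hexdigest(), 16) % num_nodes


class CrawlerNode:
    """Node whose Duplicate URL Eliminator only tracks URLs of the hosts routed to it."""

    def __init__(self):
        self.seen_urls = set()

    def duplicate_url_eliminator(self, url):
        """Return True if url is new to this node (and record it), False if it is a duplicate."""
        if url in self.seen_urls:
            return False
        self.seen_urls.add(url)
        return True


nodes = [CrawlerNode() for _ in range(3)]
for url in ["http://example.com/a", "http://example.com/b", "http://example.com/a"]:
    node = nodes[host_splitter(url, len(nodes))]
    print(url, "new" if node.duplicate_url_eliminator(url) else "duplicate")
```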