Project Report: Movie Recommendation System


ABSTRACT

A Recommendation System filters data using different algorithms and recommends the most relevant items to users. Our project aims to implement a recommendation system that responds to a user's request with movie recommendations. The ultimate purpose of the Movie Recommendation System is to improve the user's experience by recommending movies to them. The system uses a Collaborative Filtering approach, in which recommendations are filtered based on the preferences of similar users. Collaborative Filtering (CF) predicts a user's preference for an item from the known user ratings of items. Collaborative filtering is the most popular and widely used technique in recommender systems; it uses similar neighbors to generate recommendations. In our project, we first performed Exploratory Data Analysis (EDA), then built the recommendation system, and finally integrated it into a web application using Django.
CHAPTER 1
INTRODUCTION

1.1 GENERAL INTRODUCTION

Recommendation systems predict and recommend items to users, users to items, and sometimes users to users. Tech giants like YouTube, Amazon Prime, and Netflix use such methods to recommend video content according to each viewer's interests. As the internet contains huge amounts of data, finding the content you want can be difficult and very time consuming, so recommendations play an important role in minimizing our effort. These systems are increasingly popular in areas such as books, videos, music, movies, and social network sites, where recommendation is used to filter information. A recommendation system is a tool that uses the user's information to improve suggestions and surface the most preferred choices. User/customer satisfaction is key when building such a tool. It benefits both customers and companies: the more satisfied customers are, the more they will want to use the system, which ultimately generates revenue for the companies. Although many algorithms exist, collaborative filtering is the most popular one used by companies because it draws more heavily on users' interactions. Collaborative filtering can often predict better than content-based filtering because it analyses the user's browsing history, compares it with that of other users, and then suggests results. Content-based filtering, by contrast, takes the user's information as input to find similar movies and recommends them in descending order.

1.2 ORGANIZATIONAL PROFILE


1.3 OBJECTIVES

Our project aims to implement a recommendation system that responds to a user's request with movie recommendations. The ultimate purpose of the Movie Recommendation System is to improve the user's experience by recommending movies to them. The system uses a Collaborative Filtering approach, in which recommendations are filtered based on the preferences of similar users. Collaborative Filtering (CF) predicts a user's preference for an item from the known user ratings of items.

1.4 SCOPE AND RELEVANCE OF THE PROJECT

Recommender systems are information filtering systems that help deal with the problem of information overload by filtering and segregating large amounts of dynamically generated information according to a user's preferences, interests, or observed behavior toward particular items. A recommender system can predict whether a particular user would prefer an item or not, based on the user's profile and historical information. Recommendation systems have also been shown to improve decision-making processes and quality. In large e-commerce settings, recommender systems increase revenue because they are an effective means of selling more products. In scientific libraries, recommender systems allow users to move beyond generic catalogue searches. Therefore, the need for efficient and accurate recommendation techniques that provide relevant and dependable recommendations cannot be neglected.

Conglomerates like Netflix use a recommendation engine to present their viewers with movie and show suggestions. Amazon, on the other hand, uses its recommendation engine to present customers with product recommendations. While each uses it for slightly different purposes, both have the same general goal: to drive sales, boost engagement and customer retention, and deliver more personalized customer experiences. Recommendations typically speed up searches, make it easier for users to access the content they have always been interested in, and surprise them with offers they would never have searched for. In doing so, companies are able to gain new customers by sending out customized emails with links to new offers that meet the recipients' interests, or suggestions of films and TV shows that suit their particular profiles.
CHAPTER 2
SYSTEM ANALYSIS
2.1 INTRODUCTION

System analysis is the process of gathering and interpreting facts, diagnosing problems, and using the information to recommend improvements. System study is a general term that refers to an orderly, structured process for identifying and solving problems. The first phase of software development is the system study. The importance of the system study phase lies in establishing the requirements for the system to be acquired, developed, and installed. Analysing the project to understand its complexity forms a vital part of the system study. Problematic areas are identified and information is collected. Fact finding, or gathering, is essential to any analysis of requirements. It is also highly essential that the analyst becomes familiar with the objectives, activities, and functions of the organization in which the system is to be implemented. In the system study, a detailed study is made of the operations performed by a system and their relationships within and outside the system. A key question considered here is, "What must be done to solve the problem?" One aspect of the system study is defining the boundaries of the application and determining whether or not the candidate application should be considered.

2.2 EXISTING SYSTEM


We use digital libraries for a wide variety of digital objects: research papers, publications, journals, research projects, newspapers, magazines, and past questions. Some digital libraries offer millions of digital objects, so finding a favorite digital object in such a large collection is one of the major problems. Users need help in finding items that are in accordance with their interests. Recommender systems offer a solution to this problem, as library users get recommendations through a form of smart search. Content-based recommenders provide recommendations by comparing a representation of the content describing an item or product to a representation of the content describing the interests of the user. They are sometimes referred to as content-based filtering. The content-based technique is suitable in situations or domains where there are more items than users.
2.3 PROPOSED SYSTEM

The proposed recommendation system uses the collaborative filtering technique, which is more accurate and more efficient. Collaborative filtering provides many advantages over content-based filtering, including the following:

1. No need to understand item content: the content of an item, such as its type or genre, does not necessarily tell the whole story.
2. No item cold-start problem: even when no information about an item is available, we can still predict its rating without waiting for a user to purchase it.
3. Captures the change in user interests over time: focusing solely on content does not provide any flexibility regarding the user's perspective and preferences.
4. Captures inherent subtle characteristics: this is especially true of latent factor models.

2.4 FEASIBILITY STUDY

A feasibility analysis evaluates the candidate systems and determines the one that best meets the performance requirements. The purpose of a feasibility study is to investigate the present system, evaluate the possible application of computer-based methods, select a tentative system, evaluate the cost and effectiveness of the proposed system, evaluate the impact of the proposed system on the existing system, and ascertain the need for a new system. The study is carried out to see whether the system is technically, economically, and operationally feasible.

All projects are feasible when given unlimited resources and infinite time. It is both necessary and prudent to evaluate the feasibility of the project at the earliest possible time. An estimate is made of whether the identified users can be satisfied using current hardware and software technologies. The study decides whether the proposed system will be cost effective from the business point of view and whether it can be developed within the existing budgetary constraints.

The objective of a feasibility study is to test the technical, social, and economic feasibility of developing a computer system. This is done by investigating the existing system and generating ideas about a new system. The computer system must be evaluated from a technical viewpoint first and, if technically feasible, its impact on the organization and the staff must be assessed. If a compatible social and technical system can be devised, it must then be tested for economic feasibility.

There are eight steps involved in a feasibility study:

1. Form a project team and appoint a project leader.
2. Prepare a system flowchart.
3. Enumerate potential candidate systems.
4. Describe and identify the characteristics of each candidate system.
5. Determine and evaluate the performance and cost effectiveness of each candidate system.
6. Weigh the system performance and cost.
7. Select the best candidate system.
8. Report the project directive to management.

2.4.1 Operational Feasibility

Operational feasibility is concerned with human, organizational, and political aspects. The issues considered are the job changes that will be brought about, the organizational structures that will be disturbed, and the new skills that will be required. Methods of processing and presentation are all according to the needs of clients, since they meet all user requirements. The proposed system will not cause any problem under any circumstances and will work according to the specifications mentioned. Hence the proposed system is operationally feasible. People are inherently resistant to change, and computers have been known to facilitate change. System operation is the longest phase in the development life cycle of a system, so operational feasibility should be given much importance. This system has a user-friendly interface and is therefore easy to handle.

2.4.2 Technical Feasibility

Technical feasibility is the most important of all types of feasibility analysis. It deals with hardware as well as software requirements. An outline design of the system requirements in terms of input/output, files, and procedures is drawn up, and the types of hardware and software and the methods required for running the system are analyzed. The technical study is a study of hardware and software requirements. All technical issues related to the proposed system are dealt with during the feasibility stage of the preliminary investigation, which produced the following results: considering the problems of the existing system, it is sufficient to implement the new system, and the proposed system can be implemented to solve the issues in the existing system. The assessment of technical feasibility must be based on the outline of the system requirements in terms of inputs, outputs, files, programs, and procedures. This can be quantified in terms of volumes of data, trends, frequency of updating, and so on.

2.4.3 Economic Feasibility

Economic analysis is the most frequently used method for evaluating the effectiveness of software; it is more commonly known as cost/benefit analysis. The procedure is to determine the benefits and savings that are expected from a candidate system and compare them with the costs. If the benefits outweigh the costs, the decision is made to design and implement the system; otherwise further alternatives have to be considered. Here it is seen that no new hardware or software is needed for the development of the system.

2.4.4 Behavioural Feasibility

Behavioural feasibility determines how much effort will go into educating, selling, and training users on the candidate system. People are inherently resistant to change, and computers have been known to facilitate change. Since the system is user friendly, user training is a very easy matter.

2.4.5 Legal Feasibility

Legal feasibility is the determination of any infringement, violation, or liability that could result from the development of the system. It encompasses a broad range of concerns, including contract and liability issues. The proposed project is also legally feasible.

2.5 ARTIFICIAL INTELLIGENCE

The intelligence demonstrated by machines is known as Artificial Intelligence. Artificial Intelligence has grown very popular in today's world. It is the simulation of natural intelligence in machines that are programmed to learn and mimic the actions of humans. These machines are able to learn from experience and perform human-like tasks. As technologies such as AI continue to grow, they will have a great impact on our quality of life. It is only natural that everyone today wants to connect with AI technology in some way, whether as an end user or by pursuing a career in Artificial Intelligence. Artificial Intelligence is the broader family, with Machine Learning and Deep Learning as its components.

There are three types of Artificial Intelligence:
• Artificial Narrow Intelligence (ANI)
• Artificial General Intelligence (AGI)
• Artificial Super Intelligence (ASI)

What is Artificial Narrow Intelligence (ANI)?

These Artificial Intelligence systems are designed to solve one single problem and execute a single task really well. By definition, they have narrow capabilities, like recommending a product to an e-commerce user or predicting the weather. This is the only kind of Artificial Intelligence that exists today. Such systems come close to human functioning in very specific contexts, and even surpass it in many instances, but they excel only in very controlled environments with a limited set of parameters.

What is Artificial General Intelligence (AGI)?

AGI is still a theoretical concept. It is defined as AI that has a human level of cognitive function across a wide variety of domains, such as language processing, image processing, computational functioning, and reasoning. An AGI system would need to comprise thousands of Artificial Narrow Intelligence systems working in tandem, communicating with each other to mimic human reasoning.

What is Artificial Super Intelligence (ASI)?

ASI is seen as the logical progression from AGI. An Artificial Super Intelligence (ASI) system would be able to surpass all human capabilities. This includes making rational decisions, and even extends to things like making better art and building emotional relationships.

2.6 MACHINE LEARNING

Machine learning is a branch of artificial intelligence that provides techniques by which a machine can make decisions based on its experience, improving and learning over time without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. Machine learning algorithms are often categorized as supervised or unsupervised.
2.6.1 Supervised Learning

In supervised learning, the machine is taught by example. The operator provides the machine learning algorithm with a known dataset that includes the desired inputs and outputs, and the algorithm must find a method to determine how to arrive at those outputs from the inputs. While the operator knows the correct answers to the problem, the algorithm identifies patterns in the data, learns from observations, and makes predictions. The algorithm makes predictions and is corrected by the operator, and this process continues until the algorithm achieves a high level of accuracy/performance. Under the umbrella of supervised learning fall classification, regression, and forecasting.

1. Classification: In classification tasks, the machine learning program must draw a conclusion from observed values and determine to which category new observations belong. For example, when filtering emails as 'spam' or 'not spam', the program must look at existing observational data and filter the emails accordingly.

2. Regression: In regression tasks, the machine learning program must estimate – and
understand – the relationships among variables. Regression analysis focuses on one
dependent variable and a series of other changing variables – making it particularly useful
for prediction and forecasting.

3. Forecasting: Forecasting is the process of making predictions about the future based
on the past and present data, and is commonly used to analyse trends.
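As an illustrative sketch (not from this project), the classification task above can be mimicked with a tiny nearest-centroid classifier: labelled examples are grouped by label, each group is summarized by its centroid, and a new observation receives the label of the nearest centroid. All data, labels, and function names below are invented for illustration.

```python
# Minimal supervised-classification sketch: a nearest-centroid classifier.
# The 2-D feature vectors and labels below are invented examples.

def centroid(points):
    # component-wise mean of a list of equal-length tuples
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(examples):
    # examples: list of (features, label); one centroid per label
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    # assign x the label of the closest centroid (Euclidean distance)
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return min(model, key=lambda label: dist(model[label], x))

train_set = [((1.0, 1.2), "not spam"), ((0.8, 1.0), "not spam"),
             ((4.0, 4.5), "spam"), ((4.2, 3.9), "spam")]
model = train(train_set)
print(predict(model, (4.1, 4.0)))   # a point near the "spam" examples
```

A real project would normally reach for a library classifier; this sketch only shows the learn-from-labelled-examples loop that defines supervised learning.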

2.6.2 Unsupervised Learning

Unsupervised learning is the type of machine learning in which there are no defined or labelled classes, and the algorithm itself draws inferences from the dataset. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabelled data. Under the umbrella of unsupervised learning fall:

1. Clustering: Clustering involves grouping sets of similar data (based on defined criteria). It is useful for segmenting data into several groups and performing analysis on each group to find patterns.

2. Dimension reduction: Dimension reduction reduces the number of variables being considered in order to find the exact information required.
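The clustering idea above can be sketched with a tiny hand-rolled k-means on one-dimensional data (k = 2, data invented); a real project would typically use a library such as scikit-learn.

```python
# Minimal clustering sketch: 1-D k-means with two clusters.

def kmeans_1d(values, iters=10):
    # start the two centroids at the min and max of the data
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            # assign each value to its nearest centroid
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

ratings = [1.0, 1.5, 1.2, 8.0, 8.5, 9.0]   # two obvious groups
centroids, clusters = kmeans_1d(ratings)
print(clusters)   # the low ratings and the high ratings separate cleanly
```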
2.6.3 Reinforcement Learning

Reinforcement learning focuses on regimented learning processes, where a machine learning algorithm is provided with a set of actions, parameters, and end values. Given these rules, the algorithm explores different options and possibilities, monitoring and evaluating each result to determine which one is optimal. Reinforcement learning teaches the machine through trial and error: it learns from past experience and adapts its approach to the situation to achieve the best possible result.

2.7 DEEP LEARNING

Deep learning models are capable of learning to focus on the right features by themselves, requiring little guidance from the programmer. Basically, deep learning mimics the way our brain functions, i.e. it learns from experience. As you know, our brain is made up of billions of neurons that allow us to do amazing things, and it has subconsciously trained itself to do such things over the years. How, then, does deep learning mimic the functionality of the brain? Deep learning uses the concept of artificial neurons that function in a similar manner to the biological neurons in our brain. Therefore, we can say that Deep Learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.

Fig 1: Graph to show performance of ML and DL for given data


2.8 RECOMMENDER SYSTEMS

The main objective of a recommender system is to provide the best user experience. Companies strive to connect users with the most relevant things according to their past behavior and get them hooked on their content. The recommender system suggests which text should be read next, which movie should be watched, and which product should be bought, creating a stickiness factor for any product or service. Its algorithms are designed to predict a user's interests, suggest different products in many different ways, and retain that interest. Needless to say, we see this system implemented in our daily lives. Many online sellers implement recommender systems to generate sales through machine learning (ML), and many retail companies generate a high volume of sales by adopting this system on their websites. Pioneering organizations such as Netflix and Amazon have introduced their own recommendation algorithms to hook their customers. Before diving into the in-depth mechanics, it is worth noting that such a system removes useless and redundant information, intelligently filtering information before showing it to users. To understand recommender systems better, one must know that there are three approaches:

1. Content-based filtering

2. Collaborative filtering

3. Hybrid model
2.8.1. Content-based filtering

Content-based filtering relies on product features rather than user feedback or interaction. It is a machine learning technique used to decide outcomes based on product similarities. Content-based filtering algorithms are designed to recommend products based on accumulated knowledge of users. The technique is all about comparing user interests with product features, so it is essential to provide significant product features in the system. Selecting the features each buyer prefers should be a first priority when designing such a system. Two strategies can be applied in combination. First, a list of features is provided to the user, who selects the most interesting ones. Second, the algorithms keep a record of all the products chosen by the user in the past, building up the customer's behavioral data. The buyer's profile revolves around the buyer's choices, tastes, and preferences and shapes the buyer's ratings, including how many times the buyer clicks on products of interest or adds those products to wishlists.

Content-based filtering relies on resemblance between items. The proximity and similarity of products are measured based on the similarity of their content, which includes genre, item category, and so on. Let's take the example of recommender systems for movies. Suppose you have four movies, and the user starts off liking only two of them. If the 3rd movie is similar to the 1st movie in terms of genre, the system will automatically suggest the 3rd movie. This suggestion is generated automatically by a content-based recommender system based on the similarity of content.

Just imagine the power of content-based recommender systems, and the possibilities are
endless. For example, when we have a drama film that the user has not seen or liked before,
this genre will be excluded from their profile altogether. Therefore, a user only gets their
recommendation of the genre that is already existing in their profile. The system would never
suggest any movie out of their genres to present the best user experience.
Let's get back to the movie example. Imagine that you have a dataset of only six movies and assume, for the sake of clarity, that the user has seen all six. Each movie is assigned one genre or a combination of genres, i.e., superhero, adventure, comedy, and sci-fi. Now, suppose the user has rated three movies, giving 2 out of 10 to the 1st movie, 10 out of 10 to the 2nd movie, and 8 out of 10 to the 3rd movie. From these ratings, the recommender system builds a user profile and performs its calculations, then recommends the best-suited movie accordingly. A content-based filtering system does not require information about other buyers, since each suggestion is specific to one buyer, which makes it easier to scale to many buyers. The system captures the user's interests and can suggest items that few other buyers use.
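The genre-based reasoning above can be sketched as follows; the movie titles, genre sets, and function names are invented for illustration, and similarity is plain cosine similarity over one-hot genre vectors.

```python
# Content-based filtering sketch: represent each movie by a genre vector
# and recommend the unseen movie most similar to a liked movie.

GENRES = ["superhero", "adventure", "comedy", "sci-fi"]

movies = {
    "Movie A": {"superhero", "sci-fi"},
    "Movie B": {"comedy"},
    "Movie C": {"superhero", "adventure"},
    "Movie D": {"comedy", "adventure"},
}

def vector(genres):
    # one-hot encoding of a movie's genre set
    return [1.0 if g in genres else 0.0 for g in GENRES]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def recommend(liked, candidates):
    # rank unseen movies by genre similarity to the liked movie
    v = vector(movies[liked])
    scored = [(cosine(v, vector(movies[m])), m) for m in candidates]
    return max(scored)[1]

print(recommend("Movie A", ["Movie B", "Movie C", "Movie D"]))
# "Movie C" wins: it shares the superhero genre with "Movie A"
```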

2.8.2. Collaborative Filtering

Collaborative filtering works from users' historical choices over a set of items. This system does not require a large amount of product features to work. An embedding or feature vector describes each item and each user, and the model places both items and users in the same embedding space, creating these embeddings on its own.

Other purchasers' reactions are taken into consideration when suggesting a specific product to the primary user. The system keeps track of the behavior of all users to learn which items are most liked, and it relates similar users by similarity in their preferences and behavior toward similar products when proposing a product to the primary customer. Two sources are used to record a user's interaction with a product. First, through implicit feedback, user likes and dislikes are recorded and inferred from actions such as clicks, listening to music tracks, searches, purchase records, page views, and so on. On the other hand, explicit feedback is when a customer states likes or dislikes by rating or reacting to a specific product, for example on a scale of 1 to 5 stars. This is direct feedback from users about the product and includes both positive and negative feedback.

Collaborative filtering is the most famous suggestion-engine approach and is based on the calculated guess that people who liked a product will enjoy similar products in the future. This type of algorithm is also known as user-based collaborative filtering: users, rather than items, are filtered and associated with each other. In this system, only users' behavior is considered; content and profile information alone are not enough. A user giving a positive rating to a product is associated with other users whose behavior produces similar ratings. The main idea behind this approach is suggesting new items based on the closeness in behavior of similar customers. If you plan to watch a new movie, you will generally ask your friends for recommendations, on the premise that you trust your friends to know your taste in movies. Therefore, we usually follow and watch whatever is recommended by a good friend with a similar taste.

Thus collaborative filtering focuses on the relationships between items and users; the similarity of two items is determined by the ratings given by customers who rated both items.

2.8.3. Hybrid Filtering

A hybrid approach is a mixture of collaborative and content-based filtering methods; when making suggestions, the film's context is also considered. The user-to-item relation and the user-to-user relation also play a vital role at the time of recommendation. This framework gives film recommendations based on the user's knowledge, provides unique recommendations, and solves the problem of a specific buyer ignoring relevant data. The user's profile data is collected from the website, and the film's context takes into account the films the user has watched and the movie's score data.

The results of similar calculations from both methods are aggregated. This method is called the hybrid approach, in which both methods are used to produce the results. Compared with the other approaches, this system has higher suggestion accuracy. The main reason is that it compensates for missing domain knowledge in collaborative filtering and for the narrow interest profile of a content-based system.
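A minimal sketch of blending the two signals, assuming a simple weighted average with a hypothetical mixing weight alpha (the report does not specify a combination rule):

```python
# Hybrid-scoring sketch: blend a content-based score with a collaborative
# score. The weight alpha and the example scores are assumptions, not
# values from this report.

def hybrid_score(content_score, collab_score, alpha=0.4):
    # alpha controls how much the content-based signal contributes
    return alpha * content_score + (1 - alpha) * collab_score

# a movie scored 0.8 by content similarity and 0.5 by collaborative filtering
print(round(hybrid_score(0.8, 0.5), 2))   # 0.4*0.8 + 0.6*0.5 = 0.62
```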

2.8.4 Types of Collaborative Filtering


There are two types of the collaborative filtering process:

1. Memory-based collaborative filtering

2. Model-based collaborative filtering


1. Memory-based Collaborative Filtering

Memory-based CF is a method that calculates the similarity between users or items from users' previous rating data. The main objective of this method is to describe the degree of resemblance between users or items and to discover similar ratings in order to suggest unseen items. Memory-based CF consists of the following two methods:

a. User-based Collaborative Filtering

In this method, we find users who have given similar ratings to the same items, and then predict the target user's rating for items the user has never interacted with. For this, we follow the given steps:

1. Identify the target user.
2. Find users whose ratings are similar to the target user's.
3. Collect the items those similar users have interacted with.
4. Predict the target user's rating for the items the target user has not yet seen.
5. If a predicted rating is higher than the threshold, suggest that item to the target user.
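The steps above can be sketched as follows; the ratings dictionary, the similarity choice (raw cosine over co-rated items), and the 3.5 threshold are illustrative assumptions.

```python
# Minimal user-based CF sketch. User names, ratings, and the threshold
# are invented for illustration.

ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 5, "m2": 5, "m4": 4},
    "carol": {"m3": 5, "m4": 2, "m5": 5},
}

def cosine(u, v):
    # cosine similarity over the sparse rating vectors of two users
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    nu = sum(r * r for r in u.values()) ** 0.5
    nv = sum(r * r for r in v.values()) ** 0.5
    return dot / (nu * nv)

def user_based_recommend(target, threshold=3.5):
    u = ratings[target]
    # step 2: similarity of every other user to the target
    sims = {o: cosine(u, ratings[o]) for o in ratings if o != target}
    # steps 3-4: similarity-weighted mean rating for each unseen item
    scores, weights = {}, {}
    for o, s in sims.items():
        if s <= 0:
            continue
        for m, r in ratings[o].items():
            if m not in u:
                scores[m] = scores.get(m, 0.0) + s * r
                weights[m] = weights.get(m, 0.0) + s
    predicted = {m: scores[m] / weights[m] for m in scores}
    # step 5: keep items predicted above the threshold
    return sorted(m for m, p in predicted.items() if p >= threshold)

print(user_based_recommend("alice"))   # items alice has not rated yet
```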
b. Item-based Collaborative Filtering

In item-based CF, we find items similar to those the target user has already viewed.

1. Identify the target user.
2. Find items whose rating patterns match those of the items the target user rated.
3. Predict the target user's ratings for those similar items.
4. If a predicted rating is higher than the threshold, suggest that item to the target user.
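The item-based variant can be sketched in the same spirit; the key difference is that similarity is computed between items, over the users who rated both (the data below is invented).

```python
# Item-based CF sketch: item-item similarity over co-rated users.

item_ratings = {
    "m1": {"alice": 5, "bob": 5, "carol": 1},
    "m2": {"alice": 4, "bob": 5},
    "m3": {"alice": 1, "carol": 5},
}

def item_cosine(i, j):
    # restrict both vectors to the users who rated both items
    co = set(item_ratings[i]) & set(item_ratings[j])
    if not co:
        return 0.0
    dot = sum(item_ratings[i][u] * item_ratings[j][u] for u in co)
    ni = sum(item_ratings[i][u] ** 2 for u in co) ** 0.5
    nj = sum(item_ratings[j][u] ** 2 for u in co) ** 0.5
    return dot / (ni * nj)

# m1 and m2 were liked by the same users; m1 and m3 were rated very differently
print(round(item_cosine("m1", "m2"), 3), round(item_cosine("m1", "m3"), 3))
```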

The most common technique is a numerical measure computed through a similarity matrix; typical measures are the dot product, cosine similarity, Pearson correlation, and Euclidean distance.
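For two small illustrative rating vectors, the four measures named above can be computed directly (the values are invented):

```python
# Dot product, cosine similarity, Pearson correlation, and Euclidean
# distance on two small example rating vectors.

a = [5.0, 4.0, 1.0]
b = [4.0, 5.0, 2.0]

dot = sum(x * y for x, y in zip(a, b))
cosine = dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

ma, mb = sum(a) / len(a), sum(b) / len(b)          # means for Pearson
pearson = (sum((x - ma) * (y - mb) for x, y in zip(a, b))
           / ((sum((x - ma) ** 2 for x in a) ** 0.5)
              * (sum((y - mb) ** 2 for y in b) ** 0.5)))

euclidean = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

print(dot, round(cosine, 3), round(pearson, 3), round(euclidean, 3))
```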

2. Model-based Collaborative Filtering

Model-based collaborative filtering does not need to memorize the underlying rating matrix. Instead, machine learning models are used to predict and calculate how a customer would rate each product. These algorithms predict unrated products from customer ratings, and are further divided into subsets such as matrix-factorization-based algorithms and clustering algorithms.

The matrix factorization technique analyzes the rating matrix in a linear-algebra context and has two main goals. The first is to reduce the dimension of the rating matrix. The second is to identify latent features underlying the rating matrix, which yield the recommendations.
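A minimal sketch of the idea, assuming plain gradient descent on the observed entries of a small invented rating matrix (the rank, learning rate, and epoch count are arbitrary choices, not values from this report):

```python
# Matrix-factorization sketch: approximate the rating matrix R by the
# product of user factors P and item factors Q, fitting only observed
# entries. All numbers here are illustrative.

R = [[5, 4, 0],      # 0 marks an unobserved rating
     [4, 5, 0],
     [1, 1, 5]]

n_users, n_items, rank = 3, 3, 2
lr, epochs = 0.02, 3000

# small deterministic starting factors (no randomness, for reproducibility)
P = [[0.1 * (u + f + 1) for f in range(rank)] for u in range(n_users)]
Q = [[0.1 * (i + f + 2) for f in range(rank)] for i in range(n_items)]

def pred(u, i):
    return sum(P[u][f] * Q[i][f] for f in range(rank))

for _ in range(epochs):
    for u in range(n_users):
        for i in range(n_items):
            if R[u][i] == 0:
                continue             # skip unobserved entries
            err = R[u][i] - pred(u, i)
            for f in range(rank):
                p, q = P[u][f], Q[i][f]
                P[u][f] = p + lr * err * q
                Q[i][f] = q + lr * err * p

# the learned latent factors can now fill in the missing (user 0, item 2) cell
print(round(pred(0, 0), 1), round(pred(0, 2), 2))
```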

Collaborative filtering is a straightforward illustration of how these algorithms use crowd data: a large amount of data is gathered from many people and used to create customized suggestions for a single user. These methods were developed in the 1990s and 2000s. Social media has brought innovation, and increased data availability has widened access to information from different sources, so recommender systems have begun to take social networks into account in addition to similarity.

Since the similarity measure plays a significant role in improving the accuracy of prediction algorithms, it can be used effectively to balance the significance of ratings. A couple of popular similarity algorithms are used in CF recommendation algorithms.
Cosine Vector Similarity

Cosine vector similarity is one of the popular metrics in statistics. Since it considers only the angle between two vectors and not their magnitude, it is a very useful measure for data that lacks explicit preference information, as long as the number of times a term appears in the data can be counted.

The cosine vector similarity looks at the angle between two vectors of ratings (the target Item i and the other Item j) in n-dimensional item space:

sim(i, j) = Σk (Rk,i × Rk,j) / ( √(Σk Rk,i²) × √(Σk Rk,j²) )

Here Rk,i is the rating of the target Item i by User k, Rk,j is the rating of the other Item j by User k, and n is the total number of users who rated both Item i and Item j.

When the angle between the two vectors is near 0 degrees (they point in the same direction), the cosine similarity value sim(i, j) is 1, meaning very similar. When the angle is near 90 degrees, sim(i, j) is 0, meaning irrelevant. When the angle is near 180 degrees (they point in opposite directions), sim(i, j) is -1, meaning very dissimilar. In the case of information retrieval using CF, sim(i, j) ranges from 0 to 1, because the angle between two term-frequency vectors cannot be greater than 90 degrees.
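As a minimal sketch of this measure in plain Python (the rating vectors are made up):

```python
import math

def cosine_similarity(ratings_i, ratings_j):
    """Cosine similarity between two items' rating vectors.

    ratings_i[k] and ratings_j[k] are the ratings User k gave
    to Item i and Item j respectively."""
    dot = sum(ri * rj for ri, rj in zip(ratings_i, ratings_j))
    norm_i = math.sqrt(sum(ri * ri for ri in ratings_i))
    norm_j = math.sqrt(sum(rj * rj for rj in ratings_j))
    return dot / (norm_i * norm_j)

# Two items rated in the same proportions point in the same
# direction, so their similarity is 1 regardless of magnitude.
print(cosine_similarity([4, 2, 4], [2, 1, 2]))  # 1.0
```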

Pearson Correlation Coefficient

The Pearson correlation coefficient is one of the popularly used methods in CF to measure how much larger a number in one series is relative to the corresponding number in the other. As the following formula shows, it is used to measure the linear correlation between two vectors (Item i and Item j):

sim(i, j) = Σ(k=1..n) (Rk,i - Ai)(Rk,j - Aj) / ( sqrt(Σ(k=1..n) (Rk,i - Ai)^2) * sqrt(Σ(k=1..n) (Rk,j - Aj)^2) )

It measures the tendency of two series of numbers, paired up one-to-one, to move together. When the two vectors have a high common tendency, the correlation sim(i, j) is close to 1. When they have a low tendency, sim(i, j) is close to 0. When they have opposite tendencies, sim(i, j) is close to -1. The item-based similarity is computed over the co-rated items, i.e. those that users rated for both Item i and Item j.

Rk,i is the rating of the target Item i given by User k. Rk,j is the rating of the other Item j given by User k. Ai is the average rating of the target Item i over all co-rated users, and Aj is the average rating of the other Item j over all co-rated users. n is the total number of users who rated both Item i and Item j.
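A minimal sketch of this coefficient in plain Python (the rating lists are made up, and constant rating vectors, which make the denominator zero, are not handled):

```python
def pearson_similarity(ratings_i, ratings_j):
    """Pearson correlation between two items over their co-rated users."""
    n = len(ratings_i)
    a_i = sum(ratings_i) / n  # average rating of Item i
    a_j = sum(ratings_j) / n  # average rating of Item j
    num = sum((ri - a_i) * (rj - a_j)
              for ri, rj in zip(ratings_i, ratings_j))
    den_i = sum((ri - a_i) ** 2 for ri in ratings_i) ** 0.5
    den_j = sum((rj - a_j) ** 2 for rj in ratings_j) ** 0.5
    return num / (den_i * den_j)

# Ratings that rise and fall together correlate positively;
# ratings that move in opposite directions correlate negatively.
print(pearson_similarity([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_similarity([1, 2, 3], [3, 2, 1]))  # -1.0
```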
3. SYSTEM DESIGN
CHAPTER 3

SYSTEM DESIGN
3.1 INTRODUCTION

System design is a "how to" approach to the creation of a new system. It provides the understanding and procedural details necessary for implementing the proposed system.

3.2 BLOCK DIAGRAM

Fig. Block diagram

Following are the steps in this block diagram:

1. Raw Data
2. Pre-processing
3. Structured Data
4. EDA
5. User-Movie Matrix
Pre-processing

Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. We used the following pre-processing components in our project:

Raw Data - Raw data, also called source data, atomic data, or primary data, is data that has not been processed for use.

Structured Data - Data that adheres to a predefined data model and is therefore straightforward to analyse.
Exploratory Data Analysis (EDA) - Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
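A minimal EDA sketch with pandas; the column names and values below are hypothetical, in the MovieLens style, not taken from the project's actual dataset:

```python
import pandas as pd

# Hypothetical ratings data in the MovieLens style.
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Summary statistics help spot anomalies and check assumptions.
print(ratings["rating"].describe())

# Rating count and mean per movie reveal popularity patterns.
print(ratings.groupby("movieId")["rating"].agg(["count", "mean"]))
```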
User Movie Matrix
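The user-movie matrix holds one row per user and one column per movie, with ratings as the cell values. A sketch with pandas (hypothetical column names, not the project's actual data):

```python
import pandas as pd

# Hypothetical long-format ratings table.
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Pivot the long ratings table into a user x movie matrix;
# missing entries (movies a user never rated) become NaN.
user_movie = ratings.pivot_table(index="userId",
                                 columns="movieId",
                                 values="rating")
print(user_movie)
```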

Recommendations
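The recommendations step can be sketched as scoring each user's unseen movies by similarity-weighted ratings. This is an illustrative item-based sketch with made-up data, not the project's actual code:

```python
import numpy as np

# Hypothetical rating matrix: rows = users, columns = movies, 0 = unrated.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 5., 4.],
              [0., 1., 4., 4.]])

# Cosine similarity between movie columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

def recommend(user, k=2):
    """Return the k unseen movies with the highest
    similarity-weighted scores for the given user."""
    scores = sim @ R[user]
    scores[R[user] > 0] = -np.inf  # never re-recommend seen movies
    return np.argsort(scores)[::-1][:k]

print(recommend(1))  # top-2 unseen movie indices for user 1
```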
3.2.1 Application Building

After the model is built, we integrate it into a web application so that ordinary users can also use it. In this section, we build a web application using the Django framework that is integrated with the model we built. A UI is provided for the users, where the user enters a movie; the input is given to the saved model and the recommendations are showcased on the UI. This section has the following tasks:

 Building HTML Pages
 Building server-side script
To run the code :

1. Open command prompt from the start menu
2. Navigate to the folder where your manage.py file is.
3. Now type: python manage.py runserver

The server will then run on localhost:8000. Navigate to the localhost (http://127.0.0.1:8000/) where you can view your web page.

3.2.2 Flowchart of above block diagram


3.2.3 Project Structure

3.3 INPUT DESIGN

Input design is the process of converting user-oriented input into a computer-based format. The goal of input design is to make data entry as easy as possible and free from error. Input to the system is entered through forms: a form is "any surface on which information is to be entered, the nature of which is determined by what is already on that surface." If the data going into the system is incorrect, then processing and output will magnify these errors, so the designer should ensure that each form is accessible and understandable by the user. End-users are the people who communicate with the system frequently through the user interface, so the design of the input screens should follow their recommendations. The data is validated wherever the project requires it, which ensures that only correct data enters the system. HTML is the interface used in input design. All input data are validated in order, and if any data violates a condition, the user is warned by a message and asked to re-enter the data. If the data satisfies all the conditions, it is transferred to the appropriate tables in the database. This project uses text boxes and drop-downs to accept user input. If the user enters the wrong format, a message is shown. The user is never left in confusion as to what is happening; instead, appropriate error messages and acknowledgments are displayed.
3.4 OUTPUT DESIGN

Computer output is the most important deliverable to the user. A major form of output is the display of the information gathered by the system and of the system's responses to user requests. Output generally refers to the results or information generated by the system; it can take the form of operational documents and reports. Since some users of the system may not operate the system themselves, but merely use its output to aid them in decision making, much importance is given to output design. Output generation therefore serves two main purposes: providing proper communication of information to the users, and providing data in a form suited for permanent storage to be used later on. The output design phase consists of two stages: output definition and output specification. Output definition takes into account the types of outputs, their contents, formats, frequency, and volume. The output specification describes each type of output in detail. The objective of output design is to convey information about all past activities and the current status, and to emphasize important events. A quality output is one which meets the requirements of the end user and presents the information clearly.
4 SYSTEM ENVIRONMENT
CHAPTER-4
SYSTEM ENVIRONMENT
4.1 SOFTWARE ENVIRONMENT

Software environment is the term commonly used to refer to the environment that supports an application. The software environment for a particular application could include the operating system, specific development tools, or a compiler.

4.2 SOFTWARE REQUIREMENT SPECIFICATION

Purpose and Scope

Purpose: To understand the nature of the program to be built, the software engineers must understand the information domain for the software. This document specifies the software requirements for automating the functions. It gives the different software and hardware requirements of the system. This will help the users to understand their own needs, and it will serve as the basis for validating the whole project.

Scope

This document is the only one that describes the requirements of the system to be developed.

4.2.1 TOOLS AND PLATFORMS

OVERVIEW OF WINDOWS 10

Windows 10 is a series of personal computer operating systems produced by Microsoft as part of its Windows NT family of operating systems. It is the successor to Windows 8.1, and was released to manufacturing on July 15, 2015, and broadly released for retail sale on July 29, 2015. Windows 10 receives new builds on an ongoing basis, which are available at no additional cost to users, in addition to additional test builds of Windows 10 which are available to Windows Insiders. The latest stable build of Windows 10 is Version 1903 (May 2019 Update). Devices in enterprise environments can receive these updates at a slower pace, or use long-term support milestones that only receive critical updates, such as security patches, over their ten-year lifespan of extended support.
PYTHON

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. It was created by Guido van Rossum, and first released on February 20, 1991. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make
it very attractive for Rapid Application Development, as well as for use as a scripting or glue
language to connect existing components together. Python's simple, easy to learn syntax
emphasizes readability and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity and code reuse. The
Python interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.

Often, programmers fall in love with Python because of the increased productivity it
provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation
fault. Instead, when the interpreter discovers an error, it raises an exception. When the
program doesn't catch the exception, the interpreter prints a stack trace. A source level
debugger allows inspection of local and global variables, evaluation of arbitrary expressions,
setting breakpoints, stepping through the code a line at a time, and so on. The debugger is
written in Python itself, testifying to Python's introspective power. On the other hand, often
the quickest way to debug a program is to add a few print statements to the source: the fast
edit-test-debug cycle makes this simple approach very effective.

HTML5

HTML5 is a markup language used for structuring and presenting content on the World Wide
Web. It is the fifth and last[3] major HTML version that is a World Wide Web Consortium
(W3C) recommendation. The current specification is known as the HTML Living Standard. It
is maintained by the Web Hypertext Application Technology Working Group (WHATWG), a
consortium of the major browser vendors (Apple, Google, Mozilla, and Microsoft). HTML5
was first released in a public-facing form on 22 January 2008,[2] with a major update and
"W3C Recommendation" status in October 2014.[4][5] Its goals were to improve the
language with support for the latest multimedia and other new features; to keep the language
both easily readable by humans and consistently understood by computers and devices such
as web browsers, parsers, etc., without XHTML's rigidity; and to remain backward-compatible with older software. HTML5 is intended to subsume not only HTML 4 but also XHTML 1
and DOM Level 2 HTML.[6] HTML5 includes detailed processing models to encourage
more interoperable implementations; it extends, improves, and rationalizes the markup
available for documents and introduces markup and application programming interfaces
(APIs) for complex web applications.[7] For the same reasons, HTML5 is also a candidate
for cross-platform mobile applications because it includes features designed with low-
powered devices in mind.

Django

Django is a high-level Python web framework that encourages rapid development and
clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle
of web development, so you can focus on writing your app without needing to reinvent the
wheel. It’s free and open source. Django was designed to help developers take applications
from concept to completion as quickly as possible. Django takes security seriously and helps
developers avoid many common security mistakes. Some of the busiest sites on the web
leverage Django’s ability to quickly and flexibly scale.

Sublime Text

Sublime Text is a shareware cross-platform source code editor. It natively supports many programming languages and markup languages. Users can expand its functionality with plugins, typically community-built and maintained under free-software licenses. To facilitate plugins, Sublime Text features a Python API. The following is a list of features of Sublime Text:
 "Goto Anything", quick navigation to project files, symbols, or lines
 "Command palette" uses adaptive matching for quick keyboard invocation of arbitrary
commands
 Simultaneous editing: simultaneously make the same interactive changes to multiple
selected areas
 Python-based plugin API
 Project-specific preferences
 Extensive customizability via JSON settings files, including project-specific and
platform-specific settings
 Cross-platform (Windows, macOS, and Linux) with supportive plugins for cross-platform use
 Compatible with many language grammars from TextMate
JUPYTER NOTEBOOK

The Jupyter Notebook is an open source web application that you can use to create and share
documents that contain live code, equations, visualizations, and text. Jupyter Notebook is
maintained by the people at Project Jupyter. Jupyter Notebooks are a spin-off project from
the IPython project, which used to have an IPython Notebook project itself. The name,
Jupyter, comes from the three core programming languages that it supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but there are currently over 100 other kernels that you can also use.

4.3 HARDWARE REQUIREMENTS

 Operating system: Windows 7 and above, 64-bit
 Processor type: Intel Core i3-3220
 RAM: 4 GB and above
 Hard disk: minimum 100 GB
5 SYSTEM IMPLEMENTATION
AND TESTING
CHAPTER-5
SYSTEM IMPLEMENTATION AND TESTING

5.1 CODING

CODING STANDARDS

Coding standards are important because they lead to greater consistency within the code of all developers. Consistency leads to code that is easier to understand, which in turn results in code that is easier to develop and maintain. Code that is difficult to understand and maintain runs the risk of being scrapped and rewritten.

5.2 TESTING AND VERIFICATION PROCEDURES

Unit Testing

Unit testing is a concept that would be familiar to people coming from software development.
It is a very useful technique that can help you prevent obvious errors and bugs in your code.
It involves testing individual units of the source code, such as functions, methods, and classes, to ascertain that they meet the requirements and have the expected behaviour. Unit tests are usually small and don't take much time to execute. The tests cover a wide range of inputs, often including boundary and edge cases. The expected outputs for these inputs are usually calculated manually by the developer to test the output of the unit being tested. For example, for an adder function, we would have test cases something like the following.
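A minimal sketch of such a test suite in Python's unittest style (the add function here is a stand-in unit under test):

```python
import unittest

def add(a, b):
    """The unit under test: a simple adder."""
    return a + b

class TestAdd(unittest.TestCase):
    def test_positive_inputs(self):
        self.assertEqual(add(2, 3), 5)

    def test_zero_input(self):
        self.assertEqual(add(0, 7), 7)

    def test_negative_inputs(self):
        self.assertEqual(add(-2, -3), -5)

    def test_mixed_inputs(self):
        self.assertEqual(add(-2, 5), 3)

if __name__ == "__main__":
    unittest.main(exit=False)
```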

You would have test cases with positive inputs, inputs with zero, negative inputs, and mixed positive and negative inputs. If the output of the function/method being tested equals the output defined in the unit test for all the input cases, your unit passes the test; otherwise it fails, and you know exactly which test case failed, which can be further investigated to find the problem. This is an excellent sanity check to have in your code, especially if multiple developers are working on a large project. Imagine someone wrote a piece of code based on certain assumptions and data sizes, and a new developer changes something in the codebase which no longer meets those assumptions; the code is then bound to fail. Unit tests allow you to avoid such situations.

Following are some of the benefits of unit testing.


Forces you to write modular and reusable code with clearly defined inputs and outputs. As a
result, your code would be easier to integrate.

Increased confidence in changing/maintaining code. It helps to identify bugs introduced by a code change.

Improved confidence in the unit itself since if it passes the unit tests we are sure that there is
nothing obviously wrong with the logic and the unit is performing as intended.

Debugging becomes easier since you would know which unit failed as well as the particular
test cases which failed.

Integration Testing

Data can be lost across an interface; one module can have an adverse effect on another; and sub-functions, when combined, may not perform the desired overall function. Integration testing is the systematic testing performed to uncover errors within the interfaces. This testing was done with simple data, and the developed system ran successfully with it. The need for integration testing is to verify the overall system performance.

The modules of this project were connected and tested. After splitting the program into units, the units were tested together to find the defects between each module and function. Integration testing exercises two or more modules or functions together with the intent of finding interface defects between them. Once testing is completed as part of unit or functional testing, integration testing can involve putting together groups of modules and functions with the goal of completing the system and verifying that it meets the system requirements.

System Testing

System testing focuses on testing the system as a whole. System Testing is a crucial step in
Quality Management Process. In the Software Development Life Cycle, System Testing is
the first level where the System is tested as a whole. The System is tested to verify whether it
meets the functional and technical requirements.

User Acceptance Testing

The system was tested by a small client community to see if the program met the requirements defined in the analysis stage, and it was found to be satisfactory. In this phase, the system is fully tested by the client community against the requirements from the analysis and design stages, corrections are made as required, and the production system is built. User acceptance of the system is a key factor for the success of the system.

Types of acceptance test

Different users may use the software application in different ways, and it is impossible for the developer or tester to predict all the possible scenarios or test data an end user will use, or how the customer will actually use the software application. So most software vendors use techniques like alpha testing and beta testing, which help to uncover errors that may occur in the actual usage environment. In these testing methods, the software application is released to a limited set of end users rather than testing professionals, in order to get feedback from them.

Alpha Testing

Alpha testing is conducted by the customer at the developer's site. It is performed by potential users, such as developers, end users, or organization users, before the product is released to external customers, and the defects found during alpha testing are reported. The product under test is not the final version of the software application; after fixing all reported bugs (after bug triage), a new version of the software application is released.

Sometimes alpha testing is carried out by the client or an outsider in the presence of the developer and tester. The version of the release on which alpha testing is performed is called the "Alpha Release".

Beta Testing

Most of us have heard the term "Beta release/version"; it is linked to beta testing. Beta testing is carried out at the end user's site by the end users, without any help from the developers, so it is performed in an uncontrolled environment. Beta testing is also known as field testing, and it is used to get feedback from the market. This testing is conducted by a limited set of users, and all issues found during it are reported on a continuous basis, which helps to improve the system. The developers take action on all issues reported in beta testing after bug triage, and then the software application is ready for the final release. The version released after beta testing is called the "Beta Release".
SOFTWARE TESTING

Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design, and coding. Testing is a series of different tasks whose primary objective is to fully exercise the computer-based system; if conducted successfully, it will uncover errors in the software. Testing is the process of executing a program with the intention of finding an error, and a good test case is one that has a high probability of finding an undiscovered error.

VALIDATION TESTING

In validation testing, all the relevant fields are checked to see whether they contain data and whether they hold the right data format. The tests guarantee that all the independent paths within a module have been exercised at least once, and:

 Exercise all the logical decisions on their true or false sides.
 Exercise all loops at their boundaries and within their operational bounds.

TEST CASES

A test case is a specific set of steps and data along with expected results for a particular test objective. A test case should test only one limited subset of a feature or functionality. Test case documents for each functionality/testing area of our project were written, reviewed, and maintained separately. Test cases that check error conditions are written separately from the functional test cases and include steps to verify the error messages.

5.3 SYSTEM IMPLEMENTATIONS

Implementation is the process by which personnel check out and install the required equipment and applications and train users accordingly. Depending on the size of the organization and its requirements, implementation can be divided into three approaches:

Stage Implementation

Here the system is implemented in stages; the whole system is not implemented at once. Once the user starts working with the system and is familiar with it, the next stage is introduced and implemented. The system is also updated regularly until the final system is sealed.
Direct Implementation

The proposed new system is implemented directly and the user starts working on the new system. The shortcomings faced, if any, are rectified later.

Parallel Implementation

The old and the new system are used simultaneously. This helps in comparing the results from the two systems. Once the user is satisfied and his intended objectives are achieved by the new system, he stops using the old one.

In my project I have used the direct implementation method. The client is given the fully developed system. The system was developed to provide recommendations of movies in order to support the decision-making process on a movie website.
6. SYSTEM MAINTENANCE
CHAPTER-6
SYSTEM MAINTENANCE
6.1 MAINTENANCE
Software Maintenance is the process of modifying a software product after it has been
delivered to the client. The main purpose of software maintenance is to modify and
update software application after delivery to correct faults and to improve
performance.

Need for Maintenance: Software Maintenance must be performed in order to:


 Correct faults.
 Improve the design.
 Implement enhancements.
 Interface with other systems.
 Accommodate programs so that different hardware, software, system features,
and telecommunications facilities can be used.
 Migrate legacy software.
 Retire software.

Categories of Software Maintenance: Maintenance can be divided into the following:
Corrective maintenance: Corrective maintenance of a software product may be
essential either to rectify some bugs observed while the system is in use, or to enhance
the performance of the system.

Adaptive maintenance: This includes modifications and updating when the customers need the product to run on new platforms, on new operating systems, or when they need the product to interface with new hardware and software.

Perfective maintenance: A software product needs maintenance to support the new features that the users want or to change different types of functionalities of the system according to the customer demands.
Preventive maintenance: This type of maintenance includes modifications and updates to prevent future problems in the software. It aims to address problems which are not significant at this moment but may cause serious issues in the future.

Reverse Engineering: Reverse engineering is the process of extracting knowledge or design information from anything man-made and reproducing it based on the extracted information. It is also called back engineering.

Software Reverse Engineering: Software reverse engineering is the process of recovering the design and requirements specification of a product from an analysis of its code. It is becoming important, since several existing software products lack proper documentation, are highly unstructured, or have structures that have degraded through a series of maintenance efforts.
7.SYSTEM SECURITY MEASURES
CHAPTER-7
SYSTEM SECURITY MEASURES

7.1 INTRODUCTION

Security applies to computing devices such as computers and smartphones, as well as to computer networks, both private and public, including the whole Internet. The field covers all the processes and mechanisms by which digital equipment, information, and services are protected from unintended or unauthorized access, change, or destruction, and it is of growing importance in line with the increasing reliance of most societies worldwide on computer systems. Computer security includes measures taken to ensure the integrity of files stored on a computer or server as well as measures taken to prevent unauthorized access to stored data, by securing the physical perimeter of the computer equipment, authenticating the users or computer accounts accessing the data, and providing for secure data transmission. The variety of threats, combined with the rapid emergence of new threats, has made cyber insecurity and the erosion of information assurance the "status quo". As long as people continue to use computers, they will also take an interest in manipulating, modifying, creating, and bypassing "rules" and "security standards".

7.2 OPERATING SYSTEM LEVEL SECURITY

Operating system security (OS security) is the process of ensuring OS integrity, confidentiality and availability. OS security refers to specified steps or measures used
to protect the OS from threats, viruses, worms, malware or remote hacker intrusions.
OS security encompasses all preventive-control techniques, which safeguard any
computer assets capable of being stolen, edited or deleted if OS security is
compromised. OS security encompasses many different techniques and methods
which ensure safety from threats and attacks. OS security allows different applications
and programs to perform required tasks and stop unauthorized interference. OS
security may be approached in many ways, including adherence to the following:

 Performing regular OS patch updates
 Installing updated antivirus engines and software
 Scrutinizing all incoming and outgoing network traffic through a firewall
 Creating secure accounts with required privileges only (i.e., user management)

7.3 SYSTEM LEVEL SECURITY

System-level security refers to the architecture, policy and processes that ensure data
and system security on individual computer systems. It facilitates the security of
standalone and/or network computer systems/servers from events and processes that
can exploit or violate its security or stature.

System-level security is part of a multi-layered security approach in which information security (IS) is implemented on an IT infrastructure's different components, layers or levels. System-level security is typically implemented on end-
user computer and server nodes. It ensures that system access is granted only to
legitimate and trusted individuals and applications. The key objective behind system-
level security is to keep system secure, regardless of security policies and processes at
other levels. If other layers or levels are breached, the system must have the ability to
protect itself.

Methods used to implement system-level security include user ID/login credentials, antivirus software, and system-level firewall applications.
8. FUTURE SCOPE AND FURTHER
ENHANCEMENT
CHAPTER-8
FUTURE SCOPE AND FURTHER ENHANCEMENT

Recommender systems have been developed over many years, and the field has even passed through a low point. In the past few years, the development of machine learning, large-scale networks, and high-performance computing has been promoting new development in this field. We will consider the following aspects in future work. After gathering enough user data, hybrid filtering recommendation will be introduced. We will also introduce more precise and proper movie features; in the future we should extract features such as subtitles from movies, which can provide a more accurate description of each movie.

In the future we will collect more user data and add a list of movies each user dislikes. We will feed the dislike list into the recommender system as well and generate scores that will be combined with the previous result; in this way we can improve the results of the recommender system. We will also make the recommender system an internal service: in the future, the recommender system will no longer be an external website used just for testing, but an internal API for developers to invoke.
9. CONCLUSION
CHAPTER-9
CONCLUSION

In this paper, the CF approach is used instead of content-based filtering in order to obtain better results. A Collaborative Filtering recommendation system is proposed using the Pearson correlation/cosine similarity on the MovieLens dataset. Existing systems were compared, and the proposed system was found to be more reliable and accurate. It was also found that when the proposed methodology is applied to different, larger datasets, both accuracy and efficiency increase, which shows that our system is both accurate and efficient. The main aim was to improve the regular recommendation algorithm and to provide better results, and the research work was successful as it fulfilled the aim of the project. In the future, more features can be included in the datasets to make recommendations more reliable and innovative.
