Unit 5-1

The document discusses social network mining or analysis which is the process of analyzing social networks to extract meaningful insights. It can be used for applications like centrality analysis, community detection, recommendation systems, influence analysis, social media marketing, fraud detection, healthcare modeling, and online community analysis. Challenges include privacy, data quality, and scalability.

Uploaded by

sahuakshat286
Copyright
© © All Rights Reserved

UNIT 5

Introduction and Applications of Social Network Mining

Social network mining, also known as social network analysis (SNA), is the
process of analyzing and extracting meaningful insights from social networks.
These insights can be used for various applications across different domains.
Here's a detailed exploration of the introduction and applications of social network
mining:

1. Introduction to Social Network Mining:

Social network mining focuses on understanding the structure, behavior,
and dynamics of social networks, which consist of nodes (individual
entities) and edges (connections between nodes).

It involves analyzing the topology of the network, identifying influential
nodes, detecting communities or clusters, and studying information
diffusion and influence propagation.

2. Applications of Social Network Mining:

Social Network Analysis:

Centrality Analysis: Identifying influential nodes based on metrics like
degree centrality, betweenness centrality, and eigenvector centrality.

Community Detection: Identifying groups of densely connected nodes
(communities) within the network using algorithms like modularity
optimization and hierarchical clustering.

Structural Analysis: Studying the overall structure of the network,
including its size, density, diameter, and average path length.

Recommendation Systems:

Collaborative Filtering: Recommending items (e.g., products, movies,
music) to users based on their social connections and interactions with
similar users.

Personalized Recommendations: Tailoring recommendations to
individual users by considering their social network connections,
preferences, and behaviors.

Influence and Virality Analysis:

Influence Propagation: Studying how information, opinions, or
behaviors spread through the network and identifying influential nodes
that drive diffusion processes.

Virality Prediction: Predicting the likelihood of content (e.g., tweets,
posts) going viral based on network topology, content characteristics,
and user engagement patterns.

Social Media Marketing:

Influencer Identification: Identifying influential users (influencers) in
social networks who can promote products, services, or ideas to a
larger audience.

Targeted Advertising: Targeting advertisements to specific user
segments based on their social connections, interests, and behavior
patterns.

Fraud Detection and Security:

Anomaly Detection: Detecting fraudulent activities, spam, or malicious
behavior by analyzing deviations from normal patterns of interaction
within the social network.

Bot Detection: Identifying automated accounts (bots) and botnets by
analyzing their behavior, connections, and activity patterns in the
network.

Healthcare and Epidemiology:

Disease Spread Modeling: Modeling the spread of diseases and
contagions through social networks to understand epidemic dynamics
and inform public health interventions.

Health Behavior Analysis: Studying health-related behaviors,
influences, and interventions through the lens of social network
connections and interactions.

Online Communities and Forums:

Content Analysis: Analyzing user-generated content in online
communities and forums to understand topics of interest, sentiment,
and user engagement patterns.

Community Moderation: Identifying influential users and community
structures to facilitate moderation, content curation, and community
management.

3. Challenges and Considerations:

Data Privacy and Ethics: Social network mining raises concerns about
data privacy, user consent, and ethical implications of analyzing personal
information.

Data Quality and Bias: Social network data may suffer from quality issues,
including noise, bias, and missing data, which can affect the accuracy and
reliability of analysis results.

Scalability and Performance: Analyzing large-scale social networks
requires scalable algorithms, distributed computing techniques, and
efficient data processing platforms.

Social network mining offers a wealth of opportunities for understanding human
behavior, societal dynamics, and organizational structures through the lens of
social connections and interactions. By leveraging insights derived from social
networks, organizations can make informed decisions, enhance user experiences,
and drive innovation across various domains. However, it is essential to address
challenges related to data privacy, quality, and scalability to realize the full
potential of social network mining in practice.

What Is Social Media Data Mining – How It Works, Benefits
You have probably heard many times that companies like Facebook or Google
can listen to our thoughts and watch us as we go about our day. Don’t worry;
we don’t need another Edward Snowden. These companies do not track us down
on a daily basis; they simply relate to our daily activities, which are quite
predictable. This is what social media mining is: big companies collect data
indirectly and use it in ways meant to improve our quality of life. But is
this good or bad? Can it be harmful to customers, or helpful in some way?
Many such questions might come to mind, as many people are not aware of the
concept of social media mining. This article is for you. In this article,
we’ll be discussing social media mining and the key benefits it offers. So,
without further ado, let’s get started!

What is Social Media Mining?

Social media has been around for 30 years, but the rise in the user base is
recent. The data from social media platforms can be used to boost business.
This is where social media mining takes over. The amount of data these
companies have to deal with is huge and scattered. Social media mining helps
in extracting meaning from the data.

It is a process of knowledge discovery in databases. It is not wrong to say
that social media is one of the biggest contributors to Big Data. The data is
not new; it has been around for a long time. However, the ability to process
this data has developed. Social media mining can help us gain insights into
customer behavior and interests, and using this information, you can serve
customers better and grow your earnings.

Benefits of Social Media Mining

We post a lot of information on social media. In this era where algorithms
are everywhere, they can generate information about the future trends and
habits of users, which plays a major role in today’s world. Social media
mining has become a must-have technique for businesses. Here are some of the
benefits you can derive from using social media mining:

1. Spot Trends Before They Become Trends

The data available from social media platforms can give important insights
regarding society and user behavior that were not possible earlier, when
finding them was like looking for a needle in a haystack. In today’s world,
the data is corroborative and evolving with time, and there are multiple
needles to look for. Social media data mining is a technique that is capable
of finding them all.

It is a process that starts with identifying the target audience and ends with
digging into what they are passionate about. Businesses may analyze
keywords, search results, comments, and mentions to identify the current
trend, and a deeper study of behavior change can also help in predicting
future trends. This data is very useful for businesses that need to make
informed decisions when the stakes are high.

2. Sentiment Analysis
Sentiment analysis is the process of identifying positive or negative sentiments
expressed in information posted on social media platforms. Businesses use social
media mining to identify the sentiments associated with their brand and
product lines.

Sentiment analysis has many applications, and its use is not limited to
self-evaluation. Negative sentiment about competitors can be an opportunity to
win their customers. The Nestle Maggi ban is a good example of this:
competitors in the noodles market marketed their products as made from
healthier alternatives. Patanjali saw this opportunity and launched its
noodles, claiming they were made from atta (whole wheat flour), while the
noodles market was full of refined wheat flour (maida) noodles.

When combined with social media monitoring, sentiment analysis can help you
analyze your brand image and bring negative aspects of the business to your
attention. With this information, you can prioritize the negative sentiments
and address them properly to improve the customer experience.
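
As a rough illustration of the idea, the sketch below labels posts with a tiny word-list (lexicon) approach. The word lists, the sample posts, and the function names are assumptions made up for this example; the source does not say which sentiment method is used in practice, and production systems usually rely on trained models.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of entries.
POSITIVE = {"good", "great", "love", "healthy", "tasty"}
NEGATIVE = {"bad", "awful", "hate", "unhealthy", "banned"}

def sentiment_score(text):
    """Count positive words minus negative words in a post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up posts for illustration only.
posts = [
    "Love the new atta noodles, so tasty",
    "These noodles are awful and unhealthy",
    "Bought noodles today",
]
labels = [label(p) for p in posts]
```

Counting the labels per brand over time gives a simple view of how sentiment is moving.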

3. Keyword Identification
In a world where more than 90% of businesses function online, the importance of
using the right words cannot be emphasized enough. A business has to stand out
to compete in a world where its sales team cannot charm customers with their
looks and cheesy talk.
Keywords can give your business an edge over its competitors.
Keywords are words that reveal the behavior of users and highlight the
frequently used and popular terms related to your products. Social media data
mining can be highly effective in finding these keywords. The process is as
basic as scanning the list of the most frequent words or phrases used by
customers to search for or describe your product.

Using these keywords to describe your product in digital media and implementing
SEO can yield pretty good results. Your product will rank higher, and by
using frequent and popular terms, you can make your product listings
better.
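
The frequency scan described above can be sketched in a few lines. The stop-word list, the sample posts, and the function name `top_keywords` are illustrative assumptions, not part of the source.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "for", "and", "to", "of", "my", "this", "i", "in", "it"}

def top_keywords(posts, n=3):
    """Return the n most frequent non-stop-words across all posts."""
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z]+", post.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up customer posts for illustration only.
posts = [
    "Best running shoes for flat feet",
    "Are these running shoes waterproof?",
    "Lightweight shoes for trail running",
]
top = top_keywords(posts, 2)
```

Here `top` holds the two most frequent terms with their counts, which are candidates for product titles and descriptions.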

4. Create a Better Product


Before the use of Big Data, businesses used to conduct individual surveys to
learn the public’s opinion about their product. They faced many challenges:
people didn’t entertain them, and even if someone participated, it was quite
likely that their answers were not credible.
With social media data mining, the public responds and participates in
surveys without even realizing it, which provides companies with candid data.

Using the processed data, you can identify the things that bother customers
and gain insights into how you can improve your product. In other words, you
are seeking advice and opinions from millions of users.
By using so much data, you are essentially tweaking your product in such a
way that the probability of its success is very high. By analyzing user-base
information, you can also target the social media platform with the highest
number of users.

5. Competitor Analysis
You are not wrong to assume that your competitors are already using data
mining techniques to monitor the market, and to compete with them it becomes
essential to do the same. Improving yourself by analyzing others’ mistakes is
often less painful than learning from your own.

There’s nothing wrong with following in the footsteps of a good competitor.
You might not make a fortune, but it will still help you survive tough times.
Analyzing competitor behavior on social media during the launch of a product
will help you define a trend and use it to your advantage.
Posts by competitor employees and management regarding hiring may give you an
idea of business expansion, and even a subtle change in operations can help
you be proactive. Knowing when to stay on your toes is advantageous in highly
competitive industries.

6. Event Identification
Also known as social heat mapping, this technique is a part of social media
mining that helps researchers and agencies be prepared for unexpected
outbursts.
An excellent example of heat mapping on social media was seen during the
Farmer Protests, when huge crowds were approaching the venue of the Republic
Day celebration.

7. Manage Real-time Events


This approach is mainly used for events, incidents, or any issues that occur
on social media. Researchers and government departments identify big issues
by using heat mapping or other techniques to access social media sources.
They detect events and gather information faster than traditional
sensor-based approaches. Many users publish information from their cell
phones, so event identification is real-time and up to date. Organizations
can respond faster as people share information during disasters or social
events.

8. Provide Useful Content (and Stop Spamming)


Social media is closely tied to modern life.
This method uses computer algorithms to help companies share information in
the way users prefer and avoid spam. It can help organizations identify small
patterns and recognize customers who might be interested in their products.
Social media platforms themselves can use these techniques to remove
problematic posts. As a result, social media mining helps deliver relevant
content and keep users safe.

9. Recognize Behavior
Social media mining analyzes our real behavior even when we are not present
and helps us learn about people. Organizations use these techniques to
understand customers. The government provides facilities for companies to
identify the right members and scientists to explain the events. Therefore,
social media mining helps us understand how events link together in ways that
we may not have noticed earlier.

Social network graph


Social networks can be represented and analyzed as graphs, where nodes
represent individuals or entities, and edges represent relationships or interactions
between them. This graph-based representation provides a powerful framework
for understanding the structure, dynamics, and behavior of social networks.
Here's a detailed exploration of social networks as graphs:

1. Nodes and Edges:

Nodes: Nodes in a social network graph represent individual users,
entities, or objects. Each node typically corresponds to a unique identifier
(e.g., user ID) and may contain additional attributes such as user
demographics, interests, or behavior.

Edges: Edges in the graph represent relationships or interactions between
nodes. They can be undirected (symmetric relationships) or directed
(asymmetric relationships) and may have associated attributes or weights.

2. Types of Social Network Graphs:

Undirected Graphs: In undirected social network graphs, relationships
between nodes are reciprocal, meaning they are not inherently directional.
For example, in a friendship network, if user A is friends with user B, then
user B is also friends with user A.

Directed Graphs: In directed social network graphs, relationships between
nodes have directionality, indicating the flow or asymmetry of the
relationship. For example, in a follower-following network (e.g., Twitter),
user A may follow user B, but user B may not necessarily follow user A.

Weighted Graphs: In weighted social network graphs, edges are assigned
weights or strengths to represent the intensity or importance of
relationships between nodes. For example, the weight of an edge in a
social network could represent the frequency of interactions between two
users.
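
A minimal sketch of how these three graph types might be stored, assuming a plain dictionary-of-dictionaries adjacency map. The helper name `build_graph` and the sample interaction counts are hypothetical.

```python
def build_graph(edges, directed=False):
    """Return an adjacency map {node: {neighbor: weight}}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})  # make sure the target node exists too
        if not directed:
            adj[v][u] = w  # undirected: store the edge in both directions
    return adj

# Hypothetical data: (user, user, number of interactions as the weight)
edges = [("A", "B", 3), ("B", "C", 1)]

friends = build_graph(edges)                 # undirected, weighted
follows = build_graph(edges, directed=True)  # directed, weighted
```

In `friends`, the A-B edge is visible from both endpoints; in `follows`, B has no edge back to A, which mirrors the follower example above.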

3. Network Measures and Metrics:

Degree: The degree of a node is the number of edges incident to it,
indicating its level of connectivity or centrality within the network.

Centrality: Centrality metrics such as betweenness centrality, closeness
centrality, and eigenvector centrality quantify the importance or influence
of nodes within the network.

Clustering Coefficient: The clustering coefficient measures the degree to
which nodes in a network tend to cluster together, indicating the presence
of tightly knit communities or cliques.

Shortest Paths: Shortest path algorithms (e.g., Dijkstra's algorithm,
Floyd-Warshall algorithm) determine the shortest paths between nodes in a
network, revealing the efficiency of information or influence propagation.
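
The sketch below computes three of these measures on a small, made-up undirected graph. It assumes an unweighted graph, so breadth-first search is used for shortest paths in place of Dijkstra's or Floyd-Warshall, which are needed when edges carry weights. All names are illustrative.

```python
from collections import deque

# Hypothetical friendship graph: node -> set of neighbours
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def degree(g, node):
    """Number of edges incident to the node."""
    return len(g[node])

def clustering_coefficient(g, node):
    """Fraction of the node's neighbour pairs that are themselves connected."""
    nbrs = list(g[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in g[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))

def shortest_path_lengths(g, source):
    """Hop counts from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

For instance, `shortest_path_lengths(graph, "A")` gives the number of hops a message from A needs to reach each other user.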

4. Community Detection:

Community detection algorithms identify densely connected groups of
nodes (communities) within the network, revealing the underlying
structure and modularity of the social network.

Algorithms such as modularity optimization, hierarchical clustering, and
spectral clustering are commonly used for community detection in social
network graphs.
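
To make "modularity" concrete, the sketch below scores a given partition of an undirected, unweighted graph using the standard formula Q = sum over communities of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. It only evaluates a partition; it is not a full optimization algorithm. The example graph and the function name are assumptions.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    community_of = {n: i for i, c in enumerate(communities) for n in c}
    internal = [0] * len(communities)    # edges fully inside each community
    degree_sum = [0] * len(communities)  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )

# Hypothetical graph: two triangles joined by a single bridge edge (3-4)
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])  # the two triangles
q_single = modularity(edges, [{1, 2, 3, 4, 5, 6}])   # everything together
```

A modularity-optimization algorithm searches for the partition with the highest Q; here the two-triangle split scores higher than putting every node in one community.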

5. Applications of Social Network Graphs:

Social network analysis (SNA) enables a wide range of applications,
including:

Recommendation Systems: Recommending friends, products, or
content based on social network connections and interactions.

Influence and Virality Analysis: Identifying influential users and
predicting the spread of information or behaviors through the network.

Fraud Detection: Detecting fraudulent activities, fake accounts, or
suspicious behavior patterns within the network.

Healthcare and Epidemiology: Modeling disease spread, studying
health behaviors, and identifying key influencers in public health
interventions.

Online Communities and Forums: Analyzing user engagement,
sentiment, and topic dynamics in online communities and forums.

Social networks as graphs provide a rich framework for analyzing complex
relationships and interactions between individuals, organizations, and entities. By
leveraging graph-based analysis techniques, researchers and practitioners can
gain insights into network structure, identify patterns of influence and behavior,
and develop strategies for improving user experiences, enhancing
decision-making, and driving innovation in various domains.

Types of social networks


Social networks are diverse and can be classified based on various factors such
as the nature of relationships, the purpose of networking, and the platform or
medium of interaction. Here are some common types of social networks:

1. Personal Social Networks:

Friendship Networks: These networks consist of personal connections,
friendships, and acquaintances formed between individuals. Examples
include Facebook and LinkedIn.

Family Networks: Family-based social networks focus on connecting
relatives, extended family members, and close family friends. They may
include features for sharing family updates, events, and photos.

Dating Networks: Dating platforms facilitate connections between
individuals seeking romantic relationships, friendships, or companionship.
Examples include Tinder, Bumble, and OkCupid.

2. Professional and Career-Oriented Networks:

Professional Networks: Professional social networks are designed for
networking, career development, and professional connections. They
allow users to showcase their skills, experiences, and accomplishments to
potential employers and colleagues. Examples include LinkedIn and Xing.

Business Networks: Business-focused networks connect entrepreneurs,
business owners, investors, and professionals within specific industries or
sectors. They provide opportunities for networking, collaboration, and
business development. Examples include AngelList and BizSugar.

3. Interest-Based Networks:

Hobby and Interest Networks: These networks bring together individuals
with shared interests, hobbies, or passions. Users can connect, share
resources, and engage in discussions related to their interests. Examples
include Reddit, Pinterest, and Goodreads.

Travel Networks: Travel-oriented networks connect travelers,
globetrotters, and adventure enthusiasts. They provide platforms for
sharing travel experiences, tips, recommendations, and itineraries.
Examples include TripAdvisor and Couchsurfing.

4. Media Sharing Networks:

Photo Sharing Networks: These networks focus on sharing and
discovering photos, images, and visual content. Users can upload, edit,
and share photos with their network of friends or followers. Examples
include Instagram and Flickr.

Video Sharing Networks: Video-centric networks enable users to upload,
share, and watch videos on a wide range of topics. They may include
features for live streaming, video blogging, and content monetization.
Examples include YouTube, TikTok, and Twitch.

5. Academic and Student Networks:

Academic Networks: Academic social networks cater to researchers,
scholars, and academics, providing platforms for sharing research papers,
collaborating on projects, and networking with peers. Examples include
ResearchGate and Academia.edu.

Student Networks: Student-oriented networks connect students, alumni,
and educational institutions. They facilitate communication, collaboration,
and engagement within the student community. Examples include Edmodo
and CampusGroups.

6. Messaging and Communication Networks:

Messaging Apps: Messaging networks enable real-time communication
and messaging between individuals, groups, or communities. They
support text messaging, voice calls, video calls, and multimedia sharing.
Examples include WhatsApp, Messenger, and Telegram.

Microblogging Platforms: Microblogging networks allow users to publish
short, concise posts or updates and engage with followers or subscribers.
They may include features for hashtagging, retweeting, and direct
messaging. Examples include Twitter and Tumblr.

These are just a few examples of the diverse types of social networks that exist,
each serving different purposes and catering to various user needs and
preferences. The social networking landscape continues to evolve, with new
platforms emerging and existing platforms evolving to meet changing trends and
demands.

What are recommender systems?
First things first, let’s define recommender systems.
Recommender systems are sophisticated algorithms designed to provide product-
relevant suggestions to users.
Recommender systems play a paramount
role in enhancing user experiences on various online platforms,
including e-commerce websites, streaming services, and social media.
Essentially, recommender systems aim to analyze user data and behavior to make
tailored recommendations.

This is how they work:

Data collection: Recommender systems start by gathering data on user
interactions, preferences, and behaviors. This data can include past
purchases, browsing history, ratings, and social connections.

Data processing: Once collected, they process the data to extract meaningful
patterns and insights. This involves techniques like data cleaning,
transformation, and feature engineering.

Algorithm selection: Depending on the specific platform and its data, a
specific recommender algorithm is applied to generate recommendations.
Common types include collaborative filtering, content-based filtering, and
hybrid methods.

User profiling: Using historical data, recommender systems create user
profiles. These represent their preferences, interests, and behavior,
allowing the system to understand individual tastes.

Item profiling: Similarly, items or content available on the platform are also
profiled based on their characteristics. Think of attributes like genres,
keywords, or product features.

Recommendation generation: The next step involves algorithms matching
user profiles with item profiles. For example, collaborative filtering
identifies users with similar preferences and recommends items liked by
others with similar profiles. Content-based filtering recommends items based
on the attributes of items users have previously interacted with.

Ranking and presentation: Finally, the recommended items are ranked based
on their relevance to the user. The top-ranked items are then presented to
the user through interfaces like recommendation lists, personalized emails,
or pop-up suggestions.

Now that we’ve learned how recommender systems work, let’s explore the basic
types of recommenders – non-personalized and personalized.

Non-personalized recommender systems


Non-personalized recommendation systems provide recommendations to users
without taking into account their individual preferences or behavior.

These systems make recommendations based on the characteristics of items or
content themselves rather than relying on user-specific data.
A popular non-personalized recommender is the popularity-based recommender,
which recommends the most popular items to the users, for instance:

Top-10 movies,

Top 5 trending products,

New products.
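
A popularity-based recommender of this kind can be sketched as a simple count over interaction records. The `(user, item)` tuple format, the sample data, and the function name are assumptions for illustration.

```python
from collections import Counter

def most_popular(interactions, n=10):
    """Return the n items with the most interactions, ignoring who the user is."""
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common(n)]

# Hypothetical (user, item) view or purchase events
interactions = [
    ("u1", "Dune"), ("u2", "Dune"), ("u3", "Dune"),
    ("u1", "Alien"), ("u2", "Alien"),
    ("u3", "Up"),
]
top_two = most_popular(interactions, 2)
```

Every user receives the same list, which is exactly why this approach is called non-personalized.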

However, non-personalized
recommendation systems have their limitations, including the inability
to provide highly tailored recommendations. They may be a good option
for a first step in the process of personalization, but you shouldn’t
stop there.

Once you gather enough data about the user in question, personalized offers and
recommendations are the logical next step.
This is especially important if you don’t want to put off your potential
buyer by failing to recognize what they like and what to recommend next, or
even worse, by recommending a product they have already bought.
This can all be handled well with a suitable personalized recommender system.

Personalized recommender systems


Personalized recommendation systems
are designed to provide tailored recommendations to individual users
based on their past behavior, preferences, and demographic information.

Based on the user’s data such as purchases or ratings, personalized
recommenders try to understand and predict what items or content a specific
user is likely to be interested in. In that way, every user will get
customized recommendations.
At this point, you might ask yourself – what makes a good recommendation?
Well, a good recommendation:

Is personalized (relevant to that user),

Is diverse (includes different user interests),

Doesn’t recommend the same items to users for the second time, and

Recommends available products at the right time.

There are a few types of personalized recommendation systems, including
content-based filtering, collaborative filtering, and hybrid recommenders.
Let’s explore them in greater detail.

Types of personalized recommender systems


Personalized recommender systems can be categorized into several types, each
with its own methods and techniques for providing tailored recommendations.
These include:

Content-based filtering,

Collaborative filtering, and

Hybrid recommenders.

Content-based filtering
Content-based recommender systems use item or user metadata to create
specific recommendations. To do this, we look at the user’s purchase history.
For example, if a user has already read a book by a certain author or bought
a product from a certain brand, you assume that they have a preference for
that author or that brand, and that they are likely to buy a similar product
in the future.

A content-based recommender system

Let’s assume that Jenny loves sci-fi books and her favorite writer is Walter
Jon Williams. If she reads the Aristoi book, then her recommended book will
be Angel Station, also a sci-fi book written by Walter Jon Williams.

This is what content-based filtering looks like in real life.
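
Jenny's example can be sketched by comparing item attribute sets. The sketch below assumes Jaccard similarity over genre and author tags; the source does not name a specific similarity measure, and the small catalog (beyond the two Walter Jon Williams titles) is made up for illustration.

```python
# Hypothetical item metadata: title -> set of attributes (genre, author)
catalog = {
    "Aristoi": {"sci-fi", "Walter Jon Williams"},
    "Angel Station": {"sci-fi", "Walter Jon Williams"},
    "The Time Machine": {"sci-fi", "H. G. Wells"},
    "Pride and Prejudice": {"romance", "Jane Austen"},
}

def jaccard(a, b):
    """Share of attributes two sets have in common."""
    return len(a & b) / len(a | b)

def recommend_similar(read, catalog, n=2):
    """Rank unread items by attribute overlap with everything the user has read."""
    profile = set().union(*(catalog[title] for title in read))
    candidates = [title for title in catalog if title not in read]
    ranked = sorted(
        candidates,
        key=lambda title: jaccard(profile, catalog[title]),
        reverse=True,
    )
    return ranked[:n]

jenny_recs = recommend_similar({"Aristoi"}, catalog, n=2)
```

Because Angel Station shares both the genre and the author with Aristoi, it ranks first; a sci-fi book by another author comes next.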

Pros of the content-based approach


The content-based approach is one of the common techniques used in
personalized recommendation systems. It has its advantages and disadvantages,
which are important to consider when deciding to implement this approach.
Let’s take a look at some of its most obvious advantages first:

Less cold-start problem: Content-based recommendations can effectively
address the “cold-start” problem, allowing new users or items with limited
interaction history to still receive relevant recommendations.

Transparency: Content-based filtering allows users to understand why a
recommendation is made because it’s based on the content and attributes of
items they’ve previously interacted with.

Diversity: Considering various attributes, content-based systems can provide
diverse recommendations. For example, in a movie recommendation system,
recommendations can be based on genre, director, and actors.

Reduced data privacy concerns: Since content-based systems primarily use
item attributes, they may not require as much user data, which can mitigate
privacy concerns associated with collecting and storing user data.

Cons of the content-based approach


On the other hand, the content-based approach can come with a few
disadvantages, too. These can include:

The “Filter bubble”: Content filtering can recommend only content similar to
the user’s past preferences. If a user reads a book about a political ideology
and books related to that ideology are recommended to them, they will be in
the “bubble of their previous interests”.

Limited serendipity: Content-based systems may have limited capability to
recommend items that are outside a user’s known preferences.

In the first case scenario, 20% of items attract the attention of 70-80%
of users and 70-80% of items attract the attention of 20% of users. The
recommender’s goal is to introduce other products that are not visible to
users at first glance.

In the second case scenario, content-based filtering recommends products
that fit content-wise, yet are very unpopular (i.e. people don’t buy those
products for some reason; for example, the book is bad even though it fits
thematically).

Over-specialization: If the content-based system relies too heavily on a user’s
past interactions, it can recommend items that are too similar to what the
user has already seen or interacted with, potentially missing
opportunities for diversification.

Collaborative filtering
Collaborative filtering is a popular
technique used to provide personalized recommendations to users based on
the behavior and preferences of similar users.
The fundamental idea behind
collaborative filtering is that users who have interacted with items in
similar ways or have had similar preferences in the past are likely to
have similar preferences in the future, too.
Collaborative filtering relies on the collective wisdom of the user community to
generate recommendations.
There are two main types of collaborative filtering: memory-based and model-
based.

Memory-based recommenders

Memory-based recommenders rely on the direct similarity between users or items
to make recommendations.
Usually, these systems use raw,
historical user interaction data, such as user-item ratings or purchase
histories, to identify similarities between users or items and generate
personalized recommendations.
The biggest disadvantage of
memory-based recommenders is that they require a lot of data to be
stored and comparing every item/user with every item/user is extremely
computationally demanding.
Memory-based recommenders can be categorized into two main types:
user-based and item-based collaborative filtering.
User-based

A user-based collaborative filtering recommender system


With the user-based approach,
recommendations to the target user are made by identifying other users
who have shown similar behavior or preferences. This translates to
finding users who are most similar to the target user based on their
historical interactions with items. This could be “users who are similar
to you also liked…” type of recommendations.
But if we say that users are similar, what does that mean?
Let’s say that Jenny and Tom both
love sci-fi books. This means that, when a new sci-fi book appears and
Jenny buys that book, that same book will be recommended to Tom, since
he also likes sci-fi books.
Item-based

An item-based collaborative filtering recommender system


In item-based collaborative
filtering, recommendations are made by identifying items that are
similar to the ones the target user has already interacted with.
The idea is to find items that share
similar user interactions and recommend those items to the target user.
This can include “users who liked this item also liked…” type of
recommendations.
To illustrate with an example, let’s assume that John, Robert, and Jenny highly
rated sci-fi books Fahrenheit 451 and The Time Machine, giving them 5 stars. So,
when Tom buys Fahrenheit 451, the system automatically recommends The Time
Machine to him because it has identified it as similar based on other users’
ratings.
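The item-based idea can be sketched in a few lines. This is a minimal illustration, not a production recommender: the ratings table is hypothetical (the third book and its ratings are invented so that there is something dissimilar to compare against), and `cosine_similarity` and `most_similar_item` are illustrative helper names.

```python
import numpy as np

# Hypothetical 5-star ratings; 0 means "not rated".
# Rows are users, columns are books.
users = ["John", "Robert", "Jenny"]
books = ["Fahrenheit 451", "The Time Machine", "Calculus Basics"]
ratings = np.array([
    [5, 5, 1],   # John
    [5, 5, 0],   # Robert
    [5, 5, 2],   # Jenny
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def most_similar_item(item_name):
    """Return the other book whose rating column is closest to item_name's column."""
    target = books.index(item_name)
    scores = {
        other: cosine_similarity(ratings[:, target], ratings[:, j])
        for j, other in enumerate(books)
        if j != target
    }
    return max(scores, key=scores.get)

# Tom has just bought Fahrenheit 451; look up the most similar book.
recommended_for_tom = most_similar_item("Fahrenheit 451")
```

Note that Tom does not need any rating history of his own here: the similarity is computed purely from how other users rated the two books.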

How to calculate user-user and item-item similarities?


Unlike the content-based approach
where metadata about users or items is used, in the collaborative
filtering memory-based approach we are looking at the user’s behavior,
e.g. whether the user liked or rated an item or whether the item was
liked or rated by a certain user.
For example, the idea is to recommend Robert the new sci-fi book. Let’s look at
the steps in this process:

1. Create a user-item-rating matrix.

2. Create a user-user similarity matrix: Cosine similarity (alternatives: adjusted
cosine similarity, Pearson similarity, Spearman rank correlation) is calculated
between every pair of users. This is how we get a user-user matrix. When there
are fewer users than items, this matrix is smaller than the initial
user-item-rating matrix.

Cosine similarity

3. Look up similar users: In the user-user matrix, we observe users that are most
similar to Robert.

4. Candidate generation: When we find Robert’s most similar users, we look at
all the books these users read and the ratings they gave them.

5. Candidate scoring: Depending on the other users’ ratings, books are ranked
from the ones they liked the most to the ones they liked the least. The results
are normalized on a scale from 0 to 1.

6. Candidate filtering: We check if Robert has already bought any of these
books and eliminate those he already read.

The item-item similarity calculation is done in an identical way and has all the
same steps as user-user similarity.
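The user-based steps described above (similarity matrix, similar-user lookup, candidate generation, scoring, and filtering) can be sketched as follows. This is a simplified illustration under stated assumptions: the users, books, and ratings are hypothetical, unrated entries are stored as 0 and count as 0 in the weighted average, and scores are scaled to the 0 to 1 range by dividing by the maximum rating of 5.

```python
import numpy as np

# Hypothetical user-item-rating matrix; 0 means "not rated".
users = ["Robert", "Jenny", "Tom", "John"]
books = ["Dune", "Neuromancer", "Foundation", "Algebra I"]
R = np.array([
    [5, 4, 0, 0],   # Robert
    [5, 5, 4, 0],   # Jenny
    [4, 5, 5, 1],   # Tom
    [0, 1, 0, 5],   # John
], dtype=float)

def user_similarity(R):
    """User-user cosine similarity matrix computed from the rating matrix."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # avoid division by zero for empty rows
    unit = R / norms
    return unit @ unit.T

def recommend(user, k=2):
    """Similar-user lookup, candidate generation, scoring, and filtering."""
    u = users.index(user)
    sim = user_similarity(R)[u].copy()
    sim[u] = -1.0                             # exclude the target user
    neighbours = np.argsort(sim)[::-1][:k]    # the k most similar users
    weights = sim[neighbours]
    # Similarity-weighted average of the neighbours' ratings, scaled from
    # the 0-5 star range to 0-1. Simplification: unrated entries count as 0.
    scores = (weights @ R[neighbours]) / weights.sum() / 5.0
    unseen = np.where(R[u] == 0)[0]           # drop books the user already rated
    ranked = [(books[j], float(scores[j])) for j in unseen]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

recommendations = recommend("Robert")
```

The item-based variant follows the same pattern with the matrix transposed, so that similarities are computed between columns (books) instead of rows (users).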

Comparing user-based and item-based approaches

The similarity between items is more stable than the similarity between the users.
Why?
Well, a math book will always be a
math book, but a user can easily change their mind: something they liked
last week might not be interesting next week.

Moreover, there are usually fewer products than users. This means that
an item-item matrix with similarity scores will be smaller than a
user-user matrix.
Finally, the item-based approach is better when a new user visits the
site. The user-based approach is problematic in that case, since you
have little or no data about that user (the cold-start problem).

Model-based recommenders
Model-based recommenders make use of machine learning models to generate
recommendations.
These systems learn patterns,
correlations, and relationships from historical user-item interaction
data to make predictions about a user’s preferences for items they
haven’t interacted with yet.
There are different types of model-based recommenders, such as matrix
factorization (for example, Singular Value Decomposition (SVD)) or neural
networks. Matrix factorization remains one of the most popular, so let’s
explore it a bit further.

Matrix factorization
Matrix factorization is a mathematical technique used to decompose a large matrix
into the product of multiple smaller matrices.
In the context of recommender systems, matrix factorization is commonly
employed to uncover latent patterns or features in user-item interaction data,
allowing for personalized recommendations. Latent information can be inferred
by analyzing user behavior.
If there is feedback from the user,
for example, they have watched a particular movie or read a particular
book and have given a rating, that can be represented in the form of a
matrix. In this case,

Rows represent users,

Columns represent items, and

The values in the matrix represent user-item interactions (e.g., ratings,
purchase history, clicks, or binary preferences).

Since it’s almost impossible for a user to rate every item, this matrix
will have many unfilled values.
This is called sparsity.
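As a small illustration, sparsity can be measured as the fraction of empty cells in the interaction matrix. The matrix below is hypothetical, and encoding a missing rating as 0 is an assumption of this sketch.

```python
import numpy as np

def sparsity(R):
    """Fraction of user-item cells with no recorded interaction (encoded here as 0)."""
    return 1.0 - np.count_nonzero(R) / R.size

# Hypothetical 3 users x 4 items rating matrix with only 4 recorded ratings.
R = np.array([
    [5, 0, 0, 1],
    [0, 0, 4, 0],
    [0, 2, 0, 0],
])

share_missing = sparsity(R)   # 8 of the 12 cells are empty
```

Real catalogs are usually far sparser than this toy example, because each user interacts with only a tiny share of the items.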

The matrix factorization process


Matrix factorization aims to approximate this interaction matrix by factorizing it
into two or more lower-dimensional matrices:

User latent factor matrix (U), which contains information about users and their
relationships with latent factors.

Item latent factor matrix (V), which contains information about items and their
relationships with latent factors.

The rating matrix is approximated by the product of two smaller
matrices: the user-feature matrix and the item-feature matrix.
The higher the predicted score in the matrix, the better the match
between the item and the user.

Matrix factorization
The matrix factorization process includes the following steps:

Initialization of the user and item matrices with random values,

The predicted ratings matrix is obtained by multiplying the user matrix by the
transposed item matrix,

The goal of matrix factorization is to minimize the loss function (the
difference between the predicted and actual ratings must be minimal). Each
predicted rating can be described as a dot product of a row in the user
matrix and a column in the transposed item matrix.

Minimization of loss function
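One common way to write this loss is shown here under assumed notation, since the symbols are not taken from the original text. Let r_ui be the observed rating of user u for item i, u_u the row of U for user u, v_i the row of V for item i, and K the set of observed (u, i) pairs. The L2 regularization term weighted by lambda is a frequent addition and is an assumption here; setting lambda to 0 gives the plain squared error.

```latex
\hat{r}_{ui} = \mathbf{u}_u^{\top} \mathbf{v}_i,
\qquad
L(U, V) = \sum_{(u,i) \in K} \left( r_{ui} - \mathbf{u}_u^{\top} \mathbf{v}_i \right)^2
        + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)
```

Only observed ratings enter the sum, which is what lets the factorization cope with a sparse matrix.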

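In order to minimize the loss function we can apply Stochastic Gradient
Descent (SGD) or Alternating Least Squares (ALS). Both methods can be
used to incrementally update the model as new ratings come in. Neither is
universally faster or more accurate: SGD is simple to implement and
efficient on sparse explicit ratings, while ALS is easier to parallelize.
The better choice depends on the data and the implementation.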
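A minimal sketch of matrix factorization trained with SGD is shown below. It is an illustration, not a tuned implementation: the rating matrix is hypothetical, missing ratings are encoded as 0, and the hyperparameters (`n_factors`, `lr`, `reg`, `n_epochs`) are arbitrary values chosen for this toy example.

```python
import numpy as np

def factorize(R, n_factors=2, lr=0.01, reg=0.02, n_epochs=500, seed=0):
    """Approximate R (0 = missing) as U @ V.T using SGD on observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    # Random initialization of the user and item latent factor matrices.
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    observed = np.argwhere(R > 0)             # (user, item) index pairs
    for _ in range(n_epochs):
        rng.shuffle(observed)                 # visit ratings in random order
        for u, i in observed:
            err = R[u, i] - U[u] @ V[i]       # error on a single rating
            u_old = U[u].copy()
            # Gradient step on the squared error with L2 regularization.
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

# Hypothetical ratings: 5 users x 4 items, 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

U, V = factorize(R)
predicted = U @ V.T                           # filled-in rating matrix
mask = R > 0
rmse = float(np.sqrt(np.mean((R[mask] - predicted[mask]) ** 2)))
```

The entries of `predicted` at positions where `R` is 0 are the model's estimates for items the user has not rated, and those are the values a recommender would rank.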

Pros of collaborative filtering


Looking at the bigger picture, collaborative filtering comes with a set of great
advantages:

Effective personalization: Collaborative filtering is highly effective in
providing personalized recommendations to users. It takes into account the
behavior and preferences of similar users to suggest items that a particular
user is likely to enjoy.

No need for item attributes: Collaborative filtering works solely based on
user-item interactions,
making it applicable to a wide range of recommendation scenarios where
item features may be sparse or unavailable. This is especially useful in
content-rich platforms.

Serendipitous discoveries: Collaborative filtering can introduce users to items
they might not have discovered otherwise. By analyzing user behaviors and
identifying patterns across the user community, collaborative filtering can
recommend items that align with a user’s tastes but may not be immediately
obvious to them.

Cons of collaborative filtering


It’s important to note that while
collaborative filtering offers these and other advantages, it also has
its limitations, including:

The “cold-start” problem:

User cold start occurs when a new user joins the system without any prior
interaction history. Collaborative filtering relies on historical
interactions to make recommendations, so it can’t provide personalized
suggestions to new users who start with no data.

Item cold start happens when a new item is added, and there’s no user
interaction data for it. Collaborative filtering has difficulty
recommending new items since it lacks information about how users have
engaged with these items in the past.

Sensitivity to sparse data: Collaborative filtering depends on having enough
user-item interaction data to provide meaningful recommendations. In
situations where data is sparse and users interact with only a small number
of items, collaborative filtering may struggle to find useful patterns or
similarities between users and items.

Potential for popularity bias: Collaborative filtering tends to recommend
popular items more frequently. This can lead to a “rich get richer”
phenomenon, where already popular items receive even more attention, while
niche or less-known items are overlooked.

To address these and other limitations, recommendation systems often use
hybrid approaches that combine collaborative filtering with content-based
methods or other techniques to improve recommendation quality in the long run.

Hybrid recommenders
Hybrid recommendation systems combine
multiple recommendation techniques or approaches to provide more
accurate, diverse, and effective personalized recommendations.
They are particularly valuable in
real-world recommendation scenarios because they can provide more
robust, accurate, and adaptable recommendations.
The choice of which hybrid approach
to use depends on the specific requirements and constraints of the
recommendation system and the nature of the available data.
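One simple hybridization scheme is a weighted blend of the scores produced by two recommenders. The sketch below is only one possible design and is not prescribed by the text: the function name `hybrid_scores`, the linear weighting rule, the `full_trust_at` parameter, and the example scores are all assumptions made for illustration.

```python
def hybrid_scores(collab, content, n_interactions, full_trust_at=20):
    """Blend two score dicts (item -> score in [0, 1]) into one.

    The weight on collaborative filtering grows linearly from 0 to 1 as the
    user accumulates interactions, so new users rely on content-based scores.
    """
    alpha = min(n_interactions / full_trust_at, 1.0)
    items = set(collab) | set(content)
    return {
        item: alpha * collab.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }

# Hypothetical scores from a collaborative and a content-based recommender.
collab = {"book_a": 0.9, "book_b": 0.2}
content = {"book_a": 0.1, "book_b": 0.8, "book_c": 0.6}

new_user = hybrid_scores(collab, content, n_interactions=0)
regular_user = hybrid_scores(collab, content, n_interactions=50)
```

For a brand-new user the blend falls back entirely on content-based scores, while a user with a long history is served mostly by collaborative filtering.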

Pros of hybrid recommenders


Some of the most common advantages of hybrid recommenders include:

Improved recommendation quality: Hybrid recommenders leverage multiple
recommendation techniques, combining their strengths to provide more accurate
and diverse recommendations. This often results in higher recommendation
quality compared to individual methods, benefiting users by offering more
relevant suggestions.

Enhanced robustness and flexibility: Hybrid models are often more robust in
handling various recommendation
scenarios. They can adapt to different data characteristics, user
behaviors, and recommendation challenges. This flexibility is valuable
in real-world recommendation systems.

Addressing common recommendation limitations: Hybrid recommenders can
mitigate the limitations of individual recommendation techniques. For
example, they can ease the “cold-start” problem for new users and items by
incorporating content-based recommendations, while also providing
serendipitous suggestions and reducing popularity bias.

Cons of hybrid recommenders


Just like all other recommender systems, hybrid recommenders have their
downsides, too. Some include:

Increased complexity and development effort: Implementing and maintaining
hybrid recommendation systems can be more complex and resource-intensive. It
requires expertise in multiple recommendation techniques and careful
integration of these methods.

Data and computational demands: Hybrid models often require more data
and computational resources
because they use multiple recommendation algorithms. This can be
challenging, especially in large-scale systems with vast user-item
interactions and a diverse catalog of items.

Tuning and parameter sensitivity: Hybrid recommenders may involve a
greater number of parameters and hyperparameters that need to be fine-tuned.
Ensuring optimal parameter settings for each recommendation component can be
challenging and time-consuming.

While hybrid recommenders offer significant advantages in terms of
recommendation quality and versatility, you should carefully consider the
trade-offs and resource requirements when deciding which system to implement.
This is the best way to ensure that the benefits of hybridization outweigh the
added complexity and costs.

Evaluation metrics for recommender systems


To assess the performance and
effectiveness of recommender systems, you have to take into
consideration certain evaluation metrics.

They can help you measure how well a
recommendation algorithm or model is performing and provide insights
into its strengths and weaknesses.
There are several categories of evaluation metrics, depending on the specific
aspect of recommendations being assessed.
Some common evaluation metrics include:

Accuracy metrics assess the accuracy of the recommendations made by a
system in terms of how well they match the user’s actual preferences or
behavior. Here we have Mean Absolute Error (MAE), Root Mean Square Error
(RMSE), or Mean Squared Logarithmic Error (MSLE).

Ranking metrics evaluate how well a recommender system ranks items for a
user, especially in
top-N recommendation scenarios. Think of hit rate, average reciprocal
hit rate (ARHR), cumulative hit rate, or rating hit rate.

Diversity metrics assess the diversity of recommended items to ensure that
recommendations are not overly focused on a narrow set of items. These
include Intra-List Diversity or Inter-List Diversity.

Novelty metrics evaluate how well a recommender system introduces users
to new or unfamiliar items. Catalog coverage and item popularity belong to
this category.

Serendipity metrics assess the system’s ability to recommend unexpected but
interesting items to users. Surprise and diversity are what is measured in
this case.

You can also choose to look at some business metrics such as conversion
rate, click-through rate (CTR), or revenue impact. But, ultimately, the best
way to do an online evaluation of your recommender system is through A/B
testing.
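Some of the accuracy and ranking metrics named above can be sketched in plain Python. The data is hypothetical, and the evaluation setup is an assumption of this example: one item per user is held out and the recommender is asked for a top-N list. ARHR is computed here as the average of 1/rank over all users, with a miss contributing 0.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def hit_rate(top_n_lists, held_out):
    """Fraction of users whose held-out item appears in their top-N list."""
    hits = sum(1 for user, item in held_out.items() if item in top_n_lists[user])
    return hits / len(held_out)

def arhr(top_n_lists, held_out):
    """Average reciprocal hit rate: a hit at rank r contributes 1/r, a miss 0."""
    total = 0.0
    for user, item in held_out.items():
        if item in top_n_lists[user]:
            total += 1.0 / (top_n_lists[user].index(item) + 1)
    return total / len(held_out)

# Hypothetical ratings and top-3 lists for three users.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]
top_n = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
held_out = {"u1": "b", "u2": "d", "u3": "z"}

results = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "hit rate": hit_rate(top_n, held_out),
    "ARHR": arhr(top_n, held_out),
}
```

Unlike the plain hit rate, ARHR rewards placing the relevant item near the top of the list, which matters when users only look at the first few recommendations.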

What metric to use?

Which metric will be used depends on the business problem being solved.
If we think that we have made the
best possible recommender and the metric is great, but in practice it is
bad, then our recommender is not good. For example, the winning algorithm
of the Netflix Prize competition was never fully put into production
because, according to Netflix, the accuracy gains did not justify the
engineering effort needed to deploy it.
The most important thing is that the
user gains trust in the recommender system. If we recommend to them the
top 10 products, but only 2 or 3 are relevant to them, they will
consider this a bad recommendation.
For this reason, the idea is not to always recommend the top 10 items but to
recommend items above a certain threshold.
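The threshold idea can be sketched as a small filter over predicted scores. The function name, the default threshold of 0.7, and the example scores are illustrative assumptions, not values from the text.

```python
def recommend_above_threshold(scores, threshold=0.7, max_items=10):
    """Return up to max_items items whose score is at least threshold, best first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, score in ranked if score >= threshold][:max_items]

# Hypothetical predicted relevance scores in [0, 1].
scores = {"a": 0.95, "b": 0.40, "c": 0.81, "d": 0.69, "e": 0.72}
confident = recommend_above_threshold(scores)
```

With this rule a user may see fewer than ten items, but every item shown has cleared the relevance bar, which supports the trust argument made above.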

Recommender real-life challenges


Although quite helpful and effective in providing personalized recommendations,
recommender systems encounter several real-world challenges.
One significant challenge is the “cold start problem,” which arises when a new
user joins the system, and there is limited data available about their preferences.

In such cases, recommender systems can initially recommend either the top 10
best-selling products or the top 10 products on promotion as a starting point.
Alternatively,
conducting user interviews can help gather information about the user’s
preferences.
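A best-seller fallback for new users can be sketched as follows. The purchase log, the `recommend_for` helper, and the `personalized` callback are hypothetical and stand in for whatever the real system provides.

```python
from collections import Counter

def recommend_for(user_id, purchases, personalized, n=10):
    """Use personalized results when the user has history, else fall back to best-sellers."""
    if purchases.get(user_id):
        return personalized(user_id)[:n]
    # Cold start: count purchases across all users and return the top sellers.
    counts = Counter(item for items in purchases.values() for item in items)
    return [item for item, _ in counts.most_common(n)]

# Hypothetical purchase log: user -> list of purchased items.
purchases = {"ann": ["a", "b"], "bob": ["a", "c"], "cy": ["a", "b"]}

def personalized(user_id):
    """Stand-in for a real personalized recommender."""
    return ["x", "y", "z"]

for_new_user = recommend_for("dana", purchases, personalized, n=2)
for_known_user = recommend_for("ann", purchases, personalized, n=2)
```

Once the new user has made a few purchases, the same call starts returning personalized results without any change to the calling code.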
Another aspect of the cold start
problem pertains to introducing new products to users. This can be
achieved by leveraging content-based attributes and periodically adding
new products to user recommendations while actively promoting them.
Furthermore, churn
poses another challenge, as users’ preferences and behaviors evolve
over time. To address this, recommender systems should incorporate a
degree of randomization to refresh the top N list of recommended items
periodically.
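One way to add a degree of randomization is to keep most of the top-N list fixed and fill the last few slots at random from the next-best candidates. This is a sketch under assumptions: the split between fixed and random slots, the pool size, and the function name are illustrative choices, not something the text specifies.

```python
import random

def top_n_with_exploration(ranked_items, n=10, pool_size=30, n_random=2, seed=None):
    """Keep the best n - n_random items and fill the rest at random from the next-best pool."""
    rng = random.Random(seed)
    n_fixed = max(n - n_random, 0)
    fixed = ranked_items[:n_fixed]
    pool = ranked_items[n_fixed:pool_size]    # candidates just below the fixed slots
    extras = rng.sample(pool, min(n - n_fixed, len(pool)))
    return fixed + extras

# Hypothetical ranking, best item first.
ranked = [f"item_{i}" for i in range(50)]
refreshed = top_n_with_exploration(ranked, seed=42)
```

Calling the function without a fixed `seed` gives a different tail on each refresh, so the list changes over time while the strongest recommendations stay in place.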

It is also crucial to ensure that
recommender systems are designed with sensitivity in mind, avoiding
content that may offend or discriminate against users.

This includes steering clear of recommending items containing vulgar
language, religious or political content, or references to drugs.
By tackling these challenges
thoughtfully, recommender systems can enhance user satisfaction and
provide meaningful recommendations while upholding ethical
considerations.
