Visualizing Online Social Networks


Unit II

MODELING AND VISUALIZATION

IFETCE\M.E CSE\III SEM\NE7012-SNA\UNIT 2-PPT


Topics Covered
Visualizing Online Social Networks
A Taxonomy of Visualizations
Graph Representation
Node-Edge Diagrams
Visualizing Social Networks with Matrix-Based Representations
Node-Link Diagrams
Hybrid Representations
Modelling and Aggregating Social Network Data
Random Walks and their Applications
Use of Hadoop and MapReduce
Ontological Representation of Social Individuals and Relationships

GRAPH REPRESENTATION

Graph Theory: Definitions
Node degree: the number of edges incident to the node.
Density: the number of edges present divided by the maximum possible number of edges.
The density of an undirected graph is $2E / (N(N-1))$, where $E$ is the number of edges and $N$ is the number of nodes.
The density of a directed graph is $E / (N(N-1))$.
Path length: the number of edges in the sequence that a walk follows.
Component size: the number of nodes in a connected component of the graph.

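The degree and density definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `density` and the sample edge list are assumptions made for the example, not part of the original slides.

```python
from collections import Counter

def density(num_nodes, num_edges, directed=False):
    """Fraction of the possible edges that are actually present."""
    if num_nodes < 2:
        return 0.0
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if directed:
        return num_edges / possible
    return 2 * num_edges / possible  # an undirected edge covers two ordered pairs

# Hypothetical undirected network with 4 nodes and 4 edges.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
nodes = {v for edge in edges for v in edge}

# Node degree: how many edges touch each node.
degree = Counter(v for edge in edges for v in edge)

undirected_density = density(len(nodes), len(edges))
print(dict(degree), undirected_density)
```
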
Centrality
One of the key applications in social network analysis is to identify the most important or central nodes in the network.
Centrality gives a rough indication of the social power of a node based on how well it connects the network.
Three popular individual centrality measures:
Degree centrality
Betweenness centrality
Closeness centrality

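As a rough sketch of two of the measures listed above, the code below computes degree centrality and closeness centrality on a small star-shaped network. The normalizations used (degree divided by N - 1, and reachable nodes divided by total shortest-path distance) are one common convention and are an assumption here; betweenness centrality is left out because it needs a longer shortest-path counting algorithm.

```python
from collections import deque

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, N - 1."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}

def closeness_centrality(adj):
    """(number of reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical star network: "hub" is connected to everyone else.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
degree = degree_centrality(star)
closeness = closeness_centrality(star)
print(degree["hub"], closeness["hub"], closeness["a"])
```

In this toy network the hub scores highest on both measures, which matches the intuition that it is the node that best connects the others.
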
Clustering
Many social networks contain subsets of nodes that are highly connected within the subset and have relatively few connections to nodes outside the subset.
The nodes in such subsets are likely to share some attributes and form their own communities.
Clustering coefficient: measures how densely the neighbors of a node are connected to one another, which helps decide which nodes in a graph tend to be clustered together.

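One common way to make the clustering coefficient concrete is the local clustering coefficient: the fraction of pairs of a node's neighbors that are themselves connected. The sketch below assumes that definition and an undirected graph stored as a dictionary of neighbor sets; the function name and sample graph are illustrative only.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are linked to each other."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

# Hypothetical network: nodes 1, 2, 3 form a triangle; node 4 hangs off node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
coefficients = {v: local_clustering(adj, v) for v in adj}
print(coefficients)
```
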
NODE-EDGE DIAGRAMS

Node-Edge Diagrams
A node-edge diagram is an intuitive way to visualize social networks.
With a node-edge visualization, many network analysis tasks, such as component size calculation, centrality analysis, and pattern sketching, can be presented in a more straightforward manner.
Many node-edge layouts have been proposed to place the nodes of the graph so that users can clearly recognize the structure of the social network.

Random Layout
A random layout places the nodes at random geometric locations in the drawing area.
A random layout algorithm can draw the social network graph efficiently, in linear time, O(N).
It can therefore be used to visualize very large network graphs.

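A random layout is simple enough to sketch directly. The snippet below assigns each node one random point, which is why the cost is linear in the number of nodes. The function name, the unit-square drawing area, and the `seed` parameter are assumptions for the example.

```python
import random

def random_layout(nodes, width=1.0, height=1.0, seed=None):
    """Assign every node a uniformly random (x, y) position: one step per node."""
    rng = random.Random(seed)
    return {v: (rng.uniform(0, width), rng.uniform(0, height)) for v in nodes}

positions = random_layout(["ann", "bob", "cat", "dan"], seed=42)
for node, (x, y) in positions.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```
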
Force-Directed Layout
A force-directed layout, also known as a spring layout, simulates the graph as a virtual physical system.
In a force-directed layout, the edges act as springs and the nodes act as repelling objects.
Hence, there is an attractive force (like gravity) or a repulsive force (like magnetic repulsion) between each pair of nodes in the graph.
The running cost of a force-directed layout is much higher than that of a random layout.

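To illustrate why a force-directed layout costs more than a random one, here is a minimal spring-embedder sketch. It is loosely modeled on the general family of force-directed methods and is not a specific published algorithm; the force formulas and the parameters `k`, `step`, and `iterations` are assumptions chosen for the example.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=50, k=0.3, step=0.05, seed=0):
    """Return {node: [x, y]} after a fixed number of force iterations."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}  # start from a random layout
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion: every pair of nodes pushes apart, O(N^2) work per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[u][0] += push * dx / dist
                disp[u][1] += push * dy / dist
                disp[v][0] -= push * dx / dist
                disp[v][1] -= push * dy / dist
        # Attraction: each edge pulls its two endpoints together like a spring.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[u][0] -= pull * dx / dist
            disp[u][1] -= pull * dy / dist
            disp[v][0] += pull * dx / dist
            disp[v][1] += pull * dy / dist
        # Move each node along its net force, by at most `step`.
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
            move = min(length, step)
            pos[v][0] += disp[v][0] / length * move
            pos[v][1] += disp[v][1] / length * move
    return pos

layout = force_directed_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
print(layout)
```

The nested loop over all node pairs, repeated for every iteration, is what makes this approach so much more expensive than the single pass of a random layout.
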
Node-link graph layouts for social networks
[Figure: a random geographic layout and a force-based graph layout]

Tree Layout
A basic tree layout chooses a node as the root of the tree, and the nodes connected to the root become the children of the root node.
Nodes that are further levels away from the root become the grandchildren of the root, and so on.
A tree layout can display a more structured view than general graph layouts by taking more contextual information into account.

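The root, children, and grandchildren described above correspond to the levels of a breadth-first search from the chosen root. The following sketch assumes that reading; the function name and the sample network are hypothetical, and a real tree layout would go on to turn these levels into coordinates.

```python
from collections import deque

def tree_levels(adj, root):
    """Breadth-first search from the root: returns each node's depth and parent."""
    depth = {root: 0}
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in depth:  # first visit decides the node's place in the tree
                depth[w] = depth[u] + 1
                parent[w] = u
                queue.append(w)
    return depth, parent

# Hypothetical network; "a" is chosen as the root.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
depth, parent = tree_levels(adj, "a")
print(depth, parent)
```
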
Three kinds of tree layouts for social network visualization
Hyperbolic tree view
Radial tree layout
H3 view

VISUALIZING SOCIAL NETWORKS WITH MATRIX-BASED REPRESENTATIONS

Matrix Representation
A social network graph can be transformed into a simple Boolean matrix whose rows and columns represent the vertices of the graph.
The Boolean values in the matrix can be replaced with valued attributes associated with the edges to provide more informative network visualizations.
The matrix-based representation of graphs offers an alternative to the traditional node-edge diagrams.
With a matrix-based representation, clusters and associations among the nodes can also be discovered more easily as the number of nodes increases.

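The transformation from a graph to a matrix can be sketched as follows. Edges without a weight produce the Boolean (0/1) entries described above, and edges with a third element produce a valued entry. The function name, the tuple format for edges, and the sample data are assumptions for this example.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build a matrix whose rows and columns follow the order of `nodes`."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v, *weight in edges:
        w = weight[0] if weight else 1  # unweighted edge: plain Boolean entry
        matrix[index[u]][index[v]] = w
        if not directed:
            matrix[index[v]][index[u]] = w  # undirected graphs give a symmetric matrix
    return matrix

people = ["ann", "bob", "cat"]
ties = [("ann", "bob"), ("bob", "cat", 5)]  # the second tie carries a value
matrix = adjacency_matrix(people, ties)
for name, row in zip(people, matrix):
    print(f"{name:>4}", row)
```
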
Matrix Visualization

Enhanced matrix-based representation
MatrixExplorer was developed to visualize social networks with a dual representation.
It provides users with two synchronized representations of the same network: matrix and node-edge.
When a social network is composed of highly interlaced edges, the matrix-based view can help users quickly recognize the associations between nodes.
A matrix-based visualization can compensate for the shortcomings of a node-edge diagram and so improve the social network visualization.

Matrix-based representation of MatrixExplorer
[Figure: matrix-based view and reordered matrix]

Web Site Example

Advantages of matrices
Matrices constitute a good representation to initiate an exploration.
They do not suffer from node overlapping.
They do not suffer from links crossing each other; therefore they are a viable alternative for dense networks.
Because matrices show all possible pairs of vertices, they can highlight the lack of connections and also the directedness of the connections.

NODE-LINK DIAGRAMS

Node-Link Diagrams
The principle of node-link diagrams is to graphically represent the actors of the network by nodes and their connections by links.
Node-link diagrams are the most commonly used representation of graphs and networks.

Advantages of node-link diagrams
These representations are familiar to a wide audience; they constitute a powerful communication tool.
For small or sparse networks, node-link diagrams are more effective than matrices.
For a compact representation, node-link diagrams are a better choice.
When the analysis requires a number of path-related tasks, node-link diagrams are more appropriate.

Scaling to Larger Networks
Scaling to large networks with several thousand or even millions of nodes remains a challenge.
Solutions:
Reducing the quantity of information by filtering or aggregating data
Representing a subset of the network and exploring it incrementally
Providing more visual space to represent the graph
Using an alternative representation

Matrix vs. Node-Link

Matrix, advantages:
Usable without reordering
No node overlapping
No edge crossing
Readable for dense graphs
Fast navigation
Fast manipulation
Usable interactively
More readable for some tasks

Matrix, drawbacks:
Less familiar
Use more space
Weak for path following tasks

Node-link, advantages:
Familiar
Compact
More readable for path following
More effective for small graphs
More effective for sparse graphs

Node-link, drawbacks:
Useless without layout
Node overlapping
Edge crossing
Not readable for dense graphs
Manipulation requires layout computation

HYBRID REPRESENTATIONS

Hybrid Representations
Hybrid representations aim to minimize the display space required and limit the cognitive cost of switching between representations.
Two hybrid representations:
Augmenting matrices
Merging matrix and node-link diagram
The goal of these hybrids is to augment one representation to overcome its drawbacks and enrich it with the advantages of the other.

Example Hybrid Representations
Augmenting Matrices
Merging Matrix and Node-Link Diagram

Visualizing Online Social Networks
Visualizations of online social networks can be categorized into three types according to the social relationships they present:
User-centric visualization
Content-centric visualization
Hybrid visualization

1. User-centric visualization
Presents various characteristics of actors and helps explore different subjects and relationships of interest.
For example, it can be used to discover individuals and communities that meet the following expectations:
Actors or groups with similar or complementary features
Key actors or those with high social impact
Actors with popular interpersonal relationships or active social interactions
User-centric visualizations are widely used to help people access their social networks and discover the social networks that interest them.

2. Content-centric visualization
Various kinds of content can be presented to help people analyze social networks.
For example, at least the following three sorts of social network content can be displayed with content-centric visualization:
1. The distribution of user opinions, including key opinions and controversial comments
2. User opinions with high impact on their social communities, especially the period of time over which the impact is effective
3. The relations among different content groups

3. Hybrid visualization
Hybrid visualization presents social networks from different aspects of their attributes, e.g. people and content.
In particular, online social activities, such as email and dating services, usually include such elements.

VISUALIZING ONLINE SOCIAL NETWORKS

Visualizations of online social networks
These are developed according to the attributes of network sociality in order to present the network structure.
Web communities
Email groups
Digital libraries
Web 2.0 services

1. Web communities
Different social network services were created on the Web to help people maintain their social relationships.
The SixDegrees.com website was an early representative.
In 2003, Club Nexus was established.
Vizster was introduced in 2005, with customized techniques to visualize social relationships and community structures.
A project called FOAF (Friend of a Friend) is based on Semantic Web social metadata.
Recently, Microsoft Research Asia proposed a novel object-level search service called EntityCube.

2. Email groups
To analyze the social structures of daily email activities, visualization techniques are employed to explore different patterns.
Examples:
Soylent visualization
Social Network Fragments (SNF) and PostHistory
Themail

3. Digital libraries
In digital libraries, social networks can be analyzed mainly from two aspects: authors and writings.
Co-authorship networks
Characteristics such as the clustering coefficient and the average path length can be analyzed.
Co-citation relations
Co-citation social networks can be formed from the continuously accumulating publications.
With proper visualization of co-citation networks, documents with high impact or similar citation patterns can be identified immediately.

4. Web 2.0 services
Many Web 2.0 applications, such as Twitter and Facebook, are widely used by people to connect with their social networks.
For example, Twitter provides convenient functionality for users to share their up-to-date status with their followers.
Nexus is a visualization application for Facebook communities that illustrates their large network graphs.

MODELING AND AGGREGATING SOCIAL NETWORK DATA
Network Data Representation
Ontological Representation of Social Individuals
Ontological Representation of Social Relationships
Aggregating and Reasoning with Social Network Data

Network Data Representation
Graphs
Matrices
Number the nodes and use the numbers to represent the edges (e.g., 12 means an edge between nodes 1 and 2)
GraphML (XML for graphs)
These representations do not support the aggregation of network data.
Key challenges: identification and disambiguation

Ontological Representation of Social Individuals
FOAF is an example of an ontological representation of individuals.
It eliminates the drawbacks of early social networks such as Friendster and Orkut.
The early social networks had centralized control and were difficult to manage.
FOAF is distributed and has a rich ontology to characterize individuals.

Ontological Representation of Social Relationships
Social network representations such as FOAF need to be extended to support relationships.
Support the integration of social information
Integrate and aggregate multiple social networks
Properties of relationships:
Sign: positive or negative relationships
Strength (e.g., frequency of contact)
Provenance (different ways of viewing relationships)
Relationship history
Relationship roles
Conceptual models for social data: semantic networks, RDF

Aggregating and Reasoning with Social Network Data
Representing identity
URI (Uniform Resource Identifier)
Disambiguation (A and B are the same person; there are two people called John Smith)
OWL has the owl:sameAs property
Equality
The property owl:sameAs is reflexive, symmetric and transitive
Description logic vs. rule-based reasoners
Rule-based reasoners use forward chaining and backward chaining
Description logic is used for classification and for checking ontology consistency

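Because sameAs is reflexive, symmetric and transitive, a set of sameAs statements partitions identifiers into groups that each denote one individual. The sketch below merges such groups with a union-find structure in plain Python. It does not use any OWL library, and the identifiers such as `ex:jsmith` are made up for the example.

```python
def same_as_classes(statements):
    """Group identifiers linked by sameAs statements into equivalence classes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # reflexive: every identifier equals itself
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in statements:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # symmetric and transitive: merge the two groups

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())

# Hypothetical identifiers collected from three different sources.
statements = [
    ("ex:jsmith", "ex:john.smith"),
    ("ex:john.smith", "ex:js42"),
    ("ex:other_john", "ex:oj"),
]
classes = same_as_classes(statements)
print(classes)
```

Here the first two statements are chained by transitivity into one individual with three identifiers, while the second John Smith stays separate.
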
RANDOM WALKS AND THEIR APPLICATIONS

Definitions
$n \times n$ adjacency matrix $A$
$A(i,j)$ = weight on the edge from $i$ to $j$
If the graph is undirected, $A(i,j) = A(j,i)$, i.e. $A$ is symmetric
$n \times n$ transition matrix $P$
$P$ is row stochastic
$P(i,j)$ = probability of stepping to node $j$ from node $i$ = $A(i,j) / \sum_{k} A(i,k)$
$n \times n$ Laplacian matrix $L$
$L(i,j) = D(i,j) - A(i,j)$, where $D$ is the diagonal matrix with $D(i,i) = \sum_{k} A(i,k)$
Symmetric positive semi-definite for undirected graphs
Singular

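The three matrices defined above can be computed by hand for a tiny graph. This sketch uses plain Python lists rather than a matrix library; the function names and the 3-node example are assumptions for illustration.

```python
def transition_matrix(A):
    """P(i,j) = A(i,j) / sum_k A(i,k): each row of A divided by its row sum."""
    P = []
    for row in A:
        total = sum(row)
        P.append([a / total if total else 0.0 for a in row])
    return P

def laplacian(A):
    """L = D - A, where D is diagonal and holds the row sums of A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

# Undirected graph on 3 nodes: node 0 is linked to nodes 1 and 2.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
P = transition_matrix(A)
L = laplacian(A)
print(P)
print(L)
```

Every row of `P` sums to 1 (row stochastic), and every row of `L` sums to 0, which is why the Laplacian is singular.
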
Definitions
[Figure: an example graph labeled with the entries of the adjacency matrix A (weights of 1) and of the transition matrix P (probabilities of 1 and 1/2)]

Random walk on graphs
On an undirected graph G:
Start from vertex v0.
Repeat for a number of steps:
Go to a random neighbor.
Simple but powerful.

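The procedure above translates almost line for line into code. In this sketch the graph is a dictionary of neighbor lists; the function name, the sample graph, and the `seed` parameter are assumptions for the example.

```python
import random

def random_walk(adj, start, steps, seed=None):
    """Start at `start`, then repeatedly move to a uniformly random neighbor."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(adj[path[-1]]))
    return path

# Hypothetical undirected graph stored as neighbor lists.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walk = random_walk(graph, "a", 10, seed=1)
print(" -> ".join(walk))
```
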
What is a random walk
[Figure: the walk on the example graph at t=0, t=1, t=2 and t=3, with transition probabilities of 1 and 1/2 on the edges]

Road map: Random walk
Parameters/Properties    Algorithms
Hitting time             k-SAT, st-connectivity
Mixing time              PageRank, approximate counting, error reduction

Important parameters of random walk
Access time or hitting time: $H_{ij}$ is the expected number of steps before node $j$ is visited, starting from node $i$.
Commute time: the expected number of steps to go from $i$ to $j$ and back to $i$, that is $H_{ij} + H_{ji}$.
Cover time: the expected number of steps to reach every node, starting from a given node or distribution.

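Hitting and commute times can be estimated by simulating many walks and averaging the number of steps. The sketch below does this on a 3-node path graph 0 - 1 - 2, where solving $H_{02} = 1 + H_{12}$ and $H_{12} = 1 + \frac{1}{2} H_{02}$ by hand gives $H_{02} = 4$. The function name, the number of trials, and the seed are assumptions for the example.

```python
import random

def estimate_hitting_time(adj, source, target, trials=20000, seed=0):
    """Average number of steps a random walk from `source` needs to reach `target`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        node, steps = source, 0
        while node != target:
            node = rng.choice(adj[node])
            steps += 1
        total += steps
    return total / trials

# Path graph 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
h02 = estimate_hitting_time(path, 0, 2)
h20 = estimate_hitting_time(path, 2, 0, seed=1)
commute = h02 + h20  # commute time between nodes 0 and 2
print(h02, h20, commute)
```

The estimates should land close to the analytic values of 4 for each hitting time and 8 for the commute time, with small sampling noise.
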
Applications of Random Walks on Graphs
Ranking Web pages
HITS on citation networks
Clustering using random walks

USE OF HADOOP AND MAPREDUCE

What is Hadoop?
Open-source data storage and processing framework
Massively scalable, automatically parallelizable
Based on work from Google: GFS + MapReduce + BigTable
Current distributions based on open source and vendor work:
Apache Hadoop
Cloudera CDH4 with Impala
Hortonworks
MapR
AWS
Windows Azure HDInsight

Why Use Hadoop?
Cheaper: scales to petabytes or more
Faster: parallel data processing
Better: suited for particular types of big data problems

What types of business problems for Hadoop?
Risk modeling
Customer churn analysis
Recommendation engine
Ad targeting
Point-of-sale transactional analysis
Threat analysis
Trade surveillance
Search quality
Data sandbox

Companies Using Hadoop
Facebook
Yahoo
Amazon
eBay
American Airlines
The New York Times
Federal Reserve Board
IBM
Orbitz

Hadoop is a set of Apache frameworks and more
Data storage (HDFS):
Runs on commodity hardware (usually Linux)
Horizontally scalable
Processing (MapReduce):
Parallelized (scalable) processing
Fault tolerant
Other tools and frameworks:
Data access: HBase, Hive, Pig, Mahout
Tools: Hue, Sqoop
Monitoring: Greenplum, Cloudera
[Figure: layered stack of Monitoring & Alerting, Tools & Libraries, Data Access, MapReduce API, and Hadoop Core - HDFS]

What are the core parts of a Hadoop distribution?
HDFS Storage:
Redundant (3 copies)
For large files: large blocks, 64 or 128 MB per block
Can scale to 1000s of nodes
MapReduce API:
Batch (job) processing
Distributed and localized to clusters (Map)
Auto-parallelizable for huge amounts of data
Fault-tolerant (auto retries)
Adds high availability and more
Other Libraries:
Pig
Hive
HBase
Others

Hadoop Cluster HDFS (Physical) Storage
One Name Node:
Contains a web site to view cluster information
Hadoop V2 uses multiple Name Nodes for high availability (HA)
Many Data Nodes:
3 copies of each block by default
Work with data in HDFS:
Using common Linux shell commands
Block size is 64 or 128 MB
[Figure: Name Node, Secondary Name Node, and Data Nodes 1, 2 and 3]

Common Hadoop Distributions
Open source:
Apache
Commercial:
Cloudera
Hortonworks
MapR
AWS Elastic MapReduce
Microsoft HDInsight (Beta)

What is MapReduce?
Restricted parallel programming model meant for large clusters
User implements Map() and Reduce() functions
Parallel computing framework
Libraries take care of everything else:
Parallelization
Fault tolerance
Data distribution
Load balancing
Useful model for many practical tasks

Common Data Jobs for MapReduce
Text mining
Index building
Graphs
Patterns
Filtering
Prediction
Risk analysis

Ways to MapReduce
Libraries: HBase, Hive, Pig, Sqoop, Oozie, Mahout, others
Languages: Java*, HiveQL (HQL), Pig Latin, Python, C#, JavaScript, R, and more
* Note: Java is the most common, but other languages can be used.


Map and Reduce
The idea of Map and Reduce is more than 40 years old.
It is present in all functional programming languages; see, e.g., APL, Lisp and ML.
An alternate name for Map is Apply-All.
Higher-order functions take function definitions as arguments, or return a function as output.
Map and Reduce are higher-order functions.

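Python's built-in `map` and `functools.reduce` show the higher-order idea on a single machine. This is only an illustration of the functional-programming roots, not of Hadoop itself; `make_scaler` is a hypothetical helper added to show a function that returns a function.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map takes a function as an argument and applies it to every element.
squares = list(map(lambda x: x * x, numbers))

# reduce takes a two-argument function and folds the list into one value.
total = reduce(lambda acc, x: acc + x, squares, 0)

def make_scaler(factor):
    """A higher-order function that returns a new function."""
    return lambda x: x * factor

doubled = list(map(make_scaler(2), numbers))
print(squares, total, doubled)
```
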
Map and Reduce Functions
Functions borrowed from functional programming languages (e.g. Lisp)
Map():
Processes a key/value pair to generate intermediate key/value pairs
Reduce():
Merges all intermediate values associated with the same key

Example: Counting Words
Map():
Input: <filename, file text>
Parses the file and emits <word, count> pairs
e.g. <hello, 1>
Reduce():
Sums all values for the same key and emits <word, TotalCount>
e.g. <hello, (3 5 2 7)> => <hello, 17>

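The word-count example can be simulated on one machine to show the three phases: map, grouping by key (the shuffle step that the framework performs between Map and Reduce), and reduce. This is a single-process sketch in plain Python, not Hadoop API code; all function names and file contents are made up for the example.

```python
from collections import defaultdict

def map_words(filename, text):
    """Map: emit a <word, 1> pair for every word in the file."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(word, counts):
    """Reduce: sum all values for the same key."""
    return word, sum(counts)

def word_count(files):
    intermediate = [pair for name, text in files.items()
                    for pair in map_words(name, text)]
    return dict(reduce_counts(word, counts)
                for word, counts in shuffle(intermediate).items())

# Hypothetical input files.
files = {"a.txt": "hello world hello", "b.txt": "Hello Hadoop"}
counts = word_count(files)
print(counts)
```

In a real cluster the map calls run in parallel on different blocks of the input and the reduce calls run in parallel on different keys; only the two user-written functions stay the same.
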
MapReduce Example - WordCount

MapReduce v. Hadoop
                          MapReduce   Hadoop
Organization              Google      Yahoo/Apache
Implementation            C++         Java
Distributed file system   GFS         HDFS
Database                  Bigtable    HBase
Distributed lock manager  Chubby      ZooKeeper

Limitations of MapReduce
Batch processing, not interactive
Designed for a specific problem domain
MapReduce programming paradigm not commonly understood (functional)
Lack of trained support professionals
API and security model are moving targets

Comparing: RDBMS vs. Hadoop
                     Traditional RDBMS         Hadoop / MapReduce
Data size            Gigabytes (terabytes)     Petabytes (exabytes)
Access               Interactive and batch     Batch, not interactive
Updates              Read/write many times     Write once, read many times
Structure            Static schema             Dynamic schema
Integrity            High (ACID)               Low
Scaling              Nonlinear                 Linear
Query response time  Can be near immediate     Has latency (due to batch processing)

In a Nutshell
Visual analysis of social networks has become exciting but also challenging.
A number of techniques help node-link diagrams scale to larger networks.
Matrix-based representations can scale to larger networks and provide insightful overviews.