Modul - Data Representation Ver 3.0 - Updated
DSC651 – Data Representation
Table of Contents
Chapter
1 Data Representation
What is Data?
In computing, data is information that has been translated into a form that is efficient for
movement or processing. Relative to today's computers and transmission media, data is
information converted into binary digital form. It is acceptable for data to be used as a
singular subject or a plural subject. Raw data is a term used to describe data in its most
basic digital format.
Data is a representation of facts, concepts or instructions in a formalized manner,
suitable for communication, interpretation or processing by humans or electronic
machines.
What is Information?
Information is organized or classified data, which has some meaningful values for the
receiver. Information is the processed data on which decisions and actions are based.
For the decision to be meaningful, the processed data must have the following
characteristics:
Timely − Information should be available when required.
Accuracy − Information should be accurate.
Completeness − Information should be complete.
Big Data has been described by some Data Management experts as “huge,
overwhelming, and uncontrollable amounts of information.” In 1663, John Graunt dealt
with “overwhelming amounts of information” while he studied the bubonic plague, which
was then ravaging Europe. Graunt used statistics and was the first person to use
statistical data analysis.
The global data explosion is highly driven by technologies including digital video and
music, smartphones, and the Internet. This data has its origins in a variety of sources
including web searches, sensors, commercial transactions, social media interactions,
audio and video uploads, and mobile phone GPS signals.
Big data is everywhere, and it can help organisations in any industry in many different ways.
Nowadays there is so much data that existing hardware and software are not able to
deal with the vast amount of different types of data that is created at such a high
speed. Big data has become too complex and too dynamic to be able to process, store,
analyze and manage with traditional data tools.
the same that the occurrence of an event). The quality of this kind of source depends mostly
on the capacity of the sensor to take accurate measurements in the way that is expected.
5) Broadcasting: This mainly refers to video and audio produced in real time. Getting
statistical data from the contents of this kind of electronic data is, for now, too complex
and requires substantial computational and communications power. Once the problems of
converting “digital-analog” contents to “digital-data” contents are solved, processing it
will bring complications similar to those found with social interactions.
Examples
Big data typically originates from sources such as the following:
Social networking sites: Facebook, Google and LinkedIn all generate huge
amounts of data on a day-to-day basis, as they have billions of users
worldwide.
Exercise: What data do they collect?
E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of
logs from which users' buying trends can be traced.
Weather stations: Weather stations and satellites produce huge amounts of data,
which are stored and manipulated to forecast weather.
Share market: Stock exchanges across the world generate huge amounts of data.
Processing - the input data is changed to produce data in a more useful form.
For example, pay-checks can be calculated from the time cards, or a
summary of sales for the month can be calculated from the sales orders.
Output - The particular form of the output data depends on the use of the data.
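The processing and output steps above can be sketched in a few lines of Python. This is only an illustration: the time-card records, the employee names and the flat hourly rate are assumed values, not data from the module.

```python
# Hypothetical input data: time cards as (employee, hours worked) records.
time_cards = [("Aina", 8.0), ("Badrul", 7.5), ("Aina", 6.0)]
HOURLY_RATE = 20.0  # assumed flat rate, for illustration only


def calculate_pay(cards, rate):
    """Processing step: total the pay per employee from the time cards."""
    pay = {}
    for employee, hours in cards:
        pay[employee] = pay.get(employee, 0.0) + hours * rate
    return pay


# Output step: present the processed data in the form the reader needs.
for employee, amount in sorted(calculate_pay(time_cards, HOURLY_RATE).items()):
    print(f"{employee}: {amount:.2f}")
```

The same input could feed a different output form (for example a monthly summary), which is why the output step is kept separate from the processing step.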
The concept of using pictures to understand data has been around for centuries from
maps and graphs in the 17th century to the invention of the pie chart in the early 1800s.
Several decades later, one of the most cited examples of statistical graphics occurred
when Charles Minard mapped Napoleon’s invasion of Russia. The map depicted the size
of the army as well as the path of Napoleon’s retreat from Moscow – and tied that
information to temperature and time scales for a more in-depth understanding of the
event. It’s technology, however, that truly lit the fire under data visualization. Computers
made it possible to process large amounts of data at lightning-fast speeds. Today, data
visualization has become a rapidly evolving blend of science and art that is certain to
change the corporate landscape over the next few years.
Example 1
Example 2
Example 3
https://visme.co/blog/examples-data-visualizations/
Data visualizations make big and small data easier for the human brain to understand,
and visualization also makes it easier to detect patterns, trends, and outliers in groups
of data. Good data visualizations should place meaning into complicated datasets so
that their message is clear and concise.
Each example also comes with a justification of how the visualization helps users
learn about data representation.
This infographic takes dense material, such as indicators and figures, and presents
it in a beautiful, clean and captivating format. Not only is the design deceptively
simple and functional, it also provides the user with many options for interacting
with the graphic, such as adding countries, indicators and type of relation.
A good infographic will not only do the hard work of digesting complex data, it may
also stimulate readers’ imagination by allowing them to conjure up different
hypothetical situations and possibilities, as is done in this example. By presenting
an interactive, game-like experience, this infographic quickly engages the user and
keeps them interested from beginning to end.
It explains a process
In line with the objective of making the complex easy to understand, this infographic
provides a visual representation of a coffee bean’s journey, from bean to cup. By
breaking the process down into parts, this data visualization does its job of giving the
reader bite-sized pieces of information that are easily digestible.
2 To explore or analyse information:
It reveals trends
This data visualization not only has all of the previous qualities mentioned, it also
allows the user to have direct access to all the original raw data (view link on
bottom right corner).
Also, by using bubbles shaped in accordance with the size of the data breach, the
viewer can get a solid overview of the data breach “landscape.” And if viewers
want to get into the details of the information, they can also go as deep or as
superficially as they want by navigating the different filters and raw data.
It is interactive
The interactive piece The Daily Routines of Famous Creative People is a perfect
example of a data visualization that combines all the necessary ingredients of an
effective and engaging piece: It combines reams of data into a single page; it uses
color to easily distinguish trends; it allows the viewer to get a global sense of the data;
it engages users by allowing them to interact with the piece; and it is surprisingly
simple to understand in a single glance.
EXERCISE
Find one data visualisation system and describe how its exploration features are used.
3
1) Share and persuade
2) Emphasize important aspects of data
3) Answer a question
4) Make decisions
5) Present an argument
6) Feedback and interaction
7) Tell a story
It tells a story
EXERCISE
Find one case study and explain its storytelling.
EXERCISE
There are various types of data representation structures, each with its own
unique features and characteristics. The data representation types to be
discussed are (i) hierarchical-trees, (ii) relational-networks, (iii) trends, timelines and
flows, (iv) spatial-temporal and (v) textual.
1) Hierarchical – Trees
A hierarchical data model is one of the earliest data models; it represents a
file-based model built like a tree.
Hierarchical data is arranged in a tree form. Each data item is a node; a parent
node can be associated with multiple child nodes, but a child node can have
only one parent.
a) Sunburst Chart
b) Treemap Diagram
c) Radial Tree
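The one-parent rule described for hierarchical data can be shown with a minimal tree structure. This is a sketch only: the `Node` class, the `render` helper and the node names are assumptions made for illustration.

```python
class Node:
    """One item in a hierarchy: many children allowed, at most one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Enforce the rule that a child node can have only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child


def render(node, depth=0):
    """Return the tree as indented lines, one line per node."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical hierarchy used only as sample data.
root = Node("University")
faculty = root.add_child(Node("Faculty"))
faculty.add_child(Node("Department A"))
faculty.add_child(Node("Department B"))
print("\n".join(render(root)))
```

Charts such as the sunburst, treemap and radial tree draw this same parent-child structure in different layouts.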
2) Relational – Networks
A network graph uses information from both the link and node data sets to
generate a graphical depiction of the network.
The nodes and links in a network graph can be arranged in a variety of layout
patterns.
In computer science, a graph is an abstract data type that is meant to implement
the undirected graph and directed graph concepts from graph theory in
mathematics.
A directed graph with three vertices and four directed edges (the double arrow
represents an edge in each direction).
An undirected graph with three vertices and three edges.
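The two graphs described above can be stored as adjacency sets. The sketch below is one possible representation, not a prescribed one; the vertex labels A, B and C, the choice of which edges exist, and the `build_graph` helper are all assumptions.

```python
def build_graph(vertices, edges, directed):
    """Build an adjacency-set representation from node and link lists."""
    adjacency = {vertex: set() for vertex in vertices}
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge can be followed in both directions.
            adjacency[target].add(source)
    return adjacency


vertices = ["A", "B", "C"]

# Directed graph: four directed edges; A<->B is stored as two edges,
# which is what a double arrow in a drawing stands for.
directed_graph = build_graph(
    vertices, [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")], directed=True
)

# Undirected graph: three edges.
undirected_graph = build_graph(
    vertices, [("A", "B"), ("B", "C"), ("A", "C")], directed=False
)

print(directed_graph)
print(undirected_graph)
```

A relationship such as Friends_with in a social network is naturally undirected, so it would use the second form.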
Social network scenario represented as a graph (Friends_with)
4) Spatial-Temporal
A common example of spatial data can be seen in a road map.
A road map is a two-dimensional object that contains points, lines, and polygons
that can represent cities, roads, and political boundaries such as states or
provinces.
A road map is a visualization of geographic information.
Spatial data types provide the information that a computer requires to reconstruct
the spatial data in digital form.
In the raster world, grid cells represent real-world features.
In the vector world, features are represented as points, lines and polygons that consist
of vertices and paths.
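The difference between raster and vector data can be illustrated with plain Python structures. This is a minimal sketch: the cell codes, the coordinates and the helper functions `count_cells` and `polygon_area` are made up for the example.

```python
# Raster: a grid of cells; each cell stores a code for the feature it covers.
# Assumed codes for this example: 0 = land, 1 = road, 2 = water.
raster = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]

# Vector: each feature is a point, line or polygon defined by its vertices.
vector = {
    "city": {"type": "point", "vertices": [(2.0, 3.0)]},
    "road": {"type": "line", "vertices": [(0.0, 0.0), (2.0, 3.0), (5.0, 3.0)]},
    "state": {
        "type": "polygon",
        "vertices": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    },
}


def count_cells(grid, code):
    """Raster query: how many cells carry a given feature code?"""
    return sum(row.count(code) for row in grid)


def polygon_area(vertices):
    """Vector query: area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


print(count_cells(raster, 1))
print(polygon_area(vector["state"]["vertices"]))
```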
5) Textual
Why Visualize Text Data?
Visually representing the content of a text document is one of the most important
tasks in the field of text mining.
Many text visualizations do not represent the text directly; instead, they represent an
output of a language model (word count, character length, word sequences, etc.).
a) Word Cloud
b) Trees or Hierarchies
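A word cloud is usually driven by word counts rather than by the raw text. The sketch below computes such counts with the standard library; the sample sentence and the simple tokenizing rule are assumptions, and a real text-mining pipeline would typically also remove stop words.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text.
sample = "Data is information. Information is processed data."
frequencies = word_frequencies(sample)

# In a word cloud, a higher count would be drawn in a larger font.
for word, count in frequencies.most_common():
    print(word, count)
```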
Display data in more sophisticated ways such as infographics, dials and gauges,
geographic maps, sparklines, heat maps, and detailed bar, pie and fever charts.
Interactive capabilities, enabling users to manipulate them or drill into the data for
querying and analysis.
Indicators designed to alert users when data has been updated or predefined conditions
occur can also be included.
Cons:
Functional Requirements
Functional requirements are divided into two categories: functional user requirements
and functional system requirements.
Functional user requirements are high-level statements of what the system should do.
business rules
transaction corrections, adjustments and cancellations
administrative functions
authentication
authorization levels
audit tracking
external interfaces
certification requirements
reporting requirements
historical data
legal or regulatory requirements
Non-Functional Requirements
A non-functional requirement is any requirement that specifies how the system performs a
certain function. Non-functional requirements describe how a system should behave and what
limits there are on its functionality. They specify the system’s quality attributes or
characteristics.
For example:
“modified data in a database should be updated for all users accessing it within 2
seconds”
“the cup shall contain hot liquid without heating up to more than 45°C”
It is important to state non-functional requirements correctly, since they will affect your
users’ experience when interacting with the system.
a) Usability
Prioritize the important functions of the system based on usage patterns.
Frequently used functions should be tested for usability.
Complex and critical functions should also be tested for usability.
b) Reliability
Users have to trust the system, even after using it for a long time.
Your goal should be a long MTBF (mean time between failures).
Create a requirement that data created in the system will be retained for a number of
years without the data being changed by the system.
It’s a good idea to also include requirements that make it easier to monitor system
performance.
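MTBF is commonly computed as total operating time divided by the number of failures in that period. The sketch below shows that calculation; the function name and the figures are assumed for illustration.

```python
def mean_time_between_failures(operating_hours, failure_count):
    """MTBF = total operating time / number of failures in that time."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return operating_hours / failure_count


# Hypothetical monitoring data: 1,000 hours of operation with 4 failures.
print(mean_time_between_failures(1000, 4))  # 250.0 hours
```

A reliability requirement can then be phrased as a minimum MTBF that the monitored system must reach.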
c) Performance
What should system response times be, as measured from any point, under what
circumstances?
Are there specific peak times when the load on the system will be unusually high?
Think of stress periods, for example, at the end of the month or in conjunction with
payroll disbursement.
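A performance requirement becomes testable once it names a measure and a limit. As a sketch, the check below asks whether 95% of measured response times stay within 2 seconds, under normal load and under peak load. The 95% level, the nearest-rank percentile method and the sample timings are all assumptions made for this example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of measurements."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def meets_requirement(samples, limit_seconds, pct=95):
    """True if the given percentile of response times is within the limit."""
    return percentile(samples, pct) <= limit_seconds


# Hypothetical response times in seconds.
normal_load = [0.4, 0.6, 0.5, 0.9, 1.2, 0.7, 0.8, 0.5, 0.6, 1.1]
peak_load = [1.5, 2.4, 1.9, 3.1, 2.2, 1.8, 2.9, 2.0, 1.7, 2.6]

print(meets_requirement(normal_load, 2.0))
print(meets_requirement(peak_load, 2.0))
```

Measuring both conditions separately matters because a system can pass comfortably on an ordinary day and still fail during a stress period such as month end.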
d) Supportability
The system needs to be cost-effective to maintain.
Maintainability requirements may cover diverse levels of documentation, such as
system documentation as well as test documentation.
Example: which test cases and test plans will accompany the system.
This is why most ads have two words at most. Even the ads that have lots of words just
use them for decoration. The real message is in the one or two words they’ve
highlighted.
By not putting in too much detail, you increase the odds that your programmers, graphic
designers, executive staff, and others will actually read your specification.
Specific, but not prescriptive
Do say exactly what needs to be included. Don’t say exactly how it needs to be achieved.
Putting in too much detail can also sap the creativity of your programmers.
Put yourself in their shoes. What if you got some ten-page manifesto dropped on your
desk that described how the print button was supposed to work, including every last little
detail about what to do if the document is too long, too short, contains pictures, is too
wide, etc. Just tell them they need to put a print feature in, and that it should be able to
print all documents created by the program or gracefully error out, and you’re done.
2) Use pictures.
A picture is worth a thousand words.
Creating wireframe interface mockups is one of the best ways to show people what the
program will do.
Even better is to hire a graphic or user interface designer and get them to create
prototypes of the actual screens.
You are going to have to do this at some point anyway, and getting it done now can save
a lot of time in the end.
Ask them what was wrong with the last version or a comparable product
Find out what is causing support calls on other products
The problem is that everybody else is just as busy as you are, and it’s easy to ignore a
document, especially if it is the third revision they have seen and they are leaving for
New York in two hours.
You tell them that if they don’t show, then they don’t get any input.
Of course, the tough talk is hard to enforce when someone does come back three weeks
later and says that you absolutely must have the ‘xyz’ feature.
It’s particularly hard to enforce when ‘someone’ happens to be your boss.
5) Keep it organized
Group all requirements by their functional areas. Put in section headings, a table of
contents, etc. Do not automatically assume that somebody will read the entire doc.
Point of fact: you will likely be the only one to read your doc cover-to-cover, particularly if
it’s for a large system.
Everyone else will read the features or sections that they care about, or are
responsible for, and just skim the rest. So, make it easy for them.
Get as much feedback as you can, ask as many questions and test your own
comprehension by presenting the facts you gathered, back to the user.
That way, all unclear issues can be threshed out as early as possible.
Organize your presentation
While you are writing functional requirements, another team member is also waiting for
your spec.
Any delay on your part is a delay for the entire team.
By forwarding a clear and concise functional requirement document as early as possible,
you are also giving the systems developer time to do their job under less pressing
conditions.
Part of the project's allotted time is for getting feedback that may require revisions and
adjustments.
A customer who is satisfied with the team's output as early as possible
indicates a job well done.
c) Ownership of requirements
Exercise
Each group is required to find a sample of a complete functional and non-functional report
and share it in the class’s group.
This is the core of the Contextual Design philosophy, which seeks to understand users in
order to find out their fundamental intents, desires, and drivers.
However, these desires are invisible to the users, hence the only way to glean them is to
go out in the field and engage in conversations with people.
A user-centric interface should make the user feel as if you have read their mind. The
processes involved in designing products and systems that meet both users’ and
business’ needs.
The best product designs are produced when the product’s designers are involved in
collecting and interpreting customers’ data and appreciate what customers need. It
heavily emphasises getting the right data to make the right decisions.
The focus is on designing effective solutions. E.g., data gathered from customers is the
basic criterion used to decide what a product should do. It acts as a guide during
the designing process and helps avoid claims such as “what customers may like,” which
at times can be misleading. Contextual inquiry is an explicit step for understanding who
the customers really are and how they work on a daily basis.
Spotify also suggests playlists based on the time, date and the user’s location. These
solutions are using context-awareness to create a personalized experience, reward the
user for being active, and encourage further participation.
In this stage, it is important to understand the customer, their needs and how
they work every day.
Interviews should be conducted as customers work.
discusses their unique perspectives of the data so that they develop a shared
view of their customers.
2) WORK MODEL:
3) CONSOLIDATION:
When solutions are designed, they may at times be required to serve the needs
of the entire customer population.
In such a scenario, all individual diagrams of the work of various customers should
be put together in order to identify any common patterns.
This is done through an affinity diagram that shows the scope of issues and
consolidated work models that show the underlying pattern and structure that
needs to be addressed.
4) WORK REDESIGN:
The consolidated data that is collected helps the design team to find ways in
which technology and other changes to organizational procedures can be
introduced to help improve work.
Storyboards are used as part of this process to define a new work system.
This helps a user understand the various parts of the system that have been
created and the functions each part serves.
It also shows how each part fits into other existing systems.
3) Need to achieve a business goal the user wouldn’t seek out on their own.
Sneak in an extra step in their journey.
In order to bring your product from good to great, you need a delight factor. It could be:
a cute illustration,
a cool animation, or
a level of personalization they didn’t expect.
EXERCISE
Describe how the system has used the contextual design concept to fulfil your needs.
Designers need to plan for all scenarios, including times when there is no data to work
with. Contextual designs will require more screen states than a static interface would.
o Personalized products
However, it is worth it for every user to feel as if the app was designed for them.
So, it is important that you feed the algorithms the right data. It is equally important
that the data is in the right format and scale.
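One common way to bring features onto the same scale is min-max scaling, which maps each value into the range 0 to 1. The sketch below is one option among several (standardization is another); the feature names and values are assumed.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest is 0.0 and the largest is 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


# Hypothetical features measured on very different scales.
ages = [20, 30, 40, 60]
incomes = [2000, 3500, 5000, 8000]

print(min_max_scale(ages))     # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, a difference in income no longer outweighs a difference in age simply because its raw numbers are larger.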
It has been estimated that data preparation accounts for 60%-80% of the time spent on a
data mining project. Data preparation improves the quality of data and consequently
helps improve the quality of data analysis results.
The data can be stored in a relational or NoSQL database depending upon the size of
the data. Also, it is tempting to use all the data that is available.
A larger dataset does not always guarantee high performance.
In fact, the larger the dataset, the higher the computational cost.
So, it is better to use a subset of the available data in the first run.
If the smaller subset of the data does not perform well in terms of precision and
recall, there is always an option to use the whole dataset.
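The "subset first" approach can be sketched as follows: draw a reproducible random subset, then judge the result with precision and recall. The helper names, the sample size and the toy labels are assumptions made for illustration.

```python
import random


def sample_subset(records, size, seed=0):
    """Draw a reproducible random subset of the available records."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical dataset of 100 record ids; start with a 10-record subset.
all_records = list(range(100))
subset = sample_subset(all_records, 10)
print(len(subset))

# Hypothetical predictions compared against the true labels.
print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)
```

Fixing the seed keeps the first run repeatable, so a later run on the whole dataset can be compared fairly.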
Data Transformation
1) Data Merging:
It is the process of integrating data from multiple data sources into a single
operational data set.
Data sources can be homogeneous or heterogeneous.
Two types of merging:
o Horizontal Merging: Process of joining the records horizontally when the two
sources have different data definitions.
o Vertical Merging: Process of joining the records vertically when the two
sources have similar data structures.
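The two merge types can be sketched with lists of dictionaries. This is an illustration under assumptions: the field names and records are invented, and the horizontal merge shown here keeps only records whose key appears in both sources, which is one of several possible join rules.

```python
def merge_horizontal(left, right, key):
    """Join records side by side on a shared key (sources have different fields)."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


def merge_vertical(first, second):
    """Stack records from two sources that share the same structure."""
    return list(first) + list(second)


# Hypothetical sources with different data definitions.
customers = [{"id": 1, "name": "Aina"}, {"id": 2, "name": "Badrul"}]
orders = [{"id": 1, "total": 120.0}, {"id": 2, "total": 80.5}]
print(merge_horizontal(customers, orders, "id"))

# Hypothetical sources with the same structure.
branch_a = [{"id": 1, "name": "Aina"}]
branch_b = [{"id": 2, "name": "Badrul"}]
print(merge_vertical(branch_a, branch_b))
```

Horizontal merging makes each record wider (more fields), while vertical merging makes the data set longer (more records).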
2) Data cleansing:
Data cleansing is the process of ensuring that a set of data is correct and
accurate.
During this process, records are checked for:
o Accuracy
o Consistency
and they are either corrected or deleted as necessary.
This can occur within a single set of records or between multiple sets of data that
need to be merged or that will work together.
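A cleansing pass can be sketched as a function that corrects what it can and deletes what it cannot. The field names, the name-formatting rule and the valid age range of 0 to 120 are assumptions chosen for the example.

```python
def cleanse(records):
    """Correct inconsistent formatting and drop records that fail accuracy checks."""
    cleaned = []
    for record in records:
        # Consistency: normalise spacing and capitalisation of names.
        name = (record.get("name") or "").strip().title()
        age = record.get("age")
        # Accuracy: delete records with a missing name or an implausible age.
        if not name or not isinstance(age, int) or not 0 <= age <= 120:
            continue
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw records.
raw = [
    {"name": "  aina ", "age": 21},   # corrected: becomes "Aina"
    {"name": "Badrul", "age": -5},    # deleted: age is not plausible
    {"name": "", "age": 30},          # deleted: name is missing
]
print(cleanse(raw))
```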
The Data Processing Cycle is a series of steps carried out to extract information from
raw data. Although each step must be taken in order, the order is cyclic.
The output and storage stage can lead to the repeat of the data collection stage,
resulting in another cycle of data processing. The cycle provides a view of how the data
travels and transforms from collection to interpretation and is ultimately used in effective
business decisions.
2) Preparation is the manipulation of data into a form that is suitable for further
analysis and processing. Raw data cannot be processed as it is and must first be checked
for accuracy. Preparation is about constructing a dataset from one or more data
sources to be used for further exploration and processing. Analyzing data that has
not been carefully screened for problems can produce highly misleading results, because
the results are heavily dependent on the quality of the data prepared.
3) Input is the task where verified data is coded or converted into machine readable
form so that it can be processed through a computer. Data entry is done through the
use of a keyboard, digitizer, scanner, or data entry from an existing source. This
time-consuming process requires speed and accuracy. Most data need to follow a
formal and strict syntax since a great deal of processing power is required to
break down the complex data at this stage. Due to the costs, many businesses are
resorting to outsourcing this stage.
6) Storage is the last stage in the data processing cycle, where data, instructions and
information are held for future use. The importance of this stage is that it allows
quick access and retrieval of the processed information, allowing it to be passed on
to the next stage directly, when needed. Every computer uses storage to hold
system and application software.
The more colors in a chart to represent your data, the harder it becomes to read it
quickly. Your readers will often need to consult the color key to understand what is
shown in your chart. You can also use another chart type.
3) Consider the color grey as the most important color in Data Vis.
Using grey for less important elements in your chart makes your highlight colors
(which should be reserved for your most important data points) stick out even
more.
Grey is also helpful for general context data, less important annotations, to show
what’s unselected by the user, or to tone down the overall visual impression of
your charts.
Since grey can come off a little cold, consider using it with a hint of color. Try:
o a warm grey (grey + yellow/orange/red),
o or another very light color as an alternative (e.g. super light yellow).
EXERCISE
7) Use light colors for low values and dark colors for high values.
When using color gradients, make sure that the bright colors represent low
values, while the dark colors represent high values.
This will be most intuitive for most readers.
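The light-for-low, dark-for-high rule can be sketched as a function that maps a value to a color on a single-hue gradient. The hue, saturation and lightness range below are assumed values, and Python's standard `colorsys` module is used for the conversion.

```python
import colorsys


def sequential_color(value, low, high, hue=0.6):
    """Map a value to a hex color: light for low values, dark for high values."""
    position = 0.0 if high == low else (value - low) / (high - low)
    position = min(max(position, 0.0), 1.0)
    # Lightness falls from 0.90 (very light) to 0.25 (dark) as the value rises.
    lightness = 0.90 - 0.65 * position
    red, green, blue = colorsys.hls_to_rgb(hue, lightness, 0.6)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )


# Hypothetical data range of 0 to 100.
for value in (0, 50, 100):
    print(value, sequential_color(value, 0, 100))
```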
8) Don’t use a gradient color palette for categories and the other way round.
It might be tempting to use shades of one hue (e.g. blue) even for categories, to
make your chart look less colorful.
However, since many readers will associate dark colors with “more/high” and
bright colors with “less/low”, such a color palette will imply a ranking of your
categories.
Use different hues (green, yellow, pink, etc.) for your categories to avoid that, and
to be able to talk about these colors.
Readers might be quicker at finding specific categories if you make their colors
stand out with a different lightness or saturation,
but note that your chart should explain why these colors stand out.
If you find your chart to be too colorful, consider another chart type for your data.
10) Consider using two hues for a gradient, not just one.
Make your map or chart even more decipherable.
Readers will be able to distinguish the colors on the gradient better if they are
encoded through lightness and (two or three carefully selected) hues.
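A two-hue gradient changes hue and lightness together. The sketch below moves from a light yellow to a dark blue; the two hues, the saturation and the simple linear interpolation in HLS space are assumptions, and a carefully designed palette would normally be tuned by eye rather than computed this way.

```python
import colorsys


def two_hue_color(position, light_hue=0.15, dark_hue=0.65):
    """Return a hex color for a position in [0, 1] on a two-hue gradient."""
    position = min(max(position, 0.0), 1.0)
    # Hue moves from the light end (yellow) towards the dark end (blue) ...
    hue = light_hue + (dark_hue - light_hue) * position
    # ... while lightness drops, so high values are both bluer and darker.
    lightness = 0.90 - 0.65 * position
    red, green, blue = colorsys.hls_to_rgb(hue, lightness, 0.7)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )


# Five evenly spaced steps along the gradient.
for step in range(5):
    print(two_hue_color(step / 4))
```

Because neighbouring steps differ in two properties instead of one, they are easier to tell apart on a map or chart.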