Modul - Data Representation Ver 3.0 - Updated


DSC651 – Data Representation


Table of Contents
Chapter

1 Data Representation

2 Structure of Data Representation

3 User Requirements and Perceptions

4 The Relationship with the Users: Analyzing and Reviewing Context Design

5 Data Acquisition and Preparation

6 Basic Visualization Design (Graph Storytelling and Make-up)

This module is compiled by:


Zainura Idrus
Siti Salwa Salleh
Juliana Hamka Kamaroddin
Faculty of Computer Science, UiTM Shah Alam, Selangor
3 Sept 2019

What is Data?
In computing, data is information that has been translated into a form that is efficient for
movement or processing. Relative to today's computers and transmission media, data is
information converted into binary digital form. It is acceptable for data to be used as a
singular subject or a plural subject. Raw data is a term used to describe data in its most
basic digital format.

Data is a representation of facts, concepts or instructions in a formalized manner,
suitable for communication, interpretation or processing by humans or electronic
machines.

Data is represented by characters such as:


 alphabets (A-Z, a-z)
 digits (0-9)
 special characters (+,-,/,*,<,>,= etc.)
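
As a minimal sketch of how such characters end up in binary digital form, the Python snippet below looks up each character in the ASCII code table and writes its code as an 8-bit pattern. The function name to_bits and the sample string are assumptions made for illustration only; they are not part of this module.

```python
def to_bits(text):
    """Return one 8-bit binary string per character, using ASCII codes."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


# One alphabet, one digit and one special character (illustrative sample).
sample = "A9+"
for char, bits in zip(sample, to_bits(sample)):
    # Show the character, its ASCII code and its binary form.
    print(char, ord(char), bits)
```

Other encodings (for example UTF-8 for non-English text) follow the same idea but may use more than one byte per character.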


Fig 1. Examples of computer data

What is Information?
Information is organized or classified data, which has some meaningful values for the
receiver. Information is the processed data on which decisions and actions are based.

For a decision to be meaningful, the processed data must have the following
characteristics:
 Timeliness − Information should be available when required.
 Accuracy − Information should be accurate.
 Completeness − Information should be complete.

Brief Evolution of Data and Information to Big Data
The increasing amount of data and information has created big data.

Big Data has been described by some Data Management experts as “huge,
overwhelming, and uncontrollable amounts of information.” In 1663, John Graunt dealt
with “overwhelming amounts of information” while he studied the bubonic plague, which
was then ravaging Europe. Graunt is often credited as the first person to use
statistical data analysis.

The global data explosion is driven largely by technologies including digital video and
music, smartphones, and the Internet. This data has its origins in a variety of sources
including web searches, sensors, commercial transactions, social media interactions,
audio and video uploads, and mobile phone GPS signals.


Sources of Data and Big Data


In 2005, Roger Mougalas gave the name Big Data to something that had until then
existed without a name. He was referring to a large set of data that, at the time, was
almost impossible to manage and process using the traditional business intelligence
tools available.

In defining big data, it’s also important to understand the mix of unstructured and
multi-structured data that comprises the volume of information. Big data is often
characterized by the 3Vs: the extreme volume of data, the wide variety of data types
and the velocity at which the data must be processed.

Big data is everywhere, and it can help organisations in any industry in many different
ways. There is now so much data, of so many different types and created at such a high
speed, that existing hardware and software are not able to deal with it. Big data has
become too complex and too dynamic to process, store, analyze and manage with
traditional data tools.

Some of the many sources of Big Data are:

1) Sensors/meters and activity records from electronic devices: This kind of
information is produced in real time. The number and periodicity of the observations
vary: sometimes they depend on a lapse of time, sometimes on the occurrence of some
event (for example, a car passing through the field of view of a camera), and sometimes
on manual manipulation (which, strictly speaking, is the same as the occurrence of an
event). The quality of this kind of source depends mostly on the capacity of the sensor
to take accurate measurements in the way that is expected.

2) Social interactions: This is data produced by human interactions through a network
such as the Internet. The most common is the data produced in social networks. This
kind of data has both qualitative and quantitative aspects that are of interest to
measure. Quantitative aspects are easier to measure than qualitative ones: the former
involve counting the number of observations grouped by geographical or temporal
characteristics, while the quality of the latter relies mostly on the accuracy of the
algorithms applied to extract the meaning of the contents, which are commonly found as
unstructured text written in natural language. Examples of analysis made from this
data are sentiment analysis and trending-topic analysis.
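
The quantitative side described above, counting observations grouped by geographical or temporal characteristics, can be sketched in a few lines of Python. The posts, regions and hours below are invented for illustration and do not come from any real social network.

```python
from collections import Counter

# Hypothetical observations: (region, hour of day, text of the post).
posts = [
    ("Selangor", 9, "traffic is bad"),
    ("Selangor", 9, "love this weather"),
    ("Johor", 9, "great food"),
    ("Selangor", 18, "traffic again"),
]

# Count observations grouped by a geographical characteristic.
by_region = Counter(region for region, _hour, _text in posts)

# Count observations grouped by geography and time together.
by_region_hour = Counter((region, hour) for region, hour, _text in posts)

print(by_region)
print(by_region_hour)
```

The qualitative side (for example sentiment analysis of the text field) needs a language-processing algorithm and is deliberately left out of this sketch.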

3) Business transactions: Data produced as a result of business activities can be
recorded in structured or unstructured databases. When it is recorded in structured
databases, the most common obstacle to analyzing the information and obtaining
statistical indicators is the large volume of information and the rate of its
production, because this data is sometimes produced at a very fast pace: thousands of
records can be produced in a second when big companies such as supermarket chains are
recording their sales.

But this kind of data is not always produced in formats that can be stored directly in
relational databases. An electronic invoice is an example: it has more or less a
structure, but to put the data it contains into a relational database we need to apply
some process that distributes the data across different tables (in order to normalize
it according to relational database theory), and it may not be in plain text (it could
be a picture, a PDF, an Excel record, etc.). One problem here is that this process
takes time and, as previously said, the data may be produced too fast, so we need
different strategies for using the data: processing it as it is without putting it in
a relational database, discarding some observations (by which criteria?), using
parallel processing, etc. The quality of information produced from business
transactions is closely tied to the capacity to obtain representative observations and
to process them.
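
To make the idea of distributing invoice data across different tables more concrete, here is a minimal sketch. It assumes the invoice has already been read into a nested Python dictionary; the field names, the sample invoice and the normalize function are all hypothetical.

```python
# Hypothetical invoice, already extracted from its original file format.
invoice = {
    "invoice_no": "INV-001",
    "customer": "Kedai Ali",
    "items": [
        {"product": "Rice 5kg", "qty": 2, "unit_price": 15.0},
        {"product": "Cooking oil", "qty": 1, "unit_price": 27.5},
    ],
}


def normalize(inv):
    """Split one nested invoice into rows for two relational tables."""
    # One row for the invoice table.
    header_row = {"invoice_no": inv["invoice_no"], "customer": inv["customer"]}
    # One row per line item; invoice_no links each row back to its invoice.
    item_rows = [
        {"invoice_no": inv["invoice_no"], "line_no": n, **item}
        for n, item in enumerate(inv["items"], start=1)
    ]
    return header_row, item_rows


header, items = normalize(invoice)
print(header)
print(items)
```

Extracting the dictionary from a picture or a PDF in the first place is the slow step the text warns about, and it is not covered here.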


4) Electronic files: This refers to unstructured documents, statically or dynamically
produced, which are stored or published as electronic files, such as Internet pages,
videos, audio files and PDF files. They can have contents of special interest that are
difficult to extract; different techniques can be used, such as text mining and pattern
recognition. The quality of our measurements relies mostly on the capacity to extract
and correctly interpret all the representative information from those documents.

5) Broadcasting: This mainly refers to video and audio produced in real time. Getting
statistical data from the contents of this kind of electronic data is, for now, too
complex and requires large computational and communications power. Once the problems
of converting “digital-analog” contents to “digital-data” contents are solved,
processing it will raise complications similar to those found with social interactions.

Examples
Big data typically originates from sources such as the following:

 Social networking sites: Facebook, Google and LinkedIn all generate huge
amounts of data on a day-to-day basis, as they have billions of users
worldwide.
 Exercise: What data do they collect?
 E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of
logs from which users' buying trends can be traced.
 Weather stations: Weather stations and satellites produce very large amounts of data,
which are stored and manipulated to forecast the weather.
 Share market: Stock exchanges across the world generate huge amounts of data.

Data Processing Cycle


Data processing is the re-structuring or re-ordering of data by people or machines to
increase its usefulness. The Data Processing Cycle is a series of steps carried out to
extract information from raw data. Although each step must be taken in order, the order
is cyclic. The output and storage stage can lead to a repeat of the data collection
stage, resulting in another cycle of data processing.

Fig 2. Data processing cycle

The basic steps involved are input, processing and output.

Input - The input data is prepared in some convenient form for processing.

Processing - The input data is changed to produce data in a more useful form.
For example, pay-checks can be calculated from the time cards, or a
summary of sales for the month can be calculated from the sales orders.

Output - The particular form of the output data depends on the use of the data.
For example, output data may be pay-checks for employees.
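
The three steps can be sketched with the pay-check example. The hourly rate, the employee names and the function names below are assumptions for illustration; a real payroll system would also handle tax, overtime and storage.

```python
HOURLY_RATE = 12.5  # assumed flat rate, for illustration only


def read_input(time_cards):
    """Input: put raw time cards into a convenient form (name, total hours)."""
    return [(name, sum(hours)) for name, hours in time_cards.items()]


def process(records, rate=HOURLY_RATE):
    """Processing: change hours worked into the more useful form, pay."""
    return [(name, total * rate) for name, total in records]


def output(pay_records):
    """Output: present the result in the form its reader needs."""
    return [f"{name}: RM{pay:.2f}" for name, pay in pay_records]


# Hypothetical time cards: hours worked per day for each employee.
time_cards = {"Aminah": [8, 8, 7.5], "Chong": [6, 8, 8]}
pay_slips = output(process(read_input(time_cards)))
print(pay_slips)
```

Feeding next month's time cards through the same three functions is what makes the process a cycle.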

Data Representation with Visualization


Data visualization describes any effort to help people understand the significance of
data by placing it in a visual context.

The concept of using pictures to understand data has been around for centuries, from
maps and graphs in the 17th century to the invention of the pie chart in the early 1800s.
Several decades later, one of the most cited examples of statistical graphics occurred
when Charles Minard mapped Napoleon’s invasion of Russia. The map depicted the size
of the army as well as the path of Napoleon’s retreat from Moscow – and tied that
information to temperature and time scales for a more in-depth understanding of the
event. It’s technology, however, that truly lit the fire under data visualization. Computers
made it possible to process large amounts of data at lightning-fast speeds. Today, data
visualization has become a rapidly evolving blend of science and art that is certain to
change the corporate landscape over the next few years.

Example 1

Example 2


Example 3


Go to the following URL for more examples:

https://visme.co/blog/examples-data-visualizations/

Why Is Data Visualization Important?

Data visualizations make big and small data easier for the human brain to understand,
and visualization also makes it easier to detect patterns, trends, and outliers in groups
of data. Good data visualizations should place meaning into complicated datasets so
that their message is clear and concise.

Data visualization is therefore used:

 To communicate information clearly and efficiently to users through
information graphics such as tables and charts.

 To help users analyze a large amount of data in a simpler way by making
complex data more accessible, readable, understandable and usable.

Three Functions of Visualization

Each of the three functions comes with a justification of how visualization helps users
learn about data representation.

1) To record information in the form of photographs, blueprints or graphs.

How does visualisation help?

It presents data beautifully

This infographic takes dense material, such as indicators and figures, and presents
it in a beautiful, clean and captivating format. Not only is the design deceptively
simple and functional, it also provides the user with many options for interacting
with the graphic, such as adding countries, indicators and type of relation.

It stimulates the user’s imagination

A good infographic will not only do the hard work of digesting complex data, it may
also stimulate readers’ imagination by allowing them to conjure up different
hypothetical situations and possibilities, as is done in this example. By presenting
an interactive, game-like experience, this infographic quickly engages the user and
keeps them interested from beginning to end.

It explains a process

In line with the objective of making the complex easy to understand, this infographic
provides a visual representation of a coffee bean’s journey, from bean to cup. By
breaking the process down into parts, this data visualization does its job of giving the
reader bite-sized pieces of information that are easily digestible.

2) To explore or analyse information:

a) Process and calculate
b) Reason about data
c) Analyse and discover: patterns, trends, and correlations

How does visualisation help?

It reveals trends

The Year in News is a good example of how an expertly executed data
visualization can reveal patterns and trends hiding beneath the surface of
mountains of data. By analyzing 184.5 million Twitter mentions, Echelon Insights
was able to provide a bird’s eye view of what America was talking about in 2014.

It provides access to raw data


This data visualization not only has all of the previous qualities mentioned, it also
allows the user to have direct access to all the original raw data (view link on
bottom right corner).

Also, by using bubbles shaped in accordance with the size of the data breach, the
viewer can get a solid overview of the data breach “landscape.” And if viewers
want to get into the details of the information, they can also go as deep or as
superficially as they want by navigating the different filters and raw data.

It is interactive

The interactive piece The Daily Routines of Famous Creative People is a perfect
example of a data visualization that combines all the necessary ingredients of an
effective and engaging piece: it combines reams of data into a single page; it uses
color to easily distinguish trends; it allows the viewer to get a global sense of the data;
it engages users by allowing them to interact with the piece; and it is surprisingly
simple to understand in a single glance.

EXERCISE

Find one data visualisation system and describe how the exploration features are used

3) To explain information or present storytelling:

a) Share and persuade
b) Emphasize important aspects of data
c) Answer a question
d) Make decisions
e) Present an argument
f) Feedback and interaction
g) Tell a story

It tells a story

An effective data visualization not only conveys information in a convincing
manner, it also narrates a story worth telling. This piece, for example, tells the story
of every known drone strike and victim in Pakistan. By distilling information into an
easily understandable visual format, this infographic dramatically brings to light
disturbing facts that should not go unnoticed.

(please refer to https://visme.co/blog/examples-data-visualizations/#o4Mj2fFklm8Y13bb.99 for more explanation)

EXERCISE
Find one case study and explain its storytelling.

Trends in Data Representation


EXERCISE

Find one type of visualization and explain its features.

Types of Data Representation Structure

There are various types of data representation structure, each with its own unique
features and characteristics. The types discussed here are (i) hierarchical-trees,
(ii) relational-networks, (iii) trends, timelines and flows, (iv) spatial-temporal and
(v) textual.

1) Hierarchical – Trees
 A hierarchical data model is one of the earliest data models; it represents a
file-based model built like a tree.
 Hierarchical data is arranged in a tree form. Each data item is a node: a parent
node can be associated with multiple child nodes, but a child node can have
only one parent.
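
The parent and child rule above can be sketched as a small Python class. The class name Node, the depth helper and the sample node names are assumptions for illustration.

```python
class Node:
    """A node in a hierarchy: many children allowed, at most one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Enforce the rule that a child node can have only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child


def depth(node):
    """Number of steps from the node up to the root of the tree."""
    steps = 0
    while node.parent is not None:
        node = node.parent
        steps += 1
    return steps


root = Node("Faculty")
dept = root.add_child(Node("Computer Science"))
course = dept.add_child(Node("DSC651"))
root.add_child(Node("Mathematics"))
print(depth(course))
```

The depth of a node is the hierarchy level that charts such as the sunburst and the treemap draw as rings or nested rectangles.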

Examples of tree diagrams:

a) Sunburst Chart

 Consists of an inner circle surrounded by rings of deeper hierarchy levels.
 The angle of each segment is either proportional to a value or divided equally
under its parent node.
 All segments in a sunburst chart may be coloured according to the category
or hierarchy level they belong to.
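
The two ways of choosing a segment angle can be shown with a short calculation. This is only a sketch; the function child_angles and the sample values are hypothetical, and charting libraries do this work internally.

```python
def child_angles(parent_angle, values=None, count=None):
    """Angles (in degrees) of the segments under one parent segment.

    With values: each angle is proportional to its value.
    Without values: the parent angle is divided equally among count children.
    """
    if values is not None:
        total = sum(values)
        return [parent_angle * v / total for v in values]
    return [parent_angle / count] * count


# Inner ring: the full circle shared in proportion to three values.
proportional = child_angles(360, values=[50, 30, 20])

# Outer ring: a 180-degree parent segment split equally among three children.
equal = child_angles(180, count=3)

print(proportional, equal)
```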

b) Treemap Diagram

 Is a method for displaying hierarchical data using nested figures, usually
rectangles.
 Displays categories by color and proximity.
 Can easily show lots of data, which would be difficult with other chart types.
 Makes it easy to spot patterns, such as which items are a store's best sellers.
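
As a rough sketch of how values become rectangles, the code below splits one rectangle into side-by-side rectangles whose areas are proportional to the values. This simple slice layout is an assumption chosen for brevity; real treemap tools usually use more elaborate layouts, and the sales figures are invented.

```python
def slice_layout(values, x, y, width, height):
    """Split one rectangle into side-by-side rectangles (x, y, width, height)
    whose areas are proportional to the given values."""
    total = sum(values)
    rects = []
    for value in values:
        w = width * value / total
        rects.append((x, y, w, height))
        x += w  # the next rectangle starts where this one ends
    return rects


# Hypothetical sales per item for one store.
sales = {"Rice": 50, "Oil": 30, "Sugar": 20}
rects = slice_layout(list(sales.values()), 0, 0, 100, 40)
for name, rect in zip(sales, rects):
    print(name, rect)
```

Nesting is obtained by calling the same function again inside each rectangle for the next hierarchy level, typically switching between horizontal and vertical slices.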

c) Radial Tree

 Is a method of displaying a tree structure in a way that expands outwards,
radially.
 Is one of many ways to visually display a tree.

2) Relational – Networks
 A network graph uses information from both the link and node data sets to
generate a graphical depiction of the network.
 The nodes and links in a network graph can be arranged in a variety of layout
patterns.
 In computer science, a graph is an abstract data type that is meant to implement
the undirected graph and directed graph concepts from graph theory in
mathematics.
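
A minimal sketch of such a graph abstract data type is given below, using an adjacency structure. The class name Graph and its method names are assumptions for illustration; they are not taken from a particular library.

```python
class Graph:
    """A small graph type that can be directed or undirected."""

    def __init__(self, directed=False):
        self.directed = directed
        self.adjacency = {}  # vertex -> set of neighbouring vertices

    def add_edge(self, u, v):
        self.adjacency.setdefault(u, set()).add(v)
        self.adjacency.setdefault(v, set())
        if not self.directed:
            # An undirected edge can be followed in both directions.
            self.adjacency[v].add(u)

    def neighbours(self, u):
        return self.adjacency.get(u, set())


undirected = Graph()
undirected.add_edge("A", "B")

directed = Graph(directed=True)
directed.add_edge("A", "B")

print(undirected.neighbours("B"))  # B can reach A
print(directed.neighbours("B"))    # B has no outgoing edge
```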

Directed Graph: a directed graph with three vertices and four directed edges (the double
arrow represents an edge in each direction).

Undirected Graph: an undirected graph with three vertices and three edges.

 A graph database in computing is a database that uses graph structures
for semantic queries, with nodes, edges, and properties to represent and store
data. The graph relates the data items in the store to a collection of nodes and
edges, the edges representing the relationships between the nodes. The
following is an example of a graph representation of a social network.

[Figure: a table with users and the relationships between users (Friends_with); a table
with pages and the relationships between users and pages (Likes); and the social network
scenario represented as a graph.]

 A network diagram is a visual representation of network architecture.
 It maps out the structure of a network with a variety of different symbols and line
connections.
 It is the ideal way to share the layout of a network because the visual
presentation makes it easier for users to understand how items are connected.
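
The social network scenario described for the graph database (users, pages, and the Friends_with and Likes relationships) can be sketched with plain Python structures. The user names, page title and the related function are invented for illustration; a real graph database would offer its own query language instead.

```python
# Nodes with properties (hypothetical users and page).
nodes = {
    "u1": {"label": "User", "name": "Aisyah"},
    "u2": {"label": "User", "name": "Ben"},
    "p1": {"label": "Page", "title": "Data Viz Club"},
}

# Edges as (source node, relationship type, target node).
edges = [
    ("u1", "Friends_with", "u2"),
    ("u2", "Friends_with", "u1"),
    ("u1", "Likes", "p1"),
]


def related(node_id, relationship):
    """Follow edges of one relationship type starting from node_id."""
    return [target for source, rel, target in edges
            if source == node_id and rel == relationship]


friends_of_u1 = related("u1", "Friends_with")
pages_liked_by_u1 = related("u1", "Likes")
print([nodes[n]["name"] for n in friends_of_u1])
print([nodes[n]["title"] for n in pages_liked_by_u1])
```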

3) Trends, Timelines and Flows

 The main function of timelines is to communicate time-related information, either
for analysis or to visually present a story or view of history.
 If scale-based, a timeline allows the viewer to see when things occur or are to occur,
by allowing the viewer to assess the time intervals between events.
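
A scale-based timeline places each event at a position proportional to its date, so the gaps between events can be read directly. The sketch below assumes the events are already sorted by date; the event names, dates and function names are hypothetical.

```python
from datetime import date

# Hypothetical project events, sorted by date.
events = [
    ("Proposal", date(2019, 1, 1)),
    ("Prototype", date(2019, 1, 11)),
    ("Launch", date(2019, 1, 31)),
]


def positions(events, width=100):
    """Place each event on a line of the given width, in proportion to time."""
    start, end = events[0][1], events[-1][1]
    span = (end - start).days
    return [(name, width * (day - start).days / span) for name, day in events]


def intervals(events):
    """Number of days between each pair of consecutive events."""
    return [(b[1] - a[1]).days for a, b in zip(events, events[1:])]


print(positions(events))
print(intervals(events))
```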


4) Spatial-Temporal
 A common example of spatial data can be seen in a road map.
 A road map is a two-dimensional object that contains points, lines, and polygons
that can represent cities, roads, and political boundaries such as states or
provinces.
 A road map is a visualization of geographic information.
 Spatial data types provide the information that a computer requires to reconstruct
the spatial data in digital form.
 In the raster world, grid cells represent real-world features.
 In the vector world, features are represented by points, lines and polygons that
consist of vertices and paths.
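
The difference between the two worlds can be sketched as follows. The coordinates, the cell size of one unit and the rasterize function are assumptions for illustration; the sketch marks only the cells that contain a vertex, not every cell a line passes through.

```python
# Vector: a road stored as a line (ordered vertices) and a city as a point.
road = [(0.5, 0.5), (1.5, 0.5), (2.5, 1.5)]
city = (2.5, 1.5)


def rasterize(points, rows, cols):
    """Raster: a grid of cells; mark with 1 each cell that contains a point."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        # With a cell size of one unit, the integer part gives the cell index.
        grid[int(y)][int(x)] = 1
    return grid


grid = rasterize(road, rows=2, cols=3)
for row in grid:
    print(row)
```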

Screenshot showing a collection of different techniques to visualize time, listed at
timeviz.net.

5) Textual
Why Visualize Text Data?

 Visually representing the content of a text document is one of the most important
tasks in the field of text mining.
 Many text visualizations do not represent the text directly; they represent an
output of a language model (word count, character length, word sequences, etc.).
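
Word count, the first output listed above, is enough to drive a simple text visualization. The sketch below counts words and scales each count to a font size; the size range, the sample sentence and the function name word_sizes are assumptions for illustration.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    top = max(counts.values())
    # The most frequent word gets max_size; rarer words get smaller sizes.
    return {word: min_size + (max_size - min_size) * n / top
            for word, n in counts.items()}


sizes = word_sizes("Data data data charts charts maps")
print(sizes)
```

A practical version would usually remove very common words (such as "the" and "of") before counting, so that they do not dominate the picture.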

a) Word Cloud

The size of each word reflects its prominence or salience.

b) Trees or Hierarchies


Data Visualization Today


Today's data visualisation goes beyond the standard charts and graphs used in
Microsoft Excel spreadsheets.

Data can be displayed in more sophisticated ways, such as infographics, dials and
gauges, geographic maps, sparklines, heat maps, and detailed bar, pie and fever charts.

Visualisations can have interactive capabilities, enabling users to manipulate them or
drill into the data for querying and analysis.

Indicators designed to alert users when data has been updated or predefined conditions
occur can also be included.

Pros and Cons Of Data Visualisation


Pros:

 It can be accessed quickly by a wider audience.
 It conveys a lot of information in a small space.
 It makes your report more visually appealing.

Cons:

 It can misrepresent information – if an incorrect visual representation is made.
 It can be distracting – if the visual data is distorted or excessively used.

Future of Data Visualisation


 Data visualisation is going to change the way our analysts work with data
 They are going to be expected to respond to issues more rapidly
 They will need to be able to dig for more insights – looking at data differently,
more imaginatively
 Data visualization will promote that creative data exploration


Functional Requirements
Functional requirements are divided into two categories: functional user requirements
and functional system requirements.

Functional user requirements are high-level statements of what the system should do.

Functional system requirements:

 Describe the system services clearly and in detail.
 Are any requirements that specify what the system should do.
 Describe a particular behavior or function of the system when certain conditions are
met, for example:
o “to send email when a new customer signs up”
o “a cup shall have the ability to contain tea or coffee without leaking”

Typical functional requirements:


 business rules
 transaction corrections, adjustments and cancellations
 administrative functions
 authentication
 authorization levels
 audit tracking
 external interfaces
 certification requirements
 reporting requirements
 historical data
 legal or regulatory requirements

Non-Functional Requirements
A non-functional requirement is any requirement that specifies how the system performs
a certain function. Non-functional requirements describe how a system should behave and
what limits there are on its functionality, and they specify the system’s quality
attributes or characteristics.
For example:
 “modified data in a database should be updated for all users accessing it within 2
seconds”
 “the cup shall contain hot liquid without heating up to more than 45°C”

It is important to state non-functional requirements correctly, since they will affect
your users’ experience when interacting with the system.

Four examples of non-functional requirements are usability, reliability, performance and
supportability.

a) Usability
Prioritize the important functions of the system based on usage patterns.
Frequently used functions should be tested for usability.
Complex and critical functions should also be tested for usability.

b) Reliability
Users have to trust the system, even after using it for a long time.
Your goal should be a long MTBF (mean time between failures).
Create a requirement that data created in the system will be retained for a number of
years without the data being changed by the system.
It’s a good idea to also include requirements that make it easier to monitor system
performance.
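
MTBF is commonly calculated as the total operating time divided by the number of failures observed in that time. The sketch below applies that formula to invented figures; the 300-hour target is an assumed requirement, not one stated in this module.

```python
def mtbf(operating_hours, failures):
    """Mean time between failures = total operating time / number of failures."""
    if failures == 0:
        raise ValueError("MTBF is undefined when no failure has been observed")
    return operating_hours / failures


# Hypothetical monitoring data: 1200 hours of operation with 3 failures.
observed = mtbf(operating_hours=1200, failures=3)

# Hypothetical reliability requirement: MTBF of at least 300 hours.
meets_target = observed >= 300
print(observed, meets_target)
```

Writing the target as a number like this is what makes a reliability requirement testable.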


c) Performance
What should system response times be, as measured from any point, under what
circumstances?
Are there specific peak times when the load on the system will be unusually high?
Think of stress periods, for example, at the end of the month or in conjunction with
payroll disbursement.

d) Supportability
The system needs to be cost-effective to maintain.
Maintainability requirements may cover diverse levels of documentation:
 system documentation
 test documentation, for example which test cases and test plans will accompany the
system
Other non-functional requirements include:


 Performance – for example: response time, throughput, utilization, static
volumetric
 Scalability
 Capacity
 Availability
 Reliability
 Recoverability
 Maintainability
 Serviceability
 Security
 Regulatory
 Manageability
 Environmental
 Data integrity
 Usability
 Interoperability


Designing and Writing Good Functional Requirements
How do you write requirements that are:
 complete enough - that they convey the needed information?
 but simple enough - that people actually read them?

1) Keep it simple (most important)

People don’t like to read.

This is why most ads have two words at most. Even the ads that have lots of words just
use them for decoration. The real message is in the one or two words they’ve
highlighted.
By not putting in too much detail, you increase the odds that your programmers, graphic
designers, executive staff, and others will actually read your specification.

Be specific, but not prescriptive

Do say exactly what needs to be included. Don’t say exactly how it needs to be achieved.

Putting in too much detail can also sap the creativity of your programmers.
Put yourself in their shoes. What if a ten-page manifesto were dropped on your
desk that described how the print button was supposed to work, including every last little
detail about what to do if the document is too long, too short, contains pictures, is too
wide, etc.? Just tell them they need to put a print feature in, and that it should be able to
print all documents created by the program or gracefully error out, and you’re done.

2) Use pictures.
A picture is worth a thousand words.
Creating wireframe interface mockups is one of the best ways to show people what the
program will do.

Even better is to hire a graphic or user interface designer and get them to create
prototypes of the actual screens.

You are going to have to do this at some point anyway, and getting it done now can save
a lot of time in the end.

3) Circulate requirements early and often

Circulate your requirements early and often to get feedback, and set hard deadlines
for that feedback, for example “If I don’t hear from you by
Tuesday at noon, I’m going to assume that you think everything is ok”.

Get stakeholders involved early


Ask them what was wrong with the last version or a comparable product
Find out what is causing support calls on other products

The goal is to get all the stakeholders in a project to buy in.

The problem is that everybody else is just as busy as you are, and it’s easy to ignore a
document, especially if it is the third revision they have seen and they are leaving for
New York in two hours.

One successful technique is to organize a product design meeting, where you go over
the requirements (or a section of the requirements).
Make sure to invite everybody who needs to be there.

Tell them that if they don’t show, they don’t get any input.
Of course, the tough talk is hard to enforce when someone does come back three weeks
later and says that you absolutely must have the ‘xyz’ feature.
It’s particularly hard to enforce when that ‘someone’ happens to be your boss.

4) Use bullet points

Some people even use PowerPoint, not Word.
People do seem to respond better to bullet points.
With a bullet point format, you highlight everything critical in the bullet point
and then write the rest in paragraphs later. Give the reader a bullet point; a reader
who is interested in the details will read on.

5) Keep it organized
Group all requirements by their functional areas. Put in section headings, a table of
contents, etc. Assume that nobody will read the entire doc.

Point of fact: you will likely be the only one to read your doc cover-to-cover, particularly if
it’s for a large system.

Everyone else will read the features or sections that they care about, or are
responsible for, and just skim the rest. So, make it easy for them.

6) Other important points to consider

a) Double-check your facts

Accuracy of information is the first important aspect to consider in meeting user
requirements.
Make it a point to check whether your own understanding of the user's wants is correct.
Use illustrations, make a diagram or draw a prototype and present it to the user.
Get as much feedback as you can, ask as many questions as you can, and test your own
comprehension by presenting the facts you gathered back to the user.
That way, all unclear issues can be threshed out as early as possible.

Organize your presentation

Present all requirements in an organized and systematic manner. The systems developer
will base their comprehension on the way the requirements are presented. This is where
diagrams or illustrations will be most useful, as a means for presenting the inputs and
outputs in the proper functional areas.

b) Observe time frames

While you are writing functional requirements, another team member is waiting for
your spec.
Any delay on your part is a delay for the entire team.
By forwarding a clear and concise functional requirements document as early as possible,
you also give the systems developer time to do the job under less pressing
conditions.
Part of the project's allotted time is for getting feedback that may require revisions and
adjustments.
A customer who is satisfied with the team's output at the earliest possible time
indicates a job well done.

c) Ownership of requirements

Someone has to have the final say.
If you are expecting an entire committee to sign off anything longer than a page, you are
likely to end up in endless iterations.
Agree with the committee that person x is the ultimate authority and if they are happy
then the requirements are signed off.

Exercise

Each group is required to find a sample of a complete functional and non-functional
requirements report and share it in the class’s group.

1) To understand the importance of contextual design
2) To learn the stages of contextual design
3) To identify contextual design challenges

What Is Contextual Design?

Contextual Design is a user-centered design process developed by Hugh Beyer
and Karen Holtzblatt.

It is a structured, well-defined process that provides methods to collect data about
users in the field and to interpret and consolidate that data in a structured way. The
data is used to create product prototypes and service concepts, and to iteratively test
and refine those concepts with users.

The core of the Contextual Design philosophy is to understand users in
order to find out their fundamental intents, desires, and drivers.

However, these desires are invisible to the users themselves, hence the only way to
glean them is to go out in the field and engage in conversations with people.


A user-centric interface should make users feel as if you have read their minds.
Contextual design covers the processes involved in designing products and systems that
meet both users’ and business needs.

The best product designs are produced when the product’s designers are involved in
collecting and interpreting customers’ data and appreciate what customers need.
Contextual design places heavy emphasis on getting the right data to make the right
decisions.

The focus is on designing effective solutions. For example, data gathered from
customers is the basic criterion used to decide what a product should do. It acts as a
guide during the design process and helps avoid claims such as “what customers may
like,” which at times can be misleading. Contextual inquiry is an explicit step for
understanding who the customers really are and how they work on a daily basis.


Case Study - Contextual Design Is Effortless For The User

Spotify is a music streaming service developed by the Swedish company Spotify
Technology, which is headquartered in Stockholm, Sweden. It creates a custom weekly
playlist for each user based on their listening habits.

Spotify also suggests playlists based on the time, date and the user’s location. These
features use context-awareness to create a personalized experience, reward the
user for being active, and encourage further participation.


Stages of Contextual Design


1) CONTEXTUAL INQUIRY:

 In this stage, it is important to understand the customers, their needs and how
they work every day.
 Interviews should be conducted as customers work.
 The design team members discuss their unique perspectives on the data so that they
develop a shared view of their customers.

2) WORK MODEL:

 At times understanding a customer’s work can be complex when multiple
departments of an organization are involved.
 In such scenarios work models or diagrams can be created to get an idea of what
work is being done.

3) CONSOLIDATION:

 When solutions are designed, they may at times be required to serve the needs
of the entire customer population.
 In such a scenario, all individual diagrams of the work of various customers should
be put together in order to identify any common patterns.
 This is done through an affinity diagram that shows the scope of issues and
consolidated work models that show the underlying pattern and structure that
need to be addressed.

4) WORK REDESIGN:

 The consolidated data that is collected helps the design team to find ways in
which technology and other changes to organizational procedures can be
introduced to help improve work.
 Storyboards are used as part of this process to define a new work system.

5) USER ENVIRONMENT DESIGN:

 This helps a user understand the various parts of the system that have been
created and the functions each part serves.
 It also shows how each part fits into other existing systems.


6) MOCK-UP AND TEST WITH CUSTOMERS:

 Testing through prototypes is essential to eliminate problems at the early stages.
 Mock-ups are redesigned together by the design team and the end-user to
ensure that they meet the requirements better.

7) PUTTING INTO PRACTICE:

 When introducing a product, solution or a new work system, sometimes there
may be resistance.
 Existing resources and skills should be used to deal with such issues.
 Contextual design has to be customised to each organization.
 Systems that work for a small organization may not work as effectively in a larger
organization.

Fig.4 Similar Type of Contextual Design Process


Contextual Design Is Functional


1) Users need to be guided along or through a process
Create a status dashboard that presents a clear call-to-action, tracks their
progress, keeps them updated, gets their attention when time has passed and
rewards them when they have finished.

2) Need to avoid drop-off on screens prone to user error


 Provide context-sensitive help. Tell them why they’ve encountered the error.
 Suggest solutions. Offer alternatives.
 Use the data you have to help them pass the barrier.

3) Need to achieve a business goal the user wouldn’t seek out on their own.
Sneak in an extra step in their journey.

Contextual Design Can Delight


Users don’t always notice when you’re making their life easier, but if they are not thinking
about the process then you’re probably doing a good job.

In order to bring your product from good to great, you need a delight factor. It could be:

 a cute illustration,
 a cool animation, or
 a level of personalization they didn’t expect.


EXERCISE

Identify one system that you like a lot.

Share your experience about the system with your class.

Describe how the system has used the contextual design concept to fulfill your needs.

Contextual Design Challenges


One of the first challenges you may run into is data collection. Permissions can create
barriers for gaining access to the necessary data.

Designers need to plan for all scenarios, including times when there is no data to work
with. Contextual designs will require more screen states than a static interface would.
This means more places where errors can occur and more testing.
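
As a rough illustration of why a contextual interface needs more screen states, the sketch below picks a state from the data that is actually available. The state names and the `choose_state` function are hypothetical and not taken from any framework.

```python
from enum import Enum
from typing import Optional


class ScreenState(Enum):
    NO_PERMISSION = "ask the user for access"
    NO_DATA = "show a generic default screen"
    PERSONALIZED = "show content tailored to the user"


def choose_state(has_permission: bool, history: Optional[list]) -> ScreenState:
    """Pick the screen state for the data that is actually available."""
    if not has_permission:
        return ScreenState.NO_PERMISSION
    if not history:  # covers both None and an empty list
        return ScreenState.NO_DATA
    return ScreenState.PERSONALIZED
```

Each extra state is one more screen to design and one more path to test, which is where the additional effort comes from.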

Contextual design can:

o be the foundation of your product

o personalize your product

o help your product meet a business objective

It may require some extra effort to design and develop. However, it is worth it for every
user to feel as if the app was designed for them.


1) To understand the importance of data preparation
2) To understand data transformation
3) To identify stages of data processing cycle

What Is Data Preparation?


Data preparation is the process of:
 Gathering
 Combining
 Structuring
 Organizing data

so it can be analyzed as part of:
 Data visualization
 Data analytics
 Machine learning applications

So, it is important that you feed the algorithms the right data. It is equally important
that the data is in the right format and scale.
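
One common way to bring values onto the same scale is min-max scaling. The sketch below is only an illustration; the function name `min_max_scale` and the sample numbers are made up for this example.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # all values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [1200, 3500, 8000]          # made-up values, all in the same unit
scaled = min_max_scale(incomes)       # smallest becomes 0.0, largest becomes 1.0
```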


Why We Need Data Preparation


Real-world data are often:
 Noisy
 Missing
 Inconsistent

So after collecting the data, the data should be cleaned before any further
processing.

For example, cleaning textual data involves:
 Fixing typos
 Stemming - grouping words with a similar basic meaning
 Removing extra symbols
o HTML tags
o Stop words
 Tokenization
 Part-of-speech tagging
 Chunking
 Grouping
 Negation handling
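
A few of these steps can be sketched with Python's standard library. This is a minimal illustration only: the stop-word list is a tiny made-up sample, and stemming, part-of-speech tagging, chunking and negation handling are left out because they usually need a dedicated text-processing library.

```python
import html
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to"}   # tiny illustrative list


def clean_text(raw: str) -> list:
    """Strip HTML tags and symbols, tokenize, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", raw)        # remove HTML tags
    text = html.unescape(text).lower()          # decode entities such as &amp;
    tokens = re.findall(r"[a-z0-9']+", text)    # tokenization; drops extra symbols
    return [t for t in tokens if t not in STOP_WORDS]


tokens = clean_text("<p>The service is GREAT &amp; the price is fair!!!</p>")
print(tokens)
```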

Many of the data preparation activities are:
 Routine
 Tedious
 Time-consuming

It has been estimated that data preparation accounts for 60%-80% of the time spent on a
data mining project. Data preparation improves the quality of data and consequently
helps improve the quality of data analysis results.

The well-known saying "garbage in, garbage out" is very relevant to this domain.

One of the most important tasks in data preparation is formatting. It is important that the
collected data is in the correct format.

The data can be stored in a relational or NoSQL database depending upon the size of
the data. It is also tempting to use all the data that is available.
 A larger dataset does not always guarantee better performance.
 In fact, the larger the dataset, the higher the computational cost.
 So, it is better to use a subset of the available data in the first run.
 If the smaller subset of the data does not perform well in terms of precision and
recall, there is always the option of using the whole dataset.
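
The idea of starting with a subset and judging it by precision and recall can be sketched as follows. The helper names `take_subset` and `precision_recall` are hypothetical, and the way predictions are produced is left out on purpose.

```python
import random


def take_subset(records, fraction, seed=0):
    """Return a reproducible random subset of the records."""
    k = max(1, int(len(records) * fraction))
    return random.Random(seed).sample(records, k)


def precision_recall(predicted, actual):
    """Precision and recall for two sets of positive items."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


first_run = take_subset(list(range(100)), 0.1)   # 10% of the data for the first run
```

Precision is the share of predicted positives that are correct; recall is the share of actual positives that were found.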


Data Transformation
1) Data Merging:
 It is the process of integrating data from multiple data sources into a single
operational data set.
 Data sources can be homogeneous or heterogeneous.
 Two types of merging:
o Horizontal Merging: the process of joining records horizontally when the two
sources have different data definitions.
o Vertical Merging: the process of joining records vertically when the two
sources have similar data structures.
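
The two kinds of merging can be sketched with plain lists of dictionaries. The records and field names below are invented, and horizontal merging is shown as an inner join on a shared key, which is one possible reading of the definition above.

```python
def vertical_merge(rows_a, rows_b):
    """Stack records from two sources that share the same fields."""
    return rows_a + rows_b


def horizontal_merge(rows_a, rows_b, key):
    """Join records side by side on a shared key (inner join)."""
    lookup = {row[key]: row for row in rows_b}
    return [{**row, **lookup[row[key]]} for row in rows_a if row[key] in lookup]


students = [{"id": 1, "name": "Aina"}, {"id": 2, "name": "Badrul"}]
marks = [{"id": 1, "mark": 82}, {"id": 3, "mark": 55}]
new_students = [{"id": 4, "name": "Chong"}]

joined = horizontal_merge(students, marks, "id")     # adds columns
stacked = vertical_merge(students, new_students)     # adds rows
```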

2) Data cleansing:
 Data cleansing is the process of ensuring that a set of data is correct and
accurate.
 During this process, records are checked for accuracy and consistency, and they
are either corrected or deleted as necessary.
 This can occur within a single set of records or between multiple sets of data that
need to be merged or that will work together.
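
A minimal cleansing pass might look like the sketch below. The rules (names stored as text in one capitalization style, ages between 0 and 120) are assumptions chosen for illustration, not rules from this module.

```python
def cleanse(records):
    """Correct fixable records and delete those that cannot be repaired."""
    cleaned = []
    for rec in records:
        name = rec.get("name", "").strip().title()   # consistency: one name style
        age = rec.get("age")
        if not name or not isinstance(age, int) or not 0 <= age <= 120:
            continue                                   # accuracy check failed: delete
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": "  aina ", "age": 21},     # fixable: stray spaces, lower case
    {"name": "Badrul", "age": -5},      # impossible age: deleted
    {"name": "", "age": 30},            # missing name: deleted
]
result = cleanse(raw)
```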

Other Data Preparation


What are other common data preparations?
 join data from multiple tables
 append tables (data union)
 sort data
 filter data
 create calculated columns
 pivot data
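
Four of these operations (calculated columns, filtering, sorting and pivoting) are sketched below on a small made-up sales table, using only the standard library.

```python
from collections import defaultdict

sales = [
    {"region": "North", "quarter": "Q1", "units": 10, "price": 2.0},
    {"region": "South", "quarter": "Q1", "units": 4, "price": 2.5},
    {"region": "North", "quarter": "Q2", "units": 7, "price": 2.0},
]

# Calculated column: revenue = units * price
for row in sales:
    row["revenue"] = row["units"] * row["price"]

# Filter: keep rows with at least 5 units sold
big_orders = [row for row in sales if row["units"] >= 5]

# Sort: highest revenue first
ranked = sorted(sales, key=lambda row: row["revenue"], reverse=True)

# Pivot: one entry per region, one column per quarter
pivot = defaultdict(dict)
for row in sales:
    pivot[row["region"]][row["quarter"]] = row["revenue"]
```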

The Data Processing Cycle is a series of steps carried out to extract information from
raw data. Although each step must be taken in order, the order is cyclic.

The output and storage stage can lead to a repeat of the data collection stage,
resulting in another cycle of data processing. The cycle provides a view of how the data
travels and transforms from collection to interpretation and, ultimately, is used in
effective business decisions.


Stages of the Data Processing Cycle


The following are the stages involved in data processing:

1) Collection is the first stage of the cycle.

 This is a very crucial stage because the quality of the data collected will impact
heavily on the output.
 The data gathered need to be both well defined and accurate, so that subsequent
decisions based on the findings are valid.

Some types of data collection include:


 Census (data collection about everything in a group or statistical population)
 Sample survey (collection method that includes only part of the total population)
 Administrative by-product (data collection is a byproduct of an organization’s day-
to-day operations).
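
The difference between a census and a sample survey can be shown with a made-up population of numbers. The population, the sample size of 100 and the seed are arbitrary choices for this sketch.

```python
import random
import statistics

population = list(range(1, 1001))            # made-up population of 1,000 values

census_mean = statistics.mean(population)     # census: measure every member

rng = random.Random(42)                       # fixed seed for repeatability
sample = rng.sample(population, 100)          # sample survey: measure only a part
sample_mean = statistics.mean(sample)
```

The sample mean only estimates the census mean, but it requires a tenth of the measurements.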

2) Preparation is the manipulation of data into a form that is suitable for further
analysis and processing. Raw data cannot be processed directly and must first be
checked for accuracy. Preparation is about constructing a dataset from one or more
data sources to be used for further exploration and processing. Analyzing data that
has not been carefully screened for problems can produce highly misleading results,
because the results depend heavily on the quality of the prepared data.

3) Input is the task where verified data is coded or converted into machine-readable
form so that it can be processed by a computer. Data entry is done through the
use of a keyboard, digitizer, scanner, or data entry from an existing source. This
time-consuming process requires speed and accuracy. Most data need to follow a
formal and strict syntax since a great deal of processing power is required to
break down the complex data at this stage. Due to the costs, many businesses are
resorting to outsourcing this stage.

4) Processing is when the data is subjected to various means and methods of
manipulation. This is the point where a computer program is being executed; the
running process contains the program code and its current activity. The process may
be made up of multiple threads of execution that simultaneously execute instructions,
depending on the operating system. While a computer program is a passive collection
of instructions, a process is the actual execution of those instructions. Many software
programs are available for processing large volumes of data within very short
periods.

5) Output and interpretation is the stage where processed information is
transmitted to the user. Output is presented to users in various formats such as
printed reports, audio, video, or on a monitor. Output needs to be interpreted so that it
can provide meaningful information that will guide future decisions of the company.


6) Storage is the last stage in the data processing cycle, where data, instructions and
information are held for future use. The importance of this stage is that it allows
quick access and retrieval of the processed information, allowing it to be passed on
to the next stage directly when needed. Every computer uses storage to hold
system and application software.
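
The six stages can be pictured as a chain of small functions. The sketch below is a toy illustration with invented readings; the function names simply mirror the stage names, and the `storage` list stands in for a real database or file.

```python
def collect():
    return ["12", " 7", "abc", "30"]            # raw readings as typed (made up)

def prepare(raw):
    return [r.strip() for r in raw if r.strip().isdigit()]   # drop invalid entries

def to_input(prepared):
    return [int(r) for r in prepared]            # convert to machine-readable form

def process(numbers):
    return {"count": len(numbers), "total": sum(numbers)}

def output(result):
    return f"{result['count']} readings, total {result['total']}"

storage = []                                     # stands in for a database or file

def run_cycle():
    result = process(to_input(prepare(collect())))
    storage.append(result)                       # keep the result for future use
    return output(result)

report = run_cycle()
```

Calling `run_cycle()` again would start another cycle, which matches the cyclic order described earlier.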

What Are the Common Visualization Design Entities?
 Position
 Shape
 Orientation
 Size/Area
 Color
 Value
 Texture
 Length
 Slope
 Thickness
 Angle
 Distance


Colour - Gradient Colors


Gradient colors are great for showing patterns, e.g. on a choropleth map.
But it is hard to decipher the actual values from them and to see differences between
the values. Consider showing your most important values with bars, position (like in a dot
plot) or even areas, and use colors only to show categories. Readers will be able to
decipher your values faster.

Color – Variety of Colors


Colors make it easy to distinguish categories in your data, but try to avoid using more
than seven colors.

The more colors in a chart to represent your data, the harder it becomes to read it
quickly. Your readers will often need to consult the color key to understand what is
shown in your chart. You can also use another chart type.
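
One way to stay within seven colors is to keep the most frequent categories and merge the rest into a single "Other" group before plotting. This is a sketch of that idea; the function name and the "Other" label are choices made for this example.

```python
from collections import Counter


def limit_categories(labels, max_colors=7):
    """Keep the most frequent categories and merge the rest into 'Other'."""
    counts = Counter(labels)
    if len(counts) <= max_colors:
        return list(labels)
    keep = {name for name, _ in counts.most_common(max_colors - 1)}
    return [label if label in keep else "Other" for label in labels]


labels = list("aaaabbbccd")                      # four categories: a, b, c, d
reduced = limit_categories(labels, max_colors=3)  # keeps a and b, merges c and d
```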


Colour - How To Make Better Color Choices


The following are the guidelines in choosing colors:

1) Use the same color for the same variables.


 Using the same color is the best option to avoid an overly colorful article.
 For example, it’s ok to show the unemployment rate in blue in the first chart and a
few paragraphs later a gross domestic product (GDP) chart with the same color.
 However, if you use more than one color for your first chart, the colors in this
chart will be “taken”.
 To avoid confusing readers and to increase comparability, consider only using these
colors again if you’re showing data about the same category, country, etc.


2) Make sure to explain to readers what your colors encode.


 Every visual mark that represents a value or variable should be explained:
o What does the height of your bar mean?
o What does the size of your circles on a symbol map represent?
 The same is true for colors. There are many ways to create a color key. Here
are three of them:

3) Consider the color grey as the most important color in Data Vis.
 Using grey for less important elements in your chart makes your highlight colors
(which should be reserved for your most important data points) stick out even
more.
 Grey is also helpful for general context data, less important annotations, to show
what’s unselected by the user, or to tone down the overall visual impression of
your charts.
 Since grey can come off a little cold, consider using it with a hint of color. Try:
o a warm grey (grey + yellow/orange/red),
o or another very light color as an alternative (e.g. super light yellow).


4) Make sure your contrasts are high enough.


 With enough contrast, your readers will be able to read your chart on their screen,
even in low light and even if you use light colors like grey.
 This is especially important for text: the smaller the text, the higher its contrast to
the background needs to be for it to be readable.
 The contrast ratio between background and foreground should be at least 2.5 for
big text and at least 4 for small text.
 Avoid complementary hues (e.g. red and green, orange and blue) and bright
colors for backgrounds.
 Use a contrast-checking tool to test your color contrast, the brightness difference
and whether your colors are “compliant”.
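
The contrast ratio itself can be computed by hand. The sketch below assumes the WCAG 2.x definition of relative luminance and contrast ratio, which is what most contrast checkers report, and applies the 2.5 and 4 thresholds quoted above. WCAG level AA is stricter (3 for large text, 4.5 for normal text).

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    l1, l2 = relative_luminance(foreground), relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def readable(foreground, background, small_text=True):
    """Apply the thresholds quoted above: 4 for small text, 2.5 for big text."""
    return contrast_ratio(foreground, background) >= (4 if small_text else 2.5)
```

For example, `readable("#cccccc", "#ffffff", small_text=False)` is false: a light grey label on a white background is too faint even for big text.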

5) Consider where your colors appear in relation to each other.


 The smaller the areas on your chart and the bigger the distance between them,
the harder it is to compare them.
 Consider giving small points or lines a high contrast in their hue or brightness, to
make them easily distinguishable.
 However, big areas can handle toned-down colors with little contrast; especially if
there is no other (background) color between these areas:


EXERCISE

How do the contrast ratios above affect your data perception?


6) Use intuitive colors.


 When choosing a color palette, consider their meaning in the culture of your
target audience.
 If possible, use colors that readers will associate with your data anyway, e.g.
o party colors: Republicans = red, Democrats = blue
o natural colors: forest = green, lake = blue
o learned colors: red = attention/stop/bad, green = good (to go)
 When it comes to color-encoding gender data, avoid the stereotypical pink-blue
combination. To not confuse your readers entirely, try a cold color for men (e.g.
blue or purple) and a warmer color for women (e.g. yellow, orange or a warm green).


7) Use light colors for low values and dark colors for high values.
 When using color gradients, make sure that the bright colors represent low
values, while the dark colors represent high values.
 This will be most intuitive for most readers.

8) Don’t use a gradient color palette for categories and the other way
round.
 It might be tempting to use shades of one hue (e.g. blue) even for categories, to
make your chart look less colorful.
 However, since many readers will associate dark colors with “more/high” and
bright colors with “less/low”, such a color palette will imply a ranking of your
categories.
 Use different hues (green, yellow, pink, etc.) for your categories to avoid that, and
to be able to talk about these colors.
 Readers might be quicker at finding specific categories if you make their colors
stand out with a different lightness or saturation,
 but note that your chart should explain why these colors stand out.
 If you find your chart to be too colorful, consider another chart type for your data.


9) Use lightness to build gradients, not just hue.


 Designing color gradients takes a lot of consideration.
 If you’re unsure, use the
o Datawrapper defaults,
o the ColorBrewer palettes or
o the CARTO gradients.
 Don’t place more than two hues with the same lightness in your gradient, but
design it from a bright color (e.g. white) to a dark color (e.g. dark blue) in a
consistent way.
 Your gradient should work in black and white, too.
 Gradients whose lightness goes up and down (like rainbow scales) can confuse
readers.
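
Whether a gradient "works in black and white" can be checked roughly by converting each color to a grey value and confirming that the values fall steadily. The sketch below uses the common Rec. 601 brightness weights as an approximation; the example colors are arbitrary.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples, t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def gradient(start, end, steps):
    return [lerp_color(start, end, i / (steps - 1)) for i in range(steps)]


def grey_value(rgb):
    """Approximate brightness when shown in black and white (Rec. 601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def works_in_greyscale(colors):
    """True if brightness falls steadily from the first color to the last."""
    greys = [grey_value(c) for c in colors]
    return all(a > b for a, b in zip(greys, greys[1:]))


white_to_dark_blue = gradient((255, 255, 255), (8, 48, 107), 5)
rainbow = [(255, 0, 0), (255, 255, 0), (0, 255, 0), (0, 0, 255)]
```

The white-to-dark-blue gradient passes the check, while the rainbow sequence fails because yellow is much brighter than the red before it.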


10) Consider using two hues for a gradient, not just one.
 Using two hues can make your map or chart even more decipherable.
 Readers will be able to distinguish the colors on the gradient better if they are
encoded through lightness and (two or three carefully selected) hues.

11) Consider using diverging color gradients.


 If you want to emphasize how a variable deviates from a baseline (say the national
average), you may want to consider using a diverging palette.
 It’s important to use clearly distinguishable hues for both sides of the gradient.
 The center color should ideally be a light grey, not white.
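
A diverging palette can be sketched as two gradients that meet at the baseline. In the example below the blue and red end colors, the light grey center and the `spread` parameter (how far from the baseline the full color is reached, which must be positive) are all assumptions made for illustration.

```python
def diverging_color(value, baseline, spread,
                    low=(33, 102, 172), mid=(224, 224, 224), high=(178, 24, 43)):
    """Map a value to a color: `low` hue below the baseline, `high` hue above it."""
    t = max(-1.0, min(1.0, (value - baseline) / spread))   # clamp to [-1, 1]
    end = high if t > 0 else low
    return tuple(round(m + (e - m) * abs(t)) for m, e in zip(mid, end))


at_average = diverging_color(50, baseline=50, spread=10)    # light grey center
far_above = diverging_color(60, baseline=50, spread=10)     # full `high` color
far_below = diverging_color(0, baseline=50, spread=10)      # clamped to `low`
```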


12) Consider color-blind people.


 Using different lightnesses in your gradients and color palettes has the big
advantage that readers with a color vision deficiency will still be able to
distinguish your colors.
 There are many different types of color blindness. Use an online tool or
Datawrapper’s automatic colorblind-check to make sure that color-blind users can
distinguish the colors on your chart.


Other Visualization Design
