NeedForDS FACETS
• Data science and big data are used almost everywhere in both commercial and
noncommercial settings.
• Commercial companies in almost every industry
• They use data science to gain insights into their customers, processes, staff, competition, and
products.
• Governmental organizations
• Many governmental organizations not only rely on internal data scientists to
discover valuable information, but also share their data with the public.
• Nongovernmental organizations
• They use it to raise money and defend their causes.
Facets of data
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured data
• Structured data is data that depends on a data model and resides in a fixed field within a record.
• It is often easy to store in tables within databases or Excel files.
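To make "a fixed field within a record" concrete, the sketch below stores three records in a table using Python's built-in sqlite3 module. The table name, column names, and values are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table, columns, and rows; none of these come from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Grace", 45), (3, "Alan", 41)],
)

# Every record has the same fixed fields, so a query can address them by name.
rows = conn.execute(
    "SELECT name FROM customers WHERE age > 40 ORDER BY name"
).fetchall()
names = [name for (name,) in rows]
print(names)
conn.close()
```

Because the data model is known in advance, filtering on a field such as age needs no parsing or interpretation of the content.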
Unstructured data
• Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or
varying.
Natural language
• Natural language is a special type of unstructured data.
• It’s challenging to process because it requires knowledge of specific data science
techniques and linguistics.
• The natural language processing community has had success in entity recognition, topic
recognition, summarization, text completion, and sentiment analysis.
Machine-generated data
• Machine-generated data is information that’s automatically created by a computer,
process, application, or other machine without human intervention.
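A web server log is a typical case of machine-generated data. The sketch below parses one log line into named fields with a regular expression. The log line and its layout are assumptions made for illustration; real log formats vary by server and configuration.

```python
import re

# Hypothetical log line in a common web-server style; the format is an assumption.
LOG_LINE = '192.0.2.10 - - [12/Mar/2016:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120'

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the numeric fields so they can be aggregated later.
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record

record = parse_log_line(LOG_LINE)
print(record)
```

Once parsed, each line becomes a structured record, which is usually the first step before analyzing high-volume machine data.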
Graph-based or network data
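• “Graph data” can be a confusing term because any data can be shown in a graph.
• Here, “graph” refers to the graph of graph theory: a structure of nodes and edges that models pairwise relationships between objects.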
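Graph-based data describes relationships between objects, such as who is connected to whom in a social network. The sketch below is one minimal way to hold such data, assuming a plain dictionary of sets as the representation. The names and the helper friends_of_friends are hypothetical.

```python
# Hypothetical social network: each key is a person (node) and each set holds
# the people they are directly connected to (edges).
friends = {
    "Ann": {"Bob", "Cara"},
    "Bob": {"Ann", "Dan"},
    "Cara": {"Ann", "Dan"},
    "Dan": {"Bob", "Cara", "Eve"},
    "Eve": {"Dan"},
}

def friends_of_friends(graph, person):
    """People reachable through a friend who are not already direct friends."""
    direct = graph[person]
    two_steps = set()
    for friend in direct:
        two_steps |= graph[friend]
    return two_steps - direct - {person}

print(sorted(friends_of_friends(friends, "Ann")))
```

Questions like this one depend on the connections between records rather than on the fields of any single record, which is what sets graph data apart.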
Streaming data
• While streaming data can take almost any of the previous forms, it has an extra property.
• The data flows into the system when an event happens instead of being loaded into a data store
in a batch.
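The difference between event-driven and batch processing can be sketched with a Python generator. The readings and the function names event_stream and running_mean are hypothetical; a real stream would come from a socket, message queue, or API.

```python
def event_stream():
    """Hypothetical source of events, yielding one reading at a time."""
    for reading in [21.0, 22.5, 19.5, 23.0]:
        yield reading

def running_mean(stream):
    """Yield the mean of all values seen so far, updated as each event arrives."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

# Streaming: a result is available after every event.
streaming_means = list(running_mean(event_stream()))

# Batch: all values are loaded first, then processed once.
batch = list(event_stream())
batch_mean = sum(batch) / len(batch)

print(streaming_means, batch_mean)
```

Both approaches end at the same mean, but the streaming version keeps only a count and a total in memory and produces an up-to-date answer after each event.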
The big data ecosystem
Check your understanding
• What is the need for data science?
• What are the facets of data?
Summary
• Need for data science
• Benefits and uses
• Facets of data
• Big data ecosystem
Reference
• Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science: Big Data, Machine Learning,
and More, Using Python Tools”, Manning Publications, 2016.