Scalable Data Streaming with Amazon Kinesis: Design and secure highly available, cost-effective data streaming applications with Amazon Kinesis
About this ebook
Amazon Kinesis is a collection of secure, serverless, durable, and highly available purpose-built data streaming services. These data streaming services provide APIs and client SDKs that enable you to produce and consume data at scale.
Scalable Data Streaming with Amazon Kinesis begins with a quick overview of the core concepts of data streams, along with the essentials of the AWS Kinesis landscape. You'll then explore the requirements of the use cases shown throughout the book to help you get started, and cover the key pain points encountered in the data stream life cycle. As you advance, you'll get to grips with the architectural components of Kinesis, understand how they are configured to build data pipelines, and delve into the applications that connect to them for consumption and processing. You'll also build a Kinesis data pipeline from scratch and learn how to implement and apply practical solutions. Moving on, you'll learn how to configure Kinesis on a cloud platform. Finally, you'll learn how other AWS services can be integrated with Kinesis. These services include Amazon Redshift, Amazon DynamoDB, Amazon S3, and Amazon Elasticsearch Service, as well as third-party applications such as Splunk.
By the end of this AWS book, you'll be able to build and deploy your own Kinesis data pipelines with Kinesis Data Streams (KDS), Kinesis Firehose (KFH), Kinesis Video Streams (KVS), and Kinesis Data Analytics (KDA).
Scalable Data Streaming with Amazon Kinesis - Tarik Makota
BIRMINGHAM—MUMBAI
Scalable Data Streaming with Amazon Kinesis
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kunal Parikh
Publishing Product Manager: Devika Battike
Senior Editor: Mohammed Yusuf Imaratwale
Content Development Editors: Sean Lobo and Tazeen Shaikh
Technical Editor: Devanshi Deepak Ayare
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Shankar Kalbhor
First published: March 2021
Production reference: 1300321
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80056-540-1
www.packt.com
Contributors
About the authors
Tarik Makota hails from a small town in Bosnia. He is a principal solutions architect with AWS, a builder, a writer, and the self-proclaimed best fly fisherman at AWS. Never a perfect student, he managed to earn an MSc in software development and management from RIT. When he is not working in the cloud or writing, Tarik spends most of his time fly fishing in pursuit of slippery trout. He feeds his addiction by spending summers in Montana. Tarik lives in New Jersey with his family, Mersiha, Hana, and two exceptionally perfect dogs.
Brian Maguire is a solutions architect at AWS, where he is focused on helping customers build solutions in the cloud. He is a technologist, writer, teacher, and student who loves learning. Brian lives in New Hope, Pennsylvania, with his family, Lorna, Ciara, Chris, and several cats.
Danny Gagne is a solutions architect at AWS. He has extensive experience in the design and implementation of large-scale, high-performance analysis systems. He lives in New York City.
Rajeev Chakrabarti is a principal developer advocate with the Amazon Kinesis and the Amazon MSK team. He has worked for many years in the big data and data streaming space. Before joining the Amazon Kinesis team, he was a streaming specialist solutions architect helping customers build streaming pipelines. He lives in New Jersey with his family, Shaifalee and Anushka.
About the reviewers
Ritesh Gupta works as a software development manager with AWS, leading the control plane and data plane teams on the Kinesis Data Streams service. He has over 20 years of experience in leading and delivering geographically distributed web-scale applications and highly available distributed systems supporting millions of transactions per second; he has 10 years of experience in managing engineers and managers. Prior to Amazon, he worked at Microsoft, EA Games, Dell, and a few successful start-ups. His technical expertise cuts across building web-scale applications, enterprise software, and big data. I thank my wife, Jyothi, and daughter, Udita, for putting up with the late-night learning sessions that have allowed me to be where I am.
Randy Ridgley is an experienced technology generalist working with organizations in the media and entertainment, casino gaming, and public sector fields that are looking to adopt cloud technologies. He started his journey into software development at a young age, building BASIC programs on the Commodore 64. In his professional career, he started by building Windows applications, eventually graduating to Linux with multiple programming languages. Currently, you can find Randy spending most of his time building end-to-end real-time streaming solutions on AWS using serverless technologies and IoT.
Table of Contents
Preface
Section 1: Introduction to Data Streaming and Amazon Kinesis
Chapter 1: What Are Data Streams?
Introducing data streams
Sources of data
The value of real-time data in analytics
Decoupling systems
Challenges associated with distributed systems
Transactions per second
Scaling
Latency
Fault tolerance/high availability
Overview of messaging concepts
Overview of core messaging components
Messaging concepts
Examples of data streaming
Application log processing
Internet of Things
Real-time recommendations
Video streams
Summary
Further reading
Chapter 2: Messaging and Data Streaming in AWS
Amazon Kinesis Data Streams (KDS)
Encryption, authentication, and authorization
Producing and consuming records
Data delivery guarantees
Integration with other AWS services
Monitoring
Amazon Kinesis Data Firehose (KDF)
Encryption, authentication, and authorization
Monitoring
Producers
Delivery destinations
Transformations
Amazon Kinesis Data Analytics (KDA)
Amazon KDA for SQL
Amazon Kinesis Data Analytics for Apache Flink (KDA Flink)
Amazon Kinesis Video Streams (KVS)
Amazon Simple Queue Service (SQS)
Amazon Simple Notification Service (SNS)
Amazon SNS integrations with other AWS services
Encryption at rest
Amazon MQ for Apache ActiveMQ
IoT Core
Device software
Control services
Analytics services
Amazon Managed Streaming for Apache Kafka (MSK)
Apache Kafka
Amazon MSK
Amazon EventBridge
Service comparison summary
Summary
Chapter 3: The SmartCity Bike-Sharing Service
The mission for sustainable transportation
SmartCity new mobile features
SmartCity data pipeline
SmartCity data lake
SmartCity operations and analytics dashboard
SmartCity video
The AWS Well-Architected Framework
Summary
Further reading
Section 2: Deep Dive into Kinesis
Chapter 4: Kinesis Data Streams
Technical requirements
Discovering Amazon Kinesis Data Streams
Creating streams and shards
Creating a stream producer application
Creating a stream consumer application
Data pipelines with Amazon Kinesis Data Streams
Data pipeline design (simple)
Data pipeline design (intermediate)
Data pipeline design (full design)
Designing for scalable and reliable analytics pipelines
Monitoring and scaling with Amazon Kinesis Data Streams
X-Ray tracing with Amazon Kinesis Data Streams
Scaling up with Amazon Kinesis Data Streams
Securing Amazon Kinesis Data Streams
Implementing least-privilege access
Summary
Further reading
Chapter 5: Kinesis Firehose
Technical requirements
Setting up the AWS account
Using a local development environment
Using an AWS Cloud9 development environment
Code examples
Discovering Amazon Kinesis Firehose
Understanding KDF delivery streams
Understanding encryption in KDF
Using data transformation in KDF with a Lambda function
Understanding delivery stream destinations
Amazon S3
Amazon Redshift
Amazon Elasticsearch Service
Splunk destination
HTTP endpoint destination
Understanding data format conversion in KDF
Deserialization
Schema
Serializer
Data format conversion errors
Understanding monitoring in KDF
Use-case example – Bikeshare station data pipeline with KDF
Steps to recreate the example
Summary
Further reading
Chapter 6: Kinesis Data Analytics
Technical requirements
AWS account setup
AWS CDK
Java and Java IDE
Code examples
Discovering Amazon KDA
Working on SmartCity bike share analytics use cases
Creating operational insights using SQL Engine
Core concepts and capabilities
Creating operational insights using Apache Flink
Options for running Flink applications in AWS Cloud
Flink applications on KDA
Building bike ride analytic applications
Setting up a producer application
Building a KDA SQL application
Building a KDA Flink application
Monitoring KDA applications
Summary
Further reading
Blogs
Workshops
Chapter 7: Amazon Kinesis Video Streams
Technical requirements
AWS account setup
Using a local development environment
Code examples
Understanding video fundamentals
Containers
Codecs
Discovering Amazon Kinesis Video Streams WebRTC
Core concepts and connection patterns
Creating a signaling channel
Establishing a connection
Discovering Amazon KVS
Key components of KVS
Stream
Kinesis producer
Consuming
Creating a stream
Producing
Integration with Rekognition
Building video-enabled applications with KVS
Summary
Further reading
Section 3: Integrations
Chapter 8: Kinesis Integrations
Technical requirements
AWS account setup
AWS CLI
Kinesis Data Generator
Code examples
Amazon services that can produce data to send to Kinesis
Amazon Connect
Amazon Aurora database activity
DynamoDB activity
Processing Kinesis data with Apache Spark
Amazon services that consume data from Kinesis
Serverless data lake
Amazon services that transform Kinesis data
Routing events with EventBridge
Third-party integrations with Kinesis
Splunk
Summary
Further reading
Why subscribe?
Other Books You May Enjoy
Preface
Amazon Kinesis is a collection of secure, serverless, durable, and highly available purpose-built data streaming services. These data streaming services provide APIs and client SDKs to enable you to produce and consume data at scale.
Scalable Data Streaming with Amazon Kinesis begins with a quick overview of the core concepts of data streams along with the essentials of the AWS Kinesis landscape. You'll then explore the requirements of the use cases shown throughout the book to help you get started, and cover the key pain points encountered in the data stream life cycle. As you advance, you'll get to grips with the architectural components of Kinesis, understand how they are configured to build data pipelines, and delve into the applications that connect to them for consumption and processing. You'll also build a Kinesis data pipeline from scratch and learn how to implement and apply practical solutions. Moving on, you'll learn how to configure Kinesis on a cloud platform. Finally, you'll learn how other AWS services can be integrated with Kinesis. These services include Amazon Redshift, Amazon DynamoDB, Amazon S3, and Amazon Elasticsearch Service, as well as third-party applications such as Splunk.
By the end of this AWS book, you'll be able to build and deploy your own Kinesis data pipelines with Kinesis Data Streams (KDS), Kinesis Firehose (KFH), Kinesis Video Streams (KVS), and Kinesis Data Analytics (KDA).
Who this book is for
This book is for solutions architects, developers, system administrators, data engineers, and data scientists looking to evaluate and choose the most performant, secure, scalable, and cost-effective data streaming technology to overcome their data ingestion and processing challenges on AWS. Prior knowledge of cloud architectures on AWS, data streaming technologies, and architectures is expected.
What this book covers
Chapter 1, What Are Data Streams?, covers core streaming concepts so that you will have a detailed understanding of their application in distributed systems.
Chapter 2, Messaging and Data Streaming in AWS, takes a brief look at the ecosystem of AWS services in the messaging space. After reading this chapter, you will have a good understanding of the various services, be able to differentiate them, and understand the strengths of each service.
Chapter 3, The SmartCity Bike-Sharing Service, reviews the existing bike-sharing application and how the city plans to modernize it. This chapter will provide the background information for the examples used throughout the book.
Chapter 4, Kinesis Data Streams, teaches concepts and capabilities, common deployment patterns, monitoring and scaling, and how to secure KDS. We will step through a data streaming solution that will ingest, process, and feed data from multiple SmartCity data systems.
Chapter 5, Kinesis Firehose, teaches the concepts, common deployment patterns, monitoring and scaling, and security in KFH.
Chapter 6, Kinesis Data Analytics, covers the concepts and capabilities, approaches for common deployment patterns, monitoring and scaling, and security in KDA. You will learn how real-time streaming data can be queried like a database with SQL or code.
Chapter 7, Amazon Kinesis Video Streams, explores the concepts, monitoring and scaling, security, and deployment patterns for real-time communication and data ingestion. We will step through a solution that will provide real-time access to a video stream and ingest video data for the SmartCity data system.
Chapter 8, Kinesis Integrations, reviews how to integrate Kinesis with several Amazon services, such as Amazon Redshift, Amazon DynamoDB, AWS Glue, Amazon Aurora, Amazon Athena, and other third-party services such as Splunk. We will integrate a wide variety of services to create a serverless data lake.
To get the most out of this book
All of the examples in the chapters in this book are run using an AWS account to access services such as Amazon Kinesis, DynamoDB, and Amazon S3. Readers will need a Windows, Mac, or Linux computer with an internet connection. Many of the examples in the book use a command-line terminal such as PuTTY, macOS Terminal, GNOME Terminal, or iTerm2 to run commands and change configuration. The examples written in Python are written for the Python 3 interpreter and may not work with Python 2. For the examples written for the Java platform, readers are encouraged to use Java version 11 and AWS Java SDK version 1.11. We make extensive use of the AWS CLI v2 and will also use Docker for some examples. In addition to software, a webcam or IP camera and Android device will be needed to fully execute some of the examples.
If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Streaming-Data-Solutions-with-Amazon-Kinesis. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800565401_ColorImages.pdf.
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: In this command, we'll send the test2.mkv file we downloaded to the KVS stream.
A block of code is set as follows:
aws glue create-database --database-input {\"Name\":\"smartcitybikes\"}
aws glue create-table --database-name smartcitybikes --table-input file://SmartCityGlueTable.json
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
mediaSource.start();
Any command-line input or output is written as follows:
aws rekognition start-stream-processor --name kvsprocessor
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Once you have entered the appropriate information, all that's left is to click Create signaling channel.
Tips or important notes
Appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Section 1: Introduction to Data Streaming and Amazon Kinesis
In this section, you will be introduced to the concept of data streams and how they are used to create scalable data solutions.
This section comprises the following chapters:
Chapter 1, What Are Data Streams?
Chapter 2, Messaging and Data Streaming in AWS
Chapter 3, The SmartCity Bike-Sharing Service
Chapter 1: What Are Data Streams?
A data stream is a system in which data continuously flows from multiple sources, just like water flows through a stream. The data is often produced and collected simultaneously in a continuous flow of many small files or records. Data streams are utilized by a wide range of business, medical, government, social media, and mobile applications. These applications include financial applications for the stock market and e-commerce ordering systems that collect orders and track their fulfillment and delivery.
In the entertainment space, sensing devices embedded in player equipment produce live data, video game players generate data at massive scale, and social media sees thousands of new posts every second. Governments also leverage streaming data and geospatial services to monitor land, wildlife, and other activities.
Data volume and velocity are increasing at faster rates, creating new challenges in data processing and analytics. This book will detail these challenges and demonstrate how Amazon Kinesis can be used to address them. We will begin by discussing key concepts related to messaging in a technology-agnostic form to provide a solid foundation for building your Kinesis knowledge.
Incorporating data streams into your application architecture will allow you to deliver high-performance solutions that are secure, scalable, and fast. In this chapter, we will cover core streaming concepts so that you will have a detailed understanding of their application to distributed systems. You will learn what a data stream is, how to leverage data streams to scale, and examine a number of high-level use cases.
This chapter covers the following topics:
Introducing data streams
Challenges associated with distributed systems
Overview of messaging concepts
Examples of data streaming
Introducing data streams
Data streams are a way of storing a sequence of messages. They enable us to design systems where we think about state as a series of events instead of only entities and values, or rows and columns in a database. This shift in mindset and technology enables real-time analytics to extract the value from data by acting on it before it is stale. They also enable organizations to design and develop resilient software based on microservice architectures by helping them to decouple systems. We will begin with an overview of streaming data sources, why real-time data analysis is valuable, and how they can be used architecturally to decouple systems. We will then review the core challenges associated with distributed systems, and conclude with an overview of key messaging concepts and some high-level examples. Messages can contain a wide variety of information and come from different sources, so let's look at the primary sources and data formats.
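The idea of treating state as a series of events rather than as rows and columns can be sketched in a few lines of Python. This is our own illustrative example (the account-balance domain and the event names are assumptions, not from the book): replaying the ordered event stream reconstructs the current state that a conventional database table would store as a single, latest value.

```python
from typing import Iterable

# A hypothetical stream of account events. A row-oriented store would
# keep only the final balance; the stream retains the full history.
events = [
    {"type": "deposit", "amount": 100},
    {"type": "withdraw", "amount": 30},
    {"type": "deposit", "amount": 50},
]

def replay(events: Iterable[dict]) -> int:
    """Fold the event stream into the current state (the balance)."""
    balance = 0
    for event in events:
        if event["type"] == "deposit":
            balance += event["amount"]
        elif event["type"] == "withdraw":
            balance -= event["amount"]
    return balance

print(replay(events))  # 120
```

Because the events themselves are preserved, a consumer added later can replay the same stream to compute something entirely different, which is what makes streams useful for decoupling systems.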
Sources of data
Data continues to proliferate from sources such as social media, IoT devices, web clickstreams, application logs, and video cameras. This data poses challenges to most systems, since it is typically high-velocity, intermittent, and bursty, making it difficult to adequately provision and design downstream systems. Payloads are generally small, except when containing audio or video data, and come in a variety of formats.
In this book, we will be focusing on three data formats. These formats include the following:
JavaScript Object Notation (JSON)
Log files
Time-encoded binary files such as video
JSON streams
JSON has become the dominant format for message serialization over the past 10 years. It is a lightweight data interchange format that is easy for humans to read and write and is based on the JavaScript object syntax. It has two data structures – hash tables and lists. A hash table consists of key-value pairs, {"key": value}, where the keys must be unique. A list is a set of values in a specific order, [value1, value2]. The following code sample shows a sample IoT JSON message:

{
    "deviceid": "device001",
    "eventTime": -192778200,
    "temp": 68.4,
    "humidity": 77.3,
    "coords": {
        "latitude": 32.779039,
        "longitude": -96.808660
    }
}
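As a minimal sketch of how such a message moves through a producer or consumer, the following uses Python's standard json module (the variable names are ours): a consumer deserializes the payload into a dictionary, and a producer serializes it back to bytes before putting it on a stream.

```python
import json

# The IoT reading from the sample above, as it might arrive on a stream.
payload = (
    '{"deviceid": "device001", "eventTime": -192778200, '
    '"temp": 68.4, "humidity": 77.3, '
    '"coords": {"latitude": 32.779039, "longitude": -96.808660}}'
)

# Deserialize: str -> dict, giving keyed access to nested fields.
record = json.loads(payload)
print(record["deviceid"], record["coords"]["latitude"])

# Serialize back to bytes; stream producer APIs generally accept a raw
# byte payload, so this is the form a record takes on the wire.
data = json.dumps(record).encode("utf-8")
```

The round trip is lossless, which is one reason JSON works well for small, frequently produced records like sensor readings.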
Log file streams
Log files come in a variety of formats. Common ones include Apache Commons Logging, Apache Combined Log, Apache Error Log, and RFC3164 Syslog. They are plain text, and usually each line, delineated by a newline ('\n') character, is a separate log entry. In the following sample log, we see an HTTP GET request: the client IP address (10.13.37.01), the datetime of the request, the HTTP verb, the URL fragment, the HTTP version, the response code, and the size of the result.
The sample log line in Apache Commons Logging format is as follows:
10.13.37.01 - - [03/Sep/2017:12:00:01 +0830] "GET /mailman/listinfo/test HTTP/1.1" 200 2457
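A stream consumer typically splits such a line into its fields before analysis. Here is a minimal sketch using Python's re module; the field names in the pattern are our own labels for the parts described above, not part of any standard.

```python
import re

# Regex for an Apache Common Log style line. The group names (ip, verb,
# and so on) are illustrative labels we chose for the fields.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"?(?P<verb>\S+) (?P<path>\S+) (?P<proto>\S+?)"? '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

line = ('10.13.37.01 - - [03/Sep/2017:12:00:01 +0830] '
        '"GET /mailman/listinfo/test HTTP/1.1" 200 2457')

m = LOG_PATTERN.match(line)
assert m is not None
print(m.group("ip"), m.group("verb"), m.group("status"))  # 10.13.37.01 GET 200
```

In a real pipeline this parsing step would run per record as log lines arrive on the stream, emitting structured fields for downstream aggregation.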
Time-encoded binary streams
Time-encoded binary streams consist of a time series of records where each record is related to the adjacent records (prior and subsequent records). These can be used for a wide variety of sensor data, from audio streams and RADAR signals to video streams. Throughout this book, the primary focus will be video streams and their applications.
Figure 1.1 – Time-encoded video data
As shown in Figure 1.1, video streams are composed of fragments, where each fragment is a self-contained sequence of media frames. There are no dependencies between fragments. We will discuss video streams in more detail in Chapter 7, Kinesis Video Streams. Now that we've covered the types of data that we'll be processing, let's take a step back to understand the value of real-time data in analytics.
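A rough way to picture the fragment structure just described is the following Python sketch. The field names are ours and do not reflect the actual KVS wire format; the point is only that frames are grouped into fixed-size, self-contained fragments with no dependencies between them.

```python
# Illustrative model of a fragmented video stream: 90 frames at roughly
# 30 fps, with one timestamped frame every 33 ms.
frames = [{"ts": i * 33, "data": b"..."} for i in range(90)]

def fragment(frames, frames_per_fragment=30):
    """Group frames into self-contained fragments of fixed size."""
    return [
        {"start_ts": chunk[0]["ts"], "frames": chunk}
        for chunk in (
            frames[i:i + frames_per_fragment]
            for i in range(0, len(frames), frames_per_fragment)
        )
    ]

fragments = fragment(frames)
print(len(fragments))  # 3 fragments of about one second each
```

Because each fragment carries everything needed to decode it, a consumer can start reading from any fragment boundary rather than from the beginning of the stream.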
The value of real-time data in analytics
Analysis is done to support decision making by individuals, organizations, or computer programs. Traditionally, data analysis has been done on batches of data, usually in long-running jobs that occur overnight and that happen periodically at predetermined times: nightly, weekly, quarterly, and so on. This not only limits the scope of actions available to decision makers, but also provides them with only a representation of the past environment. Information is now available seconds after it is produced, so we need to design systems that provide decision makers with the freshest data available to make timely decisions.
The OODA (Observe, Orient, Decide, Act) loop is a conceptual decision-making framework that describes how decisions are made when reacting to an event. By breaking it down into these four components,