Introduction to AWS Quicksight
In this article, I would be explaining the high level architecture of the AWS Quicksight Ecosystem and would be doing a step by step process of adding and manipulating your data source that is sourced by S3 storage or a direct upload.
What is AWS Quicksight?
AWS Quicksight is service that runs on the Amazon Web Services that allows the creation of production level business intelligence dashboards. The service uses “Super-fast, Parallel, In-memory Calculation Engine” which is known as SPICE to perform the computations and create graphs.
Quicksight Architecture
Quicksight has its own unique architecture when compared with its competitors such as Microsoft Power BI and Tableau. The below diagram describes the high level architecture of this service.
A separate storage capacity could be allocated to SPICE. Data can initially loaded into spice and QuickSight will access this cached data very quickly and efficiently.
Data could be accessed using direct query rather than importing to SPICE as well but this would be very inefficient and time consuming because you would be retrieving data from the database every time you change or update an analysis.
Creating My First Data Source
I would be creating a data source based on the COVID-19 Corona virus data.
Here are the links to the CSV Data sets I have used :
Covid-19 Data set :- https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data
Population Data set:- https://www.kaggle.com/tanuprabhu/population-by-country-2020
Step 1 :- Go to Manage Data
Go to AWS Quicksight Home page and select on the manage data button on the upper right corner as shown below
Step 2 :- Go to New data set
Select the New Data Set on the upper left corner
Step 3 :- Upload a File/S3 Path
In here you would be able to choose the source of the data. It could a S3, Redshift, SQL sever or even a direct upload. I’ll be showing how you how to use S3 or direct upload file as the sources of data.
- Simple Storage System (S3)
Access must be given to the S3 bucket from Quicksight.
- Change your region to N.Virginia for duration of this setting change only.
- Go to Manage Quicksight -> Security & Permissions -> Add or remove
- Click on Details to pop up a window which will allow you to select the bucket to give permissions.
- Select the relevant bucket and click on update.
- Change the region back to your original region
We are now ready to add a data source via S3
- Select S3 from create a new Data set window you will be prompted with the below view
- Create a file named “manifest.json” file locally. Copy and paste the below configuration onto the file and replace the URL with S3 path to your CSV.
{
"fileLocations": [
{
"URIs": [
"https://xxxxx.s3.amazonaws.com/corona.csv"
]
}],
"globalUploadSettings": {
"format": "CSV",
"delimiter": ",",
"containsHeader": "true"
}
}
- Upload the the manifest file
2. Direct Upload
When you use this option, you will be asked to select the file from the file explorer of your OS.
Step 4:- Select Edit Settings and prepare data
You will be now getting a view of all the columns and rows of the data source.
You will be able to
- Add new columns (Calculated Columns) with new functions
- Change data type of existing columns
- Perform joins (Inner, Left, Right or Full) with the same or a different data source.
- Add filters to filter out the rows
I’ll be demonstrating an example for each of the above cases
Adding a New Column
We would be concatenating two columns for this case
Field name : geod_id_country_code
Formula : concat({geoId},"+",{countryterritoryCode})
Eg : AF+AFG
For this case ensure that Quicksight recognizes the dateRep field as a Date type, if not enable it as shown below.
Note that Quicksight might not given accurate date values because of incorrect auto - casting so follow the steps below as well.
“dateRep” is in dd/MM/yyyy and for this case quicksight detects it as MM/dd/yy, so we are going to amend it.
In the pop up window give the format as dd/MM/yyyy
Now add another calculated field :
Field name : date_diff_present
Formula : dateDiff({dateRep},now())
Output Eg : 1
The above formula give the difference between the date in dataRep column and the current date in days.
Performing a Join
When add data is selected, you will be provided with an interface to upload or select a data source.
Here I uploaded another separate data source that has the most recent populations for each country and performed a left join between the two data sources.
- A combined list of columns would be generated that could be used for calculated fields.
Adding Row Level Filters
Imagine a certain Scenario where you do not need your analysis to contain any data regarding “Armenia” country.
Simplest solution is applying a filter. Click on the countries and Territories
Below window would be response for the above action. Add the following details as shown below to filter out “Armenia”
Step 4:- Save data set
Click on “Save”
CONGRATULATIONS !!
You just created your first Data Source
You can create your analysis based on this data source which can be transformed into a Dashboard. Stay tuned for my next article on Creating a analysis based on this data set.