OHDSI - OHDSIonAWS - Automation Code and Documentation For Standing Up The OHDSI Toolstack in An AWS Environment


OHDSI / OHDSIonAWS

Automation code and documentation for standing up the OHDSI toolstack in an AWS environment.

Clone: https://github.com/OHDSI/OHDSIonAWS.git
License: Apache-2.0
Languages: Shell 100.0%
Contributors: JamesSWiggins, delur407, pbr6cornell

Enterprise, Multi-User OHDSI on AWS

Quick Start

This repository allows you to quickly deploy an enterprise-class, multi-user, scalable, and fault-tolerant OHDSI environment on AWS using the latest OHDSI tools. Choose Internet Accessible to launch an OHDSI environment that users can access over the Internet. Choose Private Network Only to specify your own Amazon Virtual Private Cloud (VPC), making the OHDSI environment accessible only from your organization's private network. If you are interested in a personal OHDSI learning environment, you may be better served by the OHDSI-in-a-box project.

Deployment Walk Through Video


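| AWS Region Code | Name | Launch Internet Accessible Environment | Launch Private Network Only Environment |
|---|---|---|---|
| cn-northwest-1 | China (Ningxia) | Launch Stack | Launch Stack |
| ap-northeast-2 | AP (Seoul) | Launch Stack | Launch Stack |
| ap-southeast-1 | AP (Singapore) | Launch Stack | Launch Stack |
| ap-northeast-1 | AP (Tokyo) | Launch Stack | Launch Stack |
| ap-southeast-2 | AP (Sydney) | Launch Stack | Launch Stack |
| eu-west-1 | EU (Ireland) | Launch Stack | Launch Stack |
| eu-central-1 | EU (Frankfurt) | Launch Stack | Launch Stack |
| eu-west-2 | EU (London) | Launch Stack | Launch Stack |
| eu-west-3 | EU (Paris) | Launch Stack | Launch Stack |
| eu-north-1 | EU (Stockholm) | Launch Stack | Launch Stack |
| ca-central-1 | Canada | Launch Stack | Launch Stack |
| us-east-1 | US East (N. Virginia) | Launch Stack | Launch Stack |
| us-east-2 | US East (Ohio) | Launch Stack | Launch Stack |
| us-west-1 | US West (N. California) | Launch Stack | Launch Stack |
| us-west-2 | US West (Oregon) | Launch Stack | Launch Stack |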

Included OHDSI Projects

| OHDSI Component | Version |
|---|---|
| OMOP Common Data Model | v5.3.1 |
| Atlas | v2.8.0 |
| WebAPI | v2.8.0 |
| Achilles | v1.6.3 |
| PatientLevelPrediction | v4.0.5 |
| CohortMethod | v4.0.1 |
| SqlRender | v1.6.8 |
| DatabaseConnector | v3.0.0 |
| DatabaseConnectorJars | v1.1.0 |
| OhdsiRTools | v1.9.1 |
| FeatureExtraction | v3.1.0 |
| Cyclops | v3.0.0 |
| EmpiricalCalibration | v2.0.2 |
| OhdsiSharing | v0.2.2 |
| MethodEvaluation | v2.0.0 |
| Hydra | v0.1.1 |
| PredictionComparison | v1.0.0 |
| Eunomia | v1.0.1 |
| BigKNN | v1.0.0 |
| Andromeda | v0.4.0 |
| SelfControlledCaseSeries | v2.0.0 |
| SelfControlledCohort | v1.5.0 |
| EvidenceSynthesis | v0.2.3 |
| CohortDiagnostics | v1.2.7 |

Included Sample Data Sources

| Sample Data Source | Size | Schema Name |
|---|---|---|
| Synthea | 1,000 persons | synthea1k |
| Synthea | 100,000 persons | synthea100k |
| Synthea | >2,000,000 persons | synthea23m |
| CMS DeSynPUF | 1,000 persons | CMSDESynPUF1k |
| CMS DeSynPUF | 100,000 persons | CMSDESynPUF100k |
| CMS DeSynPUF | >2,000,000 persons | CMSDESynPUF23m |

Topics

- OHDSI on AWS architecture and features
- OHDSI on AWS deployment instructions
- Using RStudio
- Troubleshooting Deployments
- Ongoing Operations

OHDSI on AWS architecture and features

The features of using this architecture are as follows:

- It’s deployed in an isolated, three-tier Amazon Virtual Private Cloud (Amazon VPC).
- It can be deployed with access from the public Internet, or accessible only from within your organization's private network.
- It deploys the OMOP CDM with clinical and vocabulary data, Atlas, WebAPI, Achilles, RStudio, and Jupyter Notebooks with PatientLevelPrediction, CohortMethod, and many other R libraries.
- It provides role-based access control for Atlas, RStudio, and Jupyter Notebooks.
- It uses data-at-rest and in-flight encryption to respect the requirements of HIPAA.
- It uses managed services from AWS; OS, middleware, and database patching and maintenance are largely automatic.
- It creates automated backups for operational and disaster recovery.
- It’s built automatically in just a few hours.
- Environments can be configured from very small to petabyte-scale, geographically redundant implementations by providing different deployment parameters.
- The design results in a reasonable monthly cost.

A high-level diagram showing how the different components of OHDSI map to AWS services is shown below.

Internet Accessible v. Private Network Only Architectures

There are two versions of the OHDSIonAWS architecture: one that allows access to your OHDSI environment from the Internet, and one that only allows access from your organization's private network. Both versions use the same underlying technical architecture for OHDSI and differ only in how users reach the tools' web interfaces.
With the Internet Accessible version, the OHDSIonAWS
architecture is deployed in an AWS three-tiered VPC, with
Internet access, using network address ranges that you
specify. You can restrict who can access your OHDSI
environment over the Internet by providing an allowed
public IP address range as a parameter when the
environment is deployed. This version is useful for
environments with non-sensitive data, public-facing
demonstrations, training environments, or test
deployments.

With the Private Network Only version, the OHDSIonAWS architecture is deployed into a pre-existing AWS VPC that you specify during deployment. This can be a VPC that you designed and deployed, or one provided by your organization's central IT team. The VPC you specify can be connected back to your organization's on-premises internal network using a VPN connection or an AWS Direct Connect connection. The OHDSI environment is then deployed using only internal, private IP addressing and can only be accessed from within your organization's internal, private network. This version is useful for production analytics environments containing sensitive data.

General Architecture Description


Starting from the user, public Internet DNS services are (optionally) provided by Amazon Route 53. This lets you automatically add a domain to an existing hosted zone in Route 53 (e.g. ohdsi.example.edu if example.edu is already hosted in Route 53). In addition, if you are deploying a new domain to Route 53, an SSL certificate can be automatically generated and applied using AWS Certificate Manager (ACM). This enables HTTPS communication and ensures that data sent from users is encrypted in transit (in accordance with HIPAA). HTTPS is also used between the Application Load Balancers and the Atlas/WebAPI servers.

The default configuration of this environment is to have your Atlas and RStudio instances accessible over the public Internet. A parameter is provided to limit the source IP addresses that can access these resources, but for use with sensitive data this environment should be deployed in a completely private network and only accessed via a VPN connection, AWS Direct Connect connection, or VPC Peering connection. For more information about how to configure your deployment in this way, please reach out to your AWS account team.

AWS Elastic Beanstalk is used to deploy the OHDSI application onto Apache/Tomcat Linux servers. Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications. It covers everything from capacity provisioning, load balancing, regular OS and middleware updates, autoscaling, and high availability, to application health monitoring. Using a feature of Elastic Beanstalk called ebextensions, the Atlas/WebAPI servers are customized to use an encrypted storage volume for the middleware application logs.

Amazon Relational Database Service (RDS) with Amazon Aurora PostgreSQL provides an (optionally) highly available WebAPI database. Amazon Aurora is a relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. It can be configured for high availability, and it uses encryption at rest for the database and its backups, and encryption in flight for the JDBC connections.

Amazon Redshift is used to store the OMOP CDM that contains your patient-level observational health data as well as your vocabulary tables. Amazon Redshift is a fast, fully managed data warehouse that allows you to run complex analytic queries against petabytes of structured data. It uses sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.

Amazon Elastic Compute Cloud (EC2) is used to host a multi-user RStudio Server Open Source Edition instance. RStudio is a web-based integrated development environment (IDE) for working with R. It can be licensed either commercially or under AGPLv3. On the RStudio instance, an encrypted Amazon Elastic Block Store (EBS) volume is mounted as the home directory for all RStudio users' files.

Amazon SageMaker is used to build, train, and deploy machine learning models, developed with the OHDSI PatientLevelPrediction R package, that predict patient health outcomes. Amazon SageMaker is a fully managed service that covers the entire machine learning workflow: label and prepare your data, choose an algorithm, train it, tune and optimize it for deployment, make predictions, and take action. Your models get to production faster with much less effort and lower cost.

A more detailed, network-oriented diagram of this environment is shown below.

OHDSI on AWS deployment instructions

Before deploying an application on AWS that transmits,
processes, or stores protected health information (PHI) or
personally identifiable information (PII), address your
organization's compliance concerns. Make sure that you
have worked with your internal compliance and legal team
to ensure compliance with the laws and regulations that
govern your organization. To understand how you can use
AWS services as a part of your overall compliance
program, see the AWS HIPAA Compliance whitepaper.
With that said, we paid careful attention to the HIPAA
control set during the design of this solution.

Pre-flight check

If you intend to use Route 53 for DNS or ACM to provide an SSL certificate

Automatically provisioning and applying an SSL certificate with this CloudFormation template using ACM requires the use of Route 53 for your DNS service. If you have not already done so, create a new Route 53 hosted zone, transfer registration of an existing domain to Route 53, or transfer just your DNS service to Route 53.

If you do not intend to use Route 53 and ACM to automatically generate and provide an SSL certificate, an SSL certificate can be applied to your environment after it is deployed.

Ensure you have appropriate permissions and limits

0.1. This template must be run by an AWS IAM user who has sufficient permission to create the required resources. These resources include: VPC, IAM users and roles, S3 buckets, EC2 instances, ALB, Elastic Beanstalk, Redshift clusters, SageMaker models, Route 53 entries, and ACM certificates. If you are not an administrator of the AWS account you are using, please check with them before running this template to ensure you have sufficient permission.

0.2. This template will create two S3 buckets. By default, AWS accounts have a limit of 100 S3 buckets. If you are near that limit, please either delete some unused S3 buckets or request a limit increase before running this template.
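Both checks can be run from a terminal before launching. This is a minimal sketch, assuming the AWS CLI is installed and configured for the account and Region you will deploy into; the function names are hypothetical helpers, not part of this repository, and the 100-bucket default is taken from the text above (pass a different limit if your account's quota differs).

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not shipped with OHDSIonAWS.

# Check 0.1: print the ARN of the IAM identity the AWS CLI is using.
current_identity() (
  aws sts get-caller-identity --query Arn --output text
)

# Check 0.2: print how many S3 buckets the account already owns.
bucket_count() (
  aws s3api list-buckets --query 'length(Buckets)' --output text
)

# check_bucket_headroom USED [LIMIT]
# Succeeds if the two buckets the template creates still fit under LIMIT
# (default 100, the limit described above).
check_bucket_headroom() (
  used=$1
  limit=${2:-100}
  needed=2
  free=$((limit - used))
  if [ "$free" -lt "$needed" ]; then
    echo "only $free of $limit S3 buckets free; the template needs $needed" >&2
    exit 1
  fi
  echo "ok: $free buckets free"
)
```

Typical use is `current_identity` followed by `check_bucket_headroom "$(bucket_count)"`. The arithmetic is kept in a separate function so it can be checked without AWS credentials.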

Begin deployment

1. Begin the deployment process by clicking the Launch Stack button at the top of this page that matches the AWS Region you'd like to use. This will take you to the CloudFormation Management Console with the OHDSI CloudFormation template already specified. Then click the Next button in the lower-right corner.

2. The next screen will take in all of the parameters for your OHDSI environment. A description is provided for each parameter to help explain its function, and a more detailed description of how to use each parameter follows. At the top, provide a unique Stack Name.

General AWS parameters

| Parameter Name | Description |
|---|---|
| EC2 Key Pair | Required. You must choose a key pair. This will allow you SSH access into your WebAPI/Atlas and RStudio instances. To learn more about administering the instances in this OHDSI environment, see the On-going Operations section below. |
| Limit access to IP address range? | Required. This parameter allows you to limit the IP address range that can access your Atlas and RStudio servers. It uses CIDR notation. The default of 0.0.0.0/0 will allow access from any address. |
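Because the IP range parameter expects an IPv4 CIDR block, a quick local check can catch typos before a long deployment starts. The sketch below is a hypothetical helper, not something the template runs; it only validates the a.b.c.d/n shape (octets 0-255, prefix 0-32) and does not check that the address is aligned to the prefix.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 is a syntactically valid IPv4 CIDR block.
is_valid_ipv4_cidr() (
  cidr=$1
  case $cidr in
    */*) ;;
    *) exit 1 ;;
  esac
  addr=${cidr%/*}
  prefix=${cidr#*/}
  # Prefix: digits only, 0-32.
  case $prefix in
    '' | *[!0-9]*) exit 1 ;;
  esac
  [ "$prefix" -le 32 ] || exit 1
  # Address: exactly four dot-separated octets, each 0-255.
  # The function body is a subshell, so IFS and set -f do not leak out.
  set -f
  IFS=.
  set -- $addr
  [ $# -eq 4 ] || exit 1
  for octet in "$@"; do
    case $octet in
      '' | *[!0-9]*) exit 1 ;;
    esac
    [ "$octet" -le 255 ] || exit 1
  done
  exit 0
)

# Print a warning when the range is the wide-open default.
warn_if_open() {
  if [ "$1" = "0.0.0.0/0" ]; then
    echo "warning: 0.0.0.0/0 allows access from any address" >&2
  fi
}
```

For example, `is_valid_ipv4_cidr 203.0.113.0/24 && warn_if_open 203.0.113.0/24` passes silently, while `0.0.0.0/0` passes validation but prints the warning.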

DNS and SSL parameters

| Parameter Name | Description |
|---|---|
| Elastic Beanstalk Endpoint Name | Required. This unique name will be combined with the AWS Region identifier (e.g. us-east-1) to determine the web address for your Atlas and RStudio servers. The Elastic Beanstalk URL will be rendered as http://(EBEndpointName).(region).elasticbeanstalk.com. You can check whether an endpoint name is in use by looking for an existing DNS entry with the nslookup command from your Windows, macOS, or Linux terminal: nslookup (EBEndpoint).(region).elasticbeanstalk.com. If nslookup returns an IP address, the name is in use and you should pick a different name. You need to pick an Elastic Beanstalk Endpoint Name even if you are using a Route 53 DNS entry. |
| Web server port number | China only. Required. By default, China blocks inbound access to common ports like 80, 8080, and 443. If you need to use these ports, there is an exception process. |
| Use Route 53? | If you select True, a DNS record will automatically be created using the Route 53 parameters below. If you select False, the domain name assigned by Elastic Beanstalk will be used. |
| Apply SSL Certificate? | Requires the use of Route 53. If you select True, an SSL certificate will be automatically generated for your domain name using AWS Certificate Manager (ACM). If you select False, HTTP will be used and an SSL certificate can be applied after deployment. |
| Route53 Hosted Zone ID | Optional, only needed if using Route 53. The Route 53 hosted zone ID to create the site domain in (e.g. Z2FDTNDATAQYW2). You can find this value by looking up your hosted zone in the Route 53 Management Console. |
| Route53 Hosted Zone Domain Name | Optional, only needed if using Route 53. The Route 53 hosted zone domain name to create the site domain in (e.g. example.edu). You can find this value by looking up your hosted zone in the Route 53 Management Console. |
| Route53 Site Domain | Optional, only needed if using Route 53. The sub-domain name you want to use for your OHDSI implementation. This name will be prepended to your specified Hosted Zone Domain Name (e.g. ohdsi in ohdsi.example.edu). |
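The naming rules in this table are simple string compositions, so they can be scripted. This is a minimal sketch with hypothetical helper names; it assumes the AWS CLI for the zone lookup and relies on Route 53 reporting zone IDs with a `/hostedzone/` prefix and zone names with a trailing dot, both of which the helpers strip.

```shell
#!/bin/sh
# Hypothetical helpers for the DNS and SSL parameters.

# eb_endpoint_host NAME REGION -> NAME.REGION.elasticbeanstalk.com
eb_endpoint_host() {
  printf '%s.%s.elasticbeanstalk.com\n' "$1" "$2"
}

# site_fqdn SITE_DOMAIN HOSTED_ZONE_DOMAIN -> e.g. ohdsi.example.edu
# A trailing dot on the zone name (as Route 53 returns it) is dropped.
site_fqdn() {
  printf '%s.%s\n' "$1" "${2%.}"
}

# Route 53 reports IDs as /hostedzone/Z...; the parameter wants only Z...
hosted_zone_id() {
  printf '%s\n' "${1#/hostedzone/}"
}

# Look up the zone ID for an exact domain name (requires the AWS CLI).
lookup_hosted_zone_id() (
  zone=${1%.}
  aws route53 list-hosted-zones-by-name --dns-name "$zone" \
    --query "HostedZones[?Name=='$zone.'].Id" --output text
)
```

For example, `nslookup "$(eb_endpoint_host myohdsi us-east-1)"` performs the availability check described above, and `hosted_zone_id "$(lookup_hosted_zone_id example.edu)"` prints a value suitable for Route53 Hosted Zone ID (if the domain has both a public and a private zone, the lookup returns both IDs).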

Database Tier parameters

| Parameter Name | Description |
|---|---|
| Use Primary and Standby Database Instances? | Specifies whether to deploy the Amazon Aurora PostgreSQL WebAPI database in a Multi-AZ configuration. This provides a standby database for high availability in the event the primary database fails. |
| DB Instance Type | Determines the processing power and memory capacity of the WebAPI database. The default of r4.large should be sufficient for most environments. Details of other instance types can be found in the Amazon Aurora documentation. |
| Instance Type for Redshift cluster nodes | Determines the processing power and storage capacity of the OMOP CDM data warehouse. Additional speed and space can be added by choosing a larger instance type or by increasing the number of nodes (parameter below). Additional scaling details can be found in the Redshift documentation. |
| Number of nodes in your Redshift cluster | Required. The number of nodes determines the overall processing power and storage space of your OMOP CDM data warehouse. Additional scaling details can be found in the Redshift documentation. |
| Aurora PostgreSQL and Redshift master password | Required. This password will be used for the master user of the Aurora PostgreSQL WebAPI database and the Redshift OMOP CDM data warehouse. It must be 8-41 characters long and contain only letters (upper or lower case), numbers, and/or these special characters: ~#%^*_+,- |
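A candidate password can be checked against the stated rule before it is pasted into the console. The sketch below is a hypothetical helper that mirrors only the rule as written in this table (8-41 characters from the listed set); CloudFormation or the database services may apply further constraints.

```shell
#!/bin/sh
# Hypothetical check mirroring the stated master-password rule:
# 8-41 characters, only letters, digits, and ~ # % ^ * _ + , -
is_valid_master_password() (
  pw=$1
  len=${#pw}
  if [ "$len" -lt 8 ] || [ "$len" -gt 41 ]; then
    exit 1
  fi
  # Reject the password if any character falls outside the allowed set.
  case $pw in
    *[!A-Za-z0-9~#%^*_+,-]*) exit 1 ;;
  esac
  exit 0
)
```

Pass the candidate in single quotes, e.g. `is_valid_master_password 'Abc_123+xyz' && echo ok`, so the shell does not interpret characters such as `#` or `*`.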

OMOP Sources parameters

These parameters allow you to specify any number of OMOP-formatted data sources that will be automatically loaded into your OHDSI environment. After they are loaded, the Achilles project will be used to populate a Results schema for each source, enabling population-level visualizations within Atlas and also data quality feedback from Achilles Heel.

| Parameter Name | Description |
|---|---|
| Comma-delimited list of OMOP CDM schema sources to load into the Redshift datawarehouse | Comma-delimited list of OMOP CDM schema sources to load into the Redshift data warehouse. By default, this is set to CMSDESynPUF23m,CMSDESynPUF100k,CMSDESynPUF1k,synthea23m,synthe… which will load all six of the sample data sources. … sources you want in your environment. … data sources. |
| S3 Bucket that contains DDL SQL files named after each 'Source'.sql that will be executed to load data into the OMOP CDM schema sources. | S3 bucket that contains DDL SQL files named after each 'Source'.sql that will be executed to load data into the OMOP CDM schema sources. |

Included sample data sources


Ultimately, you will want to provide your own custom data sources, but to get started there are several sample, synthetic data sources available for you to use. They are listed in the table below.

| Sample Data Source | Size | Schema Name |
|---|---|---|
| Synthea | 1,000 persons | synthea1k |
| Synthea | 100,000 persons | synthea100k |
| Synthea | >2,000,000 persons | synthea23m |
| CMS DeSynPUF | 1,000 persons | CMSDESynPUF1k |
| CMS DeSynPUF | 100,000 persons | CMSDESynPUF100k |
| CMS DeSynPUF | >2,000,000 persons | CMSDESynPUF23m |

Using your own custom data sources

To configure your own custom data source, provide the schema names you want to use (e.g. CMSDESynPUF1k) and an S3 bucket that contains matching named files (e.g. CMSDESynPUF1k.sql) with Redshift-compatible SQL statements to load the OMOP tables. Examples of these load files can be found in this repository: CMSDESynPUF1k.sql and CMSDESynPUF23m. Please note that the top of each file must set the search path to the specified schema name (e.g. SET search_path to CMSDESynPUF1k;). The Redshift documentation provides more information on using the Redshift COPY command.

Creating an S3 bucket and uploading the 'Source'.sql files:
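The same steps can also be done from the AWS CLI. The sketch below uses hypothetical helper names and assumes the AWS CLI is configured; it checks that each file's first non-blank line sets the search path to its schema (the requirement stated above), then creates the bucket and copies the files. Bucket names are your own choice, and `aws s3 mb` fails if the bucket already exists.

```shell
#!/bin/sh
# Hypothetical helpers for preparing custom 'Source'.sql load files.

# has_search_path_header SCHEMA FILE
# Succeeds if the first non-blank line of FILE sets search_path to SCHEMA,
# e.g. "SET search_path to CMSDESynPUF1k;" (matched case-insensitively).
has_search_path_header() (
  schema=$1
  file=$2
  first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
  printf '%s\n' "$first" |
    grep -Eqi "^[[:space:]]*set[[:space:]]+search_path([[:space:]]+to[[:space:]]+|[[:space:]]*=[[:space:]]*)$schema[[:space:]]*;"
)

# upload_sources BUCKET FILE... : create the bucket, then copy each file into it.
upload_sources() (
  bucket=$1
  shift
  aws s3 mb "s3://$bucket" || exit 1
  for f in "$@"; do
    aws s3 cp "$f" "s3://$bucket/" || exit 1
  done
)
```

For example, `has_search_path_header CMSDESynPUF1k CMSDESynPUF1k.sql && upload_sources my-omop-sources CMSDESynPUF1k.sql`, where `my-omop-sources` is a placeholder bucket name.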

RStudio parameters

| Parameter Name | Description |
|---|---|
| Instance Type for RStudio | This determines the processing power of your multi-user RStudio instance. About 0.75 GB per concurrent user is recommended. For more information, see the list of available EC2 instance types. |
| Home Directory size for RStudio instance | The amount of encrypted disk space, in GB, allocated to store RStudio users' local data. |
| Comma-delimited user list for RStudio | Provide a comma-separated list of usernames and passwords (user1,pass1,user2,pass2) to create on the RStudio Server. The first user in the list will be given sudoers access. Do not use 'admin' as a username; it causes problems with Atlas security. |
| Bucket for PatientLevelPrediction SageMaker Models | Name of the S3 bucket you want to use to hold the PatientLevelPrediction training data and model output for SageMaker. If you leave this blank, a bucket will be generated for you. |
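The user-list format and the sizing guideline can both be checked locally. This is a hypothetical sketch, not the template's own parsing: it splits the list into username/password pairs, rejects an odd number of fields or the reserved name 'admin', and rounds 0.75 GB per concurrent user up to whole gigabytes using integer arithmetic.

```shell
#!/bin/sh
# Hypothetical helpers for the "Comma-delimited user list for RStudio" value.

# parse_user_list "user1,pass1,user2,pass2"
# Prints one "username password" pair per line (the first line is the user
# who gets sudoers access). Fails on an odd number of fields or on 'admin'.
# Passwords containing commas cannot be expressed in this format.
parse_user_list() (
  set -f
  IFS=,
  set -- $1
  if [ $(($# % 2)) -ne 0 ]; then
    echo "expected username,password pairs" >&2
    exit 1
  fi
  while [ $# -gt 0 ]; do
    if [ "$1" = admin ]; then
      echo "'admin' must not be used as a username" >&2
      exit 1
    fi
    printf '%s %s\n' "$1" "$2"
    shift 2
  done
)

# rstudio_min_ram_gb USERS -> ceil(0.75 GB x USERS) as a whole number.
rstudio_min_ram_gb() {
  echo $(( ($1 * 3 + 3) / 4 ))
}
```

For ten concurrent users, `rstudio_min_ram_gb 10` prints 8, which you can compare against the memory of the EC2 instance type you choose.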

Web Tier parameters

The web tier contains the Atlas/WebAPI Apache and Tomcat auto-scaling instances behind a load balancer. This allows Atlas/WebAPI to be fault tolerant and highly available while also growing and shrinking the number of instances based on load.

| Parameter Name | Description |
|---|---|
| Web Tier Instance Type | This determines the processing power of your Atlas/WebAPI instances. t2.micro should be sufficient for small implementations (five or fewer concurrent users). For more information, see the list of available EC2 instance types. |
| Minimum Atlas/WebAPI Instances | Required. Specifies the minimum number of EC2 instances in the Web Autoscaling Group. A value greater than 1 will create a highly available environment by placing instances in multiple Availability Zones. |
| Maximum Atlas/WebAPI Instances | Required. Specifies the maximum number of EC2 instances in the Web Autoscaling Group. Must be greater than or equal to Minimum Atlas/WebAPI Instances. |
| Enable Authentication For Atlas? | If true, the comma-delimited list of usernames and passwords provided for RStudio will be used to control access to Atlas. The first user in the list will be an 'admin'. |
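The relationship between the two instance-count parameters can be expressed as a small check. This is a hypothetical sketch; the requirement that the maximum be at least the minimum comes from the table, while requiring whole numbers and a minimum of at least 1 is an assumption made here.

```shell
#!/bin/sh
# Hypothetical check for the Minimum/Maximum Atlas/WebAPI Instances values.
# Assumption: both are whole numbers and the minimum is at least 1.
is_whole_number() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;
  esac
}

check_web_tier_sizes() (
  min=$1
  max=$2
  if ! is_whole_number "$min" || ! is_whole_number "$max"; then
    echo "instance counts must be whole numbers" >&2
    exit 1
  fi
  if [ "$min" -lt 1 ]; then
    echo "minimum must be at least 1" >&2
    exit 1
  fi
  if [ "$max" -lt "$min" ]; then
    echo "maximum must be greater than or equal to minimum" >&2
    exit 1
  fi
  # A minimum above 1 lets instances be spread across Availability Zones.
  if [ "$min" -gt 1 ]; then
    echo "highly available"
  else
    echo "single instance"
  fi
)
```

For example, `check_web_tier_sizes 2 4` prints `highly available`, while `check_web_tier_sizes 3 2` fails with an error message.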

OHDSI Project Versions parameters

This section contains a list of the OHDSI components that will be deployed in your environment and allows you to provide the version number for each as a parameter. Default versions are provided that work well together, but you can provide your own version numbers if you desire. The version number here must map to a tagged release or branch for that component in its GitHub repository. For instance, this is the list of tagged releases for the OHDSI WebAPI project.

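Because each version must match a tag or branch, you can confirm a value exists before deploying. This is a minimal sketch with hypothetical helper names; it assumes git is installed and that the component's repository lives at github.com/OHDSI/NAME (true for WebAPI and Atlas, but verify for other components), and it checks the exact ref name you pass, such as v2.8.0.

```shell
#!/bin/sh
# Hypothetical helpers: confirm a version parameter names a real tag or branch.

# ohdsi_repo_url NAME -> https://github.com/OHDSI/NAME.git
ohdsi_repo_url() {
  printf 'https://github.com/OHDSI/%s.git\n' "$1"
}

# ohdsi_ref_exists NAME REF : succeeds if REF is a tag or branch of OHDSI/NAME.
# --exit-code makes git ls-remote exit with status 2 when nothing matches.
ohdsi_ref_exists() (
  git ls-remote --exit-code --tags --heads "$(ohdsi_repo_url "$1")" "$2" >/dev/null
)
```

For example, `ohdsi_ref_exists WebAPI v2.8.0 && echo found` queries the remote without cloning it.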
VPC Networking parameters (Internet Accessible version)

As a part of this deployment, a new AWS Virtual Private Cloud (VPC) is created using the IP address ranges that you specify. This VPC provides a public tier that is accessible from the Internet and contains a load balancer. The application and data tiers, which contain the OHDSI applications and databases respectively, are not accessible directly from the Internet.

VPC Networking parameters (Private Network Only version)


As a part of this deployment, you specify an existing VPC and subnets within that VPC to use for each tier of the OHDSIonAWS architecture (presentation, application, and database). For each tier, you provide two subnets, giving you the ability to spread your deployment over two AWS Availability Zones within the AWS Region you chose. The Presentation tier in this architecture, as well as the Application and Data tiers, only uses private IP addressing, so none of the resources are accessible from the Internet. This environment is only accessible from your internal, private network. This private VPC can also be connected back to your internal, on-premises network using a VPN connection or an AWS Direct Connect connection.

When you've provided appropriate values for the parameters, choose Next.

3. On the next screen, you can provide some other optional information, like tags or alternative permissions, at your discretion. This information isn't necessary for typical deployments. Then choose Next.

4. On the next screen, you can review what will be deployed. At the bottom of the screen, there is a check box for you to acknowledge that AWS CloudFormation might create IAM resources with custom names and that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND. This is expected; the template being deployed creates four custom roles that give the AWS services involved permission to communicate with each other. Details of these permissions are inside the CloudFormation template referenced in the URL given in the first step. Check the box acknowledging this and choose Next.
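If you prefer the AWS CLI to the console, the acknowledgement in step 4 corresponds to the `--capabilities` option of `create-stack`. The sketch below is hypothetical: the template URL would be the one behind your Region's Launch Stack link, and parameter keys such as KeyName are placeholders you would replace with the real keys from the template. It also shows backslash-escaping commas, which the CLI's shorthand syntax otherwise treats as separators (relevant for values like the RStudio user list).

```shell
#!/bin/sh
# Hypothetical CLI equivalent of steps 1-4.

# cfn_param KEY VALUE -> ParameterKey=KEY,ParameterValue=VALUE
# Commas inside VALUE are backslash-escaped for the AWS CLI shorthand syntax.
cfn_param() (
  escaped=$(printf '%s' "$2" | sed 's/,/\\,/g')
  printf 'ParameterKey=%s,ParameterValue=%s\n' "$1" "$escaped"
)

# create_ohdsi_stack STACK_NAME TEMPLATE_URL PARAM...
# --capabilities stands in for the console check box: IAM resources with
# custom names (CAPABILITY_NAMED_IAM) and CAPABILITY_AUTO_EXPAND.
create_ohdsi_stack() (
  stack=$1
  template_url=$2
  shift 2
  aws cloudformation create-stack \
    --stack-name "$stack" \
    --template-url "$template_url" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
    --parameters "$@"
)
```

A call might look like `create_ohdsi_stack my-ohdsi "$TEMPLATE_URL" "$(cfn_param KeyName my-key)"`, with `TEMPLATE_URL` and `KeyName` as placeholders.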
5. You can watch as CloudFormation builds out your
OHDSI environment. A CloudFormation deployment
is called a stack. The parent stack creates several
child stacks depending on the parameters you
provided. When all the stacks have reached the
green CREATE_COMPLETE status, as shown in the
screenshot following, then the OHDSI architecture
has been deployed. Select the Outputs tab to find
your OHDSI environment URLs.
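The same status and Outputs can be read from the CLI. This is a minimal sketch with hypothetical helper names, assuming a configured AWS CLI. It polls the stack status itself rather than using `aws cloudformation wait stack-create-complete`, because that waiter gives up after about an hour and this build can take a few hours. Output key names such as AtlasURL are placeholders; use the keys shown on your Outputs tab.

```shell
#!/bin/sh
# Hypothetical helpers for following a deployment from the command line.

# Print the current status of a stack, e.g. CREATE_IN_PROGRESS.
stack_status() (
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
)

# Succeeds while CloudFormation is still working on the stack.
is_in_progress() {
  case $1 in
    *_IN_PROGRESS) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll once a minute until the stack settles; succeed only on CREATE_COMPLETE.
wait_for_stack() (
  while status=$(stack_status "$1") && is_in_progress "$status"; do
    sleep 60
  done
  printf '%s\n' "$status"
  [ "$status" = CREATE_COMPLETE ]
)

# Print the Outputs tab as tab-separated "key<TAB>value" lines.
stack_outputs() (
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' --output text
)

# output_value KEY : read stack_outputs lines on stdin, print KEY's value.
output_value() {
  awk -F '\t' -v k="$1" '$1 == k { print $2 }'
}
```

For example, `wait_for_stack my-ohdsi && stack_outputs my-ohdsi | output_value AtlasURL`.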

NOTE: Even though Atlas and RStudio are now accessible, in the background the data sources you specified are still being loaded and the results schema is still being populated by Achilles. Until the first source is added, you will see an error message in Atlas indicating that there are no sources. The process of adding all of your data sources can take anywhere from 30 minutes to several hours to complete, depending on the number and size of data sources you specified and the size of your Redshift cluster. During this time, browsing Data Sources in Atlas may not be successful and performance will be slow. You can check the progress of your data sources by looking in the Redshift Management Console, clicking on your cluster, and looking at the Queries tab.

Using RStudio
Inside of each user's home directory in RStudio is a file
called ConnectionDetails.R. It contains all of the
connection parameters needed to use OHDSI
components like PatientLevelPrediction, CohortMethod,
or Achilles.

These components and all of their dependencies are also pre-installed and can be invoked simply by issuing the command library(PatientLevelPrediction), library(CohortMethod), or library(Achilles). Users can change their passwords after logging in by going to the Terminal and using the Linux passwd command as shown following.

New users can also be added to the RStudio server by logging in with a user who has sudo access and using the Linux adduser command as shown following. Recall that the first user you provided in the RStudio user list parameter was given sudo access.

Using Jupyter Notebooks


This architecture now includes an implementation of
Jupyter Notebooks, JupyterHub, and JupyterLab. It runs
on the same server as RStudio and allows users to login
using their same username/password combination. These
users have the same home directory as when they login
to RStudio, enabling them to switch back and forth
between RStudio and Jupyter Notebooks depending on
which environment is the best fit for their current work.

You can access the Jupyter Notebook environment by using the URL in the 'Outputs' tab of the CloudFormation stack (as shown below).

Troubleshooting Deployments

CloudFormation Events
A CloudFormation deployment is called a stack. The OHDSI template deploys a number of child, or nested, stacks depending on which options you choose. If one of the steps in any of these stacks fails during deployment, all of the stacks will be rolled back, meaning that they will be deleted in the reverse order in which they were deployed. To understand why a deployment rolled back, it can be helpful to look at the events that CloudFormation recorded during deployment. You can do this by looking at the Events tab of each stack. If a stack has already been rolled back, you will have to change the filter in the upper-left corner of the CloudFormation Management Console from its default of Active to Deleted to see its event log. A demonstration of this is shown following.
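The Events tab can also be filtered from the CLI to show only the failures that triggered a rollback. This is a hypothetical sketch assuming a configured AWS CLI; run it against each nested stack, since each has its own event log. For a stack that has already been deleted, pass its unique stack ID, which `aws cloudformation list-stacks --stack-status-filter DELETE_COMPLETE` lists.

```shell
#!/bin/sh
# Hypothetical helpers for finding why a deployment rolled back.

# List a stack's events (newest first) as "resource<TAB>status<TAB>reason".
stack_events() (
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query 'StackEvents[].[LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
    --output text
)

# Keep only failed events, formatted as "resource: reason".
failed_events() {
  awk -F '\t' '$2 ~ /_FAILED$/ { print $1 ": " $3 }'
}
```

For example, `stack_events my-ohdsi | failed_events`; the earliest failure (last in the list) is usually the root cause, and later failures are often knock-on effects.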

Build Log
During the build process, a temporary Linux instance is created that compiles WebAPI, combines it with Atlas, loads all of your OMOP data sets into Redshift, runs Achilles, and performs various other configuration functions. You can see a log of the work that it did by looking in the CloudWatch Logs Management Console under the log group ohdsi-temporary-ec2-instance-build-log.
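The build log can also be followed from a terminal. This is a minimal sketch with hypothetical helper names; `aws logs tail` is available in AWS CLI version 2 only, and the error filter is a simple keyword match, not a parser for the build script's output.

```shell
#!/bin/sh
# Hypothetical helpers for reading the temporary build instance's log.
BUILD_LOG_GROUP=ohdsi-temporary-ec2-instance-build-log

# Stream log lines from the last six hours onward (AWS CLI v2 only).
follow_build_log() (
  aws logs tail "$BUILD_LOG_GROUP" --since 6h --follow
)

# Filter piped log text down to lines mentioning errors or failures.
build_log_errors() {
  grep -iE 'error|fail'
}
```

For example, `aws logs tail "$BUILD_LOG_GROUP" --since 6h | build_log_errors` shows only the suspicious lines without following the stream.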

On-going Operations
At this point, you have a fully functioning and robust
OHDSI environment to begin using. Following are some
helpful points to consider regarding how to support this
environment on-going.

Atlas User Management


If you enabled Atlas Security during your deployment, you
can use an included shell script called
atlasusermgmt.sh to add and delete users and to
update their passwords. This script is included in the
RStudio home directory of the first user in the comma-
delimited list you provided. Simply run the script from the
RStudio Terminal (as shown below). It will ask you for the
OHDSI RDS Endpoint name and the Password for user
ohdsi_security_user . These are included in the
ConnectionDetails.R file in the Connection string
for the WebAPI database on RDS Aurora PostgreSQL .

Upgrading Atlas/WebAPI

Click the above Launch Stack button to deploy a CloudFormation template that will allow you to specify a new version of Atlas and WebAPI. It will then download and compile this new version with the parameters you supply and replace the current version running within Elastic Beanstalk.

Web Security
If your OHDSI implementation will be accessible over the public Internet, consider implementing AWS WAF (Web Application Firewall) to help protect against common web exploits that could affect availability, compromise security, or consume excessive resources. You can use AWS WAF to create custom rules that block common attack patterns, such as SQL injection or cross-site scripting. Learn more in the whitepaper "Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities". You can deploy AWS WAF either on Amazon CloudFront as part of your CDN solution or on the Application Load Balancer (ALB) that was deployed as a part of this solution.

Pull logs, view monitoring data, get alerts, and apply patches

Using the Elastic Beanstalk service, you can pull log files from one or more of your instances. You can also view monitoring data including CPU utilization, network utilization, HTTP response codes, and more. From this monitoring data, you can configure alarms to notify you of events within your Atlas/WebAPI application environment. Elastic Beanstalk also makes managed platform updates available, including Linux, Tomcat, and Apache upgrades that you can apply during maintenance windows you define.

Using Amazon Relational Database Service (RDS), you can pull log files from your Aurora PostgreSQL database. You can also view monitoring data and create alarms on metrics including disk space, CPU utilization, memory utilization, and more. RDS also makes Aurora PostgreSQL DB engine upgrades available, which are applied automatically during a maintenance window you define.

Using Amazon Redshift, you can implement audit logging for your OMOP CDM. You can also view performance data about overall cluster CPU and disk space utilization across nodes, and view the performance of individual queries. Redshift also gives you the ability to execute queries against your OMOP CDM from within the Redshift Management Console. Redshift cluster maintenance, such as cluster patching, is also automatically applied during maintenance windows that you define.

Access Running Atlas/WebAPI and RStudio Instances

If you need to access the command line on the running Atlas/WebAPI or RStudio instances, you can use AWS Systems Manager Session Manager. It lets you manage your Amazon EC2 instances through an interactive one-click browser-based shell or through the AWS CLI. Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys. Just go to the AWS Systems Manager Session Manager console and click Start session. You'll then see a list of your OHDSI instances. Select the one that you want to access and click Start session. You now have a shell with sudoers access.

Fault tolerance and backups


Elastic Beanstalk keeps a highly available copy of your current and previous Atlas/WebAPI application versions as well as your environment configuration. This can be used to re-deploy or re-create your Atlas/WebAPI application environment at any time and serves as a 'backup'. You can also clone an Elastic Beanstalk environment if you want a temporary environment for testing or as a part of a recovery exercise. High availability and fault tolerance are achieved by configuring your environment to have a minimum of 2 instances. Elastic Beanstalk will deploy these Atlas/WebAPI application instances over multiple Availability Zones. If an instance is unhealthy, it will automatically be removed and replaced.

Amazon Relational Database Service (RDS) automatically takes backups of your database that you can use to restore or "roll back" the state of your application. You can configure how long they are retained and when they are taken using the RDS Management Console. These backups can also be used to create a copy of your database for use in testing or as a part of a recovery exercise. High availability and fault tolerance are provided by using the 'Multi-AZ' deployment option for RDS. This creates a primary and a secondary copy of the database in two different Availability Zones.

Amazon Redshift automatically takes snapshots of your OMOP data warehouse that can be used to recover it in the event of a disaster or to roll back to a previous state. You have the ability to restore the entire OMOP database or single tables. You can also change the interval at which snapshots are taken and change their retention period.
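Alongside the automatic backups, you may want a manual snapshot before a risky change such as an Atlas/WebAPI upgrade. This is a hypothetical sketch assuming a configured AWS CLI; the cluster identifiers are your own, and the naming check enforces a conservative subset of the documented Aurora and Redshift snapshot-identifier rules (starts with a letter; letters, digits, and hyphens only; no double or trailing hyphen; at most 63 characters).

```shell
#!/bin/sh
# Hypothetical helpers for manual snapshots before risky changes.

# snapshot_id PREFIX STAMP -> lower-case identifier, e.g.
# ohdsi-pre-upgrade-20240101-1200, or an error if the rules are violated.
snapshot_id() (
  id=$(printf '%s-%s' "$1" "$2" | tr '[:upper:]' '[:lower:]')
  case $id in
    [a-z]*) ;;
    *) echo "must start with a letter: $id" >&2; exit 1 ;;
  esac
  case $id in
    *--* | *- | *[!a-z0-9-]*) echo "invalid characters or hyphens: $id" >&2; exit 1 ;;
  esac
  if [ ${#id} -gt 63 ]; then
    echo "longer than 63 characters: $id" >&2
    exit 1
  fi
  printf '%s\n' "$id"
)

# Manual snapshot of the Aurora PostgreSQL (WebAPI) cluster.
snapshot_webapi_db() (
  aws rds create-db-cluster-snapshot \
    --db-cluster-identifier "$1" --db-cluster-snapshot-identifier "$2"
)

# Manual snapshot of the Redshift (OMOP CDM) cluster.
snapshot_omop_warehouse() (
  aws redshift create-cluster-snapshot \
    --cluster-identifier "$1" --snapshot-identifier "$2"
)
```

For example, `snapshot_omop_warehouse my-redshift-cluster "$(snapshot_id ohdsi-pre-upgrade "$(date -u +%Y%m%d-%H%M)")"`, where `my-redshift-cluster` is a placeholder.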

Scalability
Your Elastic Beanstalk environment is configured to scale
automatically based on CPU utilization within the
minimum and maximum instance parameters you
provided during deployment. You can change this
configuration to specify a larger minimum footprint, a
larger maximum footprint, different instance sizes, or
different scaling parameters. This allows your
Atlas/WebAPI application environment to respond
automatically to the amount of load seen from your users.

Your Relational Database Service environment can be scaled by selecting a larger instance type or increasing the storage capacity of your instances.

Your Redshift data warehouse can be scaled for larger storage or more performance. This is done by resizing your Redshift cluster to use larger nodes, more nodes, or both.
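Both scaling changes can be made from the CLI as well as the consoles. This is a hypothetical sketch assuming a configured AWS CLI; the environment and cluster names are placeholders for your own resources. The web-tier range is set through the MinSize and MaxSize options of the Elastic Beanstalk aws:autoscaling:asg namespace, and the warehouse is resized by node count.

```shell
#!/bin/sh
# Hypothetical helpers for the scaling changes described above.

# asg_setting OPTION VALUE -> one --option-settings entry in the
# aws:autoscaling:asg namespace (e.g. MinSize, MaxSize).
asg_setting() {
  printf 'Namespace=aws:autoscaling:asg,OptionName=%s,Value=%s\n' "$1" "$2"
}

# scale_web_tier ENV_NAME MIN MAX : change the Atlas/WebAPI instance range.
scale_web_tier() (
  aws elasticbeanstalk update-environment --environment-name "$1" \
    --option-settings "$(asg_setting MinSize "$2")" "$(asg_setting MaxSize "$3")"
)

# resize_omop_warehouse CLUSTER_ID NODES : change the Redshift node count.
resize_omop_warehouse() (
  aws redshift resize-cluster --cluster-identifier "$1" --number-of-nodes "$2"
)
```

For example, `scale_web_tier my-ohdsi-env 2 6` keeps at least two Atlas/WebAPI instances across Availability Zones while allowing growth to six.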
