OHDSI - OHDSIonAWS - Automation Code and Documentation For Standing Up The OHDSI Toolstack in An AWS Environment
Repository: https://github.com/OHDSI/OHDSIonAWS.git
License: Apache-2.0
Enterprise, Multi-User OHDSI on AWS
Quick Start
Region code       Region name
cn-northwest-1    China (Ningxia)
ap-northeast-2    AP (Seoul)
ap-southeast-1    AP (Singapore)
ap-northeast-1    AP (Tokyo)
ap-southeast-2    AP (Sydney)
eu-west-1         EU (Ireland)
eu-central-1      EU (Frankfurt)
eu-west-2         EU (London)
eu-west-3         EU (Paris)
eu-north-1        EU (Stockholm)
ca-central-1      Canada
us-east-1         US East (N. Virginia)
us-east-2         US East (Ohio)
us-west-1         US West (N. California)
us-west-2         US West (Oregon)
Atlas v2.8.0
WebAPI v2.8.0
Achilles v1.6.3
PatientLevelPrediction v4.0.5
CohortMethod v4.0.1
SqlRender v1.6.8
DatabaseConnector v3.0.0
DatabaseConnectorJars v1.1.0
OhdsiRTools v1.9.1
FeatureExtraction v3.1.0
Cyclops v3.0.0
EmpiricalCalibration v2.0.2
OhdsiSharing v0.2.2
MethodEvaluation v2.0.0
Hydra v0.1.1
PredictionComparison v1.0.0
Eunomia v1.0.1
BigKNN v1.0.0
Andromeda v0.4.0
SelfControlledCaseSeries v2.0.0
SelfControlledCohort v1.5.0
EvidenceSynthesis v0.2.3
CohortDiagnostics v1.2.7
Sample Data Source    Size                  Schema Name
Synthea               1,000 persons         synthea1k
Synthea               100,000 persons       synthea100k
Synthea               >2,000,000 persons    synthea23m
CMS DeSynPUF          1,000 persons         CMSDESynPUF1k
CMS DeSynPUF          100,000 persons       CMSDESynPUF100k
CMS DeSynPUF          >2,000,000 persons    CMSDESynPUF23m
Topics
Pre-flight check
Begin deployment
Parameter Name: Use Primary and Standby Database Instances?
Description: Specifies whether to deploy the AWS Aurora
PostgreSQL WebAPI database in a Multi-AZ configuration.
This provides a standby database for high availability in
the event the primary database fails.
Parameter Name: Comma-delimited list of OMOP CDM schema
sources to load into the Redshift data warehouse
Description: Comma-delimited list of OMOP CDM schema
sources to load into the Redshift data warehouse. By
default, this is set to
CMSDESynPUF23m,CMSDESynPUF100k,CMSDESynPUF1k,synthea23m,synthe[...]
which will load all six of the sample data sources.
[...] sources you want in your environment. [...] data
sources.

Parameter Name: S3 Bucket that contains DDL SQL files
named after each 'Source'.sql that will be executed to
load data into the OMOP CDM schema sources.
Description: S3 Bucket that contains DDL SQL files named
after each 'Source'.sql that will be executed to load data
into the OMOP CDM schema sources.
Sample Data Source    Size                  Schema Name
Synthea               1,000 persons         synthea1k
Synthea               100,000 persons       synthea100k
Synthea               >2,000,000 persons    synthea23m
CMS DeSynPUF          1,000 persons         CMSDESynPUF1k
CMS DeSynPUF          100,000 persons       CMSDESynPUF100k
CMS DeSynPUF          >2,000,000 persons    CMSDESynPUF23m
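Because the source list is a free-form comma-delimited
string, a typo in a schema name is easy to miss before
launching the stack. The following is a minimal local
pre-check, not part of the OHDSIonAWS templates: the
helper name parse_sources is hypothetical, and treating
schema names as case-sensitive is an assumption.

```python
from __future__ import annotations

# Schema names copied from the sample data source table above.
KNOWN_SOURCES = {
    "synthea1k", "synthea100k", "synthea23m",
    "CMSDESynPUF1k", "CMSDESynPUF100k", "CMSDESynPUF23m",
}


def parse_sources(value: str) -> list[str]:
    """Split a comma-delimited source list and reject unknown schema names.

    Assumption: names must match the table exactly (case-sensitive).
    """
    sources = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [s for s in sources if s not in KNOWN_SOURCES]
    if unknown:
        raise ValueError(f"Unknown schema source(s): {', '.join(unknown)}")
    return sources


if __name__ == "__main__":
    # Load only the two smallest sample sources.
    print(parse_sources("synthea1k,CMSDESynPUF1k"))
```

Running the check on the value you intend to paste into
the parameter field catches misspelled schema names before
a lengthy deployment starts.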
RStudio parameters
Parameter Name: Comma-delimited user list for RStudio
Description: Provide a comma separated list of usernames
and passwords (user1,pass1,user2,pass2) to create on the
RStudio Server. The first user in the list will be given
sudoers access. Do not use 'admin' as a username. It
causes problems with Atlas security.
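The user list format described above (alternating
usernames and passwords, first user gets sudoers access,
no 'admin' username) can be checked locally before
deployment. This is a sketch only: parse_rstudio_users is
a hypothetical helper, and the case-insensitive match on
'admin' is an assumption rather than documented behavior.

```python
from __future__ import annotations


def parse_rstudio_users(value: str) -> list[dict]:
    """Parse 'user1,pass1,user2,pass2' into a list of user records."""
    fields = value.split(",")
    # An odd number of fields or an empty field means a name or password is missing.
    if len(fields) % 2 != 0 or not all(f.strip() for f in fields):
        raise ValueError("Expected alternating, non-empty usernames and passwords")

    users = []
    for i in range(0, len(fields), 2):
        username, password = fields[i].strip(), fields[i + 1]
        # Assumption: 'Admin', 'ADMIN', etc. are treated the same as 'admin'.
        if username.lower() == "admin":
            raise ValueError("Do not use 'admin' as a username")
        # Only the first user in the list is given sudoers access.
        users.append({"username": username, "password": password, "sudo": i == 0})
    return users


if __name__ == "__main__":
    for user in parse_rstudio_users("user1,pass1,user2,pass2"):
        print(user["username"], "sudo" if user["sudo"] else "standard")
```

Note that this format leaves no way to express a password
that itself contains a comma.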
Using RStudio
Inside of each user's home directory in RStudio is a file
called ConnectionDetails.R. It contains all of the
connection parameters needed to use OHDSI
components like PatientLevelPrediction, CohortMethod,
or Achilles.
Troubleshooting Deployments
CloudFormation Events
A CloudFormation deployment is called a stack. The
OHDSI template deploys a number of child or nested
stacks depending on which options you choose. If one of
the steps in any of these stacks fails during deployment,
all of the stacks will be rolled back, meaning that they will
be deleted in the reverse order that they were deployed.
In order to understand why a deployment rolled back, it
can be helpful to look at the events that CloudFormation
recorded during deployment. You can do this by looking
at the Events tab of each stack. If a stack has already
been rolled back, you will have to change the filter in the
upper-left corner of the CloudFormation Management
Console from its default of Active to Deleted to see its
event log. A demonstration of this is shown following.
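The same event log can also be read programmatically. The
sketch below assumes boto3 is installed and AWS
credentials are configured; failed_events and
fetch_stack_events are hypothetical helper names. For a
stack that has already been deleted, CloudFormation
requires the unique stack ID rather than the stack name.

```python
from __future__ import annotations


def failed_events(events: list[dict]) -> list[dict]:
    """Return events whose status ends in _FAILED, oldest first."""
    failures = [
        e for e in events
        if str(e.get("ResourceStatus", "")).endswith("_FAILED")
    ]
    return sorted(failures, key=lambda e: e["Timestamp"])


def fetch_stack_events(stack_id: str) -> list[dict]:
    """Download every recorded event for one stack (name or stack ID)."""
    import boto3  # imported here so failed_events() works without boto3

    client = boto3.client("cloudformation")
    paginator = client.get_paginator("describe_stack_events")
    events = []
    for page in paginator.paginate(StackName=stack_id):
        events.extend(page["StackEvents"])
    return events


if __name__ == "__main__":
    import sys

    # Usage: python stack_events.py <stack-name-or-stack-id>
    for event in failed_events(fetch_stack_events(sys.argv[1])):
        print(event["Timestamp"], event["LogicalResourceId"],
              event["ResourceStatus"], event.get("ResourceStatusReason", ""))
```

The earliest failed event is usually the most informative
one, since later failures are often just the cancellation
of resources that were still being created.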
Build Log
During the build process a temporary Linux instance is
created that compiles WebAPI, combines it with Atlas,
loads all of your OMOP data sets into Redshift, runs
Achilles, and performs various other configuration
functions. You can see a log of the work that it did by
looking in the CloudWatch Logs Management Console
under the Log Group
ohdsi-temporary-ec2-instance-build-log.
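As an alternative to the console, the build log can be
pulled with boto3 and scanned for likely problems. This is
a sketch under assumptions: boto3 and AWS credentials are
available, the helper names are hypothetical, and the
keyword list is only a guess at what a failing build step
might print.

```python
from __future__ import annotations

# Log group name as given in the text above.
LOG_GROUP = "ohdsi-temporary-ec2-instance-build-log"


def suspicious_lines(messages: list[str],
                     keywords: tuple[str, ...] = ("error", "failed", "exception")) -> list[str]:
    """Keep only messages containing one of the keywords (case-insensitive)."""
    return [m for m in messages if any(k in m.lower() for k in keywords)]


def fetch_build_log_messages(log_group: str = LOG_GROUP) -> list[str]:
    """Download all messages from every stream in the log group."""
    import boto3  # imported here so suspicious_lines() works without boto3

    client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    messages = []
    for page in paginator.paginate(logGroupName=log_group):
        messages.extend(event["message"] for event in page["events"])
    return messages


if __name__ == "__main__":
    for line in suspicious_lines(fetch_build_log_messages()):
        print(line.rstrip())
```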
On-going Operations
At this point, you have a fully functioning and robust
OHDSI environment to begin using. The following points
are worth considering as you support this environment on
an ongoing basis.
Upgrading Atlas/WebAPI
Web Security
If your OHDSI implementation will be accessible over the
public Internet, consider implementing an AWS Web
Application Firewall (WAF) to help protect against
common web exploits that could affect availability,
compromise security, or consume excessive resources.
You can use AWS WAF to create custom rules that block
common attack patterns, such as SQL injection or
cross-site scripting. Learn more in the whitepaper "Use AWS
WAF to Mitigate OWASP's Top 10 Web Application
Vulnerabilities". You can deploy AWS WAF on either
Amazon CloudFront as part of your CDN solution or on
the Application Load Balancer (ALB) that was deployed
as a part of this solution.
Using the Elastic Beanstalk service, you can pull log files
from one or more of your instances. You can also view
monitoring data including CPU utilization, network
utilization, HTTP response codes, and more. From this
monitoring data, you can configure alarms to notify you
of events within your Atlas/WebAPI application
environment. Elastic Beanstalk also makes managed
platform updates available, including Linux, Tomcat, and
Apache upgrades that you can apply during maintenance
windows you define.
Scalability
Your Elastic Beanstalk environment is configured to scale
automatically based on CPU utilization within the
minimum and maximum instance parameters you
provided during deployment. You can change this
configuration to specify a larger minimum footprint, a
larger maximum footprint, different instance sizes, or
different scaling parameters. This allows your
Atlas/WebAPI application environment to respond
automatically to the amount of load seen from your users.
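One way to change these scaling settings outside the
console is through Elastic Beanstalk option settings. The
sketch below builds the option list for the
aws:autoscaling:asg and aws:autoscaling:trigger
namespaces. The helper name, the environment name, and the
threshold values are all hypothetical, and it is an
assumption that the environment deployed by this solution
uses these trigger settings.

```python
from __future__ import annotations


def scaling_option_settings(min_instances: int, max_instances: int,
                            scale_up_cpu: int, scale_down_cpu: int) -> list[dict]:
    """Build Elastic Beanstalk option settings for CPU-based auto scaling."""
    if not 1 <= min_instances <= max_instances:
        raise ValueError("Need 1 <= min_instances <= max_instances")
    if not 0 <= scale_down_cpu < scale_up_cpu <= 100:
        raise ValueError("Need 0 <= scale_down_cpu < scale_up_cpu <= 100")

    def option(namespace: str, name: str, value) -> dict:
        # Elastic Beanstalk expects option values as strings.
        return {"Namespace": namespace, "OptionName": name, "Value": str(value)}

    return [
        option("aws:autoscaling:asg", "MinSize", min_instances),
        option("aws:autoscaling:asg", "MaxSize", max_instances),
        option("aws:autoscaling:trigger", "MeasureName", "CPUUtilization"),
        option("aws:autoscaling:trigger", "Unit", "Percent"),
        option("aws:autoscaling:trigger", "UpperThreshold", scale_up_cpu),
        option("aws:autoscaling:trigger", "LowerThreshold", scale_down_cpu),
    ]


if __name__ == "__main__":
    import boto3  # requires AWS credentials; the environment name is a placeholder

    boto3.client("elasticbeanstalk").update_environment(
        EnvironmentName="my-ohdsi-environment",
        OptionSettings=scaling_option_settings(2, 6, 70, 30),
    )
```

Building the settings in a separate function keeps the
validation testable without touching a live environment.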