Reproducible Quantum Chemistry in Jupyter Notebooks
in JupyterLab
Chris Harris (Kitware)
@openchem
Overview
▪ Scientific Use Case
▪ Why Jupyter?
▪ Approach
▪ Demo
▪ Architecture
- Backend
- Frontend
▪ Deployment
▪ Future
Project and Team
▪ Department of Energy SBIR Phase II (Office of Science contract DE-SC0017193)
▪ Marcus D. Hanwell (Kitware)
- Background in physics, experimental data, nanomaterials, visualization
▪ Chris Harris (Kitware)
- Computer science, AI, HPC
▪ Bert de Jong (Berkeley Lab)
- Developer of NWChem computational chemistry code, machine learning,
quantum computing
▪ Johannes Hachmann (SUNY Buffalo)
- Expertise in chemistry, machine learning, chemical library generation
Scientific Use Case
▪ Using quantum mechanics to characterize chemical systems
▪ The field has seen vast improvements in both the veracity and volume of data
▪ Lack of transparent and reproducible workflow
- Ad-hoc data management
- Complexity associated with codes
- The intricacies of HPC
▪ Lack of integration with environments for visualization and analysis
▪ Need a platform that enables end-to-end workflows, from simulation setup and
submission through to analytics and visualization of the results
Why Jupyter?
▪ Supports interactive analysis while preserving the analytic steps
- Preserves much of the provenance
▪ Familiar environment and language
- Many are already familiar with the environment
- Python is the language of scientific computing
▪ Simple extension mechanism
- Particularly with JupyterLab
- Allows for complex domain-specific visualization
▪ Vibrant ecosystem and community
Approach
▪ Data is the core of the platform
- Start with simple but powerful data model and data server
▪ RESTful APIs everywhere
- Allows access anywhere
- Notebooks, web apps, command line, desktop applications, etc.
▪ Jupyter notebooks for interactive analysis
- Provide a simple, high-level, domain-specific Python API for use within the notebooks
▪ Web application
- Authentication, access control and user management
- Launching/managing notebooks
- Enable users to interact with data without having to launch notebooks
Demo
Architecture
▪ Backend
- Data Management
- Job Execution
- Notebook management
▪ Frontend
- Web components
- JupyterLab Extensions
- Web application
Data Management
▪ Computational chemistry codes produce a wide variety of output
- Often non-standard, even non-structured
- Need to convert to a single format
▪ Chemical JSON (CJSON)
- Simple JSON format for representing chemical information
- Efficient binary representation
- MolSSI standard being developed
▪ Support export in multiple standard formats
- Facilitate integration
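As an illustration of converting code output to a single format, here is a minimal sketch that turns an XYZ geometry into a CJSON-style dictionary. The field layout (`chemicalJson`, `atoms.elements.number`, `atoms.coords.3d` as a flat array) follows the Chemical JSON structure as commonly used by Avogadro 2 and should be treated as an assumption to check against the current specification. The function name `xyz_to_cjson` and the small element table are hypothetical helpers, not part of the platform.

```python
import json

# Only the elements needed for this example; a real converter would
# cover the full periodic table.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def xyz_to_cjson(xyz_text, name=""):
    """Convert XYZ-format text into a CJSON-style dictionary (assumed layout)."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])          # first line: atom count
    numbers, coords = [], []
    # Line 2 is a free-form comment; atom records start on line 3.
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        numbers.append(ATOMIC_NUMBERS[symbol])
        coords.extend([float(x), float(y), float(z)])  # flat x, y, z list
    return {
        "chemicalJson": 1,
        "name": name,
        "atoms": {
            "elements": {"number": numbers},
            "coords": {"3d": coords},
        },
    }


WATER_XYZ = """3
water
O 0.000  0.000  0.117
H 0.000  0.757 -0.469
H 0.000 -0.757 -0.469
"""

water = xyz_to_cjson(WATER_XYZ, name="water")
print(json.dumps(water, indent=2))
```

Because the result is plain JSON, it can be stored, served over REST, and read back without any code-specific parser.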
Data Management
▪ Girder
- Web-based data management platform
- Enables quick and easy construction of web applications:
  - Data organization and dissemination
  - User management & authentication
  - Authorization management
- Extended via the development of plugins
  - Expose new data models and RESTful endpoints
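To make the idea of plugin-provided RESTful endpoints concrete, the sketch below builds (but does not send) an authenticated query using only the Python standard library. The `Girder-Token` header and the `/api/v1` root are Girder conventions; the `molecules` endpoint, the `formula` filter, the host name, and the helper `build_molecule_query` are hypothetical and stand in for whatever the platform's plugin actually exposes.

```python
import urllib.parse
import urllib.request


def build_molecule_query(api_root, token, **filters):
    """Build a GET request for a hypothetical 'molecules' endpoint."""
    url = f"{api_root.rstrip('/')}/molecules"   # hypothetical endpoint name
    query = urllib.parse.urlencode(filters)
    if query:
        url += "?" + query
    return urllib.request.Request(
        url,
        headers={
            "Girder-Token": token,              # Girder's session token header
            "Accept": "application/json",
        },
    )


# Placeholder host and token, for illustration only.
request = build_molecule_query(
    "https://example.org/api/v1", "my-token", formula="H2O"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; the same URL could equally be used from a web app, a command-line tool, or a desktop application.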
Job Execution
▪ What's involved in submitting a job to run on an HPC resource?
- Input generation
  - Code-specific and often pretty esoteric
- Moving the required data onto the resource
- Generate submission script
  - Scheduler-specific
- Submit and monitor job
  - Scheduler-specific
- Post-processing or ingestion of result
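The steps above can be sketched for one concrete pairing of code and scheduler. This example assumes NWChem as the code and Slurm as the scheduler; the talk does not say which scheduler is used, so Slurm is purely an illustrative choice. The function names, template contents, and default values are hypothetical, and nothing here is submitted: the last helper only returns the command that would be run.

```python
from string import Template

# Code-specific step: a minimal NWChem input for an SCF energy calculation.
NWCHEM_INPUT = Template("""start $name
title "$name SCF energy"
geometry units angstroms
$geometry
end
basis
  * library $basis
end
task scf energy
""")

# Scheduler-specific step: a minimal Slurm batch script (assumed scheduler).
SLURM_SCRIPT = Template("""#!/bin/bash
#SBATCH --job-name=$name
#SBATCH --nodes=$nodes
#SBATCH --time=$walltime

srun nwchem $name.nw > $name.out
""")


def generate_input(name, atoms, basis="6-31G*"):
    """Render the NWChem input from (symbol, x, y, z) tuples."""
    geometry = "\n".join(
        f"  {s} {x:.4f} {y:.4f} {z:.4f}" for s, x, y, z in atoms
    )
    return NWCHEM_INPUT.substitute(name=name, geometry=geometry, basis=basis)


def generate_submission_script(name, nodes=1, walltime="00:30:00"):
    """Render the Slurm script that runs the generated input."""
    return SLURM_SCRIPT.substitute(name=name, nodes=nodes, walltime=walltime)


def submit_command(script_path):
    """Return the command that would submit the job (not executed here)."""
    return ["sbatch", script_path]


water = [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469), ("H", 0.0, -0.757, -0.469)]
print(generate_input("water", water))
print(generate_submission_script("water", nodes=2))
print(submit_command("water.sh"))
```

Swapping the code or the scheduler means replacing only the corresponding template, which is why it helps to keep the code-specific and scheduler-specific steps separate.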