Triage

Data Science Toolkit for Social Good and Public Policy Problems

Building data science systems requires answering many design questions and turning the answers into modeling choices, which in turn determine the machine learning models that are run. Choices about cohort selection, unit of analysis, outcome definition, feature (explanatory variable) generation, model/classifier training, evaluation, selection, and list generation are often complicated and hard to make a priori. In addition, once these choices are made, they have to be combined in different ways throughout the course of a project.

Triage is designed to:

Guide users (data scientists, analysts, researchers) through these design choices by highlighting critical operational use questions.
Provide an integrated interface to components that are needed throughout a data science project workflow.

Quick Links

Dirty Duck Tutorial - Are you completely new to Triage? Go through the tutorial here with sample data.
QuickStart Guide - Try Triage out with your own project and data.
Triage Documentation Site - Used Triage before and want more reference documentation?
Development - Contribute to Triage development.

Installation

To install Triage, you need:

Python 3.6 or greater
A PostgreSQL 9.4+ database with your source data (events, geographical data, etc.) loaded
Ample space on an available disk (or, for example, in Amazon Web Services S3) to store the matrices and models for your experiments

We recommend starting with a new Python virtual environment (with Python 3.6 or greater) and installing Triage there with pip.

$ virtualenv triage-env
$ . triage-env/bin/activate
(triage-env) $ pip install triage

Data

Triage needs data in a PostgreSQL database and a configuration file that holds the credentials for that database. By default, the Triage CLI reads database connection information from a file named 'database.yaml' (see example/database.yaml for an example).
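As a rough illustration of how database credentials can be turned into a connection URL for SQLAlchemy's create_engine, here is a minimal sketch. The helper build_postgres_url, the key names (host, port, user, pass, db), and the credential values are assumptions made for this example; check example/database.yaml for the names Triage actually expects.

```python
from urllib.parse import quote


def build_postgres_url(creds):
    """Build a PostgreSQL connection URL from a credentials mapping.

    Hypothetical helper, not part of Triage. The key names used here
    are assumptions for illustration only.
    """
    # Percent-encode user and password so characters such as '@' or '/'
    # do not break the URL structure.
    user = quote(str(creds["user"]), safe="")
    password = quote(str(creds["pass"]), safe="")
    host = creds["host"]
    port = creds.get("port", 5432)  # 5432 is the PostgreSQL default port
    return f"postgresql://{user}:{password}@{host}:{port}/{creds['db']}"


# Made-up credentials, as they might look after parsing a YAML file.
example_creds = {
    "host": "localhost",
    "port": 5432,
    "user": "triage_user",
    "pass": "p@ss/word",
    "db": "triage_db",
}
db_url = build_postgres_url(example_creds)
print(db_url)
```

The resulting string can be passed to sqlalchemy.create_engine. Encoding the password matters because an unescaped '@' or '/' would otherwise be read as part of the URL syntax.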

Configure Triage for your project

Triage is configured with a config.yaml file that defines parameters for each component. The sample configuration, with explanations, shows what this file looks like.

Using Triage

Via CLI:

triage experiment example/config/experiment.yaml

Import as a python package:

from sqlalchemy import create_engine
from triage.experiments import SingleThreadedExperiment

experiment = SingleThreadedExperiment(
    config=experiment_config, # a dictionary
    db_engine=create_engine(...), # http://docs.sqlalchemy.org/en/latest/core/engines.html
    project_path='/path/to/directory/to/save/data' # could be an S3 path too: 's3://mybucket/myprefix/'
)
experiment.run()

Many options are available for running experiments, affecting things like parallelization and storage. These options are detailed on the Running an Experiment page.
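The project_path argument in the example above can be either a local directory or an S3 location. The sketch below shows one way a script might tell the two apart before launching an experiment, for instance to decide whether a local directory needs to be created. The helper describe_project_path is hypothetical and is not how Triage itself resolves storage.

```python
from urllib.parse import urlparse


def describe_project_path(project_path):
    """Classify a project path as S3 or local storage.

    Illustrative helper only; not part of the Triage API.
    """
    parsed = urlparse(project_path)
    if parsed.scheme == "s3":
        # For 's3://bucket/prefix/', netloc is the bucket and path is the prefix.
        return {
            "kind": "s3",
            "bucket": parsed.netloc,
            "prefix": parsed.path.lstrip("/"),
        }
    # Anything without an 's3' scheme is treated as a local filesystem path.
    return {"kind": "local", "path": project_path}


s3_info = describe_project_path("s3://mybucket/myprefix/")
local_info = describe_project_path("/path/to/directory/to/save/data")
print(s3_info)
print(local_info)
```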

Development

Triage was initially developed at the University of Chicago's Center for Data Science and Public Policy and is now maintained at Carnegie Mellon University.

To build this package without installing it, you can install its dependencies from the terminal using pip:

pip install -r requirement/main.txt

Testing

To add test dependencies, use test.txt; to also add development dependencies, include dev.txt:

pip install -r requirement/test.txt [-r requirement/dev.txt]

Then, to run tests:

pytest

Development Environment

To quickly bootstrap a development environment, having cloned the repository, invoke the executable develop script from your system shell:

./develop

A "wizard" will suggest set-up steps and optionally execute them, for example:

(install) begin

(pyenv) installed

(python-3.6.2) installed

(virtualenv) installed

(activation) installed

(libs) install?
1) yes, install {pip install -r requirement/main.txt -r requirement/test.txt -r requirement/dev.txt}
2) no, ignore
#? 1

Contributing

If you'd like to contribute to Triage development, see the CONTRIBUTING.md document.
