Iterate blocks (#215)
* change get dataset to only return a df, and storing evaluations using model_id

* test using triage

* triage working with many test_dates and using metta to store matrices

* change the way we call the dates for generating the features

* use metrics for percentage and absolute values for evaluation

* added parallel execution of feature generation

* updated result schema with indexes

* change metadata to datetime

* change logistic regression parameter C_reg to C

* add test_time_ahead

* add raw incident type and levels

* change group incident to complaint type to add raw type

* delete grouped incident type or incident type levels from incidents table, just leaving the raw code

* create functions that build the labels table from the config file and read the stored train and test matrices

* change example config file

* add dispatch type code for original and final response and add new categories for the dispatch lookup table to match for cmpd

* remove features using group incident type code

* added new field interview outcome type banned

* added new field interview types

* bracket was missing in dispatch config

* fixed bug with unclosed bracket and prepared config

* updated config and features for new categories and limited selection based on the 1600-column limit

* changed config to be below 1600 columns

* merge with new configuration for labels and create function for generating all matrices

* delete events with NULL event_datetime or NULL officer ids from labels table

* changes in config file for labels

* added locking function which is NFS-safe

* add make_hashable function for metadata

* removed debugging statements

* changed data-types to real in results to save space and added a timeout for locking

* fix hash function, DaysSinceLastCompletedAllegation and config file

* fix labels table removing null values

* parallelize matrix generation

* use psycopg2 with a named cursor for reading features

* changed model running to train a single model first and then test across all test sets and then discard the model

* add labels_config to models_group

* added a delete statement before new evaluations for the same as_of_date and model_id are written to the database

* add officer_id column and labels_config that passes into metta

* change query for grabbing labels to exclude as of date + prediction window interval

* add model_config to metadata for creating model_group_id

* added rounding when storing floats to avoid underflow errors in postgres

* increased verbose output

* add function for generating as of dates for feature generation

* typo

* change officer roles table adding dates and department

* comment util functions and change train_metadata send to db_parameters to the config file in the db

* add ucr4 code grouped matching dispatch types, add lookup tables for grouped ucr4 types and ucr4 codes

* delete old code for feature generation, add docstrings

* remove unused config file

* be able to iterate through blocks leaving one block out and reading features tables directly from collate tables

* add arrest crime feature and compliments block

* add sworn flag

* fix stored procedure for reading feature blocks

* add individual importances table in production schema

* iterate through blocks and add post collate feature for dispatch movement

* change log info and add DispatchMovement to config

* add create patrol districts table and change config parameters

* stored procedure for getting complete feature description

* change evaluations as of date column to start and end time

* delete patrol districts

* added outside employment

* add train end time and experiment hash to results.models

* add experiment hash for triage and fix evaluation start time

* remove feature abstract unused classes, update config and add features descriptions
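Several items above concern hashing metadata: a `make_hashable` function, a fix to the hash function, and adding `model_config` (and `labels_config`) to the metadata used to create a `model_group_id`. The code for these lives in diffs that are not rendered on this page, so the following is a hypothetical sketch, not the repository's implementation. It assumes metadata is made of nested dicts, lists, sets, scalars, and dates/datetimes (the commit changed metadata to datetime), and that an MD5 hex digest of canonical JSON is acceptable as a group key; `model_group_hash` is an illustrative name.

```python
import hashlib
import json
from datetime import date, datetime


def make_hashable(obj):
    """Convert nested metadata into JSON-serializable values with a stable order.

    Hypothetical sketch: tuples become lists, sets become sorted lists, and
    dates/datetimes become ISO-8601 strings. Dict key order is normalized
    later by json.dumps(sort_keys=True).
    """
    if isinstance(obj, dict):
        return {str(key): make_hashable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [make_hashable(value) for value in obj]
    if isinstance(obj, (set, frozenset)):
        items = [make_hashable(value) for value in obj]
        # Sort by canonical JSON so mixed element types still order deterministically.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    if isinstance(obj, date):  # datetime is a subclass of date, so both are covered
        return obj.isoformat()
    return obj


def model_group_hash(model_config, labels_config):
    """Return an MD5 hex digest identifying a model group (illustrative name)."""
    payload = {"model_config": model_config, "labels_config": labels_config}
    canonical = json.dumps(make_hashable(payload), sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Because `json.dumps` is called with `sort_keys=True`, two configs that differ only in key order (for example, logistic regression parameters `C` and `penalty` listed in a different order) produce the same digest.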
andreanr authored and k1aus committed Apr 10, 2017
1 parent 1211a45 commit 0094477
Showing 35 changed files with 2,696 additions and 3,939 deletions.
4 changes: 2 additions & 2 deletions docs/repositories_dependencies_and_pipeline.md
@@ -73,13 +73,13 @@ Change `staging_dev` to an alternate name if desired, and from the terminal run:

To generate features, set the configuration of features (and labels) in a configuration file that has the same form as `example_officer_config.yaml`. And then from the `police-eis` directory, run:

-`python -m eis.run [configuration file name] --buildfeatures`
+`python -m eis.run --config [configuration file name] --labels [labels configuration file name] --buildfeatures`

### Features to results

Once the features have been built, you can run all of the models with:

-`python -m eis.run [configuration file name]`
+`python -m eis.run --config [configuration file name] --labels [labels configuration file name]`
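The updated README passes the feature configuration and the labels configuration as separate named options, with `--buildfeatures` selecting feature generation. `eis/run.py` is not part of this diff, so the following is a hypothetical sketch of standard-library `argparse` parsing that would accept those command lines; whether the real script marks `--config` and `--labels` as required is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse the command lines shown in the README (hypothetical sketch of eis.run)."""
    parser = argparse.ArgumentParser(prog="python -m eis.run")
    parser.add_argument("--config", required=True,
                        help="features/models configuration file (same form as example_officer_config.yaml)")
    parser.add_argument("--labels", required=True,
                        help="labels configuration file")
    parser.add_argument("--buildfeatures", action="store_true",
                        help="build features instead of running the models")
    return parser.parse_args(argv)
```

Called as `parse_args(["--config", "example_officer_config.yaml", "--labels", "labels.yaml"])`, this returns a namespace with `buildfeatures=False`, which corresponds to the "features to results" run.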

### Model interrogation, plotting, and analysis

550 changes: 211 additions & 339 deletions eis/dataset.py


8 changes: 2 additions & 6 deletions eis/experiment.py
@@ -141,7 +141,7 @@ def generate_time_sets(config):
test_end_date -= relativedelta(**update_window_deltas[update_window])
return temporal_info

def generate_models_to_run(config, query_db=True):
def generate_models_to_run(config, labels_config, query_db=True):
"""Generates a list of experiments with the various options
that we want to test, e.g. different temporal cross-validation
train/test splits, model types, hyperparameters, features, etc.
@@ -176,15 +176,11 @@ def generate_models_to_run(config, query_db=True):
this_config["prediction_window"] = temporal_info["prediction_window"]
this_config["officer_past_activity_window"] = temporal_info["officer_past_activity_window"]


# pass only the labels names selected in the config as True
this_config["officer_labels"] = [ key for key in config["officer_labels"] if config["officer_labels"][key] == True ]

# get the appropriate feature data from the database
if config["unit"] == "officer":
# get officer-level features to use
this_config["officer_features"] = officer.get_officer_features_table_columns( config )
exp_data = officer.run_traintest(this_config, as_of_dates_to_use)
exp_data = officer.run_traintest(this_config, labels_config, as_of_dates_to_use)

elif config["unit"] == "dispatch":
exp_data = dispatch.run_traintest(this_config)
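In `generate_models_to_run`, only the labels whose flag is `True` in `config["officer_labels"]` are kept. A minimal sketch of that filtering as a standalone function follows; the name `selected_labels` and the example label names are hypothetical.

```python
def selected_labels(officer_labels):
    """Return the names of labels switched on in the config, in config order.

    `is True` keeps only real booleans; the original `== True` would also
    accept 1 or 1.0. YAML `true` loads as a bool, so typical configs behave
    the same under both.
    """
    return [name for name, enabled in officer_labels.items() if enabled is True]


example = {"sustained_allegation": True, "any_allegation": False, "unknown": "yes"}
print(selected_labels(example))  # ['sustained_allegation']
```

Iterating over `.items()` avoids the second dictionary lookup in the original comprehension.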
446 changes: 446 additions & 0 deletions eis/feature_loader.py


186 changes: 0 additions & 186 deletions eis/features/abstract.py

This file was deleted.

3 changes: 0 additions & 3 deletions eis/features/class_map_generator.py

This file was deleted.

134 changes: 134 additions & 0 deletions eis/features/features_descriptions.yaml
@@ -0,0 +1,134 @@
# suffixes used by each block
feature_blocks_suffixes:
  IncidentsReported: ['ir', 'irAG']
  IncidentsCompleted: ['ic', 'icAG']
  OfficerShifts: ['shifts']
  OfficerArrests: ['arstat']
  TrafficStops: ['ts']
  FieldInterviews: ['fi']
  UseOfForce: ['uof']
  Dispatches: ['dispatch', 'dispatchPOST']
  OfficerEmployment: ['outemp']
  EISAlerts: ['eis']
  OfficerCharacteristics: ['ocND', 'ocAG']
  DemographicNpaArrests: ['demarrests']
  OfficerCompliments: ['compliments']

# names of features for each block
feature_names:
  #IncidentsReported:
  SuspensionsOfType: 'Officer suspensions'
  HoursSuspensionsOfType: 'Hours of suspension an officer has had'
  InterventionsOfType: 'Punitive actions of each type an officer has had'
  AllAllegations: 'Allegations made against an officer'
  IncidentsOfType: 'IA reported incidents'
  ComplaintsTypeSource: 'Complaints against officer by source'
  DaysSinceLastAllegation: 'Days since the last allegation was made against the officer'

  #IncidentsCompleted:
  IncidentsByOutcome: 'IA incidents of outcome'
  IncidentsOfTypeOutSustained: 'IA sustained incidents'
  IncidentsOfTypeUnSustained: 'IA unsustained incidents'
  IncidentsOfTypeUnknown: 'IA unknown outcome incidents'
  DaysSinceLastSustainedAllegation: 'Days since the last sustained allegation was made against the officer'

  #OfficerCompliments:
  Compliments: 'Officer compliments'

  #OfficerShifts:
  ShiftsOfType: 'Shift types'
  HoursPerShift: 'Hours per shift'

  #OfficerArrests:
  ArrestMonthlyVariance: 'Month-by-month variance of arrest counts'
  ArrestMonthlyCOV: 'Month-by-month coefficient of variation in arrest counts'
  Arrests: 'Arrests made by an officer'
  ArrestsOfType: 'Arrests by type'
  ArrestsON: 'Arrests by day of week'
  SuspectsArrestedOfRace: 'Suspects arrested by race'
  SuspectsArrestedOfEthnicity: 'Suspects arrested by ethnicity'
  ArrestsCrimeType: 'Crimes by UCR grouped type'

  #TrafficStops:
  TrafficStopsWithSearch: 'Traffic stops with search'
  TrafficStopsWithUseOfForce: 'Traffic stops with use of force'
  TrafficStops: 'Traffic stops'
  TrafficStopsWithArrest: 'Traffic stops with arrests'
  TrafficStopsWithInjury: 'Traffic stops with injury'
  TrafficStopsWithOfficerInjury: 'Traffic stops with officer injury'
  TrafficStopsWithSearchRequest: 'Traffic stops with search request'
  TrafficStopsByRace: 'Traffic stops by race'
  TrafficStopsByStopType: 'Traffic stops by stop type'
  TrafficStopsByStopResult: 'Traffic stops by result'
  TrafficStopsBySearchReason: 'Traffic stops by search reason'
  TrafficStopsByInterestingSearch: 'Traffic stop searches with interesting words in the search justification narrative'

  #FieldInterviews:
  FieldInterviews: 'Field interviews'
  HourOfFieldInterviews: 'Average hour that field interviews are conducted'
  ModeHourOfFieldInterviews: 'Most common hour when field interviews are conducted'
  FieldInterviewsByRace: 'Field interviews by race'
  FieldInterviewsByOutcome: 'Field interviews of outcome'
  FieldInterviewsWithFlag: 'Field interviews with flag'
  InterviewsType: 'Field interview types'

  #UseOfForce:
  UsesOfForceOfType: 'Uses of force by type'
  UnjustifiedUsesOfForceOfType: 'Unjustified uses of force'
  UnjustUOFInterventionsOfType: 'Punitive action types following an unjustified use of force'
  UOFwithSuspectInjury: 'Uses of force where the suspect was injured'
  SuspectInjuryToUOFRatio: 'Ratio of suspect injuries to uses of force that an officer has'

  #Dispatches:
  DispatchType: 'Dispatch types'
  DispatchInitiatiationType: 'Dispatches by type of initiation (officer or citizen)'
  DispatchDivision: 'Dispatches by division'
  DispatchMovement: 'Cross-division dispatches'

  #OfficerCharacteristics:
  DummyOfficerMarital: 'Officer marital status'
  DummyOfficerGender: 'Officer gender'
  DummyOfficerRace: 'Officer race'
  DummyOfficerEthnicity: 'Officer ethnicity'
  OfficerAge: 'Age of officer in years'
  DummyOfficerEducation: 'Officer education level'
  MilesFromPost: 'Number of miles to post'
  DummyOfficerMilitary: 'Officer military experience'
  AcademyScore: 'Performance score at the police academy'
  DummyOfficerRank: 'Officer rank'

  #OfficerEmployment:
  OutsideEmploymentHours: 'Outside employment hours worked'

  #DemographicNpaArrests:
  Arrests311Call: '311 calls in areas where the officer made arrests'
  Arrests311Requests: '311 requests in areas where the officer made arrests'
  PopulationDensity: 'Population density in areas where the officer made arrests'
  AgeOfResidents: 'Age of residents in areas where the officer made arrests'
  BlackPopulation: 'Black population in areas where the officer made arrests'
  HouseholdIncome: 'Household income in areas where the officer made arrests'
  EmploymentRate: 'Employment rate in areas where the officer made arrests'
  VacantLandArea: 'Vacant land area in areas where the officer made arrests'
  VoterParticipation: 'Voter participation in areas where the officer made arrests'
  AgeOfDeath: 'Age of death in areas where the officer made arrests'
  HousingDensity: 'Housing density in areas where the officer made arrests'
  NuisanceViolations: 'Nuisance violations in areas where the officer made arrests'
  ViolentCrimeRate: 'Violent crime rate in areas where the officer made arrests'
  PropertyCrimeRate: 'Property crime rate in areas where the officer made arrests'
  SidewalkAvailability: 'Sidewalk availability in areas where the officer made arrests'
  Foreclosures: 'Foreclosures in areas where the officer made arrests'
  DisorderCallRate: 'Disorder call rate in areas where the officer made arrests'

time_aggregations:
  1d: 'over the past day'
  1w: 'over the past week'
  1m: 'over the past month'
  1y: 'over the past year'
  5y: 'over the past 5 years'

metrics_name:
  sum: 'Total number of'
  avg: 'Average number of'
  max: ''
  mode: 'The most common value of'
  rate: 'Rate of'
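The commit message says experiments can now iterate through feature blocks leaving one block out, and this new file records each block's column suffixes along with phrases for building feature descriptions. Because `eis/feature_loader.py` is not rendered here, the following is only a sketch of how those pieces might be consumed. The helper names `leave_one_block_out`, `suffixes_for`, and `describe_feature` are hypothetical, and the assumption that a description reads as metric phrase, then feature phrase, then time-window phrase is inferred from the wording of the YAML values, not from the loader code. `DESCRIPTIONS` is a hand-copied excerpt in the shape PyYAML's `yaml.safe_load` would return for this file.

```python
def leave_one_block_out(blocks):
    """Yield (left_out, remaining) pairs: first all blocks, then each block dropped once."""
    blocks = list(blocks)
    yield None, blocks
    for i, left_out in enumerate(blocks):
        yield left_out, blocks[:i] + blocks[i + 1:]


def suffixes_for(blocks, descriptions):
    """Collect the column suffixes of the given blocks from feature_blocks_suffixes."""
    suffix_map = descriptions["feature_blocks_suffixes"]
    return [suffix for block in blocks for suffix in suffix_map[block]]


def describe_feature(feature, metric, window, descriptions):
    """Join metric, feature, and time-window phrases, skipping empty ones (e.g. max: '')."""
    parts = (
        descriptions["metrics_name"][metric],
        descriptions["feature_names"][feature],
        descriptions["time_aggregations"][window],
    )
    return " ".join(part.strip() for part in parts if part and part.strip())


# Excerpt of features_descriptions.yaml as a parsed dict (subset of keys only).
DESCRIPTIONS = {
    "feature_blocks_suffixes": {
        "IncidentsReported": ["ir", "irAG"],
        "OfficerShifts": ["shifts"],
        "Dispatches": ["dispatch", "dispatchPOST"],
    },
    "feature_names": {"Compliments": "Officer compliments"},
    "time_aggregations": {"1y": "over the past year"},
    "metrics_name": {"sum": "Total number of", "max": ""},
}
```

Each `remaining` list could then be mapped to suffixes to choose which feature columns a model sees; how the real loader matches suffixes against column names is not shown in this diff.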