NIH’s All of Us project is following the health of one million Americans throughout their lives. Where a person lives has an important influence on their health, and All of Us is working to tap into that data through a subsidiary project: the Center for Linkage and Acquisition of Data, aka CLAD.
Research IT is contributing to the work of linking this location-based Social Determinants of Health data by supporting the testing and implementation of geospatial tools in NIH’s private cloud. The NIH cloud has a number of special restrictions, from database size to security, and each one presented its own hurdle to overcome in this custom geocoder implementation.
Meeting this project’s unique needs meant creating our own development environment for building and testing tools before they ever make it into the cloud. The geocoders had to be specially packaged to meet the NIH cloud’s operational expectations, and its size constraints ended up requiring that we break one of the geocoder databases into multiple pieces to fit. NIH’s requirements around maintaining an alert security posture for cloud tools can, under certain conditions, force a compulsory rebuild of a tool on a very short timeline to remediate critical security risks.
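To make the database-splitting idea concrete, here is a minimal sketch of one way to group tables into pieces that each stay under a size cap. It is only an illustration under stated assumptions: the function name, the table names, and the sizes are all hypothetical, and the greedy first-fit approach shown is not necessarily how the geocoder database was actually divided.

```python
def split_into_pieces(table_sizes, max_piece_bytes):
    """Greedily group tables into pieces no larger than max_piece_bytes.

    table_sizes: dict mapping a (hypothetical) table name to its size in bytes.
    Returns a list of pieces, each a dict with "tables" and "total" keys.
    """
    pieces = []
    # Place the largest tables first so they claim space before small ones.
    for name, size in sorted(table_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > max_piece_bytes:
            # A single table over the cap cannot fit without splitting its rows.
            raise ValueError(f"table {name!r} exceeds the size cap on its own")
        for piece in pieces:
            if piece["total"] + size <= max_piece_bytes:
                piece["tables"].append(name)
                piece["total"] += size
                break
        else:
            # No existing piece has room, so start a new one.
            pieces.append({"tables": [name], "total": size})
    return pieces
```

Called with a dictionary of table sizes and a cap, the function returns a list of groups whose totals never exceed the cap, and it raises an error if any single table is too large to fit by itself.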
Because Research IT was not locked into any one toolset, the team could meet the moment with a blend of local and cloud automation, so that each of the key challenges became not only surmountable but manageable with a minimum of human intervention. Research IT’s usual flavor of automated pipeline is more like delivering a table of clinical data or exporting a Common Data Model representation of EHR data. In this case, a much more involved container creation pipeline was built, incorporating unit tests, security checks, and logging for posterity. At its core, the task was still delivering data in support of data science; it was just wrapped in a much fancier package than most folks normally request!