Established software for secure data science collaboration

DataSHIELD is an infrastructure and a series of R packages that enable the remote, non-disclosive analysis of sensitive research data. It has been used with real-world, consented research data for over 10 years.

DataSHIELD features

Privacy by design

DataSHIELD builds privacy-by-design principles into its computing infrastructure, analytic functionality and automated output checking, and is designed to operate alongside best-practice data governance for the use of sensitive health data.

Privacy preserving analysis

Analysts do not connect directly to DataSHIELD data sites; a secure connection is available only to authorised analysts through a DataSHIELD client. Analysts do not view individual patient data, and that data is not shared or moved from the data site.

Federated analysis

A DataSHIELD analysis can be run on the individual patient data at each location and the results aggregated, or a global analysis can be conducted simultaneously at all sites, in both cases without sharing or moving individual patient data.
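
As a rough illustration of the idea, the sketch below computes a pooled mean from per-site summaries. It is a conceptual example in Python, not DataSHIELD code: the names `SiteSummary`, `summarise_at_site` and `pooled_mean` are hypothetical, and the two lists stand in for data held at separate sites.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate returned by one site: a sum and a count, never the rows."""
    total: float
    count: int


def summarise_at_site(values: Iterable[float]) -> SiteSummary:
    # Runs where the data lives; only the aggregate leaves the site.
    values = list(values)
    return SiteSummary(total=float(sum(values)), count=len(values))


def pooled_mean(summaries: List[SiteSummary]) -> float:
    # Runs on the analyst's side, using only the site aggregates.
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Hypothetical data held at two separate sites.
site_a = [5.0, 6.0, 7.0]
site_b = [4.0, 8.0]

summaries = [summarise_at_site(site_a), summarise_at_site(site_b)]
result = pooled_mean(summaries)
```

The pooled mean here equals the mean of the combined data, yet the analyst-side function only ever receives one sum and one count per site.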

Automated output checking

DataSHIELD analytic functions have been designed to share only non-disclosive summary statistics, with built-in automated output checking based on statistical disclosure control. Only data sites can set the threshold values for the automated output checks.
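
One common form of statistical disclosure control is a minimum cell count: a table is released only if no non-empty cell falls below a threshold. The Python sketch below illustrates that general technique under stated assumptions. It is not DataSHIELD's implementation, and `checked_counts`, `DisclosureError` and `min_cell_count` are hypothetical names, with `min_cell_count` standing in for a value configured by the data site.

```python
from collections import Counter
from typing import Dict, Iterable


class DisclosureError(Exception):
    """Raised when an output would fail the site's disclosure check."""


def checked_counts(categories: Iterable[str], min_cell_count: int) -> Dict[str, int]:
    # Tabulate the categories, then refuse to release the table if any
    # non-empty cell is smaller than the site-configured threshold.
    counts = Counter(categories)
    small_cells = [k for k, n in counts.items() if 0 < n < min_cell_count]
    if small_cells:
        # Report only how many cells failed, not which ones.
        raise DisclosureError(
            f"{len(small_cells)} cell(s) below the site threshold"
        )
    return dict(counts)


# Hypothetical usage: the threshold is set by the site, not by the analyst.
safe_table = checked_counts(["a"] * 5 + ["b"] * 4, min_cell_count=3)

try:
    checked_counts(["a"] * 5 + ["b"], min_cell_count=3)
    blocked = False
except DisclosureError:
    blocked = True
```

In this sketch the check runs alongside the data and the analyst receives either the full table or an error, never a partially released result.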

Software validation

The DataSHIELD base package, dsBase, runs over 12,000 tests for every software release. These tests run against synthetic datasets included in the package and check for correct operation, correct outputs, and that disclosure control is not breached.

Open source

DataSHIELD comprises open source infrastructure developed by OBiBa and Molgenis, together with R packages for analytic functionality maintained by the DataSHIELD community, which includes contributors from academia, healthcare, industry and SMEs.