DataSHIELD is an infrastructure and a series of R packages and tools that enable the remote, actively disclosure-protected analysis of sensitive research data.
Disclaimer: Neither the DataSHIELD team nor the DataSHIELD Advisory Board can warrant that the packages listed here are fit for purpose. It is the responsibility of any user to ensure the package is sufficiently tested, documented and disclosure checked to be used.
R packages that enable the remote and non-disclosive analysis of sensitive research data. dsBase and dsBaseClient provide the non-domain-specific data shaping, analysis and presentation methods. dsBase is deployed to the server, while dsBaseClient is deployed to the R environment used by the analyst/researcher and provides the methods used to initiate operations on the server.
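The server/client split described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the dsBase or dsBaseClient API: the function names, the `MIN_COUNT` threshold and the example values are all hypothetical. The idea shown is that server-side code returns only aggregates, and client-side code combines them.

```python
# Illustrative sketch only: plain Python, not the dsBase/dsBaseClient API.
# All names, the threshold and the data below are hypothetical.

MIN_COUNT = 3  # hypothetical disclosure threshold, chosen for illustration


def server_summary(values, min_count=MIN_COUNT):
    """Runs where the data lives; returns aggregates, never raw rows."""
    if len(values) < min_count:
        # Refuse to answer when the group is small enough to be disclosive.
        raise ValueError("too few observations to release a summary")
    return {"n": len(values), "total": sum(values)}


def client_pooled_mean(summaries):
    """Runs in the analyst's environment; combines per-server aggregates."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n


# Two made-up sites; the client never sees these lists directly.
site_a = [5.0, 7.0, 9.0]
site_b = [1.0, 3.0, 5.0, 7.0]

summaries = [server_summary(site_a), server_summary(site_b)]
pooled = client_pooled_mean(summaries)
print(pooled)
```

The pooled mean equals the mean of the combined data, yet only a count and a sum leave each site.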
Omics analysis. Currently the methods being supported are:
This package is mostly a wrapper around rexposome, an R package for exposome characterization and exposome-outcome association testing.
This package contains a set of functions to automate processes in DataSHIELD to make data manipulation and analysis easier.
This is a standalone package for survival analysis in DataSHIELD. It provides client-side functions for building survival models and Cox models.
Methods to apply causal mediation analysis.
dsSwissKnife is a software package which builds upon (and extends) DataSHIELD to allow non-disclosive remote federated analysis on sensitive data.
Non-disclosive Machine learning functions. Currently the methods being supported are:
dsGeo contains basic functions for defining geometries in space, creating buffers around geometries and determining overlaps between geometries.
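As a rough illustration of what buffering and overlap testing mean, here is a minimal Python sketch. It is not the dsGeo API: the function names are hypothetical, it assumes planar coordinates, and it handles only circular buffers around points.

```python
import math

# Illustrative sketch only: not the dsGeo API. Assumes planar (x, y)
# coordinates and circular buffers around points.


def buffer_around(point, radius):
    """Represent a circular buffer as (centre_x, centre_y, radius)."""
    x, y = point
    return (x, y, radius)


def buffers_overlap(a, b):
    """Circles overlap when their centres are no further apart than
    the sum of their radii."""
    ax, ay, ar = a
    bx, by, br = b
    return math.hypot(ax - bx, ay - by) <= ar + br


# Made-up example locations.
clinic = buffer_around((0.0, 0.0), 5.0)
home_near = buffer_around((3.0, 4.0), 1.0)   # centre is 5 units from the clinic
home_far = buffer_around((10.0, 0.0), 1.0)   # centre is 10 units from the clinic

near_overlaps = buffers_overlap(clinic, home_near)
far_overlaps = buffers_overlap(clinic, home_far)
```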
This package contains utilities to aid the development and testing of new DataSHIELD methods and packages.
The package is intended to be a collection of methods for microbiome analysis. Currently, the methods are wrappers around methods provided by the “vegan” package.
Boltzmann machines are generative neural networks that can learn the distribution of the data fed to the network as input. The dsBoltzmannMachines package allows these generative models to be trained and used in DataSHIELD to create synthetic data that preserve patterns of the input data. Synthetic data samples are not linked to individual samples in the original data but are generated by sampling from the distribution captured by a Boltzmann machine model.
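To make "sampling from the distribution captured by a model" concrete, the following Python sketch draws binary samples from a small restricted Boltzmann machine using block Gibbs sampling. It is not the dsBoltzmannMachines API. The weights and biases are made-up numbers standing in for a trained model, and the restricted variant is chosen here only because it keeps the example short.

```python
import math
import random

# Illustrative sketch only: not the dsBoltzmannMachines API.
# Parameters below are invented, not learned from any data.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, b_hidden, rng):
    """Sample each hidden unit given the visible vector v."""
    h = []
    for j in range(len(b_hidden)):
        activation = b_hidden[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(activation) else 0)
    return h


def sample_visible(h, W, b_visible, rng):
    """Sample each visible unit given the hidden vector h."""
    v = []
    for i in range(len(b_visible)):
        activation = b_visible[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        v.append(1 if rng.random() < sigmoid(activation) else 0)
    return v


def gibbs_sample(W, b_visible, b_hidden, steps, rng):
    """Start from random noise and alternate hidden/visible sampling."""
    v = [rng.randint(0, 1) for _ in b_visible]
    for _ in range(steps):
        h = sample_hidden(v, W, b_hidden, rng)
        v = sample_visible(h, W, b_visible, rng)
    return v


# Hypothetical model: 4 visible units, 2 hidden units.
W = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0], [-1.0, 2.0]]
b_visible = [0.0, 0.0, 0.0, 0.0]
b_hidden = [-1.0, -1.0]

rng = random.Random(0)
synthetic = [gibbs_sample(W, b_visible, b_hidden, 50, rng) for _ in range(5)]
print(synthetic)
```

Each generated row starts from random noise rather than from a real record, which is why such samples are not tied to individuals in the original data.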
This package performs federated Bayesian inference through Markov Chain Monte Carlo (MCMC) on a Structural Equation Regression (SEM) model.
dsMTL (Federated Multi-Task Learning based on DataSHIELD) provides federated, privacy-preserving multi-task learning analysis. dsMTL aims to simultaneously learn outcome-associated (e.g. diagnosis-associated) patterns across datasets, with dataset-specific as well as shared effects. dsMTL contains four sparse MTL methods to disentangle specific structures of “shared-specific” components. Federated LASSO is also included because it is widely used. dsMTL is suitable for biomedical applications such as comorbidity analysis, multi-omics analysis, multi-modal data analysis, and high-dimensional molecular studies.
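One common way to fit a LASSO model without pooling raw data is proximal gradient descent: each site computes a gradient on its own rows, and the coordinator averages the gradients and applies soft-thresholding. The Python sketch below illustrates that general technique under stated assumptions. It is not the dsMTL implementation or API, and the function names, data, step size and penalty are all hypothetical.

```python
# Illustrative sketch only: not the dsMTL API or its actual algorithm.
# Objective: (1 / (2n)) * ||Xw - y||^2 + lam * ||w||_1, with rows split
# across sites. All names and numbers are hypothetical.


def local_gradient(X, y, w):
    """Gradient of 0.5 * sum((Xw - y)^2), computed where the data lives."""
    grad = [0.0] * len(w)
    for row, target in zip(X, y):
        residual = sum(xi * wi for xi, wi in zip(row, w)) - target
        for k, xi in enumerate(row):
            grad[k] += residual * xi
    return grad


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def federated_lasso(sites, n_features, lam, step, iterations):
    w = [0.0] * n_features
    n_total = sum(len(y) for _, y in sites)
    for _ in range(iterations):
        # Only gradients leave each site, never individual rows.
        grads = [local_gradient(X, y, w) for X, y in sites]
        pooled = [sum(g[k] for g in grads) / n_total for k in range(n_features)]
        w = [soft_threshold(w[k] - step * pooled[k], step * lam)
             for k in range(n_features)]
    return w


# Made-up data in which the outcome depends only on the first feature.
site_1 = ([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]], [2.0, 4.0, 6.0])
site_2 = ([[1.0, 1.0], [2.0, 0.0]], [2.0, 4.0])

weights = federated_lasso([site_1, site_2], n_features=2,
                          lam=0.1, step=0.2, iterations=500)
print(weights)
```

On this toy data the penalty drives the coefficient of the irrelevant second feature to exactly zero, while the first coefficient is shrunk slightly below 2.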
This package can be used to generate a synthetic data set on the client side by running the generation on the server side. Users can then perform harmonisation while working with full access to the synthetic data on the client to confirm that their algorithms work as expected. Once the user is satisfied that the algorithms work correctly, they can be applied to the real data on the server side. The user therefore has the benefit of being able to see the data they are working with, without having to go through laborious data transfer processes. The same benefits are realised for an analysis user.
This package performs non-disclosive cluster analysis in DataSHIELD.