Blind Stats
Blind Stats is a powerful privacy-preserving operation that allows a dataset user to understand a study population across multiple datasets, even when the data is in different organizations or regions.
Blind Stats is a Safe operation (see Privacy Assurances and Risk in the Getting Started section of the User Guide), but it still has the potential for misuse. TripleBlind provides a number of safeguards for its use:
- Unless an Agreement has been established permitting auto-approval of requests, all Blind Stats operations require an informed Asset Owner approval through an Access Request. The Access Request for Blind Stats contains information on all requested statistics.
- Requests are automatically rejected for Blind Stats operations when they would return descriptive information on groups of records that do not meet the minimum k-Grouping limits set on the involved datasets.
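The k-Grouping safeguard above can be pictured with a small plain-Python sketch. This is not TripleBlind's enforcement code. The helper name `groups_below_minimum`, the column names, and the threshold value are all hypothetical, chosen only to show the idea of rejecting a request when any group is smaller than the minimum.

```python
from collections import Counter


def groups_below_minimum(rows, group_column, k):
    """Return the group labels whose record count is below k (hypothetical helper)."""
    counts = Counter(row[group_column] for row in rows)
    return sorted(label for label, n in counts.items() if n < k)


# Hypothetical records; the column names are illustrative only.
rows = [
    {"site": "A", "age": 34},
    {"site": "A", "age": 51},
    {"site": "A", "age": 47},
    {"site": "B", "age": 62},
]

# With an assumed minimum group size of 3, group "B" is too small,
# so a request that would describe it should not be answered.
too_small = groups_below_minimum(rows, "site", k=3)
request_allowed = not too_small
```

In this sketch a single undersized group is enough to block the whole request, which mirrors the "automatically rejected" wording above.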
Operation
- Use the get_statistics() method to query the dataset for descriptive statistics.
- When using add_agreement() to permit a counterparty to obtain descriptive statistics for your dataset asset, use Operation.STATS for the operation parameter.
- Although the calculation of these statistics is done in a secure and private manner, be sure that the information that is returned (such as minimums, maximums, and quartiles) is acceptable to be shared before creating a permissive agreement with a counterparty.
Parameters
column: List[str]
- Name of data column(s) upon which to calculate.
function: List[StatFunc]
- Function(s) to calculate. If not specified, all stats are calculated.
- StatFunc.CONFIDENCE_INTERVAL: 95% CI, labeled 'ci-lower', 'ci-upper'
- StatFunc.COUNT: number of items in group, labeled 'n'
- StatFunc.KURTOSIS: labeled 'kurt'
- StatFunc.MAXIMUM: labeled 'max'
- StatFunc.MINIMUM: labeled 'min'
- StatFunc.MEAN: labeled 'mean'
- StatFunc.MEDIAN: labeled 'median'
- StatFunc.QUARTILES: labeled 'q1', 'mean', 'q3'
- StatFunc.SKEW: labeled 'skew'
- StatFunc.STANDARD_DEVIATION: labeled 'sd'
- StatFunc.STANDARD_ERROR: labeled 'se'
- StatFunc.VARIANCE: labeled 'var'
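As a rough guide to what the labels listed above stand for, the following sketch computes several of them locally with Python's standard library. It is not the SDK's implementation. The choice of sample (rather than population) variance, the quantile method, and the normal-approximation confidence interval are all assumptions made for illustration, and the values returned by Blind Stats may be defined differently. Skew and kurtosis are left out because their definitions vary between tools.

```python
import math
import statistics


def describe(values):
    """Compute several descriptive statistics, keyed by the labels listed above."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    se = sd / math.sqrt(n)         # standard error of the mean
    q1, _, q3 = statistics.quantiles(values, n=4)  # quantile method is an assumption
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "var": statistics.variance(values),  # sample variance (assumption)
        "sd": sd,
        "se": se,
        # Normal-approximation 95% interval for the mean (assumption).
        "ci-lower": mean - 1.96 * se,
        "ci-upper": mean + 1.96 * se,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```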
combine_with: Optional[Union[Asset, List[Asset]]] = None
- Other table(s) with the same data/columns to virtually combine for the calculation.
- The combination of datasets using combine_with is a horizontal union/concatenation (not like a join).
- To calculate statistics on a single dataset, leave combine_with out of the process call.
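The difference between a union/concatenation and a join can be sketched in plain Python. This runs locally on made-up data and does not use the SDK. The table and column names are hypothetical. The point is that records from each table are appended to one another and the columns stay the same, so no key matching takes place.

```python
import statistics

# Two hypothetical tables with the same columns, held by different parties.
site_a = [{"patient": "a1", "age": 34}, {"patient": "a2", "age": 51}]
site_b = [
    {"patient": "b1", "age": 47},
    {"patient": "b2", "age": 62},
    {"patient": "b3", "age": 29},
]

# Union/concatenation: records from both tables are appended to one another.
# A join would instead match records on a shared key, which is not done here.
combined = site_a + site_b

row_count = len(combined)
mean_age = statistics.mean([row["age"] for row in combined])
```

The statistic is then calculated over all five records as if they belonged to one table.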
group_by: Optional[str] = None
- Data column for grouping data before the calculation.
- Supports stratification on a single grouping column.
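Stratifying on a single grouping column amounts to splitting the records by that column's value and calculating the statistic once per group. The sketch below shows this locally in plain Python. The helper `mean_by_group` and the column names are hypothetical and are not part of the SDK.

```python
from collections import defaultdict
import statistics


def mean_by_group(rows, group_column, value_column):
    """Group rows by one column, then average another column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row[value_column])
    return {label: statistics.mean(values) for label, values in groups.items()}


# Hypothetical records; the column names are illustrative only.
rows = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 51},
    {"sex": "F", "age": 46},
    {"sex": "M", "age": 61},
]

result = mean_by_group(rows, "sex", "age")
```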
preproc: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]
- The preprocessor(s) to use against datasets. If a list is given, its order must match the order of combine_with, with the first entry corresponding to this TableAsset.
job_name: Optional[str]
- Reference name for the job which performs this task.
silent: Optional[bool] = False
- Suppress status messages during execution? Default is to show messages.
session: Optional[Session]
- A connection session. If not specified, the default session is used.
Limitations
- The group_by parameter supports a single grouping column.
- When the supplied datasets have missing values that appear as NaNs or Nulls, the row is dropped before entering the multi-party computation. We recommend preprocessing datasets to handle these values upstream of the operation.
Wed May 15 2024 03:40:59 GMT-0400 (Eastern Daylight Time)