Blind Join

Blind Join builds on Blind Match but provides powerful additional functions:

  • Identify the subset of data within 3rd-party datasets that match to features within your own dataset, and bring in additional feature information for that subset of the 3rd party’s data.
  • Perform “fuzzy matching” (using Jaro-Winkler distance) on identifiers that may differ slightly (e.g., name or address).

Blind Join is a Safe with Care operation (see Privacy Assurances and Risk in the Getting Started section of the User Guide), and has the potential for misuse. TripleBlind has a number of safeguards for its use:

  • Blind Join is not permitted to return any columns the Asset Owner has masked; the assumption being that masked columns contain PII/PHI or otherwise sensitive information.
  • Blind Join is disabled by default at our strictest security levels.
  • Unless an Agreement has been established permitting auto-approval of requests, all Blind Join operations require an informed Asset Owner approval through an Access Request.
  • k-Grouping is respected in the Blind Join operation as a minimum record threshold on the output; a join that would result in fewer than k records would automatically fail with a warning message.

Operation

  • Use the blind_join() method to join your dataset with one or more dataset assets.
  • When using add_agreement() to allow a counterparty to use your dataset for a Blind Join, use Operation.BLIND_JOIN for the operation parameter. Permissive agreements (without usage restrictions) are not recommended for use with this operation.

⚠️Blind Join is NOT recommended for HIPAA applications.

Parameters

intersect_with: Union[Asset, List[Asset]]

  • A dataset Asset (or a list of Assets) to be included in the Blind Join.

match_column: Union[str, List[str]]

  • Name of the column to match. If not the same in all datasets, a list of the matching column names, starting with the initiator asset and then listing a name in each intersect_with dataset.
  • If a single fieldname is provided, each dataset must have the same name for that match_column, eg. “ID”.

match_fuzziness: Optional[float] = 0.0

  • If specified, the fuzziness threshold for match values. This is a value between 0.0 and 1.0. Default is 0.0 (perfect match).

return_columns: Union[str, List[str], List[List[str]]]

  • Name of the columns to be returned in the resultant dataset. This is a list of lists of column names such that each party has its own list of column names (starting, as with match_column, with the initiator asset).
  • str: If a single column name is provided, columns matching that name across all datasets will be returned. eg. ["a"] is interpreted as [["a"], ["a"], ["a"]]
  • List[str]: If a list of names is provided, columns matching these names across all datasets will be returned. eg. ["a", "b"] is interpreted as [["a", "b"], ["a", "b"], ["a", "b"]]
  • List[List[str]]: If a list of lists is provided, only columns matching the names from the respective dataset will be returned. eg. [["a", "b"], ["c", "d"]] is interpreted as [["a", "b"], ["c", "d"]]
  • If a column name is specified for an asset and it does not exist, it is ignored. If multiple datasets have the same name for a column, the output dataset will contain suffixes to differentiate the source of each field. e.g. If "name" is in multiple datasets, the return will contain "name", "name__1" & "name__2".

join_type: Optional[JoinType]

  • Describes the format of the resulting dataset table. Default is INNER.
  • INNER: This is a SQL-like inner join where each party's return columns are included in the result set.
  • INNER_PARTITIONED: This is an inner join with each party's matching rows appended to the end of the dataset. There is no alignment by matching column. This format provides more privacy-protection and reveals less group-membership information than a standard inner join.
  • LEFT: This is a SQL-like left join where ALL records are returned from the initiator's table, with matching records from each party's return columns (where available).

job_name: Optional[str]

  • Reference name for this process. This name will appear in the Access Request, Process History, and Audit Reports.

preproc: Optional[Union[TabularPreprocessor, List[TabularPreprocessor]]]

  • The preprocessor(s) to use against the datasets. When no preproc is specified, the default preproc selects all columns.

silent: Optional[bool]

  • Suppress status messages during execution? Default is to show messages.

Limitations

  • A Blind Join using fuzzy matching is only supported for two parties.
  • Blind Join is a powerful operation and can only be used when the initiator owns at least one of the datasets in the computation.
  • When using Blind Join, the owned dataset must be supplied as the first (or left-side) dataset asset.
  • SELECT * is not permitted in sql_transform preprocessors upstream of Blind Query and Blind Join. Use get_column_names() to request a list of column names and list them explicitly in the query. This empowers the data owner to make an informed approval decision within any Access Requests against their dataset.

Wed May 15 2024 03:18:04 GMT-0400 (Eastern Daylight Time)