Module tripleblind.asset

Assets are the primary and most valuable objects stored on the Router.

An asset represents either an Algorithm or Data. Every asset has an owner who controls access to it, and every asset has a price associated with its use.

To assist in working with different assets, there is a hierarchy of helper classes:

Asset
    DatasetAsset
        DicomDataset
        ImageDataset
        NumPyDataset
        TabularDataset
            CSVDataset
            ElasticsearchDataset
            S3Dataset
            DatabaseDataset
                AzureDataLakeStorageDataset
                BigQueryDatabase
                DatabricksDatabase
                MongoDatabase
                MSSQLDatabase
                OracleDatabase
                RedshiftDatabase
                SnowflakeDatabase


    AlgorithmAsset
        NeuralNetwork
        PMMLRegression
        ReportAsset
            DatabaseReport
                BigQueryReport
                DatabricksReport
                RedshiftReport
                MSSQLReport
                OracleReport
                SnowflakeReport
        XGBoostModel

    ModelAsset
        Regression
            RegressionModel
            PSIVerticalRegressionModel

    ModelTrainerAsset

Most of these have a create() method to assist you in building assets easily and properly. See the specific class for more details.
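For example, a local CSV file can be positioned as a tabular asset and later located through the Router index. The following is a minimal sketch: the file path, asset name, and description are hypothetical, and it assumes a default session has already been established.

    from tripleblind.asset import Asset, CSVDataset

    # Position a local CSV file (hypothetical path and name) as a discoverable asset.
    dataset = CSVDataset.create(
        datafile="patients_2023.csv",
        name="Example Patient Records",
        desc="De-identified patient records (example)",
        is_discoverable=True,
    )

    # Later, locate the same asset through the Router index by exact name.
    found = Asset.find("Example Patient Records")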

Global variables

var CNN

The built-in CNN Operation

var DISTRIBUTED_INFERENCE

The built-in Distributed Inference Operation

var FEDERATED_LEARNING_PROTOCOL

The built-in Federated Learning Operation

var NAMESPACE_DEFAULT_USER

Special UUID which represents the current user

var PSI_VERTICAL_PARTITION_TRAINING

The built-in PSI Vertical Partition Training Operation

var PYTORCH_INFERENCE

The built-in PyTorch Inference Operation

var RANDOM_FOREST_INFERENCE

The built-in Random Forest Inference Operation

var REGRESSION_INFERENCE

The built-in Regression Inference Operation

var ROI_DETECTOR_INFERENCE

The built-in Region of Interest Detector Inference Operation

var SKLEARN_INFERENCE

The built-in Scikit-learn Inference Operation

var SPLIT_LEARNING_TRAINING

The built-in Split Learning Training Operation

var VERTICAL_PARTITION_SPLIT_TRAINING

The built-in Vertical Partition Split Training Operation

var XGBOOST_INFERENCE_FED

The built-in XGBoost Inference with FED security Operation

var XGBOOST_INFERENCE_SMPC

The built-in XGBoost Inference with SMPC security Operation

Functions

def create_frame(data, opcode, fin=1)

Monkey patch websocket-client library to skip masking.

https://github.com/websocket-client/websocket-client/blob/df275d351f9887fba2774e2e1aa79ff1e5a24bd1/websocket/_abnf.py#L194

Masking is a protocol-level security feature that is redundant over TLS connections. It is monkey patched out because the gevent-websocket server-side library is extremely slow at unmasking, causing roughly a 4x slowdown. See: https://stackoverflow.com/a/32290330/2395133

Classes

class AlgorithmAsset (uuid: UUID)

An abstract Asset used to perform a calculation.

This could be a trained neural network, a prebuilt protocol, or a Python or SQL script to be executed against a DatasetAsset.

Ancestors

Subclasses

Inherited members

class Asset (uuid: UUID)

Points to a dataset or an algorithm indexed on the TripleBlind Router.

Subclasses

Class variables

var uuid : uuid.UUID

Identifier for this asset.

Static methods

def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> Asset

Search the Router index for an asset matching the given search

Args

search : str or re.Pattern, optional
The search pattern applied to asset names. A simple string will be used as a substring search if exact_match is False, otherwise it will only return exact matches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's team)
owned_by : int, optional
Only return owned assets owned by the given team ID
dataset : bool, optional
Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm : bool, optional
Set to True to search for algorithms. Default is to search for both data and algorithms.
session : Session, optional
A connection session. If not specified, the default session is used.
exact_match : bool, optional
When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.

Raises

TripleblindAssetError
Thrown when multiple assets are found which match the search.

Returns

Asset
A single asset, or None if no match found
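A minimal sketch of some common find() calls, assuming a default session; the asset names shown are hypothetical.

    import re

    from tripleblind.asset import NAMESPACE_DEFAULT_USER, Asset

    # Exact-name match (the default behavior for string searches).
    asset = Asset.find("Example Patient Records")

    # Substring match against asset names.
    asset = Asset.find("Patient", exact_match=False)

    # Regular-expression search restricted to datasets in the current user's namespace.
    asset = Asset.find(
        re.compile(r"Patient.*2023"),
        namespace=NAMESPACE_DEFAULT_USER,
        dataset=True,
    )

    if asset is None:
        print("No matching asset found")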
def find_all(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, max: Optional[int] = 500, session: Optional[Session] = None) -> List[Asset]

Search the Router index for assets matching the given search

Args

search : Optional[Union[str, re.Pattern]]
Either an asset ID or a search pattern applied to asset names and descriptions. A simple string will match substrings, or a regular expression can be passed for complex searches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's team)
owned_by : Optional[int], optional
Only return owned assets owned by the given team ID
dataset : Optional[bool], optional
Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm : Optional[bool], optional
Set to True to search for algorithms. Default is to search for both data and algorithms.
max : Optional[int], optional
Maximum number of results to return. If not specified, defaults to 500.
session : Session, optional
A connection session. If not specified, the default session is used.

Returns

List[Asset]
A list of found assets, or None if no match found
def position(file_handle: Union[str, Path, Package, io.BufferedReader], name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None, metadata: dict = {}, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True, asset_type: Optional[str] = None)

Place data on your Access Point for use by yourself or others.

If the dataset is a .csv file, the first row is assumed to be a header containing column names, and column validation will be performed. Use CSVDataset.position() for more options, including auto-renaming.

Args

file_handle : str, Path, Package or io.BufferedReader
File handle or path to the data to place on the API user's associated Access Point.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
is_dataset : bool, optional
Is this a dataset? (False == algorithm)
custom_protocol
Internal use
metadata : dict
Custom metadata to include in the asset
unmask_columns : [str], optional
When is_dataset=True, list of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.
asset_type : str, optional
Type of asset to be positioned. Default is 'dataset'. Other options: 'algorithm' or 'report'.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

Asset
New asset on the Router, or None on failure
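A minimal sketch of positioning a generic file as a dataset asset; the file path, asset name, column name, and metadata are hypothetical.

    from tripleblind.asset import Asset

    asset = Asset.position(
        file_handle="transactions.csv",        # hypothetical local file
        name="Example Transactions",
        desc="Card transactions, Q1 (example)",
        is_discoverable=False,
        k_grouping=10,
        unmask_columns=["merchant_category"],  # hypothetical column name
        metadata={"source": "example"},
    )
    if asset is None:
        raise RuntimeError("Positioning failed")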
def upload(file_handle: io.BufferedReader, name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None)

Deprecated, use Asset.position() instead.

Instance variables

var accesspoint_filename : str

str: Disk filename on the Access Point which holds this asset.

var activate_date : dt.datetime

datetime: Date when this asset became active

var deactivate_date : dt.datetime

datetime: Date when this asset was archived (deleted)

var desc : str

str: Longer description of the asset

var filename : str

str: Filename associated with the asset

var hash : str

str: Hash that was registered at the router when this asset was positioned.

var is_active : bool

bool: Is this an active asset?

var is_discoverable : bool

bool: True if anyone can discover the asset on the Router

var is_valid : bool

Verify that this Asset object points to a valid asset on the Router.

Returns

bool
True if this is a valid Asset on the Router
var k_grouping : Optional[int]

int: The minimum count of records with like values required for reporting.

var metadata : dict

dict: Asset metadata (e.g. 'cols' on some datasets)

var name : str

str: Simple short name of the asset

var namespace : UUID

UUID: Namespace which contains the asset. Each user has a namespace, so generally assets exist under the namespace of the creator. Think of this like a personal folder within your organization to hold your assets.

var team : str

str: Name of the team that owns the asset

var team_id : str

str: ID of the team that owns the asset

Methods

def add_agreement(self, with_team: int, operation: "Union[Operation, UUID, 'Asset']" = None, expiration_date: str = None, num_uses: int = None, algorithm_security: str = 'smpc', session: Optional[Session] = None)

Establish Agreement allowing another team to use this Asset

Args

with_team : int
ID of the partner team, or "ANY" to make an Asset available to everyone without explicit permission.
operation : Operation, UUID or Asset
The action being enabled by this Agreement against this Asset. If an Asset is provided, it will be treated as an algorithmic operation applied to this Asset (e.g. allowing a trained model to run against a dataset)
expiration_date : str
ISO formatted date on which the Agreement becomes invalid, or None for no expiration.
num_uses : int
The number of jobs that can be created under the Agreement before it becomes invalid, or None for no limit.
algorithm_security : str
"smpc" or "fed". Specifies the level of algorithm security required to run the operation.
session : Session, optional
A connection session. If not specified the default session is used.

Returns

Agreement
New agreement, None if unable to create
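A sketch of granting a partner team permission to run the built-in Regression Inference Operation against a dataset; the team ID, asset name, and expiration date are hypothetical.

    from tripleblind.asset import REGRESSION_INFERENCE, Asset

    dataset = Asset.find("Example Patient Records")

    agreement = dataset.add_agreement(
        with_team=42,                     # hypothetical partner team ID
        operation=REGRESSION_INFERENCE,   # built-in Regression Inference Operation
        expiration_date="2025-12-31",     # ISO formatted expiration
        num_uses=100,
        algorithm_security="smpc",
    )
    if agreement is None:
        print("Unable to create agreement")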
def archive(self, session: Optional[Session] = None, remote_delete: bool = False) -> bool

Remove asset from the Router index (and optionally the Access Point)

Args

session : Session, optional
A connection session. If not specified the default session is used.
remote_delete : bool, optional
Delete the underlying file from the Access Point? Default is to leave the file on the Access Point's attached storage.

Raises

TripleblindAssetError
Thrown when unable to archive the asset.
TripleblindAPIError
Thrown when unable to talk to Router

Returns

bool
True if the asset was successfully archived.
def delete(self, session: Optional[Session] = None) -> bool

Deprecated, use Asset.archive() instead.

def download(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str

Deprecated. Use Asset.retrieve() instead

def list_agreements(self, session: Optional[Session] = None)

List agreements governing your Asset.

Args

session : Session, optional
A connection session. If not specified the default session is used.

Returns

list
List of Agreement objects connected to the Asset.
def publish_to_team(self, to_team: int, session: Optional[Session] = None, algorithm_security: str = 'smpc')

Expose the existence of the Asset to a specified team; explicit usage approval is still required

Args

to_team : int
ID of the partner team.
session : Session, optional
A connection session. If not specified the default session is used.
algorithm_security : str, optional
Acceptable security level of algorithms to run with ("fed" or "smpc").

Returns

Agreement
The new Agreement object
def retrieve(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str

Fetch an asset package and save locally.

NOTE: Asset packages use the .zip file format. They can be easily accessed via tb.Package.load(filename) or Python's built-in zipfile library.

Args

save_as : str, optional
Filename to save under, None to use default filename in the current directory.
overwrite : bool, optional
Should this overwrite an already existing file?
show_progress : bool, optional
Display progress bar?
session : Session, optional
A connection session. If not specified, the default session is used.

Raises

TripleblindAPIError
Authentication failure
IOError
File already exists and no 'overwrite' flag
TripleblindAssetError
Failed to retrieve

Returns

str
Absolute path to the saved file
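A sketch of fetching an asset package and opening it locally with tb.Package (referenced above); the asset and file names are hypothetical.

    import tripleblind as tb
    from tripleblind.asset import Asset

    asset = Asset.find("Example Patient Records")
    path = asset.retrieve(
        save_as="patient_records.zip",
        overwrite=True,
        show_progress=True,
    )

    # Asset packages are .zip files; they can also be opened with Python's zipfile module.
    package = tb.Package.load(path)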
class AzureBlobStorageDataset (uuid: UUID)

A table stored as a CSV file inside an Azure Blob Storage account.

Ancestors

Static methods

def create(storage_account_name: str, storage_key: str, file_system: str, key: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset

Create a dataset connection to a CSV file in Azure Blob Storage.

This dataset is 'live': any updates made to the file resting in Azure Blob Storage will be available to this dataset the next time it is used.

Args

storage_account_name : str
The Azure storage account to reference.
storage_key : str
An access token for reading from the storage account.
file_system : str
The file system defined in the Azure control panel for the storage account.
key : str
The key associated with the data in Azure Blob Storage.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.

Returns

AzureBlobStorageDataset
New asset on the Router, or None on failure

Inherited members

class AzureDataLakeStorageDataset (uuid: UUID)

A table stored in CSV format at a given file path inside an Azure Data Lake Instance.

Ancestors

Static methods

def create(storage_account_name: str, storage_key: str, file_system: str, path: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset

Create a dataset connected to a CSV file within Azure Data Lake Storage.

This dataset is 'live': any updates made to the file resting in the Data Lake will be available to this dataset the next time it is used.

Args

storage_account_name : str
The Azure storage account to reference.
storage_key : str
An access token for reading from the storage account.
file_system : str
The file system defined in the Azure control panel for the storage account.
path : str
The full path to the file within Azure Data Lake Storage.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.

Returns

AzureDataLakeStorageDataset
New asset on the Router, or None on failure

Inherited members

class BigQueryDatabase (uuid: UUID)

A table asset backed by a view from a BigQuery database.

Ancestors

Static methods

def create(gcp_project: str, bigquery_dataset: str, credentials: Union[str, Path], query: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a connection to a BigQuery database

Args

gcp_project : str
The project name of your Google Cloud Project which will be used to cover any query access costs.
bigquery_dataset : str
The BigQuery dataset name.
credentials : str or Path
The path of your keyfile.json. See the Google documentation for more details. These credentials will be stored securely on your Access Point; neither TripleBlind nor anyone using your dataset will have access to it.
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure
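A sketch of registering a BigQuery-backed view; the project, BigQuery dataset, keyfile path, query, and asset name are hypothetical.

    from tripleblind.asset import BigQueryDatabase

    asset = BigQueryDatabase.create(
        gcp_project="my-gcp-project",          # hypothetical project
        bigquery_dataset="clinical",           # hypothetical BigQuery dataset
        credentials="keys/keyfile.json",       # stored securely on your Access Point
        query="SELECT age, diagnosis FROM visits",
        name="Example BigQuery Visits",
        desc="Live view over the visits table (example)",
        unmask_columns=["diagnosis"],          # hypothetical column name
    )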

Inherited members

class CSVDataset (uuid: UUID)

A static table asset, typically a CSV text file.

Ancestors

Static methods

def create(datafile: Union[str, Path], name: str, desc: str, header: List[str] = None, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, auto_rename_columns: Optional[bool] = False) -> TabularDataset

Place a CSV (Comma-separated value) file on your Access Point.

Args

datafile : str, Path
Path to the data to place on the API user's associated Access Point.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
header : List[str], optional
A list of names to use as column headers. If None, headers will be detected in the first row. If none exist, generic headers will be created for the dataset.
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
auto_rename_columns : bool, optional
Should invalid column names be altered to be made legal at position time? Disabled by default.

Returns

TabularDataset
New asset on the Router, or None on failure
def position(file_handle: Union[str, Path, Package, io.BufferedReader], name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None, metadata: dict = {}, unmask_columns: Optional[List[str]] = None, header: List[str] = None, has_header: Optional[bool] = True, check_col_names: Optional[bool] = True, auto_rename_columns: Optional[bool] = False)

Place tabular data on your Access Point for use by yourself or others.

Args

file_handle : str, Path, Package or io.BufferedReader
File handle or path to the data to place on the API user's associated Access Point.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
is_dataset : bool, optional
Is this a dataset? (False == algorithm)
custom_protocol
Internal use
metadata : dict
Custom metadata to include in the asset
unmask_columns : [str], optional
When is_dataset=True, list of column names that will be initially unmasked. Default is to mask all columns.
header : List[str], optional
A list of names to use as column headers. If None, headers will be detected in the first row. If none exist, generic headers will be created for the dataset.
has_header : bool, optional
Does the dataset have a header row containing column names?
check_col_names : bool, optional
Should column name validation be performed at position time? Enabled by default.
auto_rename_columns : bool, optional
Should invalid column names be altered to be made legal at position time? Disabled by default.

Returns

Asset
New asset on the Router, or None on failure

Raises

Exception("Error creating asset: Invalid field names.")

def rename_columns(invalid_col_names: list, df: Optional[pd.Dataframe] = None)

Adjust column names in dataframe to pass required validations

Tested/corrected name characteristics:

  • Name cannot be longer than 64 chars
  • Name cannot start with a number
  • Name can only contain alphanumeric characters and underscores

Args

invalid_col_names : list
Names determined to be invalid
df : pd.Dataframe, optional
Dataframe to be corrected

Returns

If a dataframe is provided, columns are renamed in place. If no dataframe is provided, a list of new column names is returned.

Inherited members

class DatabaseDataset (uuid: UUID)

A live table asset backed by a database.

Most implementations of this class use a connection string to define the database, user, and credentials needed to access the data view represented by the asset. Connection strings can be templated using Mustache, allowing secrets to be used in the connection string. For example, a connection string could be defined as "mssql+pyodbc://{{secret_username}}:{{secret_password}}@myserver:3306/payroll". Alternatively, secrets can be supplied through parameters (e.g. username="{{secret_username}}", password="{{secret_password}}") when using a create method that accepts them. See "Using Named Secrets" under https://dev.tripleblind.app/portal/docs/user-guide/asset-owner-operations for more details.

Certain implementations may have helper methods to create the connection string. In those cases, the fields which allow for secrets are documented in their create() method signature.

Ancestors

Subclasses

Static methods

def create(connection: str, query: str, name: str, desc: str, options: Optional[dict] = None, credentials_info: Optional[dict] = None, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a dataset connected to a traditional database.

Unlike other Datasets, a DatabaseDataset is 'live': every computation queries the connected database using the given query.

Args

connection : str
A supported connection str, such as "snowflake://youruser:yourpassword@account/yourdatabase"
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown).
options : dict, optional
Connection options to provide to the database connection.
credentials_info : dict, optional
Optional dictionary containing the credentials to use for the database connection. This is only necessary for certain databases where the connection string does not contain the credentials.
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure
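A sketch of a database-backed asset using a Mustache-templated connection string so credentials are resolved from named secrets on the Access Point; the server, database, query, and secret names are hypothetical.

    from tripleblind.asset import DatabaseDataset

    asset = DatabaseDataset.create(
        connection="mssql+pyodbc://{{secret_username}}:{{secret_password}}"
                   "@myserver:3306/payroll",   # hypothetical server and database
        query="SELECT employee_id, salary FROM salaries",
        name="Example Payroll View",
        desc="Live payroll view (example)",
        k_grouping=5,
    )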

Methods

def unmask_columns(self, col_names: Union[str, List[str]], session: Optional[Session] = None) -> bool

Unmask columns identified by the supplied list of col_names.

Args

col_names : List or str
Column name or list of names to unmask.
session : Session, optional
A connection session. If not specified, the default session is used.

Returns

True if the operation succeeded, otherwise False.

Inherited members

class DatabricksDatabase (uuid: UUID)

A table asset backed by a view from a Databricks database.

Ancestors

Static methods

def create(access_token: str, server_hostname: str, http_path: str, catalog: str, schema: str, query: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a connection to a Databricks database

You can find the connection details for your Databricks cluster in the Databricks UI. Under Compute in the sidebar, choose your target cluster. Under the Configuration tab for that cluster, expand Advanced Options and choose the JDBC/ODBC tab, where you will find the needed values. See the Databricks documentation for more details: https://docs.databricks.com/en/integrations/compute-details.html

Args

access_token : str
A Databricks access token or a secret name. For example, "dapi1234567890abcdef"
server_hostname : str
The Databricks server name or a secret name. For example, "community.cloud.databricks.com"
http_path : str
The Databricks server name or a secret name. For example, "/sql/protocolv1/o/1234567890123456/0123-456789-abc123"
catalog : str
The Databricks catalog name or a secret name. For example, "default"
schema : str
The Databricks schema name or a secret name. For example, "default"
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure

Inherited members

class DatasetAsset (uuid: UUID)

An abstract Asset containing a set of data of some form.

Ancestors

Subclasses

Inherited members

class DicomDataset (uuid: UUID)

An Asset containing DICOM imaging files.

DICOM is a standard for serializing medical imaging data, such as X-ray, CT, MRI, ultrasound, etc.

Ancestors

Inherited members

class ElasticReturnType (value, names=None, *, module=None, qualname=None, type=None, start=1)

The type of data returned by an Elasticsearch query.

Values

AGG_JSON: Only return the aggregations from the query as JSON
HITS_JSON: Only return the hits from the query as JSON
FULL_JSON: Return the full response from the query as JSON

Ancestors

  • enum.Enum

Class variables

var AGG_JSON
var FULL_JSON
var HITS_JSON
class ElasticsearchDataset (uuid: UUID, connection: str = '', index: str = '', api_key: str = '', body: dict = None, return_type: ElasticReturnType = ElasticReturnType.AGG_JSON, store_type: Optional[ElasticReturnType] = None)

Elasticsearch dataset asset.

Ancestors

Class variables

var api_key : str
var body : dict
var connection : str
var index : str
var return_type : ElasticReturnType
var store_type : Optional[ElasticReturnType]

Static methods

def cast(asset: Asset) -> ElasticsearchDataset

Convert a generic Asset into an ElasticsearchDataset.

This should only be used on an asset known to be an ElasticsearchDataset; no validation occurs during the cast.

Args

asset : Asset
A generic Asset

Returns

ElasticsearchDataset
An ElasticsearchDataset object
def create(connection: str, api_key: str, index: str, body: dict, name: str, desc: str, return_type: ElasticReturnType = ElasticReturnType.AGG_JSON, store_type: Optional[ElasticReturnType] = None, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, k_grouping: Optional[int] = 5) -> Optional[ElasticsearchDataset]

Create a new ElasticsearchDataset.

Args

connection : str
The connection string to the Elasticsearch server
api_key : str
The API key to use for the connection
index : str
The index to query
body : dict
The request body to send to elastic search
name : str
The name of the asset
desc : str
The description of the asset
return_type : ElasticReturnType, optional
The type of data to return to algorithms using this asset.
store_type : ElasticReturnType, optional
The type of data to store on the data owner's Access Point each time the asset is queried. If None, no data will be retained on the data owner's Access Point.
is_discoverable : bool, optional
Whether the asset is discoverable
allow_overwrite : bool, optional
Whether to allow overwriting an existing asset
session : Session, optional
A connection session. If not specified, the default session is used.
k_grouping : int, optional
The minimum count of records with like values.

Returns

ElasticsearchDataset
The new asset
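A sketch of creating an Elasticsearch-backed asset that returns only aggregations; the connection string, API key, index, and query body are hypothetical.

    from tripleblind.asset import ElasticReturnType, ElasticsearchDataset

    body = {
        "size": 0,
        "aggs": {"by_state": {"terms": {"field": "state.keyword"}}},
    }

    asset = ElasticsearchDataset.create(
        connection="https://elastic.example.com:9200",  # hypothetical cluster
        api_key="BASE64_API_KEY",                       # hypothetical key
        index="claims",                                 # hypothetical index
        body=body,
        name="Example Claims Aggregation",
        desc="Aggregated claim counts by state (example)",
        return_type=ElasticReturnType.AGG_JSON,
    )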
def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> Optional[ElasticsearchDataset]

Search the Router index for an asset matching the given search

Args

search : str or re.Pattern, optional
Either an asset ID or a search pattern applied to asset names and descriptions. A simple string will match a substring or the entire string if exact_match is True, or a regular expression can be passed for complex searches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's team)
owned_by : int, optional
Only return owned assets owned by the given team ID
session : Session, optional
A connection session. If not specified, the default session is used.
exact_match : bool, optional
When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.

Raises

TripleblindAssetError
Thrown when multiple assets are found which match the search.

Returns

ElasticsearchDataset
A single asset, or None if no match found

Inherited members

class ImageDataset (uuid: UUID)

An Asset containing a set of images and optionally labels.

Ancestors

Inherited members

class MSSQLDatabase (uuid: UUID)

A table asset backed by a view from a Microsoft SQL database.

Ancestors

Static methods

def create(host: str, port: int, database: str, query: str, name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a connection to a Microsoft SQL Database.

Args

host : str
The host name of the Microsoft SQL database or a secret name. Example: testsqlserver123.database.windows.net
port : int
The port number of the Microsoft SQL database.
database : str
The name of the Microsoft SQL database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
username : str, optional
Username to use in the database connection, like "myuser" or a secret name like "{{secret_username}}".
password : str, optional
Password to use in the database connection or a secret name.
options : dict, optional
Dictionary of connection options for connecting to the Microsoft SQL database. For supported connection options see https://learn.microsoft.com/en-us/sql/connect/odbc/dsn-connection-string-attribute?view=sql-server-ver16#supported-dsnconnection-string-keywords-and-connection-attributes
NOTE: The driver parameter is not required; the connection will use the Access Point's version of the driver. Example: options={ "authentication": "ActiveDirectoryMsi" }
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure

Inherited members

class MongoDatabase (uuid: UUID)

A table asset backed by a view from a Mongo database.

Ancestors

Static methods

def create(connection_str: str, query: dict, database: str, collection: str, name: str, desc: str, projection: dict = None, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, limit: Optional[int] = None, sort: Optional[List[Tuple]] = None) -> Asset

Create a connection to a MongoDB database

Args

connection_str : str
A MongoDB connection URI, such as "mongodb://user:password@mongo.host:27017/". Secrets can be included in the connection string using Mustache templating of named secrets, e.g. "mongodb://{{MY_USER}}:{{MY_PWD}}@mongo.host:{{MY_PORT}}/". Any portion of the connection string can be templated, including the host, username, password, database, etc.
query : dict
A MongoDB JSON query to generate a view on the database.
database : str
Name of the MongoDB database.
collection : str
Collection inside of MongoDB database to query
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
projection : dict, optional
MongoDB projection to manipulate output structure. Default is None.
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
limit : int, optional
Query results will be at most limit documents
sort : [Tuple], optional
List of tuples representing how the query results should be sorted. For more details, see https://www.mongodb.com/docs/manual/reference/method/cursor.sort/#std-label-sort-asc-desc.

Returns

Asset
New asset on the Router, or None on failure
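A sketch of a MongoDB-backed asset using a templated connection URI; the database, collection, query, and field names are hypothetical.

    from tripleblind.asset import MongoDatabase

    asset = MongoDatabase.create(
        connection_str="mongodb://{{MY_USER}}:{{MY_PWD}}@mongo.host:27017/",
        query={"status": "active"},                 # hypothetical filter
        database="crm",                             # hypothetical database
        collection="customers",                     # hypothetical collection
        projection={"_id": 0, "age": 1, "state": 1},
        name="Example Active Customers",
        desc="Live view of active customers (example)",
        limit=10000,
        sort=[("age", 1)],
    )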

Inherited members

class NeuralNetwork (uuid: UUID)

A neural network that takes one or more Datasets as input

Ancestors

Static methods

def create(model: Union[str, Path], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None) -> Asset

Place a pretrained Neural Network file on your Access Point.

Args

model : str, Path
Path to the neural network. This accepts a Keras .h5 model file.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.

Returns

Asset
New asset on the Router, or None on failure
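A sketch of positioning a pretrained Keras model; the file path and asset name are hypothetical.

    from tripleblind.asset import NeuralNetwork

    model_asset = NeuralNetwork.create(
        model="models/churn_classifier.h5",   # hypothetical Keras .h5 file
        name="Example Churn Classifier",
        desc="Pretrained churn model (example)",
        is_discoverable=True,
    )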

Inherited members

class NumPyDataset (uuid: UUID)

A generic n-dimensional array. Used to represent arbitrary data.

Ancestors

Static methods

def create()

Inherited members

class OracleDatabase (uuid: UUID)

A table asset backed by a view from an Oracle database.

Ancestors

Static methods

def create(host: str, port: int, database: str, query: str, name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a connection to an Oracle Database.

Args

host : str
The host name of the Oracle database or a secret name. Example: testoracle123.database.net
port : int
The port number of the Oracle database. The port for most Oracle databases is 1521.
database : str
The name of the Oracle database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
username : str, optional
Username to use in the database connection, like "myuser" or a secret name like "{{secret_username}}".
password : str, optional
Password to use in the database connection or a secret name.
options : dict, optional
Dictionary of connection options for connecting to the Oracle database. For supported connection options see https://www.oracle.com/database/technologies/appdev/python/quickstartpython.html#connect-python-cx_oracle-connecting
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure

Inherited members

class PMMLRegression (uuid: UUID)

A Predictive Model Markup Language (PMML) model that takes one or more Datasets as input

Ancestors

Static methods

def create(model: Union[str, Path], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None) -> Asset

Place a PMML Regression model file on your Access Point.

Args

model : str, Path
Path to the PMML model file.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.

Returns

Asset
New asset on the Router, or None on failure

Inherited members

class PMMLTree (uuid: UUID)

Creates asset for PMML Tree models.

Ancestors

Static methods

def create(model: Union[str, Path], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None) -> Asset

Place a PMML Tree model file on your Access Point.

Args

model : str, Path
Path to the PMML model file.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.

Returns

Asset
New asset on the Router, or None on failure

Inherited members

class RedshiftDatabase (uuid: UUID)

A table asset backed by a view from an AWS Redshift database.

Ancestors

Static methods

def create(host: str, port: int, database: str, query: str, name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a connection to a Redshift database

Args

host : str
The host name of the Redshift database or a secret name. Example: default.528.us-east-2.redshift-serverless.amazonaws.com
port : int
The port number of the Redshift database.
database : str
The name of the Redshift database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
username : str, optional
Username to use in the database connection, like "myuser" or a secret name like "{{secret_username}}".
password : str, optional
Password to use in the database connection or a secret name.
options : dict, optional
Dictionary of connection options for connecting to the Redshift database. Supported options are described at https://docs.aws.amazon.com/redshift/latest/mgmt/python-configuration-options.html. Example using IAM keys: options={ "iam": True, "access_key_id": "AKFCXNRSVRCFGMRQCAQR", "secret_access_key": "bEGzX7QnOb7eK9CRt4CV97n4e/bKOtQUFd9/pgIc" }
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure

Inherited members

class S3Dataset (uuid: UUID)

A table asset stored in an Amazon Web Services S3 bucket.

Ancestors

Static methods

def create(bucket_name: str, region: str, object_name: str, aws_access_key_id: str, aws_secret_access_key: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> TabularDataset

Creates an asset from an AWS S3 bucket.

Args

bucket_name : str
The AWS bucket name that contains the file
region : str
The AWS region containing this bucket (e.g. "us-east-1")
object_name : str
The name of the file in the S3 bucket
aws_access_key_id : str
This account's AWS Access Key ID
aws_secret_access_key : str
This account's AWS Secret Access Key
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.

Returns

TabularDataset
New asset on the Router, or None on failure

Inherited members

class SnowflakeDatabase (uuid: UUID)

A table asset backed by a view from a Snowflake database.

Ancestors

Static methods

def create(snowflake_username: str, snowflake_password: str, snowflake_account: str, snowflake_warehouse: str, snowflake_database: str, snowflake_schema: str, role: str, query: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset

Create a connection to a Snowflake database

Args

snowflake_username : str
Your Snowflake username, like "myuser" or a secret name like "{{secret_username}}".
snowflake_password : str
Your Snowflake password or a secret name like "{{secret_password}}".
snowflake_account : str
Your Snowflake account or a secret name. This is the start of the URL when you visit your console. For example, if the URL is https://ab12345.us-central1.gcp.snowflakecomputing.com/ then your snowflake_account is "ab12345.us-central1.gcp".
snowflake_warehouse : str
The name of the Snowflake warehouse you are connecting to for the query or a secret name.
snowflake_database : str
The name of the Snowflake database you are connecting to for the query or a secret name.
snowflake_schema : str
The name of the Snowflake schema you are connecting to for the query or a secret name.
role : str
The role of the Snowflake user you are using to connect to the Snowflake database or a secret name.
query : str
The SQL query to generate a view on the database.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseDataset
New asset on the Router, or None on failure
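A sketch using named secrets for the Snowflake credentials; the account, warehouse, database, schema, role, query, and secret names are hypothetical.

    from tripleblind.asset import SnowflakeDatabase

    asset = SnowflakeDatabase.create(
        snowflake_username="{{secret_username}}",  # resolved from a named secret
        snowflake_password="{{secret_password}}",  # resolved from a named secret
        snowflake_account="ab12345.us-central1.gcp",
        snowflake_warehouse="ANALYTICS_WH",        # hypothetical warehouse
        snowflake_database="SALES",                # hypothetical database
        snowflake_schema="PUBLIC",
        role="ANALYST",                            # hypothetical role
        query="SELECT region, revenue FROM orders",
        name="Example Snowflake Orders",
        desc="Live view of orders (example)",
    )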

Inherited members

class TabularDataset (uuid: UUID)

An abstract Asset for data stored in rows and columns, like a spreadsheet.

Each column is a field and each row is a single record containing one value for each column.

Ancestors

Subclasses

Inherited members

class XGBoostModel (uuid: UUID)

A Scikit-learn XGBoost model

Ancestors

Static methods

def train(training_data: Union[Asset, List[Asset]], datatype: str, target_var: str, variables: Union[str, List[str]], custom_preprocessor: "'TabularPreprocessor'" = None, silent: bool = False, is_regression: bool = False, job_name: Optional[str] = None, session: Optional[Session] = None) -> XGBoostModel

Create an XGBoost model trained on the provided dataset(s).

Args

training_data : Union[Asset, List[Asset]]
Table(s) to use as training data. Can be Assets or paths to local files.
datatype : str
The format of the data, using numpy dtypes. E.g. "float32", "int64", etc. Ignored if preprocessor is provided.
target_var : str
The name of the column containing the training target value. Ignored if preprocessor is provided.
variables : str, List[str]
A list of column names containing variables to include in training, or "ALL" for every column. Ignored if preprocessor is provided.
custom_preprocessor : TabularPreprocessor, optional
A custom preprocessor, overriding the standard built from datatype, target_var and variables.
silent : bool, optional
Suppress status messages during execution? Default is to show messages.
is_regression : bool, optional
Is this a regression? Otherwise a classification model is built.
job_name : Optional[str], optional
The name associated with the job. Default name is "XGBoost Training".
session : Optional[Session], optional
A connection session. If not specified, the default session is used.

Raises

TripleblindTrainingError
XGBoost Model training failed

Returns

XGBoostModel
The trained model, or None if training fails.
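A sketch of training a classification model against a positioned tabular asset; the asset name and column names are hypothetical.

    from tripleblind.asset import Asset, XGBoostModel

    training_data = Asset.find("Example Patient Records")

    model = XGBoostModel.train(
        training_data=training_data,
        datatype="float32",
        target_var="readmitted",                  # hypothetical target column
        variables=["age", "bmi", "num_visits"],   # hypothetical feature columns
        is_regression=False,
        job_name="Example XGBoost Training",
    )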

Methods

def predict(self, inference_data: Union[Asset, List[Asset]], use_smpc: bool, custom_preprocessor: "'TabularPreprocessor'" = None, silent: bool = False, job_name: Optional[str] = None, session: Optional[Session] = None)

Perform an inference against a trained XGBoost model.

Output is the most likely classification. See predict_proba() if you want information on the likelihood of each classification.

Args

inference_data : Union[Asset, List[Asset]]
Table(s) to use as inference data. Can be Assets or paths to local files.
use_smpc : bool
Flag to indicate whether the user wants to use SMPC or FED. If True, SMPC is used. If False, FED is used.
custom_preprocessor : TabularPreprocessor, optional
A custom preprocessor.
silent : bool, optional
Suppress status messages during execution? Default is to show messages.
job_name : Optional[str], optional
The name associated with the job. Default name is "XGBoost Inference".
session : Optional[Session], optional
A connection session. If not specified, the default session is used.

Raises

TripleblindProcessError
XGBoost Remote Inference failed
TripleblindProcessError
Unable to create XGBoost Inference job

Returns

pd.Dataframe
A pandas dataframe.
def predict_proba(self, inference_data: Union[Asset, List[Asset]], use_smpc: bool, custom_preprocessor: "'TabularPreprocessor'" = None, silent: bool = False, job_name: Optional[str] = None, session: Optional[Session] = None)

Perform a predict_proba inference against a trained XGBoost model.

Output of predict_proba() is the probability of each classification. For example, a model that classifies into three categories might return [0.2, 0.1, 0.7], meaning the first two classes are 20% and 10% likely for the given input, and the last class is 70% likely.

See predict() if you simply want the most likely classification.

Args

inference_data : Union[Asset, List[Asset]]
Table(s) to use as inference data. Can be Assets or paths to local files.
use_smpc : bool
Flag to indicate whether the user wants to use SMPC or FED. If True, SMPC is used. If False, FED is used.
custom_preprocessor : TabularPreprocessor, optional
A custom preprocessor.
silent : bool, optional
Suppress status messages during execution? Default is to show messages.
job_name : Optional[str], optional
The name associated with the job. Default name is "XGBoost Inference".
session : Optional[Session], optional
A connection session. If not specified, the default session is used.

Raises

TripleblindProcessError
XGBoost Remote Inference failed
TripleblindProcessError
Unable to create XGBoost Inference job

Returns

pd.Dataframe
A pandas dataframe.
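A sketch of running inference with a model trained as in the train() example above; the inference dataset name is hypothetical.

    from tripleblind.asset import Asset

    inference_data = Asset.find("Example New Patients")   # hypothetical dataset

    # Most likely class for each row.
    labels = model.predict(inference_data, use_smpc=True)

    # Per-class probabilities for each row.
    probabilities = model.predict_proba(inference_data, use_smpc=True)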

Inherited members