Module tripleblind.model_asset

Specialized Asset representing trained models, such as a neural network.

The ModelAsset wraps a generic Asset, hiding the complexity of creating jobs. Common operations can happen with just a few lines of code.

For example:

import tripleblind as tb

# Use a trained model to privately analyze a patient xray
model = tb.ModelAsset("diagnose_disease_model")
result = model.infer(data="xray.jpg")

print(result.table.dataframe)

Classes

class ModelAsset (uuid: UUID)

Points to a dataset or an algorithm indexed on the TripleBlind Router.

Ancestors

Subclasses

Static methods

def cast(asset: Asset) -> ModelAsset

Convert a generic Asset into a ModelAsset

This should only be used on an asset known to be a model; no validation occurs during the cast.

Args

asset : Asset
A generic Asset

Returns

ModelAsset
A ModelAsset object
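For example, a generic Asset located elsewhere can be cast when it is known to hold a trained model (the asset name and the generic Asset.find lookup below are assumptions for illustration):

```python
import tripleblind as tb

# Locate a generic Asset (hypothetical name), then cast it to a ModelAsset.
generic = tb.Asset.find("diagnose_disease_model")
model = tb.ModelAsset.cast(generic)  # no validation occurs during the cast
```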
def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> ModelAsset

Search the Router index for an asset matching the given search

Args

search : str or re.Pattern, optional
Either an asset ID or a search pattern applied to asset names and descriptions. A plain string matches as a substring, or the entire string when exact_match is True; a regular expression can be passed for complex searches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's team)
owned_by : int, optional
Only return owned assets owned by the given teamID
session : Session, optional
A connection session. If not specified, the default session is used.
exact_match : bool, optional
When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.

Raises

TripleblindAssetError
Thrown when multiple assets are found which match the search.

Returns

ModelAsset
A single asset, or None if no match found
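A sketch of typical searches (the asset names are hypothetical):

```python
import re

import tripleblind as tb

# Exact-name lookup (exact_match defaults to True).
model = tb.ModelAsset.find("diagnose_disease_model")

# Substring search: disable exact matching.
model = tb.ModelAsset.find("diagnose", exact_match=False)

# Regular-expression search; exact_match is ignored for patterns.
model = tb.ModelAsset.find(re.compile(r"diagnose_.*_model"))

if model is None:
    print("No matching model found")
```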

Methods

def infer(self, data: Union[Asset, TableAsset, str, Path, List[Asset], List[TableAsset], List[str], List[Path]], preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder], ImagePreprocessorBuilder, List[ImagePreprocessorBuilder], NumpyInputPreprocessor, List[NumpyInputPreprocessor], NumpyInputPreprocessorBuilder, List[NumpyInputPreprocessorBuilder]]] = None, params: Optional[Dict] = None, job_name: Optional[str] = None, silent: Optional[bool] = False, session: Optional[Session] = None, stream_output: bool = False, identifier_columns: Optional[Union[List[str], str]] = None) -> Union[JobResult, StatusOutputStream]

Perform an inference using a model

NOTE: For inferences which produce textual output, such as a classifier, the result can be easily accessed via code like this:

r = model.infer("data.csv")
print(r.table)

Or the r.table.dataframe can be used as a standard Pandas dataframe.

Args

data : Asset or str
The data to infer against. Can be an Asset or a path to a file.
preprocessor : Preprocessor
A preprocessor to apply to the data. If not defined, the dataset is used directly.
params : dict
Dictionary of unique parameters for the model. Typically, this is not needed.
job_name : str, optional
Reference name for the job which performs this task.
silent : bool, optional
Suppress status messages during execution? Default is to show messages.
session : Session, optional
A connection session. If not specified, the default session is used.
stream_output : bool, optional
Whether to start the job and return a StatusOutputStream, or wait for job completion and return a JobResult (the default).
identifier_columns : str, List[str], optional
Column or columns which will be returned alongside results. Default is None.

Raises

TripleblindAPIError
Inference failed

Returns

When stream_output is set to False (the default), a JobResult is returned once the job completes. If successful, the inference output is found at result.asset and/or result.table

If stream_output is set to True, a StatusOutputStream object is immediately returned and can be used as a Generator that outputs the status messages produced while the job is running.
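The two return modes can be sketched like this (the asset name and file name are hypothetical):

```python
import tripleblind as tb

model = tb.ModelAsset.find("diagnose_disease_model")

# Default: block until the job completes and receive a JobResult.
result = model.infer(data="xray.jpg", job_name="xray-inference")
print(result.table.dataframe)  # tabular output as a Pandas dataframe

# Streaming: returns immediately with a StatusOutputStream generator.
stream = model.infer(data="xray.jpg", stream_output=True)
for status in stream:
    print(status)  # status messages emitted while the job runs
```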

def psi_infer(self, data: Union[Asset, List[Asset], TableAsset, List[TableAsset]], match_column: Union[str, List[str]], regression_type: Optional[RegressionType] = None, preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder]]] = None, params: Optional[Dict] = None, job_name: Optional[str] = None, silent: Optional[bool] = False, session: Optional[Session] = None, stream_output: bool = False) -> Union[JobResult, StatusOutputStream]

Perform an inference using a model on distributed data matched with PSI

NOTE: For inferences which produce textual output, such as a classifier, the result can be easily accessed via code like this:

r = model.psi_infer("data.csv", match_column="patient_id")
print(r.table)

Or the r.table.dataframe can be used as a standard Pandas dataframe.

Args

data : Asset, TableAsset, or list of same
The data to infer against.
match_column : Union[str, List[str]]
Name of the column to match. If the name is not the same in all datasets, a list of the matching column names, starting with the initiator asset and then listing a name for each dataset.
regression_type : RegressionType
The type of regression to be performed. If populated, indicates a regression inference will be performed. One of tb.RegressionType.LINEAR or tb.RegressionType.LOGISTIC.
preprocessor : Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder]], optional
A preprocessor to apply to the data. If not defined, the dataset is used directly.
params : dict
Dictionary of unique parameters for the model. Typically, this is not needed.
job_name : str, optional
Reference name for the job which performs this task.
silent : bool, optional
Suppress status messages during execution? Default is to show messages.
session : Session, optional
A connection session. If not specified, the default session is used.
stream_output : bool, optional
Whether to start the job and return a StatusOutputStream, or wait for job completion and return a JobResult (the default).

Returns

When stream_output is set to False (the default), a JobResult is returned once the job completes. If successful, the inference output is found at result.asset and/or result.table

If stream_output is set to True, a StatusOutputStream object is immediately returned and can be used as a Generator that outputs the status messages produced while the job is running.
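A sketch of a PSI inference across two datasets whose match columns are named differently (the asset names, column names, and the assumption that TableAsset supports the same find search are illustrative only):

```python
import tripleblind as tb

model = tb.ModelAsset.find("risk_model")
ds_a = tb.TableAsset.find("hospital_a_records")
ds_b = tb.TableAsset.find("hospital_b_records")

# Records are matched via PSI on the patient identifier. The column is
# named differently in each dataset, so a list of names is passed.
result = model.psi_infer(
    data=[ds_a, ds_b],
    match_column=["patient_id", "pat_id"],
    regression_type=tb.RegressionType.LOGISTIC,
)
print(result.table.dataframe)
```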

Inherited members

class ModelTrainerAsset (uuid: UUID)

Points to a dataset or an algorithm indexed on the TripleBlind Router.

Ancestors

Static methods

def cast(asset: Asset) -> ModelTrainerAsset

Convert a generic Asset into a ModelTrainerAsset

This should only be used on an asset known to be a model trainer; no validation occurs during the cast.

Args

asset : Asset
A generic Asset

Returns

ModelTrainerAsset
A ModelTrainerAsset object

Methods

def train(self, data: Optional[Union[Asset, str, Path, Package, List[Asset], List[str], List[Path], List[Package]]], data_type: str = 'table', epochs: int = 1, model_output: str = None, data_shape: Optional[List[int]] = None, batch_size: Optional[int] = None, test_size: Optional[float] = None, preprocessor: Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder], ImagePreprocessorBuilder, List[ImagePreprocessorBuilder], NumpyInputPreprocessor, List[NumpyInputPreprocessor], NumpyInputPreprocessorBuilder, List[NumpyInputPreprocessorBuilder]] = None, loss_name: str = None, loss_params: Optional[Dict] = None, optimizer_name: str = None, optimizer_params: Optional[Dict] = None, lr_scheduler_name: Optional[str] = None, lr_scheduler_params: Optional[Dict] = None, params: Optional[Dict] = None, delete_trainer: Optional[bool] = False, job_name: Optional[str] = None, silent: Optional[bool] = False, session: Optional[Session] = None, stream_output: bool = False) -> Union[JobResult, StatusOutputStream]

Train this model using the data and parameters specified

Args

data : Asset, str, Path, Package or list of same
One or more datasets to use for training. Datasets can be specified as Assets or as paths to valid data. When a path is given, it is automatically converted to a temporary Asset which is deleted at the completion of the Job.
data_type : str
The type of the training data. Valid values are "table", "image", and "numpy".
epochs : int, optional
Number of passes to make through the training data.
model_output : str
The type of result generated by the model. Valid values are "regression", "multiclass", and "binary".
data_shape : List[int], optional
Description of the training data, depending on the data_type:
table - number of columns of data, e.g. [cols]
image - image dimensions, e.g. [width, height, bytes-per-pixel]
numpy - not used
batch_size : int, optional
Number of data samples to pass at one time during training.
test_size : float, optional
A percentage of the data to be reserved for accuracy testing and reporting with each epoch.
preprocessor : Preprocessor or List[Preprocessor], optional
A single preprocessor to apply to all data, or a list of preprocessors to apply to each dataset. If a list of preprocessors is given, the count must match the number of datasets.
loss_name : str, optional
A loss function name, consistent with PyTorch. See https://pytorch.org/docs/stable/nn.html#loss-functions
loss_params : dict, optional
Dictionary of parameters appropriate for the loss function.
optimizer_name : str, optional
An optimizer function name, consistent with PyTorch. See https://pytorch.org/docs/stable/optim.html
optimizer_params : dict, optional
Dictionary of parameters appropriate for the optimizer_name.
lr_scheduler_name : str, optional
A learning rate scheduler function name, either "CyclicLR" or "CyclicCosineDecayLR". Default is to use a constant learning rate.
lr_scheduler_params : dict, optional

Dictionary of parameters appropriate for the lr_scheduler_name. Legal values depend on the lr_scheduler_name. For "CyclicLR"::

{
    "step_size": 10,  # Number of epochs over which the cycle is completed.
    "base_lr": 0.0001,  # Starting rate, lower boundary in the cycle.
    "max_lr": 0.01,  # Upper boundary in the cycle.
    "mode": "triangular",  # or "triangular2", or "exp_range"
    "gamma": 0.99,  # Multiplicative factor of decay of learning rate at the end of each cycle, default=0.99.
}

For "CyclicCosineDecayLR"::

{
    "init_decay_epochs": 10,  # Number of initial decay epochs.
    "min_decay_lr": 0.0001,  # Learning rate at the end of decay.
    "restart_interval": 3,  # Restart interval for fixed cycles, or None to disable cycles.
    "restart_interval_multiplier": 1.5,  # Multiplication coefficient for geometrically increasing cycles.
    "restart_lr": 0.01,  # Learning rate when cycle restarts.
    "warmup_epochs":  # Number of warmup epochs, default is None
    "warmup_start_lr":  # Learning rate at the beginning of warmup.
}
params : dictionary, optional
Additional custom parameters.
delete_trainer : bool, optional
Set to True to delete the training model after training completes. Ignored if stream_output is set to True.
job_name : str, optional
Reference name for the job which performs this task. Default is "Model training - "
silent : bool, optional
Suppress status messages during execution? Default is to show messages.
session : Session, optional
A connection session. If not specified, the default session is used.
stream_output : bool, optional
Whether to start the job and return a StatusOutputStream, or wait for job completion and return a JobResult (the default).

Raises

TripleblindTrainingError
Model training failed

Returns

When stream_output is set to False (the default), a JobResult is returned once the job completes. If successful, the training output is found at result.asset and/or result.table.

If stream_output is set to True, a StatusOutputStream object is immediately returned and can be used as a Generator that outputs the status messages produced while the job is running.
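A sketch of a training run on tabular data (the asset name, file name, shapes, and hyperparameters are illustrative only, and the generic Asset.find lookup is an assumption):

```python
import tripleblind as tb

# Locate the trainer asset (hypothetical name) and cast it.
asset = tb.Asset.find("diagnosis_network_trainer")
trainer = tb.ModelTrainerAsset.cast(asset)

result = trainer.train(
    data="patients.csv",            # converted to a temporary asset
    data_type="table",
    data_shape=[12],                # 12 columns of tabular data
    model_output="binary",
    epochs=10,
    batch_size=32,
    test_size=0.2,                  # reserve data for accuracy reporting
    loss_name="BCELoss",            # PyTorch loss-function name
    optimizer_name="Adam",          # PyTorch optimizer name
    optimizer_params={"lr": 0.001},
    lr_scheduler_name="CyclicLR",
    lr_scheduler_params={
        "step_size": 10,
        "base_lr": 0.0001,
        "max_lr": 0.01,
        "mode": "triangular",
    },
)
print(result.asset)  # the training output
```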

Inherited members