AI/Machine Learning
The following use cases show how to use TripleBlind to conduct privacy-preserving modeling and inference on third-party data.
Use Case #1: Model Training
Using APIs, train an AI/ML model based on datasets of virtually any type. Personas represented in this use case are: Data Scientist (User), Dataset Owner.
Workflow
The following workflow is used to train models using TripleBlind.
- Initialize a TripleBlind session
- Register new assets or locate existing assets
- Explore assets
- Perform preprocessing and tune model parameters
- Train the model and get results of the training run
Steps
To execute this use case, follow these steps in your Python IDE:
1. The User authenticates with the Router and starts a Session.
```python
import tripleblind as tb

tb.initialize(api_token=user1_token)
```
ℹ️ The call to tb.initialize is unnecessary if the User token is set up in the User's tripleblind.yaml file.
2. The Owner and/or User registers datasets as new assets, or the User searches for existing assets and selects them. The first code snippet is an example of registering a new dataset asset. The second is an example of searching for an existing asset.
```python
# Register a new dataset asset
asset0 = tb.Asset.position(
    file_handle="/Users/john/data_munge_sql_a.csv",
    name="Data Munge Table A-001",
    desc="Example dataset containing patient information in imperial units.",
    is_discoverable=True,
)

# Or search for an existing asset
asset0 = tb.TableAsset.find("Data Munge Table A")
```
3. Optionally, the User explores an EDA profile and synthetic data view of registered Assets. Alternatively, the Owner can grant access for the Blind Sample operation so that the User can get a realistic, privacy-preserving sample similar to the real data.
Owner
```python
asset0.add_agreement(
    with_org=2,
    operation=tb.Operation.BLIND_SAMPLE,
)
```
User
```python
table = tb.TableAsset.find("Data Munge Table A")
df = table.get_sample()
print(df)
```
```
            Patient_Id  Age  Height_IN  Weight_LBS
0  3838753949679968321   58         74         134
1  1648887823711656506   37         61         212
2  7552046757277691320   66         67         246
3  9125359464938872180   34         69         216
4  5348069512603498341   82         72         198
5  1318251060642557776   59         60         166
6  2306922378047909737   19         63         183
7  1705011269891820451   82         68         188
8  2576874739707490790   53         76         187
9  4233275952277371155   80         70         135
```
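The Blind Sample comes back as an ordinary pandas DataFrame, so standard exploratory tooling applies before any preprocessing is designed. A minimal sketch using a few illustrative rows shaped like the output above (plain pandas, not a TripleBlind API; the data is illustrative, not real patient data):

```python
import pandas as pd

# Illustrative rows shaped like the Blind Sample output above
df = pd.DataFrame(
    {
        "Patient_Id": [3838753949679968321, 1648887823711656506, 7552046757277691320],
        "Age": [58, 37, 66],
        "Height_IN": [74, 61, 67],
        "Weight_LBS": [134, 212, 246],
    }
)

# Standard exploratory checks before designing preprocessing
print(df.dtypes)
print(df[["Age", "Height_IN", "Weight_LBS"]].describe())
print("Rows over age 50:", int((df["Age"] > 50).sum()))
```

Checks like these help decide filters (e.g., the Age > 50 condition used in the next step) without ever touching the real data.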
4. The Owner adds an Agreement for a training operation, such as Regression or Blind Learning, to their Asset.
```python
asset0.add_agreement(
    with_org=2,
    operation=tb.Operation.REGRESSION,
)
```
Alternatively, the Owner can authorize each process run by the User manually.
5. The User experiments with sample data until the optimal preprocessing steps and model parameters have been realized.
```python
preprocess0 = (
    tb.TabularPreprocessor.builder()
    .add_column("bmi", target=True)
    .all_columns(True)
    .sql_transform(
        "SELECT Patient_Id as pid, Height_IN as height, Weight_LBS as weight, "
        "1 / (Height_IN * Height_IN) * Weight_LBS * 703 as bmi "
        "FROM data WHERE Age > 50"
    )
    .dtype("float32")
)
…
job = tb.create_job(
    job_name="Calculated BMI example",
    operation=tb.Operation.REGRESSION,
    dataset=[asset0, asset1],
    preprocessor=[preprocess0, preprocess1],
    params={
        "regression_algorithm": "Linear",
        "test_size": 0.1,
    },
)
```
ℹ️ Model setup and training job parameters vary widely. For example, a PyTorch neural network uses one or more NetworkBuilder objects with network layer splits and different training parameters, while a linear regression model may require only the train/test split size parameter, as in the example above.
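The SQL transform above derives BMI with the imperial formula weight / height² × 703. That arithmetic is easy to sanity-check locally before submitting a job; the helper below is a plain-Python illustration, not part of the TripleBlind SDK:

```python
def bmi_imperial(height_in: float, weight_lbs: float) -> float:
    """BMI from height in inches and weight in pounds (imperial formula)."""
    return weight_lbs / (height_in * height_in) * 703

# Spot-check against a row from the sample above: 74 in, 134 lbs
print(round(bmi_imperial(74, 134), 1))  # → 17.2
```

Verifying the transform on a handful of sample rows catches unit mistakes (inches vs. centimeters, pounds vs. kilograms) cheaply, before any compute is spent on a distributed training run.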
6. The User runs a training job and obtains results, including model file and reference ID.
```python
if job.submit():
    job.wait_for_completion()

    # Download a local copy of the trained model
    model_file = job.result.asset.download("bmi_model.zip", overwrite=True)
    print("Trained Network Asset ID:", job.result.asset.uuid)

    # Load the model to view results
    pack = tb.Package.load("bmi_model.zip")
    model = pack.model()
    print("\nCoefficients:")
    print(model.coef_)
```
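For context on what model.coef_ represents: a linear regression's coefficients are an ordinary least-squares solution, which can be reproduced outside TripleBlind with plain NumPy. A minimal sketch on noise-free toy data (illustrative only, not how the platform trains):

```python
import numpy as np

# Toy features and a target generated from known coefficients plus an intercept
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_coef = np.array([2.0, -1.0])
y = X @ true_coef + 0.5

# Fit via ordinary least squares, with a column of ones for the intercept
A = np.hstack([X, np.ones((100, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [2.0, -1.0, 0.5]
```

Because the toy target is exactly linear, the recovered coefficients match the generating ones, which is a handy smoke test for interpreting a downloaded model's coefficient vector.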
Use Case #2: Model Inference
Using APIs, make predictions using a registered model Asset. Personas represented in this use case are: Data Scientist (User), Model Owner.
Workflow
The following workflow is used to generate inferences against trained models using TripleBlind.
- Initialize a TripleBlind session
- Register a model asset or locate an existing model
- Preprocess input data
- Run predictions against the model and get results
Steps
To execute this use case, follow these steps in your Python IDE:
1. The User authenticates with the Router and starts a Session.
```python
import tripleblind as tb

tb.initialize(api_token=user1_token)
```
ℹ️ The call to tb.initialize is unnecessary if the User token is set up in the User's tripleblind.yaml file.
2. The Owner adds an Agreement for their model to be executed by the User.
```python
model = tb.Asset.find("3142c5db-3609-42d9-beb9-d3847b642fec")
model.add_agreement(with_org=2, operation=tb.Operation.EXECUTE)
```
Alternatively, the Owner can authorize each access request manually.
3. The User searches for an existing model and selects it.
```python
model = tb.Asset.find("3142c5db-3609-42d9-beb9-d3847b642fec")
```
4. The User preprocesses input data to work with the model.
This step varies based on the model and datasets involved. Generally speaking, prediction inputs should match the format of the training inputs. For example, if images were resized and cast to DICOM format during training, the same should be done for inference. If values were encoded for training, values for prediction should be encoded the same way.
A good habit for model builders is to document such requirements in the metadata description and/or Q&A fields of their algorithm asset's reference in the TripleBlind Router Index.
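As a concrete illustration of matching inference encoding to training encoding: fit any stateful transform on training data once, then reuse that same fitted transform at prediction time rather than refitting on new data. The sketch below uses generic scikit-learn, not a TripleBlind API, and the numbers are illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit the scaler on training data only...
train = np.array([[74.0, 134.0], [61.0, 212.0], [67.0, 246.0]])
scaler = StandardScaler().fit(train)

# ...then reuse the *same* fitted scaler on inference inputs,
# rather than refitting on the new rows
new_rows = np.array([[69.0, 216.0]])
print(scaler.transform(new_rows))
```

Refitting the scaler (or any encoder) on inference data silently shifts the feature distribution the model sees, which is one of the most common causes of degraded prediction quality.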
Example inference dataset preprocessing from the examples/CMAPSS_CNN example in the SDK:
```python
X, y = reformat_data(data_x, data_y)  # a user-defined encoding method
ds = torch.utils.data.TensorDataset(X, y)
test_loader = torch.utils.data.DataLoader(ds, batch_size=128)

y_pred_list = []
y_true_list = []
with torch.no_grad():
    for X_batch, y_batch in test_loader:
        y_test_pred = model(X_batch)
        for i in y_test_pred:
            y_pred_list.append(i.numpy())
        for i in y_batch:
            y_true_list.append(i.item())

y_pred_list = [a.squeeze().tolist() for a in y_pred_list]
r2_metric = r2_score(y_true_list, y_pred_list)
print(f"R2 score(FD001_test): {r2_metric}")
```
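The r2_score call above is scikit-learn's; the metric itself is just 1 − SS_res / SS_tot and can be computed by hand to cross-check a result. A plain-Python sketch:

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit → 1.0
print(r2([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # small errors → just under 1
```

A score of 1.0 means the predictions explain all of the variance in the targets; 0 means no better than predicting the mean, and the score can go negative for a poor fit.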
5. The User runs an inference job and obtains results.
```python
for file in files:
    job = tb.create_job(
        job_name="Model test",
        operation=model,
        params={"security": "aes"},  # or "smpc"
        dataset=f"/Users/john/{file}",
    )
    if job.submit():
        job.wait_for_completion()
        filename = job.result.asset.download(save_as="result.zip")
        pack = tb.Package.load(filename)
        inference_predictions = pack.records()
        print(f"Inference results: {inference_predictions}")
```