Access Points & Assets

The TripleBlind platform supports two primary types of Assets: Data and Algorithms. In this tutorial, we use Data Assets (aka "datasets") to illustrate and explain how to position and manage assets. By the end of this tutorial, you will know how to perform the following tasks:

  • Position an asset via the web interface and SDK
  • Manage your dataset (change properties, retrieve, archive, etc.)
  • Search for a dataset
  • Retrieve a dataset you own
  • Archive a dataset

Positioning Assets

Positioning an asset is the process of placing your data file inside your own infrastructure's Access Point and indexing that asset on the TripleBlind Router. Positioning assets is the first step towards monetizing your data or training collaborative and distributed algorithms—while preserving the privacy of your data.

ℹī¸Positioned data never leaves your organization's firewall. TripleBlind never hosts or has access to your data. The only route to your data is through your own Access Point, which is controlled by your organization.

You can position your data either via TripleBlind's web interface or the SDK. We explain these two methods next.

Web Interface

You can use an intuitive web interface to position your datasets. Simply visit the 🔗Explore Assets page and click Position Dataset to begin the process. You will provide the following information:

  • Listing Name
    A descriptor for the dataset. The name is used when listing or searching for your dataset in the Data Explorer. The best names are descriptive, but short.
  • k-Grouping
    Inspired by k-anonymity, TripleBlind supports configuring a “k-Grouping” safeguard at the dataset level. See k-Grouping for more information about how TripleBlind honors k-Grouping in Blind Query, Blind Join, and Blind Stats operations.
  • Visibility
    Control whether your dataset appears in the public listing of available datasets. If set to "Public", users from other organizations will know the dataset exists and can see summaries of it, but they will not be able to see the contents or operate on it without first requesting your permission.
  • Description
    A freeform area to convey in-depth information about the dataset. The description can use Markdown formatting to present images, tables, links and other rich text tools. The Preview tab allows you to see the description as it will appear in the web interface. Good descriptions include technical details about the dataset, as this is the primary way other organizations learn about your dataset.

All these properties of the dataset can be easily changed later.
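
To make the k-Grouping idea more concrete, here is a minimal sketch of the general k-anonymity-style principle it is inspired by: aggregate results are released only for groups that contain at least k records. This is an illustration only, not TripleBlind's implementation. The helper name `grouped_counts_with_k` and the sample records are hypothetical.

```python
from collections import Counter


def grouped_counts_with_k(records, key, k):
    """Count records per group, suppressing groups smaller than k.

    Hypothetical helper illustrating the general k-anonymity idea;
    it is not part of the TripleBlind SDK.
    """
    counts = Counter(record[key] for record in records)
    # Keep only groups large enough that no individual stands out
    return {group: n for group, n in counts.items() if n >= k}


# Made-up sample data for illustration
records = [
    {"zip": "64105", "diagnosis": "A"},
    {"zip": "64105", "diagnosis": "B"},
    {"zip": "64105", "diagnosis": "A"},
    {"zip": "64111", "diagnosis": "A"},
]

safe_counts = grouped_counts_with_k(records, key="zip", k=3)
print(safe_counts)  # the single-record group "64111" is suppressed
```

A larger k suppresses more small groups, which trades detail in the results for stronger protection of individual records.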

ℹī¸Positioning data on your Access Point is convenient via the web interface, but be aware that the content will briefly pass through TripleBlind's servers when you use this method. To keep your data 100% private, use the SDK method to position data.

SDK Operation

The TripleBlind SDK provides a Python library which gives you easy programmatic access to the platform. If you haven't installed and configured the SDK yet, see the Installation Guide.

This and other tutorials include snippets of Python code. These snippets highlight important commands, but might not include all setup code to be able to run independently. You can find ready-to-run and more detailed Python scripts in the tutorials directory that came with the SDK. You should examine and run these scripts as you work through the tutorial.

The SDK provides both a utility and a library for interacting with TripleBlind. Both will be covered below.

Command Line tool

The tb.py command line tool is a quick and easy way to manage Assets. To create a new Asset, simply use:

tb.py asset create

You will be prompted to enter the filename to position, provide a description and set the visibility for the asset. Optionally, you can provide all of that on the command line in a single shot, e.g.:

tb.py asset create tabular.csv --name 'my data' --desc 'some csv data' --public

Custom Code

Next we will explore the basic APIs to position and manage your data using the TripleBlind library. Any data file (dataset or algorithm) can become an Asset; here we use a basic comma-separated values (CSV) file.

The first step is to pull in the library so you can connect to the TripleBlind Router and your own Access Point. This statement needs to be in all scripts using the TripleBlind SDK.

import tripleblind as tb

Now we will use the library to position a local data file on your Access Point. We'll use the file NASA_Meteorite_Landings.csv, which comes with the SDK, as an example.

my_dataset = tb.Asset.position(
    # path to the local data file to position
    file_handle="data/NASA_Meteorite_Landings.csv",
    # give your dataset a name
    name="NASA Meteorite Landings",
    # describe your data
    desc="A dataset for Meteorite Landings from NASA...",
    # make the dataset discoverable by other users in the web interface
    is_discoverable=True,
)
print(f"Successfully created: {my_dataset.name}, ID: {my_dataset.uuid}")
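
Before positioning a file, it can be useful to confirm locally that it parses as CSV and to note its columns for the description. The following is a minimal sketch using only the Python standard library. The helper `summarize_csv` is hypothetical and is not part of the TripleBlind SDK.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a local CSV file.

    Hypothetical pre-flight check; not part of the TripleBlind SDK.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])           # first row holds the column names
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count
```

For example, `summarize_csv("data/NASA_Meteorite_Landings.csv")` would return the column names and the number of data rows in the sample file.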

Managing Assets

Once your asset is on your Access Point you can manage it directly either from the web interface or the SDK.

To manage an asset you created via the web interface, visit the 🔗My Assets page. Select an asset to view it and click the Edit button to change its properties.

Alternatively, you can manage your assets directly from your code using our APIs. In the following code snippets, we illustrate how to:

  • rename an asset (asset.name)
  • update the description (asset.desc)
  • update the default file name (asset.filename)
  • change discoverability: (asset.is_discoverable)
  • retrieve an asset you own (asset.retrieve())
  • remove an asset (asset.archive())

First you need an Asset object. When we positioned the CSV above, the call returned an object, which we stored in the my_dataset variable. You can also connect to an existing asset using its unique ID (UUID), which is shown in the asset's detail view in the Data Explorer. For example, here we connect to the "MNIST Handwritten Digit Database" asset provided by TripleBlind. You can view the properties of this asset, but you cannot change them because you are not its owner.

mnist_dataset = tb.Asset("27e6e5e6-c281-425e-8a82-06e3d0c8dcc9")
print(mnist_dataset.name)

Next we'll change some properties of the meteorite dataset you created above, which you do own.

#### rename the asset
print(f"Previous dataset name: {my_dataset.name}")
my_dataset.name = "Meteorite Landings Dataset"
print(f"New dataset name: {my_dataset.name}")

#### update the description
print(f"Previous description: {my_dataset.desc}")
my_dataset.desc = "A new *markdown* description on my dataset."
# NOTE: Text in the description surrounded by * is rendered as emphasis (italic in standard Markdown) in the web interface
print(f"New description: {my_dataset.desc}")

The filename property is the default name used when retrieving the asset if no other name is specified. Notice that the name used below ends in .zip. That is because positioning packages the data, wrapping the dataset in a zip archive along with its metadata. For a CSV the data is only a single file, but other dataset packages, such as an image training set, might contain many files.

#### change the default file name
print(f"Previous file name: {my_dataset.filename}")
my_dataset.filename = "Meteorite_Landings_v1.csv.zip"
print(f"New file name: {my_dataset.filename}")

As the owner of this asset, we can also retrieve it. This is only possible for assets you own and if you have sufficient permissions within your organization!

my_dataset.retrieve(overwrite=True)
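
Because the retrieved package is a zip archive, you can inspect it with Python's standard `zipfile` module. The sketch below lists the files stored in a package. The helper `list_package_members` is hypothetical, and since this tutorial does not specify the layout inside the archive, the code makes no assumption about the member names.

```python
import zipfile


def list_package_members(package_path):
    """Return the names of all files stored in a zip package.

    Hypothetical helper; not part of the TripleBlind SDK.
    """
    with zipfile.ZipFile(package_path) as package:
        return package.namelist()
```

After retrieving the asset, calling `list_package_members("Meteorite_Landings_v1.csv.zip")` would show what the package contains, assuming the file was saved under that name in the current directory.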

Finally, you can archive an Asset. If you set remote_delete=False, the actual data remains on your Access Point. Either way, the Asset is removed from the index on the Router and can no longer be shared with anyone.

my_dataset.archive(remote_delete=True)

ℹī¸ Assets can either be destroyed or just delisted from the Router via asset.archive(remote_delete=False). Either way, the audit records remain intact and the dataset can no longer be used for other operations.


Accessing Assets

You can use TripleBlind's Data Explorer to browse existing assets -- both data and algorithms -- which were made discoverable by their owners. You can search for interesting assets either in the web interface or using the SDK.

ℹī¸ Remember, discoverable assets do not allow you to view the actual data -- just learn the existence of the data/algorithms that you can access to train algorithms or run inferences. Before accessing other organizations' assets you first need to obtain the necessary permissions (see the Permissions section below).

Web Interface

Visit the web interface to explore 🔗Datasets, 🔗Algorithms, and 🔗Reports. Searches match on the asset name or words found in the description.

Each card provides a summary of the asset it represents. Click on the asset listing to get more information about an asset, including the UUID (used to reference assets in your code), the full description, and summaries of the asset like an EDA report. For example, the following figure illustrates part of the page for the SAN dataset.

Exploratory Data Analysis (EDA)

The detail page of a CSV dataset will contain an EDA report under the Data Profile, which provides summaries and statistics about the dataset without revealing the actual contents. Data scientists can use this information to determine the utility of a dataset for their purposes.

Access Requests

ℹī¸ Granting access means enabling other organizations to use your data privately to train an algorithm or run some data analysis task. However, your data will never leave your infrastructure and will never be revealed neither to the organization requesting the job nor TripleBlind. To learn more about our secure and privacy-preserving techniques, please visit: 🔗tripleblind.com.

Whenever a user outside of your organization wishes to use an asset that you own (such as a dataset), you must explicitly grant permission for each use. To view and grant permission, visit the 🔗Access Requests page.

Some operations are able to use preprocessing techniques to normalize data with a SQL Transform or Python Transform. These transformations are powerful, so we call extra attention to these processes with a warning icon, as shown below, and the actual SQL or Python code can be reviewed within the Details view. Do not approve a process whose SQL Transform or Python Transform you do not understand.

Agreements

Granting permissions can become repetitive when working with a close partner organization that is running multiple trainings or analyses using your dataset. Visit the 🔗Assets page, select the asset you want to create an agreement for, select Manage, and then Create New from the Agreements tab. Agreements allow specific organizations to use your asset without waiting for your approval for every usage.

Additionally, you can create a special type of Agreement, called "ANY ORGANIZATION", which allows any organization registered with TripleBlind to use your asset for the specified purpose without further approval. As with all permissions, each access to your assets is recorded and logged in audit records (see the 🔗Audit Usage page).

SDK Operation

You can access an asset, whether you own it or not, from the SDK using one of the following methods:

  • Point to the asset using its UUID (which can be obtained in the web interface)
  • Search for the asset by name or a keyword using the SDK
  • Search for datasets or algorithms by name using Asset.find()

In the following code example, we search for all datasets that match the keyword "Customer Database" and then print their names and IDs to see which are of interest to us. Many tasks can consume multiple datasets, such as using datasets from different organizations to build an image classifier.

search_results = tb.Asset.find_all("Customer Database", dataset=True)
if search_results:
    print(f"Found {len(search_results)} dataset(s) with the keyword 'Customer Database':")
    for dataset in search_results:
        print(f"'{dataset.name}', ID: {dataset.uuid}")
else:
    print("No results...")

Command Line Tools

All of the asset management functionality is available from the tb.py utility that comes with the SDK. It can quickly be used from the command line instead of writing custom Python programs to do simple tasks.

Here are some examples of how you can use this utility. To view the NASA Meteorite dataset we created earlier, you could run:

python tb.py list "NASA Meteorite Landings"

or, even simpler, you can run the script directly on Mac and Linux systems:

./tb.py list "NASA Meteorite Landings"

We'll stick with this simpler way for the rest of the examples.

List assets found during the last months of 2020:

./tb.py list --since=11-1-2020 --before=1-1-2021

List data assets belonging to you:

./tb.py list data --mine

Find assets with "Bank" or "bank" in the name, using a regular expression match:

./tb.py list /[Bb]ank/
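
To preview which names a pattern such as `[Bb]ank` would match, you can try it with Python's `re` module first. This sketch assumes the utility matches the pattern anywhere in the name, which this tutorial does not state explicitly. The asset names are made up for illustration.

```python
import re

pattern = re.compile(r"[Bb]ank")

# Made-up asset names for illustration
names = [
    "Bank Customer Database",
    "Regional bank loans",
    "BANK ARCHIVE",
    "Meteorite Landings",
]

# search() looks for the pattern anywhere in the string
matches = [name for name in names if pattern.search(name)]
print(matches)
```

Note that "BANK ARCHIVE" is not matched, because the character class only covers the first letter; the remaining letters must be lowercase "ank".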

The same utility can set properties. Here we'll change the description of the dataset:

./tb.py set "NASA Meteorite Landings" desc "Changed from command line!"

Retrieve a copy of an asset you own:

./tb.py retrieve "NASA Meteorite Landings" nasa.csv

And even destroy your dataset:

./tb.py remove "NASA Meteorite Landings" --delete

Create a simple data asset:

./tb.py create nasa.csv --name "NASA Meteorite Landings" --desc "NASA data" --public

ℹī¸If --name and --desc weren't included, you would be prompted for them interactively.

For a full list of capabilities:

./tb.py --help

TIP: You can run these commands within a Jupyter notebook cell by adding "!" before the command, like this:

! ./tb.py
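
Outside a notebook, a similar effect can be achieved from a plain Python script with the standard `subprocess` module. This is a minimal sketch; the helper `run_command` is hypothetical, and the commented example assumes `./tb.py` is executable in the current directory, as in the examples above.

```python
import subprocess


def run_command(args):
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Example (requires the SDK's tb.py in the current directory):
# print(run_command(["./tb.py", "--help"]))
```

Passing the command as a list of arguments avoids shell quoting problems with names that contain spaces, such as "NASA Meteorite Landings".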
Wed May 15 2024 04:19:59 GMT-0400 (Eastern Daylight Time)