Data Format Support
TripleBlind currently supports the following data formats. In addition to those directly supported, many other data formats can be preprocessed into a supported format before being positioned as an Asset. If there are formats you would like to work with natively, please let us know by contacting your Customer Success Manager or submitting a request to Customer Support.
Tabular Data
Any data consisting of records with the same number of fields can be represented as grids with column headings, or “tables”. This simple format is extremely versatile and is used in a vast number of computer applications.
Tabular Data in CSV Format
Tabular data stored in comma-separated value (CSV) format can be positioned directly as a Dataset Asset. Many other formats, including spreadsheets, can be exported or preprocessed into CSV and then positioned.
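As a rough illustration of that preprocessing step, the sketch below uses only Python's standard csv module to turn a list of records into CSV text with a header row. It rejects any record whose fields differ from those of the first record, since tabular data needs the same fields in every record. The function name records_to_csv and the sample fields are hypothetical; this is generic preprocessing, not part of the TripleBlind SDK.

```python
import csv
import io
from typing import Dict, List


def records_to_csv(records: List[Dict[str, object]]) -> str:
    """Serialize records into CSV text with a header row.

    Every record must have exactly the same field names as the first one,
    so the output forms a proper table (same fields in every row).
    """
    if not records:
        raise ValueError("no records to export")
    fieldnames = list(records[0].keys())
    for i, record in enumerate(records):
        if set(record.keys()) != set(fieldnames):
            raise ValueError(
                f"record {i} has fields {sorted(record)}, expected {sorted(fieldnames)}"
            )
    buffer = io.StringIO()
    # Use "\n" line endings so the text is identical on every platform.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [
    {"patient_id": 1, "age": 54, "diagnosis": "A"},
    {"patient_id": 2, "age": 61, "diagnosis": "B"},
]
print(records_to_csv(rows))
```

The resulting text can be written to a .csv file and positioned like any other CSV Dataset Asset.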
Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data in CSV format.
- Gene_Regression
- PSI
- PSI_Vertical_Partition
- Random_Forest
- Table_Search
- Tabular_Data
- Transfer_Learning
- XGBoost
- XGBoost_Regression
Tabular Data Stored in Databases
Views of tabular data stored in databases can be positioned as a Database Asset. The asset contains only a description of the connection and the data content; the actual data is read from the database each time the asset is used, so it is “live”. The following databases are currently supported:
- Microsoft SQL Server
- MongoDB
- MySQL
- Oracle
- Postgres
- SQLite
If support for another database is needed, please let us know.
Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data stored in databases:
- tutorials/notebooks/1b_Database_Assets
- examples/Data_Connectors
- examples/Data_Munging
Tabular Data Stored in Data Warehouses
Tabular data stored in data warehouses can also be positioned as a Database Asset. As with databases, the asset contains the connection information, and the actual data is read from the data warehouse each time the asset is used, so it is “live”. The following data warehouses are currently supported:
- Amazon Redshift
- Amazon S3
- Databricks
- Google BigQuery
- Microsoft Azure Blob Storage
- Microsoft Azure Data Lake
- Snowflake
If support for another data warehouse is needed, please let us know.
Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data stored in data warehouses:
- examples/Data_Connectors
Image Formats
Images can be positioned as Dataset Assets. The following image formats are supported:
- Any image format supported by the Python Imaging Library (Pillow), including JPEG, PNG, BMP and many more.
- DICOM x-ray images
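Before positioning a folder of images, it can help to confirm that Pillow can actually identify each file. The sketch below assumes Pillow is installed; it opens raw image bytes with PIL.Image.open, runs an integrity check with Image.verify(), and returns the detected format name. The helper name detect_image_format is hypothetical, and DICOM files are outside its scope because Pillow does not read DICOM natively (a library such as pydicom is typically used for those).

```python
from io import BytesIO

from PIL import Image, UnidentifiedImageError


def detect_image_format(data: bytes) -> str:
    """Return the format Pillow detects for raw image bytes (e.g. "PNG").

    Raises UnidentifiedImageError if Pillow cannot recognize the data.
    """
    with Image.open(BytesIO(data)) as img:
        fmt = img.format  # set when the file header is parsed
        img.verify()      # integrity check without decoding all pixel data
    return fmt


# Build a tiny PNG in memory to exercise the check.
buffer = BytesIO()
Image.new("RGB", (8, 8), color=(200, 30, 30)).save(buffer, format="PNG")
print(detect_image_format(buffer.getvalue()))  # PNG

try:
    detect_image_format(b"definitely not an image")
except UnidentifiedImageError:
    print("rejected unreadable file")
```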
Please refer to the following examples in the TripleBlind SDK to learn more about how to use image data:
- Multimodal_AI
- Cifar
- Federated_Learning
- Image_Data
- Object_Detection
NumPy Binary Format
The NumPy NPY format is the standard binary file format for persisting a single NumPy array to disk. NumPy binary files can be positioned directly as Dataset Assets.
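A minimal sketch of producing and reading back an NPY file with NumPy's own np.save and np.load, assuming NumPy is installed. The file name and array contents are made up; the point is that a single .npy file holds exactly one array together with its dtype and shape.

```python
import os
import tempfile

import numpy as np

# One array per .npy file: here, 3 samples with 4 numeric features each.
features = np.arange(12, dtype=np.float32).reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), "features.npy")
np.save(path, features)  # the file header records dtype, shape and memory order

# np.load refuses pickled object arrays by default (allow_pickle=False),
# so plain numeric dtypes are the safest choice for shared files.
restored = np.load(path)

print(restored.shape, restored.dtype)  # (3, 4) float32
```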
Please refer to the following examples in the TripleBlind SDK to learn more about how to use NumPy binary data:
- LSTM
- Network_Provisioning
- PMML
Text Files
Collections of text (.txt) files, such as doctor’s notes, can be positioned as a Dataset Asset.
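As a quick local check before positioning, the sketch below reads every .txt file in a directory into a dictionary keyed by file name, using only Python's pathlib. The function name load_text_collection and the UTF-8 encoding are assumptions for illustration; files with other extensions are ignored, and a file that is not valid UTF-8 raises UnicodeDecodeError, so encoding problems surface early.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_text_collection(directory: str) -> Dict[str, str]:
    """Read every .txt file directly inside `directory` as UTF-8 text."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }


# Example: two notes plus a non-text file that should be skipped.
notes_dir = Path(tempfile.mkdtemp())
(notes_dir / "note_001.txt").write_text("Patient reports mild headache.", encoding="utf-8")
(notes_dir / "note_002.txt").write_text("Follow-up in two weeks.", encoding="utf-8")
(notes_dir / "summary.csv").write_text("id,count\n1,2\n", encoding="utf-8")

notes = load_text_collection(str(notes_dir))
print(sorted(notes))  # ['note_001.txt', 'note_002.txt']
```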
Please refer to the following examples in the TripleBlind SDK to learn more about how to use text data:
- Redact
Compressed Files
The preprocessor.package (aka tb.Package) is a general packaging tool for efficient handling of groups of files, such as images, text files, and NumPy binary arrays. A Package requires a specific internal structure, including a .meta.json file, but is otherwise a regular ZIP archive.
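Because a Package is otherwise a regular ZIP archive, Python's standard zipfile module can be used to look inside one. The sketch below lists the archive members and parses the .meta.json entry. It assumes .meta.json sits at the archive root, and the toy archive and its metadata fields are purely hypothetical: they do not reproduce the internal structure that preprocessor.package requires, so real packages should still be built with that tool.

```python
import json
import os
import tempfile
import zipfile
from typing import Dict, List, Tuple


def inspect_package(path: str) -> Tuple[List[str], Dict]:
    """Return the member names of a ZIP-based package and its parsed .meta.json.

    Assumes (hypothetically) that .meta.json is stored at the archive root;
    zipfile raises KeyError if no such member exists.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        meta = json.loads(archive.read(".meta.json").decode("utf-8"))
    return names, meta


# Build a toy archive; the metadata keys below are invented for the demo.
package_path = os.path.join(tempfile.mkdtemp(), "toy_package.zip")
with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(".meta.json", json.dumps({"description": "toy example"}))
    archive.writestr("records/record_0.txt", "first record")

names, meta = inspect_package(package_path)
print(names)  # ['.meta.json', 'records/record_0.txt']
print(meta)   # {'description': 'toy example'}
```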
Please refer to the following examples in the TripleBlind SDK to learn more about how to use compressed packages of data files:
- Federated_Learning
- Image_Data
- LSTM
- Multimodal_AI
Mon Oct 14 2024 17:40:48 GMT-0400 (Eastern Daylight Time)