Module preprocessor.document_input

Classes

class DocumentPreprocessor (target_column: str, document_column: str, vocab: List, targets: List)

Static methods

def builder() -> DocumentPreprocessorBuilder

Instance variables

var document_column
var target_column

class DocumentPreprocessorBuilder

Abstract base for any object that can be converted into and out of a dict. If schema validation is possible for the derived type, the schema definition should be in a class variable named SCHEMA.
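The docstring above describes a dict round-trip contract with an optional class-level SCHEMA used for validation. A minimal sketch of that pattern follows. The method names `to_dict`, `from_dict`, and `validate`, the schema format (a mapping of required key to expected Python type), and the `ColumnConfig` example class are all hypothetical illustrations, not this module's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, ClassVar, Dict, Optional, Type


class DictConvertible(ABC):
    """Hypothetical base: objects that convert to and from a plain dict."""

    # Hypothetical schema format: required key -> expected Python type.
    # None means the derived type does not support schema validation.
    SCHEMA: ClassVar[Optional[Dict[str, Type]]] = None

    @abstractmethod
    def to_dict(self) -> Dict[str, Any]:
        ...

    @classmethod
    def validate(cls, data: Dict[str, Any]) -> None:
        # Skip validation entirely when no schema is defined.
        if cls.SCHEMA is None:
            return
        for key, expected in cls.SCHEMA.items():
            if key not in data:
                raise ValueError(f"missing key: {key!r}")
            if not isinstance(data[key], expected):
                raise TypeError(f"{key!r} must be {expected.__name__}")

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DictConvertible":
        cls.validate(data)
        # Assumes the dict keys match the constructor's parameter names.
        return cls(**data)


class ColumnConfig(DictConvertible):
    """Hypothetical derived type holding the two column names."""

    SCHEMA = {"document_column": str, "target_column": str}

    def __init__(self, document_column: str, target_column: str) -> None:
        self.document_column = document_column
        self.target_column = target_column

    def to_dict(self) -> Dict[str, Any]:
        return {"document_column": self.document_column,
                "target_column": self.target_column}
```

Keeping SCHEMA as a class variable lets `from_dict` validate input before any instance exists, which matches the docstring's placement of the schema on the class rather than on instances.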

Class variables

var SCHEMA

Methods

def document_column(self, column_name: str) -> DocumentPreprocessorBuilder

Sets which column from the asset's record data to use as text data.

Args

column_name
The name of the column to take as document information.

Returns

DocumentPreprocessorBuilder
This class instance, useful for chaining.
def output_raw_dataset(self) -> DocumentRawDataset
def output_torch_dataset(self) -> TorchDatasetPreprocessor
def target_column(self, column_name: str) -> DocumentPreprocessorBuilder

Sets which column from the asset's record data to use as a target.

Args

column_name
The name of the column to take as target information.

Returns

DocumentPreprocessorBuilder
This class instance, useful for chaining.
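Both setters return the builder instance itself, which is what makes chained calls possible. The sketch below illustrates that fluent pattern with a self-contained stand-in. `SketchBuilder`, its `build()` method, and the missing-column check are hypothetical; only the setter names mirror the documented `DocumentPreprocessorBuilder` methods.

```python
from typing import Dict, Optional


class SketchBuilder:
    """Hypothetical stand-in mirroring the documented setter names."""

    def __init__(self) -> None:
        self._document_column: Optional[str] = None
        self._target_column: Optional[str] = None

    def document_column(self, column_name: str) -> "SketchBuilder":
        self._document_column = column_name
        return self  # returning self is what enables chaining

    def target_column(self, column_name: str) -> "SketchBuilder":
        self._target_column = column_name
        return self

    def build(self) -> Dict[str, str]:
        # Hypothetical terminal step: fail early if a column was never set.
        if self._document_column is None or self._target_column is None:
            raise ValueError("both document_column and target_column must be set")
        return {"document_column": self._document_column,
                "target_column": self._target_column}


config = SketchBuilder().document_column("text").target_column("label").build()
```

With the real API, a chain would presumably start from `DocumentPreprocessor.builder()` and end with `output_raw_dataset()` or `output_torch_dataset()`; whether those terminal methods check for unset columns is not stated in this reference.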

class DocumentRawDataset (document_column: str, target_column: str, vocab: List, targets: List)

Methods

def read_asset(self, path: str | pathlib.Path)
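`read_asset` accepts a `str` or `pathlib.Path`, but the asset's file format and tokenization are not documented here. As a hedged illustration of how the `vocab` and `targets` lists could be derived from an asset's records, the sketch below assumes a JSON Lines file (one JSON object per line), whitespace tokenization, and sorted unique values. The function `read_jsonl_asset` is hypothetical and is not the module's implementation.

```python
import json
import pathlib
from typing import Dict, List, Tuple, Union


def read_jsonl_asset(
    path: Union[str, pathlib.Path], document_column: str, target_column: str
) -> Tuple[List[Dict], List[str], List[str]]:
    """Hypothetical reader: returns (records, vocab, targets)."""
    records: List[Dict] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # Vocabulary: every distinct whitespace-separated token in the document column.
    vocab = sorted({tok for r in records for tok in str(r[document_column]).split()})
    # Targets: every distinct label found in the target column.
    targets = sorted({str(r[target_column]) for r in records})
    return records, vocab, targets
```

Sorting the unique values makes the resulting index assignment deterministic across runs, which matters if the lists are later used to encode tokens and labels as integers.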

class TabularTorchPreprocessor (document_column: str, target_column: str, vocab: List, targets: List)

Abstract base for any preprocessor that results in a PyTorch Dataset.
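A PyTorch map-style dataset is any object implementing `__len__` and `__getitem__`, typically by subclassing `torch.utils.data.Dataset`. The sketch below shows how a document preprocessor might encode each record against `vocab` and `targets`. It stays in pure Python so it runs without PyTorch; the class name `EncodedDocuments`, reserving id 0 for out-of-vocabulary tokens, and the integer encoding scheme are assumptions. A real subclass would inherit from `torch.utils.data.Dataset` and would likely return tensors.

```python
from typing import Dict, List, Tuple


class EncodedDocuments:
    """Hypothetical map-style dataset: index -> (token_ids, target_id)."""

    def __init__(self, records: List[Dict], document_column: str,
                 target_column: str, vocab: List[str], targets: List[str]) -> None:
        self.records = records
        self.document_column = document_column
        self.target_column = target_column
        # Reserve id 0 for out-of-vocabulary tokens (an assumption).
        self.token_to_id = {tok: i + 1 for i, tok in enumerate(vocab)}
        self.target_to_id = {t: i for i, t in enumerate(targets)}

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Tuple[List[int], int]:
        record = self.records[idx]
        tokens = str(record[self.document_column]).split()
        token_ids = [self.token_to_id.get(tok, 0) for tok in tokens]
        target_id = self.target_to_id[str(record[self.target_column])]
        return token_ids, target_id
```

Because documents yield token lists of different lengths, batching these items with `torch.utils.data.DataLoader` would need a custom `collate_fn` that pads each batch.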
