Module preprocessor.document_input
Classes
class DocumentPreprocessor (target_column: str, document_column: str, vocab: List, targets: List)
-
Subclasses
Static methods
def builder() -> DocumentPreprocessorBuilder
Instance variables
var document_column
var target_column
class DocumentPreprocessorBuilder
-
Abstract base for any object that can be converted into and out of a dict. If schema validation is possible for the derived type, the schema definition should be in a class variable named SCHEMA.
Ancestors
- IsDict
- abc.ABC
Class variables
var SCHEMA
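The following is a minimal sketch of the dict-conversion pattern described above, not the library's own code. The names DictConvertible, ColumnConfig, to_dict, from_dict and validate are hypothetical, and the shape of SCHEMA (a mapping from key to expected type) is an assumption; the real IsDict interface and schema format are not shown in this reference.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class DictConvertible(ABC):
    """Hypothetical stand-in for an IsDict-style abstract base."""

    # Assumed schema shape: key name -> expected Python type.
    SCHEMA: Dict[str, type] = {}

    @abstractmethod
    def to_dict(self) -> Dict[str, Any]:
        """Convert this object into a plain dict."""

    @classmethod
    def validate(cls, data: Dict[str, Any]) -> None:
        """Check that every key in SCHEMA is present with the expected type."""
        for key, expected in cls.SCHEMA.items():
            if key not in data:
                raise KeyError(f"missing key: {key}")
            if not isinstance(data[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")


class ColumnConfig(DictConvertible):
    """Hypothetical derived type holding two column names."""

    SCHEMA = {"document_column": str, "target_column": str}

    def __init__(self, document_column: str, target_column: str) -> None:
        self.document_column = document_column
        self.target_column = target_column

    def to_dict(self) -> Dict[str, Any]:
        return {
            "document_column": self.document_column,
            "target_column": self.target_column,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ColumnConfig":
        # Validate against SCHEMA before constructing the object.
        cls.validate(data)
        return cls(data["document_column"], data["target_column"])


# Round trip: object -> dict -> object.
config = ColumnConfig("text", "label")
restored = ColumnConfig.from_dict(config.to_dict())
```

Keeping the schema in a class variable lets validation run before an instance exists, which is why from_dict can be a classmethod in this sketch.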
Methods
def document_column(self, column_name: str) -> DocumentPreprocessorBuilder
-
Sets which column from the asset's record data to use as text data.
Args
column_name
- The name of the column to take as document information.
Returns
DocumentPreprocessorBuilder
- This class instance, useful for chaining.
def output_raw_dataset(self) -> DocumentRawDataset
def output_torch_dataset(self) -> TorchDatasetPreprocessor
def target_column(self, column_name: str) -> DocumentPreprocessorBuilder
-
Sets which column from the asset's record data to use as a target.
Args
column_name
- The name of the column to take as target information.
Returns
DocumentPreprocessorBuilder
- This class instance, useful for chaining.
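Because document_column and target_column each return the builder instance, calls can be chained. The sketch below illustrates that chaining pattern with a hypothetical stand-in class, SketchBuilder; it is not the library's DocumentPreprocessorBuilder, and the settings method and the column names "review_text" and "sentiment" are made up for illustration.

```python
from typing import Dict, Optional


class SketchBuilder:
    """Hypothetical stand-in that mimics the chaining setters documented above."""

    def __init__(self) -> None:
        self._document_column: Optional[str] = None
        self._target_column: Optional[str] = None

    def document_column(self, column_name: str) -> "SketchBuilder":
        # Record the text column, then return self so calls can be chained.
        self._document_column = column_name
        return self

    def target_column(self, column_name: str) -> "SketchBuilder":
        # Record the target column, then return self so calls can be chained.
        self._target_column = column_name
        return self

    def settings(self) -> Dict[str, Optional[str]]:
        # Hypothetical helper that exposes what has been configured so far.
        return {
            "document_column": self._document_column,
            "target_column": self._target_column,
        }


builder = SketchBuilder().document_column("review_text").target_column("sentiment")
print(builder.settings())
```

With the real classes, the signatures above suggest the chain would start from DocumentPreprocessor.builder() and end with output_raw_dataset() or output_torch_dataset(); what those calls require beforehand is not documented here.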
Inherited members
class DocumentRawDataset (document_column: str, target_column: str, vocab: List, targets: List)
-
Ancestors
Methods
def read_asset(self, path: str | pathlib.Path)
class TabularTorchPreprocessor (document_column: str, target_column: str, vocab: List, targets: List)
-
Abstract base for any preprocessor that results in a PyTorch Dataset.
Ancestors
Inherited members