Module preprocessor.document_input
Classes
class DocumentPreprocessor (target_column: str, document_column: str, vocab: List, targets: List)
Subclasses
Static methods
def builder() -> DocumentPreprocessorBuilder
Instance variables
var document_column
var target_column
class DocumentPreprocessorBuilder
Abstract base for any object that can be converted to and from a dict. If schema validation is possible for the derived type, the schema definition should be in a class variable named SCHEMA.
Ancestors
- IsDict
- abc.ABC
Class variables
var SCHEMA
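The dict-conversion contract described above can be sketched as follows. This is a minimal illustration, not the library's IsDict implementation: the class names DictConvertible and ColumnConfig, the method names to_dict and from_dict, and the shape of SCHEMA (a mapping from key to required type) are all assumptions made for the example.

```python
import abc
from typing import Any, Dict


class DictConvertible(abc.ABC):
    """Hypothetical stand-in for IsDict; every method name here is an assumption."""

    # Maps each required key to the type its value must have (assumed shape).
    SCHEMA: Dict[str, type] = {}

    @abc.abstractmethod
    def to_dict(self) -> Dict[str, Any]:
        """Return a plain-dict representation of this object."""

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DictConvertible":
        """Validate data against SCHEMA, then build an instance from it."""
        for key, expected_type in cls.SCHEMA.items():
            if key not in data:
                raise ValueError(f"missing key: {key}")
            if not isinstance(data[key], expected_type):
                raise TypeError(f"{key} must be {expected_type.__name__}")
        return cls(**data)


class ColumnConfig(DictConvertible):
    """Illustrative derived type holding the two column names."""

    SCHEMA = {"document_column": str, "target_column": str}

    def __init__(self, document_column: str, target_column: str) -> None:
        self.document_column = document_column
        self.target_column = target_column

    def to_dict(self) -> Dict[str, Any]:
        return {
            "document_column": self.document_column,
            "target_column": self.target_column,
        }


# Round trip: dict -> object -> dict.
config = ColumnConfig.from_dict({"document_column": "body", "target_column": "label"})
round_tripped = config.to_dict()
```

Keeping the schema in a class variable lets validation run in a classmethod before any instance exists, which is one plausible reason for the SCHEMA convention.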
Methods
def document_column(self, column_name: str) -> DocumentPreprocessorBuilder
Sets which column from the asset's record data to use as text data.
Args
column_name: The name of the column to take as document information.
Returns
DocumentPreprocessorBuilder: This class instance, useful for chaining.
def output_raw_dataset(self) -> DocumentRawDataset
def output_torch_dataset(self) -> TorchDatasetPreprocessor
def target_column(self, column_name: str) -> DocumentPreprocessorBuilder
Sets which column from the asset's record data to use as a target.
Args
column_name: The name of the column to take as target information.
Returns
DocumentPreprocessorBuilder: This class instance, useful for chaining.
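Because both setters return the builder itself, calls can be chained. The sketch below shows that pattern with a hypothetical stand-in class, SketchBuilder, so that it runs without the library installed. Only the method names document_column and target_column come from this page; the settings helper and the column names "body" and "label" are assumptions for illustration.

```python
from typing import Dict, Optional


class SketchBuilder:
    """Hypothetical stand-in that mirrors the chaining setters documented above."""

    def __init__(self) -> None:
        self._document_column: Optional[str] = None
        self._target_column: Optional[str] = None

    def document_column(self, column_name: str) -> "SketchBuilder":
        # Record the text column, then return self so calls can be chained.
        self._document_column = column_name
        return self

    def target_column(self, column_name: str) -> "SketchBuilder":
        # Record the target column, then return self so calls can be chained.
        self._target_column = column_name
        return self

    def settings(self) -> Dict[str, Optional[str]]:
        # Assumed helper (not part of the documented API) to inspect the state.
        return {
            "document_column": self._document_column,
            "target_column": self._target_column,
        }


# Each setter returns the same builder, so the calls read as one expression.
builder = SketchBuilder().document_column("body").target_column("label")
```

With the real class, the equivalent chain would presumably start from DocumentPreprocessor.builder() and end with output_raw_dataset() or output_torch_dataset(); the behavior of those calls is not described on this page.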
Inherited members
class DocumentRawDataset (document_column: str, target_column: str, vocab: List, targets: List)
Ancestors
Methods
def read_asset(self, path: str | pathlib.Path)
class TabularTorchPreprocessor (document_column: str, target_column: str, vocab: List, targets: List)
Abstract base for any preprocessor that results in a PyTorch Dataset.
Ancestors
Inherited members