Module preprocessor.abstract
The abstract classes in this module define the interfaces implemented by the concrete classes in this package and by custom preprocessors.
Classes
class IsDict
-
Abstract base for any object that can be converted into and out of a dict. If schema validation is possible for the derived type, the schema definition should be in a class variable named SCHEMA.
Ancestors
- abc.ABC
Subclasses
Static methods
def from_dict(input: dict) -> Any
-
Converts a Python dict into the type.
Args
input
:dict
- The output of a serializer like json.load()
Returns
Any
- The specific type which implemented this interface.
Methods
def to_dict(self) -> dict
-
Converts the type into a Python dict for serialization.
Returns
dict
- Python dict which can be passed to serializers like json.dump()
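As a minimal sketch of this interface, the example below round-trips an object through json. The IsDictSketch base, the ResizeStep class, and the shape of its SCHEMA are hypothetical stand-ins invented for illustration; the schema format the package expects is not specified here.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class IsDictSketch(ABC):
    """Hypothetical stand-in mirroring the IsDict interface described above."""

    @staticmethod
    @abstractmethod
    def from_dict(input: dict) -> Any:
        ...

    @abstractmethod
    def to_dict(self) -> dict:
        ...


class ResizeStep(IsDictSketch):
    """Hypothetical concrete type; not part of the package."""

    # Illustrative schema only (assumption): maps each key to its expected type.
    SCHEMA = {"width": int, "height": int}

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    @staticmethod
    def from_dict(input: dict) -> "ResizeStep":
        # Minimal validation against the illustrative SCHEMA.
        for key, expected in ResizeStep.SCHEMA.items():
            if not isinstance(input.get(key), expected):
                raise ValueError(f"{key!r} must be of type {expected.__name__}")
        return ResizeStep(input["width"], input["height"])

    def to_dict(self) -> dict:
        return {"width": self.width, "height": self.height}


step = ResizeStep(64, 32)
serialized = json.dumps(step.to_dict())                  # dict -> JSON text
restored = ResizeStep.from_dict(json.loads(serialized))  # JSON text -> object
```

Keeping from_dict static means a caller can rebuild the object from serialized data without already holding an instance.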
class NumpyPreprocessor
-
Abstract base for any preprocessor that can generate a numpy.ndarray
Ancestors
- abc.ABC
Subclasses
- DicomNumpyPreprocessor
- preprocessor.image.ImageNumpyPreprocessor
- preprocessor.numpy_input.NumpyNumpyPreprocessor
- TabularNumpyPreprocessor
Methods
def read_asset(self, asset: Union[str, pathlib.Path, Package]) -> numpy.ndarray
-
Reads a preprocessor.package file and coerces it into an ndarray.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
Returns
np.ndarray
- The content of the package file as an ndarray
def read_asset_chunked(self, asset: Union[str, pathlib.Path, Package], chunksize: int) -> Iterator[numpy.ndarray]
-
Reads a tabular preprocessor.package file in chunks and coerces each chunk into an ndarray.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
chunksize
:int
- The number of tabular rows to include in each chunk.
Yields
np.ndarray
- An ndarray with at most the number of rows defined by the chunk size.
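The chunk contract can be illustrated with a short sketch. The helper iter_row_chunks is hypothetical and works on an in-memory array rather than a package file; it only shows what "at most chunksize rows" means for the yielded arrays, assuming the final chunk holds whatever rows remain.

```python
from typing import Iterator

import numpy as np


def iter_row_chunks(table: np.ndarray, chunksize: int) -> Iterator[np.ndarray]:
    """Hypothetical helper: yield consecutive row blocks of at most `chunksize` rows."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, table.shape[0], chunksize):
        # Slicing past the end is safe, so the last chunk may be shorter.
        yield table[start:start + chunksize]


table = np.arange(10).reshape(5, 2)  # 5 rows, 2 columns
chunks = list(iter_row_chunks(table, chunksize=2))
chunk_rows = [chunk.shape[0] for chunk in chunks]  # 2, 2, then the 1 remaining row
```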
def read_bytes(self, data: _io.BytesIO) -> numpy.ndarray
-
Reads a BytesIO source and coerces it into an ndarray.
Args
data
- The BytesIO source containing the input data bytes.
Returns
np.ndarray
- The byte content as an ndarray
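One possible shape for a read_bytes implementation is sketched below, assuming the bytes hold an array in NumPy's .npy format. The function read_npy_bytes is hypothetical; concrete preprocessors in this package may decode other formats.

```python
import io

import numpy as np


def read_npy_bytes(data: io.BytesIO) -> np.ndarray:
    """Hypothetical read_bytes-style function for .npy-encoded bytes."""
    data.seek(0)  # read from the start regardless of the current position
    return np.load(data, allow_pickle=False)


source = np.array([[1.0, 2.0], [3.0, 4.0]])
buffer = io.BytesIO()
np.save(buffer, source)  # serialize in NumPy's .npy format

decoded = read_npy_bytes(buffer)
```

Passing allow_pickle=False makes the load refuse pickled object arrays, which is a reasonable default when the bytes come from an untrusted source.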
def read_file(self, path: Union[str, pathlib.Path, Package]) -> numpy.ndarray
-
Reads a file from disk and coerces it into an ndarray
Args
path
:Union[str, Path, Package]
- Reference to the file on disk.
Returns
np.ndarray
- The content of the file as an ndarray
def read_folder(self, pattern: str) -> Iterator[numpy.ndarray]
-
Loads files from a folder based on a glob pattern.
Args
pattern
- A glob pattern used to select files to read. Ex. "data/sample.*"
Yields
np.ndarray
- Data represented as an ndarray.
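A glob-driven generator of this kind might look like the sketch below. The function read_folder_sketch is hypothetical and assumes comma-separated numeric files; it yields one array per matching file so that only one file is held in memory at a time.

```python
import glob
import os
import tempfile
from typing import Iterator

import numpy as np


def read_folder_sketch(pattern: str) -> Iterator[np.ndarray]:
    """Hypothetical generator: lazily load each CSV file matching `pattern`."""
    for path in sorted(glob.glob(pattern)):  # sorted for a deterministic order
        # ndmin=2 keeps single-row files two-dimensional.
        yield np.loadtxt(path, delimiter=",", ndmin=2)


with tempfile.TemporaryDirectory() as folder:
    for name, rows in [("sample_a.csv", "1,2\n3,4\n"), ("sample_b.csv", "5,6\n")]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(rows)
    # Escape the folder name so only the file part is treated as a pattern.
    pattern = os.path.join(glob.escape(folder), "sample_*.csv")
    arrays = list(read_folder_sketch(pattern))

shapes = [array.shape for array in arrays]
```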
class NumpyTargetPreprocessor
-
Abstract base preprocessor that can generate a NumPy ndarray object with a target.
Ancestors
- abc.ABC
Subclasses
Methods
def read_asset(self, asset: Union[str, pathlib.Path, Package]) -> Tuple[numpy.ndarray, Optional[numpy.ndarray]]
-
Reads a Package file from disk and coerces it into a tuple of ndarrays.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
Returns
Tuple[np.ndarray, Optional[np.ndarray]]
- The content of the file as a tuple, where the first value is a 2D features array and the second is a 1D target array.
def read_asset_chunked(self, asset: Union[str, pathlib.Path, Package], chunksize: int) -> Iterator[Tuple[numpy.ndarray, Optional[numpy.ndarray]]]
-
Reads a large Package file from disk in chunks and coerces each chunk into a tuple of ndarrays.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
chunksize
- The number of tabular rows that will be included in each chunk.
Yields
Tuple[np.ndarray, Optional[np.ndarray]]
- A tuple of numpy arrays, each of which has at most the number of rows defined by the chunk size.
def read_bytes(self, data: _io.BytesIO) -> Tuple[numpy.ndarray, Optional[numpy.ndarray]]
-
Reads from a BytesIO source and coerces it into a tuple of ndarrays.
Args
data
- The BytesIO source containing the input data bytes.
Returns
Tuple[np.ndarray, Optional[np.ndarray]]
- Where the first value is a 2D features array, and the second is a 1D target array.
def read_file(self, path: Union[str, pathlib.Path, Package]) -> Tuple[numpy.ndarray, Optional[numpy.ndarray]]
-
Loads a file from disk and coerces it into a tuple of ndarrays.
Args
path
:Union[str, Path, Package]
- Reference to the file on disk.
Returns
Tuple[np.ndarray, Optional[np.ndarray]]
- The content of the file as a tuple, where the first value is a 2D features array and the second is a 1D target array.
def read_folder(self, pattern: str) -> Iterator[Tuple[numpy.ndarray, Optional[numpy.ndarray]]]
-
Creates a generator that loads each file in a folder matching a glob pattern and coerces it into a tuple of ndarrays.
Args
pattern
- A glob pattern used to select files to read. Ex. "data/sample.*"
Yields
Tuple[np.ndarray, Optional[np.ndarray]]
- Where the first value is a 2D features array, and the second is a 1D target array.
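The (features, target) return shape shared by the methods of this class can be sketched as follows. The helper split_features_target and its target_column parameter are hypothetical, and returning None when no target column is configured is an assumption about what the Optional in the signatures stands for.

```python
from typing import Optional, Tuple

import numpy as np


def split_features_target(
    table: np.ndarray, target_column: Optional[int]
) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Hypothetical helper: split a 2D table into (features, target)."""
    if target_column is None:
        return table, None  # assumption: no target, e.g. unlabeled data
    target = table[:, target_column]  # 1D array, one value per row
    features = np.delete(table, target_column, axis=1)  # remaining columns stay 2D
    return features, target


table = np.array([[1.0, 2.0, 0.0], [3.0, 4.0, 1.0]])
features, target = split_features_target(table, target_column=2)
unlabeled_features, no_target = split_features_target(table, target_column=None)
```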
class OutputNumpy
-
Abstract base for a preprocessor that can output data as a numpy.ndarray
Ancestors
- RequirePropertyDtype
- abc.ABC
Subclasses
- DicomPreprocessorBuilder
- ImagePreprocessorBuilder
- NumpyInputPreprocessorBuilder
- TabularPreprocessorBuilder
Methods
def output_numpy(self) -> NumpyPreprocessor
-
Completes the builder by returning a constructed NumpyPreprocessor
Returns
A NumpyPreprocessor
Inherited members
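The builder relationship described here can be sketched with plain abc classes. Every name below (OutputNumpySketch, TableBuilderSketch, the column step, NumpyPreprocessorSketch) is a hypothetical stand-in; only the pattern of a terminal output_numpy() call returning the constructed preprocessor is taken from the documentation above.

```python
from abc import ABC, abstractmethod
from typing import List


class NumpyPreprocessorSketch:
    """Hypothetical stand-in for a constructed NumpyPreprocessor."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns


class OutputNumpySketch(ABC):
    """Hypothetical mirror of the OutputNumpy interface."""

    @abstractmethod
    def output_numpy(self) -> NumpyPreprocessorSketch:
        ...


class TableBuilderSketch(OutputNumpySketch):
    """Hypothetical builder; `column` is an invented configuration step."""

    def __init__(self) -> None:
        self._columns: List[str] = []

    def column(self, name: str) -> "TableBuilderSketch":
        self._columns.append(name)
        return self  # returning self allows chained configuration calls

    def output_numpy(self) -> NumpyPreprocessorSketch:
        # Terminal step: hand a copy of the collected settings to the preprocessor.
        return NumpyPreprocessorSketch(list(self._columns))


preprocessor = TableBuilderSketch().column("age").column("height").output_numpy()
```

Because output_numpy is abstract, a builder that inherits the interface without implementing it cannot be instantiated, so the omission surfaces at construction time rather than at the first call.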
class OutputNumpyTarget
-
Abstract base for a preprocessor that can output data as a numpy.ndarray with a target.
Ancestors
- RequirePropertyDtype
- abc.ABC
Subclasses
Methods
def output_numpy_target(self) -> NumpyTargetPreprocessor
-
Completes the builder by returning a constructed NumpyTargetPreprocessor
Returns
A NumpyTargetPreprocessor
Inherited members
class OutputPandas
-
Abstract base for a preprocessor that can output data as a pandas.DataFrame.
Ancestors
- abc.ABC
Subclasses
Methods
def output_pandas(self) -> PandasPreprocessor
-
Completes the builder by returning a constructed PandasPreprocessor
Returns
A PandasPreprocessor
class OutputTorchDataset
-
Abstract base for a preprocessor that can output data as a PyTorch Dataset.
Ancestors
- RequirePropertyDtype
- abc.ABC
Subclasses
- ImagePreprocessorBuilder
- NumpyInputPreprocessorBuilder
- ROIPreprocessorBuilder
- TabularPreprocessorBuilder
Methods
def output_torch_dataset(self) -> TorchDatasetPreprocessor
-
Completes the builder by returning a constructed TorchDatasetPreprocessor
Returns
A TorchDatasetPreprocessor
Inherited members
class PandasPreprocessor
-
Abstract base for any preprocessor that results in a pandas.DataFrame.
Ancestors
- abc.ABC
Subclasses
Methods
def read_asset(self, asset: Union[str, pathlib.Path, Package]) -> pandas.core.frame.DataFrame
-
Reads a preprocessor.package file from disk and coerces it into a pandas.DataFrame.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
Returns
The content of the package file as a pandas.DataFrame.
def read_asset_chunked(self, asset: Union[str, pathlib.Path, Package], chunksize: int) -> Iterator[pandas.core.frame.DataFrame]
-
Reads a Package file from disk in chunks and coerces each chunk into a pandas.DataFrame.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
chunksize
- The number of tabular rows that will be included in each chunk.
Returns
An iterator of pandas.DataFrames, each of which has at most the number of rows defined by the chunk size.
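For CSV-backed data, one way to obtain such an iterator is the chunksize parameter of pandas.read_csv. The wrapper read_csv_chunked below is a hypothetical sketch that reads from an in-memory buffer; whether the package's own implementations use read_csv is an assumption.

```python
import io
from typing import Iterator

import pandas as pd


def read_csv_chunked(source: io.StringIO, chunksize: int) -> Iterator[pd.DataFrame]:
    """Hypothetical chunked reader built on pandas.read_csv."""
    # With chunksize set, read_csv returns an iterable of DataFrames
    # instead of loading the whole file at once.
    yield from pd.read_csv(source, chunksize=chunksize)


csv_text = "a,b\n1,2\n3,4\n5,6\n"
frames = list(read_csv_chunked(io.StringIO(csv_text), chunksize=2))
frame_lengths = [len(frame) for frame in frames]  # 2 rows, then the 1 remaining row
```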
def read_bytes(self, data: _io.BytesIO) -> pandas.core.frame.DataFrame
-
Reads a BytesIO source and coerces it into a pandas.DataFrame.
Args
data
- The BytesIO source containing the input data bytes.
Returns
The content of the file as a pandas.DataFrame.
def read_file(self, asset: Union[str, pathlib.Path, Package]) -> pandas.core.frame.DataFrame
-
Reads a file from disk and coerces it into a pandas.DataFrame.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
Returns
The content of the file as a pandas.DataFrame.
def read_folder(self, pattern: str) -> Iterator[pandas.core.frame.DataFrame]
-
Creates a generator which loads files from a folder based on a glob pattern.
Args
pattern
- A glob pattern used to select files to read. Ex. "data/sample.*"
Yields
pd.DataFrame
- Data represented as a pandas.DataFrame.
class RequirePropertyDtype
-
Helper class that provides a standard way to create an ABC using inheritance.
Ancestors
- abc.ABC
Subclasses
Methods
def dtype(self, dtype: Optional[str])
-
Casts an output numpy array to a given dtype.
If unset, the Protocol will choose. Ignored for non-numpy outputs.
Args
dtype
- The dtype that a numpy output will be cast into. See NumPy docs for more detail on possible types.
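A minimal sketch of how a dtype setting could be stored and later applied is shown below. DtypeBuilderSketch and its apply method are hypothetical; returning self from dtype() and leaving the array untouched when the dtype is None are assumptions, since the signature above documents no return value and leaves the unset case to the Protocol.

```python
from typing import Optional

import numpy as np


class DtypeBuilderSketch:
    """Hypothetical builder showing one way a dtype setting could be applied."""

    def __init__(self) -> None:
        self._dtype: Optional[str] = None

    def dtype(self, dtype: Optional[str]) -> "DtypeBuilderSketch":
        self._dtype = dtype
        return self  # assumption: chaining, as is common for builders

    def apply(self, array: np.ndarray) -> np.ndarray:
        # `apply` is invented for illustration; it casts only when a dtype is set.
        if self._dtype is None:
            return array
        return array.astype(self._dtype)


raw = np.array([1, 2, 3], dtype=np.int64)
as_float = DtypeBuilderSketch().dtype("float32").apply(raw)
unchanged = DtypeBuilderSketch().dtype(None).apply(raw)
```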
class TorchDatasetPreprocessor
-
Abstract base for any preprocessor that results in a PyTorch Dataset.
Ancestors
- abc.ABC
Subclasses
- TabularTorchPreprocessor
- preprocessor.image.ImageTorchPreprocessor
- preprocessor.numpy_input.NumpyTorchPreprocessor
Methods
def read_file(self, asset: Union[str, pathlib.Path, Package]) -> torch.utils.data.dataset.Dataset
-
Reads a file from disk and coerces it into a torch.utils.data.Dataset.
Args
asset
:Union[str, Path, Package]
- File (or Package) pointing to the asset on disk.
Returns
torch.utils.data.Dataset
- A PyTorch Dataset