API Reference

class tabfilereader.Base64Type

A data type for Columns that only allows base64-encoded values and coerces them to bytes.
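The coercion described above can be approximated with the standard library. This is a minimal sketch, not tabfilereader's implementation: the name coerce_base64 is hypothetical, and strict validation of the base64 alphabet is an assumption.

```python
import base64
import binascii


def coerce_base64(value: str) -> bytes:
    """Decode a base64 string to bytes, raising ValueError on invalid input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # (assumption: the real type is similarly strict)
        return base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid base64: {value!r}") from exc
```

Raising ValueError keeps the sketch in line with the data_type contract documented for Column.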

class tabfilereader.BooleanType

A data type for Columns that coerces values to bool.

true, t, yes, y, and 1 are interpreted as True.

false, f, no, n, and 0 are interpreted as False.
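The mapping above can be illustrated with a small stand-alone function. This is a sketch only: coerce_bool is a hypothetical name, and ignoring case and surrounding whitespace is an assumption that the reference does not confirm.

```python
TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def coerce_bool(value: str) -> bool:
    """Map the documented true/false spellings to a Python bool."""
    # Assumption: comparison ignores case and surrounding whitespace
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")
```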

class tabfilereader.Column(location, required=True, data_type=<tabfilereader.types.StringType object>, data_required=False)

Defines the properties of a column in a schema.

Parameters
  • location (Union[str, int, Pattern, Sequence[Union[str, Pattern]]]) –

    Specifies where in the file the column is located. Can be specified in several ways:

    • A zero-based integer that corresponds to the column’s position.

    • A string that indicates the name of the column that will be found in the first record of the file.

    • A compiled regular expression that will be used to match against the names in the first record of the file.

    • A sequence of strings or compiled regular expressions that will be used to match against the names in the first record in the file.

  • required (bool) – Whether or not this column must exist in the file.

  • data_type (Callable[[str], Any]) – A callable that will translate the string read from the file and coerce it to the desired Python data type. If the value cannot be coerced, the callable should raise a ValueError.

  • data_required (bool) – Whether or not this column must contain a value.
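To make the four forms of location concrete, the sketch below resolves each form against a header row. It illustrates the idea rather than the library's matching logic: find_column is a hypothetical helper, and the use of exact string equality and Pattern.search() are assumptions.

```python
import re
from typing import Optional, Sequence, Union

Location = Union[str, int, re.Pattern, Sequence[Union[str, re.Pattern]]]


def find_column(location: Location, header: Sequence[str]) -> Optional[int]:
    """Return the zero-based index of the column that location refers to."""
    if isinstance(location, int):
        # An integer is already a position; just check that it exists
        return location if 0 <= location < len(header) else None
    # Treat a single string or pattern as a one-element sequence
    if isinstance(location, (str, re.Pattern)):
        candidates = [location]
    else:
        candidates = list(location)
    for candidate in candidates:
        for index, name in enumerate(header):
            if isinstance(candidate, str):
                if candidate == name:  # assumption: exact match
                    return index
            elif candidate.search(name):  # assumption: search, not fullmatch
                return index
    return None


header = ["ID", "First Name", "E-mail"]
print(find_column(["Email", re.compile(r"^e-mail$", re.I)], header))
```

A sequence is useful when the same logical column appears under different names in different files.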

process_value(value)

Parses and coerces a value from a file according to this column’s data_type.

Parameters

value (str) – The raw value read from the file.

Return type

Any
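Because data_type is documented as any Callable[[str], Any] that raises ValueError on failure, a plain function can serve as a custom type. The sketch below shows one such callable; percent_type is a hypothetical name and is not part of tabfilereader.

```python
def percent_type(value: str) -> float:
    """Coerce strings such as '12.5%' to a fraction (0.125)."""
    text = value.strip()
    if not text.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    # float() raises ValueError itself when the number is malformed
    return float(text[:-1]) / 100
```

Under the signature documented above, such a function would presumably be passed as data_type=percent_type when constructing a Column.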

class tabfilereader.CsvReader(source_file)

A Reader capable of reading CSV (Comma Separated Value) files.

Available options:

  • encoding

  • delimiter

  • doublequote

  • escapechar

  • quotechar

  • quoting

  • skipinitialspace

property column_map

A mapping describing the columns found in the file. The keys are the integer position of the column in the file, and the values are the names of the columns as defined in the Schema used in this Reader.

Return type

Optional[Mapping[int, str]]

delimiter: str = ','

A one-character string used to separate fields. It defaults to ','.

doublequote: bool = True

Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.

encoding: Optional[str] = None

The encoding to use when decoding the file. If not specified, the system default will be used.

escapechar: Optional[str] = None

Removes any special meaning from the following character. It defaults to None, which disables escaping.

get_record()

Retrieves the next record in the file.

Returns a tuple of two values:

  • The first value contains the contents of the columns in the record.

  • The second value is a collection of errors encountered while parsing the record.

Return type

Tuple[RecordBase, RecordErrors]

classmethod open(source_file, schema, **options)

Creates a Reader object and opens the specified file with it.

Parameters
  • source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.

  • schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.

  • options (Any) – Any of the options that this Reader allows.

Return type

~ReaderType

quotechar: str = '"'

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'.

quoting: int = 0

Controls when quotes should be recognised by the reader. It can take on any of the csv.QUOTE_* constants and defaults to csv.QUOTE_MINIMAL.

property records_read

The number of records that have been read from the file so far.

Return type

int

skipinitialspace: bool = False

When True, whitespace immediately following the delimiter is ignored. The default is False.
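The option names above match the formatting parameters of Python's standard csv module, so their effect can be previewed with csv.reader directly. This sketch assumes CsvReader treats the options the same way; it does not use tabfilereader itself, and the sample data is made up.

```python
import csv
import io

# Semicolon-delimited data with single-quoted fields and a doubled quote
raw = "id; name; note\n1; 'O''Brien'; 'a;b'\n"

reader = csv.reader(
    io.StringIO(raw),
    delimiter=";",
    quotechar="'",
    doublequote=True,       # '' inside a quoted field means one literal '
    skipinitialspace=True,  # drop the space that follows each delimiter
    quoting=csv.QUOTE_MINIMAL,
)
rows = list(reader)
print(rows)
```

Without skipinitialspace=True, the space after each delimiter would become part of the field and the opening quote would no longer be at the start of the field.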

class tabfilereader.DateTimeType(fmt=None)

A data type for Columns that coerces values to datetime.datetime.

By default, allows the following formats:

  • YYYY-MM-DDTHH:MM:SS

  • YYYY-MM-DDTHH:MM:SS+HHMM

  • YYYY-MM-DDTHH:MM:SS.FFFFFF

  • YYYY-MM-DDTHH:MM:SS.FFFFFF+HHMM

Parameters

fmt (Union[str, Sequence[str], None]) – The Python strptime() format string or sequence of strings to parse.
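The default formats listed above can be written as strptime() format strings and tried in order. This is a sketch under assumptions: parse_datetime and DEFAULT_FORMATS are hypothetical names, and the exact format strings the library uses are not stated in this reference.

```python
from datetime import datetime
from typing import Sequence

# strptime() equivalents of the four documented default formats
DEFAULT_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S.%f%z",
)


def parse_datetime(value: str, formats: Sequence[str] = DEFAULT_FORMATS) -> datetime:
    """Return the result of the first format that parses value."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"{value!r} does not match any accepted format")
```

Passing a sequence as fmt would presumably follow the same try-each-in-turn pattern, for example ("%d/%m/%Y %H:%M", "%Y-%m-%d %H:%M").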

class tabfilereader.DateType(fmt=None)

A data type for Columns that coerces values to datetime.date.

By default, allows the following formats:

  • YYYY-MM-DD

Parameters

fmt (Union[str, Sequence[str], None]) – The Python strptime() format string or sequence of strings to parse.

class tabfilereader.DecimalType

A data type for Columns that coerces values to decimal.Decimal.
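One detail worth knowing when writing a similar callable by hand: decimal.Decimal signals bad input with decimal.InvalidOperation, which is not a subclass of ValueError. The sketch below (coerce_decimal is a hypothetical name, not the library's code) converts the exception so that it follows the data_type contract documented for Column.

```python
from decimal import Decimal, InvalidOperation


def coerce_decimal(value: str) -> Decimal:
    """Coerce a string to Decimal, raising ValueError on malformed input."""
    try:
        return Decimal(value.strip())
    except InvalidOperation as exc:
        # InvalidOperation derives from ArithmeticError, so re-raise
        raise ValueError(f"not a decimal number: {value!r}") from exc
```

Unlike float, Decimal preserves the digits as written, which is usually what is wanted for monetary columns.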

class tabfilereader.ExcelDateTimeType(fmt=None)

A data type for Columns that coerces values to datetime.datetime.

This is specifically aimed at handling Excel file oddities.

class tabfilereader.ExcelDateType(fmt=None)

A data type for Columns that coerces values to datetime.date.

This is specifically aimed at handling Excel file oddities.

class tabfilereader.ExcelReader(source_file)

A Reader capable of reading Excel files. Supports both XLS- and XLSX-formatted files.

Available options:

  • worksheet

  • encoding

property column_map

A mapping describing the columns found in the file. The keys are the integer position of the column in the file, and the values are the names of the columns as defined in the Schema used in this Reader.

Return type

Optional[Mapping[int, str]]

encoding: Optional[str] = None

The encoding to use when the CODEPAGE record that should describe the file's encoding is missing or wrong.

get_record()

Retrieves the next record in the file.

Returns a tuple of two values:

  • The first value contains the contents of the columns in the record.

  • The second value is a collection of errors encountered while parsing the record.

Return type

Tuple[RecordBase, RecordErrors]

classmethod open(source_file, schema, **options)

Creates a Reader object and opens the specified file with it.

Parameters
  • source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.

  • schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.

  • options (Any) – Any of the options that this Reader allows.

Return type

~ReaderType

property records_read

The number of records that have been read from the file so far.

Return type

int

worksheet: Union[int, str, re.Pattern] = 0

Specifies which worksheet within the file should be read. Can be either the zero-based integer position of the worksheet, a string with the worksheet’s name, or a re.Pattern that will match the worksheet’s name. Defaults to 0 (the first worksheet in the file).
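The three accepted forms of the worksheet option can be illustrated by resolving each against a list of sheet names. This is a sketch, not the reader's actual lookup: select_worksheet is a hypothetical helper, and the use of Pattern.search() and of ValueError for a missing sheet are assumptions.

```python
import re
from typing import Sequence, Union


def select_worksheet(
    worksheet: Union[int, str, re.Pattern], sheet_names: Sequence[str]
) -> int:
    """Return the zero-based index of the worksheet to read."""
    if isinstance(worksheet, int):
        if 0 <= worksheet < len(sheet_names):
            return worksheet
        raise ValueError(f"no worksheet at position {worksheet}")
    for index, name in enumerate(sheet_names):
        if isinstance(worksheet, str):
            matched = worksheet == name
        else:
            # Assumption: the pattern may match anywhere in the name
            matched = worksheet.search(name) is not None
        if matched:
            return index
    raise ValueError(f"no worksheet matching {worksheet!r}")


sheets = ["Summary", "Sales 2023", "Sales 2024"]
print(select_worksheet(re.compile(r"^Sales \d{4}$"), sheets))
```

A pattern is handy when the sheet name carries a date or version that changes from file to file.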

class tabfilereader.ExcelTimeType(fmt=None)

A data type for Columns that coerces values to datetime.time.

This is specifically aimed at handling Excel file oddities.

class tabfilereader.FloatType

A data type for Columns that coerces values to float.

class tabfilereader.IntegerType

A data type for Columns that coerces values to int.

class tabfilereader.JsonArrayType

A data type for Columns that only allows JSON-encoded arrays and coerces them to list.

class tabfilereader.JsonObjectType

A data type for Columns that only allows JSON-encoded objects and coerces them to dict.

class tabfilereader.JsonType

A data type for Columns that allows JSON-encoded values.
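The difference between the three JSON types is only in which top-level value they accept. The sketch below shows that distinction with the standard json module; the function names are hypothetical and the code is not taken from tabfilereader.

```python
import json
from typing import Any


def coerce_json(value: str) -> Any:
    """Accept any JSON document (json.JSONDecodeError is a ValueError)."""
    return json.loads(value)


def coerce_json_array(value: str) -> list:
    """Accept only a JSON array at the top level."""
    parsed = json.loads(value)
    if not isinstance(parsed, list):
        raise ValueError("expected a JSON array")
    return parsed


def coerce_json_object(value: str) -> dict:
    """Accept only a JSON object at the top level."""
    parsed = json.loads(value)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed
```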

class tabfilereader.OdsReader(source_file)

A Reader capable of reading OpenDocument Format spreadsheet (ODS) files.

Available options:

  • worksheet

property column_map

A mapping describing the columns found in the file. The keys are the integer position of the column in the file, and the values are the names of the columns as defined in the Schema used in this Reader.

Return type

Optional[Mapping[int, str]]

get_record()

Retrieves the next record in the file.

Returns a tuple of two values:

  • The first value contains the contents of the columns in the record.

  • The second value is a collection of errors encountered while parsing the record.

Return type

Tuple[RecordBase, RecordErrors]

classmethod open(source_file, schema, **options)

Creates a Reader object and opens the specified file with it.

Parameters
  • source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.

  • schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.

  • options (Any) – Any of the options that this Reader allows.

Return type

~ReaderType

property records_read

The number of records that have been read from the file so far.

Return type

int

worksheet: Union[int, str, re.Pattern] = 0

Specifies which worksheet within the file should be read. Can be either the zero-based integer position of the worksheet, a string with the worksheet’s name, or a re.Pattern that will match the worksheet’s name. Defaults to 0 (the first worksheet in the file).

class tabfilereader.Reader(source_file)

The abstract base class for tabular file readers.

Parameters

source_file (Union[IOBase, Path, str]) –

The tabular file to read and parse. The file can be specified in several ways:

  • str - A string specifying a path to a file.

  • pathlib.Path - A Path object specifying a path to a file.

  • io.IOBase - An open IOBase object containing the file’s contents.

property column_map

A mapping describing the columns found in the file. The keys are the integer position of the column in the file, and the values are the names of the columns as defined in the Schema used in this Reader.

Return type

Optional[Mapping[int, str]]

get_record()

Retrieves the next record in the file.

Returns a tuple of two values:

  • The first value contains the contents of the columns in the record.

  • The second value is a collection of errors encountered while parsing the record.

Return type

Tuple[RecordBase, RecordErrors]

classmethod open(source_file, schema, **options)

Creates a Reader object and opens the specified file with it.

Parameters
  • source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.

  • schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.

  • options (Any) – Any of the options that this Reader allows.

Return type

~ReaderType

property records_read

The number of records that have been read from the file so far.

Return type

int

schema: Type[tabfilereader.schema.Schema]

The schema definition that will be used when reading the file.

class tabfilereader.RecordBase(**kwargs)

The base class for records that are read by tabfilereader.

The column values can be accessed on this class as attributes (e.g., record.my_column) or item lookups (e.g., record['my_column']).
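The dual access style described above can be pictured with a small stand-in class. SketchRecord is hypothetical and only mimics the documented behaviour; it says nothing about how RecordBase is actually built.

```python
class SketchRecord:
    """Stand-in showing attribute access and item lookup on the same data."""

    def __init__(self, **kwargs):
        # Store each column value as an instance attribute
        self.__dict__.update(kwargs)

    def __getitem__(self, key):
        # Item lookup reads from the same storage as attribute access
        return self.__dict__[key]


record = SketchRecord(my_column="hello", count=3)
print(record.my_column, record["my_column"])
```

Item lookup is the safer choice when a column name is held in a variable or is not a valid Python identifier.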

class tabfilereader.RecordErrors

A collection of the errors encountered while reading a record from a file.

This class can be accessed like a dict, where the keys are the column names.

add(column, error)

Adds an error to the collection.

Parameters
  • column (str) – The column where the error was encountered.

  • error (Union[Exception, str]) – The error that occurred.

Return type

None
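A dict-like error collection keyed by column name might look like the sketch below. SketchErrors is a hypothetical stand-in; storing each column's errors as a list of strings is an assumption, since this reference does not describe the shape of the values.

```python
from typing import Union


class SketchErrors(dict):
    """Stand-in for a per-record error collection keyed by column name."""

    def add(self, column: str, error: Union[Exception, str]) -> None:
        # Assumption: several errors per column are kept, in order, as text
        self.setdefault(column, []).append(str(error))


errors = SketchErrors()
errors.add("age", ValueError("invalid literal"))
errors.add("age", "value out of range")
print(errors["age"])
```

Because an empty dict is falsy, a collection shaped like this can be tested with a plain if statement to decide whether a record parsed cleanly.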

class tabfilereader.Schema

The base class for defining the structure and behavior of data to be read from a file.

This class should be subclassed in your code. The properties defined on the subclass should be instances of the Column class.

Subclasses accept the following class keyword arguments:

  • ignore_unknown_columns: Whether or not columns found in the file that do not correspond to columns defined on this class will be ignored. Defaults to False.

  • ignore_empty_records: Whether or not completely empty records in a file are ignored by tabfilereader. Defaults to False.
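Class keyword arguments are written in the class statement itself, after the base class. The stand-alone sketch below shows the Python mechanism using __init_subclass__; SketchSchema is hypothetical, and tabfilereader may implement this differently (for example, with a metaclass).

```python
class SketchSchema:
    """Stand-in base class that accepts class keyword arguments."""

    ignore_unknown_columns = False
    ignore_empty_records = False

    def __init_subclass__(
        cls, ignore_unknown_columns=False, ignore_empty_records=False, **kwargs
    ):
        super().__init_subclass__(**kwargs)
        # Record the options on the subclass so a reader could consult them
        cls.ignore_unknown_columns = ignore_unknown_columns
        cls.ignore_empty_records = ignore_empty_records


class LenientSchema(SketchSchema, ignore_unknown_columns=True):
    pass


print(LenientSchema.ignore_unknown_columns, LenientSchema.ignore_empty_records)
```

A real schema would subclass tabfilereader.Schema in the same way and declare its Column instances in the class body.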

class tabfilereader.StringType

A data type for Columns that coerces values to str.

exception tabfilereader.TabFileReaderError

An exception raised by tabfilereader.

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class tabfilereader.TimeType(fmt=None)

A data type for Columns that coerces values to datetime.time.

By default, allows the following formats:

  • HH:MM

  • HH:MM:SS

  • HH:MM:SS.FFFFFF

Parameters

fmt (Union[str, Sequence[str], None]) – The Python strptime() format string or sequence of strings to parse.