API Reference
-
class tabfilereader.Base64Type
A data type for Columns that only allows base64-encoded values and coerces them to bytes.
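The coercion described above can be approximated with the standard library. This is a minimal sketch, not the library's implementation: the function name `coerce_base64` is hypothetical, and strict validation of the input alphabet is an assumption.

```python
import base64
import binascii


def coerce_base64(value: str) -> bytes:
    """Hypothetical stand-in for a Base64Type-style callable."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # (assumption: the real type is similarly strict).
        return base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        # binascii.Error already subclasses ValueError; re-raising only
        # gives the caller a clearer message.
        raise ValueError(f"not valid base64: {value!r}") from exc


decoded = coerce_base64("aGVsbG8=")
print(decoded)
```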
-
class tabfilereader.BooleanType
A data type for Columns that coerces values to bool. true, t, yes, y, and 1 are interpreted as True. false, f, no, n, and 0 are interpreted as False.
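The mapping listed above could be expressed as a small callable. This is a sketch only: `coerce_bool` is a hypothetical name, and the case-insensitive, whitespace-tolerant matching is an assumption that the reference does not confirm.

```python
TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def coerce_bool(value: str) -> bool:
    """Hypothetical stand-in for a BooleanType-style callable."""
    # Assumption: matching ignores case and surrounding whitespace.
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognised boolean: {value!r}")


print(coerce_bool("yes"), coerce_bool("0"))
```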
-
class tabfilereader.Column(location, required=True, data_type=<tabfilereader.types.StringType object>, data_required=False)
Defines the properties of a column in a schema.
- Parameters
location (Union[str, int, Pattern, Sequence[Union[str, Pattern]]]) – Specifies where in the file the column is located. It can be given in any of the following forms:
  A zero-based integer that corresponds to the column’s position.
  A string naming the column, as found in the first record of the file.
  A compiled regular expression that is matched against the names in the first record of the file.
  A sequence of strings or compiled regular expressions that are matched against the names in the first record of the file.
required (bool) – Whether or not this column must exist in the file.
data_type (Callable[[str], Any]) – A callable that coerces the string read from the file to the desired Python data type. If the value cannot be coerced, the callable should raise a ValueError.
data_required (bool) – Whether or not this column must contain a value.
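Because data_type is documented as any callable that takes a str and raises ValueError on failure, a plain function should be usable as a custom type. The sketch below shows one such callable; the name `percent` and the percentage format it accepts are hypothetical and chosen only for illustration.

```python
from decimal import Decimal, InvalidOperation


def percent(value: str) -> Decimal:
    """Hypothetical custom data type: '12.5%' -> Decimal('0.125')."""
    text = value.strip()
    if not text.endswith("%"):
        raise ValueError(f"missing percent sign: {value!r}")
    try:
        return Decimal(text[:-1]) / 100
    except InvalidOperation as exc:
        # InvalidOperation is not a ValueError, so convert it to match
        # the contract described for data_type.
        raise ValueError(f"not a number: {value!r}") from exc


print(percent("12.5%"))
```

A function like this would presumably be passed as `data_type=percent` when declaring a Column.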
-
process_value(value)
Parses and coerces a value from a file according to this column’s data_type.
- Parameters
value (str) – The raw value read from the file.
- Return type
Any
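To make the four forms of location concrete, the following sketch resolves a location against a header row. It is not the library's code: `find_column` is a hypothetical helper, and exact string comparison, use of `search()` for patterns, and first-match-wins ordering are all assumptions.

```python
import re
from typing import Optional, Sequence


def find_column(location, header: Sequence[str]) -> Optional[int]:
    """Return the index in header that location refers to, or None."""
    if isinstance(location, int):
        # Zero-based position in the file.
        return location if 0 <= location < len(header) else None
    if isinstance(location, (str, re.Pattern)):
        candidates = [location]
    else:
        # A sequence of names and/or compiled patterns.
        candidates = list(location)
    for candidate in candidates:
        for index, name in enumerate(header):
            if isinstance(candidate, str):
                if name == candidate:  # assumption: exact match
                    return index
            elif candidate.search(name):  # assumption: search(), not fullmatch()
                return index
    return None


header = ["ID", "First Name", "e-mail"]
print(find_column(re.compile(r"e-?mail", re.IGNORECASE), header))
```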
-
class tabfilereader.CsvReader(source_file)
A Reader capable of reading CSV (Comma Separated Value) files.
Available options:
encoding
delimiter
doublequote
escapechar
quotechar
quoting
skipinitialspace
-
property column_map
A mapping describing the columns found in the file. The keys are the integer positions of the columns in the file, and the values are the names of the columns as defined in the Schema used by this Reader.
- Return type
Optional[Mapping[int, str]]
-
delimiter: str = ','
A one-character string used to separate fields. It defaults to a comma (,).
-
doublequote: bool = True
Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.
-
encoding: Optional[str] = None
The encoding to use when decoding the file. If not specified, the system default will be used.
-
escapechar: Optional[str] = None
Removes any special meaning from the character that follows it. It defaults to None, which disables escaping.
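The doublequote and escapechar options share their names with the formatting parameters of Python's csv module. Assuming CsvReader passes them through unchanged (the reference does not say so explicitly), the standard library alone can show the two quoting conventions:

```python
import csv
import io

# Embedded quotes are doubled: "She said ""hi"""
doubled = io.StringIO('id,comment\n1,"She said ""hi"""\n')
rows_doubled = list(csv.reader(doubled, doublequote=True))

# Embedded quotes are prefixed with a backslash: "She said \"hi\""
escaped = io.StringIO('id,comment\n1,"She said \\"hi\\""\n')
rows_escaped = list(csv.reader(escaped, doublequote=False, escapechar="\\"))

# Both conventions yield the same parsed field.
print(rows_doubled[1][1])
print(rows_escaped[1][1])
```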
-
get_record()
Retrieves the next record in the file.
Returns a tuple of two values:
The first value contains the contents of the columns in the record.
The second value is a collection of errors encountered while parsing the record.
- Return type
Tuple[RecordBase, RecordErrors]
-
classmethod open(source_file, schema, **options)
Creates a Reader object and opens the specified file with it.
- Parameters
source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.
schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.
options (Any) – Any of the options that this Reader allows.
- Return type
~ReaderType
-
quotechar: str = '"'
A one-character string used to quote fields that contain special characters, such as the delimiter or quotechar, or that contain new-line characters. It defaults to a double quote (").
-
quoting: int = 0
Controls when quotes should be recognised by the reader. It can take on any of the csv.QUOTE_* constants and defaults to csv.QUOTE_MINIMAL.
-
property records_read
The number of records that have been read from the file so far.
- Return type
int
-
skipinitialspace: bool = False
When True, whitespace immediately following the delimiter is ignored. The default is False.
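The effect of skipinitialspace is easiest to see on a file that puts a space after each comma. The sketch below uses the standard csv module, whose parameter of the same name is assumed to behave the same way as this option.

```python
import csv
import io

data = 'name, city\nAda, "London, UK"\n'

# Default: the leading space is kept, so the quote character is not at
# the start of the field and the quoted comma splits the value.
default_rows = list(csv.reader(io.StringIO(data)))

# skipinitialspace=True: the space is dropped and quoting works as intended.
trimmed_rows = list(csv.reader(io.StringIO(data), skipinitialspace=True))

print(len(default_rows[1]), trimmed_rows[1])
```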
-
class tabfilereader.DateTimeType(fmt=None)
A data type for Columns that coerces values to datetime.datetime.
By default, allows the following formats:
YYYY-MM-DDTHH:MM:SS
YYYY-MM-DDTHH:MM:SS+HHMM
YYYY-MM-DDTHH:MM:SS.FFFFFF
YYYY-MM-DDTHH:MM:SS.FFFFFF+HHMM
- Parameters
fmt (Union[str, Sequence[str], None]) – The Python strptime() format string, or sequence of format strings, used to parse values.
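One way to read the fmt parameter is "try each format in turn until one parses". The sketch below follows that reading; `parse_datetime` is a hypothetical name, and the strptime directives used for the default formats are my translation of the patterns listed above, not values taken from the library.

```python
from datetime import datetime
from typing import Optional, Sequence, Union

# Assumed strptime equivalents of the documented default formats.
DEFAULT_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S.%f%z",
)


def parse_datetime(value: str,
                   fmt: Optional[Union[str, Sequence[str]]] = None) -> datetime:
    """Try each candidate format; raise ValueError if none matches."""
    if fmt is None:
        formats = DEFAULT_FORMATS
    elif isinstance(fmt, str):
        formats = (fmt,)
    else:
        formats = tuple(fmt)
    for candidate in formats:
        try:
            return datetime.strptime(value, candidate)
        except ValueError:
            continue
    raise ValueError(f"{value!r} does not match any of {formats}")


print(parse_datetime("2021-03-04T05:06:07"))
```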
-
class tabfilereader.DateType(fmt=None)
A data type for Columns that coerces values to datetime.date.
By default, allows the following formats:
YYYY-MM-DD
- Parameters
fmt (Union[str, Sequence[str], None]) – The Python strptime() format string, or sequence of format strings, used to parse values.
-
class tabfilereader.DecimalType
A data type for Columns that coerces values to decimal.Decimal.
-
class tabfilereader.ExcelDateTimeType(fmt=None)
A data type for Columns that coerces values to datetime.datetime.
This is specifically aimed at handling Excel file oddities.
-
class tabfilereader.ExcelDateType(fmt=None)
A data type for Columns that coerces values to datetime.date.
This is specifically aimed at handling Excel file oddities.
-
class tabfilereader.ExcelReader(source_file)
A Reader capable of reading Excel files. Supports both XLS- and XLSX-formatted files.
Available options:
worksheet
encoding
-
property column_map
A mapping describing the columns found in the file. The keys are the integer positions of the columns in the file, and the values are the names of the columns as defined in the Schema used by this Reader.
- Return type
Optional[Mapping[int, str]]
-
encoding: Optional[str] = None
The encoding to use when the CODEPAGE information that the file should contain is missing or wrong.
-
get_record()
Retrieves the next record in the file.
Returns a tuple of two values:
The first value contains the contents of the columns in the record.
The second value is a collection of errors encountered while parsing the record.
- Return type
Tuple[RecordBase, RecordErrors]
-
classmethod open(source_file, schema, **options)
Creates a Reader object and opens the specified file with it.
- Parameters
source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.
schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.
options (Any) – Any of the options that this Reader allows.
- Return type
~ReaderType
-
property records_read
The number of records that have been read from the file so far.
- Return type
int
-
worksheet: Union[int, str, re.Pattern] = 0
Specifies which worksheet within the file should be read. Can be either the zero-based integer position of the worksheet, a string with the worksheet’s name, or a re.Pattern that will match the worksheet’s name. Defaults to 0 (the first worksheet in the file).
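The three accepted forms of worksheet can be illustrated against a plain list of sheet names. This is a sketch under assumptions: `pick_worksheet` is a hypothetical helper, names are compared exactly, patterns use `search()`, and the first match wins. None of these details are confirmed by the reference.

```python
import re
from typing import Sequence, Union


def pick_worksheet(worksheet: Union[int, str, re.Pattern],
                   sheet_names: Sequence[str]) -> str:
    """Return the name of the sheet that worksheet selects."""
    if isinstance(worksheet, int):
        return sheet_names[worksheet]  # zero-based position
    for name in sheet_names:
        if isinstance(worksheet, str):
            if name == worksheet:
                return name
        elif worksheet.search(name):
            return name
    raise LookupError(f"no worksheet matches {worksheet!r}")


names = ["Summary", "Data 2023", "Notes"]
print(pick_worksheet(re.compile(r"^Data \d{4}$"), names))
```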
-
class tabfilereader.ExcelTimeType(fmt=None)
A data type for Columns that coerces values to datetime.time.
This is specifically aimed at handling Excel file oddities.
-
class tabfilereader.FloatType
A data type for Columns that coerces values to float.
-
class tabfilereader.IntegerType
A data type for Columns that coerces values to int.
-
class tabfilereader.JsonArrayType
A data type for Columns that only allows JSON-encoded arrays and coerces them to list.
-
class tabfilereader.JsonObjectType
A data type for Columns that only allows JSON-encoded objects and coerces them to dict.
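The difference between the array and object types is the shape of the decoded value. A minimal sketch of that check, using hypothetical function names and the standard json module rather than the library's own code:

```python
import json


def coerce_json_array(value: str) -> list:
    """Hypothetical stand-in for a JsonArrayType-style callable."""
    parsed = json.loads(value)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(parsed, list):
        raise ValueError(f"expected a JSON array, got {type(parsed).__name__}")
    return parsed


def coerce_json_object(value: str) -> dict:
    """Hypothetical stand-in for a JsonObjectType-style callable."""
    parsed = json.loads(value)
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed


print(coerce_json_array("[1, 2, 3]"), coerce_json_object('{"a": 1}'))
```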
-
class tabfilereader.JsonType
A data type for Columns that allows JSON-encoded values.
-
class tabfilereader.OdsReader(source_file)
A Reader capable of reading OpenDocument Format spreadsheet files.
Available options:
worksheet
-
property column_map
A mapping describing the columns found in the file. The keys are the integer positions of the columns in the file, and the values are the names of the columns as defined in the Schema used by this Reader.
- Return type
Optional[Mapping[int, str]]
-
get_record()
Retrieves the next record in the file.
Returns a tuple of two values:
The first value contains the contents of the columns in the record.
The second value is a collection of errors encountered while parsing the record.
- Return type
Tuple[RecordBase, RecordErrors]
-
classmethod open(source_file, schema, **options)
Creates a Reader object and opens the specified file with it.
- Parameters
source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.
schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.
options (Any) – Any of the options that this Reader allows.
- Return type
~ReaderType
-
property records_read
The number of records that have been read from the file so far.
- Return type
int
-
worksheet: Union[int, str, re.Pattern] = 0
Specifies which worksheet within the file should be read. Can be either the zero-based integer position of the worksheet, a string with the worksheet’s name, or a re.Pattern that will match the worksheet’s name. Defaults to 0 (the first worksheet in the file).
-
class tabfilereader.Reader(source_file)
The abstract base class for tabular file readers.
- Parameters
source_file (Union[IOBase, Path, str]) – The tabular file to read and parse. The file can be specified in several ways:
  str - A string specifying a path to a file.
  pathlib.Path - A Path object specifying a path to a file.
  io.IOBase - An open IOBase object containing the file’s contents.
-
property column_map
A mapping describing the columns found in the file. The keys are the integer positions of the columns in the file, and the values are the names of the columns as defined in the Schema used by this Reader.
- Return type
Optional[Mapping[int, str]]
-
get_record()
Retrieves the next record in the file.
Returns a tuple of two values:
The first value contains the contents of the columns in the record.
The second value is a collection of errors encountered while parsing the record.
- Return type
Tuple[RecordBase, RecordErrors]
-
classmethod open(source_file, schema, **options)
Creates a Reader object and opens the specified file with it.
- Parameters
source_file (Union[IOBase, Path, str]) – The tabular file to read and parse.
schema (Type[Schema]) – The Schema class that defines the structure of the data to expect in the file.
options (Any) – Any of the options that this Reader allows.
- Return type
~ReaderType
-
property records_read
The number of records that have been read from the file so far.
- Return type
int
-
schema: Type[tabfilereader.schema.Schema]
The schema definition that will be used when reading the file.
-
class tabfilereader.RecordBase(**kwargs)
The base class for records that are read by tabfilereader.
The column values can be accessed on this class as attributes (e.g., record.my_column) or item lookups (e.g., record['my_column']).
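The dual access style can be pictured with a small stand-in class. `DemoRecord` is hypothetical and only mirrors the behaviour described above; it says nothing about how RecordBase is actually implemented.

```python
class DemoRecord:
    """Hypothetical stand-in that supports attribute and item access."""

    def __init__(self, **kwargs):
        # Store each column value as an instance attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __getitem__(self, key):
        # Item lookup delegates to the same attributes.
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


record = DemoRecord(my_column="value", amount=3)
print(record.my_column, record["amount"])
```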
-
class tabfilereader.RecordErrors
A collection of the errors encountered while reading a record from a file.
This class can be accessed like a dict, where the keys are the column names.
-
add(column, error)
Adds an error to the collection.
- Parameters
column (str) – The column where the error was encountered.
error (Union[Exception, str]) – The error that occurred.
- Return type
None
-
-
class tabfilereader.Schema
The base class for defining the structure and behavior of data to be read from a file.
This class should be subclassed in your code. The properties defined on the subclass should be instances of the Column class.
Subclasses accept the following class keyword arguments:
ignore_unknown_columns: Whether or not columns found in the file that do not correspond to columns defined on this class will be ignored. Defaults to False.
ignore_empty_records: Whether or not completely empty records in a file are ignored by tabfilereader. Defaults to False.
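"Class keyword arguments" are the extra keywords given in the class statement itself. The sketch below shows how such keywords can be received in plain Python via `__init_subclass__`. `DemoColumn`, `DemoSchema`, and the `columns` attribute are hypothetical stand-ins, and whether tabfilereader uses `__init_subclass__` or a metaclass is not stated in this reference.

```python
class DemoColumn:
    """Hypothetical stand-in for Column."""

    def __init__(self, location, required=True):
        self.location = location
        self.required = required


class DemoSchema:
    """Hypothetical stand-in for Schema that accepts class keyword arguments."""

    ignore_unknown_columns = False
    ignore_empty_records = False
    columns = {}

    def __init_subclass__(cls, ignore_unknown_columns=False,
                          ignore_empty_records=False, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.ignore_unknown_columns = ignore_unknown_columns
        cls.ignore_empty_records = ignore_empty_records
        # Collect the column definitions declared on the subclass body.
        cls.columns = {name: value for name, value in vars(cls).items()
                       if isinstance(value, DemoColumn)}


class PeopleSchema(DemoSchema, ignore_unknown_columns=True):
    name = DemoColumn("Name")
    email = DemoColumn("Email", required=False)


print(PeopleSchema.ignore_unknown_columns, sorted(PeopleSchema.columns))
```

With the real classes, the equivalent declaration would presumably read `class PeopleSchema(Schema, ignore_unknown_columns=True):` with Column instances as the class attributes.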
-
class tabfilereader.StringType
A data type for Columns that coerces values to str.
-
exception tabfilereader.TabFileReaderError
An exception raised by tabfilereader.
-
with_traceback()
Inherited from Exception. Exception.with_traceback(tb) sets self.__traceback__ to tb and returns self.
-
-
class tabfilereader.TimeType(fmt=None)
A data type for Columns that coerces values to datetime.time.
By default, allows the following formats:
HH:MM
HH:MM:SS
HH:MM:SS.FFFFFF
- Parameters
fmt (Union[str, Sequence[str], None]) – The Python strptime() format string, or sequence of format strings, used to parse values.