*******
Welcome
*******

.. image:: https://img.shields.io/pypi/v/tabfilereader.svg
   :target: https://pypi.python.org/pypi/tabfilereader
.. image:: https://img.shields.io/pypi/l/tabfilereader.svg
   :target: https://pypi.python.org/pypi/tabfilereader
.. image:: https://github.com/jayclassless/tabfilereader/workflows/Test/badge.svg
   :target: https://github.com/jayclassless/tabfilereader/actions
.. image:: https://github.com/jayclassless/tabfilereader/workflows/Docs/badge.svg
   :target: https://jayclassless.github.io/tabfilereader/


Overview
========

``tabfilereader`` is a small library that makes reading flat, tabular data from files a bit less tedious.

At its most basic, using ``tabfilereader`` means defining a Schema, then using it to open a Reader. You can then iterate through the Reader to retrieve records from the file.

>>> import tabfilereader as tfr
>>> class MySchema(tfr.Schema):
...     column1 = tfr.Column('column_1')
...     column2 = tfr.Column('column_2', data_type=tfr.IntegerType(), data_required=True)
>>> reader = tfr.CsvReader.open('test/data/simple_header.csv', MySchema)
>>> for record, errors in reader:
...     print(record)
Record(column1='foo', column2=123)
Record(column1='bar', column2=None)


Schemas
=======

Schema classes tell ``tabfilereader`` what columns to expect in the file, and what data types the values in them should be cast to.

You create a schema by defining a class that inherits from ``tabfilereader.Schema``. In this class, you define attributes that are instances of ``tabfilereader.Column``, which specify where the columns are in the file and what their data types are. For example::

    >>> import re
    >>> class ExampleSchema(tfr.Schema):
    ...     first = tfr.Column('First Name')
    ...     last = tfr.Column('Last Name', data_required=True)
    ...     birthdate = tfr.Column(re.compile(r'^Birth.*'), data_type=tfr.DateType())
    ...     weight = tfr.Column('Weight', data_type=tfr.FloatType(), required=False)

Columns require at least one argument that tells ``tabfilereader`` how to find the column in the file. For files where the first record contains column names, you can specify any of the following:

* The exact name of the column as a string.
* An ``re.Pattern`` that will match the column name.
* A sequence of strings or ``re.Pattern`` objects, any of which the column could be named.

For files that do not contain a header record, you specify the column's location with a zero-based integer index.

Columns also accept several optional parameters:

``required``
    Whether this column must exist in the file. Defaults to ``True``.

``data_required``
    Whether the column must have a value in every record of the file. Defaults to ``False``.

``data_type``
    A ``callable`` that receives a string value from the file and returns a parsed, properly typed value. If the value is invalid, the callable should raise a ``ValueError``. ``tabfilereader`` provides an array of predefined Types that you can use here for the most common data types (numbers, dates, strings, etc.). See the API documentation for all the available predefined Types. This parameter defaults to ``tabfilereader.StringType()`` if not specified.

There are also a handful of optional parameters that can be declared on the Schema itself:

``ignore_unknown_columns``
    Whether a Reader should ignore columns it finds in the file that are not declared in the Schema. Defaults to ``False``, which means the Reader will raise an exception.

``ignore_empty_records``
    Whether a Reader should ignore records that contain no columns whatsoever. Defaults to ``False``, which means the Reader will return a record that is full of errors. This option is particularly useful for CSV files when people are a bit sloppy with their newlines at the end of a file.

To set these Schema-level options, pass them as keyword arguments in the class declaration::

    >>> class SchemaWithOptions(tfr.Schema, ignore_unknown_columns=True):
    ...     column1 = tfr.Column('column_1')


Readers
=======

Readers use Schemas to interpret the contents of tabular files. ``tabfilereader`` provides the following Readers to handle various types of files:

``CsvReader``
    Handles comma-separated value (CSV) files, or similarly constructed files such as TSV.

``ExcelReader``
    Handles Excel spreadsheets in either XLS or XLSX format.

``OdsReader``
    Handles OpenDocument Format spreadsheets.

You can create a Reader either by calling the ``open()`` classmethod on the Reader class you want to use, or by defining your own Reader class that inherits from one provided by ``tabfilereader``, like so::

    >>> class MyReader(tfr.CsvReader):
    ...     schema = MySchema
    ...     delimiter = '|'
    >>> reader = MyReader('test/data/simple_header_pipe.csv')

Each Reader accepts a variety of optional parameters (like ``delimiter`` in the example above). See the API documentation for a full listing of the options for each.

Readers are iterable. Each iteration yields a tuple of two values. The first is a Record that contains the values from the file. The second is a collection of all the errors encountered while parsing the values in the columns. Values can be read from a Record either as attributes or by key, and the errors collection is keyed by the Schema's attribute names:

>>> record, errors = next(reader)
>>> record.column1
'foo'
>>> record['column2']
123
>>> bool(errors)
False
>>> record, errors = next(reader)
>>> record.column1
'bar'
>>> record['column2'] is None
True
>>> bool(errors)
True
>>> errors['column2']
'A value is required'


License
=======

This project is released under the terms of the `MIT License`_.

.. _MIT License: https://opensource.org/licenses/MIT
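
The column-matching rules from the Schemas section can be illustrated with a small, self-contained sketch. This is not ``tabfilereader``'s own implementation: the ``find_column`` and ``_matches`` helpers are hypothetical, and the sketch assumes that an ``re.Pattern`` matches when ``search()`` finds it in the header name, and that the first header column matching any of the alternatives wins.

.. code-block:: python

    import re
    from typing import Optional, Sequence, Union

    # Hypothetical alias for the column specifiers listed in the Schemas section.
    Spec = Union[str, re.Pattern, Sequence[Union[str, re.Pattern]], int]


    def _matches(candidate: Union[str, re.Pattern], name: str) -> bool:
        """Return True if one string or compiled pattern matches a header name."""
        if isinstance(candidate, re.Pattern):
            # Assumption: a pattern matches if search() finds it in the name.
            return candidate.search(name) is not None
        return candidate == name


    def find_column(spec: Spec, header: Optional[Sequence[str]] = None) -> Optional[int]:
        """Return the zero-based index of the column described by ``spec``.

        Hypothetical helper: an int is a position in a header-less file;
        anything else is matched against the header record.
        """
        if isinstance(spec, int):
            return spec
        if header is None:
            raise ValueError('name-based specifiers need a header record')
        # A lone string or pattern becomes a one-element list of alternatives.
        alternatives = [spec] if isinstance(spec, (str, re.Pattern)) else list(spec)
        for index, name in enumerate(header):
            if any(_matches(alt, name) for alt in alternatives):
                return index
        return None  # not found; a real Reader would presumably consult ``required``

For instance, with a header of ``['First Name', 'Last Name', 'Birthdate', 'Weight']``, ``find_column(re.compile(r'^Birth.*'), header)`` returns ``2`` and ``find_column(['Mass', 'Weight'], header)`` returns ``3``.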
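
Because the ``data_type`` parameter described in the Schemas section accepts any callable that takes a string and raises ``ValueError`` on bad input, you are not limited to the predefined Types. The ``parse_yes_no`` function below is a hypothetical example, not part of ``tabfilereader``, that turns common yes/no spellings into booleans.

.. code-block:: python

    # Hypothetical custom converter for a Column's ``data_type`` parameter.
    _TRUTHY = {'y', 'yes', 'true', '1'}
    _FALSY = {'n', 'no', 'false', '0'}


    def parse_yes_no(value: str) -> bool:
        """Convert a yes/no style string from the file into a bool.

        Raises ValueError for anything unrecognized, which is how a
        ``data_type`` callable signals an invalid value.
        """
        normalized = value.strip().lower()
        if normalized in _TRUTHY:
            return True
        if normalized in _FALSY:
            return False
        raise ValueError(f'not a yes/no value: {value!r}')

A Schema could then declare something like ``active = tfr.Column('Active', data_type=parse_yes_no)``, where the ``Active`` column name is made up for illustration. If empty cells can reach the converter in your files, note that an empty string raises ``ValueError`` here.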
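
Since every iteration of a Reader yields a ``(record, errors)`` pair, a common pattern is to keep the clean records and report the rest. The ``split_records`` helper below is a hypothetical sketch, not part of ``tabfilereader``; it works with any iterable of such pairs and relies only on an empty errors collection being falsy, as the ``bool(errors)`` example in the Readers section shows.

.. code-block:: python

    from typing import Any, Iterable, List, Tuple


    def split_records(
        pairs: Iterable[Tuple[Any, Any]],
    ) -> Tuple[List[Any], List[Tuple[int, Any]]]:
        """Separate clean records from records that had parsing errors.

        ``pairs`` is any iterable of (record, errors) tuples, such as a Reader.
        Returns (good_records, problems), where each problem is a
        (position, errors) tuple and position counts from 1.
        """
        good: List[Any] = []
        problems: List[Tuple[int, Any]] = []
        for position, (record, errors) in enumerate(pairs, start=1):
            if errors:  # an empty errors collection is falsy
                problems.append((position, errors))
            else:
                good.append(record)
        return good, problems

With a Reader this would be called as ``good, problems = split_records(reader)``. Note that ``position`` counts yielded records, so it may differ from the file's line numbers (for example, because of a header record or ignored empty records).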