Field Processor

A Field Processor transforms the values in a row, where the row is represented by a dictionary of column->value pairs. Use it when the transformation of a field depends only on that field's own value and not on the values of other fields in the row.

Use a Row Processor instead when a transformation needs the values of other fields within the same row.
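The idea of a field-level transformation can be illustrated without csvio at all. The sketch below is plain Python and is not the library's implementation; the helper name apply_field_functions and the sample row are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict


def apply_field_functions(
    row: Dict[str, Any],
    field_funcs: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Return a copy of row with each listed field passed through its function."""
    result = dict(row)  # shallow copy, so the input row is left untouched
    for fieldname, func in field_funcs.items():
        if fieldname in result:
            # Each function sees only the value of its own field.
            result[fieldname] = func(result[fieldname])
    return result


sample_row = {"Fruit": "Apple", "Quantity": "1"}
converted = apply_field_functions(sample_row, {"Quantity": int})
print(converted)
```

Because each function receives only one value, a transformation that needs two fields at once (for example a total computed from a quantity and a price) cannot be written this way, which is the case Row Processors cover.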

In csvio, a CSV file is represented by a list of dictionaries, which is stored in the rows attribute of the CSVReader or CSVWriter classes.

Once instantiated, a Field Processor object can be used on its own to process any dictionary that represents a row, or it can be passed to the constructor of CSVReader or CSVWriter.

When a Field Processor object is passed to the constructor of CSVReader, it is applied to the rows of the CSVReader as soon as they are read from the CSV file. See example code for further details.

Similarly, when a Field Processor object is passed to the constructor of CSVWriter, it is applied to the rows of the CSVWriter as soon as they are added for writing to the output CSV using its add_rows() method. See example code for further details.
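To make "applied as soon as they are read" concrete, here is a minimal sketch that uses only the standard library csv module rather than CSVReader. The function read_rows, the CSV_TEXT sample, and the shape of the field_funcs mapping are all assumptions for illustration; they are not part of csvio.

```python
import csv
import io
from typing import Any, Callable, Dict, List

CSV_TEXT = (
    "Supplier,Fruit,Origin,Quantity\n"
    "Big Apples,Apple,Spain,1\n"
    "Big Melons,Melons,Italy,2\n"
)


def read_rows(
    text: str,
    field_funcs: Dict[str, List[Callable[[Any], Any]]],
) -> List[Dict[str, Any]]:
    """Parse CSV text and transform the listed fields of each row as it is read."""
    rows = []
    for row in csv.DictReader(io.StringIO(text, newline="")):
        for fieldname, funcs in field_funcs.items():
            for func in funcs:  # functions run in the order they are listed
                row[fieldname] = func(row[fieldname])
        rows.append(row)
    return rows


loaded = read_rows(CSV_TEXT, {"Quantity": [int, lambda x: x + 1]})
print(loaded)
```

The rows collected in loaded already hold the transformed values, so no separate processing pass is needed after reading.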

Field Processor Standalone use

Processor function definitions

def add1(x):
    return x + 1

def cast_to_int(x):
    return int(x)

def replace_big_huge(x):
    return x.replace("Big", "Huge")

Field processors and sample rows

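from json import dumps

from csvio.processors import FieldProcessor


def pretty_print(title, data):
    # Helper assumed by the examples below: print a title,
    # then the data as indented JSON, then a blank line.
    print(title)
    print(dumps(data, indent=4))
    print()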

row1 = {
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}

row2 = {
    "Supplier": "Big Melons",
    "Fruit": "Melons",
    "Origin": "Italy",
    "Quantity": "2"
}

row3 = {
    "Supplier": "Long Mangoes",
    "Fruit": "Mango",
    "Origin": "India",
    "Quantity": "3"
}

rows = [row1, row2, row3]

proc1 = FieldProcessor('increment_qty')
proc1.add_processor("Quantity", cast_to_int)
proc1.add_processor("Quantity", add1)

proc2 = FieldProcessor('replace')
proc2.add_processor("Supplier", replace_big_huge)

Using implicit processor object

If no processor object or handle is passed to process_row(), the processor functions registered on the object whose process_row() method is called are applied.

print("Using implicit processor object:")
pretty_print("Before:", row1)
pretty_print("After:", proc1.process_row(row1)) # Using implicit processor object

Output

Using implicit processor object:
Before:
{
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}

After:
{
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": 2
}

Using processor handle

Any processor object can apply the processors of another object by using that object's handle, as shown below. Here the handle 'replace' belongs to the proc2 object, but the proc1 object is the one that applies the processors.

print("Using processor handle:")
pretty_print("Before:", rows)
pretty_print("After:", proc1.process_rows(rows, 'replace')) # Using processor handle

Output

Using processor handle:
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

After:
[
    {
        "Supplier": "Huge Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Huge Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]
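One way such handle-based lookup could work is a registry shared by all processor objects and keyed by handle. The sketch below is a hypothetical stand-in written for illustration (the class MiniFieldProcessor and its _registry attribute are assumptions), and it does not claim to reflect how csvio stores its processors.

```python
from typing import Any, Callable, Dict, List, Optional


class MiniFieldProcessor:
    """Hypothetical stand-in for illustration; not the csvio class."""

    # Shared by every instance: handle -> fieldname -> list of functions.
    _registry: Dict[str, Dict[str, List[Callable[[Any], Any]]]] = {}

    def __init__(self, handle: str) -> None:
        self.handle = handle
        self._registry.setdefault(handle, {})

    def add_processor(
        self, fieldname: str, func: Callable[[Any], Any], handle: Optional[str] = None
    ) -> None:
        target = self._registry.setdefault(handle or self.handle, {})
        target.setdefault(fieldname, []).append(func)

    def process_row(
        self, row: Dict[str, Any], processor_handle: Optional[str] = None
    ) -> Dict[str, Any]:
        # Fall back to this object's own handle when none is given.
        funcs_by_field = self._registry[processor_handle or self.handle]
        result = dict(row)
        for fieldname, funcs in funcs_by_field.items():
            for func in funcs:
                result[fieldname] = func(result[fieldname])
        return result


first = MiniFieldProcessor("increment_qty")
first.add_processor("Quantity", int)
second = MiniFieldProcessor("replace")
second.add_processor("Supplier", lambda s: s.replace("Big", "Huge"))

row = {"Supplier": "Big Apples", "Quantity": "1"}
via_handle = first.process_row(row, "replace")  # uses the functions of second
via_own = first.process_row(row)                # uses the functions of first
```

With a shared registry, the object on which process_row() is called matters only when no handle is supplied.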

Using explicit processor object

Similarly, another processor object can be passed instead of a handle.

print("Using explicit processor object:")
pretty_print("Before:", rows)
pretty_print("After:", proc1.process_rows(rows, proc2)) # Using explicit processor object

Output

Using explicit processor object:
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

After:
[
    {
        "Supplier": "Huge Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Huge Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

class csvio.processors.field_processor.FieldProcessor(handle: str)

Bases: csvio.processors.processor_base.ProcessorBase

Parameters

handle (required) – Reference handle for the field processor

add_processor(fieldname: str, func_: Union[List[Callable[[str], Any]], Callable[[str], Any]], handle: Optional[str] = None) → None

Add a processor function to process fields in a row.

The processor function is a callback that accepts a single argument: the value of the field named by the fieldname argument in the row being processed.

The callback should return a single value, which becomes the new value of that field once all the required transformations are applied.

Parameters
  • fieldname (required) – Name of the field upon which the processor should be executed.

  • func_ (required) – Field processor callback function reference or a list of such function references. All function references added with the same handle are executed for the field, transforming its value in the same order as they were added.

  • handle (optional) – Processor reference handle to which the processor will be added. If not provided, the handle of the current object will be used.

Returns

None

See example code for using with CSVReader

See example code for using with CSVWriter
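Since add_processor() accepts either one callback or a list of callbacks, a caller-side view of that argument can be sketched as follows. This is plain Python written under assumptions: normalise_funcs and run_chain are hypothetical helper names and are not csvio functions.

```python
from typing import Any, Callable, List, Union

FieldFunc = Callable[[Any], Any]


def normalise_funcs(func_: Union[FieldFunc, List[FieldFunc]]) -> List[FieldFunc]:
    """Wrap a single callback in a list; copy a list of callbacks as is."""
    return list(func_) if isinstance(func_, list) else [func_]


def run_chain(value: Any, funcs: List[FieldFunc]) -> Any:
    """Feed value through each callback in turn and return the final value."""
    for func in funcs:
        value = func(value)
    return value


as_list = normalise_funcs([int, lambda x: x + 1])
as_single = normalise_funcs(str.strip)

result_list = run_chain("1", as_list)
result_single = run_chain("  7 ", as_single)
```

Treating both forms as a list keeps the execution path the same whether one function or several were supplied.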

process_row(row: Dict[str, Any], processor_handle: Optional[Union[Type[csvio.processors.processor_base.ProcessorBase], str]] = None) → Dict[str, Any]

Process a single row

This applies the processor functions registered with add_processor(), in the same order in which they were added.

The row produced by one processor function is passed on to the next processor function that was added, and the row produced by the last processor function is returned as the final output of this method.

Parameters
  • row (required) – A single dictionary of fieldname->value pairs representing a single row

  • processor_handle (optional) – A processor handle or an object that references the processor functions to apply and transform the row values. The processor functions of the current object are used if this argument is not provided.

Returns

A dictionary representing a processed CSV row
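Because each function receives the output of the previous one, the order in which functions are added matters. The sketch below shows this with functools.reduce in plain Python; chain_field is a hypothetical helper used for illustration and is not how process_row() is known to be implemented.

```python
from functools import reduce
from typing import Any, Callable, Dict, List


def chain_field(
    row: Dict[str, Any],
    fieldname: str,
    funcs: List[Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Return a copy of row with funcs applied left to right to one field."""
    result = dict(row)
    result[fieldname] = reduce(lambda value, func: func(value), funcs, row[fieldname])
    return result


row = {"Fruit": "Apple", "Quantity": "1"}

# Cast first, then add: the string "1" becomes the integer 2.
in_order = chain_field(row, "Quantity", [int, lambda x: x + 1])

# Add first, then cast: adding 1 to the string "1" fails before int() runs.
try:
    chain_field(row, "Quantity", [lambda x: x + 1, int])
    wrong_order_error = None
except TypeError as exc:
    wrong_order_error = exc
```

This mirrors the earlier example, where cast_to_int is added before add1 for the Quantity field.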

process_rows(rows: List[Dict[str, Any]], processor_handle: Optional[str] = None) → List[Dict[str, Any]]

Process a list of rows

Parameters
  • rows (required) – A list of dictionaries of fieldname->value pairs representing a list of rows

  • processor_handle (optional) – A processor handle or an object that references the processor functions to apply and transform the row values. The processor functions of the current object are used if this argument is not provided.

Returns

A list of dictionaries representing processed CSV rows
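Conceptually, processing a list of rows amounts to processing each row in turn. In the sample output earlier, the "Before" rows are identical across successive calls, which suggests the input rows are not modified. The sketch below reproduces that behaviour by copying each row first; the copy step and the names process_rows_sketch and to_huge are assumptions for illustration, not documented csvio internals.

```python
import copy
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def process_rows_sketch(rows: List[Row], process_row: Callable[[Row], Row]) -> List[Row]:
    """Apply process_row to a deep copy of every row and collect the results."""
    return [process_row(copy.deepcopy(row)) for row in rows]


def to_huge(row: Row) -> Row:
    # Modifies the row it is given; safe here because it receives a copy.
    row["Supplier"] = row["Supplier"].replace("Big", "Huge")
    return row


originals = [
    {"Supplier": "Big Apples", "Quantity": "1"},
    {"Supplier": "Long Mangoes", "Quantity": "3"},
]
processed = process_rows_sketch(originals, to_huge)
```

Here processed holds the transformed rows while originals keeps its initial values, so the same input list can be processed again with a different set of functions.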