Field Processor
A Field Processor transforms the values in a row, where the row is represented
by a dictionary of column->value pairs.
In csvio, a CSV file is represented by a list of dictionaries that is stored in
the rows attribute of the CSVReader or CSVWriter classes.
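To make the list-of-dictionaries representation concrete, here is a minimal sketch that builds the same shape with the standard library's csv.DictReader instead of csvio. The CSV text is made up for illustration. Note that every value arrives as a string, which is why a processor such as cast_to_int in the examples further down is useful.

```python
import csv
import io

# Hypothetical CSV content, used only for illustration.
csv_text = "Supplier,Fruit,Quantity\nBig Apples,Apple,1\nBig Melons,Melons,2\n"

# csv.DictReader yields one dictionary per data row, keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row)
```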
Once instantiated, a Field Processor object can be used on its own to process an
arbitrary dictionary that represents a row, or it can be passed to the
constructor of CSVReader or CSVWriter.
When a Field Processor object is passed to the constructor of CSVReader, it is
applied to the rows of the CSVReader as soon as they are read from the CSV file.
See example code for further details.
Similarly, when a Field Processor object is passed to the constructor of
CSVWriter, it is applied to the rows of the CSVWriter as soon as they are added
for writing to the output CSV using its add_rows() method.
See example code for further details.
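The idea of applying processors "as soon as rows are read" can be sketched without csvio. The read_rows function below is a hypothetical stand-in built on the standard library and is not how CSVReader is implemented; it only illustrates the timing, with each row transformed before it is stored. The same idea would apply on the writer side at the moment rows are added.

```python
import csv
import io

def read_rows(text, field_funcs):
    """Parse CSV text, transforming each row as soon as it is read.

    field_funcs maps a field name to a single function (an assumption
    made to keep this sketch short).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for field, func in field_funcs.items():
            row[field] = func(row[field])
        rows.append(row)
    return rows

# Hypothetical input: Quantity is converted to int while reading.
rows = read_rows("Fruit,Quantity\nApple,1\nMango,3\n", {"Quantity": int})
print(rows)
```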
Standalone Example Usage
Processor function definitions
def add1(x):
    return x + 1

def cast_to_int(x):
    return int(x)

def replace_big_huge(x):
    return x.replace("Big", "Huge")
Field processors and sample rows
from csvio.processors import FieldProcessor
from json import dumps
row1 = {
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}
row2 = {
    "Supplier": "Big Melons",
    "Fruit": "Melons",
    "Origin": "Italy",
    "Quantity": "2"
}
row3 = {
    "Supplier": "Long Mangoes",
    "Fruit": "Mango",
    "Origin": "India",
    "Quantity": "3"
}
rows = [row1, row2, row3]
proc1 = FieldProcessor('increment_qty')
proc1.add_processor("Quantity", cast_to_int)
proc1.add_processor("Quantity", add1)
proc2 = FieldProcessor('replace')
proc2.add_processor("Supplier", replace_big_huge)
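The examples that follow call pretty_print, which is not defined anywhere in this section. Given the json.dumps import above, a plausible definition is sketched here; the exact signature and the indent width are assumptions, not taken from csvio.

```python
from json import dumps

def pretty_print(title, data):
    """Print a title line, then the data as indented JSON (assumed helper)."""
    print(title)
    print(dumps(data, indent=4))

pretty_print("Before:", {"Quantity": "1"})
```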
Using implicit processor object
If no processor object or handle is passed to the process_row method, the
processor functions of the object on which process_row is called are used
implicitly.
print("Using implicit processor object:")
pretty_print("Before:", row1)
pretty_print("After:", proc1.process_row(row1)) # Using implicit processor object
Output
Using implicit processor object:
Before:
{
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}
After:
{
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": 2
}
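The change of Quantity from the string "1" to the integer 2 depends on the order in which the two functions were added: cast_to_int runs first, then add1. The plain-Python sketch below reproduces that chaining outside csvio; apply_in_order is a hypothetical helper written for this illustration.

```python
def cast_to_int(x):
    return int(x)

def add1(x):
    return x + 1

def apply_in_order(value, funcs):
    # Feed the output of each function into the next one.
    for func in funcs:
        value = func(value)
    return value

result = apply_in_order("1", [cast_to_int, add1])
print(result)
```

Reversing the order would call add1 on the string "1" and raise a TypeError, so the functions for a field should be added in the order they are meant to run.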
Using processor handle
Any processor object can apply the processors of another object by using that
object's handle, as shown below. Here the handle 'replace' belongs to the proc2
object, but the proc1 object is used to apply its processors.
print("Using processor handle:")
pretty_print("Before:", rows)
pretty_print("After:", proc1.process_rows(rows, 'replace')) # Using processor handle
Output
Using processor handle:
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]
After:
[
    {
        "Supplier": "Huge Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Huge Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]
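One way to picture how proc1 can apply functions registered on proc2 is a registry keyed by handle and shared by all processor objects. The toy class below is only a model built on that assumption; ToyProcessor and its internals are hypothetical and may differ from how csvio actually stores processors.

```python
class ToyProcessor:
    # Class-level registry shared by every instance:
    # handle -> {fieldname: [functions]}
    _registry = {}

    def __init__(self, handle):
        self.handle = handle
        self._registry.setdefault(handle, {})

    def add_processor(self, fieldname, func):
        self._registry[self.handle].setdefault(fieldname, []).append(func)

    def process_row(self, row, handle=None):
        # Fall back to this object's own handle when none is given.
        funcs_by_field = self._registry[handle or self.handle]
        out = dict(row)  # copy so the input row is left unchanged
        for field, funcs in funcs_by_field.items():
            for func in funcs:
                out[field] = func(out[field])
        return out

p1 = ToyProcessor("increment_qty")
p1.add_processor("Quantity", int)
p2 = ToyProcessor("replace")
p2.add_processor("Supplier", lambda s: s.replace("Big", "Huge"))

row = {"Supplier": "Big Apples", "Quantity": "1"}
processed = p1.process_row(row, "replace")
print(processed)
```

As in the output above, only the Supplier field changes, because the functions registered under 'replace' are the ones applied, even though the call goes through the other object.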
Using explicit processor object
Similarly, another processor object can be passed instead of a handle.
print("Using explicit processor object:")
pretty_print("Before:", rows)
pretty_print("After:", proc1.process_rows(rows, proc2)) # Using explicit processor object
Output
Using explicit processor object:
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]
After:
[
    {
        "Supplier": "Huge Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Huge Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]
- class csvio.processors.field_processor.FieldProcessor(handle: str)
Bases: csvio.processors.processor_base.ProcessorBase
- Parameters
handle (required) – Reference handle for the field processor
- add_processor(fieldname: str, func_: Union[List[Callable[[str], Any]], Callable[[str], Any]], handle: Optional[str] = None) None
Add a processor to process fields in a row.
- Parameters
fieldname (required) – Name of the field upon which the processor should be executed.
func_ (required) – A field processor function reference, or a list of such references. All functions added under the same handle are executed for the field, each transforming its value, in the order in which they were added.
handle (optional) – Processor reference handle to which the processor will be added. If not provided, the handle of the current object will be used.
- Returns
None
See example code for using with CSVReader
See example code for using with CSVWriter
- process_row(row: Dict[str, Any], processor_handle: Optional[Union[Type[csvio.processors.processor_base.ProcessorBase], str]] = None) Dict[str, Any]
Process a single row
- Parameters
row (required) – A single dictionary of fieldname->value pairs representing a single row
processor_handle (optional) – A processor handle or an object that references the processor functions to apply and transform the row values. The processor functions of the current object are used if this argument is not provided.
- Returns
A dictionary representing a processed CSV row
- process_rows(rows: List[Dict[str, Any]], processor_handle: Optional[str] = None) List[Dict[str, Any]]
Process a list of rows
- Parameters
rows (required) – A list of dictionaries of fieldname->value pairs representing a list of rows
processor_handle (optional) – A processor handle or an object that references the processor functions to apply and transform the row values. The processor functions of the current object are used if this argument is not provided.
- Returns
A list of dictionaries representing processed CSV rows