Row Processor
A Row Processor transforms the values of a row, which is represented by a dictionary of column->value pairs. It is used in situations where you need to transform the values of particular fields in a row depending upon the values of some other fields within the same row.
In csvio, a CSV file is represented by a list of dictionaries that is populated in the rows attribute of the CSVReader or CSVWriter classes.
Once instantiated, a Row Processor Object can be used by itself to process an arbitrary dictionary that represents a row or can be passed to the constructors of CSVReader or CSVWriter.
In the case where a Row Processor Object is passed to the constructor of CSVReader, it is applied to the rows of the CSVReader as soon as they are read from the CSV file. See example code for further details.
Similarly, in the case where a Row Processor Object is passed to the constructor of CSVWriter, it is applied to the rows of the CSVWriter as soon as they are added for writing to the output CSV using its add_rows() method.
See example code for further details.
Row Processor Standalone use
Processor function definitions
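def update_row(row):
    # Append the origin to the supplier name
    row["Supplier"] = f"{row['Supplier']} ({row['Origin']})"
    # CSV values are read as strings, so cast before comparing
    row["Quantity"] = int(row["Quantity"])
    if row["Quantity"] > 2:
        row["Quantity"] += 1
    return row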
Row processor and sample rows
from csvio.processors import RowProcessor
from json import dumps
row1 = {
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}
row2 = {
    "Supplier": "Big Melons",
    "Fruit": "Melons",
    "Origin": "Italy",
    "Quantity": "2"
}
row3 = {
    "Supplier": "Long Mangoes",
    "Fruit": "Mango",
    "Origin": "India",
    "Quantity": "3"
}
rows = [row1, row2, row3]
rowproc = RowProcessor("rp1")
rowproc.add_processor(update_row)
processed_rows = rowproc.process_rows(rows)
print("Before:")
print(dumps(rows, indent=4))
print()
print("After:")
print(dumps(processed_rows, indent=4))
Output
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

After:
[
    {
        "Supplier": "Big Apples (Spain)",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": 1
    },
    {
        "Supplier": "Big Melons (Italy)",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": 2
    },
    {
        "Supplier": "Long Mangoes (India)",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": 4
    }
]
- class csvio.processors.row_processor.RowProcessor(handle: str)
Bases:
csvio.processors.processor_base.ProcessorBase
- Parameters
handle (required) – Reference handle for the row processor
- add_processor(func_: Union[List[Callable[[Dict[str, Any]], Any]], Callable[[Dict[str, Any]], Any]], handle: Optional[str] = None) None
Add a processor function to process rows.
The processor function reference is essentially a callback function that accepts a single argument that represents a row.
The fieldnames from a CSV can be used within this callback function as the keys to this single argument to access its values and perform the required transformations. The callback function should return a dictionary representing a row once all the required transformations are applied.
- Parameters
func_ (required) – Row processor callback function reference or a list of such function references. All function references added with the same handle are executed for each row, in the same order as they were added. A single processor function is sufficient to perform all the transformations for the rows in a CSV if its definition contains every required transformation operation.
handle (optional) – Processor reference handle to which the processor will be added. If not provided, the handle of the current object will be used.
- Returns
None
See example code for using with CSVReader
See example code for using with CSVWriter
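The ordering and handle behaviour described for add_processor() can be illustrated with a small stand-in. This is only a sketch under stated assumptions: the registry dictionary and the register() helper are hypothetical names and do not reflect how csvio stores its callbacks internally. It shows a single callback and a list of callbacks being collected under the same handle in the order they were added.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Union

Row = Dict[str, Any]
RowFunc = Callable[[Row], Row]

# Hypothetical stand-in, NOT csvio internals: handle -> ordered callbacks
registry: Dict[str, List[RowFunc]] = defaultdict(list)


def register(func_: Union[RowFunc, List[RowFunc]], handle: str) -> None:
    """Append one callback, or every callback in a list, under a handle."""
    funcs = func_ if isinstance(func_, list) else [func_]
    registry[handle].extend(funcs)


def cast_quantity(row: Row) -> Row:
    row["Quantity"] = int(row["Quantity"])
    return row


def tag_supplier(row: Row) -> Row:
    row["Supplier"] = f"{row['Supplier']} ({row['Origin']})"
    return row


register(cast_quantity, "rp1")   # a single callback
register([tag_supplier], "rp1")  # a list of callbacks, same handle

# Callbacks are kept in insertion order for the handle
order = [func.__name__ for func in registry["rp1"]]
print(order)
```

Splitting the work into several small callbacks keeps each transformation easy to test on its own, at the cost of having to add them in the right order.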
- process_row(row: Dict[str, Any], processor_handle: Optional[Union[Type[csvio.processors.processor_base.ProcessorBase], str]] = None) Dict[str, Any]
Process a single row.
This applies the processors defined using the add_processor() function in the same order that they were added. The output row of each processor function is passed on to the next processor function that was added using add_processor(), and the output of the last processor function added is returned as the final output of this function.
- Parameters
row (required) – A single dictionary of fieldname->value pairs representing a single row
processor_handle (optional) – A processor handle or an object that references the processor functions to apply and transform the row values. The processor functions of the current object are used if this argument is not provided.
- Returns
A dictionary representing a processed CSV row
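The chaining described for process_row(), where each processor function receives the output of the previous one, can be sketched in plain Python. The helper apply_in_order() and the two callbacks are hypothetical names used for illustration only; this is not the csvio implementation.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def cast_quantity(row: Row) -> Row:
    row["Quantity"] = int(row["Quantity"])
    return row


def bump_large_orders(row: Row) -> Row:
    if row["Quantity"] > 2:
        row["Quantity"] += 1
    return row


def apply_in_order(row: Row, funcs: List[Callable[[Row], Row]]) -> Row:
    """Feed the output of each callback into the next one (illustrative only)."""
    result = dict(row)  # shallow copy so the caller's row is left as it was
    for func in funcs:
        result = func(result)
    return result


row = {"Supplier": "Long Mangoes", "Fruit": "Mango", "Origin": "India", "Quantity": "3"}
processed = apply_in_order(row, [cast_quantity, bump_large_orders])
print(processed["Quantity"])  # 4
```

Order matters here: with the two callbacks reversed, bump_large_orders would compare the string "3" with the integer 2, which raises a TypeError in Python 3.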
- process_rows(rows: List[Dict[str, Any]], processor_handle: Optional[str] = None) List[Dict[str, Any]]
Process a list of rows.
- Parameters
rows (required) – A list of dictionaries of fieldname->value pairs representing a list of rows
processor_handle (optional) – A processor handle or an object that references the processor functions to apply and transform the row values. The processor functions of the current object are used if this argument is not provided.
- Returns
A list of dictionaries representing processed CSV rows
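Processing a list of rows amounts to applying the same processor functions to every row in turn. In the standalone example, the "Before" rows are printed after processing and still show their original values, which suggests the input rows are left unmodified. The sketch below reproduces that effect by copying each row first; the copy and the process_all() helper are assumptions made for illustration, not a description of csvio's code.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def update_row(row: Row) -> Row:
    row["Supplier"] = f"{row['Supplier']} ({row['Origin']})"
    row["Quantity"] = int(row["Quantity"])
    if row["Quantity"] > 2:
        row["Quantity"] += 1
    return row


def process_all(rows: List[Row], func: Callable[[Row], Row]) -> List[Row]:
    """Apply func to a copy of every row (hypothetical helper)."""
    return [func(deepcopy(row)) for row in rows]


rows = [
    {"Supplier": "Big Apples", "Origin": "Spain", "Quantity": "1"},
    {"Supplier": "Long Mangoes", "Origin": "India", "Quantity": "3"},
]
processed = process_all(rows, update_row)

print(rows[1]["Quantity"])       # 3 (still a string in the input)
print(processed[1]["Quantity"])  # 4
```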