
csvio: Python Library for processing CSV files


csvio is a Python library that wraps Python’s built-in csv.DictReader and csv.DictWriter to make reading and writing CSV files easier.

Rows in a CSV file are represented and processed as a list of dictionaries, where each dictionary represents one row. The key-value pairs in each dictionary map each column name to the row's value for that column.
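As a rough illustration of this row representation, the sketch below uses only the standard library's csv.DictReader on an in-memory sample. The sample text and variable names are assumptions made for illustration and are not part of csvio.

```python
import csv
import io

# Hypothetical in-memory CSV used only for illustration.
sample = "Supplier,Fruit,Quantity\nBig Apple,Apple,1\nBig Melons,Melons,2\n"

# Each data line becomes a dictionary keyed by the column names
# taken from the header line.
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    print(dict(row))
```

Every value is read as a string, which matches the "Quantity": "1" style seen in the reader output further down.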

Installation

pip install csvio

Documentation

Readthedocs

Reading CSVs

>>> from csvio import CSVReader
>>> reader = CSVReader("fruit_stock.csv")
>>> reader.fieldnames
['Supplier', 'Fruit', 'Quantity']
>>> len(reader.rows)
4

>>> import json
>>> print(json.dumps(reader.rows, indent=4))
[
    {
        "Supplier": "Big Apple",
        "Fruit": "Apple",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Quantity": "3"
    },
    {
        "Supplier": "Small Strawberries",
        "Fruit": "Strawberry",
        "Quantity": "4"
    }
]

CSV file contents:

Supplier,Fruit,Quantity
Big Apple,Apple,1
Big Melons,Melons,2
Long Mangoes,Mango,3
Small Strawberries,Strawberry,4
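The reader output above shows every value as a string, including Quantity. A minimal sketch of converting those values before doing arithmetic follows; it works on a plain list of dictionaries copied from that output rather than on a CSVReader instance, and the total_quantity name is only illustrative.

```python
# Rows copied from the reader output above; values are strings.
rows = [
    {"Supplier": "Big Apple", "Fruit": "Apple", "Quantity": "1"},
    {"Supplier": "Big Melons", "Fruit": "Melons", "Quantity": "2"},
    {"Supplier": "Long Mangoes", "Fruit": "Mango", "Quantity": "3"},
    {"Supplier": "Small Strawberries", "Fruit": "Strawberry", "Quantity": "4"},
]

# Convert each Quantity to int before summing.
total_quantity = sum(int(row["Quantity"]) for row in rows)

print(total_quantity)  # 10
```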

Writing CSVs

>>> from csvio import CSVWriter
>>> writer = CSVWriter("fruit_stock.csv", fieldnames=["Supplier", "Fruit", "Quantity"])
>>> row1 = {"Supplier": "Big Apple", "Fruit": "Apple", "Quantity": 1}
>>> writer.add_rows(row1)
>>> rows2_3_4 = [
...     {"Supplier": "Big Melons", "Fruit": "Melons", "Quantity": 2},
...     {"Supplier": "Long Mangoes", "Fruit": "Mango", "Quantity": 3},
...     {"Supplier": "Small Strawberries", "Fruit": "Strawberry", "Quantity": 4}
... ]
>>> writer.add_rows(rows2_3_4)
>>> len(writer.pending_rows)
4

>>> len(writer.rows)
0

>>> writer.flush()
>>> len(writer.pending_rows)
0

>>> len(writer.rows)
4

Once flush is called, a CSV file named fruit_stock.csv is written with the following contents.

Supplier,Fruit,Quantity
Big Apple,Apple,1
Big Melons,Melons,2
Long Mangoes,Mango,3
Small Strawberries,Strawberry,4
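The pending_rows and rows counts in the session above suggest that rows are buffered until flush is called. The sketch below imitates that behaviour with the standard library's csv.DictWriter. The BufferedWriter class and everything inside it are hypothetical stand-ins written for illustration; this is not csvio's implementation.

```python
import csv
import io


class BufferedWriter:
    """Hypothetical stand-in that buffers rows until flush() is called."""

    def __init__(self, stream, fieldnames):
        self.stream = stream
        self.fieldnames = fieldnames
        self.pending_rows = []  # added but not yet written
        self.rows = []          # already written
        self._header_written = False

    def add_rows(self, rows):
        # Accept either a single dictionary or a list of dictionaries.
        if isinstance(rows, dict):
            rows = [rows]
        self.pending_rows.extend(rows)

    def flush(self):
        csv_writer = csv.DictWriter(
            self.stream, fieldnames=self.fieldnames, lineterminator="\n"
        )
        if not self._header_written:
            csv_writer.writeheader()
            self._header_written = True
        csv_writer.writerows(self.pending_rows)
        # Move the written rows from the pending list to the written list.
        self.rows.extend(self.pending_rows)
        self.pending_rows = []


buffer = io.StringIO()  # in-memory target instead of a real file
writer = BufferedWriter(buffer, ["Supplier", "Fruit", "Quantity"])
writer.add_rows({"Supplier": "Big Apple", "Fruit": "Apple", "Quantity": 1})
writer.add_rows([{"Supplier": "Big Melons", "Fruit": "Melons", "Quantity": 2}])
pending_before_flush = len(writer.pending_rows)
writer.flush()
print(buffer.getvalue())
```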

Apply field processors to transform row values

Example with CSVReader

Example with CSVWriter

Standalone use

Processor function definitions

def add1(x):
    return x + 1

def cast_to_int(x):
    return int(x)

def replace_big_huge(x):
    return x.replace("Big", "Huge")

Field processors and sample rows

from csvio.processors import FieldProcessor
from json import dumps

row1 = {
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}

row2 = {
    "Supplier": "Big Melons",
    "Fruit": "Melons",
    "Origin": "Italy",
    "Quantity": "2"
}

row3 = {
    "Supplier": "Long Mangoes",
    "Fruit": "Mango",
    "Origin": "India",
    "Quantity": "3"
}

rows = [row1, row2, row3]

proc1 = FieldProcessor('increment_qty')
proc1.add_processor("Quantity", cast_to_int)
proc1.add_processor("Quantity", add1)

proc2 = FieldProcessor('replace')
proc2.add_processor("Supplier", replace_big_huge)
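The snippets in the rest of this section call pretty_print, which is not defined anywhere in this README. A minimal sketch follows, assuming it prints a label, the JSON dump of the data, and a trailing blank line, which is what the sample outputs look like. Both function names below are assumptions.

```python
from json import dumps


def format_pretty(label, data):
    """Hypothetical helper: label line followed by indented JSON."""
    return f"{label}\n{dumps(data, indent=4)}\n"


def pretty_print(label, data):
    # print() adds one more newline, producing the blank line
    # seen between the "Before:" and "After:" blocks.
    print(format_pretty(label, data))


pretty_print("Before:", {"Quantity": "1"})
```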

Using implicit processor object

If no processor object or handle is passed to the process_row method, the processor functions registered on the object whose process_row method is being called are applied implicitly.

print("Using implicit processor object:")
pretty_print("Before:", row1)
pretty_print("After:", proc1.process_row(row1)) # Using implicit processor object

Output

Using implicit processor object:
Before:
{
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": "1"
}

After:
{
    "Supplier": "Big Apples",
    "Fruit": "Apple",
    "Origin": "Spain",
    "Quantity": 2
}
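The result above is consistent with the processors for a field being applied in the order they were added: cast_to_int first, then add1 (calling add1 on the string "1" would raise a TypeError). The sketch below reproduces that chaining in plain Python. The apply_in_order helper is hypothetical, and returning a copy of the row rather than modifying it is an assumption of this sketch, not a statement about csvio.

```python
def cast_to_int(x):
    return int(x)


def add1(x):
    return x + 1


def apply_in_order(row, field, funcs):
    """Hypothetical helper: apply funcs to one field, in order, on a copy."""
    result = dict(row)
    for func in funcs:
        result[field] = func(result[field])
    return result


row = {"Supplier": "Big Apples", "Fruit": "Apple", "Origin": "Spain", "Quantity": "1"}
processed = apply_in_order(row, "Quantity", [cast_to_int, add1])

print(processed["Quantity"])  # 2
```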

Using processor handle

Any processor object can apply the processors of another object by using that object's handle, as shown below. Here the handle 'replace' belongs to the proc2 object, but the proc1 object is used to apply the processors.

print("Using processor handle:")
pretty_print("Before:", rows)
pretty_print("After:", proc1.process_rows(rows, 'replace')) # Using processor handle

Output

Using processor handle:
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

After:
[
    {
        "Supplier": "Huge Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Huge Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]
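One way to picture handle-based lookup is a shared registry that maps each handle string to its processor functions, so that any caller can apply the processors registered under another handle. The registry, register, and process_rows_by_handle names below are hypothetical and only illustrate the idea; csvio's internals may differ.

```python
# Hypothetical shared registry: handle -> field name -> list of functions.
registry = {}


def register(handle, field, func):
    registry.setdefault(handle, {}).setdefault(field, []).append(func)


def process_rows_by_handle(rows, handle):
    """Apply every function registered under handle to copies of the rows."""
    processed = []
    for row in rows:
        new_row = dict(row)
        for field, funcs in registry.get(handle, {}).items():
            for func in funcs:
                new_row[field] = func(new_row[field])
        processed.append(new_row)
    return processed


register("replace", "Supplier", lambda s: s.replace("Big", "Huge"))

rows = [{"Supplier": "Big Apples"}, {"Supplier": "Long Mangoes"}]
result = process_rows_by_handle(rows, "replace")
print(result)
```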

Using explicit processor object

Similarly, any other processor object can be passed instead of a handle.

print("Using explicit processor object:")
pretty_print("Before:", rows)
pretty_print("After:", proc1.process_rows(rows, proc2)) # Using explicit processor object

Output

Using explicit processor object:
Before:
[
    {
        "Supplier": "Big Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Big Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

After:
[
    {
        "Supplier": "Huge Apples",
        "Fruit": "Apple",
        "Origin": "Spain",
        "Quantity": "1"
    },
    {
        "Supplier": "Huge Melons",
        "Fruit": "Melons",
        "Origin": "Italy",
        "Quantity": "2"
    },
    {
        "Supplier": "Long Mangoes",
        "Fruit": "Mango",
        "Origin": "India",
        "Quantity": "3"
    }
]

Create nested dictionaries with specified path

CSV Contents: fruit_stock.csv

Supplier,Fruit,Origin,Quantity
Big Apples,Apple,Spain,1
Big Melons,Melons,Italy,2
Long Mangoes,Mango,India,3
Small Strawberries,Strawberry,France,4
Short Mangoes,Mango,France,5
Sweet Strawberries,Strawberry,Spain,6
Square Apples,Apple,Italy,7
Small Melons,Melons,Italy,8
Dark Berries,Strawberry,Australia,9
Sweet Berries,Blackcurrant,Australia,10

Create dictionary with hierarchy {"Fruit": [rows]}

from csvio.csvreader import CSVReader
from json import dumps

reader = CSVReader("fruit_stock.csv")

col_order = ["Fruit"]

dict_tree = reader.rows_to_nested_dicts(col_order)

print(dumps(dict_tree, indent=4))

Output:

{
    "Apple": [
        {
            "Supplier": "Big Apples",
            "Fruit": "Apple",
            "Origin": "Spain",
            "Quantity": "1"
        },
        {
            "Supplier": "Square Apples",
            "Fruit": "Apple",
            "Origin": "Italy",
            "Quantity": "7"
        }
    ],
    "Melons": [
        {
            "Supplier": "Big Melons",
            "Fruit": "Melons",
            "Origin": "Italy",
            "Quantity": "2"
        },
        {
            "Supplier": "Small Melons",
            "Fruit": "Melons",
            "Origin": "Italy",
            "Quantity": "8"
        }
    ],
    "Mango": [
        {
            "Supplier": "Long Mangoes",
            "Fruit": "Mango",
            "Origin": "India",
            "Quantity": "3"
        },
        {
            "Supplier": "Short Mangoes",
            "Fruit": "Mango",
            "Origin": "France",
            "Quantity": "5"
        }
    ],
    "Strawberry": [
        {
            "Supplier": "Small Strawberries",
            "Fruit": "Strawberry",
            "Origin": "France",
            "Quantity": "4"
        },
        {
            "Supplier": "Sweet Strawberries",
            "Fruit": "Strawberry",
            "Origin": "Spain",
            "Quantity": "6"
        },
        {
            "Supplier": "Dark Berries",
            "Fruit": "Strawberry",
            "Origin": "Australia",
            "Quantity": "9"
        }
    ],
    "Blackcurrant": [
        {
            "Supplier": "Sweet Berries",
            "Fruit": "Blackcurrant",
            "Origin": "Australia",
            "Quantity": "10"
        }
    ]
}

Create dictionary with hierarchy {"Fruit": {"Origin": [rows]}}

from csvio.csvreader import CSVReader
from json import dumps

reader = CSVReader("fruit_stock.csv")

col_order = ["Fruit", "Origin"]

dict_tree = reader.rows_to_nested_dicts(col_order)

print(dumps(dict_tree, indent=4))

Output:

{
    "Apple": {
        "Spain": [
            {
                "Supplier": "Big Apples",
                "Fruit": "Apple",
                "Origin": "Spain",
                "Quantity": "1"
            }
        ],
        "Italy": [
            {
                "Supplier": "Square Apples",
                "Fruit": "Apple",
                "Origin": "Italy",
                "Quantity": "7"
            }
        ]
    },
    "Melons": {
        "Italy": [
            {
                "Supplier": "Big Melons",
                "Fruit": "Melons",
                "Origin": "Italy",
                "Quantity": "2"
            },
            {
                "Supplier": "Small Melons",
                "Fruit": "Melons",
                "Origin": "Italy",
                "Quantity": "8"
            }
        ]
    },
    "Mango": {
        "India": [
            {
                "Supplier": "Long Mangoes",
                "Fruit": "Mango",
                "Origin": "India",
                "Quantity": "3"
            }
        ],
        "France": [
            {
                "Supplier": "Short Mangoes",
                "Fruit": "Mango",
                "Origin": "France",
                "Quantity": "5"
            }
        ]
    },
    "Strawberry": {
        "France": [
            {
                "Supplier": "Small Strawberries",
                "Fruit": "Strawberry",
                "Origin": "France",
                "Quantity": "4"
            }
        ],
        "Spain": [
            {
                "Supplier": "Sweet Strawberries",
                "Fruit": "Strawberry",
                "Origin": "Spain",
                "Quantity": "6"
            }
        ],
        "Australia": [
            {
                "Supplier": "Dark Berries",
                "Fruit": "Strawberry",
                "Origin": "Australia",
                "Quantity": "9"
            }
        ]
    },
    "Blackcurrant": {
        "Australia": [
            {
                "Supplier": "Sweet Berries",
                "Fruit": "Blackcurrant",
                "Origin": "Australia",
                "Quantity": "10"
            }
        ]
    }
}
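To make the grouping in the two examples above concrete, here is a minimal standard-library sketch that builds the same kind of tree from a list of row dictionaries. The nest_rows function is a hypothetical illustration of the idea, not the code behind rows_to_nested_dicts, and the sample rows are a small subset of the CSV above.

```python
def nest_rows(rows, col_order):
    """Hypothetical helper: group rows into nested dicts following col_order."""
    if not col_order:
        raise ValueError("col_order must name at least one column")
    tree = {}
    for row in rows:
        node = tree
        # Walk (or create) one dictionary level per column except the last.
        for col in col_order[:-1]:
            node = node.setdefault(row[col], {})
        # The last column maps to the list of matching rows.
        node.setdefault(row[col_order[-1]], []).append(row)
    return tree


rows = [
    {"Supplier": "Big Apples", "Fruit": "Apple", "Origin": "Spain", "Quantity": "1"},
    {"Supplier": "Big Melons", "Fruit": "Melons", "Origin": "Italy", "Quantity": "2"},
    {"Supplier": "Square Apples", "Fruit": "Apple", "Origin": "Italy", "Quantity": "7"},
    {"Supplier": "Small Melons", "Fruit": "Melons", "Origin": "Italy", "Quantity": "8"},
]

by_fruit = nest_rows(rows, ["Fruit"])
by_fruit_origin = nest_rows(rows, ["Fruit", "Origin"])

print(list(by_fruit))                  # ['Apple', 'Melons']
print(list(by_fruit_origin["Apple"]))  # ['Spain', 'Italy']
```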

Construct a dictionary with number of rows for each unique Origin

from csvio.csvreader import CSVReader
from json import dumps

reader = CSVReader("fruit_stock.csv")

col_order = ["Origin"]

origin_fruit_count = {}
dict_tree = reader.rows_to_nested_dicts(col_order)

for origin in dict_tree:
    origin_fruit_count.setdefault(origin, len(dict_tree[origin]))

print(dumps(origin_fruit_count, indent=4))

Output:

{
    "Spain": 2,
    "Italy": 3,
    "India": 1,
    "France": 2,
    "Australia": 2
}
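For comparison, the same per-origin counts can be computed directly from the rows with collections.Counter from the standard library. This sketch parses the CSV contents shown earlier from an in-memory string with csv.DictReader instead of using CSVReader; the variable names are illustrative.

```python
import csv
import io
from collections import Counter
from json import dumps

# Same contents as fruit_stock.csv above, held in memory for illustration.
csv_text = """Supplier,Fruit,Origin,Quantity
Big Apples,Apple,Spain,1
Big Melons,Melons,Italy,2
Long Mangoes,Mango,India,3
Small Strawberries,Strawberry,France,4
Short Mangoes,Mango,France,5
Sweet Strawberries,Strawberry,Spain,6
Square Apples,Apple,Italy,7
Small Melons,Melons,Italy,8
Dark Berries,Strawberry,Australia,9
Sweet Berries,Blackcurrant,Australia,10
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Count how many rows share each Origin value.
origin_counts = dict(Counter(row["Origin"] for row in rows))

print(dumps(origin_counts, indent=4))
```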
