Finances are important, especially now that I’m married and my home has become a household. Tracking all the different costs can be overwhelming at times.
Naturally, I turned to computers for help.
Finance graphs have been making the rounds online lately. You may have seen them. They look like this:

I like to use SankeyMatic for making these, which is where I got the example above. In addition to letting you quietly brag about how much you make or put into savings, they’re a great way to visualize where your money is going.
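To make the link between transactions and the diagram concrete, here is a minimal sketch that sums categorized amounts and prints one flow per category in the "Source [Amount] Target" text form that SankeyMatic accepts as input. The function name `to_sankey_lines`, the "Budget" source node, and the sample amounts are all made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def to_sankey_lines(source, transactions):
    """Sum (category, amount) pairs and emit one 'Source [Amount] Target' line per category."""
    totals = defaultdict(Decimal)  # Decimal() starts at 0 and avoids float rounding
    for category, amount in transactions:
        totals[category] += Decimal(amount)
    # Largest flows first, so the biggest spending categories lead the list
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{source} [{total}] {category}" for category, total in ordered]


# Invented sample data, not taken from a real statement
sample = [
    ("Groceries", "82.15"),
    ("Dining", "23.40"),
    ("Groceries", "41.05"),
    ("Entertainment", "15.99"),
]
lines = to_sankey_lines("Budget", sample)
print("\n".join(lines))
```

The printed lines can be pasted into the SankeyMatic input box. Amounts are kept as `Decimal` rather than `float` so that cents add up exactly.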
How Can AI Help?
One of the most time-consuming parts of making these graphs is getting the data on your own bank and credit card usage. If you’re lucky, your bank will classify transactions for you and provide them as nicely structured data, such as a CSV file. More often, the only thing you get from a bank or credit card company is a monthly statement, which is a PDF with several embedded tables like this:

How do you get a category from something like this? How do you summarize such data across banks and months of purchases? The answer is AI. Using the description of each transaction, we can use an LLM to determine what each payment was for.
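As a rough sketch of that idea, the code below builds a classification prompt from transaction descriptions and validates whatever comes back. It does not call any real LLM API: `ask_llm` is a hypothetical callable (prompt string in, reply string out) that you would wire up to the model of your choice. The category list, the "Other" fallback, and the merchant names are assumptions for illustration.

```python
import json

# Assumed category list; "Other" is a fallback for anything unrecognized
CATEGORIES = ["Dining", "Entertainment", "Groceries", "Other"]


def build_prompt(descriptions):
    """Turn a list of transaction descriptions into a single classification prompt."""
    numbered = "\n".join(f"{i}. {d}" for i, d in enumerate(descriptions, 1))
    return (
        "Assign each transaction description to exactly one of these categories: "
        + ", ".join(CATEGORIES)
        + ".\nReply with only a JSON list of category names, one per transaction, in order.\n\n"
        + numbered
    )


def categorize(descriptions, ask_llm):
    """ask_llm is a hypothetical callable: prompt string in, reply string out."""
    reply = ask_llm(build_prompt(descriptions))
    try:
        labels = json.loads(reply)
    except json.JSONDecodeError:
        labels = []
    # Never trust the reply blindly: wrong shape or length falls back to "Other"
    if not isinstance(labels, list) or len(labels) != len(descriptions):
        labels = ["Other"] * len(descriptions)
    return [label if label in CATEGORIES else "Other" for label in labels]


def fake_llm(prompt):
    """Stand-in for a real model so the sketch runs offline."""
    return '["Groceries", "Dining", "Streaming"]'


result = categorize(["TRADER JOES #123", "CHIPOTLE 0456", "NETFLIX.COM"], fake_llm)
print(result)
```

Restricting the model to a fixed list and checking its reply keeps one odd answer (here, the unlisted "Streaming") from creating a new category by accident.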
Let’s see how well ChatGPT 4o handles this with just a simple prompt. For illustration, I passed in the same PDF shown in the screenshot above:

This easiest approach leaves a lot to be desired. While it was able to infer the categories “Dining”, “Entertainment”, and “Groceries”, ChatGPT did not get the amounts even close to correct.

More prompt engineering might improve this response a little, but there are some hard truths to accept with this problem. Parsing a PDF is not trivial, and even if the PDF’s tables could be extracted perfectly, knowing which category a transaction belongs to is not an exact science.
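To give a feel for why extraction is fiddly, here is a small sketch that pulls transaction rows out of text that has already been extracted from a statement. It assumes each row looks like "MM/DD  DESCRIPTION  AMOUNT", which is only a guess at a common layout, since real statements differ from bank to bank. The sample text is invented.

```python
import re
from decimal import Decimal

# Assumed row layout: "MM/DD  DESCRIPTION  1,234.56" (real statements vary by bank)
ROW = re.compile(r"^(\d{2}/\d{2})\s+(.+?)\s+(-?\$?[\d,]+\.\d{2})$")


def parse_rows(text):
    """Return (date, description, amount) tuples for lines that look like transactions."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip headers, page footers, and anything else that doesn't fit
        date, description, raw_amount = match.groups()
        amount = Decimal(raw_amount.replace("$", "").replace(",", ""))
        rows.append((date, description, amount))
    return rows


# Invented sample of extracted statement text
sample_text = """Date  Description  Amount
01/03  TRADER JOES #123  82.15
01/05  CHIPOTLE 0456     $23.40
Page 1 of 3
01/09  AIRLINE TICKET    1,204.00
"""
rows = parse_rows(sample_text)
total = sum(amount for _, _, amount in rows)
print(len(rows), total)
```

Comparing `total` against the total printed on the statement is a cheap sanity check that no rows were dropped or misread.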
In the upcoming series of blog posts, I’ll describe my initial approach in Python, which will include data extraction techniques and different LLMs for categorization.