I wonder whether there is an agent that can take an Excel file significantly larger than its context window and start documenting its essence. Here are several tabs. Here on tab 5 there is a table with a million rows and five columns. The columns are as follows. We take a random sample from the table: this column looks like numbers, and that one looks like surnames. We assume the first column is numeric throughout, so we write code that checks this assumption and at the same time computes the min/max and the set of unique values. It turns out there are few distinct values, only five. We record that. Now we check the surnames. Yes, these are just strings, and a fresh sample shows that they are indeed surnames. Here is a formula. We see which cells it points to. And so on. And this column has an unclear purpose. We look at the data: these are some numbers from 0 to 1. We measure the mean and the spread. We ask the user; maybe they will provide some comments. They did. It turns out to be a KPI issued to this user by an external system. We record that. And so on. Documentation emerges. Later, once the documentation exists, one can ask the agent to perform operations on all of this, since the LLM now more or less understands the purpose of the data and how the pieces are connected, and can form hypotheses about outliers and verify them.
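The "sample, guess, then verify in code" step for a single column could look roughly like the sketch below. Everything named here is made up for illustration: `looks_numeric`, `guess_from_sample`, `profile_numeric`, and the `max_unique` cutoff are hypothetical, not part of any existing agent. It also assumes the column has already been read into a Python list with blank cells dropped; loading the workbook itself is out of scope.

```python
import math
import random


def looks_numeric(value) -> bool:
    """True for ints, finite floats, and strings that parse as a finite float."""
    if isinstance(value, bool):  # bool is a subclass of int; treat it as non-numeric
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return math.isfinite(value)
    if isinstance(value, str):
        try:
            return math.isfinite(float(value))
        except ValueError:
            return False
    return False


def guess_from_sample(values, k=20, seed=0):
    """Step 1: form a hypothesis about a non-empty column from a small random sample."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(k, len(values)))
    return "numeric" if all(looks_numeric(v) for v in sample) else "text"


def profile_numeric(values, max_unique=50):
    """Step 2: check the 'numeric' hypothesis on every value in a single pass.

    Collects min/max, mean and population standard deviation (Welford's
    running update), and the set of unique values while it stays small.
    """
    count = 0
    non_numeric = 0
    lo, hi = math.inf, -math.inf
    mean, m2 = 0.0, 0.0
    unique = set()  # becomes None once there are more than max_unique values
    for v in values:
        if not looks_numeric(v):
            non_numeric += 1
            continue
        x = float(v)
        count += 1
        lo, hi = min(lo, x), max(hi, x)
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if unique is not None:
            unique.add(x)
            if len(unique) > max_unique:
                unique = None
    return {
        "count": count,
        "non_numeric": non_numeric,  # > 0 means the hypothesis was wrong somewhere
        "min": lo if count else None,
        "max": hi if count else None,
        "mean": mean if count else None,
        "std": math.sqrt(m2 / count) if count else None,
        "unique": sorted(unique) if unique is not None else None,
    }


if __name__ == "__main__":
    # Made-up stand-in for the mystery column with values between 0 and 1.
    kpi = [0.12, 0.87, "0.40", 0.55]
    if guess_from_sample(kpi) == "numeric":
        print(profile_numeric(kpi))
```

The point of the design is that only the small summary dictionary has to go back into the LLM's context, not the million rows. A non-zero `non_numeric` count is the signal to resample and revise the guess before anything is written into the documentation.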
