While working on a book, I realized what kind of product I’m missing. It’s an AI diagram generator based on textual descriptions.
The idea is that the master document for the diagram is text. This textual description can be (and should be) quite detailed, so that the generated diagram exactly matches the author’s vision. The diagram itself is not the thing you edit. Or rather, it can be edited (moving circles around, say), but ideally, after such changes, the system should update the text so that regenerating from it reproduces what the user adjusted.
The resulting diagram should correspond as closely as possible to the description. If it cannot match the description because, for example, it’s impossible to draw a triangle with three obtuse angles, the system should do its best and explain in words what didn’t work. The user can then modify the request until the system can produce the diagram correctly.
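To make the "explain what didn't work" part concrete, here is a minimal sketch of a feasibility check for the triangle example. The function name `check_triangle` and the wording of the messages are my own assumptions; a real system would need far more general geometry checks.

```python
def check_triangle(angles):
    """Return a list of human-readable problems; an empty list means feasible.

    Hypothetical helper: angles are given in degrees.
    """
    problems = []
    if len(angles) != 3:
        return [f"a triangle needs 3 angles, got {len(angles)}"]
    if any(a <= 0 for a in angles):
        problems.append("every angle must be greater than 0 degrees")
    total = sum(angles)
    if abs(total - 180) > 1e-9:
        problems.append(f"angles sum to {total} degrees, not 180")
    obtuse = sum(1 for a in angles if a > 90)
    if obtuse > 1:
        problems.append(f"{obtuse} obtuse angles requested, at most 1 is possible")
    return problems


# The impossible request from the text: three obtuse angles.
for problem in check_triangle([100, 100, 100]):
    print("Could not satisfy:", problem)
```

The point is only the shape of the interaction: the generator returns both its best attempt and a list of reasons, and the author edits the description until that list is empty.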
But then we realize that the author might have stumbled on something they liked by accident, with a flawed text. If regenerated, the diagram might turn out differently, and not necessarily better.
Therefore, you could ask the system to generate a description from the diagram itself, one that, if fed back into the generator, would reproduce exactly the diagram it was generated from. Yes, this description would be more verbose and complex, but it would describe the result more reliably.
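The round-trip property can be sketched without any AI at all. Below, a tiny made-up line format stands in for the "verbose description"; the names `Node`, `Edge`, `Diagram`, `describe`, and `compile_description` are all hypothetical, and node names are assumed to contain no spaces. What matters is the invariant at the bottom: compiling the generated description gives back exactly the diagram it came from.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    x: int
    y: int


@dataclass(frozen=True)
class Edge:
    source: str
    target: str


@dataclass
class Diagram:
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)


def describe(diagram: Diagram) -> str:
    """Produce a verbose description that pins down every coordinate."""
    lines = [f"node {n.name} at {n.x} {n.y}" for n in diagram.nodes]
    lines += [f"edge {e.source} -> {e.target}" for e in diagram.edges]
    return "\n".join(lines)


def compile_description(text: str) -> Diagram:
    """Rebuild a Diagram from the line format written by describe()."""
    diagram = Diagram()
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "node":    # node NAME at X Y
            diagram.nodes.append(Node(parts[1], int(parts[3]), int(parts[4])))
        elif parts[0] == "edge":  # edge SOURCE -> TARGET
            diagram.edges.append(Edge(parts[1], parts[3]))
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    return diagram


original = Diagram([Node("A", 0, 0), Node("B", 120, 40)], [Edge("A", "B")])
pinned_text = describe(original)
# Round trip: the pinned description reproduces the diagram exactly.
assert compile_description(pinned_text) == original
```

In the system I have in mind, the loose author-written text would go through a model, while the pinned description would be precise enough to compile deterministically like this.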
So, from this point on, you are no longer working with the diagram. You are working with text. If a diagram is needed, you simply compile the text into a diagram and it turns out as expected. But you don’t even work with the text directly. You work with this diagram-description text through an LLM: you ask it to add some block, and the text changes, but in a way that doesn’t make everything else suddenly shift.
The final diagram should exist in object form (a structured model of shapes and connections), from which raster (PNG) or vector (SVG, EPS) images can be created.
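As a rough illustration of "object form first, image second", here is a minimal sketch that turns a list of shape objects into SVG using only the Python standard library. The `Circle` and `Line` classes and the `to_svg` function are hypothetical names of mine; a PNG or EPS export would go through a separate converter that is not shown here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    cx: int
    cy: int
    r: int
    label: str


@dataclass(frozen=True)
class Line:
    x1: int
    y1: int
    x2: int
    y2: int


def to_svg(shapes, width=200, height=100) -> str:
    """Serialise the shape objects to an SVG string, in list order."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    for shape in shapes:
        if isinstance(shape, Line):
            ET.SubElement(svg, "line", {
                "x1": str(shape.x1), "y1": str(shape.y1),
                "x2": str(shape.x2), "y2": str(shape.y2),
                "stroke": "black",
            })
        elif isinstance(shape, Circle):
            ET.SubElement(svg, "circle", {
                "cx": str(shape.cx), "cy": str(shape.cy), "r": str(shape.r),
                "fill": "white", "stroke": "black",
            })
            text = ET.SubElement(svg, "text", {
                "x": str(shape.cx), "y": str(shape.cy),
                "text-anchor": "middle",
            })
            text.text = shape.label
    return ET.tostring(svg, encoding="unicode")


# Lines first so the circles are drawn on top of them.
shapes = [Line(40, 50, 160, 50), Circle(40, 50, 20, "A"), Circle(160, 50, 20, "B")]
svg_text = to_svg(shapes)
```

Because the shapes are plain objects, the same list could be handed to a different backend for each output format without touching the description or the layout.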
It would also be great if such a system could take existing diagrams or diagram templates as input, so that it could borrow their styles and their conventions for how each kind of thing is displayed.
So, these are my fantasies. If anyone has ideas on how to implement this, let’s discuss 🙂

