I recently set up Stable Diffusion locally with the default model and openjourney, a prompt-based image generator with no usage limits. And here’s a thought: a prompt should be formulated the way the computer itself would describe the final image. The model recognizes patterns that appeared in its training dataset, so it can’t properly draw a three-legged cat unless the dataset included three-legged animals. Shown a three-legged cat, it would see the cat but miss the three-leggedness. In essence, its abstract thinking is severely limited to what it has been trained on.
Yet it’s possible to train on algorithmically generated objects. For example, if you define three-leggedness as a skeleton in a modern animation program, you could generate thousands of diverse animals rigged on it, in various poses, each one already tagged with the property it demonstrates. It would be interesting to build a system around this, and in theory the approach extends well beyond animals. A CAPTCHA built on the AI-generated results could even get people to filter and tag them for free.
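The synthetic-data idea can be sketched as a tiny toy pipeline. This is a hypothetical illustration, not a real rendering setup: the "skeleton" is reduced to a leg count and random pose angles, and each sample comes out pre-tagged with a caption describing the property we want the model to learn. All function and field names here are my own invention.

```python
import random

def make_synthetic_sample(num_legs: int, rng: random.Random) -> dict:
    """Generate one labeled skeleton: a species, a leg count, and
    a random pose (one joint angle per leg), with a caption that
    explicitly names the three-legged property when present."""
    species = rng.choice(["cat", "dog", "horse"])
    pose = [rng.uniform(0.0, 360.0) for _ in range(num_legs)]
    if num_legs == 3:
        caption = f"a three-legged {species}"
    else:
        caption = f"a {species}"
    return {"species": species, "legs": num_legs, "pose": pose, "caption": caption}

def build_dataset(n: int, seed: int = 0) -> list[dict]:
    """Produce n samples, mixing ordinary and three-legged skeletons,
    so the rare property is guaranteed to appear in the training data."""
    rng = random.Random(seed)
    return [make_synthetic_sample(rng.choice([3, 4]), rng) for _ in range(n)]
```

In a real system the `pose` list would drive an animation rig and a renderer, producing image–caption pairs; the point of the sketch is only that the label comes from the generator, so no human tagging is needed upfront.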
