I believe the next big breakthrough will come from combining generative AI with AI-driven depth-map reconstruction from a single photograph. The depth-estimation part already exists: MiDaS, among others. Why this is interesting: a depth map lets you integrate objects into the environment of a photograph so that shadows, palette, and lighting are respected. Right now this is hard because, figuratively speaking, the AI does not know that the table surface in the photo is unevenly lit not by chance, but because it is angled toward the light source in a particular way, and that the tree over there casts a shadow. With a depth map, this starts to make sense.
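As a concrete starting point, here is a minimal sketch of pulling a depth map out of a single photograph with MiDaS via torch.hub (the model name `DPT_Large` and the `dpt_transform` follow the project's published usage; `photo.jpg` is a placeholder), then deriving rough surface normals from the depth gradients. The normals are exactly the "this surface is angled toward the light" information a generator currently lacks; note this crude estimate ignores camera intrinsics:

```python
import cv2
import numpy as np
import torch

# Load a MiDaS model and its matching preprocessing transform via torch.hub
# (model/transform names follow the intel-isl/MiDaS published usage).
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform

# "photo.jpg" is a placeholder input image.
img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the predicted (relative, inverse) depth to image resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()

# Crude surface normals from depth gradients. This ignores camera
# intrinsics, and MiDaS outputs relative inverse depth, so treat the
# result as qualitative rather than metric.
dz_dy, dz_dx = np.gradient(depth)
normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
normals /= np.linalg.norm(normals, axis=2, keepdims=True)

# A renderer, or a conditioning channel for a generative model, can now
# use depth + normals to reason about which surfaces face the light.
```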
I don’t yet see how to implement the full pipeline right now, but it feels like the very near future. NVIDIA has demonstrated recreating a 3D scene from a handful of photos: photogrammetry through AI, much faster than classical pipelines and, at first glance, very accurate.



