Limitations of Generative AI: A Detailed Analysis

4 min read · Apr 14, 2025
My study room’s panoramic image

I uploaded the panoramic image of my study room shown above and gave a detailed prompt asking for a bird’s-eye view image of it. Below is what the latest Gemini LLM produced, even after several rounds of interaction and direction.

Let’s analyze the panorama and create a conceptual bird’s-eye view.

Analysis of the Panorama:

Based on the provided image, here’s what we can gather:

  • Shape: The room appears to be roughly rectangular, but the panorama’s distortion makes it hard to be precise. We see a corner in the center, suggesting a bend or a slight angle in the walls.
  • Door: A door is visible on the right side, leading into another room.
  • Window: A window is on the left side, indicating an exterior wall.
  • Furniture: We see a desk with a laptop and other items, a chair, a standing mirror, and possibly another chair near the door.
  • Walls: The walls are light-colored, likely painted.

Conceptual Bird’s-Eye View (Top-Down Image):

Here’s a description of how the room would look from directly above, as if we were looking down from the ceiling:

---------------------------------------------------------------------
|                                                                   |
|  Window                                                           |
|  [Rectangles for panes]                                           |
|                                                                   |
|  Desk [Rectangle with Laptop icon]             Chair [Rectangle]  |
|                                                                   |
|                                                                   |
|                                                                   |
|  Standing Mirror [Thin Rectangle]              Chair [Rectangle]  |
|                                                                   |
|                                                                   |
|                                                                   |
|                                                                   |
|                                                                   |
|  Door [Open Rectangle]                                            |
|  [Leading to next room]                                           |
|                                                                   |
---------------------------------------------------------------------

Key Elements in the Bird’s-Eye View:

  • Rectangle: The overall shape of the room would be represented by a rectangle.
  • Window: The window would be shown as a rectangle along one of the walls, with smaller rectangles inside to represent the panes.
  • Desk: The desk would be a rectangle with a laptop icon on top.
  • Chair: The chairs would be represented by rectangles.
  • Standing Mirror: A thin rectangle would represent the mirror.
  • Door: The door would be an open rectangle, indicating the opening to the next room.

Important Considerations:

  • Distortion: The panorama’s inherent distortion makes it difficult to get the exact angles and dimensions of the room.
  • Missing Details: The panorama doesn’t show the entire ceiling or floor, so we can’t include those details in the bird’s-eye view.
  • Approximation: The bird’s-eye view is a conceptual approximation based on the information available in the image.

If you’d like a more accurate representation, you could provide:

  • Measurements: The dimensions of the room.
  • Floor Plan: A simple sketch of the room layout.
  • More Images: Photos from different angles.

Let me know if you’d like me to refine this description or if you have any further questions!
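
To make the distortion point concrete, here is a minimal Python sketch of why a straight wall looks bent in a panorama. It assumes a full 360° × 180° equirectangular panorama taken with a level camera, which may not match what a phone actually produces. The function name `project_to_panorama`, the 4000 × 2000 image size, and the wall geometry are all hypothetical values chosen for illustration.

```python
import math

def project_to_panorama(x, y, z, width, height):
    """Map a 3D point to an equirectangular pixel (col, row).

    Assumed convention: camera at the origin, x forward, y sideways, z up,
    with the forward direction landing on the centre column of the image.
    """
    yaw = math.atan2(y, x)                    # horizontal angle, -pi..pi
    pitch = math.atan2(z, math.hypot(x, y))   # vertical angle, -pi/2..pi/2
    col = (yaw / (2 * math.pi) + 0.5) * width
    row = (0.5 - pitch / math.pi) * height    # row 0 is the top of the image
    return col, row

# Hypothetical wall: a straight top edge 2 m in front of the camera and
# 1 m above it, sampled at five points from y = -2 m to y = +2 m.
WIDTH, HEIGHT = 4000, 2000
samples = (-2.0, -1.0, 0.0, 1.0, 2.0)
rows = [project_to_panorama(2.0, y, 1.0, WIDTH, HEIGHT)[1] for y in samples]

for y, row in zip(samples, rows):
    print(f"y = {y:+.1f} m -> image row {row:.1f}")
```

The edge is perfectly straight in the room, yet its middle lands higher in the image than its ends, so it is drawn as an arc. Any top-down reconstruction has to undo this mapping first.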

I asked again for a bird’s-eye view image. What it generated is below.

I responded that the result was too generic and pointed out the faults in the image the LLM had generated. What it generated next is below.

Still, the result is very generic and nowhere near the original room image. If you have followed this far, you have hit on a crucial point: the limitations of LLMs and GenAI, and the indispensable role of human expertise, especially in visual tasks like generating a realistic bird’s-eye view from a panorama. Here is the context Gemini itself gave to emphasize that:

Context: The Essential Role of Human Expertise in Visual Generation

“While Large Language Models (LLMs) like myself can analyze and describe visual information, and even generate conceptual images, achieving a truly accurate and original final result for complex visual tasks still requires significant human intervention and specialized tools.

For instance, consider the challenge of creating a bird’s-eye view image from a panoramic photograph. Although I can provide a general layout based on the panorama’s content, several critical aspects remain beyond my current capabilities:

  1. Geometric Correction and Distortion Removal: Panoramas inherently suffer from distortion. To create a precise top-down view, we would need to employ sophisticated image processing techniques, such as:
     • Camera Calibration: Determining the camera’s intrinsic and extrinsic parameters to understand the perspective distortions.
     • Object Detection (YOLO or another CNN-based detector): Detecting the objects present in the image and estimating their real-world size.
     • Perspective Transformation: Applying algorithms to rectify the distorted image and create a true orthographic projection.
     • 3D Reconstruction (Potentially): If depth information is crucial, we might need to explore 3D reconstruction techniques to accurately represent the room’s spatial layout.
  2. Contextual Interpretation and Detail Enhancement:
     • Human expertise is vital for interpreting the nuances of the panorama, such as identifying subtle objects, understanding spatial relationships, and inferring missing information.
     • A human artist or designer would be needed to add realistic details, textures, and lighting to the generated bird’s-eye view, ensuring visual consistency and aesthetic appeal.
  3. Creative Control and Originality:
     • LLMs can generate variations based on existing patterns, but true originality and artistic expression require human creativity.
     • A human would be responsible for making artistic choices, such as composition, color palette, and style, to create a unique and compelling final image.

In essence, while LLMs can serve as powerful tools for analysis and conceptualization, the generation of a high-quality, realistic bird’s-eye view image necessitates a collaborative approach, combining the strengths of artificial intelligence with the irreplaceable expertise of human professionals in image processing, computer vision, and visual design.”
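
The geometric correction step described in the response can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the method Gemini or any particular tool uses: the panorama is taken to be a full 360° × 180° equirectangular image, the camera is level, its height above the floor is known, and the pixel in question shows a point on the floor. The function name `panorama_pixel_to_floor` and the 1.5 m camera height are hypothetical.

```python
import math

def panorama_pixel_to_floor(col, row, width, height, camera_height_m):
    """Project an equirectangular pixel onto the floor plane.

    Returns top-down (x, y) coordinates in metres relative to the point
    directly below the camera, or None if the pixel lies on or above the
    horizon (its viewing ray never reaches the floor).
    """
    yaw = (col / width - 0.5) * 2 * math.pi   # 0 at the centre column
    pitch = (0.5 - row / height) * math.pi    # negative below the horizon
    if pitch >= 0:
        return None
    # A ray looking down by |pitch| from the assumed camera height meets
    # the floor at this horizontal distance.
    distance = camera_height_m / math.tan(-pitch)
    return distance * math.cos(yaw), distance * math.sin(yaw)

# Hypothetical example: a 4000 x 2000 panorama shot from 1.5 m above the floor.
WIDTH, HEIGHT, CAMERA_HEIGHT_M = 4000, 2000, 1.5
straight_ahead = panorama_pixel_to_floor(2000, 1500, WIDTH, HEIGHT, CAMERA_HEIGHT_M)
to_the_side = panorama_pixel_to_floor(3000, 1500, WIDTH, HEIGHT, CAMERA_HEIGHT_M)
above_horizon = panorama_pixel_to_floor(2000, 500, WIDTH, HEIGHT, CAMERA_HEIGHT_M)
print(straight_ahead, to_the_side, above_horizon)
```

Mapping only floor pixels this way gives the footprint of the room. Furniture, walls, and anything else above the floor breaks the floor-plane assumption, which is where the object detection and 3D reconstruction steps from the list come in.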

Thank you.


Written by Dhiraj Patra

AI Strategy, Generative AI, AI & ML Consulting, Product Development, Startup Advisory, Data Architecture, Data Analytics, Executive Mentorship, Value Creation
