Context-Based Image Retrieval System for Technical Textbook to Chatbot Conversion

Overview

New Clarity built a specialized AI chatbot that turned a highly technical textbook into an interactive assistant capable of using images to answer queries. The system displayed the correct technical images alongside its answers by means of a secondary RAG system, built by analyzing the context around each image and the references to it in the text. By combining text retrieval with context-aware image selection, the chatbot gave users a more effective way to engage with dense, visually complex subject matter.

The Challenge

Technical textbooks often rely heavily on detailed images, diagrams, and figures. These images are tied closely to the surrounding text and are critical for understanding the subject matter. Standard LLMs, however, could not reliably recognize these images or connect them to specific queries because the topic was so specialized. A solution was needed that could:

Understand the relationship between the text and its images.
Identify when an image was required to fully answer a query.
Display the correct image automatically alongside the textual explanation.

The New Clarity Solution

New Clarity developed a custom retrieval-augmented generation (RAG) system designed to handle both text and images. The solution included several layers of engineering:

Image Extraction: All images from the textbook were extracted and cataloged as independent assets.
Metadata Creation: Each image was assigned descriptive metadata that captured not only its visual content but also its relationship to the surrounding text and references in the book.
Dual RAG System: One RAG pipeline was built for text retrieval and another specifically for images. The text system identified relevant passages, while the image system selected the most contextually appropriate visual asset.
Context-Aware Integration: When answering a user query, the chatbot retrieved both text and images, merging them into a single coherent response.
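
The four layers above can be sketched in a few lines of Python. This is a minimal illustration, not the production system: the record fields, the function names, the sample pump and valve data, and the bag-of-words cosine score (standing in for whatever embedding model a real deployment would use) are all assumptions made for the example.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Passage:
    passage_id: str
    text: str


@dataclass
class ImageRecord:
    # Hypothetical metadata fields: the image is indexed by the text
    # that describes it, not by its pixels.
    image_id: str
    caption: str
    surrounding_text: str
    references: list[str]

    def metadata_text(self) -> str:
        return " ".join([self.caption, self.surrounding_text, *self.references])


def top_match(query: str, items, text_of):
    """Return (score, item) for the item whose text best matches the query.

    Raises ValueError if items is empty.
    """
    q = tokenize(query)
    scored = [(cosine(q, tokenize(text_of(item))), item) for item in items]
    return max(scored, key=lambda pair: pair[0])


def answer(query: str, passages: list[Passage], images: list[ImageRecord]) -> dict:
    """Run the text index and the image index separately, then merge."""
    text_score, passage = top_match(query, passages, lambda p: p.text)
    image_score, image = top_match(query, images, ImageRecord.metadata_text)
    return {
        "passage_id": passage.passage_id,
        "image_id": image.image_id,
        "text_score": text_score,
        "image_score": image_score,
    }


# Invented sample data, for illustration only.
passages = [
    Passage("p1", "A centrifugal pump moves fluid using a rotating impeller."),
    Passage("p2", "A gate valve controls flow by raising or lowering a wedge."),
]
images = [
    ImageRecord(
        "fig-2-1",
        "Cross section of a centrifugal pump",
        "The impeller shown in Figure 2.1 accelerates the fluid outward.",
        ["See Figure 2.1 for the impeller layout."],
    ),
    ImageRecord(
        "fig-3-4",
        "Gate valve assembly",
        "Figure 3.4 shows the wedge in the closed position.",
        ["Refer to Figure 3.4 when adjusting the wedge."],
    ),
]

result = answer("How does the impeller in a centrifugal pump work?", passages, images)
print(result["passage_id"], result["image_id"])
```

The point of the sketch is the separation: the image index never sees the passages, and the text index never sees the image metadata, so each can be tuned or replaced independently before the results are merged into one response.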

Results

The finished chatbot delivered a more advanced learning tool than a simple question-and-answer system:

Accurate Image Selection: Users saw the most relevant technical diagram or figure paired with the explanation.
Enhanced Learning: By linking complex text with visuals, users gained a deeper understanding of difficult concepts.
Automation of a Manual Process: What once required flipping through chapters and cross-referencing diagrams was now handled instantly by the AI system.

Why It Works

The success of the project came from treating images as first-class data within the retrieval system. Rather than relying on an LLM alone, New Clarity combined metadata-rich image indexing with advanced text retrieval. This allowed the AI to reason about when an image was necessary and which one best supported the user’s question.
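
The case study does not describe how the system decided that an image was necessary. One plausible approach, shown here purely as an assumption, is to attach the best-scoring image only when its retrieval score clears a minimum threshold. The function name `select_image`, the score mapping, and the default cutoff of 0.3 are all hypothetical.

```python
from typing import Mapping, Optional


def select_image(scores: Mapping[str, float], min_score: float = 0.3) -> Optional[str]:
    """Return the best-scoring image ID, or None if no image is relevant enough.

    `scores` maps image IDs to retrieval scores from the image index.
    `min_score` is an illustrative cutoff that would need tuning on real queries.
    """
    if not scores:
        return None
    best_id = max(scores, key=scores.get)
    return best_id if scores[best_id] >= min_score else None


# A strong match is attached; a weak one is suppressed.
print(select_image({"fig-2-1": 0.48, "fig-3-4": 0.22}))
print(select_image({"fig-2-1": 0.10}))
```

A threshold like this keeps text-only questions from being answered with a loosely related figure, at the cost of one more parameter to calibrate.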

Next Steps

This project demonstrated how advanced retrieval and metadata strategies can extend AI beyond text-only systems. Similar techniques can be applied in fields such as engineering, medicine, or legal research, where diagrams, scans, or exhibits are as important as the text itself. New Clarity continues to help organizations unlock the full value of their technical content by pairing AI agents with both textual and visual intelligence.
