Analyze Image LLM

The Analyze Image LLM component analyzes images using a selected Large Language Model (LLM) provider. You supply an image URL or, as an alternative, base64-encoded image data; the component sends the image to the provider and returns the generated analysis.

This component is the central point for image processing in the system. It accepts URLs of images stored in Amazon S3 buckets or any other publicly accessible image URL, with no download or preprocessing step required. It supports multiple LLM providers and allows custom prompting to guide the analysis toward specific aspects such as defect identification, content description, or visual data extraction.

The component processes one image at a time and can be used within loops for batch image processing. It handles various image formats and can distinguish between different types of visual content based on user-defined prompts.

Use this component in any workflow that requires image analysis, such as image understanding, visual quality control, content moderation, or automated image captioning.

Skill Level

  • Understanding basic Prompt Engineering concepts is helpful.

  • No Python or JavaScript knowledge is required.

How to use

Key Terms

| Term | Definition |
|------|------------|
| LLM | Large Language Model: an AI system trained on vast amounts of data to understand and generate human-like text and analyze images. |
| LLM Provider | A service that offers access to large language models, such as OpenAI, Google (Gemini), or Anthropic. |
| System Prompt | Initial instructions given to the LLM that guide its approach to the analysis task. |
| Image Analysis | The process of extracting meaningful information from images using AI technologies. |
| Image URL | The URL of the image. The image must be publicly accessible over the internet so that the LLM can access and analyze it. |

When to Use

| Use Case | Description |
|----------|-------------|
| Information Extraction | When you need to extract descriptive information, details, or context from image content. |
| Content Moderation | For automated screening of visual media to identify inappropriate or restricted content. |
| Visual Data Analysis | When processing large volumes of images to identify patterns, anomalies, or specific features. |
| Text-to-Data Conversion | For transforming visual information into structured text data for downstream processing. |

Component Configuration

Main component interface:

[Image: Analyze Image LLM Component Interface]

Required Inputs

| Input | Description | Data Type | Example |
|-------|-------------|-----------|---------|
| LLM Provider | Select the AI service provider that performs the image analysis. Different providers have different capabilities and specializations. | LLMProvider | Gemini - Gemini-1.5-Pro |
| Image URL | The URL pointing to the image you want to analyze. The image must be openly accessible via this URL over the internet for the LLM to process it. | Text | https://example.com/image.jpg |

Optional Inputs

| Input | Description | Data Type | Example |
|-------|-------------|-----------|---------|
| Image data (Optional) | Base64-encoded data of the image. Use this as an alternative to the Image URL when you have the image data directly. | String | data:image/jpeg;base64,/9j/4AAQSkZJRg... |
| System Prompt (Optional) | System-level instructions that guide the analysis. They can specify aspects or contexts for the LLM to consider. Multiple prompts can be added. | Text | Analyze this image for product defects. |
| Prompt (Optional) | User-defined instructions that customize the analysis. They help focus the LLM on specific aspects of the image. Multiple prompts can be added. | Text | Describe all text visible in this image. |
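If you hold the image as raw bytes rather than at a URL, the Image data input takes a base64 string like the example value shown for it. The following is a minimal Python sketch of producing such a string outside the Pipeline Builder. The helper name `to_data_uri` is hypothetical, and the `data:<mime type>;base64,` prefix is assumed from that example value, not from a documented requirement.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    # Base64 turns arbitrary bytes into ASCII text that fits in a text field.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Assumption: the component accepts the data-URI prefix shown in its example.
    return f"data:{mime_type};base64,{encoded}"


# Stand-in data: the first four bytes of a JPEG file.
sample = to_data_uri(b"\xff\xd8\xff\xe0")
print(sample)  # data:image/jpeg;base64,/9j/4A==
```

In practice you would read the bytes from a file, for example with `open(path, "rb").read()`, and pass the MIME type that matches the file.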

Possible Chaining

The Analyze Image LLM component can be chained with other components in the Pipeline Builder to build larger workflows.

Common components to use with Analyze Image LLM:

  • Call HTTP Get: To fetch images from external APIs or web services.

  • Download From S3: To retrieve images stored in S3 buckets.

  • Scrape Webpage: To extract images from websites for analysis.

  • Extract Text: To further process the analysis results.

  • Check Condition: To create decision branches based on image content.

  • Generate Text: To create summaries or reports based on image analysis.

  • Create VectorDB Context: To store analysis results in a vector database.

  • Chat Response Without Context: To generate responses based on image analysis.

Common workflow patterns include:

  1. Image retrieval → Analyze Image LLM → Text extraction → Conditional logic

  2. Batch image processing → Analyze Image LLM → Database storage

Example Use Case: Product Defect Detection

Scenario: Automating quality control by analyzing images of manufactured products for defects.

Configuration:

  • LLM Provider: Gemini - Gemini-1.5-Pro

  • Image URL: URL to the product image from the manufacturing line camera

  • System Prompt: "You are a quality control specialist analyzing product images. Focus on identifying defects, anomalies, or quality issues."

  • Prompt: "Analyze this image of our manufactured widget. Identify any scratches, dents, misalignments, or color inconsistencies. Rate the overall quality on a scale of 1-10."

Process:

  1. The Pipeline Builder sends the product image to the LLM with the specified prompts.

  2. The LLM analyzes the image for the requested defects and quality issues.

  3. The component returns a detailed analysis including identified issues and quality rating.

  4. Subsequent pipeline components can use this analysis to trigger alerts or sort products based on quality.

Example Implementation:

Output Format

The component outputs a detailed text analysis in the result field. Example output:

{"result":"The image shows a product with several quality issues: 1. There's a visible scratch along the right edge approximately 2cm in length. 2. The blue coloring is inconsistent, with a lighter patch in the upper left quadrant. 3. The alignment of the front panel appears to be off by about 1mm. 4. There are no visible dents or structural damage. Overall quality rating: 6/10. The product is functional but has cosmetic issues that affect its premium appearance."}
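Because the analysis arrives as free text in the result field, downstream logic usually has to pull specific values out of it. The sketch below shows one way to do that in Python for the quality rating. It assumes the prompt asked for a rating and that the LLM phrased it as "N/10", which is not guaranteed. The function name `extract_quality_rating` and the threshold of 7 are hypothetical.

```python
import json
import re
from typing import Optional


def extract_quality_rating(component_output: str) -> Optional[int]:
    """Return N from an "N/10" rating in the result text, or None if absent."""
    analysis = json.loads(component_output)["result"]
    match = re.search(r"\b(\d{1,2})\s*/\s*10\b", analysis)
    return int(match.group(1)) if match else None


# Shortened version of the example output shown above.
output = json.dumps({
    "result": "There's a visible scratch along the right edge. "
              "Overall quality rating: 6/10."
})

rating = extract_quality_rating(output)
PASS_THRESHOLD = 7  # hypothetical cut-off, not a platform default

if rating is None:
    decision = "manual review"  # the LLM did not give a rating in the expected form
elif rating >= PASS_THRESHOLD:
    decision = "pass"
else:
    decision = "flag for inspection"

print(rating, decision)  # 6 flag for inspection
```

Asking the LLM in the prompt to end its answer with a fixed phrase such as "Overall quality rating: N/10" makes this kind of extraction more reliable.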

Best Practices

  • Use high-quality images for more accurate analysis. Poor resolution or lighting can affect the results.

  • Be specific in your prompts to guide the analysis toward the information you need.

  • Choose the LLM provider that fits your use case; some providers excel at certain types of visual analysis.

  • Combine system and user prompts to create a layered approach, with system prompts providing general context and user prompts focusing on specific details.

  • Test with multiple images to ensure consistent analysis across various inputs.

Troubleshooting

| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| The analysis is too general or vague | Insufficient prompting or guidance | Add more specific prompts that direct the analysis to the aspects you are interested in. |
| Error: "Unable to process image" | The image URL is inaccessible or the image format is unsupported | Verify that the URL is publicly accessible over the internet (not behind authentication or on a private network) and use common image formats (JPG, PNG, and so on). |
| Analysis misses important details | Image quality issues or limitations of the selected LLM | Use higher-quality images, or try a different LLM provider with better image analysis capabilities. |
| Response timeout | Image is too large or complex | Reduce the image size, or split a complex analysis into multiple focused queries. |
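When you see "Unable to process image", a quick check of the URL can narrow down the cause. The following Python sketch sends a HEAD request and inspects the Content-Type header. The helper name `check_image_url` is hypothetical and the check is only a heuristic: some servers reject HEAD requests, and the component's own validation may differ.

```python
import urllib.request


def check_image_url(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> str:
    """Return "ok" or a short reason the URL may fail (hypothetical helper)."""
    try:
        # HEAD fetches only the headers, so the image itself is not downloaded.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
    except (OSError, ValueError) as error:
        # OSError covers HTTP errors (such as 401, 403, 404), DNS failures, and
        # timeouts; ValueError covers malformed URLs.
        return f"not reachable: {error}"
    if not content_type.lower().startswith("image/"):
        return f"not an image: Content-Type is {content_type!r}"
    return "ok"
```

Call it as `check_image_url("https://example.com/image.jpg")`. The request is made from your own machine, so a URL on an internal network can pass this check and still be unreachable for the LLM provider.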

Limitations and Considerations

| Limitation/Consideration | Description |
|--------------------------|-------------|
| Image Accessibility Requirements | Images must be openly available over the internet. Private images, images behind authentication, or images on internal networks cannot be processed by the LLM providers. |
| Provider Capabilities | Different LLM providers have varying levels of image analysis capabilities. Results may differ between providers. |
| Content Limitations | Most providers have filters for harmful or inappropriate content and may refuse to analyze certain images. |
| Processing Time | Complex image analysis may take longer to process compared to text-only queries. |
| Privacy Considerations | Be mindful that images are sent to third-party LLM providers. Avoid sending sensitive or confidential visual information. |
| Cost Implications | Image analysis typically consumes more tokens/credits than text analysis, which may impact usage costs. |