Vision Processing Examples
This guide shows how to use vision-enabled agents in Swarms to analyze images and process visual information. It covers both OpenAI and Anthropic vision models across three use cases: quality control, product analysis, and batch processing.
Prerequisites
- Python 3.7+
- OpenAI API key (for GPT-4V)
- Anthropic API key (for Claude 3)
- Swarms library

Installation
Install the Swarms library from PyPI with pip: pip3 install -U swarms
Environment Variables
WORKSPACE_DIR="agent_workspace"
OPENAI_API_KEY=""  # Required for GPT-4V
ANTHROPIC_API_KEY=""  # Required for Claude 3
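Before creating an agent, it can help to confirm that the key for your chosen provider is actually set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is illustrative and is not part of Swarms.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    # Hypothetical helper: defaults to the real process environment.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Check only the key for the provider you plan to use.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.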
Working with Images
Supported Image Formats
Vision-enabled agents accept the following image formats:
| Format | Description | 
|---|---|
| JPEG/JPG | Standard image format with lossy compression | 
| PNG | Lossless format supporting transparency | 
| GIF | Animated format (only first frame used) | 
| WebP | Modern format with both lossy and lossless compression | 
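File extensions are not always reliable, so one option is to identify the format from the file's leading bytes. The sketch below checks the well-known signatures for the four formats in the table; the function names are illustrative and this is not how Swarms itself inspects images.

```python
def detect_image_format(header: bytes):
    """Return 'jpeg', 'png', 'gif', 'webp', or None based on leading bytes."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None


def detect_image_file(path):
    """Read the first 12 bytes of a local file and identify its format."""
    with open(path, "rb") as f:
        return detect_image_format(f.read(12))
```

A return value of `None` means the file does not match any of the four signatures and is probably worth skipping or converting first.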
Image Guidelines
- Maximum file size: 20MB
- Recommended resolution: At least 512x512 pixels
- Image should be clear and well-lit
- Avoid heavily compressed or blurry images
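The size and format guidelines above can be checked locally before an image is sent to a model. This is a rough sketch with assumptions: `check_image_file` is a hypothetical helper, 20MB is interpreted as 20 * 1024 * 1024 bytes, and resolution is not checked because that would need an imaging library.

```python
import os

# Assumption: "20MB" from the guidelines is treated as 20 MiB.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
SUPPORTED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def check_image_file(path):
    """Return a list of problems found; an empty list means the file passed."""
    if not os.path.isfile(path):
        return [f"not a file: {path}"]
    problems = []
    if not path.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append("unsupported file extension")
    if os.path.getsize(path) > MAX_IMAGE_BYTES:
        problems.append("file is larger than 20MB")
    return problems
```

The helper only handles local paths; image URLs would need a different check.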
 
Examples
1. Quality Control with GPT-4V
from swarms.structs import Agent
from swarms.prompts.logistics import Quality_Control_Agent_Prompt
# Load your image
factory_image = "path/to/your/image.jpg"  # Local file path
# Or use a URL
# factory_image = "https://example.com/image.jpg"
# Initialize the quality control agent with an OpenAI vision-capable model
quality_control_agent = Agent(
    agent_name="Quality Control Agent",
    agent_description="A quality control agent that analyzes images and provides detailed quality reports.",
    model_name="gpt-4.1-mini",
    system_prompt=Quality_Control_Agent_Prompt,
    multi_modal=True,
    max_loops=1
)
# Run the analysis
response = quality_control_agent.run(
    task="Analyze this image and provide a detailed quality control report",
    img=factory_image
)
print(response)
2. Visual Analysis with Claude 3
from swarms.structs import Agent
from swarms.prompts.logistics import Visual_Analysis_Prompt
# Load your image
product_image = "path/to/your/product.jpg"
# Initialize visual analysis agent with Claude 3
visual_analyst = Agent(
    agent_name="Visual Analyst",
    agent_description="An agent that performs detailed visual analysis of products and scenes.",
    model_name="anthropic/claude-3-opus-20240229",
    system_prompt=Visual_Analysis_Prompt,
    multi_modal=True,
    max_loops=1
)
# Run the analysis
response = visual_analyst.run(
    task="Provide a comprehensive analysis of this product image",
    img=product_image
)
print(response)
3. Image Batch Processing
from swarms.structs import Agent
import os
def process_image_batch(image_folder, agent):
    """Process multiple images in a folder"""
    results = []
    for image_file in os.listdir(image_folder):
        if image_file.lower().endswith(('.png', '.jpg', '.jpeg', '.webp')):
            image_path = os.path.join(image_folder, image_file)
            response = agent.run(
                task="Analyze this image",
                img=image_path
            )
            results.append((image_file, response))
    return results
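# Example usage (reuses the visual_analyst agent defined in example 2)
image_folder = "path/to/image/folder"
batch_results = process_image_batch(image_folder, visual_analyst)
for image_file, response in batch_results:
    print(f"{image_file}: {response}")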
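The loop above handles one image at a time. For larger folders, the calls could be issued concurrently with a thread pool, as in the sketch below. The function name `process_image_batch_parallel` and the `max_workers` default are illustrative. It also assumes that sharing a single agent across threads is safe, which this document does not confirm; if in doubt, create one agent per worker.

```python
import os
from concurrent.futures import ThreadPoolExecutor

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def process_image_batch_parallel(image_folder, agent, max_workers=4):
    """Analyze every image in a folder using a small thread pool."""
    image_files = sorted(
        name for name in os.listdir(image_folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )

    def analyze(image_file):
        image_path = os.path.join(image_folder, image_file)
        return agent.run(task="Analyze this image", img=image_path)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so responses line up with image_files.
        responses = list(pool.map(analyze, image_files))
    return list(zip(image_files, responses))
```

Keeping `max_workers` small helps stay within provider rate limits, since every worker issues its own API call.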
Best Practices
| Category | Best Practice | Description |
|---|---|---|
| Image Preparation | Format Support | Ensure images are in supported formats (JPEG, PNG, GIF, WebP) |
| Image Preparation | Size & Quality | Optimize image size and quality for better processing |
| Image Preparation | Image Quality | Use clear, well-lit images for accurate analysis |
| Model Selection | GPT-4V Usage | Use for general vision tasks and detailed analysis |
| Model Selection | Claude 3 Usage | Use for complex reasoning and longer outputs |
| Model Selection | Batch Processing | Consider batch processing for multiple images |
| Error Handling | Path Validation | Always validate image paths before processing |
| Error Handling | API Error Handling | Implement proper error handling for API calls |
| Error Handling | Rate Monitoring | Monitor API rate limits and token usage |
| Performance Optimization | Result Caching | Cache results when processing the same images |
| Performance Optimization | Batch Processing | Use batch processing for multiple images |
| Performance Optimization | Parallel Processing | Implement parallel processing for large datasets |
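Several of the practices in the table (path validation, API error handling, and result caching) can be combined in one small wrapper. The following is a sketch under stated assumptions, not a Swarms feature: `analyze_with_cache` and `file_digest` are hypothetical helpers, the cache is a plain dictionary, only local file paths are handled, and a broad `except Exception` is used because the exact exception types depend on the model provider.

```python
import hashlib
import os
import time


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def analyze_with_cache(agent, image_path, task, cache, retries=3, delay=1.0):
    """Validate the path, reuse cached results, and retry failed calls."""
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Image not found: {image_path}")

    # Key on file contents so a renamed copy of the same image still hits.
    key = (file_digest(image_path), task)
    if key in cache:
        return cache[key]

    last_error = None
    for attempt in range(retries):
        try:
            result = agent.run(task=task, img=image_path)
            cache[key] = result
            return result
        except Exception as exc:  # assumption: provider errors vary
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"Analysis failed after {retries} attempts") from last_error
```

With an agent such as those in the examples above, a call might look like `analyze_with_cache(agent, "path/to/your/image.jpg", "Analyze this image", cache={})`, reusing the same `cache` dictionary across calls.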