# Agent Output Type Examples with Vision Capabilities
This example demonstrates how to use different output types with Swarms agents, including vision-enabled agents that can analyze images. The `output_type` parameter controls how the agent's response is formatted, which makes it easier to integrate the result with different parts of your application.
## Prerequisites
- Python 3.7+
- OpenAI API key
- Anthropic API key (optional, for Claude models)
- Swarms library
 
## Installation

Install the Swarms library from PyPI with `pip3 install -U swarms`.
## Environment Variables
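Set the following environment variables, for example in a `.env` file:

```bash
WORKSPACE_DIR="agent_workspace"
OPENAI_API_KEY=""  # Required for OpenAI models, including the vision-capable gpt-4.1-mini used below
ANTHROPIC_API_KEY=""  # Optional, for Claude models
```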
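Because the variables above are declared with empty values, it can help to fail fast when the required key was never filled in. The following is a minimal sketch, not part of the Swarms API: the helper `missing_required_env` is hypothetical and only inspects `os.environ`. If your values live in a `.env` file, they need to be loaded into the process environment first (for example with python-dotenv's `load_dotenv()`).

```python
import os

# Hypothetical helper (not part of Swarms). Only OPENAI_API_KEY is marked
# "Required" above; ANTHROPIC_API_KEY is optional, so it is not checked here.
REQUIRED_VARS = ("OPENAI_API_KEY",)


def missing_required_env(names=REQUIRED_VARS) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]
```

A script could call `missing_required_env()` at startup and exit with a clear message if the returned list is non-empty, rather than failing later inside the first model call.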
## Examples
### Vision-Enabled Quality Control Agent
```python
from swarms.structs import Agent
from swarms.prompts.logistics import (
    Quality_Control_Agent_Prompt,
)

# Path to the image to analyze (replace with your own file)
factory_image = "image.jpg"

# Quality control agent backed by a vision-capable model
quality_control_agent = Agent(
    agent_name="Quality Control Agent",
    agent_description="A quality control agent that analyzes images and provides a detailed report on the quality of the product in the image.",
    model_name="gpt-4.1-mini",
    system_prompt=Quality_Control_Agent_Prompt,
    multi_modal=True,  # enable image inputs
    max_loops=2,  # run two reasoning loops over the task
    output_type="str-all-except-first",  # string output of the conversation, all messages except the first
)

# Pass the image path alongside the task
response = quality_control_agent.run(
    task="what is in the image?",
    img=factory_image,
)
print(response)
```
## Supported Image Formats
Vision-enabled agents support the following image formats:
| Format | Description | 
|---|---|
| JPEG/JPG | Standard image format with lossy compression | 
| PNG | Lossless format supporting transparency | 
| GIF | Animated format (only first frame used) | 
| WebP | Modern format with both lossy and lossless compression | 
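A file extension does not guarantee the actual format (a `.jpg` file may really be a PNG), so checking the file header before sending an image to the agent can catch mislabeled files. The sketch below is an illustration, not part of Swarms: the helper `detect_image_format` is hypothetical, and it recognizes only the four formats in the table by their standard leading "magic" bytes.

```python
from typing import Optional

# Standard file signatures ("magic bytes") for the formats in the table above.
_SIGNATURES = (
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
)


def detect_image_format(path: str) -> Optional[str]:
    """Guess the image format from the file header; return None if unrecognized."""
    with open(path, "rb") as f:
        header = f.read(12)  # 12 bytes covers every signature checked here
    for magic, name in _SIGNATURES:
        if header.startswith(magic):
            return name
    # WebP is a RIFF container: "RIFF" at bytes 0-3 and "WEBP" at bytes 8-11.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    return None
```

For a genuine JPEG, `detect_image_format(factory_image)` would return `"JPEG"`; a `None` result suggests the file should be converted before it is passed to `run(..., img=...)`.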
## Best Practices for Vision Tasks
| Best Practice | Description | 
|---|---|
| Image Quality | Ensure images are clear and well-lit for optimal analysis | 
| Image Size | Keep images under 20MB and in supported formats | 
| Task Specificity | Provide clear, specific instructions for image analysis | 
| Model Selection | Use vision-capable models (e.g., `gpt-4.1-mini`) for image tasks |
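The size and format guidelines above can be enforced with a small pre-flight check before calling the agent. This is a sketch under stated assumptions, not a Swarms feature: the helper `check_image` and its constants are hypothetical, "20MB" is interpreted as 20 × 1024 × 1024 bytes, and the format test relies only on the file extension.

```python
import os

# Hypothetical pre-flight check (not part of Swarms) based on the table above.
# Assumption: "20MB" means 20 * 1024 * 1024 bytes; adjust to your provider's limit.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def check_image(path: str, max_bytes: int = MAX_IMAGE_BYTES) -> list:
    """Return a list of problems with the image at `path`; an empty list means it passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    return problems
```

For example, calling `check_image(factory_image)` before `quality_control_agent.run(...)` and stopping when the list is non-empty avoids spending an API call on an image the model cannot use.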