Summary

Moltis opened a pull request that downscales oversized images before they enter model context, addressing cases where a single full-resolution photo could consume hundreds of thousands of tokens. The change targets a practical failure mode for multimodal agents: image inputs that exhaust the context budget before the task starts.

What changed

Moltis PR #1138 downscales oversized images before embedding them into model context.
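
A minimal sketch of the kind of size guard described here, assuming a cap on the longer image edge. The function name `fit_within` and the 1568-pixel default are hypothetical illustrations, not taken from PR #1138. The source does not state the PR's actual threshold or resizing method.

```python
def fit_within(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Return (width, height) scaled so the longer edge is at most max_edge.

    max_edge=1568 is an assumed example value, not the PR's setting.
    The aspect ratio is preserved and small images are never upscaled.
    """
    if width <= 0 or height <= 0 or max_edge <= 0:
        raise ValueError("width, height, and max_edge must be positive")

    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already within budget; leave untouched

    def scale(value: int) -> int:
        # Integer arithmetic, rounded to nearest, clamped to at least 1 pixel.
        return max(1, (value * max_edge + longest // 2) // longest)

    return scale(width), scale(height)


if __name__ == "__main__":
    # A 4032x3024 photo under the assumed cap.
    print(fit_within(4032, 3024))
    # An image that is already small passes through unchanged.
    print(fit_within(800, 600))
```

The resulting dimensions would then be handed to whatever image library performs the resize. Computing them separately keeps the budget decision easy to test without decoding any pixels.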

Why it matters

Multimodal agents need preprocessing controls for images just as text agents need compaction and retrieval. Without size guards, ordinary phone photos can cause immediate context overflow, failed turns, or large unintended costs.

Evidence excerpt

Agents Radar reports PR #1138 as a fix that downscales oversized images before they enter model context, avoiding failures in which a full-resolution photo becomes a base64 payload of roughly 350K tokens.
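
To see how a photo can reach that scale, here is a rough back-of-the-envelope sketch. Base64 encodes every 3 bytes as 4 characters, which is a property of the encoding. The characters-per-token ratio is an assumption for illustration only; real tokenizers vary, and the source does not say how the 350K figure was measured. The helper names are hypothetical.

```python
import math


def base64_length(num_bytes: int) -> int:
    """Length in characters of padded base64 output for num_bytes of input."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return 4 * ((num_bytes + 2) // 3)  # 3 input bytes -> 4 output characters


def estimate_tokens(num_bytes: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a base64 payload.

    chars_per_token=4.0 is an assumed ratio, not a measured one.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(base64_length(num_bytes) / chars_per_token)


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(base64_length(one_mib))    # characters after encoding
    print(estimate_tokens(one_mib))  # estimate under the assumed ratio
```

Under the assumed ratio, the estimate grows linearly with file size, so reducing the encoded bytes reduces the token cost in direct proportion.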

Sources