Skip to content

Python: Allow @tool functions to return rich content (images, audio)#4331

Open
giles17 wants to merge 4 commits intomicrosoft:mainfrom
giles17:giles/tool-rich-content-results
Open

Python: Allow @tool functions to return rich content (images, audio)#4331
giles17 wants to merge 4 commits intomicrosoft:mainfrom
giles17:giles/tool-rich-content-results

Conversation

@giles17
Copy link
Contributor

@giles17 giles17 commented Feb 26, 2026

Description

Closes #4272

When a @tool function returns a Content object (e.g. Content.from_data(image_bytes, "image/png")), the framework now preserves it as rich content that the model can perceive natively — instead of serializing it to a JSON string.

Problem

Previously, FunctionTool.parse_result() serialized any Content return to JSON text via _make_dumpable(). The model received {"type": "function_call_output", "output": "{...}"} — a text blob, not the actual image. The same issue existed in MCP tool results where ImageContent was JSON-serialized.

Solution

Added an items field to function_result Content that carries rich Content objects (images, audio, files) alongside the text result. Providers format these items using their existing multi-modal content handling.

User API — no decorator changes needed:

@tool
async def capture_screenshot(url: str) -> Content:
    image_bytes = await take_screenshot(url)
    return Content.from_data(data=image_bytes, media_type="image/png")

@tool
async def render_chart(data: str) -> list[Content]:
    image_bytes = render(data)
    return [
        Content.from_text("Chart rendered."),
        Content.from_data(data=image_bytes, media_type="image/png"),
    ]

Changes

Core framework:

  • _types.py: Added items field to Content and from_function_result()
  • _tools.py: Updated parse_result() to preserve Content returns instead of JSON-serializing. Added _build_function_result() helper. Updated invoke() return type.
  • _mcp.py: Updated _parse_tool_result_from_mcp() to return list[Content] for image/audio instead of JSON strings

All 6 providers updated:

  • OpenAI Responses: Injects rich items as user message with input_image after function_call_output
  • OpenAI Chat Completions: Formats tool message content as multi-part array with image_url
  • Anthropic: Formats rich items as native image blocks in tool_result content array
  • Bedrock/Ollama/Azure-AI: Logs warning when rich items present (unsupported by these APIs)

Tests: 8 new tests + 2 updated existing tests, all passing.

…udio)

Add support for tool functions to return Content objects that the model can perceive natively. Closes microsoft#4272

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 26, 2026 19:48
@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Feb 26, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/anthropic/agent_framework_anthropic
   _chat_client.py3974688%457, 544, 546, 664–669, 677–678, 683, 687, 721–722, 785, 806–807, 850–852, 854, 867–868, 875–877, 881–883, 887–890, 1003, 1013, 1047, 1069, 1190, 1217–1218, 1235, 1248, 1261, 1286–1287
packages/azure-ai/agent_framework_azure_ai
   _chat_client.py4807684%391–392, 394, 578, 583–584, 586–587, 590, 593, 595, 600, 861–862, 864, 867, 870, 873–878, 881, 883, 891, 903–905, 909, 912–913, 921–924, 934, 942–945, 947–948, 950–951, 958, 966–967, 975–976, 981–982, 986–993, 998, 1001, 1009, 1015, 1023–1025, 1028, 1050–1051, 1184, 1212, 1227, 1343, 1395, 1470
packages/core/agent_framework
   _mcp.py4256484%97–98, 108–113, 124, 129, 181–182, 192–197, 207–208, 222, 269, 278, 341, 349, 500, 567, 602, 604, 608–609, 611–612, 666, 681, 699, 740, 845, 858–863, 885, 934–935, 941–943, 962, 987–988, 992–996, 1013–1017, 1161
   _tools.py8929289%166–167, 322, 324, 342–344, 351, 369, 383, 390, 397, 413, 415, 422, 459, 484, 488, 505–507, 554–556, 579, 603, 646, 668, 731–737, 773, 784–795, 817–819, 824, 828, 842–844, 883, 952, 962, 972, 1028, 1059, 1078, 1356, 1440, 1460, 1531–1535, 1657, 1661, 1685, 1711, 1713, 1729, 1731, 1816, 1846, 1866, 1868, 1921, 1984, 2175–2176, 2224, 2292–2293, 2351, 2356, 2363
   _types.py10258591%59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 277, 634–635, 1020, 1083, 1100, 1118, 1123, 1141, 1151, 1168–1169, 1171, 1189–1190, 1192, 1199–1200, 1202, 1237, 1248–1249, 1251, 1289, 1516, 1568, 1659–1664, 1686, 1691, 1857, 1869, 2121, 2142, 2237, 2466, 2673, 2743, 2755, 2773, 2971–2973, 2976–2978, 2982, 2987, 2991, 3075–3077, 3106, 3160, 3179–3180, 3183–3187, 3193
packages/core/agent_framework/openai
   _chat_client.py2913488%210, 240–241, 245, 363, 370, 446–453, 455–458, 468, 546, 548, 564, 576–584, 622, 638, 678
   _responses_client.py6459086%292–295, 299–300, 303–304, 310–311, 316, 329–335, 356, 364, 387, 556, 611, 615, 617, 619, 621, 697, 707, 712, 755, 834, 851, 864, 925, 936, 940–942, 1029, 1034, 1038–1040, 1044–1045, 1068, 1137, 1159–1160, 1175–1176, 1194–1195, 1236–1239, 1348–1349, 1365, 1367, 1446–1454, 1573, 1628, 1643, 1686–1689, 1697–1698, 1700–1702, 1716–1718, 1728–1729, 1735, 1750
TOTAL22276278387% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
4722 247 💤 0 ❌ 0 🔥 1m 16s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables @tool-decorated functions to return rich content (images, audio, files) that models can perceive natively, rather than having them serialized to JSON strings. This addresses issue #4272 by allowing vision-in-the-loop workflows where tools like capture_screenshot() or render_chart() can feed image content back into the model for analysis.

Changes:

  • Core framework now preserves Content objects with rich media instead of JSON-serializing them
  • Added items field to function_result Content to carry rich media alongside text results
  • Updated all 6 provider implementations to handle rich content (OpenAI Responses, OpenAI Chat, Anthropic support it natively; Bedrock, Ollama, Azure-AI log warnings)

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
python/packages/core/agent_framework/_types.py Added items parameter to Content.init and from_function_result() to store rich media items; updated to_dict() to serialize items
python/packages/core/agent_framework/_tools.py Updated parse_result() to return str or list[Content] instead of always serializing; added _build_function_result() helper to separate text and rich items; updated invoke() return type
python/packages/core/agent_framework/_mcp.py Updated _parse_tool_result_from_mcp() to return list[Content] for results containing images/audio instead of JSON strings
python/packages/core/agent_framework/openai/_responses_client.py Injects rich items as separate user message with input_image content after function_call_output
python/packages/core/agent_framework/openai/_chat_client.py Formats tool message content as multi-part array with text and image_url/input_audio/file parts when items present
python/packages/anthropic/agent_framework_anthropic/_chat_client.py Formats rich items as native image blocks in tool_result content array; handles both data and uri image types
python/packages/bedrock/agent_framework_bedrock/_chat_client.py Logs warning when rich items present (Bedrock doesn't support them); omits items from tool result
python/packages/ollama/agent_framework_ollama/_chat_client.py Logs warning when rich items present (Ollama doesn't support them); omits items from tool result
python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py Logs warning when rich items present (Azure AI Agents doesn't support them); omits items from tool output
python/packages/core/tests/core/test_types.py Added 8 new tests for parse_result(), _build_function_result(), and Content.from_function_result() with items; updated 2 existing tests to expect list[Content] instead of JSON
python/packages/core/tests/core/test_mcp.py Updated test_parse_tool_result_from_mcp to expect list[Content] for results with images; added test_parse_tool_result_from_mcp_audio_content

Copy link
Member

@eavanvalkenburg eavanvalkenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we recently made the switch to restrict return types, and one of the reasons was performance, the constant parsing of these results, both for otel and for the client is a bit wasteful. So could you have a look at whether a cache could be used in the parsing function in the different places? And we also need to do integration testing with this because openai chat shouldn't support this, so let's be sure, both with openai, azure openai, ollama and foundry local and maybe others that derive from openai chat

if rich_items:
# Return rich content list with text items included
result: list[Content] = []
for text in text_parts:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change the order of the content since we don't know what came first, not sure if that's an issue but might be good to doublecheck

return normalized


def _build_function_result(call_id: str, function_result: str | list[Content]) -> Content:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just build this into the .from_function_result method

# Always include content for tool results - API requires it even if empty
# Functions returning None should still have a tool result message
args["content"] = content.result if content.result is not None else ""
if content.items:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to their own docs, this is not supported for the chat completion api: https://developers.openai.com/api/reference/resources/chat/subresources/completions/methods/create/

@eavanvalkenburg
Copy link
Member

This is also #2513

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Feature]: Allow @tool functions to return image content that the model can analyze

4 participants