What function calling is
Function calling (also called tool use) lets a language model take actions in the real world without ever running code itself. You describe the functions your application can perform as JSON schemas; the model, given a user request, decides which function to call and produces the arguments as structured JSON. Your code then executes the real function and feeds the result back. This is how an LLM goes from a chatbot to an assistant that can check the weather, query a database, book a meeting, or hit any API you expose.
How the loop works
You send the conversation plus a tools array describing your functions. The
model responds in one of two ways: a normal message (it is done), or a
tool_calls entry containing a function name and JSON arguments (it wants you to
act). When you get a tool call, you parse the arguments, run your real function,
and append a tool message containing the result and the matching
tool_call_id. Then you call the model again with the updated conversation. It
now sees the result and either calls another tool or produces the final answer.
That cycle — call, execute, return, call again — is the tool loop, and you
cap its iterations so it can never run forever.
The make-or-break detail is the schema. A clear function name, a precise description of when to use it, and typed parameters (with enums and required fields) dramatically improve how reliably the model picks the right tool and fills in valid arguments. The builder below lets you define a function and generates both the tool schema and the Python loop that executes it.
Tips and guardrails
Write descriptions for the model, not for humans — state exactly when the function should and should not be used. Use enums and required fields to constrain arguments, and enable strict mode for schema-guaranteed JSON. Always validate parsed arguments in your own code before acting; the schema guides the model but your function is the last line of defence. Handle multiple parallel tool calls in one response. And bound the loop with a maximum iteration count so a confused model cannot spin forever and burn tokens.