
Chatbot with Tool Calling

In this example, we build a chatbot that can do more than just generate text — it can use tools. Normally, a language model can only answer based on what it already knows. It cannot check the weather, look up fresh data, or run your code. Tool calling changes that. It allows the model to decide that it needs outside information, ask your program to run a function, receive the result, and then use that result to produce a better, more accurate answer. This makes the chatbot feel less like a text generator and more like a real assistant that can interact with the world around it. In this tutorial, you will learn how to build this step by step using Ollama, Python, and Mercury, while also streaming the model’s thinking and final response in real time.

📓 Full notebook code is in our GitHub repository.

Tool calling chat web application

In this guide you will learn how to:

  • Stream AI responses word by word
  • Stream the model’s thinking separately
  • Let the model use tools
  • Build a simple chat interface
  • Turn everything into a web app

Install Ollama using the official guide: docs.ollama.com/quickstart

We use GPT-OSS 20B.

Download and start the model:

ollama run gpt-oss:20b

After the download finishes, the model runs locally on your machine and you can already chat with it in the terminal. But let’s move further!

Running GPT-OSS model locally with Ollama

We need two packages:

  • ollama — to talk to the model
  • mercury — to build the chat app

Both packages are open source and easy to install:

pip install ollama mercury

Import them in the first code cell:

import ollama
import mercury as mr

We are ready to code more :)

We define a simple Python function. The model can call this function as a tool.

def get_temperature(city: str) -> str:
    """Get the current temperature for a city"""
    temperatures = {
        'New York': '22°C',
        'London': '15°C'
    }
    return temperatures.get(city, 'Unknown')

This function:

  • Takes a city name
  • Returns the temperature as a string
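
You can sanity-check it right away in a notebook cell (the Paris lookup is only there to show the fallback value):

print(get_temperature('London'))   # 15°C
print(get_temperature('Paris'))    # Unknown (not in our small lookup table)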

We will store our conversation in the messages list. It provides context to the local LLM; without it, the model would not remember earlier turns of the dialog.

messages = []
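
For orientation, this is roughly the shape the history will take after one tool-using exchange. The values below are illustrative; the real entries are built in the next sections:

# Illustrative only: what messages might hold after one tool-using turn.
messages_example = [
    {'role': 'user', 'content': 'What is the temperature in London?'},
    {'role': 'assistant', 'thinking': '(streamed reasoning)', 'content': '',
     'tool_calls': []},  # tool call objects collected from the stream
    {'role': 'tool', 'tool_name': 'get_temperature', 'content': '15°C'},
]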

Let’s create a Chat widget for displaying messages in the next code cell:

chat = mr.Chat(placeholder="💬 Start conversation")

The user will provide prompts with the ChatInput widget.

prompt = mr.ChatInput()

Chat interface components:

  • chat shows messages on screen
  • prompt is where the user types
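
If you want to verify the widgets are wired up before adding the model, a minimal echo cell works. This is a throwaway sketch, not part of the final app; Mercury re-executes the cell on every submitted prompt.

# Echo sketch: show each submitted prompt back in the chat.
if prompt.value:
    chat.add(mr.Message(markdown=prompt.value, role="user"))
    chat.add(mr.Message(markdown="You said: " + prompt.value, role="assistant"))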

5. Stream Thinking and Answer from the Model


This is the main logic. It runs after every new prompt from the user: the code cell is re-executed for every new input.

if prompt.value:
    usr_msg = mr.Message(markdown=prompt.value, role="user")
    chat.add(usr_msg)
    messages += [{'role': 'user', 'content': prompt.value}]
    stream = ollama.chat(
        model='gpt-oss:20b',
        messages=messages,
        tools=[get_temperature],
        stream=True,
    )
    ai_msg = mr.Message(role="assistant", emoji="🤖")
    chat.add(ai_msg)
    thinking, content = "", ""
    tool_calls = []
    for chunk in stream:
        if chunk.message.thinking:
            if thinking == "":
                ai_msg.append_markdown("**Thinking:** ")
            thinking += chunk.message.thinking
            ai_msg.append_markdown(chunk.message.thinking)
        elif chunk.message.content:
            if content == "":
                ai_msg.append_markdown("\n\n**Answer:** ")
            content += chunk.message.content
            ai_msg.append_markdown(chunk.message.content)
        elif chunk.message.tool_calls:
            tool_calls.extend(chunk.message.tool_calls)
    messages += [{
        'role': 'assistant',
        'thinking': thinking,
        'content': content,
        'tool_calls': tool_calls
    }]

What happens here?

  1. The user message is created and displayed in the chat

  2. We call the model with streaming

  3. We listen to chunks from the model

  4. We separate:

    • thinking
    • final answer
    • tool calls

Here is a screenshot of the notebook and the app preview:

Tool calling chat web application preview in Jupyter

6. 🔍 How the Last Code Piece Works (Tool Execution)


Let’s take a closer look at how we handle the tools the model asked to use.

for tool in tool_calls:
    if tool.function.name == "get_temperature":
        tool_msg = mr.Message(role="tool", emoji="")
        chat.add(tool_msg)
        result = get_temperature(**tool.function.arguments)
        tool_msg.append_markdown("Temperature is " + result)
        messages += [{
            "role": "tool",
            "tool_name": "get_temperature",
            "content": result
        }]

Let’s break this down step by step:

for tool in tool_calls:

During streaming, the model may say:

“I want to call a tool.”

Those requests are stored in tool_calls. Now we go through each one.
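
Each entry exposes the requested function name and its arguments. While developing, a quick debug print (using the same attributes as the code above) shows what the model asked for:

for tool in tool_calls:
    print(tool.function.name, tool.function.arguments)
    # e.g. get_temperature {'city': 'London'}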

if tool.function.name == "get_temperature":

The model tells us the name of the function. We check if it matches our function.
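
With a single tool this if check is all we need. If you later register several tools, a name-to-function mapping is a common pattern; here is a sketch, not part of this tutorial’s code:

# Hypothetical extension: dispatch by tool name instead of an if-chain.
available_tools = {'get_temperature': get_temperature}

for tool in tool_calls:
    fn = available_tools.get(tool.function.name)
    if fn is not None:
        result = fn(**tool.function.arguments)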

tool_msg = mr.Message(role="tool", emoji="")
chat.add(tool_msg)

We create a message from the tool, so the user sees that a tool is being used.

result = get_temperature(**tool.function.arguments)

Very important part.

  • tool.function.arguments contains data like:

    {"city": "London"}
  • **tool.function.arguments means: unpack the dictionary into keyword arguments. So the call becomes:

    get_temperature(city="London")

Now Python runs the function and returns "15°C".
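
The same unpacking works on any dictionary of keyword arguments:

args = {'city': 'London'}
get_temperature(**args)   # identical to get_temperature(city='London'), returns '15°C'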

tool_msg.append_markdown("Temperature is " + result)

The user now sees:

Temperature is 15°C

6️⃣ Add tool result to conversation history

messages += [{
    "role": "tool",
    "tool_name": "get_temperature",
    "content": result
}]

This is very important.

We tell the model:

👉 “The tool ran.” 👉 “Here is the result.”

Now the model can continue and use this information in its final answer.
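
To get that final answer, the model needs to be called again with the updated history. A minimal sketch, following the same streaming pattern as above (the full notebook may structure this differently):

# Sketch: ask the model again, now that the tool results are in the history.
if tool_calls:
    followup = ollama.chat(
        model='gpt-oss:20b',
        messages=messages,
        tools=[get_temperature],
        stream=True,
    )
    final_msg = mr.Message(role="assistant", emoji="🤖")
    chat.add(final_msg)
    for chunk in followup:
        if chunk.message.content:
            final_msg.append_markdown(chunk.message.content)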

Normal chatbot:

User → Model → Answer

This system:

User → Model Thinking → Tool Call → Tool Result → Final Answer

This is the base for:

  • AI agents
  • Smart assistants
  • Tool-using AI systems

Running the notebook as a web app is as simple as starting the Mercury server with a single command:

mercury

Mercury will detect all *.ipynb files and serve them as web applications.

Great job — you have just built a chatbot that behaves more like a real assistant than a simple text generator. Instead of only guessing answers from what the model already knows, your chatbot can decide to use a tool, run a Python function, and use the result to give a better and more accurate response. At the same time, users can watch how the model thinks and see the final answer appear step by step, which makes the whole system more transparent and easier to understand.

You also learned how to connect a local model running with Ollama to a Python application, how to pass conversation history, how tool calls are detected and executed, and how to show everything in an interactive interface using Mercury. By turning the notebook into a web app, you transformed a small code example into something people can actually use.

This pattern — model thinking, tool calling, tool result, and final answer — is the core idea behind modern AI assistants and agents. Once you understand this flow, you can connect models to databases, APIs, or your own business logic, and build systems that do real work, not just chat.