
Chatbot with Tool Calling

In this example, we build a chatbot that can do more than just generate text — it can use tools. Normally, a language model can only answer based on what it already knows. It cannot check the weather, look up fresh data, or run your code. Tool calling changes that. It allows the model to decide that it needs outside information, ask your program to run a function, receive the result, and then use that result to produce a better, more accurate answer. This makes the chatbot feel less like a text generator and more like a real assistant that can interact with the world around it. In this tutorial, you will learn how to build this step by step using Ollama, Python, and Mercury, while also streaming the model’s thinking and final response in real time.

📓 Full notebook code is in our GitHub repository.

Tool calling chat web application

In this guide you will learn how to:

  • Stream AI responses word by word
  • Stream the model’s thinking separately
  • Let the model use tools
  • Build a simple chat interface
  • Turn everything into a web app

Install Ollama using the official guide: docs.ollama.com/quickstart

We use GPT-OSS 20B.

Download and start the model:

ollama run gpt-oss:20b

After the download finishes, the model runs locally on your machine and you can already chat with it in the terminal. But let’s move further!

Running GPT-OSS model locally with Ollama

We need two packages:

  • ollama — to talk to the model
  • mercury — to build the chat app

Both packages are open source and easy to install:

pip install ollama mercury

Import them in the first code cell:

import ollama
import mercury as mr

We are ready to code more :)

We define a simple Python function. The model can call this function as a tool.

def get_temperature(city: str) -> str:
    """Get the current temperature for a city"""
    temperatures = {
        'New York': '22°C',
        'London': '15°C'
    }
    return temperatures.get(city, 'Unknown')

This function:

  • Takes a city name
  • Returns the temperature as a string
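
You can sanity-check it right away in a notebook cell (the Paris lookup is only there to show the fallback value):

print(get_temperature('London'))   # 15°C
print(get_temperature('Paris'))    # Unknown (not in our small lookup table)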

We will store our conversation in the messages list. It provides context to the local LLM; without it, the model would not remember earlier turns of the dialog.

messages = []
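
For orientation, this is roughly the shape the history will take after one tool-using exchange. The values below are illustrative; the real entries are built in the next sections:

# Illustrative only: what messages might hold after one tool-using turn.
messages_example = [
    {'role': 'user', 'content': 'What is the temperature in London?'},
    {'role': 'assistant', 'thinking': '(streamed reasoning)', 'content': '',
     'tool_calls': []},  # tool call objects collected from the stream
    {'role': 'tool', 'tool_name': 'get_temperature', 'content': '15°C'},
]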

Let’s create a Chat widget for displaying messages in the next code cell:

chat = mr.Chat(placeholder="💬 Start conversation")

The user will provide prompts with the ChatInput widget.

prompt = mr.ChatInput()

Chat interface components:

  • chat shows messages on screen
  • prompt is where the user types
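
If you want to verify the widgets are wired up before adding the model, a minimal echo cell works. This is a throwaway sketch, not part of the final app; Mercury re-executes the cell on every submitted prompt.

# Echo sketch: show each submitted prompt back in the chat.
if prompt.value:
    chat.add(mr.Message(markdown=prompt.value, role="user"))
    chat.add(mr.Message(markdown="You said: " + prompt.value, role="assistant"))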

5. Stream Thinking and Answer from the Model


This is the main logic. It runs after every new prompt from the user: the code cell is re-executed for every new input.

if prompt.value:
    usr_msg = mr.Message(markdown=prompt.value, role="user")
    chat.add(usr_msg)
    messages += [{'role': 'user', 'content': prompt.value}]
    stream = ollama.chat(
        model='gpt-oss:20b',
        messages=messages,
        tools=[get_temperature],
        stream=True,
    )
    ai_msg = mr.Message(role="assistant", emoji="🤖")
    chat.add(ai_msg)
    thinking, content = "", ""
    tool_calls = []
    for chunk in stream:
        if chunk.message.thinking:
            if thinking == "":
                ai_msg.append_markdown("**Thinking:** ")
            thinking += chunk.message.thinking
            ai_msg.append_markdown(chunk.message.thinking)
        elif chunk.message.content:
            if content == "":
                ai_msg.append_markdown("\n\n**Answer:** ")
            content += chunk.message.content
            ai_msg.append_markdown(chunk.message.content)
        elif chunk.message.tool_calls:
            tool_calls.extend(chunk.message.tool_calls)
    messages += [{
        'role': 'assistant',
        'thinking': thinking,
        'content': content,
        'tool_calls': tool_calls
    }]

What happens here?

  1. The user message is created and displayed in the chat

  2. We call the model with streaming

  3. We listen to chunks from the model

  4. We separate:

    • thinking
    • final answer
    • tool calls

Here is a screenshot of the notebook and the app preview:

Tool calling chat web application preview in Jupyter

6. 🔍 How the Last Code Piece Works (Tool Execution)


Let’s take a closer look at how we handle the tools the model asked to use.

for tool in tool_calls:
    if tool.function.name == "get_temperature":
        tool_msg = mr.Message(role="tool", emoji="")
        chat.add(tool_msg)
        result = get_temperature(**tool.function.arguments)
        tool_msg.append_markdown("Temperature is " + result)
        messages += [{
            "role": "tool",
            "tool_name": "get_temperature",
            "content": result
        }]

Let’s break this down step by step:

for tool in tool_calls:

During streaming, the model may say:

“I want to call a tool.”

Those requests are stored in tool_calls. Now we go through each one.
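
Each entry exposes the requested function name and its arguments. While developing, a quick debug print (using the same attributes as the code above) shows what the model asked for:

for tool in tool_calls:
    print(tool.function.name, tool.function.arguments)
    # e.g. get_temperature {'city': 'London'}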

if tool.function.name == "get_temperature":

The model tells us the name of the function. We check if it matches our function.
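
With a single tool this if check is all we need. If you later register several tools, a name-to-function mapping is a common pattern; here is a sketch, not part of this tutorial’s code:

# Hypothetical extension: dispatch by tool name instead of an if-chain.
available_tools = {'get_temperature': get_temperature}

for tool in tool_calls:
    fn = available_tools.get(tool.function.name)
    if fn is not None:
        result = fn(**tool.function.arguments)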

tool_msg = mr.Message(role="tool", emoji="")
chat.add(tool_msg)

We create a message from the tool, so the user sees that a tool is being used.

result = get_temperature(**tool.function.arguments)

Very important part.

  • tool.function.arguments contains data like:

    {"city": "London"}
  • **tool.function.arguments means: unpack the dictionary into keyword arguments. So the call becomes:

    get_temperature(city="London")

Now Python runs the function and returns "15°C".
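
The same unpacking works on any dictionary of keyword arguments:

args = {'city': 'London'}
get_temperature(**args)   # identical to get_temperature(city='London'), returns '15°C'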

tool_msg.append_markdown("Temperature is " + result)

The user now sees:

Temperature is 15°C

6️⃣ Add tool result to conversation history

messages += [{
    "role": "tool",
    "tool_name": "get_temperature",
    "content": result
}]

This is very important.

We tell the model:

👉 “The tool ran.” 👉 “Here is the result.”

Now the model can continue and use this information in its final answer.
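
To get that final answer, the model needs to be called again with the updated history. A minimal sketch, following the same streaming pattern as above (the full notebook may structure this differently):

# Sketch: ask the model again, now that the tool results are in the history.
if tool_calls:
    followup = ollama.chat(
        model='gpt-oss:20b',
        messages=messages,
        tools=[get_temperature],
        stream=True,
    )
    final_msg = mr.Message(role="assistant", emoji="🤖")
    chat.add(final_msg)
    for chunk in followup:
        if chunk.message.content:
            final_msg.append_markdown(chunk.message.content)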

Normal chatbot:

User → Model → Answer

This system:

User → Model Thinking → Tool Call → Tool Result → Final Answer

This is the base for:

  • AI agents
  • Smart assistants
  • Tool-using AI systems

Running the notebook as a web app is as simple as starting the Mercury server with a single command:

mercury

Mercury will detect all *.ipynb files and serve them as web applications.

Great job — you have just built a chatbot that behaves more like a real assistant than a simple text generator. Instead of only guessing answers from what the model already knows, your chatbot can decide to use a tool, run a Python function, and use the result to give a better and more accurate response. At the same time, users can watch how the model thinks and see the final answer appear step by step, which makes the whole system more transparent and easier to understand.

You also learned how to connect a local model running with Ollama to a Python application, how to pass conversation history, how tool calls are detected and executed, and how to show everything in an interactive interface using Mercury. By turning the notebook into a web app, you transformed a small code example into something people can actually use.

This pattern — model thinking, tool calling, tool result, and final answer — is the core idea behind modern AI assistants and agents. Once you understand this flow, you can connect models to databases, APIs, or your own business logic, and build systems that do real work, not just chat.