Large Language Models (LLMs) are transforming how we build applications. One of the most exciting openly available models today is Llama 3, developed by Meta and released for both research and commercial use under its community license. Unlike fully closed, API-only models, Llama 3 can be downloaded and run locally, so you can experiment, deploy, and integrate it without paying for a hosted API. In this FastAPI Ollama Llama 3 streaming API tutorial, we’ll build a Python FastAPI backend that streams responses from Ollama (a lightweight local LLM runner) and sends them to a simple HTML frontend using Server-Sent Events (SSE). This setup allows you to see responses token by token, just like chatting with modern AI assistants.
If you’re new to APIs or FastAPI, don’t worry — we’ll keep things beginner-friendly and walk through each step.
FastAPI Ollama Llama 3 streaming API tutorial – Prerequisites
- Python 3.10+ installed
- Ollama installed locally
- Basic knowledge of Python and HTML
Step 1: What You Need to Install
Before we dive into coding, let’s make sure your environment is ready. You’ll need to install a few tools and libraries to get everything working smoothly:
1. Python 3.10+
- Make sure you have Python installed.
- You can check your version by running:

```bash
python --version
```

- If you don’t have it, download it from Python.org.
2. FastAPI and Uvicorn
FastAPI is the web framework we’ll use, and Uvicorn is the ASGI server that runs it.
```bash
pip install fastapi uvicorn
```

3. SSE Support (Server-Sent Events)
We’ll install sse-starlette, which adds dedicated Server-Sent Events support to Starlette/FastAPI. The example below streams with FastAPI’s built-in StreamingResponse, but sse-starlette’s EventSourceResponse is a convenient drop-in if you’d rather have the library handle the SSE framing for you.
```bash
pip install sse-starlette
```

4. Requests Library
This lets our backend talk to Ollama’s local API.
```bash
pip install requests
```

5. Ollama
Ollama is the local runner for LLMs like Llama 3.
- Download and install Ollama from ollama.ai.
- Once installed, pull the Llama 3 model:

```bash
ollama pull llama3
```
✅ Quick Recap
You’ll need:
- Python 3.10+
- FastAPI + Uvicorn
- sse-starlette
- requests
- Ollama (with Llama 3 model pulled)
With these installed, you’re ready to build your streaming API!
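Before you start coding, it can help to confirm that Ollama actually answers on your machine. Below is a minimal sanity-check sketch, assuming Ollama is running on its default port 11434 and that you pulled llama3 as shown above; it sends one non-streaming request to Ollama's /api/generate endpoint and prints the reply.

```python
# check_ollama.py - quick sanity check that Ollama responds locally.
# Assumes Ollama's default port (11434) and that "ollama pull llama3" has been run.
import requests

payload = {"model": "llama3", "prompt": "Say hello in one sentence.", "stream": False}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()

# With "stream": False, Ollama returns a single JSON object whose "response"
# field contains the full generated text.
print(resp.json()["response"])
```

If this prints a greeting, Ollama is ready, and the FastAPI backend in the next step will talk to the exact same endpoint.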
Step 2: Create the FastAPI Backend
Here’s a simple FastAPI app that streams responses from Ollama’s Llama 3 model:
```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
import requests

app = FastAPI()

# Allow the HTML page to call this API even when it is served from a
# different port (e.g. a simple static file server). Tighten this in production.
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

def ollama_stream(prompt: str):
    # Forward the prompt to Ollama's local API and re-emit each JSON chunk
    # it returns as a Server-Sent Event.
    url = "http://localhost:11434/api/generate"
    payload = {"model": "llama3", "prompt": prompt, "stream": True}
    response = requests.post(url, json=payload, stream=True)
    for line in response.iter_lines():
        if line:
            yield f"data: {line.decode('utf-8')}\n\n"

@app.get("/stream")
async def stream(prompt: str):
    return StreamingResponse(ollama_stream(prompt), media_type="text/event-stream")
```
What’s happening here?
- We send the user’s prompt to Ollama’s `llama3` model.
- Ollama streams the response back token by token as newline-delimited JSON chunks.
- FastAPI wraps each chunk into Server-Sent Events (SSE) so the frontend can consume it live.
- The CORS middleware lets a page served from a different port (like our HTML file in the next step) call this endpoint.
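You can also exercise the endpoint without any frontend at all. The snippet below is a rough client sketch, assuming the app is running on uvicorn's default port 8000; it reads the SSE stream, strips the `data: ` prefix our generator adds, and prints tokens as they arrive.

```python
# stream_client.py - consume the /stream endpoint from Python.
# Assumes the FastAPI app above is running at http://localhost:8000.
import json
import requests

params = {"prompt": "Why is the sky blue?"}
with requests.get("http://localhost:8000/stream", params=params, stream=True) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines():
        if not raw:
            continue  # skip the blank lines that separate SSE events
        line = raw.decode("utf-8")
        if line.startswith("data: "):
            chunk = json.loads(line[len("data: "):])  # each event carries one Ollama JSON chunk
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break
print()
```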
Step 3: Create the HTML Frontend
Here’s a minimal HTML page that connects to the FastAPI stream:
```html
<!DOCTYPE html>
<html>
  <head>
    <title>Llama 3 FastAPI Demo</title>
  </head>
  <body>
    <h1>Chat with Llama 3</h1>
    <input id="prompt" type="text" placeholder="Ask me anything..." />
    <button onclick="sendPrompt()">Send</button>
    <div id="response"></div>

    <script>
      function sendPrompt() {
        const prompt = document.getElementById("prompt").value;
        // Point at the FastAPI server explicitly (8000 is uvicorn's default port),
        // since this page may be served from a different port.
        const eventSource = new EventSource(`http://localhost:8000/stream?prompt=${encodeURIComponent(prompt)}`);
        document.getElementById("response").innerHTML = "";

        eventSource.onmessage = function (event) {
          const data = JSON.parse(event.data);
          document.getElementById("response").innerHTML += data.response || "";
          // Ollama marks its final chunk with done: true; close the connection
          // so the browser does not reconnect and re-run the prompt.
          if (data.done) {
            eventSource.close();
          }
        };

        // If the server closes the stream or errors out, stop reconnecting.
        eventSource.onerror = function () {
          eventSource.close();
        };
      }
    </script>
  </body>
</html>
```
How it works:
- The user enters a prompt.
- The frontend opens an EventSource connection to the FastAPI /stream endpoint.
- Tokens arrive one by one and are appended to the response div.
- When Ollama sends its final chunk (done: true), the connection is closed so the browser doesn’t reconnect and resend the prompt.
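Because the page and the API may live on different ports, the backend enables CORS. An alternative worth knowing about is to let FastAPI serve the page itself, so both share one origin. A small sketch, assuming you saved the page as index.html next to main.py:

```python
# Add to main.py: serve index.html from the same FastAPI app.
# Assumes index.html sits in the same directory as main.py.
from fastapi.responses import FileResponse

@app.get("/")
async def index():
    return FileResponse("index.html")
```

With this in place you can browse straight to http://localhost:8000/, and you could switch the EventSource URL back to the relative /stream since everything is served from one origin.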
Step 4: Run the Server
Start FastAPI with Uvicorn:
```bash
uvicorn main:app --reload
```

Then open your HTML page in a browser (serving it from a simple local server, as described in Troubleshooting below, is the most reliable option), type a question, and watch Llama 3 stream its response live!
Troubleshooting
Even with everything installed, you might run into a few common issues. Here’s how to fix them:
1. Ollama Not Running
- If you see errors like `Connection refused` or `Failed to connect to localhost:11434`, it usually means Ollama isn’t running.
- Start Ollama manually:

```bash
ollama run llama3
```

- Keep this terminal open while testing your API.
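A quick way to confirm Ollama is up, and to see which models it has pulled (which matters for the next item too), is to query its /api/tags endpoint. A short diagnostic sketch, assuming Ollama's default port:

```python
# ollama_check.py - is Ollama running, and has llama3 been pulled?
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=3)
    resp.raise_for_status()
except requests.exceptions.ConnectionError:
    print("Ollama is not reachable - start it and try again.")
else:
    names = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running. Pulled models:", names)
    if not any(n.startswith("llama3") for n in names):
        print("llama3 is missing - run: ollama pull llama3")
```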
2. Model Not Found
- If you get an error saying `"model not found: llama3"`, make sure you’ve pulled the model:

```bash
ollama pull llama3
```
3. Port Conflicts
- By default, Ollama runs on port 11434 and FastAPI (via Uvicorn) runs on port 8000.
- If another service is already using these ports, you’ll see errors.
- Fix this by running FastAPI on a different port (and update the port in your frontend’s EventSource URL to match):

```bash
uvicorn main:app --reload --port 8080
```
4. Missing Python Packages
- If you see `ModuleNotFoundError`, double-check that you installed all the dependencies:

```bash
pip install fastapi uvicorn sse-starlette requests
```
5. Frontend Not Receiving Stream
- If your HTML page doesn’t show responses, check that:
- The EventSource URL matches your FastAPI endpoint (e.g. http://localhost:8000/stream).
- You’re running the HTML file from a server rather than just double-clicking it. Try a simple Python server:

```bash
python -m http.server 5500
```

Then open http://localhost:5500/index.html.
Why Llama 3 + Ollama?
- Open-model freedom: Llama 3’s weights are free to use, modify, and deploy under Meta’s community license.
- Local-first: Ollama runs models on your machine, keeping data private.
- Beginner-friendly: FastAPI makes building APIs simple and intuitive.
Next Steps
- Add authentication for secure APIs.
- Build a chat UI with frameworks like React or Vue.
- Explore other models available in Ollama.
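For that last point, a low-effort way to explore other models is to stop hard-coding llama3 and accept the model name as a query parameter. Here's a hypothetical variation of the /stream endpoint from earlier (any model you pass still has to be pulled with ollama pull first):

```python
# A hypothetical variation of /stream that accepts a model name.
# Replaces the earlier endpoint; defaults to llama3 so existing calls keep working.
@app.get("/stream")
async def stream(prompt: str, model: str = "llama3"):
    def generate():
        payload = {"model": model, "prompt": prompt, "stream": True}
        response = requests.post("http://localhost:11434/api/generate", json=payload, stream=True)
        for line in response.iter_lines():
            if line:
                yield f"data: {line.decode('utf-8')}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```

The frontend could then request /stream?prompt=...&model=mistral, for example, assuming that model has been pulled into Ollama.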
Conclusion
By now, you’ve seen how easy it is to set up a Python FastAPI backend that streams responses from Ollama’s Llama 3 model directly into a simple HTML frontend using Server-Sent Events. What makes this approach so powerful is its combination of accessibility and openness: FastAPI keeps the backend lightweight and beginner-friendly, while Ollama ensures you can run cutting-edge models locally without needing massive infrastructure or cloud costs.
The Llama 3 model itself is a milestone in openly available AI. Unlike closed systems, it gives developers, researchers, and hobbyists the freedom to experiment, customize, and deploy with far fewer licensing restrictions. This democratization of AI means you can build real-world applications, from chatbots to knowledge assistants, with complete control over your data and workflows.
Streaming responses token-by-token also mirrors the experience of modern AI assistants, making your applications feel responsive and interactive. With just a few lines of Python and HTML, you’ve unlocked a workflow that scales from personal projects to production-ready systems.
This tutorial is only the beginning. You can extend it with authentication, richer frontends, or even integrate multiple models for specialized tasks. The open-source ecosystem around Ollama and Llama 3 is growing rapidly, and by experimenting now, you’re positioning yourself at the forefront of AI innovation.
While you are here, maybe try one of my apps for the iPhone.
Snap! I was there on the App Store
If you enjoyed this guide, don’t stop here; check out more posts on AI and APIs on my blog (https://mydaytodo.com/blog):
Build a Local LLM API with Ollama, Llama 3 & Node.js / TypeScript
Beginners guide to building neural networks using synaptic.js
Build Neural Network in JavaScript: Step-by-Step App Tutorial – My Day To-Do
Build Neural Network in JavaScript with Brain.js: Complete Tutorial