
Context-Aware RAG Chatbot

A GenAI-powered chatbot that provides accurate, citation-backed answers about Artificial Intelligence and Machine Learning topics using Retrieval-Augmented Generation.

RAG · LangGraph · Hybrid Search · FastAPI · Next.js
[Project preview: main chatbot interface]

Problem

What this project solves

  • AI/ML information is fragmented across sources with varying levels of quality and reliability.
  • Traditional chatbots often answer without citations, making it difficult to verify generated claims.
  • Ungrounded LLM responses can hallucinate plausible but incorrect information.
  • Long conversations require history optimization so the model receives the right context without excessive token usage.

Solution

How it works

  • Ingest curated Wikipedia articles and split them with semantic chunking to preserve natural topic boundaries.
  • Retrieve evidence through hybrid search that combines vector MMR with BM25 keyword matching.
  • Distill the retrieved context before generation, filtering out redundant or overlapping evidence.
  • Generate responses with numbered citations and source metadata for traceability.
  • Persist sessions and message history in SQLite while exposing a FastAPI backend and a Next.js frontend.
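The hybrid retrieval step above can be illustrated with a small sketch. This is not the project's actual fusion logic (which combines vector MMR with BM25); it shows one common way to merge two ranked result lists, Reciprocal Rank Fusion, using hypothetical chunk IDs:

```python
from typing import Dict, List

def rrf_fuse(vector_ranked: List[str], bm25_ranked: List[str], k: int = 60) -> List[str]:
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, so a chunk ranked
    highly by both the vector retriever and BM25 rises to the top.
    """
    scores: Dict[str, float] = {}
    for ranked in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c2" appears near the top of both lists, so it wins the fused ranking.
fused = rrf_fuse(["c1", "c2", "c3"], ["c2", "c4", "c1"])
```

The constant `k` damps the influence of rank differences deep in the lists; 60 is a conventional default, not a value taken from this project.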

Architecture

System design

  • Next.js frontend sends user messages to a FastAPI API.
  • The API coordinates a LangGraph workflow with retrieval and generation nodes.
  • ChromaDB stores HuggingFace embeddings for vector retrieval while BM25 provides keyword matching.
  • Ollama-hosted LLMs generate responses from distilled retrieved context.
  • SQLite stores sessions, messages, timestamps, and citation payloads.
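The retrieve-and-generate flow coordinated by the API can be sketched as a pure-Python stand-in for the LangGraph workflow. The corpus, node functions, and citation format here are illustrative assumptions; the real app queries ChromaDB and BM25 in the retrieve node and calls an Ollama-hosted LLM in the generate node:

```python
from typing import Callable, Dict, List

State = Dict[str, object]  # carries the question, retrieved docs, and the answer

def retrieve(state: State) -> State:
    # Hypothetical in-memory corpus standing in for ChromaDB + BM25 retrieval.
    corpus = {
        "rag": "RAG combines retrieval with generation.",
        "bm25": "BM25 ranks documents by term frequency.",
    }
    question = str(state["question"]).lower()
    docs = [text for key, text in corpus.items() if key in question]
    return {**state, "docs": docs}

def generate(state: State) -> State:
    # Stand-in for the LLM call: attach numbered citations like [1], [2].
    docs: List[str] = state["docs"]  # type: ignore[assignment]
    cited = " ".join(f"{text} [{i + 1}]" for i, text in enumerate(docs))
    return {**state, "answer": cited or "No supporting context found."}

def run_workflow(question: str, nodes: List[Callable[[State], State]]) -> State:
    # A linear retrieve -> generate edge, mirroring the two-node graph.
    state: State = {"question": question}
    for node in nodes:
        state = node(state)
    return state

result = run_workflow("What is RAG?", [retrieve, generate])
```

In the actual system, LangGraph's state machine replaces the plain loop, which buys typed state, conditional edges, and checkpointing for free.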

Features

Key capabilities

  • Hybrid vector and BM25 retrieval for semantic and exact-match search.
  • Semantic chunking with configurable breakpoint strategies.
  • LangGraph state machine for retrieve-and-generate conversation flow.
  • Citation tracking with source title, URL, preview, page, and chunk metadata.
  • History optimization with sliding window, token budget, and summarization strategies.
  • LangSmith tracing and evaluation support for observability.
  • Docker Compose deployment with Ollama model support.
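The history-optimization capability can be sketched by combining two of the listed strategies, a sliding window and a token budget. The 4-characters-per-token heuristic and the specific limits are assumptions for illustration, not the app's tokenizer or defaults:

```python
from typing import Dict, List

Message = Dict[str, str]

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not a real tokenizer).
    return max(1, len(text) // 4)

def trim_history(messages: List[Message], window: int = 6, budget: int = 50) -> List[Message]:
    """Keep the last `window` messages, then drop the oldest until under the token budget."""
    recent = messages[-window:]
    while len(recent) > 1 and sum(estimate_tokens(m["content"]) for m in recent) > budget:
        recent = recent[1:]  # evict the oldest message still inside the window
    return recent

# Eight long messages: the window keeps six, the budget then trims further.
history = [{"role": "user", "content": "x" * 120} for _ in range(8)]
trimmed = trim_history(history)
```

The third listed strategy, summarization, would replace the evicted messages with a compact LLM-written summary instead of discarding them outright.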

Outcome

What it demonstrates

  • Grounded answers with explicit citations for transparent AI/ML topic exploration.
  • Improved retrieval coverage through combined semantic and keyword search.
  • Reduced context noise through retrieval distillation and history optimization.