
Context-Aware RAG Chatbot

A GenAI-powered chatbot that provides accurate, citation-backed answers about Artificial Intelligence and Machine Learning topics using Retrieval-Augmented Generation.

RAG · LangGraph · Hybrid Search · FastAPI · Next.js
[Project preview: main chatbot interface]

Problem

What this project solves

  • AI/ML information is fragmented across sources with varying levels of quality and reliability.
  • Traditional chatbots often answer without citations, making it difficult to verify generated claims.
  • Ungrounded LLM responses can hallucinate plausible but incorrect information.
  • Long conversations require history optimization so the model receives the right context without excessive token usage.

Solution

How it works

  • Ingest curated Wikipedia articles and split them with semantic chunking to preserve natural topic boundaries.
  • Retrieve evidence through hybrid search that combines vector MMR with BM25 keyword matching.
  • Distill the retrieved context before generation, filtering out redundant or overlapping evidence.
  • Generate responses with numbered citations and source metadata for traceability.
  • Persist sessions and message history in SQLite while exposing a FastAPI backend and a Next.js frontend.
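The hybrid retrieval step above can be illustrated with a small sketch. This is not the project's actual fusion logic (which combines vector MMR with BM25); it shows one common way to merge two ranked result lists, Reciprocal Rank Fusion, using hypothetical chunk IDs:

```python
from typing import Dict, List

def rrf_fuse(vector_ranked: List[str], bm25_ranked: List[str], k: int = 60) -> List[str]:
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, so a chunk ranked
    highly by both the vector retriever and BM25 rises to the top.
    """
    scores: Dict[str, float] = {}
    for ranked in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c2" appears near the top of both lists, so it wins the fused ranking.
fused = rrf_fuse(["c1", "c2", "c3"], ["c2", "c4", "c1"])
```

The constant `k` damps the influence of rank differences deep in the lists; 60 is a conventional default, not a value taken from this project.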

Architecture

System design

  • Next.js frontend sends user messages to a FastAPI API.
  • The API coordinates a LangGraph workflow with retrieval and generation nodes.
  • ChromaDB stores HuggingFace embeddings for vector retrieval while BM25 provides keyword matching.
  • Ollama-hosted LLMs generate responses from distilled retrieved context.
  • SQLite stores sessions, messages, timestamps, and citation payloads.
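The retrieve-and-generate flow coordinated by the API can be sketched as a pure-Python stand-in for the LangGraph workflow. The corpus, node functions, and citation format here are illustrative assumptions; the real app queries ChromaDB and BM25 in the retrieve node and calls an Ollama-hosted LLM in the generate node:

```python
from typing import Callable, Dict, List

State = Dict[str, object]  # carries the question, retrieved docs, and the answer

def retrieve(state: State) -> State:
    # Hypothetical in-memory corpus standing in for ChromaDB + BM25 retrieval.
    corpus = {
        "rag": "RAG combines retrieval with generation.",
        "bm25": "BM25 ranks documents by term frequency.",
    }
    question = str(state["question"]).lower()
    docs = [text for key, text in corpus.items() if key in question]
    return {**state, "docs": docs}

def generate(state: State) -> State:
    # Stand-in for the LLM call: attach numbered citations like [1], [2].
    docs: List[str] = state["docs"]  # type: ignore[assignment]
    cited = " ".join(f"{text} [{i + 1}]" for i, text in enumerate(docs))
    return {**state, "answer": cited or "No supporting context found."}

def run_workflow(question: str, nodes: List[Callable[[State], State]]) -> State:
    # A linear retrieve -> generate edge, mirroring the two-node graph.
    state: State = {"question": question}
    for node in nodes:
        state = node(state)
    return state

result = run_workflow("What is RAG?", [retrieve, generate])
```

In the actual system, LangGraph's state machine replaces the plain loop, which buys typed state, conditional edges, and checkpointing for free.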

Features

Key capabilities

  • Hybrid vector and BM25 retrieval for semantic and exact-match search.
  • Semantic chunking with configurable breakpoint strategies.
  • LangGraph state machine for retrieve-and-generate conversation flow.
  • Citation tracking with source title, URL, preview, page, and chunk metadata.
  • History optimization with sliding window, token budget, and summarization strategies.
  • LangSmith tracing and evaluation support for observability.
  • Docker Compose deployment with Ollama model support.
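The history-optimization capability can be sketched by combining two of the listed strategies, a sliding window and a token budget. The 4-characters-per-token heuristic and the specific limits are assumptions for illustration, not the app's tokenizer or defaults:

```python
from typing import Dict, List

Message = Dict[str, str]

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not a real tokenizer).
    return max(1, len(text) // 4)

def trim_history(messages: List[Message], window: int = 6, budget: int = 50) -> List[Message]:
    """Keep the last `window` messages, then drop the oldest until under the token budget."""
    recent = messages[-window:]
    while len(recent) > 1 and sum(estimate_tokens(m["content"]) for m in recent) > budget:
        recent = recent[1:]  # evict the oldest message still inside the window
    return recent

# Eight long messages: the window keeps six, the budget then trims further.
history = [{"role": "user", "content": "x" * 120} for _ in range(8)]
trimmed = trim_history(history)
```

The third listed strategy, summarization, would replace the evicted messages with a compact LLM-written summary instead of discarding them outright.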

Outcome

What it demonstrates

  • Grounded answers with explicit citations for transparent AI/ML topic exploration.
  • Improved retrieval coverage through combined semantic and keyword search.
  • Reduced context noise through retrieval distillation and history optimization.