
Research Chatbot Built Upon a Large Language Model and LangChain

Building a custom Research-Hub Chatbot has allowed me to turn my growing Notion knowledge base into an interactive Q&A assistant. Here’s how I approached it:

1. Motivation: Why a Chatbot-Powered Hub?

2. Evaluating the Landscape

Before building, I compared existing options:

Figure 1: Hallucination in ChatGPT

3. Architecting the Hub

3.1 Notion as the Knowledge Store

I organize everything in Notion:

3.2 Backup & Bi-Directional Sync

To keep Notion content safe and extractable:

  1. A notion-backup workflow on GitHub Actions regularly pulls the workspace via a private integration token (a minimal sketch follows this list).
  2. Changes are auto-committed to a private Git repo, providing both versioning and a local data source for ingestion.
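
The pull step itself can stay small. Below is a minimal sketch against the official Notion REST API, assuming a NOTION_TOKEN secret exported by the workflow; the backup/ directory and the page_text helper are illustrative names, and an off-the-shelf notion-backup action would work just as well:

    # Minimal backup pull: export every accessible page as plain-ish markdown.
    import os
    import pathlib
    import requests

    API = "https://api.notion.com/v1"
    HEADERS = {
        "Authorization": f"Bearer {os.environ['NOTION_TOKEN']}",
        "Notion-Version": "2022-06-28",
    }

    def list_pages():
        """Yield every page the integration can see, following pagination."""
        payload = {"filter": {"property": "object", "value": "page"}}
        while True:
            resp = requests.post(f"{API}/search", headers=HEADERS, json=payload)
            resp.raise_for_status()
            data = resp.json()
            yield from data["results"]
            if not data.get("has_more"):
                break
            payload["start_cursor"] = data["next_cursor"]

    def page_text(page_id):
        """Concatenate the plain text of a page's top-level blocks."""
        resp = requests.get(f"{API}/blocks/{page_id}/children", headers=HEADERS)
        resp.raise_for_status()
        lines = []
        for block in resp.json()["results"]:
            rich = block.get(block["type"], {}).get("rich_text", [])
            lines.append("".join(t["plain_text"] for t in rich))
        return "\n\n".join(lines)

    out = pathlib.Path("backup")
    out.mkdir(exist_ok=True)
    for page in list_pages():
        (out / f"{page['id']}.md").write_text(page_text(page["id"]), encoding="utf-8")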

3.3 Building the Chatbot Pipeline

  1. Data Ingestion (ingest.py), sketched in the first code block after this list:
    • Reads synced markdown files
    • Splits content into passages and creates vector embeddings (via OpenAI or Hugging Face models)
    • Stores vectors in a FAISS index
  2. A Prompt Template tuned for the “Research-Hub Assistant” role keeps answers anchored to the retrieved context and reduces hallucinations:
    You are an AI assistant for my personal research hub...
    {context}
    Question: {question}
    
  3. Retrieval-Augmented Generation using LangChain’s ConversationalRetrievalChain, sketched in the second code block after this list:
    • On each query, the top-k most similar passages are fetched from FAISS
    • The LLM (e.g., GPT-3.5) generates a concise answer grounded in those passages
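
A condensed sketch of ingest.py, assuming the classic LangChain (0.0.x) module layout and markdown files synced into backup/; the chunk sizes and paths are illustrative:

    # ingest.py: read synced markdown, split into passages, embed, and index.
    import glob

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import FAISS

    # 1. Read the markdown files synced from Notion.
    texts = [open(path, encoding="utf-8").read() for path in glob.glob("backup/*.md")]

    # 2. Split the content into overlapping passages that fit the model's context.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    passages = splitter.create_documents(texts)

    # 3. Embed each passage and persist the vectors in a FAISS index.
    index = FAISS.from_documents(passages, OpenAIEmbeddings())
    index.save_local("faiss_index")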
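
The query side then loads that index and wires the prompt template into the chain via combine_docs_chain_kwargs; the model choice and k below are illustrative:

    # Query-time sketch: retrieval-augmented answers over the FAISS index.
    from langchain.chains import ConversationalRetrievalChain
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.prompts import PromptTemplate
    from langchain.vectorstores import FAISS

    QA_PROMPT = PromptTemplate.from_template(
        "You are an AI assistant for my personal research hub...\n"
        "{context}\n"
        "Question: {question}"
    )

    index = FAISS.load_local("faiss_index", OpenAIEmbeddings())
    chain = ConversationalRetrievalChain.from_llm(
        ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
        retriever=index.as_retriever(search_kwargs={"k": 4}),  # top-k passages
        combine_docs_chain_kwargs={"prompt": QA_PROMPT},
    )

    # Each call takes the new question plus prior (question, answer) turns.
    result = chain({"question": "What did I note about FAISS?", "chat_history": []})
    print(result["answer"])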

4. Front-End: Streamlit Chat Interface

Key features:
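
A minimal sketch of the interface, assuming Streamlit's chat elements (st.chat_input and st.chat_message, available since roughly Streamlit 1.24) and the chain defined above:

    # app.py: a small Streamlit chat front end over the retrieval chain.
    import streamlit as st
    from langchain.chains import ConversationalRetrievalChain
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    @st.cache_resource
    def load_chain():
        """Build the chain once per server process and reuse it."""
        index = FAISS.load_local("faiss_index", OpenAIEmbeddings())
        return ConversationalRetrievalChain.from_llm(
            ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
            retriever=index.as_retriever(),
        )

    st.title("Research-Hub Assistant")
    if "history" not in st.session_state:
        st.session_state.history = []  # list of (question, answer) tuples

    # Replay the conversation so far.
    for question, answer in st.session_state.history:
        st.chat_message("user").write(question)
        st.chat_message("assistant").write(answer)

    if query := st.chat_input("Ask about my research notes"):
        st.chat_message("user").write(query)
        result = load_chain()({"question": query, "chat_history": st.session_state.history})
        st.chat_message("assistant").write(result["answer"])
        st.session_state.history.append((query, result["answer"]))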

5. Embedding Back into Notion

By wrapping the Streamlit URL in a Notion embed block, I can:

Figure 2: De-hallucination in My Own Chatbot

6. Lessons & Next Steps

By combining Notion’s organizational power with a custom vector-search chatbot, I’ve created a research companion that scales with my work and helps me retrieve precisely what I need—when I need it.
