
Building a Chatbot with RAG, Langchain, and Neo4j



A chatbot is a software application that uses AI to have conversations with users, helping them find information or answer questions. We built this chatbot using Retrieval-Augmented Generation (RAG) to improve its responses, Neo4j to store structured data, and Large Language Models (LLMs) to understand and generate natural language.

We created two node labels, "Bank" and "Owner", and one relationship type between them, "IS_OWNED_BY". This post describes how we built a chatbot that uses RAG techniques to answer questions about the relationships between these node types.

Introduction

Anansi

Anansi is a visual data lineage tool for visualizing and managing data relationships. It ensures data quality and compliance, provides real-time insights, and offers easy customization and flexible deployment.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation is a technique that enhances Large Language Model (LLM) responses by retrieving source information from external data stores (e.g., databases) to augment generated responses.
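The two RAG steps, retrieve then augment, can be sketched in a few lines. This is a toy illustration, not the chatbot's code: the keyword-overlap retriever, the prompt wording, and the helper names (tokenize, retrieve, build_prompt) are assumptions. The chatbot itself retrieves context from Neo4j.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return up to k documents that share the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(doc)), doc) for doc in documents]
    # Drop documents with no overlap, then rank best-first (ties keep input order).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved context."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


# Descriptions taken from the Bank table later in this post.
docs = [
    "Custody: System is related to Custody",
    "Treasury: System is related to Treasury Department",
    "Retail: System is related to Retail Banking",
]
question = "Which system handles custody?"
print(build_prompt(question, retrieve(question, docs)))
```

The LLM then generates its answer from this augmented prompt rather than from its training data alone.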

Langchain

LangChain is a library for GenAI orchestration. It supports numerous LLMs, vector stores, document loaders, and agents; it manages prompt templates, composes components into chains, and supports monitoring and observability.

Neo4j

Neo4j is a graph database management system that utilizes graph structures with nodes, edges, and properties to represent and store data.
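To make "nodes, edges, and properties" concrete, here is a toy in-memory property graph. It is only a sketch for intuition, not how Neo4j stores data. The class names are hypothetical, and so are the node keys "System4" and "Owner1". The node properties come from rows in the tables later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: a unique key, one or more labels, and key-value properties."""
    key: str
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge from one node key to another."""
    start: str
    rel_type: str
    end: str


class PropertyGraph:
    """Toy in-memory property graph (not how Neo4j stores data)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.key] = node

    def relate(self, start: str, rel_type: str, end: str) -> None:
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.relationships.append(Relationship(start, rel_type, end))

    def neighbors(self, key: str, rel_type: str) -> list[Node]:
        """Nodes linked to `key` by `rel_type`, ignoring direction,
        like the undirected Cypher pattern (b)-[:IS_OWNED_BY]-(s)."""
        found = []
        for rel in self.relationships:
            if rel.rel_type != rel_type:
                continue
            if rel.start == key:
                found.append(self.nodes[rel.end])
            elif rel.end == key:
                found.append(self.nodes[rel.start])
        return found


graph = PropertyGraph()
# Hypothetical keys; property values come from the Bank and Owner tables.
graph.add_node(Node("System4", {"Bank"}, {"systemName": "Custody"}))
graph.add_node(Node("Owner1", {"Owner"}, {"name": "Bob", "ownerType": "Business Owner"}))
```

An edge is added with graph.relate(start_key, "IS_OWNED_BY", end_key). Which owner owns which system is defined in Anansi, so this sketch does not add one.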

Prerequisites

1. Creation of Tables using Anansi

We created a Bank table and an Owner table in Anansi, shown below.

Bank

System ID | System Name | System Description
--- | --- | ---
System1 | Investment | System is Related to Investment
System2 | Retail | System is related to Retail Banking
System3 | Treasury | System is related to Treasury Department
System4 | Custody | System is related to Custody
System111 | Financial Product Service Create | Services of FPMS
System112 | Financial Product Service Update | Services of FPMS
System121 | CRM Service Create | Services of CRM
System122 | CRM Service Update | Services of CRM
System131 | Investment Order Service Create | Services of IOMS
System11 | FRM Systems | This system facilitates the cataloging of financial products, making them easily accessible for stakeholders and clients.
System12 | CRM Systems | This system centralizes all client-related data, enabling better service and personalized advisory.
System132 | Investment Order Service Update | Services of IOMS
System13 | Investment Order Management Systems | This system manages everything from order placement to execution.

Owner

Owner Id | Name | Owner Type | Email | Phone No.
--- | --- | --- | --- | ---
1 | Bob | Business Owner | [email protected] | 123-456
2 | Dick | Business Owner | [email protected] | null
4 | External Systems | Department | [email protected] | null
5 | Utilities | Department | [email protected] | null
3 | Joshua | Tech Owner | [email protected] | null
6 | Trading Systems | Department | [email protected] | null

After creating both the Owner and Bank tables, we established the 'IS_OWNED_BY' relationship between the Bank and Owner tables. Additionally, we created the 'CONNECTED_TO' and 'BELONGS_TO' relationships within the Bank table for the system type.

Refer to the Anansi Documentation for instructions on creating tables and relationships.

2. Full Text Index

A full-text index is a database index that speeds up text searches, making it easier to find specific words or phrases in large collections of text.
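Neo4j's full-text indexes are backed by Apache Lucene. The core idea is an inverted index, which maps each word to the records that contain it. Below is a much simplified Python sketch of that idea, with AND semantics and no ranking, fuzzy matching, or language analyzers. The function names are hypothetical, and the sample data are System Name values from the Bank table.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


system_names = {
    "System12": "CRM Systems",
    "System121": "CRM Service Create",
    "System122": "CRM Service Update",
    "System111": "Financial Product Service Create",
}
index = build_index(system_names)
print(sorted(search(index, "crm service")))  # ['System121', 'System122']
```

A lookup only touches the posting sets for the query words, so it avoids scanning every stored string.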

To set up full-text indexes on the nodes in our Neo4j database, we first ran the following query to create a full-text index for the Bank nodes:

CREATE FULLTEXT INDEX Bank
IF NOT EXISTS
FOR (bank:Bank)
ON EACH [bank.systemName]

This query creates a full-text index named Bank on nodes labeled Bank, indexing their systemName property for efficient text search. To create your own full-text index, replace Bank with the label of your node and bank.systemName with the property you want to index.

Similarly, we created a full-text index for our Owner Table/Node.
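The Bank statement follows a fixed template, so it is easy to generate for other labels. The sketch below is a hypothetical helper, not part of the project. For Owner, it assumes the indexed properties are name and ownerType, inferred from the fields candidate_query reads. The helper builds the statement by string interpolation, so it accepts only plain identifiers.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def fulltext_index_cypher(index_name: str, label: str, properties: list[str]) -> str:
    """Return a CREATE FULLTEXT INDEX statement shaped like the Bank example."""
    if not properties:
        raise ValueError("at least one property is required")
    # Names are interpolated into the statement, so accept plain identifiers only.
    for name in (index_name, label, *properties):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    var = label.lower()
    indexed = ", ".join(f"{var}.{prop}" for prop in properties)
    return (
        f"CREATE FULLTEXT INDEX {index_name}\n"
        "IF NOT EXISTS\n"
        f"FOR ({var}:{label})\n"
        f"ON EACH [{indexed}]"
    )


print(fulltext_index_cypher("Bank", "Bank", ["systemName"]))
# Assumed Owner properties, based on the fields candidate_query reads:
print(fulltext_index_cypher("Owner", "Owner", ["name", "ownerType"]))
```

The first call reproduces the Bank statement above exactly. The second call shows how a single index can cover more than one property.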

Code configuration

We based our code on the information tool from the Movie Agent, adapting it to retrieve information about the Bank and Owner nodes.

In the utils file, we also modified candidate_query to search the newly created full-text indexes (Bank and Owner):

candidate_query = """
CALL db.index.fulltext.queryNodes($index, $fulltextQuery, {limit: $limit})
YIELD node
RETURN coalesce(node.systemName, node.name, node.ownerType) AS candidate,
[el in labels(node) WHERE el IN ['Bank','Owner'] | el][0] AS label
"""

This candidate_query performs a full-text search in Neo4j. It queries the index named by $index with the search string $fulltextQuery and returns at most $limit matching nodes. For each node, it returns a candidate value (the first non-null of systemName, name, or ownerType) and the node's label ('Bank' or 'Owner'). On the Python side, each result row arrives as a dictionary with these two keys.
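Before candidate_query runs, the user's wording has to become a valid Lucene query for $fulltextQuery. The sketch below shows one way to do that under stated assumptions. The helper names are hypothetical, and the default limit of 3 is an arbitrary choice. It strips Lucene special characters, appends the fuzzy operator ~ to each word to tolerate typos, and joins the words with AND.

```python
import re

# Characters with special meaning in Lucene query syntax.
LUCENE_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\/]')


def build_fulltext_query(text: str) -> str:
    """Turn free text into a fuzzy AND query, e.g. 'CRM Systems' -> 'CRM~ AND Systems~'."""
    words = LUCENE_SPECIAL.sub(" ", text).split()
    if not words:
        raise ValueError("no searchable words in input")
    return " AND ".join(f"{word}~" for word in words)


def candidate_params(text: str, index: str, limit: int = 3) -> dict[str, object]:
    """Parameters for candidate_query: $index, $fulltextQuery and $limit."""
    if index not in ("Bank", "Owner"):
        raise ValueError(f"unknown full-text index: {index}")
    return {"index": index, "fulltextQuery": build_fulltext_query(text), "limit": limit}


print(candidate_params("CRM Systems", "Bank"))
# {'index': 'Bank', 'fulltextQuery': 'CRM~ AND Systems~', 'limit': 3}
```

Fuzzy matching on very short words such as "CRM" can be broad, so it may be worth tuning or dropping ~ for short terms.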

Configuring the Agent File

We used Azure OpenAI for our LLM. Here is how we initialized it in the code:

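import os
from langchain_openai import AzureChatOpenAI

# Values come from the .env file (assumes variables with these names)
llm = AzureChatOpenAI(
    temperature=0,  # minimize randomness in answers
    streaming=True,  # stream tokens back to the client
    deployment_name=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_version=os.environ["AZURE_OPENAI_VERSION"],
)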

The deployment name, API key, endpoint, and API version are all read from environment variables defined in the .env file.
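A missing variable surfaces only when the model is first built or called. A small startup check can catch it earlier. This is a sketch under the assumption that the environment variable names match the constants used in the AzureChatOpenAI call. REQUIRED_SETTINGS and missing_settings are hypothetical names.

```python
import os
from collections.abc import Mapping

# Assumed environment variable names for the Azure OpenAI settings.
REQUIRED_SETTINGS = (
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_VERSION",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required settings that are unset or empty in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing a plain dict instead of os.environ makes the check easy to test.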

Setup

Setting up the Information Tool

We used the following two Cypher queries in our information tool to query the Bank and Owner nodes.

The first query returns ownership details when the candidate matches either a system name or an owner name:

MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)
WHERE b.systemName = $candidate OR s.name = $candidate
WITH "Service Name: " + coalesce(b.systemName, "") + "\nOwner Name: " + coalesce(s.name, "") + "\nOwner Type: " + coalesce(s.ownerType, "") + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50

The second query restricts the results to a given system and owner type (for example, "Tech Owner"):

MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)
WHERE b.systemName = $candidate AND s.ownerType = $owner_type
WITH "Service Name: " + coalesce(b.systemName, "") + "\nOwner Name: " + coalesce(s.name, "") + "\nOwner Type: " + coalesce(s.ownerType, "") + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50
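The tool needs to choose between these two queries, and each returned record becomes text context for the LLM. Here is a minimal sketch of that logic, assuming the tool receives a candidate and an optional owner type. The names select_query and format_context, and the query keys "owners" and "owners_by_type", are hypothetical and not taken from the project.

```python
from typing import Optional

# Hypothetical keys for the two Cypher queries above.
OWNERS_QUERY = "owners"  # matches on system name OR owner name
OWNERS_BY_TYPE_QUERY = "owners_by_type"  # system name AND owner type


def select_query(candidate: str, owner_type: Optional[str] = None) -> tuple[str, dict[str, str]]:
    """Pick a query and its parameters based on whether an owner type was given."""
    if owner_type:
        return OWNERS_BY_TYPE_QUERY, {"candidate": candidate, "owner_type": owner_type}
    return OWNERS_QUERY, {"candidate": candidate}


def format_context(record: dict[str, Optional[str]]) -> str:
    """Python mirror of the Cypher string building: missing or null -> ""."""
    fields = [
        ("Service Name", "systemName"),
        ("Owner Name", "name"),
        ("Owner Type", "ownerType"),
        ("Phone No.", "contact"),
    ]
    return "\n".join(f"{label}: {record.get(key) or ''}" for label, key in fields)
```

Doing the formatting in Cypher, as the queries above do, keeps the Python side thin. The mirror here only shows what the LLM ends up reading.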

Running the Chatbot

To run the chatbot, we used Docker, which packages the application and its dependencies into a container. With the two commands below, we had the chatbot running on our local machine.

docker build -t llm .
docker run --env-file ./.env -p 8080:8080 llm
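docker run --env-file reads the .env file line by line. A line of the form VAR=VAL sets a variable, a bare VAR copies the value from the host environment, and lines starting with # are comments. The rough Python approximation below lets you sanity-check a .env file before starting the container. parse_env_file is a hypothetical helper, and Docker's own parser also rejects malformed variable names, which this sketch does not.

```python
from collections.abc import Mapping
from typing import Optional


def parse_env_file(text: str, host_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Approximate Docker --env-file parsing: VAR=VAL lines, '#' comments,
    blank lines skipped, and a bare VAR copied from the host environment."""
    host_env = host_env or {}
    result: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.lstrip()  # leading whitespace is ignored
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            result[name] = value  # quotes are kept literally, not stripped
        elif name in host_env:
            result[name] = host_env[name]
    return result
```

A common pitfall is quoting values in .env, for example KEY="abc". Docker passes the quotes through as part of the value.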

Testing the Chatbot

After starting the container, we tested the chatbot by visiting http://127.0.0.1:8080/entity/playground/.
LangServe sets up this playground automatically.
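Besides the playground, LangServe's add_routes exposes a POST invoke endpoint under the same path, here /entity/invoke, which is handy for scripted tests. The sketch below uses only the standard library. It assumes the agent's input is an object with input and chat_history fields, so check /entity/input_schema for the actual shape. build_invoke_request and ask are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/entity"


def build_invoke_request(question: str) -> urllib.request.Request:
    """Build a POST request for LangServe's /invoke endpoint.

    Assumption: the agent input looks like {"input": str, "chat_history": list}.
    """
    body = {"input": {"input": question, "chat_history": []}}
    return urllib.request.Request(
        f"{BASE_URL}/invoke",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> object:
    """Send the question to the running container and return the "output" field."""
    with urllib.request.urlopen(build_invoke_request(question)) as response:
        return json.loads(response.read())["output"]
```

With the container running, ask("Who owns Custody?") returns the agent's answer without opening the browser.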

Playground

Below are the sample questions we asked the chatbot.

  1. Who owns Custody?
  2. Who are the business owners of "FRM Systems" bank?
  3. Who is the tech owner of "CRM Systems" bank?
  4. Who is the tech owner of Treasury?
  5. How many services does Bob own?
  6. List all the services owned by Bob.
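Questions 2 to 4 name an owner type, so they would presumably be answered with the second Cypher query, while questions 1, 5 and 6 fit the first. In the chatbot, the LLM decides the tool arguments. The rule-based stand-in below only illustrates the mapping. extract_owner_type is a hypothetical helper, and the owner types come from the Owner table.

```python
import re
from typing import Optional

# Owner types that appear in the Owner table.
OWNER_TYPES = ("Business Owner", "Tech Owner", "Department")


def extract_owner_type(question: str) -> Optional[str]:
    """Return the owner type mentioned in the question, if any."""
    text = question.lower()
    for owner_type in OWNER_TYPES:
        # Accept singular or plural, e.g. "tech owner" or "business owners".
        pattern = r"\b" + re.escape(owner_type.lower()) + r"s?\b"
        if re.search(pattern, text):
            return owner_type
    return None
```

A keyword rule like this breaks on paraphrases such as "technical lead", which is why the chatbot leaves this step to the LLM.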

This is how the chatbot looks in the Anansi tool after we deployed it there.

[Figure: Chatbot_Anansi, the chatbot deployed in the Anansi tool]