Building a Chatbot with RAG, Langchain, and Neo4j

May 24, 2024 · 6 min read

Suraj P V

Data Engineer

Vinay Kumar S P

DevOps Engineer

Chatbot_Anansi

A chatbot is a software application that uses AI to have conversations with users, helping them find information or answer questions. We built this chatbot using Retrieval-Augmented Generation (RAG) to improve its responses, Neo4j to store structured data, and Large Language Models (LLMs) to understand and generate natural language.

We created two node labels, "Bank" and "Owner", and one relationship type between them, "IS_OWNED_BY". This post describes how we built a chatbot that answers questions about that relationship using Retrieval-Augmented Generation (RAG) techniques.

Introduction

Anansi

Anansi is a Visual Data Lineage Tool used to visualize and manage data relationships. It ensures data quality, compliance, and provides real-time insights, with easy customization and flexible deployment.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation is a technique that enhances Large Language Model (LLM) responses by retrieving source information from external data stores (e.g., databases) to augment generated responses.
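To make the retrieve-then-augment idea concrete, here is a minimal sketch in plain Python. It is not our chatbot's code: the store contents, the keyword matching, and the function names are all made up for illustration, and a real setup would query a database and send the prompt to an LLM.

```python
# Toy illustration of RAG: look up facts first, then put them into the prompt.
# The store, its contents, and both function names are hypothetical.

def retrieve(question: str, store: dict) -> list:
    """Return stored facts whose key appears in the question (naive keyword match)."""
    lowered = question.lower()
    return [fact for key, fact in store.items() if key.lower() in lowered]


def build_prompt(question: str, facts: list) -> str:
    """Combine the retrieved facts and the question into one prompt for an LLM."""
    context = "\n".join(facts) if facts else "No matching records found."
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up external data store standing in for a database.
store = {"ExampleSystem": "ExampleSystem is owned by ExampleOwner."}

question = "Who owns ExampleSystem?"
prompt = build_prompt(question, retrieve(question, store))
print(prompt)
```

The point of the sketch is the order of operations: the lookup happens before generation, so the model answers from retrieved records rather than from memory alone.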

LangChain

LangChain is a library for GenAI orchestration. It supports numerous LLMs, vector stores, document loaders, and agents. It also manages prompt templates, composes components into chains, and supports monitoring and observability.

Neo4j

Neo4j is a graph database management system that utilizes graph structures with nodes, edges, and properties to represent and store data.

Prerequisites

1. Creation of Tables using Anansi

We created the Bank and Owner tables in Anansi. Their contents are shown below.

Bank

System ID	System Name	System Description
System1	Investment	System is Related to Investment
System2	Retail	System is related to Retail Banking
System3	Treasury	System is related to Treasury Department
System4	Custody	System is related to Custody
System111	Financial Product Service Create	Services of FPMS
System112	Financial Product Service Update	Services of FPMS
System121	CRM Service Create	Services of CRM
System122	CRM Service Update	Services of CRM
System131	Investment Order Service Create	Services of IOMS
System11	FRM Systems	This system facilitates the cataloging of financial products, making them easily accessible for stakeholders and clients.
System12	CRM Systems	This system centralizes all client-related data, enabling better service and personalized advisory.
System132	Investment Order Service Update	Services of IOMS
System13	Investment Order Management Systems	This system manages everything from order placement to execution.

Owner

Owner Id	Name	Owner Type	Email	Phone No.
1	Bob	Business Owner	[email protected]	123-456
2	Dick	Business Owner	[email protected]	null
4	External Systems	Department	[email protected]	null
5	Utilities	Department	[email protected]	null
3	Joshua	Tech Owner	[email protected]	null
6	Trading Systems	Department	[email protected]	null

After creating both the Owner and Bank tables, we established the 'IS_OWNED_BY' relationship between the Bank and Owner tables. Additionally, we created the 'CONNECTED_TO' and 'BELONGS_TO' relationships within the Bank table for the system type.

Refer to the Anansi Documentation for instructions on creating tables and relationships.

2. Full Text Index

A full-text index is a database feature that speeds up text searches, making it easier to locate specific words or phrases within large collections of text.

We ran the following query to create a full-text index on the Bank nodes in our Neo4j database:

CREATE FULLTEXT INDEX Bank
IF NOT EXISTS
FOR (bank:Bank)
ON EACH [bank.systemName]

This query creates a full-text index named Bank for nodes labeled Bank, indexing the systemName property so it can be searched efficiently. To create your own full-text index, replace Bank with the label of your node and bank.systemName with the property you want to index.

Similarly, we created a full-text index for our Owner Table/Node.
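If you need indexes for several labels, the statement can be generated rather than typed by hand. The sketch below only builds the Cypher text; the function name is hypothetical, and the property used for Owner (name) is an assumption, since the index we created for Owner is not shown above.

```python
import re

# Only plain identifiers are accepted, because the values are interpolated
# directly into the Cypher text (index and label names cannot be parameters).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def fulltext_index_statement(index_name: str, label: str, prop: str) -> str:
    """Build a CREATE FULLTEXT INDEX statement for one label and one property."""
    for value in (index_name, label, prop):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"Not a plain identifier: {value!r}")
    return (
        f"CREATE FULLTEXT INDEX {index_name} IF NOT EXISTS "
        f"FOR (n:{label}) ON EACH [n.{prop}]"
    )


print(fulltext_index_statement("Bank", "Bank", "systemName"))
# Assumption: the Owner index covers the name property.
print(fulltext_index_statement("Owner", "Owner", "name"))
```

The resulting strings can be run in the Neo4j Browser or passed to whichever driver you use.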

Code configuration

We based our code on the information tool from the Movie Agent to gather information for the Bank and Owner nodes.

Additionally, in the Utils file, we modified candidate_query to work with the newly created full-text indexes (i.e., Bank and Owner).

candidate_query = """
CALL db.index.fulltext.queryNodes($index, $fulltextQuery, {limit: $limit})
YIELD node
RETURN coalesce(node.systemName, node.name, node.ownerType) AS candidate,
       [el in labels(node) WHERE el IN ['Bank','Owner'] | el][0] AS label
"""

This candidate_query performs the full-text search in Neo4j. It retrieves the nodes that match the search text from the specified index and limits the number of results. For each matching node it returns two values: a candidate, which is the first non-null value among systemName, name, and ownerType, and the node's label ('Bank' or 'Owner').
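The query expects three parameters: $index, $fulltextQuery, and $limit. As a rough sketch of how they might be prepared, the helper below turns user text into a fuzzy Lucene query string. The function names, the edit distance of 2, and the default limit of 3 are assumptions for illustration and may differ from what the Utils file does.

```python
import re

# Characters with special meaning in Lucene query syntax; replaced by spaces.
_LUCENE_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\/]')


def build_fulltext_query(text: str, max_edits: int = 2) -> str:
    """Turn free text into a Lucene query that fuzzy-matches every word."""
    words = _LUCENE_SPECIAL.sub(" ", text).split()
    # "word~2" tolerates up to two character edits, which absorbs small typos.
    return " AND ".join(f"{word}~{max_edits}" for word in words)


def candidate_params(text: str, index: str, limit: int = 3) -> dict:
    """Assemble the parameter map that candidate_query expects."""
    return {
        "index": index,
        "fulltextQuery": build_fulltext_query(text),
        "limit": limit,
    }


print(candidate_params("CRM Systems", "Bank"))
```

Requiring every word (AND) keeps a search for "CRM Systems" from also matching every other name that merely contains "Systems".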

Configuring the Agent File

We used Azure OpenAI for our LLM. Here is how we initialized it in the code.

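# Assumes the langchain-openai package; adjust the import to match your LangChain version.
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    temperature=0,  # favor consistent answers over creative ones
    streaming=True,  # stream tokens as they are generated
    deployment_name=AZURE_OPENAI_DEPLOYMENT,
    openai_api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    openai_api_version=AZURE_OPENAI_VERSION,
)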

The values for deployment_name, openai_api_key, azure_endpoint, and openai_api_version are all set in the env file.
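One way to get those values from the environment into Python is sketched below. It assumes the environment variables carry the same names as the constants used above, which may not match your env file, and the function name is hypothetical. Failing early on a missing variable gives a clearer error than a rejected API call later.

```python
import os

# Assumption: the env file defines variables with exactly these names.
REQUIRED_VARS = (
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_VERSION",
)


def load_azure_settings(env=os.environ) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_azure_settings() at startup returns a dictionary whose entries can be assigned to the constants passed to AzureChatOpenAI.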

Setup

Setting up the Information Tool

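We used the following two Cypher queries in our Information tool to query the Bank and Owner nodes. The first matches on either the system name or the owner name; the second matches on the system name and additionally filters by owner type.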

MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)
WHERE b.systemName = $candidate OR s.name=$candidate
WITH b, s, r
WITH "Service Name: " + coalesce(b.systemName, "") + "\nOwner Name: " + coalesce(s.name, "") + "\nOwner Type: " + coalesce(s.ownerType, "") + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50

MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)
WHERE b.systemName = $candidate AND s.ownerType = $owner_type
WITH b, s, r
WITH "Service Name: " + coalesce(b.systemName, "") + "\nOwner Name: " + coalesce(s.name, "") + "\nOwner Type: " + coalesce(s.ownerType, "") + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50
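The two queries differ only in their WHERE clause, so a tool needs some rule for picking one. The sketch below shows one plausible rule: use the second query whenever an owner type was supplied. This selection logic and all Python names are assumptions for illustration, not a description of how our Information tool is written.

```python
from typing import Dict, Optional, Tuple

_MATCH = "MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)\n"

# Shared tail of both queries (raw string so Cypher receives the \n escapes as-is).
_RETURN_CONTEXT = r'''
WITH "Service Name: " + coalesce(b.systemName, "")
   + "\nOwner Name: " + coalesce(s.name, "")
   + "\nOwner Type: " + coalesce(s.ownerType, "")
   + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50
'''

QUERY_BY_NAME = (
    _MATCH
    + "WHERE b.systemName = $candidate OR s.name = $candidate"
    + _RETURN_CONTEXT
)
QUERY_BY_NAME_AND_TYPE = (
    _MATCH
    + "WHERE b.systemName = $candidate AND s.ownerType = $owner_type"
    + _RETURN_CONTEXT
)


def select_query(
    candidate: str, owner_type: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Pick the query and parameter map for one lookup."""
    if owner_type:
        return QUERY_BY_NAME_AND_TYPE, {
            "candidate": candidate,
            "owner_type": owner_type,
        }
    return QUERY_BY_NAME, {"candidate": candidate}


query, params = select_query("CRM Systems", owner_type="Tech Owner")
```

Passing values as parameters ($candidate, $owner_type) rather than formatting them into the query text keeps user input out of the Cypher itself.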

Running the Chatbot

To run the chatbot, we utilized Docker. Docker provides a convenient way to package our application and its dependencies into a container. By running a few simple commands, we had our chatbot up and running on our local machine.

docker build -t llm .
docker run --env-file ./.env -p 8080:8080 llm

Testing the Chatbot

With the container running, we tested the chatbot by opening http://127.0.0.1:8080/entity/playground/ in a browser.
LangServe sets up this playground automatically.

Playground

Below are the sample questions we asked the chatbot.

Who owns Custody?
Who are the business owners of "FRM Systems" bank?
Who is the tech owner of "CRM Systems" bank?
Who is the tech owner of Treasury?
How many services does Bob own?
List all the services owned by Bob.
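The same questions can also be sent from a script instead of the playground. LangServe exposes an invoke endpoint next to the playground that accepts a JSON body with an "input" key. The sketch below only builds the request; the inner input shape (a question plus an empty chat history) and the function name are assumptions, so check the schema your deployment expects.

```python
import json
from urllib.request import Request


def build_invoke_request(
    question: str, base_url: str = "http://127.0.0.1:8080/entity"
) -> Request:
    """Build a POST request for the chatbot's invoke endpoint."""
    # Assumption: the agent takes a question and a chat history as its input.
    payload = {"input": {"input": question, "chat_history": []}}
    return Request(
        url=f"{base_url}/invoke",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request("Who owns Custody?")
```

While the container is running, the request can be sent with urllib.request.urlopen(request) and the JSON response read from the returned object.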

This is what the chatbot looks like in the Anansi tool after we deployed it there.

Chatbot_Anansi
