A chatbot is a software application that uses AI to have conversations with users, helping them find information or answer questions. We built this chatbot using Retrieval-Augmented Generation (RAG) to improve its responses, Neo4j to store structured data, and Large Language Models (LLMs) to understand and generate natural language.
We created two node labels, "Bank" and "Owner", and one relationship type between them, "IS_OWNED_BY". This post lays out how we built a chatbot that queries the relationship between these node types using RAG techniques.
Introduction
Anansi
Anansi is a Visual Data Lineage Tool used to visualize and manage data relationships. It ensures data quality, compliance, and provides real-time insights, with easy customization and flexible deployment.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation is a technique that enhances Large Language Model (LLM) responses by retrieving source information from external data stores (e.g., databases) to augment generated responses.
LangChain
LangChain is a library for GenAI orchestration. It supports numerous LLMs, vector stores, document loaders, and agents. It also manages prompt templates, composes components into chains, and supports monitoring and observability.
Neo4j
Neo4j is a graph database management system that utilizes graph structures with nodes, edges, and properties to represent and store data.
Prerequisites
1. Creation of Tables using Anansi
We created the Bank and Owner tables in Anansi, shown below.
Bank
System ID | System Name | System Description |
---|---|---|
System1 | Investment | System is Related to Investment |
System2 | Retail | System is related to Retail Banking |
System3 | Treasury | System is related to Treasury Department |
System4 | Custody | System is related to Custody |
System111 | Financial Product Service Create | Services of FPMS |
System112 | Financial Product Service Update | Services of FPMS |
System121 | CRM Service Create | Services of CRM |
System122 | CRM Service Update | Services of CRM |
System131 | Investment Order Service Create | Services of IOMS |
System11 | FRM Systems | This system facilitates the cataloging of financial products, making them easily accessible for stakeholders and clients. |
System12 | CRM Systems | This system centralizes all client-related data, enabling better service and personalized advisory. |
System132 | Investment Order Service Update | Services of IOMS |
System13 | Investment Order Management Systems | This system manages everything from order placement to execution. |
Owner
Owner Id | Name | Owner Type | Email | Phone No. |
---|---|---|---|---|
1 | Bob | Business Owner | [email protected] | 123-456 |
2 | Dick | Business Owner | [email protected] | null |
4 | External Systems | Department | [email protected] | null |
5 | Utilities | Department | [email protected] | null |
3 | Joshua | Tech Owner | [email protected] | null |
6 | Trading Systems | Department | [email protected] | null |
After creating both the Owner and Bank tables, we established the 'IS_OWNED_BY' relationship between the Bank and Owner tables. Additionally, we created the 'CONNECTED_TO' and 'BELONGS_TO' relationships within the Bank table for the system type.
Refer to the Anansi Documentation for instructions on creating tables and relationships.
2. Full Text Index
A full-text index is a database feature that speeds up text search, making it easier to locate specific words or phrases within large bodies of text.
To create a full-text index for the Bank nodes in our Neo4j database, we ran the following query:
CREATE FULLTEXT INDEX Bank
IF NOT EXISTS
FOR (bank:Bank)
ON EACH [bank.systemName]
To create your own full-text index, replace Bank with the label of your node and bank.systemName with the property you want to index. This query creates a full-text index named Bank for nodes labeled Bank, indexing the systemName property for efficient search operations.
Similarly, we created a full-text index for our Owner Table/Node.
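The substitution described above can be sketched as a small Python helper that builds the statement for any label and property. This is only an illustration: the helper name `fulltext_index_statement` is hypothetical, and the Owner example assumes the index covers the `name` property, which the post does not state. Because the label and property are formatted into the statement text, the sketch accepts plain identifiers only.

```python
import re

# Plain identifiers only: letters, digits, underscores, not starting with a digit.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def fulltext_index_statement(index_name: str, label: str, prop: str) -> str:
    """Build a CREATE FULLTEXT INDEX statement for one label and one property."""
    for value in (index_name, label, prop):
        if not _IDENT.match(value):
            raise ValueError(f"not a plain identifier: {value!r}")
    return (
        f"CREATE FULLTEXT INDEX {index_name}\n"
        "IF NOT EXISTS\n"
        f"FOR (n:{label})\n"
        f"ON EACH [n.{prop}]"
    )


# Same shape as the Bank index above, with a generic variable name n.
bank_stmt = fulltext_index_statement("Bank", "Bank", "systemName")
# Assumption: the Owner index is built on the name property.
owner_stmt = fulltext_index_statement("Owner", "Owner", "name")
print(bank_stmt)
print(owner_stmt)
```

The resulting strings can be pasted into Neo4j Browser or passed to whichever driver you use.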
Code configuration
We based our code on the information tool from the Movie Agent to gather information for the Bank and Owner nodes.
Additionally, in the Utils file, we modified the candidate_query to use the newly created full-text indexes (i.e., Bank and Owner).
candidate_query = """
CALL db.index.fulltext.queryNodes($index, $fulltextQuery, {limit: $limit})
YIELD node
RETURN coalesce(node.systemName, node.name, node.ownerType) AS candidate,
[el in labels(node) WHERE el IN ['Bank','Owner'] | el][0] AS label
"""
This candidate_query runs a full-text search in Neo4j. It retrieves the nodes that match $fulltextQuery in the index named by $index and caps the number of results at $limit. Each returned row contains a candidate (the first non-null value among systemName, name, and ownerType) and the node's label ('Bank' or 'Owner').
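The post does not show how the $fulltextQuery parameter is built, so here is a minimal sketch of one common approach: strip Lucene's special characters, then require every word to match with a small edit distance so that minor typos still find a candidate. The function names, the edit distance of 2, and the default limit of 3 are our assumptions for illustration, not values taken from the original code.

```python
import re

# Characters with special meaning in Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\/]')


def build_fulltext_query(text: str, max_edits: int = 2) -> str:
    """Turn user text into a fuzzy Lucene query where every word must match."""
    cleaned = _LUCENE_SPECIAL.sub(" ", text)
    words = cleaned.split()
    # word~2 allows up to two character edits per word.
    return " AND ".join(f"{word}~{max_edits}" for word in words)


def candidate_params(text: str, index: str, limit: int = 3) -> dict:
    """Parameters for candidate_query: $index, $fulltextQuery and $limit."""
    return {
        "index": index,
        "fulltextQuery": build_fulltext_query(text),
        "limit": limit,
    }


print(build_fulltext_query("CRM Systems"))
print(candidate_params('"FRM Systems"', "Bank"))
```

The dictionary returned by `candidate_params` would be passed alongside candidate_query to the graph client's query method.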
Configuring the Agent File
We used Azure OpenAI for our LLM. Here is how we initialized it in the code.
llm = AzureChatOpenAI(
    temperature=0,
    streaming=True,
    deployment_name=AZURE_OPENAI_DEPLOYMENT,
    openai_api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    openai_api_version=AZURE_OPENAI_VERSION,
)
The values for deployment_name, openai_api_key, azure_endpoint, and openai_api_version are all set in the .env file.
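As a rough sketch of how these four values could be read and checked at startup, the snippet below pulls them from the process environment and fails early if one is missing. We assume the .env keys carry the same names as the Python constants used in the initialization; the helper `load_azure_settings` is hypothetical and the example values are placeholders, not real credentials.

```python
import os

# Assumption: the .env keys match the constant names used when creating the LLM.
REQUIRED_KEYS = (
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_VERSION",
)


def load_azure_settings(env=None) -> dict:
    """Return the four Azure OpenAI settings, failing early if any is missing or empty."""
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: source[key] for key in REQUIRED_KEYS}


# Placeholder values for illustration only.
example_env = {
    "AZURE_OPENAI_DEPLOYMENT": "my-deployment",
    "AZURE_OPENAI_API_KEY": "placeholder-key",
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid/",
    "AZURE_OPENAI_VERSION": "placeholder-version",
}
settings = load_azure_settings(example_env)
print(sorted(settings))
```

Checking up front gives a readable error at container start instead of a failure on the first chat request.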
Setup
Setting up the Information Tool
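We used the following two Cypher queries in our information tool to query the Bank and Owner tables. The first looks up ownership by system name or owner name; the second returns only the owners of a given system that have a specific owner type.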
MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)
WHERE b.systemName = $candidate OR s.name=$candidate
WITH b, s, r
WITH "Service Name: " + coalesce(b.systemName, "") + "\nOwner Name: " + coalesce(s.name, "") + "\nOwner Type: " + coalesce(s.ownerType, "") + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50

MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)
WHERE b.systemName = $candidate AND s.ownerType = $owner_type
WITH b, s, r
WITH "Service Name: " + coalesce(b.systemName, "") + "\nOwner Name: " + coalesce(s.name, "") + "\nOwner Type: " + coalesce(s.ownerType, "") + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50
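One way the tool might choose between the two queries is sketched below: when the question names an owner type, the second query is used; otherwise the first. The function `build_information_query` is a hypothetical name, and the sketch drops the pass-through `WITH b, s, r` line because it does not change the result. How the actual tool selects a query is not shown in the post.

```python
_MATCH = "MATCH (b:Bank)-[r:IS_OWNED_BY]-(s:Owner)\n"

# Shared projection: both queries format their result the same way.
# Raw string so that \n reaches Cypher as an escape sequence.
_CONTEXT = r'''WITH "Service Name: " + coalesce(b.systemName, "")
  + "\nOwner Name: " + coalesce(s.name, "")
  + "\nOwner Type: " + coalesce(s.ownerType, "")
  + "\nPhone No.: " + coalesce(s.contact, "") AS context
RETURN context LIMIT 50'''


def build_information_query(candidate, owner_type=None):
    """Return (cypher, params) for a lookup with or without an owner-type filter."""
    if owner_type is None:
        where = "WHERE b.systemName = $candidate OR s.name = $candidate\n"
        params = {"candidate": candidate}
    else:
        where = "WHERE b.systemName = $candidate AND s.ownerType = $owner_type\n"
        params = {"candidate": candidate, "owner_type": owner_type}
    return _MATCH + where + _CONTEXT, params


# "Who owns Custody?" needs no owner-type filter.
query_any, params_any = build_information_query("Custody")
# "Who is the tech owner of CRM Systems?" filters on the owner type.
query_tech, params_tech = build_information_query("CRM Systems", "Tech Owner")
print(query_tech)
print(params_tech)
```

Keeping the values in `params` rather than formatting them into the query text lets Neo4j handle quoting of user-supplied names.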
Running the Chatbot
To run the chatbot, we utilized Docker. Docker provides a convenient way to package our application and its dependencies into a container. By running a few simple commands, we had our chatbot up and running on our local machine.
docker build -t llm .
docker run --env-file ./.env -p 8080:8080 llm
Testing the Chatbot
With the container running, we tested the chatbot by visiting the URL: 127.0.0.1:8080/entity/playground/
This playground is set up automatically, courtesy of LangServe.
Below are the sample questions we asked the chatbot.
- Who owns Custody?
- Who are the business owners of "FRM Systems" bank?
- Who is the tech owner of "CRM Systems" bank?
- Who is the tech owner of Treasury?
- How many services does Bob own?
- List all the services owned by Bob.
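The same questions can also be sent without the playground, since LangServe exposes an invoke endpoint next to it. The sketch below only builds the HTTP request using the standard library. The payload shape (an `input` string plus an empty `chat_history`) is an assumption based on typical agent templates and may differ from the schema of our chain, so check the playground's input form before relying on it.

```python
import json
import urllib.request

# Same host, port and path prefix as the playground URL above.
BASE_URL = "http://127.0.0.1:8080/entity"


def build_invoke_request(question: str) -> urllib.request.Request:
    """Build a POST request for the LangServe invoke endpoint (not sent here)."""
    # Assumption: the chain accepts {"input": ..., "chat_history": [...]}.
    payload = {"input": {"input": question, "chat_history": []}}
    return urllib.request.Request(
        BASE_URL + "/invoke",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invoke_request("Who owns Custody?")
print(request.get_method(), request.full_url)

# To send it while the container is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```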
This is how the chatbot looks in the Anansi tool after we successfully deployed it there.