This application enables Retrieval-Augmented Generation (RAG) on a complex Excel dataset containing multi-level inventory data across different group companies. It uses Neo4j Knowledge Graph for structured representation and LangChain with Azure OpenAI for intelligent querying and answer generation.
- Create a virtual environment:
python -m venv code-rag-env
source code-rag-env/bin/activate # or .\code-rag-env\Scripts\activate on Windows- Install dependencies:
pip install -r requirements.txt- Converts Excel-based multi-level inventory data into a Neo4j Knowledge Graph
- Performs semantic RAG-style Question Answering over the graph
- Uses LangChain + Azure OpenAI to convert natural language to Cypher and generate accurate responses
- Streamlit-powered UI for easy interaction
-
Inventory Levels:
InventoryLevel1 -> InventoryLevel2 -> InventoryLevel3 -> InventoryLevel4 -
Relationships:
(:InventoryLevelX)-[:SUB_INVENTORY]->(:InventoryLevelY)(:GroupCompany)-[:HAS_VALUE {amount, year}]->(:InventoryLevel4)
-
Download and install Neo4j Desktop
-
Create a new Project and inside it, a new Local DBMS
-
Start the database and get:
- Bolt URL (e.g.,
bolt://localhost:7687) - Username (default:
neo4j) - Password (set by you)
- Bolt URL (e.g.,
-
Create a
.envfile in the root directory:
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password_here
AZURE_DEPLOYMENT=your_azure_deployment_name
AZURE_OPENAI_API_KEY=your_openai_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_VERSION=your-api-version📝 Update the
.envfile with the correct Neo4j and Azure OpenAI credentials before executing this step.
Use the notebook create_kg.ipynb to:
-
Load Excel file (containing inventory and company data)
-
Extract and process:
- Inventory Levels (L1 to L4)
- Relationships between inventories
- GroupCompany to Inventory mappings with properties
amountandyear
-
Push structured nodes and relationships to Neo4j
The core RAG logic is defined in main.py:
-
Normalization Layer
- Uses LLM to extract structured fields:
company_name,inventory_type,year, andcanonicalized_query
- Uses LLM to extract structured fields:
-
Cypher Generation Layer
- Generates a Cypher query from the canonicalized query using LangChain prompts
-
Query Execution
- Runs the Cypher on Neo4j and extracts the relevant data
-
Answer Generation Layer
- Uses LLM to synthesize an answer from Cypher results + user query
Run the Streamlit app with:
streamlit run streamlit_app.py-
Ask financial questions like:
- "Which inventory tools has the highest difference for AS from 2023 to 2024?"
- "Show amount change for Machinery across years"
-
View normalized query, generated Cypher, and final answer
├── create_kg.ipynb # Creates and loads Neo4j KG from Excel
├── main.py # Core RAG pipeline: query normalization, Cypher gen, LLM response
├── prompts.py # Required Prompts for RAG and QA
├── streamlit_app.py # Streamlit chatbot interface
├── .env # Neo4j and Azure OpenAI secrets
├── requirements.txt # Required packages
- Neo4j – Graph DBMS for structured inventory data
- LangChain – RAG pipeline orchestration
- Azure OpenAI – LLM backend (ChatGPT)
- Streamlit – UI for chatbot
- Python – Core implementation