A Retrieval-Augmented Generation (RAG) application that turns your documents into interactive conversations. Built with Streamlit, Google Gemini AI, and a FAISS vector index for document analysis and question answering.
- Multi-Format Support: Process PDF, DOCX, and TXT files
- Intelligent Q&A: Ask questions and get contextual answers from your documents
- RAG Architecture: Combines document retrieval with AI generation for accurate responses
- Beautiful UI: Modern, responsive interface with intuitive design
- Real-time Processing: Streams responses for a better user experience
- Ethical Guidelines: Built-in safeguards for responsible AI usage
- Frontend: Streamlit
- AI Model: Google Gemini 1.5 Flash
- Embeddings: Google Generative AI Embeddings
- Vector Database: FAISS (Facebook AI Similarity Search)
- Document Processing: PyPDF2, python-docx
- Text Processing: LangChain
- Python 3.8 or higher
- Google API Key for Gemini AI
- Clone the repository

      git clone https://github.com/alaaashraf24/rag-document-analyzer.git
      cd rag-document-analyzer

- Install dependencies

      pip install -r requirements.txt

- Set up your Google API Key

  Option 1: Streamlit Secrets (recommended for deployment). Create a `.streamlit/secrets.toml` file:

      GOOGLE_API_KEY = "your_google_api_key_here"

  Option 2: Environment Variable. Create a `.env` file:

      GOOGLE_API_KEY=your_google_api_key_here

- Run the application

      streamlit run app.py

- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the generated key and use it in your configuration
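
The two configuration options above imply a lookup order: secrets first, then the environment. The following is a minimal sketch of that lookup, not the code in `app.py`. The function name `resolve_api_key` and its parameters are hypothetical; only the `GOOGLE_API_KEY` name comes from this README.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
    name: str = "GOOGLE_API_KEY",
) -> str:
    """Return the first non-empty key found in `secrets`, then in `environ`.

    Hypothetical helper: `secrets` stands in for a Streamlit secrets mapping,
    `environ` defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    for source in (secrets or {}, environ):
        value = source.get(name, "")
        if value and value.strip():
            return value.strip()
    raise RuntimeError(f"{name} is not set in Streamlit secrets or the environment")
```

In the app, a mapping such as `st.secrets` could be passed as `secrets`. Note that a `.env` file is not read into the environment automatically; a loader such as `python-dotenv` is typically needed for Option 2.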
- Upload Documents: Use the sidebar to upload PDF, DOCX, or TXT files
- Process Documents: Click "Process Documents" to create embeddings
- Ask Questions: Type your questions in the chat interface
- Get Answers: Receive contextual answers based on your documents
Upload: research_paper.pdf, meeting_notes.docx
Question: "What are the main findings in the research paper?"
Answer: [AI provides summary based on document content]
- Research & Literature Review: Analyze academic papers and research documents
- Document Summarization: Extract key insights from lengthy documents
- Educational Support: Understand complex materials and textbooks
- Content Organization: Query and organize large document collections
- Information Extraction: Find specific information across multiple documents
The application uses Retrieval-Augmented Generation to:
- Chunk Documents: Break documents into manageable pieces
- Create Embeddings: Generate vector representations of text chunks
- Store in FAISS: Index the embeddings for efficient similarity search
- Retrieve Context: Find relevant chunks for user queries
- Generate Answers: Use Google Gemini with retrieved context
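
The retrieval steps above can be illustrated with a toy, dependency-free sketch. It deliberately swaps in stand-ins: a word-count vector replaces Google's embedding model, and a brute-force cosine search replaces FAISS. All function names here (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, and the final call to Gemini is left out.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: a word-count vector (stand-in for a real embedding model)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: List[str], k: int = 5) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query (brute force, not FAISS)."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, context_chunks: List[str]) -> str:
    """Combine retrieved chunks and the question into one prompt for the model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

A real deployment would send the prompt from `build_prompt` to the generation model; the sketch only shows how retrieved context and the user question fit together.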
- Built-in guidelines for responsible usage
- Designed to enhance learning, not replace it
- Encourages proper citation and academic integrity
- Prevents misuse for academic dishonesty
- PDF: `.pdf` files
- Word Documents: `.docx` files
- Text Files: `.txt` files
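
As an illustration of how uploads might be checked against the supported formats, here is a small sketch. The names `SUPPORTED_EXTENSIONS` and `detect_format` are hypothetical and are not taken from `app.py`; the actual parsing with PyPDF2 or python-docx is not shown.

```python
from pathlib import Path

# Extensions listed in this README, mapped to a human-readable label.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "Word document",
    ".txt": "Text file",
}


def detect_format(filename: str) -> str:
    """Return the format label for a file name, or raise if unsupported."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED_EXTENSIONS[suffix]
```

Matching on the lowercased suffix means `Report.PDF` and `report.pdf` are treated the same way.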
- Chunk Size: 1000 characters (configurable in code)
- Chunk Overlap: 200 characters
- Similarity Search: Top 5 relevant chunks (k=5)
- Model: Google Gemini 1.5 Flash
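
To show what the chunk size and overlap settings mean in practice, here is a simplified sliding-window splitter. It is an assumption-laden sketch: the app may use a separator-aware splitter from LangChain instead, and the names `CHUNK_SIZE`, `CHUNK_OVERLAP`, and `split_text` are hypothetical. Only the values 1000 and 200 come from the list above.

```python
from typing import List

CHUNK_SIZE = 1000     # characters per chunk (value from this README)
CHUNK_OVERLAP = 200   # characters shared by consecutive chunks


def split_text(
    text: str,
    chunk_size: int = CHUNK_SIZE,
    overlap: int = CHUNK_OVERLAP,
) -> List[str]:
    """Split text into fixed-width chunks whose edges overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With the defaults, each new chunk starts 800 characters after the previous one, so a sentence cut at one chunk boundary still appears whole in the neighbouring chunk as long as it is shorter than the overlap.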
rag-document-analyzer/
│
├── app.py                  # Main Streamlit application
├── requirements.txt        # Python dependencies
├── README.md               # Project documentation
├── .streamlit/
│   └── secrets.toml        # Streamlit secrets (create this)
└── .env                    # Environment variables (optional)
- API Keys: Store securely using Streamlit secrets or environment variables
- Document Privacy: Documents are parsed locally and not stored permanently; text chunks and questions are sent to Google's AI APIs for embedding and answer generation
- No Data Retention: Chat history is session-based only
- Secure Processing: Uses Google's secure AI APIs
- Push your code to GitHub
- Connect your GitHub repo to Streamlit Cloud
- Add your `GOOGLE_API_KEY` in the Streamlit Cloud secrets
- Deploy with one click!
# Install in development mode
pip install -e .
# Run with debug mode
streamlit run app.py --logger.level=debug

- Memory: Minimum 4GB RAM (8GB recommended for large documents)
- Storage: 1GB free space for dependencies
- Internet: Required for Google AI API calls
- Browser: Modern web browser (Chrome, Firefox, Safari, Edge)
Built with ❤️ using Streamlit and Google Gemini AI
Transform your documents into intelligent conversations today!