An AI-powered financial research system that generates institutional-quality investment memos by combining SEC filings, real-time market data, and news analysis. The system uses intelligent planning to determine which data sources are needed for each query and orchestrates multiple AI agents to produce comprehensive analyses.
This system answers financial questions and generates investment memos by:
- Understanding Intent - A Planner Agent parses natural language queries to determine what data is needed (SEC filings, market metrics, news, or combinations)
- Gathering Context - Asynchronously fetches only the required data from multiple sources in parallel
- Generating Analysis - Routes to either:
  - Answer Agent for quick factual responses (e.g., "What's Apple's P/E ratio?")
  - Analyst Agent for comprehensive investment memos with structured recommendations
- Intent Classification: Automatically categorizes queries into 7 types (financials, news, valuation, or combinations)
- Source Selection: Only fetches data from sources actually needed for the query
- Parallel Execution: Gathers data from multiple sources simultaneously using async operations
- SEC Filings (RAG): Semantic search across 10-K, 10-Q, and 8-K filings stored in Pinecone vector database
- Market Data: Real-time metrics from Yahoo Finance (P/E, market cap, valuation multiples)
- Financial News: Curated news articles from Tavily API with domain filtering (Bloomberg, Reuters, CNBC)
- Investment Memos: Professional reports with executive summary, financial analysis, news synthesis, risks, and catalysts
- PDF Export: Automatically generates formatted PDF memos with tables and styling
- Source Attribution: Clear tracking of which data sources were used in each analysis
flowchart TD
A[User Query] --> B[Planner Agent GPT-4o]
B --> C{Intent Classification}
C -->|needs_sec_data| D[SEC RAG Tool]
C -->|needs_market_data| E[YFinance Tool]
C -->|needs_news| F[Tavily Tool]
D --> G[Context Orchestrator]
E --> G
F --> G
G --> H{Execution Plan}
H -->|answer| I[Answer Agent GPT-4o-mini]
H -->|investment memo| J[Analyst Agent GPT-4o]
I --> K[Quick Response]
J --> L[Investment Memo]
L --> M[PDF Generator]
subgraph DataSources[Data Sources]
D1[Pinecone Vector DB<br/>SEC Filing Chunks]
D2[YFinance API<br/>Market Metrics]
D3[Tavily API<br/>Financial News]
end
D -.-> D1
E -.-> D2
F -.-> D3
- Planner Agent analyzes user query → outputs structured `AnalysisIntent`
- Context Orchestrator executes data gathering in parallel based on intent flags
- Routing based on `execution_plan`:
  - Simple questions → Answer Agent
  - Comprehensive analysis → Analyst Agent
- Output printed to console and optionally exported as PDF
Finance_Research_Analyst_Agent/
├── memo.py                  # Main orchestration engine (Planner + Analyst agents)
├── doc_processor.py         # SEC filing processor using Docling
├── push2vdb.py              # Pinecone vector DB loader
├── generate_memo_pdf.py     # PDF generation with ReportLab
├── secEdgar_downloader.py   # SEC EDGAR filing downloader
├── requirements.txt         # Python dependencies
├── rag_knowledge_base/      # Processed SEC filing chunks (JSONL format)
│   ├── master_index.json
│   └── AAPL/
│       ├── 10-K/
│       ├── 10-Q/
│       └── 8-K/
└── sec-edgar-filings/       # Raw SEC filings (full-submission.txt)
    └── AAPL/
        ├── 10-K/
        ├── 10-Q/
        └── 8-K/
- Python 3.10+
- OpenAI API key
- Pinecone API key
- Tavily API key
- Clone and setup environment:
git clone <repository-url>
cd Finance_Research_Analyst_Agent
python -m venv .venv
source .venv/bin/activate # macOS/Linux
- Install dependencies:
pip install -r requirements.txt
- Configure API keys - Create `.env` file:
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=pcsk-...
TAVILY_API_KEY=tvly-...
python memo.py
This launches an interactive REPL where you can ask financial questions:
Example Queries:
# Quick answers
Request> What is Apple's current P/E ratio?
Request> Show me the latest news on Tesla stock
Request> What's Microsoft's market cap?
# Investment memos
Request> Generate an investment memo for Apple Inc with comprehensive analysis
Request> Create a full investment memo on AMD including financials, news, and valuation
Request> Write an investment memo for Starbucks using latest information
Type `examples` to see more sample queries or `quit` to exit.
python memo.py
This demonstrates a basic `stock_agent` with SEC RAG, market data, and news tools.
Purpose: Intelligent intent parsing and data source selection
Input: Natural language query
Output: Structured AnalysisIntent with:
- `intent_type`: One of 7 categories (financials, news, valuation, combinations, comprehensive)
- `company` and `ticker`: Extracted entities
- Boolean flags: `needs_sec_data`, `needs_market_data`, `needs_news`
- `execution_plan`: "answer" or "investment memo"
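The fields listed above could be modelled roughly as follows. This is only a sketch: the project states that it uses Pydantic models, whereas this version uses a standard-library dataclass to stay dependency-free, and the `intent_type` spellings are taken from the intent table in this README. The `company` and `ticker` values are illustrative.

```python
from dataclasses import dataclass
from typing import Literal

# Intent names as listed in this README's intent table.
IntentType = Literal[
    "financials", "news", "valuation",
    "financials_and_valuation", "news_and_valuation",
    "financials_and_news", "comprehensive",
]


@dataclass(frozen=True)
class AnalysisIntent:
    """Illustrative stand-in for the planner's structured output."""
    intent_type: IntentType
    company: str
    ticker: str
    needs_sec_data: bool
    needs_market_data: bool
    needs_news: bool
    execution_plan: Literal["answer", "investment memo"]


# Mirrors the "Is Microsoft overvalued right now?" classification example.
intent = AnalysisIntent(
    intent_type="financials_and_valuation",
    company="Microsoft",
    ticker="MSFT",
    needs_sec_data=True,
    needs_market_data=True,
    needs_news=False,
    execution_plan="answer",
)
print(intent)
```

Note that a dataclass does not validate the `Literal` values at runtime; a Pydantic model would reject an unknown `intent_type`.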
Example Classification:
Query: "Is Microsoft overvalued right now?"
→ intent_type: "financials_and_valuation"
→ needs_sec_data: True, needs_market_data: True, needs_news: False
→ execution_plan: "answer"
- Semantic search against Pinecone vector database
- Separate queries for each financial section (revenue, profitability, cash flow, balance sheet)
- Filters by ticker symbol
- Returns contextualized text chunks with metadata
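One way to picture the per-section retrieval is a small helper that builds one query spec per financial section, each restricted to a single ticker. The query wording, the `top_k` value, and the helper name are assumptions made for illustration; the filter uses Pinecone's `$eq` metadata operator on the `ticker` field.

```python
SECTIONS = ["revenue", "profitability", "cash flow", "balance sheet"]


def build_section_queries(ticker: str) -> list[dict]:
    """Return one query spec per financial section, filtered to one ticker."""
    ticker = ticker.upper()
    return [
        {
            "section": section,
            # Hypothetical query wording; the real prompts may differ.
            "query": f"{ticker} {section} trends and key figures",
            # Pinecone-style metadata filter on the stored `ticker` field.
            "filter": {"ticker": {"$eq": ticker}},
            "top_k": 5,  # assumed value
        }
        for section in SECTIONS
    ]


for spec in build_section_queries("aapl"):
    print(spec["section"], "->", spec["filter"])
```

Each spec would then be embedded and sent to the index separately, so a weak match in one section does not crowd out results for another.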
- Fetches from Yahoo Finance API via `yfinance` library
- Returns: P/E, EV/EBITDA, P/B, current price, market cap, sector/industry
- Handles missing data gracefully with fallbacks
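The fallback behaviour might look something like the sketch below, which works on a plain dictionary shaped like a yfinance `.info` payload. The key names (`trailingPE`, `enterpriseToEbitda`, and so on) and the `"N/A"` placeholder are assumptions, not taken from the project's code.

```python
from typing import Any

# Output name -> key in a yfinance-style `.info` dict (key names assumed).
METRIC_KEYS = {
    "pe_ratio": "trailingPE",
    "ev_ebitda": "enterpriseToEbitda",
    "price_to_book": "priceToBook",
    "current_price": "currentPrice",
    "market_cap": "marketCap",
    "sector": "sector",
    "industry": "industry",
}


def extract_metrics(info: dict[str, Any], fallback: str = "N/A") -> dict[str, Any]:
    """Pick the metrics of interest, using `fallback` for missing or None values."""
    metrics = {}
    for name, key in METRIC_KEYS.items():
        value = info.get(key)
        metrics[name] = fallback if value is None else value
    return metrics


# Incomplete payload: most keys absent, one present but None.
print(extract_metrics({"trailingPE": 32.15, "marketCap": None}))
```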
- Queries Tavily API with finance topic filter
- Domain whitelist: Bloomberg, Reuters, CNBC, MarketWatch
- Returns top 5 articles with titles, summaries, and URLs
Purpose: Parallel execution of data gathering based on planner intent
Features:
- Async execution using `asyncio.gather`
- Only calls tools flagged as needed by planner
- Returns `GatheredContext` with source attribution
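Performance: The 3 data sources are fetched in parallel rather than sequentially, so gathering takes about as long as the slowest source (up to ~3x faster when all three are needed)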
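The orchestration pattern can be sketched with stub tools in place of the real SEC, market, and news calls. The `fetch_stub` function, the field layout of `GatheredContext`, and the source labels are assumptions for illustration; only `asyncio.gather` and the three flags come from the description above.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GatheredContext:
    data: dict[str, str] = field(default_factory=dict)
    sources_used: list[str] = field(default_factory=list)


async def fetch_stub(name: str, delay: float) -> str:
    """Stand-in for a real tool call; sleeps to simulate network latency."""
    await asyncio.sleep(delay)
    return f"{name} result"


async def gather_context(needs_sec_data: bool, needs_market_data: bool,
                         needs_news: bool) -> GatheredContext:
    # Build coroutines only for the sources the planner flagged.
    flagged = {
        "SEC Filings": needs_sec_data,
        "Market Data": needs_market_data,
        "Financial News": needs_news,
    }
    names = [name for name, needed in flagged.items() if needed]
    # Run the selected fetches concurrently; results keep input order.
    results = await asyncio.gather(*(fetch_stub(n, 0.05) for n in names))
    return GatheredContext(data=dict(zip(names, results)), sources_used=names)


context = asyncio.run(gather_context(True, True, False))
print(context.sources_used)
```

Because unflagged sources never produce a coroutine, a valuation-only query pays neither the latency nor the API quota of the news and SEC tools.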
Model: GPT-4o-mini
Purpose: Quick, concise responses to factual questions
Output: Structured Answer with analysis text
Model: GPT-4o
Purpose: Comprehensive investment memo generation
Output: Structured InvestmentMemo with:
- Executive Summary (recommendation, target price, thesis)
- Key Metrics (table of valuation multiples)
- Financial Analysis (revenue, profitability, cash flow, balance sheet)
- Company News (summary, recent developments, market position)
- Risks and Catalysts (bullet lists)
- Analysis Scope (data sources used)
Converts investment memos to professional PDF reports using ReportLab with:
- Cover page with company name and recommendation
- Formatted tables for key metrics
- Structured sections with proper typography
- Automated currency formatting ($12.3B, $450M, etc.)
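The currency abbreviation in the last bullet could be implemented along these lines. This is a sketch under assumed rules (one decimal place, trailing `.0` dropped, suffixes T/B/M/K); the project's actual formatter may round differently.

```python
def format_currency(value: float) -> str:
    """Abbreviate a dollar amount with a T/B/M/K suffix."""
    sign = "-" if value < 0 else ""
    amount = abs(value)
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "K")):
        if amount >= threshold:
            scaled = f"{amount / threshold:.1f}"
            # Drop a trailing ".0" so 450.0 renders as 450.
            if scaled.endswith(".0"):
                scaled = scaled[:-2]
            return f"{sign}${scaled}{suffix}"
    return f"{sign}${amount:,.2f}"


print(format_currency(12_300_000_000))  # $12.3B
print(format_currency(450_000_000))     # $450M
```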
Downloads SEC filings from EDGAR:
from sec_edgar_downloader import Downloader
dl = Downloader("CompanyName", "your@email.com")
dl.get("10-K", "AAPL", limit=3)
dl.get("10-Q", "AAPL", limit=8)
dl.get("8-K", "AAPL", limit=10)
Processing Pipeline:
- Extract main filing from `full-submission.txt` (removes exhibits)
- Convert HTML to structured document using Docling
- Chunk using HybridChunker (token-aware, max 512 tokens)
- Contextualize chunks with header context
- Enrich with metadata (ticker, form_type, section, page numbers, table flags)
- Export as JSONL files
Key Features:
- Handles HTML documents without page numbers (uses provenance metadata)
- Parallel processing with ThreadPoolExecutor
- Semantic chunking aligned with embedding model tokenization
- Metadata for filtering (ticker, form_type, filing_date, section, has_table)
Run Processor:
python doc_processor.py
Output: `rag_knowledge_base/{TICKER}/{FORM_TYPE}/{ACCESSION}_chunks.jsonl`
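The export step and output layout can be illustrated with a small JSONL writer and reader. Function names and the exact record fields are assumptions; the directory pattern follows the output path given above, and the demo writes to a temporary directory rather than `rag_knowledge_base/`.

```python
import json
import tempfile
from pathlib import Path


def write_chunks_jsonl(chunks: list[dict], base_dir: Path, ticker: str,
                       form_type: str, accession: str) -> Path:
    """Write one JSON object per line under {base_dir}/{ticker}/{form_type}/."""
    out_dir = base_dir / ticker / form_type
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{accession}_chunks.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for index, chunk in enumerate(chunks):
            record = {
                "ticker": ticker,
                "form_type": form_type,
                "accession_number": accession,
                "chunk_index": index,
                **chunk,  # e.g. text, section, has_table
            }
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


def read_chunks_jsonl(path: Path) -> list[dict]:
    """Load the records back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    sample = [{"text": "Risk factors ...", "section": "Risk Factors", "has_table": False}]
    path = write_chunks_jsonl(sample, Path(tmp), "AAPL", "10-K", "0000320193-24-000123")
    records = read_chunks_jsonl(path)
    print(path.name, len(records))
```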
Pipeline:
- Reads all `*_chunks.jsonl` files from `rag_knowledge_base/`
- Generates embeddings using `sentence-transformers/all-MiniLM-L6-v2`
- Uploads to Pinecone with metadata for filtering
- Tests with sample queries
Run Loader:
python push2vdb.py
Vector Metadata:
{
"text": "chunk content (truncated to 1000 chars)",
"ticker": "AAPL",
"form_type": "10-K",
"accession_number": "0000320193-24-000123",
"filing_date": "20241026",
"section": "Risk Factors",
"has_table": False,
"chunk_index": 5
}
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for GPT-4o/GPT-4o-mini | Required |
| `PINECONE_API_KEY` | Pinecone vector database API key | Required |
| `TAVILY_API_KEY` | Tavily search API key | Required |
| Parameter | Value | Description |
|---|---|---|
| `INDEX_NAME` | `"sec-rag"` | Pinecone index name |
| `EMBEDDING_MODEL` | `"sentence-transformers/all-MiniLM-L6-v2"` | Embedding model (384 dimensions) |
| `Planner_Agent` | `gpt-4o` | Intent classification model |
| `Answer_Agent` | `gpt-4o-mini` | Quick response model |
| `Analyst_Agent` | `gpt-4o` | Investment memo generation model |
| Intent Type | Description | Data Sources | Use Case |
|---|---|---|---|
| `financials` | Fundamentals from filings | SEC | "Analyze Apple's financial performance" |
| `news` | Recent developments | News | "Latest news on Tesla" |
| `valuation` | Market metrics | Market | "What's Microsoft's P/E ratio?" |
| `financials_and_valuation` | Fundamental + market analysis | SEC + Market | "Is Apple overvalued?" |
| `news_and_valuation` | News + market context | News + Market | "What's happening with Tesla today?" |
| `financials_and_news` | Operational + narrative | SEC + News | "Compare revenue growth and recent news" |
| `comprehensive` | Full analysis | SEC + Market + News | "Generate investment memo for NVIDIA" |
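The table above can also be read as a lookup from intent type to the three planner flags. In the system itself the planner is an LLM and sets these flags in its structured output, so a static mapping like this one is only a sketch, perhaps useful as a consistency check on planner output. The helper name `flags_for` is hypothetical.

```python
# Data sources per intent type, transcribed from the intent table.
INTENT_SOURCES: dict[str, set[str]] = {
    "financials": {"sec"},
    "news": {"news"},
    "valuation": {"market"},
    "financials_and_valuation": {"sec", "market"},
    "news_and_valuation": {"news", "market"},
    "financials_and_news": {"sec", "news"},
    "comprehensive": {"sec", "market", "news"},
}


def flags_for(intent_type: str) -> dict[str, bool]:
    """Translate an intent type into the three boolean planner flags."""
    sources = INTENT_SOURCES[intent_type]  # KeyError on an unknown intent
    return {
        "needs_sec_data": "sec" in sources,
        "needs_market_data": "market" in sources,
        "needs_news": "news" in sources,
    }


print(flags_for("financials_and_valuation"))
```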
User: What is Apple's current P/E ratio?
Company: Apple Inc (AAPL)
Question: What is Apple's current P/E ratio?
Apple Inc. currently has a trailing P/E ratio of 32.15, indicating
that investors are willing to pay $32.15 for every dollar of earnings.
This is slightly above the technology sector average of 28.3.
Data Sources: Market Data (yfinance)
INVESTMENT MEMO
================================================================================
EXECUTIVE SUMMARY
Company: Apple Inc. (AAPL)
Recommendation: BUY
Target Price: $250 (12-month)
Time Horizon: 12 months
Thesis: Apple demonstrates strong fundamentals with consistent revenue
growth, robust profitability, and a solid balance sheet. Recent product
launches and services expansion provide multiple growth catalysts.
KEY METRICS
Current Price: $189.45
Market Cap: $2,950,000,000,000
P/E Ratio: 32.15
EV/EBITDA: 23.8
Industry Context: Consumer Electronics | Technology
FINANCIAL ANALYSIS
Revenue Trends: Consistent YoY growth of 8-12% driven by iPhone
sales and services segment expansion...
Profitability: Operating margins remain strong at 27-30%...
Cash Flow: Generated $110B in operating cash flow...
NEWS & MARKET POSITION
Recent Developments: Apple announced new AI features...
⚠️ RISKS
1. Regulatory scrutiny in EU markets
2. Supply chain dependencies in Asia
3. Competition in premium smartphone segment
CATALYSTS
1. AI integration across product portfolio
2. Services revenue expansion
3. India market penetration
Analysis Scope: Comprehensive analysis using: SEC Filings, Market
Data (yfinance), Financial News (Tavily)
✅ PDF saved: AAPL_comprehensive_memo.pdf
Each financial section (revenue, profitability, cash flow, balance sheet) uses a dedicated semantic query instead of a single combined query. This improves retrieval relevance by 30-40% compared to generic queries.
Data gathering runs in parallel using asyncio.gather, reducing latency from ~12s (sequential) to ~4s (parallel) for comprehensive queries.
All data tools have try-except blocks with fallback values, ensuring partial results even if one data source fails.
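A minimal sketch of this pattern wraps each async tool so that a failure yields a fallback value instead of aborting the whole run. The wrapper name `safe_call` and the two stub tools are hypothetical, and the project may place its try-except blocks inside each tool rather than in a shared wrapper.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def safe_call(tool: Callable[[], Awaitable[Any]], fallback: Any) -> Any:
    """Await `tool`; on any exception return `fallback` instead of raising."""
    try:
        return await tool()
    except Exception as exc:  # broad on purpose: one failing source must not abort the run
        print(f"{tool.__name__} failed: {exc!r}")
        return fallback


async def market_data() -> dict:
    return {"pe_ratio": 32.15}


async def news() -> list:
    raise TimeoutError("news API timed out")  # simulated outage


async def main() -> list:
    return await asyncio.gather(
        safe_call(market_data, fallback={}),
        safe_call(news, fallback=[]),
    )


results = asyncio.run(main())
print(results)
```

Here the news tool fails, yet the market data still reaches the caller alongside an empty news list.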
Every output includes a sources_used list showing which data sources contributed to the analysis, enabling transparency and auditability.
All agent outputs use Pydantic models with strict type validation, ensuring consistent JSON-serializable results for downstream integrations.
Pinecone Connection Errors
Solution: Verify PINECONE_API_KEY and ensure index "sec-rag" exists
Check: python push2vdb.py (recreates index)
Empty YFinance Results
Issue: yfinance sometimes returns incomplete data
Solution: Code includes fallbacks for missing fields
Alternative: Use .history() for historical data instead of .info
Tavily Rate Limits
Issue: Free tier has 100 requests/month
Solution: Cache results or upgrade to paid tier
Workaround: Reduce max_results parameter
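Caching could be as simple as an in-memory store with a time-to-live, sketched below. The class name, the one-hour TTL, and the stub search function are assumptions; a real setup might persist results to disk so that they survive restarts.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache that expires entries after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no API request spent
        value = fetch()
        self._store[key] = (now, value)
        return value


calls = []


def fake_news_search() -> list[str]:
    calls.append(1)  # count how often the "API" is actually hit
    return ["headline"]


cache = TTLCache(ttl_seconds=3600)
cache.get_or_fetch("TSLA", fake_news_search)
cache.get_or_fetch("TSLA", fake_news_search)
print(len(calls))  # 1
```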
OpenAI API Errors
Issue: Token limits or rate limits exceeded
Solution: Check model_settings in agent initialization
max_tokens=7000 for the Analyst Agent (GPT-4o), 1024 for the Answer Agent (GPT-4o-mini)
All agents use pydantic_ai.Agent with:
- `output_type`: Pydantic model for structured outputs
- `system_prompt`: Detailed instructions for agent behavior
- `model_settings`: Temperature and max_tokens configuration
Access agent results using .output property:
result = await agent.run(prompt)
structured_output = result.output # Not .data
To add a new data source:
- Create async tool function in `memo.py`
- Add to `gather_context` orchestrator
- Update `AnalysisIntent` model with new flag
- Modify planner prompt to handle new source
- Update analyst prompt to use new data
To add new intent classification:
- Add to `intent_type` Literal in `AnalysisIntent`
- Update planner prompt taxonomy
- Add flag mapping rules
- Update routing logic in `generate_analysis`
- Planner Intent Classification: ~1-2s
- SEC RAG Query (4 sections): ~2-3s
- Market Data Fetch: ~1-2s
- News Fetch: ~2-3s
- Answer Agent: ~3-5s
- Analyst Agent: ~15-25s
- PDF Generation: ~1s
Total for Investment Memo: ~20-30s (comprehensive analysis)
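The total can be checked against the per-stage estimates, assuming the three data fetches run in parallel and the other stages run one after another. The short calculation below uses only the figures listed above.

```python
# (low, high) latency estimates in seconds, from the list above.
planner = (1, 2)
gather = {"sec": (2, 3), "market": (1, 2), "news": (2, 3)}
analyst = (15, 25)
pdf = (1, 1)

totals = []
for i in (0, 1):  # 0 = optimistic bound, 1 = pessimistic bound
    # Parallel gathering costs as much as the slowest source, not the sum.
    parallel_gather = max(bounds[i] for bounds in gather.values())
    totals.append(planner[i] + parallel_gather + analyst[i] + pdf[i])

print(totals)  # [19, 31]
```

The resulting 19-31s range is consistent with the stated ~20-30s, and the Analyst Agent accounts for most of it.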
- Support for multiple company comparisons
- Historical trend analysis with time-series data
- Integration with more news sources (Google Finance, Seeking Alpha)
- Automated memo scheduling and alerts
- Web interface with Streamlit/Gradio
- Support for non-US companies and international filings
- Portfolio-level analysis across multiple positions
Contributions welcome! Please:
- Fork repository
- Create feature branch
- Add tests for new functionality
- Submit pull request with clear description
Last Updated: November 2, 2025