diff --git a/README.md b/README.md index 9e89eb6..6d1518a 100644 --- a/README.md +++ b/README.md @@ -2,20 +2,4 @@ ![retrieve-dspy](./visuals/cover.png) -`retrieve-dspy` contains pre-built Compound AI Systems for retrieval with DSPy. - -![pre-built-pipeline](./visuals/carbon/pre-built.png) - -`retrieve-dspy` contains evaluator code for the FreshStack and EnronQA benchmarks. - -![evaluate](./visuals/carbon/evaluate.png) - -You can easily interface `retrieve-dspy` pipelines with DSPy's optimizers. - -![optimizers](./visuals/carbon/optimize.png) - -### Run Tests with: - -```bash -uv run python scripts/run-eval.py -``` \ No newline at end of file +`retrieve-dspy` contains pre-built Compound AI Systems for retrieval with DSPy. \ No newline at end of file diff --git a/optimization_runs/1_gepa_optimized_query_expander.json b/optimization_runs/1_gepa_optimized_query_expander.json deleted file mode 100644 index e55fcfa..0000000 --- a/optimization_runs/1_gepa_optimized_query_expander.json +++ /dev/null @@ -1,28 +0,0 @@ -{ - "expand_query": { - "traces": [], - "train": [], - "demos": [], - "signature": { - "instructions": "You are given a user’s question. Your task is to produce a single “expanded_query” that a search engine can use to retrieve highly relevant, authoritative resources (official docs, examples, GitHub issues, and StackOverflow answers). Do not answer the question; only expand it into a rich, targeted search query.\n\nGeneral approach:\n- Identify the ecosystem, language\/runtime (JavaScript\/TypeScript vs Python), libraries, versions, and components (e.g., LangChain, LlamaIndex, Transformers, PyTorch, OpenAI\/Azure OpenAI).\n- Include exact class, function, and method names, key options\/flags, and any error messages verbatim (in quotes).\n- Add relevant synonyms and alternative phrasings (e.g., “streaming tokens”, “real-time token stream”, “callbacks”, “on token”).\n- Mention environment constraints (macOS Apple Silicon M1\/M2, CPU-only vs GPU, CUDA vs MPS, Google Colab).\n- Include migration\/versioning terms when deprecations are involved (e.g., “LangChainDeprecationWarning”, “invoke vs __call__”, “migration guide”).\n- Ask for concrete code examples, minimal reproducible snippets, configuration tips, and known issues\/bug reports.\n- Keep the expanded query concise but comprehensive (3–8 short sentences or bullets).\n\nOutput formatting:\n- Return only the expanded_query text (no headings or extra formatting).\n- Do not include explanations or code solutions; only the search-oriented query content.\n- Use plain text identifiers for classes\/functions\/options; avoid heavy formatting.\n\nDomain-specific guidance to include when relevant:\n\nA) LangChain JavaScript streaming with OpenAI:\n- Explicitly reference using ChatOpenAI (not OpenAI) with streaming: true and callbacks that implement handleLLMNewToken.\n- Ask how to attach callbacks at the model vs chain level and how to stream tokens in real time.\n- Include “ConversationChain”, “BufferMemory”, and “ChatPromptTemplate”. Note configuring BufferMemory with returnMessages: true for conversation history and system prompts.\n- Include troubleshooting for JavaScript\/TypeScript differences and up-to-date LangChain JS examples.\n\nB) LangChain Python ChatOpenAI deprecations\/invoke:\n- Use the import path from langchain_openai import ChatOpenAI (not legacy paths).\n- Ask how to replace deprecated __call__ with invoke, e.g., response = chat.invoke(history).\n- Include the warning\/error strings verbatim: “LangChainDeprecationWarning: BaseChatModel.__call__… use invoke instead” and “AttributeError: 'SystemMessage' object has no attribute 'name'”.\n- Request examples of using invoke with a list of SystemMessage, HumanMessage, AIMessage, and best practices for chat history and formatting.\n\nC) Apple Silicon (Mac M1\/M2), LlamaIndex, Transformers, PyTorch:\n- Include the exact error: “AssertionError: Torch not compiled with CUDA enabled”.\n- Note that setting device=0 is interpreted as a CUDA device index; ask how to set device=\"cpu\" or use PyTorch MPS on macOS (torch.backends.mps) and how to configure Transformers pipelines accordingly (or omit device).\n- Ask for guidance on running models like google\/flan-t5-large on Apple Silicon without CUDA, performance tips, and compatibility notes.\n- For portability: ask whether LlamaIndex (e.g., GPTSimpleVectorIndex\/GPTVectorStoreIndex) built in Colab\/GPU can be saved (save_to_disk\/index.json) and loaded\/used on Mac CPU (load_from_disk\/load_from_file), and any cross-environment caveats.\n\nD) LangChain + HuggingFace Transformers pipeline KeyError \"generated_text\":\n- Reference ConversationalRetrievalChain, HuggingFaceEmbeddings, and specific models (e.g., Gemma-2B-it); include calls like qa.invoke and qa.run and how chat_history is passed.\n- Ask about HuggingFace pipeline output schemas for text-generation vs text2text-generation (dict vs list of dicts, keys like \"generated_text\" vs others) and return_full_text.\n- Include the pitfall: ensure return_tensors='pt' is not passed when using transformers.pipeline for text generation, as it changes outputs and can cause KeyError \"generated_text\".\n- Request recent breaking changes in transformers\/LangChain affecting pipeline outputs and minimal reproducible examples.\n\nE) Azure OpenAI embeddings with LangChain JS (@langchain\/openai):\n- Include the 404 “Resource not found” error and parameters: azureOpenAIApiKey, azureOpenAIApiVersion, azureOpenAIApiDeploymentName, azureOpenAIBasePath, model\/modelName.\n- Emphasize that the deployment name must correspond to an embedding model deployment (e.g., text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002) rather than a chat\/completions model (e.g., gpt-4).\n- Ask whether to use model or modelName vs azureOpenAIApiDeploymentName depending on library version, and the correct endpoint format (https:\/\/.openai.azure.com).\n- Request official examples for OpenAIEmbeddings initialization with Azure and known causes of 404 in LangChain JS.\n\nF) LangChain BaseMemory subclassing (Python):\n- Note that BaseMemory is a Pydantic model; fields must be declared as class attributes using Pydantic’s syntax (e.g., Field) rather than only in __init__ or dataclass patterns.\n- Include the error verbatim: “ValueError: 'AnotherMem' object has no field 'user_id'”.\n- Ask for examples of correctly subclassing BaseMemory with custom fields (UUID, str), implementing required abstract methods, and avoiding conflicts between __init__ and Pydantic models.\n- Request guidance on version-specific changes and best practices for custom memory classes.\n\nG) LangChain Redis retriever hybrid search (filters + vector):\n- Reference Redis vector search with LangChain Python Redis as_retriever and search_type=\"similarity_distance_threshold\".\n- Include exact params and keys: search_kwargs vs retriever_search_kwargs, filter, include_metadata, distance_threshold, k; ensure search_kwargs is passed as a Python dict (not a string).\n- Include the Redis error verbatim: “redis.exceptions.ResponseError: Invalid attribute yield_distance_as”.\n- Ask about known bug in LangChain’s _prepare_range_query (or prepare_range_query) building incorrect RediSearch query syntax and how to fix\/upgrade.\n- Ask for correct query construction where the filter expression is combined properly with the vector range, and whether the filter must precede the vector clause in the RediSearch query string.\n- Request minimal reproducible snippets and official docs for RedisVectorStore with metadata filtering, plus any version-specific notes (e.g., langchain 0.0.346, langchain-core 0.0.10).\n\nH) LangChain Python agents: stop after tool execution \/ return raw tool output:\n- Reference initialize_agent with agent='chat-conversational-react-description', tools=[...], memory, max_iterations, early_stopping_method.\n- Include BaseTool.return_direct=True to short-circuit and return tool output immediately.\n- Mention AgentFinish and how to return tool outputs verbatim without further LLM formatting or _take_next_step.\n- Ask for examples showing tools that emit serialized data and how to avoid corruption by additional agent steps, including agent_kwargs\/output parsers\/callbacks.\n\nQuality checklist before submitting:\n- Is the query specific to the user’s stack, class\/method names, exact parameters, and error messages?\n- Does it request examples, docs, configuration tips, and troubleshooting steps\/known issues?\n- If the question touches one of the domains above, did you include the corresponding domain-specific details (A–H)?\n- For Redis retriever issues, did you mention _prepare_range_query, search_kwargs as dict (not string), and the “Invalid attribute yield_distance_as” error and filter placement?\n- For agent\/tool short-circuiting, did you mention BaseTool.return_direct=True and AgentFinish?\n- Is the output only the expanded_query text with no extra commentary, kept to 3–8 short sentences or bullets?", - "fields": [ - { - "prefix": "Question:", - "description": "${question}" - }, - { - "prefix": "Expanded Query:", - "description": "${expanded_query}" - } - ] - }, - "lm": null - }, - "metadata": { - "dependency_versions": { - "python": "3.11", - "dspy": "3.0.0", - "cloudpickle": "3.1" - } - } -} \ No newline at end of file diff --git a/optimization_runs/1_gepa_query_expander_training_samples.jsonl b/optimization_runs/1_gepa_query_expander_training_samples.jsonl deleted file mode 100644 index bbf786b..0000000 --- a/optimization_runs/1_gepa_query_expander_training_samples.jsonl +++ /dev/null @@ -1,20 +0,0 @@ -{"question": "from langchain.schema import BaseMemory\n\nclass ChatMemory(BaseMemory):\n def __init__(self, user_id: UUID, type: str):\n self.user_id = user_id\n self.type = type\n\n # implemented abstract methods\n\nclass AnotherMem(ChatMemory):\n def __init__(self, user_id: UUID, type: str):\n super().__init__(user_id, type)\n\nThis seems simple enough - but I get an error: ValueError: \"AnotherMem\" object has no field \"user_id\". What am I doing wrong?\nNote that BaseMemory is an interface.\n"} -{"question": "I have successfully connected to a Redshift database like below and got all the table names;\nconn = psycopg2.connect(host,db,port,username,password)\ncursor.execute(\"SELECT tablename FROM pg_tables GROUP BY tablename ORDER BY tablename\")\n\nHowever, when I connect using langchain and sqlalchemy like below, get_usable_table_names returns few of many tables in the database;\npg_url = f\"postgresql+psycopg2://{db_user}:{db_password}@{db_host}:{port_}/{db_}\"\ndb_engine = create_engine(pg_url)\ndb = SQLDatabase(db_engine)\nllm = OpenAI(temperature=0.0, openai_api_key=OPENAI_API_KEY, model='gpt-3.5-turbo')\n\ntable_names = \"\\n\".join(db.get_usable_table_names())\n\nAnyone has any suggestions on what might be the issue?\nI have tried querying a missing table by;\ndb.run(\"SELECT * FROM db_schema.missing_table_name\") \n\nand this works. However, I need SQLDatabase from langchain.sql_database module to detect the tables right without specifying one by one. (Because I would like to Chat With Sql Database Using Langchain & OpenAI)\n"} -{"question": "I am playing with langchain/openai/faiss to create chatbot that reads all PDFs, and can answer based on what it learned from them.\nWhat I want to know is there a way to limit answers to knowledge only from documentation, if answer is not in docs bot should respond I do not know or something like that.\nHere is the code:\n llm = ChatOpenAI(temperature=0, max_tokens=1000,\n model_name=\"gpt-3.5-turbo-16k\")\n memory = ConversationBufferMemory(memory_key=\"chat_history\")\n chat = ConversationalRetrievalChain.from_llm(\n llm=llm,retriever=vector_store.as_retriever(),memory=memory)\n \n if \"messages\" not in st.session_state:\n st.session_state.messages = []\n\n if not st.session_state.messages:\n welcome_message = {\"role\": \"assistant\",\n \"content\": \"Hello, how can i help?\"}\n st.session_state.messages.append(welcome_message)\n\n for message in st.session_state.messages:\n with st.chat_message(message[\"role\"]):\n st.markdown(message[\"content\"])\n\n\n if prompt := st.chat_input(\"State your question\"):\n st.session_state.messages.append({\"role\": \"user\", \"content\": prompt})\n with st.chat_message(\"user\"):\n st.markdown(prompt)\n result = chat({\"question\": prompt, \"chat_history\": [\n (message[\"role\"], message[\"content\"]) for message in st.session_state.messages]})\n\n with st.chat_message(\"assistant\"):\n full_response = result[\"answer\"]\n st.markdown(full_response)\n\n st.session_state.messages.append(\n {\"role\": \"assistant\", \"content\": full_response})\n \n\n"} -{"question": "I am implementing RAG on a Gemma-2B-it model using langchain's HuggingFaceEmbeddings and ConversationalRetrievalChain.\nWhen running:\nchat_history = []\nquestion = \"My prompt\"\nresult = qa.invoke({\"question\": question, \"chat_history\": chat_history})\n\n\nI get\n 276 \n 277 if self.pipeline.task == \"text-generation\":\n--> 278 text = response[\"generated_text\"]\n 279 elif self.pipeline.task == \"text2text-generation\":\n 280 text = response[\"generated_text\"]\n\nKeyError: 'generated_text'\n\nI don't understand why this is happening. It used to work and, today, it just stopped working. I have also tried using qa.run instead of invoke but it still raises the same exception.\nI have tried changing models, devices but nothing fixes it.\n"} -{"question": "Im using a conversational agent, with some tools, one of them is a calculator tool (for the sake of example).\nAgent initializated as follows:\nconversational_agent = initialize_agent(\n agent='chat-conversational-react-description',\n tools=[CalculatorTool()],\n llm=llm_gpt4,\n verbose=True,\n max_iterations=2,\n early_stopping_method=\"generate\",\n memory=memory,\n # agent_kwargs=dict(output_parser=output_parser),\n )\n\n\nWhen the CalculatorTool is being activated, it will return a string output, the agent takes that output and process it further to get to the \"Final Answer\" thus changing the formatting of the output from the CalculatorTool\nFor example, for input 10*10, the tool run() function will return 100, which will be propagated back to the agent, that will call self._take_next_step() and continue processing the output.\nIt will create a final output similar the result of your prompt of 10x10 is 100\nI dont want the added formatting by the LLM, just the output of 100.\nI want to break the chain when the CalculatorTool is done, and have it's output returned to the client as is.\nI also have have tools that return serialized data, for a graph chart, having that data re-processed by next iterations of the agent will make it invalid.\n"} -{"question": "I am writing a little application in JavaScript using the LangChain library. I have the following snippet:\n/* LangChain Imports */\nimport { OpenAI } from \"langchain/llms/openai\";\nimport { BufferMemory } from \"langchain/memory\";\nimport { ConversationChain } from \"langchain/chains\";\n\n// ========================================================================================= //\n // ============= Use LangChain to send request to OpenAi API =============================== //\n // ========================================================================================= //\n\n const openAILLMOptions = {\n modelName: chatModel.value,\n openAIApiKey: decryptedString,\n temperature: parseFloat(temperatureValue.value),\n topP: parseFloat(topP.value),\n maxTokens: parseInt(maxTokens.value),\n stop: stopSequences.value.length > 0 ? stopSequences.value : null,\n streaming: true,\n};\n\n const model = new OpenAI(openAILLMOptions);\n const memory = new BufferMemory();\n const chain = new ConversationChain({ llm: model, memory: memory });\n\n try {\n const response = await chain.call({ input: content.value, signal: signal }, undefined,\n [\n {\n\n handleLLMNewToken(token) {\n process.stdout.write(token);\n },\n },\n ]\n );\n\n// handle the response\n\n}\n\nThis does not work (I tried both using the token via TypeScript and without typing). I have scoured various forums and they are either implementing streaming with Python or their solution is not relevant to this problem. So to summarize, I can successfully pull the response from OpenAI via the LangChain ConversationChain() API call, but I can\u2019t stream the response. Is there a solution?\n"} -{"question": "I'm attempted to pass draft documents and have my chatbot generate a template using a prompt create a non disclosure agreement draft for California between mike llc and fantasty world. with my code below the response i'm getting is:\n\"I'm sorry, but I cannot generate a non-disclosure agreement draft for you. However, you can use the provided context information as a template to create a non-disclosure agreement between Mike LLC and fantasty world. Just replace the placeholders in the template with the appropriate names and information for your specific agreement.\nHere is my setup:\nimport sys\nimport os\nimport openai\nimport constants\nimport gradio as gr\nfrom langchain.chat_models import ChatOpenAI\n\nfrom llama_index import SimpleDirectoryReader, GPTListIndex, GPTVectorStoreIndex, LLMPredictor, PromptHelper, load_index_from_storage\n\n# Disable SSL certificate verification (for debugging purposes)\nos.environ['REQUESTS_CA_BUNDLE'] = '' # Set it to an empty string\n\nos.environ[\"OPENAI_API_KEY\"] = constants.APIKEY\nopenai.api_key = os.getenv(\"OPENAI_API_KEY\")\nprint(os.getenv(\"OPENAI_API_KEY\"))\n\ndef createVecorIndex(path):\n max_input = 4096\n tokens = 512\n chunk_size = 600\n max_chunk_overlap = 0.1\n\n prompt_helper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)\n\n #define llm\n llmPredictor = LLMPredictor(llm=ChatOpenAI(temperature=.7, model_name='gpt-3.5-turbo', max_tokens=tokens))\n\n #load data\n docs = SimpleDirectoryReader(path).load_data()\n\n #create vector index\n vectorIndex = GPTVectorStoreIndex(docs, llmpredictor=llmPredictor, prompt_helper=prompt_helper)\n vectorIndex.storage_context.persist(persist_dir='vectorIndex.json')\n\n return vectorIndex\n\nvectorIndex = createVecorIndex('docs')\n\nIn my docs directory, I have a few examples of non-disclosure agreements to create the vector index.\nThis was my first attempt at the query:\ndef chatbot(input_index):\n query_engine = vectorIndex.as_query_engine()\n response = query_engine.query(input_index)\n return response.response\n\ngr.Interface(fn=chatbot, inputs=\"text\", outputs=\"text\", title=\"Super Awesome Chatbot\").launch()\n\nI can't seem to get it to generate the draft, it keeps giving me the \"I cannot generate a draft\" response\nI also tried to create a clause for the word draft, but the setup below is essential useing the trained model instead my vector.\ndef chatbot(input_index):\n query_engine = vectorIndex.as_query_engine()\n\n # If the \"draft\" clause is active:\n if \"draft\" in input_index.lower():\n # Query the vectorIndex for relevant information/context\n vector_response = query_engine.query(input_index).response\n print(vector_response)\n # Use vector_response as context to query the OpenAI API for a draft\n prompt = f\"Based on the information: '{vector_response}', generate a draft for the input: {input_index}\"\n \n response = openai.Completion.create(\n engine=\"text-davinci-002\",\n prompt=prompt,\n max_tokens=512,\n temperature=0.2\n )\n \n openai_response = response.choices[0].text.strip()\n \n return openai_response\n\n # If \"draft\" clause isn't active, use just the vectorIndex response\n else:\n print('else clause')\n return query_engine.query(input_index).response\n\n"} -{"question": "So far my research only shows me how to filter to a specific a specific document or page but it doesn't show how to exclude some documents from the search.\nresults_with_scores = db.similarity_search_with_score(\"foo\", filter=dict(page=1))\n\n"} -{"question": "I'm trying to pass filters to redis retriever to do hybrid search on my embeddings (vector + metadata filtering). The following doesn't work! It fails to pass the filters and filters would always be None:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n search_kwargs=\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5}\",\n filter=\"(@launch:{false} @menu_text:(%%chicken%%))\"\n )\n\nI found another example and apparently filter expression should be pass as search_kwargs, but I can't figure out what should be the correct syntax. If I do it as follow:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n \"retriever_search_kwargs\":\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5, 'filter': '@menu_text:(%%chicken%%) @lunch:{true}'}\",\n}\n\nit generates this search query:\nsimilarity_search_by_vector > redis_query : (@content_vector:[VECTOR_RANGE $distance_threshold $vector] @menu_text:(%%chicken%%) @lunch:{true})=>{$yield_distance_as: distance}\nand fails with the following error:\nredis.exceptions.ResponseError: Invalid attribute yield_distance_as\nAny idea how to fix it?\nSystem Info:\nlangchain 0.0.346\nlangchain-core 0.0.10\npython 3.9.18\n"} -{"question": "I am working with LangChain for the first time. Due to data security, I want to be sure about the storage of langchain's vector store storage. I am using HNSWLib vector store, which mentions it is an in-memory store. What does it mean? Does Langchain/vector stores store any data in its servers?\nhttps://js.langchain.com/docs/modules/indexes/vector_stores/integrations/hnswlib\nhttps://github.com/nmslib/hnswlib\n"} -{"question": "I'm using langchain to process a whole bunch of documents which are in an Mongo database.\nI can load all documents fine into the chromadb vector storage using langchain. Nothing fancy being done here. This is my code:\n\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nembeddings = OpenAIEmbeddings()\n\nfrom langchain.vectorstores import Chroma\ndb = Chroma.from_documents(docs, embeddings, persist_directory='db')\ndb.persist()\n\n\nNow, after storing the data, I want to get a list of all the documents and embeddings WITH id's.\nThis is so I can store them back into MongoDb.\nI also want to put them through Bertopic to get the topic categories.\nQuestion 1 is: how do I get all documents I've just stored in the Chroma database? I want the documents, and all the metadata.\nMany thanks for your help!\n"} -{"question": "I'm having trouble using LangChain embedding with Azure OpenAI credentials - it's showing a 404 error for resource not found.\nstack trace: Error: 404 Resource not found\n at APIError.generate (c:\\abcproject\\node_modules\\openai\\error.js:53:20\n\nimport { OpenAIEmbeddings } from \"@langchain/openai\"\n\nexport const embeddingModel = new OpenAIEmbeddings({ \n azureOpenAIApiKey: \"AzureOpenAI api key\",\n azureOpenAIApiVersion: \"2023-08-01-preview\",\n azureOpenAIApiDeploymentName: \"gpt-4-32k\",\n azureOpenAIBasePath:\"Azure OpenAI endpoint\"\n});\n\n\n"} -{"question": "I have put together a script that works just fine using OpenAI api. I am now trying to switch it over to AzureOpenAI yet it seems I am running into an issue with the create_sql_agent(). Can you use create_sql_agent with AzureOpenAI model gpt-35-turbo-1106? Could it be an issue with my api_version within AzureOpenAI()? The error I receive is \"TypeError: Completions. create() got an unexpected keyword argument 'tools'\" which I think could also be the option using 'openai-tools' as my agent_type?\nCode\nimport os\nfrom langchain_openai import AzureOpenAI\nfrom langchain.agents import create_sql_agent\nfrom langchain.agents.agent_toolkits import SQLDatabaseToolkit\nfrom langchain.sql_database import SQLDatabase\nfrom dotenv import load_dotenv\nfrom langchain.agents import AgentExecutor\n\nfrom langchain_core.prompts.chat import (\n ChatPromptTemplate,\n HumanMessagePromptTemplate,\n SystemMessagePromptTemplate,\n AIMessagePromptTemplate,\n MessagesPlaceholder,\n)\n\npath = (os.getcwd()+'\\creds.env')\n\nload_dotenv(path) \n\ndb = SQLDatabase.from_uri(\n f\"postgresql://{os.environ.get('user')}:{os.environ.get('password')}@{os.environ.get('host')}:{os.environ.get('port')}/{os.environ.get('database')}\")\n\nllm = AzureOpenAI(azure_endpoint=MY_ENDPOINT,\n deployment_name=MY_DEPLOYMENT_NAME,\n model_name='gpt-35-turbo', # should it be 'gpt-35-turbo-1106'?\n temperature = 0,\n api_key = MY_KEY,\n api_version = '2023-07-01-preview') #my api_version correct? Uncertain which one\n\ntoolkit = SQLDatabaseToolkit(db=db, llm=llm)\n\nprefix = \"\"\"\nYou are an agent designed to interact with a SQL database.\nGiven an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.\nUnless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.\nYou can order the results by a relevant column to return the most interesting examples in the database.\nNever query for all the columns from a specific table, only ask for the relevant columns given the question.\nYou have access to tools for interacting with the database.\nOnly use the below tools. Only use the information returned by the below tools to construct your final answer.\nYou MUST double-check your query before executing it. If you get an error while executing a query, rewrite the query and try again.\n\nDO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP, CASCADE, etc.) to the database.\n\nIf the question does not seem related to the database, just return \"I don't know\" as the answer.\n\nIf asked about a person do not return an 'ID' but return a first name and last name.\n\n\"\"\"\n\nsuffix = \"\"\" I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n\"\"\"\n\nmessages = [\n SystemMessagePromptTemplate.from_template(prefix),\n HumanMessagePromptTemplate.from_template(\"{input}\"),\n AIMessagePromptTemplate.from_template(suffix),\n MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n ]\n\n\nagent_executor = create_sql_agent(llm,\n toolkit=toolkit,\n agent_type='openai-tools', #does this work with azure?\n prompt=prompt,\n verbose=False)\n\n\nprint(agent_executor.invoke(\"What are the names of the tables\"))\n\nError\n---------------------------------------------------------------------------\nTypeError Traceback (most recent call last)\nCell In[69], line 1\n----> 1 print(agent_executor.invoke(\"What are the names of the tables\"))\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\base.py:163, in Chain.invoke(self, input, config, **kwargs)\n 161 except BaseException as e:\n 162 run_manager.on_chain_error(e)\n--> 163 raise e\n 164 run_manager.on_chain_end(outputs)\n 166 if include_run_info:\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\base.py:153, in Chain.invoke(self, input, config, **kwargs)\n 150 try:\n 151 self._validate_inputs(inputs)\n 152 outputs = (\n--> 153 self._call(inputs, run_manager=run_manager)\n 154 if new_arg_supported\n 155 else self._call(inputs)\n 156 )\n 158 final_outputs: Dict[str, Any] = self.prep_outputs(\n 159 inputs, outputs, return_only_outputs\n 160 )\n 161 except BaseException as e:\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\agents\\agent.py:1432, in AgentExecutor._call(self, inputs, run_manager)\n 1430 # We now enter the agent loop (until it returns something).\n 1431 while self._should_continue(iterations, time_elapsed):\n-> 1432 next_step_output = self._take_next_step(\n 1433 name_to_tool_map,\n 1434 color_mapping,\n 1435 inputs,\n 1436 intermediate_steps,\n 1437 run_manager=run_manager,\n 1438 )\n 1439 if isinstance(next_step_output, AgentFinish):\n 1440 return self._return(\n 1441 next_step_output, intermediate_steps, run_manager=run_manager\n 1442 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\agents\\agent.py:1138, in AgentExecutor._take_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\n 1129 def _take_next_step(\n 1130 self,\n 1131 name_to_tool_map: Dict[str, BaseTool],\n (...)\n 1135 run_manager: Optional[CallbackManagerForChainRun] = None,\n 1136 ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:\n 1137 return self._consume_next_step(\n-> 1138 [\n 1139 a\n 1140 for a in self._iter_next_step(\n 1141 name_to_tool_map,\n 1142 color_mapping,\n 1143 inputs,\n 1144 intermediate_steps,\n 1145 run_manager,\n 1146 )\n 1147 ]\n 1148 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\agents\\agent.py:1138, in (.0)\n 1129 def _take_next_step(\n 1130 self,\n 1131 name_to_tool_map: Dict[str, BaseTool],\n (...)\n 1135 run_manager: Optional[CallbackManagerForChainRun] = None,\n 1136 ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:\n 1137 return self._consume_next_step(\n-> 1138 [\n 1139 a\n 1140 for a in self._iter_next_step(\n 1141 name_to_tool_map,\n 1142 color_mapping,\n 1143 inputs,\n 1144 intermediate_steps,\n 1145 run_manager,\n 1146 )\n 1147 ]\n 1148 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\agents\\agent.py:1166, in AgentExecutor._iter_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\n 1163 intermediate_steps = self._prepare_intermediate_steps(intermediate_steps)\n 1165 # Call the LLM to see what to do.\n-> 1166 output = self.agent.plan(\n 1167 intermediate_steps,\n 1168 callbacks=run_manager.get_child() if run_manager else None,\n 1169 **inputs,\n 1170 )\n 1171 except OutputParserException as e:\n 1172 if isinstance(self.handle_parsing_errors, bool):\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\agents\\agent.py:514, in RunnableMultiActionAgent.plan(self, intermediate_steps, callbacks, **kwargs)\n 506 final_output: Any = None\n 507 if self.stream_runnable:\n 508 # Use streaming to make sure that the underlying LLM is invoked in a\n 509 # streaming\n (...)\n 512 # Because the response from the plan is not a generator, we need to\n 513 # accumulate the output into final output and return that.\n--> 514 for chunk in self.runnable.stream(inputs, config={\"callbacks\": callbacks}):\n 515 if final_output is None:\n 516 final_output = chunk\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:2875, in RunnableSequence.stream(self, input, config, **kwargs)\n 2869 def stream(\n 2870 self,\n 2871 input: Input,\n 2872 config: Optional[RunnableConfig] = None,\n 2873 **kwargs: Optional[Any],\n 2874 ) -> Iterator[Output]:\n-> 2875 yield from self.transform(iter([input]), config, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:2862, in RunnableSequence.transform(self, input, config, **kwargs)\n 2856 def transform(\n 2857 self,\n 2858 input: Iterator[Input],\n 2859 config: Optional[RunnableConfig] = None,\n 2860 **kwargs: Optional[Any],\n 2861 ) -> Iterator[Output]:\n-> 2862 yield from self._transform_stream_with_config(\n 2863 input,\n 2864 self._transform,\n 2865 patch_config(config, run_name=(config or {}).get(\"run_name\") or self.name),\n 2866 **kwargs,\n 2867 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:1880, in Runnable._transform_stream_with_config(self, input, transformer, config, run_type, **kwargs)\n 1878 try:\n 1879 while True:\n-> 1880 chunk: Output = context.run(next, iterator) # type: ignore\n 1881 yield chunk\n 1882 if final_output_supported:\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:2826, in RunnableSequence._transform(self, input, run_manager, config)\n 2817 for step in steps:\n 2818 final_pipeline = step.transform(\n 2819 final_pipeline,\n 2820 patch_config(\n (...)\n 2823 ),\n 2824 )\n-> 2826 for output in final_pipeline:\n 2827 yield output\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:1283, in Runnable.transform(self, input, config, **kwargs)\n 1280 final: Input\n 1281 got_first_val = False\n-> 1283 for chunk in input:\n 1284 if not got_first_val:\n 1285 final = adapt_first_streaming_chunk(chunk) # type: ignore\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:4728, in RunnableBindingBase.transform(self, input, config, **kwargs)\n 4722 def transform(\n 4723 self,\n 4724 input: Iterator[Input],\n 4725 config: Optional[RunnableConfig] = None,\n 4726 **kwargs: Any,\n 4727 ) -> Iterator[Output]:\n-> 4728 yield from self.bound.transform(\n 4729 input,\n 4730 self._merge_configs(config),\n 4731 **{**self.kwargs, **kwargs},\n 4732 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\runnables\\base.py:1300, in Runnable.transform(self, input, config, **kwargs)\n 1293 raise TypeError(\n 1294 f\"Failed while trying to add together \"\n 1295 f\"type {type(final)} and {type(chunk)}.\"\n 1296 f\"These types should be addable for transform to work.\"\n 1297 )\n 1299 if got_first_val:\n-> 1300 yield from self.stream(final, config, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\language_models\\llms.py:458, in BaseLLM.stream(self, input, config, stop, **kwargs)\n 451 except BaseException as e:\n 452 run_manager.on_llm_error(\n 453 e,\n 454 response=LLMResult(\n 455 generations=[[generation]] if generation else []\n 456 ),\n 457 )\n--> 458 raise e\n 459 else:\n 460 run_manager.on_llm_end(LLMResult(generations=[[generation]]))\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_core\\language_models\\llms.py:442, in BaseLLM.stream(self, input, config, stop, **kwargs)\n 440 generation: Optional[GenerationChunk] = None\n 441 try:\n--> 442 for chunk in self._stream(\n 443 prompt, stop=stop, run_manager=run_manager, **kwargs\n 444 ):\n 445 yield chunk.text\n 446 if generation is None:\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain_openai\\llms\\base.py:262, in BaseOpenAI._stream(self, prompt, stop, run_manager, **kwargs)\n 260 params = {**self._invocation_params, **kwargs, \"stream\": True}\n 261 self.get_sub_prompts(params, [prompt], stop) # this mutates params\n--> 262 for stream_resp in self.client.create(prompt=prompt, **params):\n 263 if not isinstance(stream_resp, dict):\n 264 stream_resp = stream_resp.model_dump()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\openai\\_utils\\_utils.py:277, in required_args..inner..wrapper(*args, **kwargs)\n 275 msg = f\"Missing required argument: {quote(missing[0])}\"\n 276 raise TypeError(msg)\n--> 277 return func(*args, **kwargs)\n\nTypeError: Completions.create() got an unexpected keyword argument 'tools'\n\n"} -{"question": "I'm currently working with LangChain and using the TextLoader class to load text data from a file and utilize it within a Vectorstore index. However, I've noticed that response times to my queries are increasing as my text file grows larger. To enhance performance, I'm wondering if there are ways to expedite the response times.\nSample Code:\npython\n\nimport os\nimport time\nfrom langchain.document_loaders import TextLoader\nfrom langchain.indexes import VectorstoreIndexCreator\nfrom langchain.chat_models import ChatOpenAI\nimport constants\n\nos.environ[\"OPENAI_API_KEY\"] = constants.OPENAI_API_KEY\n\nloader = TextLoader(\"all_content.txt\", encoding=\"utf-8\")\n\n# Record the start time\nstart_time = time.time()\n\nindex = VectorstoreIndexCreator().from_loaders([loader])\n\nquery = \"My question?\"\nresponse = index.query(query).encode('utf-8').decode('utf-8')\nprint(response)\n\n# Record the end time\nend_time = time.time()\n\n# Calculate the execution time\nexecution_time = end_time - start_time\nprint(f\"Execution time: {execution_time:.4f} seconds\")\n\nMy Questions:\n\nAre there ways to optimize response times when using TextLoader?\n\nCan caching be effectively employed to reduce response times? If so, how can I integrate it into my current implementation?\n\nAre there alternative approaches or techniques I can employ to effectively shorten response times?\n\n\nI've noticed that response times increase as my text file grows, and I'm actively seeking ways to enhance the performance of my queries. Any advice or suggestions for optimizing this implementation would be greatly appreciated. Thank you in advance!\nread langchain docs and tried momento cache\n"} -{"question": "I am trying to ask questions against a multiple pdf using pinecone and openAI but I dont know how to.\nThe code below works for asking questions against one document. but I would like to have multiple documents to ask questions against:\n\n# process_message.py\nfrom flask import request\nimport pinecone\n# from PyPDF2 import PdfReader\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nfrom langchain.text_splitter import CharacterTextSplitter\nfrom langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS\nfrom langchain.chains.question_answering import load_qa_chain\nfrom langchain.llms import OpenAI\nimport os\nimport json\n# from constants.company import file_company_id_column, file_location_column, file_name_column\nfrom services.files import FileFireStorage\nfrom middleware.auth import check_authorization\nimport configparser\nfrom langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\n\n\ndef process_message():\n \n # Create a ConfigParser object and read the config.ini file\n config = configparser.ConfigParser()\n config.read('config.ini')\n # Retrieve the value of OPENAI_API_KEY\n openai_key = config.get('openai', 'OPENAI_API_KEY')\n pinecone_env_key = config.get('pinecone', 'PINECONE_ENVIRONMENT')\n pinecone_api_key = config.get('pinecone', 'PINECONE_API_KEY')\n\n\n loader = PyPDFLoader(\"docs/ops.pdf\")\n data = loader.load()\n # data = body['data'][1]['name']\n # Print information about the loaded data\n print(f\"You have {len(data)} document(s) in your data\")\n print(f\"There are {len(data[30].page_content)} characters in your document\")\n\n # Chunk your data up into smaller documents\n text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)\n texts = text_splitter.split_documents(data)\n \n\n embeddings = OpenAIEmbeddings(openai_api_key=openai_key)\n\n pinecone.init(api_key=pinecone_api_key, environment=pinecone_env_key)\n index_name = \"pdf-chatbot\" # Put in the name of your Pinecone index here\n\n docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)\n # Query those docs to get your answer back\n llm = OpenAI(temperature=0, openai_api_key=openai_key)\n chain = load_qa_chain(llm, chain_type=\"stuff\")\n\n query = \"Are there any other documents listed in this document?\"\n docs = docsearch.similarity_search(query)\n answer = chain.run(input_documents=docs, question=query)\n print(answer)\n\n return answer\n\nI added as many comments as I could there.\nI got this information from https://www.youtube.com/watch?v=h0DHDp1FbmQ\nI tried to look at other stackoverflow questions about this but could not find anything similar\n"} -{"question": "Here's my code:\nimport pickle, os\nfrom langchain_openai.chat_models import ChatOpenAI\nfrom langchain.schema import (\n AIMessage,\n HumanMessage,\n SystemMessage\n)\n\ndef execute_prompt(text, history, jarvis_setup):\n print(f\"You said: {text}\")\n history.append(HumanMessage(content = text))\n response = jarvis_setup(history)\n history.append(AIMessage(content = response.content))\n with open('JarvisMemory.txt', 'wb') as file:\n pickle.dump(history, file)\n \n print(response.content)\n\ndef main():\n jarvis_setup = ChatOpenAI(openai_api_key=\"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\", model = \"gpt-3.5-turbo\", temperature = 0.7, max_tokens = 400)\n #history = [SystemMessage(content=\"You are a human-like virtual assistant named Jarvis.\", additional_kwargs={})]\n if os.path.exists(\"JarvisMemory.txt\"):\n with open(\"JarvisMemory.txt\", \"rb\") as file:\n history = pickle.load(file)\n else:\n with open(\"JarvisMemory.txt\", \"wb\") as file:\n history = [SystemMessage(content=\"You are a human-like virtual assistant named Jarvis. Answer all questions as shortly as possible, unless a longer, more detailed response is requested.\", additional_kwargs={})]\n pickle.dump(history, file)\n \n while True:\n print(\"\\n\")\n print(\"Enter prompt.\")\n text = input().lower()\n print(\"Prompt sent.\")\n \n if text:\n execute_prompt(text, history, jarvis_setup)\n \n else:\n print(\"No prompt given.\")\n continue\n \nif __name__ == \"__main__\":\n main()\n\nAnd I get this error:\nLangChainDeprecationWarning: The method BaseChatModel.__call__ was deprecated in langchain-core 0.1.7 and will be removed in 0.3.0. Use invoke instead.\nwarn_deprecated(\nTraceback (most recent call last):\nFile \"C:\\Users\\maste\\Documents\\Coding\\Python\\Jarvis\\JarvisTextInpuhjhjghyjvjt.py\", line 44, in \nmain()\nFile \"C:\\Users\\maste\\Documents\\Coding\\Python\\Jarvis\\JarvisTextInpuhjhjghyjvjt.py\", line 37, in main\nexecute_prompt(text, history, jarvis_setup)\nFile \"C:\\Users\\maste\\Documents\\Coding\\Python\\Jarvis\\JarvisTextInpuhjhjghyjvjt.py\", line 12, in execute_prompt\nresponse = jarvis_setup(history)\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_core_api\\deprecation.py\", line 148, in warning_emitting_wrapper\nreturn wrapped(*args, **kwargs)\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_core\\language_models\\chat_models.py\", line 847, in call\ngeneration = self.generate(\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_core\\language_models\\chat_models.py\", line 456, in generate\nraise e\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_core\\language_models\\chat_models.py\", line 446, in generate\nself._generate_with_cache(\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_core\\language_models\\chat_models.py\", line 671, in _generate_with_cache\nresult = self._generate(\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_openai\\chat_models\\base.py\", line 520, in _generate\nmessage_dicts, params = self._create_message_dicts(messages, stop)\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_openai\\chat_models\\base.py\", line 533, in _create_message_dicts\nmessage_dicts = [_convert_message_to_dict(m) for m in messages]\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_openai\\chat_models\\base.py\", line 533, in \nmessage_dicts = [_convert_message_to_dict(m) for m in messages]\nFile \"C:\\Users\\maste\\AppData\\Roaming\\Python\\Python310\\site-packages\\langchain_openai\\chat_models\\base.py\", line 182, in _convert_message_to_dict\nif (name := message.name or message.additional_kwargs.get(\"name\")) is not None:\nAttributeError: 'SystemMessage' object has no attribute 'name'\nI'm guessing I need to add \".invoke\" somewhere in the code based on some research I did on the issue, but I'm a beginner.\nI found this website showcasing a very similar error and how to fix it: https://wikidocs.net/235780\nYou can translate the page to English with Google Translate and the translations are sufficient to understand. It says to add \".invoke\" in the place you can see shown on the website. Not sure how to implement this into my code though. Also, this might not be the right solution.\nI also looked at the Langchain website and it also says to use \"invoke\" but I can't find examples of it being used in a full line of code.\n"} -{"question": "I have the following code:\nchat_history = []\nembeddings = OpenAIEmbeddings()\ndb = FAISS.from_documents(chunks, embeddings)\nqa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0.1), db.as_retriever())\nresult = qa({\"question\": \"What is stack overflow\", \"chat_history\": chat_history})\n\nThe code creates embeddings, creates a FAISS in-memory vector db with some text that I have in chunks array, then it creates a ConversationalRetrievalChain, followed by asking a question.\nBased on what I understand from ConversationalRetrievalChain, when asked a question, it will first query the FAISS vector db, then, if it can't find anything matching, it will go to OpenAI to answer that question. (is my understanding correct?)\nHow can I detect if it actually called OpenAI to get the answer or it was able to get it from the in-memory vector DB? The result object contains question, chat_history and answer properties and nothing else.\n"} -{"question": "Question #1:\nIs there a way of using Mac with M1 CPU and llama_index together?\nI cannot pass the bellow assertion:\nAssertionError Traceback (most recent call last)\n in \n 6 from transformers import pipeline\n 7 \n----> 8 class customLLM(LLM):\n 9 model_name = \"google/flan-t5-large\"\n 10 pipeline = pipeline(\"text2text-generation\", model=model_name, device=0, model_kwargs={\"torch_dtype\":torch.bfloat16})\n\n in customLLM()\n 8 class customLLM(LLM):\n 9 model_name = \"google/flan-t5-large\"\n---> 10 pipeline = pipeline(\"text2text-generation\", model=model_name, device=0, model_kwargs={\"torch_dtype\":torch.bfloat16})\n 11 \n 12 def _call(self, prompt, stop=None):\n\n~/Library/Python/3.9/lib/python/site-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)\n 868 kwargs[\"device\"] = device\n 869 \n--> 870 return pipeline_class(model=model, framework=framework, task=task, **kwargs)\n\n~/Library/Python/3.9/lib/python/site-packages/transformers/pipelines/text2text_generation.py in __init__(self, *args, **kwargs)\n 63 \n 64 def __init__(self, *args, **kwargs):\n---> 65 super().__init__(*args, **kwargs)\n 66 \n 67 self.check_model_type(\n\n~/Library/Python/3.9/lib/python/site-packages/transformers/pipelines/base.py in __init__(self, model, tokenizer, feature_extractor, modelcard, framework, task, args_parser, device, binary_output, **kwargs)\n 776 # Special handling\n 777 if self.framework == \"pt\" and self.device.type != \"cpu\":\n--> 778 self.model = self.model.to(self.device)\n 779 \n 780 # Update config with task specific parameters\n\n~/Library/Python/3.9/lib/python/site-packages/transformers/modeling_utils.py in to(self, *args, **kwargs)\n 1680 )\n 1681 else:\n-> 1682 return super().to(*args, **kwargs)\n 1683 \n 1684 def half(self, *args):\n\n~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)\n 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)\n 1144 \n-> 1145 return self._apply(convert)\n 1146 \n 1147 def register_full_backward_pre_hook(\n\n~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py in _apply(self, fn)\n 795 def _apply(self, fn):\n 796 for module in self.children():\n--> 797 module._apply(fn)\n 798 \n 799 def compute_should_use_set_data(tensor, tensor_applied):\n\n~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py in _apply(self, fn)\n 818 # `with torch.no_grad():`\n 819 with torch.no_grad():\n--> 820 param_applied = fn(param)\n 821 should_use_set_data = compute_should_use_set_data(param, param_applied)\n 822 if should_use_set_data:\n\n~/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py in convert(t)\n 1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,\n 1142 non_blocking, memory_format=convert_to_format)\n-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)\n 1144 \n 1145 return self._apply(convert)\n\n~/Library/Python/3.9/lib/python/site-packages/torch/cuda/__init__.py in _lazy_init()\n 237 \"multiprocessing, you must use the 'spawn' start method\")\n 238 if not hasattr(torch._C, '_cuda_getDeviceCount'):\n--> 239 raise AssertionError(\"Torch not compiled with CUDA enabled\")\n 240 if _cudart is None:\n 241 raise AssertionError(\n\nAssertionError: Torch not compiled with CUDA enabled\n\nObviously I've no Nvidia card, but I've read Pytorch is now supporting Mac M1 as well\nI'm trying to run the below example:\nfrom llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex,GPTSimpleVectorIndex, PromptHelper\nfrom langchain.embeddings.huggingface import HuggingFaceEmbeddings\nfrom llama_index import LLMPredictor, ServiceContext\nimport torch\nfrom langchain.llms.base import LLM\nfrom transformers import pipeline\n\nclass customLLM(LLM):\n model_name = \"google/flan-t5-large\"\n pipeline = pipeline(\"text2text-generation\", model=model_name, device=0, model_kwargs={\"torch_dtype\":torch.bfloat16})\n\n def _call(self, prompt, stop=None):\n return self.pipeline(prompt, max_length=9999)[0][\"generated_text\"]\n \n def _identifying_params(self):\n return {\"name_of_model\": self.model_name}\n\n def _llm_type(self):\n return \"custom\"\n\n\nllm_predictor = LLMPredictor(llm=customLLM())\n\nQuestion #2:\nAssuming the answer for the above is no - I don't mind using Google Colab with GPU, but once the index will be made, will it be possible to download it and use it on my Mac?\ni.e. something like:\non Google Colab:\nservice_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)\nindex = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)\nindex.save_to_disk('index.json')\n\n... and later on my Mac use load_from_file\n"} -{"question": "langchain python agent react differently, for one prompt, it can import scanpy library, but not for the other one. My question is how to make sure to import the correct library without problem.\nfrom dotenv import load_dotenv, find_dotenv\nimport openai\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.agents.agent_types import AgentType\nfrom langchain_experimental.agents.agent_toolkits import create_python_agent\nfrom langchain_experimental.tools import PythonREPLTool\nimport scanpy as sc\n\nload_dotenv(find_dotenv())\nopenai.api_key = os.environ[\"OPENAI_API_KEY\"]\n\nagent_executor = create_python_agent(\n llm=ChatOpenAI(temperature=0, model=\"gpt-4-1106-preview\"),\n tool=PythonREPLTool(),\n verbose=True,\n agent_type=AgentType.OPENAI_FUNCTIONS,\n agent_executor_kwargs={\"handle_parsing_errors\": True},\n)\n\nif run the following,\nagent_executor.run(\"set scanpy setting verbosity = 3 \")\nI get\n> Entering new AgentExecutor chain...\n\nInvoking: Python_REPL with import scanpy as sc\nsc.settings.verbosity = 3\nprint(sc.settings.verbosity)\n\n\n3\nThe verbosity level of Scanpy has been set to 3.\n\n> Finished chain.\nThe verbosity level of Scanpy has been set to 3.\n\nbut, if run the following,\npbmc = sc.datasets.pbmc68k_reduced()\nagent_executor.run(\"use 'scanpy' library and 'pbmc' object to plot a umap\")\n\nI get,\n> Entering new AgentExecutor chain...\nPython REPL can execute arbitrary code. Use with caution.\n\nInvoking: Python_REPL with import scanpy as sc\n\n\n\nInvoking: Python_REPL with import scanpy as sc\nresponded: It seems there was an issue with the execution of the import statement for the 'scanpy' library. I will attempt to resolve this and proceed with the task. Let's try importing the library again.\n\nIt appears that there is an issue with importing the 'scanpy' library in this environment. Without being able to import the library, I cannot proceed with plotting a UMAP of the 'pbmc' object. If the library and the necessary data were available, I would typically load the data, preprocess it, and then use the sc.pl.umap function to plot the UMAP. However, since I cannot execute the code here, I'm unable to complete this task.\n\n"} -{"question": "I have a quick question: I'm using the Chroma vector store with LangChain.\nAnd I brought up a simple docsearch with Chroma.from_texts. I was initially very confused because i thought the similarity_score_with_score would be higher for queries that are close to answers, but it seems from my testing the opposite is true. Is this becasue it's returning the 'distance' between the two vectors when it searches? I was looking at docs but it only says \"List of Documents most similar to the query and score for each\" but doesnt explain what 'score' is\nDoc reference https://python.langchain.com/en/latest/reference/modules/vectorstores.html?highlight=similarity_search#langchain.vectorstores.Annoy.similarity_search_with_score Can also give more info on the (small to start) dataset im using and queries i tested with.\n"} diff --git a/optimization_runs/2_gepa_optimized_query_expander.json b/optimization_runs/2_gepa_optimized_query_expander.json deleted file mode 100644 index 93950d6..0000000 --- a/optimization_runs/2_gepa_optimized_query_expander.json +++ /dev/null @@ -1,28 +0,0 @@ -{ - "expand_query": { - "traces": [], - "train": [], - "demos": [], - "signature": { - "instructions": "You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n\nOutput format\n- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n\nHow to expand\n1) Extract the exact technologies, libraries, versions, models, classes\/functions, parameters, and any error messages (quote errors verbatim). Identify the user’s task and where it’s failing.\n2) Add synonyms and related names (e.g., LlamaIndex\/llama_index\/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline), plus common misconfigurations and breaking changes.\n3) Anticipate root causes and fixes: version\/compatibility issues, correct imports, supported tasks, proper parameters\/flags, environment variables, install commands, and version pins.\n4) Include keywords for the “expected correct approach” developers would search for: exact class\/function names, model IDs, flags, example code terms, correct API\/module names, and minimal repro patterns.\n5) Keep it targeted and precise; prefer specific, likely solutions over generic advice.\n\nDomain-specific nuggets to always include when relevant\n- OpenAI chat sessions and context:\n - The Chat Completions API does not maintain server-side session state; you must resend conversation history each call. Include best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing or truncating history, token limits, and caching. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” and “message history management.”\n- Hugging Face + LangChain:\n - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Errors like AttributeError: 'WhisperProcessor' object has no attribute 'config' can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq\/WhisperForConditionalGeneration + WhisperProcessor\/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include flags\/keywords like device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes\/4-bit or 8-bit quantization, and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n- LangChain imports and package split:\n - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note the package splits and correct imports\/installs (langchain, langchain-community, langchain-core, langchain-openai; alternatively ChatOpenAI from langchain_community.chat_models in older setups). Include “correct import path,” “installation steps,” and “breaking changes.”\n- Using OpenAI-compatible local endpoints with LangChain:\n - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base\/base_url\/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model\/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n- Chroma vector store:\n - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant.\n- LangChain tools + Pydantic:\n - If using @tool(args_schema=...) and seeing ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n- chromadb installation\/runtime:\n - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment\/version troubleshooting and virtualenv\/conda hints.\n- LlamaIndex + LangChain RAG integration:\n - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory\/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n- SQL with LangChain:\n - GPT-4\/ChatOpenAI with SQLDatabaseChain may include extra commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior” as fixes\/alternatives. Mention model_name=\"gpt-4-0613” and differences vs text-davinci-003.\n\nGeneral best-practice keywords to weave in\n- Exact model IDs and classes: openai\/whisper-large-v2, ChatOpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10.”\n\nStyle\n- Be specific and action-oriented; prefer concrete class\/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff.", - "fields": [ - { - "prefix": "Question:", - "description": "${question}" - }, - { - "prefix": "Expanded Query:", - "description": "${expanded_query}" - } - ] - }, - "lm": null - }, - "metadata": { - "dependency_versions": { - "python": "3.11", - "dspy": "3.0.0", - "cloudpickle": "3.1" - } - } -} \ No newline at end of file diff --git a/optimization_runs/2_gepa_query_expander_training_samples.jsonl b/optimization_runs/2_gepa_query_expander_training_samples.jsonl deleted file mode 100644 index 21d5dee..0000000 --- a/optimization_runs/2_gepa_query_expander_training_samples.jsonl +++ /dev/null @@ -1,30 +0,0 @@ -{"question": "I am currently trying to use the Helsinki-NLP/opus-mt-en-de and de-en models. I was trying to setup a pipeline and use both as LLMChain but I keep getting the same error:\nValueError: The following `model_kwargs` are not used by the model: ['pipeline_kwargs', 'return_full_text'] (note: typos in the generate arguments will also show up in this list)\n\nI used the following snippet to initialise both models and ran the snippet after to test the output:\ndef get_translation_chains():\n _de_en_translation_prompt = PromptTemplate.from_template(\n \"\"\"Translate the following text from German to English:\n {text}\n \"\"\"\n )\n\n _en_de_translation_prompt = PromptTemplate.from_template(\n \"\"\"Translate the following text from English to German:\n {text}\n \"\"\"\n )\n\n _en_to_de_tokenizer = AutoTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-en-de\")\n _en_to_de_model = AutoModelForSeq2SeqLM.from_pretrained(\"Helsinki-NLP/opus-mt-en-de\")\n _de_to_en_tokenizer = AutoTokenizer.from_pretrained(\"Helsinki-NLP/opus-mt-de-en\")\n _de_to_en_model = AutoModelForSeq2SeqLM.from_pretrained(\"Helsinki-NLP/opus-mt-de-en\")\n\n _en_to_de_pipeline = pipeline(\n model=_en_to_de_model,\n tokenizer=_en_to_de_tokenizer,\n task=\"translation\",\n )\n\n _de_to_en_pipeline = pipeline(\n model=_de_to_en_model,\n tokenizer=_de_to_en_tokenizer,\n task=\"translation\",\n )\n\n _de_to_en_llm = HuggingFacePipeline(pipeline=_de_to_en_pipeline)\n _en_to_de_llm = HuggingFacePipeline(pipeline=_en_to_de_pipeline)\n\n _de_to_en_chain = LLMChain(\n prompt=_de_en_translation_prompt,\n llm=_de_to_en_llm,\n )\n\n _en_to_de_chain = LLMChain(\n prompt=_en_de_translation_prompt,\n llm=_en_to_de_llm,\n )\n\n return _en_to_de_chain, _de_to_en_chain\n\n\n\nen_to_de_chain, de_to_en_pipeline = get_translation_chains()\n\nprint(en_to_de_chain.invoke({\"text\": \"Hello, how are you?\"}))\n\nI am fairly new to using LLMs and both the huggingface and langchain libraries and could not find anything to give me a clue on this one.\nI tried to use the pipeline with only setting the task I wanted \"translation_de_to_en\" and the other way around as well as using \"translation\" only for both default and more detailed pipeline. I also tried to set the kwargs option to None and False but with no success\n"} -{"question": "I have been reading the documentation all day and can't seem to wrap my head around how I can create a VectorStoreIndex with llama_index and use the created embeddings as supplemental information for a RAG application/chatbot that can communicate with a user. I want to use llama_index because they have some cool ways to perform more advanced retrieval techniques like sentence window retrieval and auto-merging retrieval (to be fair I have not investigated if Langchain also supports these types of vector retrieval methods). I want to use LangChain because of its functionality for developing more complex prompt templates (similarly I have not really investigated if llama_index supports this).\nMy goal is to ultimately evaluate how these different retrieval methods perform within the context of the application/chatbot. I know how to evaluate them with a separate evaluation questions file, but I would like to do things like compare the speed and humanness of responses, token usage, etc.\nThe code for a minimal reproducible example would be as follows\n1) LangChain ChatBot initiation \n from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n from langchain.memory import ChatMessageHistory\n \n \n prompt = ChatPromptTemplate.from_messages(\n [\n (\n \"system\",\n \"\"\"You are the world's greatest... \\\n Use this document base to help you provide the best support possible to everyone you engage with. \n \"\"\",\n ),\n MessagesPlaceholder(variable_name=\"messages\"),\n ]\n )\n \n chat = ChatOpenAI(model=llm_model, temperature=0.7)\n \n \n \n chain = prompt | chat\n \n \n chat_history = ChatMessageHistory()\n \n while True:\n user_input = input(\"You: \")\n chat_history.add_user_message(user_input)\n \n response = chain.invoke({\"messages\": chat_history.messages})\n \n if user_input.lower() == 'exit':\n break\n \n print(\"AI:\", response)\n chat_history.add_ai_message(response)\n\n\nLlama index sentence window retrieval\n\nfrom llama_index.core.node_parser import SentenceWindowNodeParser\n from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor\n from llama_index.core.postprocessor import LLMRerank\n \n class SentenceWindowUtils:\n def __init__(self, documents, llm, embed_model, sentence_window_size):\n self.documents = documents\n self.llm = llm\n self.embed_model = embed_model\n self.sentence_window_size = sentence_window_size\n # self.save_dir = save_dir\n \n self.node_parser = SentenceWindowNodeParser.from_defaults(\n window_size=self.sentence_window_size,\n window_metadata_key=\"window\",\n original_text_metadata_key=\"original_text\",\n )\n \n self.sentence_context = ServiceContext.from_defaults(\n llm=self.llm,\n embed_model=self.embed_model,\n node_parser=self.node_parser,\n )\n \n def build_sentence_window_index(self, save_dir):\n if not os.path.exists(save_dir):\n os.makedirs(save_dir)\n sentence_index = VectorStoreIndex.from_documents(\n self.documents, service_context=self.sentence_context\n )\n sentence_index.storage_context.persist(persist_dir=save_dir)\n else:\n sentence_index = load_index_from_storage(\n StorageContext.from_defaults(persist_dir=save_dir),\n service_context=self.sentence_context,\n )\n \n return sentence_index\n \n def get_sentence_window_query_engine(self, sentence_index, similarity_top_k=6, rerank_top_n=3):\n postproc = MetadataReplacementPostProcessor(target_metadata_key=\"window\")\n rerank = LLMRerank(top_n=rerank_top_n, service_context=self.sentence_context)\n \n sentence_window_engine = sentence_index.as_query_engine(\n similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]\n )\n \n return sentence_window_engine\n \n \n sentence_window = SentenceWindowUtils(documents=documents, llm = llm, embed_model=embed_model, sentence_window_size=1)\n sentence_window_1 = sentence_window.build_sentence_window_index(save_dir='./indexes/sentence_window_index_1')\n sentence_window_engine_1 = sentence_window.get_sentence_window_query_engine(sentence_window_1)\n\nBoth blocks of code independently will run. But the goal is that when a query is performed that warrants a retrieval to the existing document base, I can use the sentence_window_engine that was built. I suppose I could retrieve relevant information based on the query and then pass that information into a subsequent prompt for the chatbot, but I would like to try and avoid including the document data in a prompt.\nAny suggestions?\n"} -{"question": "Not a coding question, but a documentation omission that is nowhere mentioned online at this point. When using the Langchain CSVLoader, which column is being vectorized via the OpenAI embeddings I am using?\nI ask because viewing this code below, I vectorized a sample CSV, did searches (on Pinecone) and consistently received back DISsimilar responses. How do know which column Langchain is actually identifying to vectorize?\nloader = CSVLoader(file_path=file, metadata_columns=['col2', 'col3', 'col4','col5'])\nlangchain_docs = loader.load()\ntext_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)\ndocs = text_splitter.split_documents(langchain_docs)\nfor doc in docs:\n doc.metadata.pop('source')\n doc.metadata.pop('row')\nmy_index = pc_store.from_documents(docs, embeddings, index_name=PINECONE_INDEX_NAME)\n\nI am assuming the CSVLoader is then identifying col1 to vectorize. But, searches of Pinecone are terrible, leading me to think some other column is being vectorized.\n"} -{"question": "I am confused by how multiple messages are combined and sent to a large language model such as ChatOpenAI.\nfrom langchain_core.prompts import ChatPromptTemplate\n\ntemplate = ChatPromptTemplate.from_messages([\n (\"system\", \"You are a helpful AI bot. Your name is {name}.\"),\n (\"human\", \"Hello, how are you doing?\"),\n (\"ai\", \"I'm doing well, thanks!\"),\n (\"human\", \"{user_input}\"),\n])\n\nmessages = template.format_messages(\n name=\"Bob\",\n user_input=\"What is your name?\"\n)\n\nmessages\n\n[SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),\n HumanMessage(content='Hello, how are you doing?'),\n AIMessage(content=\"I'm doing well, thanks!\"),\n HumanMessage(content='What is your name?')]\n\nIs it generating text that looks like this:\nSystem:\nHuman:\nAssistant:\nHuman:\n...\n\nHow can I print the final text sent to the llm?\n"} -{"question": "I was previously using SQLDatabaseChain to connect LLM (Language Model) with my database, and it was functioning correctly with GPT-3.5. However, when attempting the same process with GPT-4, I encountered an error stating \"incorrect syntax near 's\"\nTo address this issue, I opted to use SQLDatabaseToolkit and the create_sql_agent function. However, I encountered a problem with this approach as I was unable to pass a prompt. When attempting to include a PromptTemplate in the create_sql_agent argument, it resulted in errors.\nValueError: Prompt missing required variables: {'tool_names', 'agent_scratchpad', 'tools'}\nBelow is my code:\ntoolkit = SQLDatabaseToolkit(db=db, llm=llm)\n\nagent_executor = create_sql_agent(\n llm=llm,\n toolkit=toolkit,\n verbose=True,\n prompt=MSSQL_PROMPT,\n)\n\n"} -{"question": "I have successfully connected to a Redshift database like below and got all the table names;\nconn = psycopg2.connect(host,db,port,username,password)\ncursor.execute(\"SELECT tablename FROM pg_tables GROUP BY tablename ORDER BY tablename\")\n\nHowever, when I connect using langchain and sqlalchemy like below, get_usable_table_names returns few of many tables in the database;\npg_url = f\"postgresql+psycopg2://{db_user}:{db_password}@{db_host}:{port_}/{db_}\"\ndb_engine = create_engine(pg_url)\ndb = SQLDatabase(db_engine)\nllm = OpenAI(temperature=0.0, openai_api_key=OPENAI_API_KEY, model='gpt-3.5-turbo')\n\ntable_names = \"\\n\".join(db.get_usable_table_names())\n\nAnyone has any suggestions on what might be the issue?\nI have tried querying a missing table by;\ndb.run(\"SELECT * FROM db_schema.missing_table_name\") \n\nand this works. However, I need SQLDatabase from langchain.sql_database module to detect the tables right without specifying one by one. (Because I would like to Chat With Sql Database Using Langchain & OpenAI)\n"} -{"question": "I am trying to put together a simple \"Q&A with sources\" using Langchain and a specific URL as the source data. The URL consists of a single page with quite a lot of information on it.\nThe problem is that RetrievalQAWithSourcesChain is only giving me the entire URL back as the source of the results, which is not very useful in this case.\nIs there a way to get more detailed source info?\nPerhaps the heading of the specific section on the page?\nA clickable URL to the correct section of the page would be even more helpful!\nI am slightly unsure whether the generating of the result source is a function of the language model, URL loader or simply RetrievalQAWithSourcesChain alone.\nI have tried using UnstructuredURLLoader and SeleniumURLLoader with the hope that perhaps more detailed reading and input of the data would help - sadly not.\nRelevant code excerpt:\nllm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')\nchain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=VectorStore.as_retriever())\n\nresult = chain({\"question\": question})\n\nprint(result['answer'])\nprint(\"\\n Sources : \",result['sources'] )\n\n"} -{"question": "Following LangChain docs in my Jupyter notebook with the following code :\nfrom langchain_openai import ChatOpenAI\nfrom langchain_core.prompts import ChatPromptTemplate\nfrom langchain_core.output_parsers import StrOutputParser\n\n\nprompt = ChatPromptTemplate.from_template(\"Tell me a short joke about {topic}\")\nmodel = ChatOpenAI(model=\"gpt-3.5-turbo\")\noutput_parser = StrOutputParser()\n\nchain = prompt | model | output_parser\n\nDocs say that pip install langchain installs all necessary modules, including langchain-community and langchain-core\nHowever, I get this error:\nModuleNotFoundError: No module named 'langchain_openai'\n"} -{"question": "Hi i am trying to do speaker diarization with open/ai whisper model.\nfrom langchain.llms import HuggingFacePipeline\nimport torch\nfrom transformers import AutoTokenizer, WhisperProcessor,AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM\n\nmodel_id = 'openai/whisper-large-v2'\ntokenizer = AutoTokenizer.from_pretrained(model_id)\nmodel = WhisperProcessor.from_pretrained(model_id)\n\n\npipe = pipeline(\n \"automatic-speech-recognition\",\n model=model, \n tokenizer=tokenizer, \n max_length=100\n)\n\nlocal_llm = HuggingFacePipeline(pipeline=pipe)\n\nThe error i am getting is \" AttributeError: 'WhisperProcessor' object has no attribute 'config'\"\nIs there anything to change from above code?\nThanks in advance\n"} -{"question": "I use this command 'from langchain.document_loaders import TextLoader' for import TextLoader. It used to work but now it is ERROR. It shows 'Error: No module named 'pydantic_v1.class_validators'; 'pydantic_v1' is not a package' Anyone know how to fix it ? please !! Using Langchain ==> langchain==0.0.266\nenter image description here\n"} -{"question": "I am trying to build a Chat PDF application using langchain,\nDuring this I installed all the necessary packages, but there is one issue with this chromadb, which no matter what I do, it keeps showing the error.\nI installed it, ran it many times, but I keep getting this error asking to install chromadb and\nhere is the screenshot of the error\nrepo link\nI tried uninstalling and installing again, GPTed, saw issues in Github but nothing seems to help me fix the issue\n"} -{"question": "I'm trying to pass filters to redis retriever to do hybrid search on my embeddings (vector + metadata filtering). The following doesn't work! It fails to pass the filters and filters would always be None:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n search_kwargs=\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5}\",\n filter=\"(@launch:{false} @menu_text:(%%chicken%%))\"\n )\n\nI found another example and apparently filter expression should be pass as search_kwargs, but I can't figure out what should be the correct syntax. If I do it as follow:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n \"retriever_search_kwargs\":\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5, 'filter': '@menu_text:(%%chicken%%) @lunch:{true}'}\",\n}\n\nit generates this search query:\nsimilarity_search_by_vector > redis_query : (@content_vector:[VECTOR_RANGE $distance_threshold $vector] @menu_text:(%%chicken%%) @lunch:{true})=>{$yield_distance_as: distance}\nand fails with the following error:\nredis.exceptions.ResponseError: Invalid attribute yield_distance_as\nAny idea how to fix it?\nSystem Info:\nlangchain 0.0.346\nlangchain-core 0.0.10\npython 3.9.18\n"} -{"question": "I making a project which uses chromadb (0.3.29), llama-index (0.6.34.post1) and langchain (0.0.245), and openai (0.27.8).But I am getting response None when I tried to query in custom pdfs.even they are getting embedded successfully , below are my codes:\nimport os, re\nimport shutil\nimport time\nfrom grpc import ServicerContext\nimport vectordb\nfrom langchain import OpenAI\nfrom llama_index import GPTTreeIndex, SimpleDirectoryReader, LLMPredictor,GPTVectorStoreIndex,PromptHelper, VectorStoreIndex\nfrom llama_index import LangchainEmbedding, ServiceContext, Prompt\nfrom llama_index import StorageContext, load_index_from_storage\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.llms import AzureOpenAI\n# Import Azure OpenAI\n#from langchain_community.llms import AzureOpenAI\nimport chromadb\nfrom llama_index.vector_stores import ChromaVectorStore\n \nfrom dotenv import load_dotenv\nload_dotenv()\n#openai.api_key = os.getenv[\"OPENAI_API_KEY\"]\n\n\n\ndef regenrate_tokens(collection_name,persist_directory): \n \n if os.path.isdir((persist_directory)):\n print(\"directory existed ,replacing previous directory\")\n shutil.rmtree(persist_directory)\n print(\"Recreating Embeddings...\")\n vector=vectordb.CreatingChromaDB(collection_name,persist_directory)\n vector.storage_context.persist(persist_dir= persist_directory)\n\n else:\n print(\"directory does not exit, creating new embeddings.\")\n vector=vectordb.CreatingChromaDB(collection_name,persist_directory)\n vector.storage_context.persist(persist_dir= persist_directory)\n \n time.sleep(10) # Sleep for 10 seconds\n\n return('Token regenrated, you can ask the questions. ')\n\ndef query__from_knowledge_base(question):\n persist_directory = './ChromaDb'\n collection_name = \"chromaVectorStore\"\n\n \n if(question == 'regenerate tokens'):\n return(regenrate_tokens(collection_name,persist_directory))\n \n index = vectordb.LoadFromDisk(collection_name,persist_directory)\n print(index)\n # define custom Prompt\n # TEMPLATE_STR = (\n # \"We have provided context information below. \\n\"\n # \"---------------------\\n\"\n # \"{context_str}\"\n # \"\\n---------------------\\n\"\n # \"Given this information, please answer the question: {query_str}\\n\"\n # )\n TEMPLATE_STR = \"\"\"Create a final answer to the given questions using the provided document excerpts(in no particular order) as references. ALWAYS include a \"SOURCES\" section in your answer including only the minimal set of sources needed to answer the question. Always include the Source Preview of source. If answer has step in document please response in step. If you are unable to answer the question, simply state that you do not know. Do not attempt to fabricate an answer and leave the SOURCES section empty.\n\n \"---------------------\\n\"\n \"{context_str}\"\n \"\\n---------------------\\n\"\n \"Given this information, please answer the question: {query_str}\\n\"\n \"\"\"\n\n QA_TEMPLATE = Prompt(TEMPLATE_STR)\n \n query_engine = index.as_query_engine(text_qa_template=QA_TEMPLATE)\n print(query_engine)\n response = query_engine.query(question)\n print(question)\n # print(response)\n response = str(response) \n response = re.sub(r'Answer:', '', response)\n response = response.strip()\n return(response)\n \n\n#print(regenrate_tokens())\n#print(query__from_knowledge_base('Enabling online archive for the user\u2019s mailbox.'))\n\nfile vectordb.py,\ncontaining creation and querying methods are below:\ndef CreatingChromaDB(collection_name,persist_directory):\n\n documents = SimpleDirectoryReader('./static/upload/').load_data()\n # deployment_name = \"text-davinci-003\"\n deployment_name = \"gpt-3.5-turbo\"\n openai_api_version=\"30/08/2023\"\n\n # Create LLM via Azure OpenAI Service\n llm = AzureOpenAI(deployment_name=deployment_name,openai_api_version=openai_api_version)\n llm_predictor = LLMPredictor(llm=llm)\n llm_predictor = LLMPredictor(llm = llm_predictor)\n embedding_llm = LangchainEmbedding(OpenAIEmbeddings())\n\n # Define prompt helper\n max_input_size = 3000\n num_output = 256\n chunk_size_limit = 1000 # token window size per document\n max_chunk_overlap = 20 # overlap for each token fragment\n prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_output,\n max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)\n\n service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embedding_llm, prompt_helper=prompt_helper)\n chroma_client = chromadb.Client(Settings(\n chroma_db_impl=\"duckdb+parquet\",\n persist_directory= persist_directory))\n\n print(collection_name)\n\n # create a collection\n chroma_collection = chroma_client.get_or_create_collection(collection_name,embedding_function=embedding_llm)\n # https://docs.trychroma.com/api-reference\n print(chroma_collection.count())\n\n vector_store = ChromaVectorStore(chroma_collection)\n storage_context = StorageContext.from_defaults(vector_store=vector_store)\n index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)\n print(chroma_collection.count())\n print(chroma_collection.get()['documents'])\n print(chroma_collection.get()['metadatas'])\n\n # index.storage_context.persist()\n return index\n\ndef LoadFromDisk(collection_name,persist_directory):\n chroma_client = chromadb.Client(Settings(\n chroma_db_impl=\"duckdb+parquet\",\n persist_directory= persist_directory))\n\n print(collection_name)\n\n chroma_collection = chroma_client.get_or_create_collection(collection_name)\n vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n index = GPTVectorStoreIndex.from_vector_store(vector_store=vector_store)\n return index\n\nif we tried to regenerate tokens and try to query from pdfs then its shows \"None\" response, even if those files are embedded properly.\n\n"} -{"question": "I have the code:\nloader = PyPDFLoader(\u201chttps://arxiv.org/pdf/2303.08774.pdf\u201d)\ndata = loader.load()\ndocs = text_splitter1.split_documents(data)\nvector_search_index = \u201cvector_index\u201d\n\nvector_search = MongoDBAtlasVectorSearch.from_documents(\n documents=docs,\n embedding=OpenAIEmbeddings(disallowed_special=()),\n collection=atlas_collection,\n index_name=vector_search_index,\n)\n\nquery = \"What were the compute requirements for training GPT 4\"\nresults = vector_search1.similarity_search(query)\nprint(\"result: \", results)\n\nAnd in results I have every time only empty array. I don't understand what I do wrong. This is the link on the langchain documentation with examples. Information is saved normally in database, but I cannot search info in this collection.\n"} -{"question": "I am experimenting with langchains and its applications, but as a newbie, I could not understand how the embeddings and indexing really work together here. I know what these two are, but I can't figure out a way to use the index that I created and saved using persist_directory.\nI succesfully saved the object created by VectorstoreIndexCreator using the following code:\nindex = VectorstoreIndexCreator(vectorstore_kwargs={\"persist_directory\":\"./custom_save_dir_path\"}).from_loaders([loader])\n\nbut I cannot find a way to use the .pkl files created. How can I use these files in my chain to retrieve data?\nAlso, how does the billing in openAI work? If I cannot use any saved embeddings or index, will it re-embed all the data every time I run the code?\nAs a beginner, I am still learning my way around and any assistance would be greatly appreciated.\nHere is the full code:\nfrom langchain.document_loaders import CSVLoader\nfrom langchain.indexes import VectorstoreIndexCreator\nfrom langchain.chains import RetrievalQA\nfrom langchain.llms import OpenAI\nimport os\nos.environ[\"OPENAI_API_KEY\"] = \"sk-xxx\"\n# Load the documents\nloader = CSVLoader(file_path='data/data.csv')\n\n#creates an object with vectorstoreindexcreator\nindex = VectorstoreIndexCreator(vectorstore_kwargs={\"persist_directory\":\"./custom_save_dir_path\"}).from_loaders([loader])\n\n# Create a question-answering chain using the index\nchain = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=index.vectorstore.as_retriever(), input_key=\"question\")\n\n# Pass a query to the chain\nwhile True:\n query = input(\"query: \")\n response = chain({\"question\": query})\n print(response['result'])\n\n"} -{"question": "Here is the full code. It runs perfectly fine on https://learn.deeplearning.ai/ notebook. But when I run it on my local machine, I get an error about\n\nImportError: Could not import docarray python package\n\nI have tried reinstalling/force installing langchain and lanchain[docarray] (both pip and pip3). I use mini conda virtual environment. python version 3.11.4\nfrom langchain.vectorstores import DocArrayInMemorySearch\nfrom langchain.schema import Document\nfrom langchain.indexes import VectorstoreIndexCreator\nimport openai\nimport os\n\nos.environ['OPENAI_API_KEY'] = \"xxxxxx\" #not needed in DLAI\n\ndocs = [\n Document(\n page_content=\"\"\"[{\"API_Name\":\"get_invoice_transactions\",\"API_Description\":\"This API when called will provide the list of transactions\",\"API_Inputs\":[],\"API_Outputs\":[]}]\"\"\"\n ),\n Document(\n page_content=\"\"\"[{\"API_Name\":\"get_invoice_summary_year\",\"API_Description\":\"this api summarizes the invoices by vendor, product and year\",\"API_Inputs\":[{\"API_Input\":\"Year\",\"API_Input_Type\":\"Text\"}],\"API_Outputs\":[{\"API_Output\":\"Purchase Volume\",\"API_Output_Type\":\"Float\"},{\"API_Output\":\"Vendor Name\",\"API_Output_Type\":\"Text\"},{\"API_Output\":\"Year\",\"API_Output_Type\":\"Text\"},{\"API_Output\":\"Item\",\"API_Output_Type\":\"Text\"}]}]\"\"\"\n ),\n Document(\n page_content=\"\"\"[{\"API_Name\":\"loan_payment\",\"API_Description\":\"This API calculates the monthly payment for a loan\",\"API_Inputs\":[{\"API_Input\":\"Loan_Amount\",\"API_Input_Type\":\"Float\"},{\"API_Input\":\"Interest_Rate\",\"API_Input_Type\":\"Float\"},{\"API_Input\":\"Loan_Term\",\"API_Input_Type\":\"Integer\"}],\"API_Outputs\":[{\"API_Output\":\"Monthly_Payment\",\"API_Output_Type\":\"Float\"},{\"API_Output\":\"Total_Interest\",\"API_Output_Type\":\"Float\"}]}]\"\"\"\n ),\n Document(\n page_content=\"\"\"[{\"API_Name\":\"image_processing\",\"API_Description\":\"This API processes an image and applies specified filters\",\"API_Inputs\":[{\"API_Input\":\"Image_URL\",\"API_Input_Type\":\"URL\"},{\"API_Input\":\"Filters\",\"API_Input_Type\":\"List\"}],\"API_Outputs\":[{\"API_Output\":\"Processed_Image_URL\",\"API_Output_Type\":\"URL\"}]}]\"\"\"\n ),\n Document(\n page_content=\"\"\"[{\"API_Name\":\"movies_catalog\",\"API_Description\":\"This API provides a catalog of movies based on user preferences\",\"API_Inputs\":[{\"API_Input\":\"Genre\",\"API_Input_Type\":\"Text\"},{\"API_Input\":\"Release_Year\",\"API_Input_Type\":\"Integer\"}],\"API_Outputs\":[{\"API_Output\":\"Movie_Title\",\"API_Output_Type\":\"Text\"},{\"API_Output\":\"Genre\",\"API_Output_Type\":\"Text\"},{\"API_Output\":\"Release_Year\",\"API_Output_Type\":\"Integer\"},{\"API_Output\":\"Rating\",\"API_Output_Type\":\"Float\"}]}]\"\"\"\n ),\n # Add more documents here \n]\n\nindex = VectorstoreIndexCreator(\n vectorstore_cls=DocArrayInMemorySearch\n ).from_documents(docs)\n\napi_desc = \"do analytics about movies\"\nquery = f\"Search for related APIs based on following API Description: {api_desc}\\\n Return list of API page_contents as JSON objects.\"\n\n\nprint(index.query(query))\n \n\nHere is the error:\n(streamlit) C02Z8202LVDQ:sage_response praneeth.gadam$ /Users/praneeth.gadam/opt/miniconda3/envs/streamlit/bin/python /Users/praneeth.gadam/sage_response/docsearch_copy.py Traceback (most recent call last): File \"/Users/praneeth.gadam/opt/miniconda3/envs/streamlit/lib/python3.11/site-packages/langchain/vectorstores/docarray/base.py\", line 19, in _check_docarray_import\n import docarray ModuleNotFoundError: No module named 'docarray'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last): File \"/Users/praneeth.gadam/sage_response/docsearch_copy.py\", line 30, in \n ).from_documents(docs)\n ^^^^^^^^^^^^^^^^^^^^ File \"/Users/praneeth.gadam/opt/miniconda3/envs/streamlit/lib/python3.11/site-packages/langchain/indexes/vectorstore.py\", line 88, in from_documents\n vectorstore = self.vectorstore_cls.from_documents(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File \"/Users/praneeth.gadam/opt/miniconda3/envs/streamlit/lib/python3.11/site-packages/langchain/vectorstores/base.py\", line 420, in from_documents\n return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File \"/Users/praneeth.gadam/opt/miniconda3/envs/streamlit/lib/python3.11/site-packages/langchain/vectorstores/docarray/in_memory.py\", line 67, in from_texts\n store = cls.from_params(embedding, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File \"/Users/praneeth.gadam/opt/miniconda3/envs/streamlit/lib/python3.11/site-packages/langchain/vectorstores/docarray/in_memory.py\", line 38, in from_params\n _check_docarray_import() File \"/Users/praneeth.gadam/opt/miniconda3/envs/streamlit/lib/python3.11/site-packages/langchain/vectorstores/docarray/base.py\", line 29, in _check_docarray_import\n raise ImportError( ImportError: Could not import docarray python package. Please install it with `pip install \"langchain[docarray]\"`.\n\n"} -{"question": "I wrote a program trying to query local sqlite db, and it worked fine for text-davinci-003:\nllm = OpenAI(model_name=\"text-davinci-003\", verbose=True)\n\nHowever, after I changed it to GPT-4:\nllm = ChatOpenAI(model_name=\"gpt-4-0613\", verbose=True)\n...\ndb_chain = SQLDatabaseChain.from_llm(\n llm,\n db,\n verbose=True,\n use_query_checker=True,\n return_intermediate_steps=True,\n)\n\nwith get_openai_callback() as cb:\n # No intermediate steps\n # result = db_chain.run(query)\n\n # If intermediate steps are needed...\n result = db_chain(query)\n intermediate_steps = result[\"intermediate_steps\"]\n\n print(\"\")\n\n try:\n sql_result = intermediate_steps[3]\n print(\"SQL Query Result:\")\n print(json.dumps(ast.literal_eval(sql_result), indent=4))\n except Exception as e:\n print(f\"Error while parsing the SQL result:\\n{e}\")\n print(\"\")\n print(intermediate_steps)\n \n print(\"\")\n\n print(cb)\n\n... everything still works, except the final SQL query contained more text in addition to SQL query, i.e.:\n> Entering new SQLDatabaseChain chain...\nHave the user visited some news website? If yes, list all the urls.\nDO NOT specify timestamp unless query said so.\nDO NOT specify limit unless query said so.\nSQLQuery:The original query appears to be correct as it doesn't seem to have any of the common mistakes listed. Here is the same query:\n\nSELECT \"URL\" FROM browsinghistory WHERE \"Title\" LIKE '%news%'Traceback (most recent call last):\n File \"C:\\path\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1968, in _exec_single_context\n self.dialect.do_execute(\n File \"C:\\path\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\default.py\", line 920, in do_execute\n cursor.execute(statement, parameters)\nsqlite3.OperationalError: near \"The\": syntax error\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"D:\\path\\run.py\", line 292, in \n database_mode(llm, filepath, delimiter)\n File \"D:\\path\\run.py\", line 156, in database_mode\n llm.query_database(db_path=db_path, query=query)\n File \"D:\\path\\modules\\chatbot.py\", line 220, in query_database\n result = db_chain(query)\n ^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\base.py\", line 140, in __call__\n raise e\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\base.py\", line 134, in __call__\n self._call(inputs, run_manager=run_manager)\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\sql_database\\base.py\", line 181, in _call\n raise exc\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\sql_database\\base.py\", line 151, in _call\n result = self.database.run(checked_sql_command)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\sql_database.py\", line 334, in run\n cursor = connection.execute(text(command))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1413, in execute\n return meth(\n ^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\sql\\elements.py\", line 483, in _execute_on_connection\n return connection._execute_clauseelement(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1637, in _execute_clauseelement\n ret = self._execute_context(\n ^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1846, in _execute_context\n return self._exec_single_context(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1987, in _exec_single_context\n self._handle_dbapi_exception(\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 2344, in _handle_dbapi_exception\n raise sqlalchemy_exception.with_traceback(exc_info[2]) from e\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1968, in _exec_single_context\n self.dialect.do_execute(\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\default.py\", line 920, in do_execute\n cursor.execute(statement, parameters)\nsqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near \"The\": syntax error\n[SQL: The original query appears to be correct as it doesn't seem to have any of the common mistakes listed. Here is the same query:\n\nSELECT \"URL\" FROM browsinghistory WHERE \"Title\" LIKE '%news%']\n(Background on this error at: https://sqlalche.me/e/20/e3q8)\n\nI know that I can try to tell it not to return anything but the query (might be unstable. though...), but why isn't this work for GPT-4, while it works for text-davinci-003?\n\nUpdate:\nTried with a different query, and the problem remains:\n> Entering new SQLDatabaseChain chain...\nList all websites visited by the user.\nDO NOT specify timestamp unless query said so.\nDO NOT specify limit unless query said so.\nSQLQuery:The original query seems to be correct. It is simply selecting the \"URL\" column from the \"browsinghistory\" table. There is no misuse of any functions, no data type mismatch, no joins, etc.\n\nReproducing the original query:\n\nSELECT \"URL\" FROM browsinghistory\n...\n...\n...\n\n"} -{"question": "I have deployed llm model locally which follows openai api schema. As it's endpoint follows openai schema, I don't want to write separate inference client.\nIs there any way we can utilize existing openai wrapper by langchain to do inference for my localhost model.\nI checked there is a openai adapter by langchain, but it seems like it require provider, which again I have to write separate client.\nOverall goal it to not write any redundant code as it's already been maintained by langchain and may change with time. We can modify our api wrt openai and it works out of the box.\nYour suggestion is appreciated.\n"} -{"question": "I'm trying to load 6b 128b 8bit llama based model from file (note the model itself is an example, I tested others and got similar problems), the pipeline is completely eating up my 8gb of vram:\n\n\nMy code:\nfrom langchain.llms import HuggingFacePipeline\nfrom langchain import PromptTemplate, LLMChain\n\nimport torch\nfrom transformers import LlamaTokenizer, LlamaForCausalLM, LlamaConfig, pipeline\n\ntorch.cuda.set_device(torch.device(\"cuda:0\"))\n\nPATH = './models/wizardLM-7B-GPTQ-4bit-128g'\nconfig = LlamaConfig.from_json_file(f'{PATH}/config.json')\nbase_model = LlamaForCausalLM(config=config).half()\n\ntorch.cuda.empty_cache()\ntokenizer = LlamaTokenizer.from_pretrained(\n pretrained_model_name_or_path=PATH,\n low_cpu_mem_usage=True,\n local_files_only=True\n)\ntorch.cuda.empty_cache()\n\npipe = pipeline(\n \"text-generation\",\n model=base_model,\n tokenizer=tokenizer,\n batch_size=1,\n device=0,\n max_length=100,\n temperature=0.6,\n top_p=0.95,\n repetition_penalty=1.2\n)\n\nHow can I make the pipeline initiation consume less vram?\ngpu: AMD\u00ae Radeon rx 6600 (8gb vram, rocm 5.4.2 & torch)\nI want to mention that I managed to load the same model on other frameworks like \"KoboldAI\" or \"text-generation-webui\" so I know it should be possible.\nTo load the model \"wizardLM-7B-GPTQ-4bit-128g\" downloaded from huggingface and run it using with langchain on python.\npip list output:\n Package Version\n------------------------ ----------------\naccelerate 0.19.0\naiofiles 23.1.0\naiohttp 3.8.4\naiosignal 1.3.1\naltair 5.0.0\nanyio 3.6.2\nargilla 1.7.0\nasync-timeout 4.0.2\nattrs 23.1.0\nbackoff 2.2.1\nbeautifulsoup4 4.12.2\nbitsandbytes 0.39.0\ncertifi 2022.12.7\ncffi 1.15.1\nchardet 5.1.0\ncharset-normalizer 2.1.1\nchromadb 0.3.23\nclick 8.1.3\nclickhouse-connect 0.5.24\ncmake 3.25.0\ncolorclass 2.2.2\ncommonmark 0.9.1\ncompressed-rtf 1.0.6\ncontourpy 1.0.7\ncryptography 40.0.2\ncycler 0.11.0\ndataclasses-json 0.5.7\ndatasets 2.12.0\nDeprecated 1.2.13\ndill 0.3.6\nduckdb 0.8.0\neasygui 0.98.3\nebcdic 1.1.1\net-xmlfile 1.1.0\nextract-msg 0.41.1\nfastapi 0.95.2\nffmpy 0.3.0\nfilelock 3.9.0\nfonttools 4.39.4\nfrozenlist 1.3.3\nfsspec 2023.5.0\ngradio 3.28.3\ngradio_client 0.2.5\ngreenlet 2.0.2\nh11 0.14.0\nhnswlib 0.7.0\nhttpcore 0.16.3\nhttptools 0.5.0\nhttpx 0.23.3\nhuggingface-hub 0.14.1\nidna 3.4\nIMAPClient 2.3.1\nJinja2 3.1.2\njoblib 1.2.0\njsonschema 4.17.3\nkiwisolver 1.4.4\nlangchain 0.0.171\nlark-parser 0.12.0\nlinkify-it-py 2.0.2\nlit 15.0.7\nllama-cpp-python 0.1.50\nloralib 0.1.1\nlxml 4.9.2\nlz4 4.3.2\nMarkdown 3.4.3\nmarkdown-it-py 2.2.0\nMarkupSafe 2.1.2\nmarshmallow 3.19.0\nmarshmallow-enum 1.5.1\nmatplotlib 3.7.1\nmdit-py-plugins 0.3.3\nmdurl 0.1.2\nmonotonic 1.6\nmpmath 1.2.1\nmsg-parser 1.2.0\nmsoffcrypto-tool 5.0.1\nmultidict 6.0.4\nmultiprocess 0.70.14\nmypy-extensions 1.0.0\nnetworkx 3.0\nnltk 3.8.1\nnumexpr 2.8.4\nnumpy 1.24.1\nnvidia-cublas-cu11 11.10.3.66\nnvidia-cuda-cupti-cu11 11.7.101\nnvidia-cuda-nvrtc-cu11 11.7.99\nnvidia-cuda-runtime-cu11 11.7.99\nnvidia-cudnn-cu11 8.5.0.96\nnvidia-cufft-cu11 10.9.0.58\nnvidia-curand-cu11 10.2.10.91\nnvidia-cusolver-cu11 11.4.0.1\nnvidia-cusparse-cu11 11.7.4.91\nnvidia-nccl-cu11 2.14.3\nnvidia-nvtx-cu11 11.7.91\nolefile 0.46\noletools 0.60.1\nopenai 0.27.7\nopenapi-schema-pydantic 1.2.4\nopenpyxl 3.1.2\norjson 3.8.12\npackaging 23.1\npandas 1.5.3\npandoc 2.3\npcodedmp 1.2.6\npdfminer.six 20221105\nPillow 9.3.0\npip 23.0.1\nplumbum 1.8.1\nply 3.11\nposthog 3.0.1\npsutil 5.9.5\npyarrow 12.0.0\npycparser 2.21\npydantic 1.10.7\npydub 0.25.1\nPygments 2.15.1\npygpt4all 1.1.0\npygptj 2.0.3\npyllamacpp 2.3.0\npypandoc 1.11\npyparsing 2.4.7\npyrsistent 0.19.3\npython-dateutil 2.8.2\npython-docx 0.8.11\npython-dotenv 1.0.0\npython-magic 0.4.27\npython-multipart 0.0.6\npython-pptx 0.6.21\npytorch-triton-rocm 2.0.1\npytz 2023.3\npytz-deprecation-shim 0.1.0.post0\nPyYAML 6.0\nred-black-tree-mod 1.20\nregex 2023.5.5\nrequests 2.28.1\nresponses 0.18.0\nrfc3986 1.5.0\nrich 13.0.1\nRTFDE 0.0.2\nscikit-learn 1.2.2\nscipy 1.10.1\nsemantic-version 2.10.0\nsentence-transformers 2.2.2\nsentencepiece 0.1.99\nsetuptools 66.0.0\nsix 1.16.0\nsniffio 1.3.0\nsoupsieve 2.4.1\nSQLAlchemy 2.0.15\nstarlette 0.27.0\nsympy 1.11.1\ntabulate 0.9.0\ntenacity 8.2.2\nthreadpoolctl 3.1.0\ntokenizers 0.13.3\ntoolz 0.12.0\ntorch 2.0.1+rocm5.4.2\ntorchaudio 2.0.2+rocm5.4.2\ntorchvision 0.15.2+rocm5.4.2\ntqdm 4.65.0\ntransformers 4.30.0.dev0\ntriton 2.0.0\ntyper 0.9.0\ntyping_extensions 4.4.0\ntyping-inspect 0.8.0\ntzdata 2023.3\ntzlocal 4.2\nuc-micro-py 1.0.2\nunstructured 0.6.6\nurllib3 1.26.13\nuvicorn 0.22.0\nuvloop 0.17.0\nwatchfiles 0.19.0\nwebsockets 11.0.3\nwheel 0.38.4\nwikipedia 1.4.0\nwrapt 1.14.1\nXlsxWriter 3.1.0\nxxhash 3.2.0\nyarl 1.9.2\nzstandard 0.21.0\n\n"} -{"question": "I am trying to make an LLM model that answers questions from the panda's data frame by using Langchain agent.\nHowever, when the model can't find the answers from the data frame, I want the model to google the question and try to get the answers from the website.\nI tried different methods but I could not incorporate the two functions together.\nI currently have a dataset in csv file, and I converted it into the pandas dataframe.\nAfter that, I have created the agent as shown below.\nagent = create_pandas_dataframe_agent(OpenAI(temperature=1), df, verbose=True)\nI am a beginner who just tried to use LLM model. Any help or support would be appreciated!\n"} -{"question": "I currently trying to implement langchain functionality to talk with pdf documents.\nI have a bunch of pdf files stored in Azure Blob Storage. I am trying to use langchain PyPDFLoader to load the pdf files to the Azure ML notebook. However, I am not being able to get it done. If I have the pdf stored locally, it is no problem, but to scale up I have to connect to the blob store. I have not really found any documents on langchain website or azure website. Wondering, if any of you is having similar problem.\nThank you\nBelow is an example of code i am trying:\nfrom azureml.fsspec import AzureMachineLearningFileSystem\nfs = AzureMachineLearningFileSystem(\"\")\n\nfrom langchain.document_loaders import PyPDFLoader\nwith fs.open('*/.../file.pdf', 'rb') as fd:\n loader = PyPDFLoader(document)\n data = loader.load()\n\nError: TypeError: expected str, bytes or os.PathLike object, not StreamInfoFileObject\n\nAnother example tried:\nfrom langchain.document_loaders import UnstructuredFileLoader\nwith fs.open('*/.../file.pdf', 'rb') as fd:\n loader = UnstructuredFileLoader(fd)\ndocuments = loader.load() \n\nError: TypeError: expected str, bytes or os.PathLike object, not StreamInfoFileObject\n\n"} -{"question": "Need some help.\nI have the following json content in a file and would like to use langchain.js and gpt to parse , store and answer question such as\nfor example:\n\"find me jobs with 2 year experience\" ==> should return a list\n\"I have knowledge in javascript find me jobs\" ==> should return the jobs pbject\nI use langchain json loader and I see the file is parse but it say that it find 13 docs . There is only be 3 docs in file . Is the json structure not correct?\nHere is snippet of my parse code\nconst loader = new DirectoryLoader(docPath, {\n \".json\": (path) => new JSONLoader(path),\n});\n\nconst docs = await loader.load();\nconsole.log(docs);\nconsole.log(docs.length);\n\nHere is my input data\n[\n {\n \"jobid\":\"job1\",\n \"title\":\"software engineer\"\n \"skills\":\"java,javascript\",\n \"description\":\"this job requires a associate degrees in CS and 2 years experience\"\n },\n {\n \"jobid\":\"job2\",\n \"skills\":\"math, accounting, spreadsheet\",\n \"description\":\"this job requires a degrees in accounting and 2 years experience\"\n },\n {\n \"jobid\":\"job3\",\n \"title\":\"programmer\"\n \"skills\":\"java,javascript,cloud computing\",\n \"description\":\"this job requires a ,master degrees in CS and 3 years experience\"\n }\n \n]\n\nOUTPUT\n[\n Document {\n pageContent: 'job1',\n metadata: {\n source: 'langchain-document-loaders-in-node-js/documents/jobs.json',\n line: 1\n }\n },\n Document {\n pageContent: 'software engineer',\n metadata: {\n source: 'langchain-document-loaders-in-node-js/documents/jobs.json',\n line: 2\n }\n },\n Document {\n pageContent: 'java,javascript',\n metadata: {\n source: 'langchain-document-loaders-in-node-js/documents/jobs.json',\n line: 3\n }\n },\n Document {\n pageContent: 'this job requires a associate degrees in CS and 2 years experience',\n metadata: {\n source: 'langchain-document-loaders-in-node-js/documents/jobs.json',\n line: 4\n }\n },\n Document {\n pageContent: 'job2',\n metadata: {\n source: 'langchain-document-loaders-in-node-js/documents/jobs.json',\n line: 5\n }\n },\n\n...\n"} -{"question": "I want to create a local LLM using falcon 40b instruct model and combine it with lanchain so I can give it a pdf or some resource to learn from so I can query it ask it questions, learn from it and ultimately be able to derive insights from the pdf report from an Excel sheet.\nFor now, I just want to load a pdf using langchain and have the falcon-40b-instruct model as the agent.\nI want to build an llm where I can make it interact with my own data using langchain.\nHere is my attempt so far:\nfrom langchain_community.llms import HuggingFaceHub\n\nllm = HuggingFaceHub(\nrepo_id=model_name,\ntask=\"text-generation\",\nmodel_kwargs={\n\"max_new_tokens\": 512,\n\"top_k\": 30,\n\"temperature\": 0.1,\n\"repetition_penalty\": 1.03\n},\nhuggingfacehub_api_token=\"hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"\n)\n\nI reached the following stage:\nfrom langchain_community.chat_models.huggingface import ChatHuggingFace\nllm = ChatHuggingFace(llm=llm)\n\nyet I get this error:\n\nHfHubHTTPError: 401 Client Error: Unauthorized for url\n\nI am doing do this to be able to run the following:\nqa_chain = RetrievalQA.from_chain_type(\nllm=llm,\nretriever=vector_db.as_retriever()\n)\n\nWhat am I missing and is there a way to be able to do this fully local like doing the falcon model and pass it to ChatHuggingFace?\n"} -{"question": "I can see everything but the Embedding of the documents when I used Chroma with Langchain and OpenAI embeddings. It always show me None for that\nHere is the code:\nfor db_collection_name in tqdm([\"class1-sub2-chap3\", \"class2-sub3-chap4\"]):\n documents = []\n doc_ids = []\n\n for doc_index in range(3):\n cl, sub, chap = db_collection_name.split(\"-\")\n content = f\"This is {db_collection_name}-doc{doc_index}\"\n doc = Document(page_content=content, metadata={\"chunk_num\": doc_index, \"chapter\":chap, \"class\":cl, \"subject\":sub})\n documents.append(doc)\n doc_ids.append(str(doc_index))\n\n\n # # Initialize a Chroma instance with the original document\n db = Chroma.from_documents(\n collection_name=db_collection_name,\n documents=documents, ids=doc_ids,\n embedding=embeddings, \n persist_directory=\"./data\")\n \n db.persist()\n\nwhen I do db.get(), I see everything as expected except embedding is None.\n{'ids': ['0', '1', '2'],\n 'embeddings': None,\n 'documents': ['This is class1-sub2-chap3-doc0',\n 'This is class1-sub2-chap3-doc1',\n 'This is class1-sub2-chap3-doc2'],\n 'metadatas': [{'chunk_num': 0,\n 'chapter': 'chap3',\n 'class': 'class1',\n 'subject': 'sub2'},\n {'chunk_num': 1, 'chapter': 'chap3', 'class': 'class1', 'subject': 'sub2'},\n {'chunk_num': 2, 'chapter': 'chap3', 'class': 'class1', 'subject': 'sub2'}]}\n\nMy embeddings is also working fine as it returns:\nlen(embeddings.embed_documents([\"EMBED THIS\"])[0])\n>> 1536\n\nalso, in my ./data directory I have Embedding file as chroma-embeddings.parquet\n\nI tried the example with example given in document but it shows None too\n# Import Document class\nfrom langchain.docstore.document import Document\n\n# Initial document content and id\ninitial_content = \"This is an initial document content\"\ndocument_id = \"doc1\"\n\n# Create an instance of Document with initial content and metadata\noriginal_doc = Document(page_content=initial_content, metadata={\"page\": \"0\"})\n\n# Initialize a Chroma instance with the original document\nnew_db = Chroma.from_documents(\n collection_name=\"test_collection\",\n documents=[original_doc],\n embedding=OpenAIEmbeddings(), # using the same embeddings as before\n ids=[document_id],\n)\n\nHere also new_db.get() gives me None\n"} -{"question": "The following code do not do what it is supposed to do:\nfrom langchain.callbacks.base import BaseCallbackHandler\nfrom langchain import PromptTemplate\nfrom langchain.chains import LLMChain\nfrom langchain.llms import VertexAI\n\n\nclass MyCustomHandler(BaseCallbackHandler):\n def on_llm_end(self, event, context):\n print(f\"Prompt: {event.prompt}\")\n print(f\"Response: {event.response}\")\n\n\nllm = VertexAI(\n model_name='text-bison@001',\n max_output_tokens=1024,\n temperature=0.3,\n verbose=False)\nprompt = PromptTemplate.from_template(\"1 + {number} = \")\nhandler = MyCustomHandler()\nchain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])\nresponse = chain.run(number=2)\nprint(response)\n\nBased on this documentation and this tutorial, the code should execute the custom handler callback on_llm_end but in fact it doesn't.\nCan anyone please tell me why?\n"} -{"question": "I am trying to query a stack of word documents using langchain, yet I get the following traceback.\nMay I ask what's the argument that's expected here?\nAlso, side question, is there a way to do such a query locally (without internet access and openai)?\nTraceback:\nTraceback (most recent call last):\n\n File C:\\Program Files\\Spyder\\pkgs\\spyder_kernels\\py3compat.py:356 in compat_exec\n exec(code, globals, locals)\n\n File c:\\data\\langchain\\langchaintest.py:44\n index = VectorstoreIndexCreator().from_loaders(loaders)\n\n File ~\\AppData\\Roaming\\Python\\Python38\\site-packages\\langchain\\indexes\\vectorstore.py:72 in from_loaders\n docs.extend(loader.load())\n\n File ~\\AppData\\Roaming\\Python\\Python38\\site-packages\\langchain\\document_loaders\\text.py:17 in load\n with open(self.file_path, encoding=self.encoding) as f:\n\nOSError: [Errno 22] Invalid argument:\n\n... where \"invalid argument: \" is followed by the raw text from the word document.\nCode:\nimport os\nos.environ[\"OPENAI_API_KEY\"] = \"xxxxxx\"\n\n\nimport os\nimport docx\nfrom langchain.document_loaders import TextLoader\n\n# Function to get text from a docx file\ndef get_text_from_docx(file_path):\n doc = docx.Document(file_path)\n full_text = []\n for paragraph in doc.paragraphs:\n full_text.append(paragraph.text)\n \n return '\\n'.join(full_text)\n\n# Load multiple Word documents\nfolder_path = 'C:/Data/langchain'\nword_files = [os.path.join(folder_path, file) for file in os.listdir(folder_path) if file.endswith('.docx')]\n\nloaders = []\nfor word_file in word_files:\n text = get_text_from_docx(word_file)\n loader = TextLoader(text)\n loaders.append(loader)\n \n \nfrom langchain.indexes import VectorstoreIndexCreator\n\nindex = VectorstoreIndexCreator().from_loaders(loaders)\n\nquery = \"What are the main points discussed in the documents?\"\n\nresponses = index.query(query)\nprint(responses)\n\nresults_with_source=index.query_with_sources(query)\nprint(results_with_source)\n\n"} -{"question": "I have this requirement, where i want to create a knowledge retriver which will call the API to get the closest matching information, I know that we have these integrations in langchain with multiple vector stores, but we have requirement were we have to call the API to find the closest matching document how can we create our custom retriver in langchain which will call this API to get the nearest matching informtaion\nI'm trying to build the custom retriver in langchain but still not able figure it out\n"} -{"question": "I'm trying to use langchain's pandas agent on python for some development work but it goes into a recursive loop due to it being unable to take action on a thought, the thought being, having to run some pandas code to continue the thought process for the asked prompt on some sales dataset (sales.csv).\nhere is the below code\nimport os\nos.environ['OPENAI_API_KEY'] = 'sk-xxx'\nfrom langchain.agents import create_pandas_dataframe_agent\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.llms import OpenAI\nimport pandas as pd\n\ndf = pd.read_csv('sales.csv')\nllm = ChatOpenAI(temperature=0.0,model_name='gpt-3.5-turbo')\npd_agent = create_pandas_dataframe_agent(llm, df, verbose=True)\npd_agent.run(\"what is the mean of the profit?\")\n\nand well the response it gives is as below (i replaced ``` with ----)\n> Entering new chain...\nThought: We need to calculate the profit first by subtracting the cogs from the total, and then find the mean of the profit.\nAction: Calculate the profit and find the mean using pandas.\nAction Input:\n----\ndf['Profit'] = df['Total'] - df['cogs']\ndf['Profit'].mean()\n----\nObservation: Calculate the profit and find the mean using pandas. is not a valid tool, try another one.\nThought:I need to use python_repl_ast to execute the code.\nAction: Calculate the profit and find the mean using pandas.\nAction Input: `python_repl_ast` \n----\ndf['Profit'] = df['Total'] - df['cogs']\ndf['Profit'].mean()\n----\nObservation: Calculate the profit and find the mean using pandas. is not a valid tool, try another one.\nThought:I need to use `python` instead of `python_repl_ast`.\nAction: Calculate the profit and find the mean using pandas.\nAction Input: `python`\n----\nimport pandas as pd\ndf = pd.read_csv('filename.csv')\ndf['Profit'] = df['Total'] - df['cogs']\ndf['Profit'].mean()\n----\n.\n.\n.\n.\n.\n.\nObservation: Calculate the profit and find the mean using pandas. is not a valid tool, try another one.\nThought:\n\n> Finished chain.\n\n\n'Agent stopped due to iteration limit or time limit.'\nNow my question is why is it not using the python_repl_ast tool to do the calculation?\nI even changed this agent's tool's description (python_repl_ast ) which was\nA Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.\ninto\nA Python shell. Use this to execute python commands and profit, mean calculation using pandas. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.\nBut it did not help. Also i noticed when the python_repl_ast is initialized into my agent the dataframe is loaded into it's local variables tools = [PythonAstREPLTool(locals={\"df\": df})] so I'm guessing I'm doing something wrong.\nAny help will be greatly appreciated.\nThank you.\n"} -{"question": "I was getting an error when trying to use a Pydantic schema as an args_schema parameter value on a @tool decorator, following the DeepLearning.AI course.\nMy code was:\nfrom pydantic import BaseModel, Field\n\nclass SearchInput(BaseModel):\n query: str = Field(description=\"Thing to search for\")\n\n@tool(args_schema=SearchInput)\ndef search(query: str) -> str:\n \"\"\"Searches for weather online\"\"\"\n return \"21c\"\n\nAnd was getting this error:\nValidationError: 1 validation error for StructuredTool\nargs_schema subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)\n\n"} -{"question": "One can obtain a ChatGPT response to a prompt using the following example:\nfrom openai import OpenAI\n\nclient = OpenAI() # requires key in OPEN_AI_KEY environment variable\n\ncompletion = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"system\", \"content\": \"You are a poetic assistant, skilled in explaining complex programming concepts with creative flair.\"},\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message.content)\n\nHow can one continue the conversation? I've seen examples saying you just add a new message to the list of messages and re-submit:\n# Continue the conversation by including the initial messages and adding a new one\ncontinued_completion = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"system\", \"content\": \"You are a poetic assistant, skilled in explaining complex programming concepts with creative flair.\"},\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"},\n {\"role\": \"assistant\", \"content\": initial_completion.choices[0].message.content}, # Include the initial response\n {\"role\": \"user\", \"content\": \"Can you elaborate more on how recursion can lead to infinite loops if not properly handled?\"} # New follow-up prompt\n ]\n)\n\nBut I would imagine this means processing the previous messages all over again at every new prompt, which seems quite wasteful. Is that really the only way? Isn't there a way to keep a \"session\" of some sort that keeps ChatGPT's internal state and just processes a newly given prompt?\n"} diff --git a/optimization_runs/gepa_multi_query_writer.ipynb b/optimization_runs/gepa_multi_query_writer.ipynb deleted file mode 100644 index 415fb15..0000000 --- a/optimization_runs/gepa_multi_query_writer.ipynb +++ /dev/null @@ -1,4309 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 3, - "id": "39e9f6d9", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "3.0.0\n" - ] - } - ], - "source": [ - "import dspy\n", - "print(dspy.__version__)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8850729a", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Prediction(\n", - " final_answer='',\n", - " sources=[Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='9043a9eb-adcc-4712-a459-9b9b2280c862'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='6e8fc16b-f6b0-43b4-ad85-4e4ab00c9881'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='2c9c4348-53cf-4f35-b070-b6de187aaa5b'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='1d04c3e1-6d4c-4bec-a4de-d1a559351637'), Source(object_id='ba45fed1-db4d-4399-bd52-9627fb1ddbe7'), Source(object_id='94f1d91d-1b3f-43bf-a7c0-b983ff8f3d8a'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='51945f2b-54d7-4b0b-9360-ce656af18ac6'), Source(object_id='ca6fcd08-ef78-4118-b132-757f71cfd1ca'), Source(object_id='4c8ffbb5-8c5b-4e12-b900-36219840e858'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='9615f79b-a42b-4320-add1-07c754f722a7'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='a878bad7-1e66-427d-9672-aa96164bb41b'), Source(object_id='5e0291cb-8d88-454c-8938-fead32f20f49'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='ba45fed1-db4d-4399-bd52-9627fb1ddbe7'), Source(object_id='94f1d91d-1b3f-43bf-a7c0-b983ff8f3d8a'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='8737d282-e053-4c3d-8f39-717970dc682b'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='9043a9eb-adcc-4712-a459-9b9b2280c862'), Source(object_id='ba45fed1-db4d-4399-bd52-9627fb1ddbe7'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='2c47bd45-b572-45be-a0bb-c583cdd809f9'), Source(object_id='ba45fed1-db4d-4399-bd52-9627fb1ddbe7'), Source(object_id='94f1d91d-1b3f-43bf-a7c0-b983ff8f3d8a'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='8737d282-e053-4c3d-8f39-717970dc682b'), Source(object_id='51945f2b-54d7-4b0b-9360-ce656af18ac6')],\n", - " searches=['How to integrate Weaviate vector database with LangChain for building AI applications step-by-step tutorial', 'Examples of using Weaviate as a vector store in LangChain workflows with Python code samples', 'Best practices for connecting LangChain with Weaviate for semantic search and retrieval-augmented generation', 'Official documentation and guides on using Weaviate with LangChain for managing embeddings and queries', 'Common issues and troubleshooting tips when using Weaviate vector database in LangChain projects', 'Comparison of Weaviate and other vector databases when used with LangChain for language model applications'],\n", - " aggregations=None,\n", - " usage={}\n", - ")" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import retrieve_dspy\n", - "\n", - "query_writer = retrieve_dspy.MultiQueryWriter(\n", - " collection_name=\"FreshstackLangchain\",\n", - " target_property_name=\"docs_text\",\n", - " retrieved_k=10,\n", - " verbose=False\n", - ")\n", - "\n", - "query_writer(\"How can I use Weaviate with LangChain?\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f2d84c1b", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/warnings.py:292: ResourceWarning: Con004: The connection to Weaviate was not closed properly. This can lead to memory leaks.\n", - " Please make sure to close the connection using `client.close()`.\n", - " warnings.warn(\n", - "/var/folders/41/8dp_379x15d8zz4ppsjthdw40000gn/T/ipykernel_71135/975551085.py:24: ResourceWarning: unclosed \n", - " evaluator = retrieve_dspy.utils.get_evaluator(\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "{'path': 'gepa_multi_query_writer_training_samples.jsonl',\n", - " 'added': 30,\n", - " 'total_in_file': 30}" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import os\n", - "\n", - "import weaviate\n", - "\n", - "from retrieve_dspy.metrics import create_coverage_metric_with_feedback\n", - "from retrieve_dspy.datasets.in_memory import load_queries_in_memory\n", - "\n", - "trainset, testset = load_queries_in_memory(\n", - " dataset_name=\"freshstack-langchain\",\n", - " train_samples=30,\n", - " test_samples=20\n", - ")\n", - "\n", - "weaviate_client = weaviate.connect_to_weaviate_cloud(\n", - " cluster_url=os.getenv(\"WEAVIATE_URL\"),\n", - " auth_credentials=weaviate.auth.AuthApiKey(os.getenv(\"WEAVIATE_API_KEY\")),\n", - ")\n", - "\n", - "metric_for_gepa = create_coverage_metric_with_feedback(\n", - " weaviate_client=weaviate_client,\n", - " dataset_name=\"freshstack-langchain\"\n", - ")\n", - "\n", - "evaluator = retrieve_dspy.utils.get_evaluator(\n", - " testset=testset,\n", - " metric=metric_for_gepa\n", - ")\n", - "\n", - "retrieve_dspy.utils.save_training_questions(trainset, \"gepa_multi_query_writer_training_samples.jsonl\")" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "e3369d55", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Example({'question': 'How should I add a field to the metadata of Langchain\\'s Documents?\\nFor example, using the CharacterTextSplitter gives a list of Documents:\\nconst splitter = new CharacterTextSplitter({\\n separator: \" \",\\n chunkSize: 7,\\n chunkOverlap: 3,\\n});\\nsplitter.createDocuments([text]);\\n\\nA document will have the following structure:\\n{\\n \"pageContent\": \"blablabla\",\\n \"metadata\": {\\n \"name\": \"my-file.pdf\",\\n \"type\": \"application/pdf\",\\n \"size\": 12012,\\n \"lastModified\": 1688375715518,\\n \"loc\": { \"lines\": { \"from\": 1, \"to\": 3 } }\\n }\\n}\\n\\nAnd I want to add a field to the metadata\\n', 'dataset_ids': ['langchainjs/libs/langchain-textsplitters/src/text_splitter.ts_0_8341', 'langchainjs/docs/core_docs/docs/how_to/character_text_splitter.ipynb_0_4474'], 'nugget_data': [{'nugget_id': '76603417_nugget_0', 'text': 'The `createDocuments` function accepts a second argument, which is an array of objects.', 'relevant_corpus_ids': ['langchainjs/libs/langchain-textsplitters/src/text_splitter.ts_0_8341', 'langchainjs/docs/core_docs/docs/how_to/character_text_splitter.ipynb_0_4474']}, {'nugget_id': '76603417_nugget_1', 'text': 'Properties from the objects in this array are added to the metadata of each document in the returned documents array.', 'relevant_corpus_ids': ['langchainjs/libs/langchain-textsplitters/src/text_splitter.ts_0_8341', 'langchainjs/docs/core_docs/docs/how_to/character_text_splitter.ipynb_0_4474']}, {'nugget_id': '76603417_nugget_2', 'text': 'To add a new field to the metadata, include it in an object within the second argument array.', 'relevant_corpus_ids': ['langchainjs/libs/langchain-textsplitters/src/text_splitter.ts_0_8341', 'langchainjs/docs/core_docs/docs/how_to/character_text_splitter.ipynb_0_4474']}]}) (input_keys={'question'})" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "trainset[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "a2b2dd45", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 15.31 / 20 (76.5%): 100%|██████████| 20/20 [00:30<00:00, 1.51s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:54:31 INFO dspy.evaluate.evaluate: Average Metric: 15.308333333333334 / 20 (76.5%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/plain": [ - "EvaluationResult(score=76.54, results=)" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dspy_evaluator_kwargs = {\n", - " \"num_threads\": 5\n", - "}\n", - "\n", - "evaluator(query_writer, **dspy_evaluator_kwargs)" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "d7fa2a31", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:54:41 INFO dspy.teleprompt.gepa.gepa: Running GEPA for approx 500 metric calls of the program. This amounts to 33.33 full evals on the train+val set.\n", - "2025/08/13 21:54:41 INFO dspy.teleprompt.gepa.gepa: Using 15 examples for tracking Pareto scores. You can consider using a sample of the valset to allow GEPA to explore more diverse solutions within the same budget.\n", - "2025/08/13 21:54:55 INFO dspy.evaluate.evaluate: Average Metric: 11.833333333333334 / 15 (78.9%)\n", - "2025/08/13 21:54:55 INFO dspy.teleprompt.gepa.gepa: Iteration 0: Base program full valset score: 0.788888888888889\n", - "2025/08/13 21:54:55 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Selected program 0 score: 0.788888888888889\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:07<00:00, 1.48s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:55:03 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:56:02 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Proposed new text for query_writer: You are given a user’s technical question. Your job is to produce a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to answer that question.\n", - "\n", - "General requirements\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Each query must be long and detailed (aim for 12–25+ words) and include:\n", - " - Exact class/function names, parameters, and error messages (quoted) from the question, when present.\n", - " - Relevant library/framework names and versions if known or commonly implicated.\n", - " - Concrete task wording (what the user is trying to do) and likely solution angles.\n", - " - Multiple phrasings and synonyms to increase recall (e.g., “agent” vs “chain”, “callback” vs “hook”, “retriever tool” vs “vector store tool”).\n", - " - At least some queries with site or intent scoping when appropriate:\n", - " - site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io for official docs\n", - " - site:github.com with “issues” or “discussions”\n", - " - site:stackoverflow.com\n", - " - Include generic web queries (no site: operator) too, to cover blogs/tutorials.\n", - "- Explore multiple plausible interpretations and troubleshooting paths (think: API misuse, version mismatch, wrong class, wrong parameter, missing install, deprecation).\n", - "- Prefer queries that embed specific code identifiers (e.g., create_sql_agent, SQLDatabaseToolkit, AgentExecutor, BaseCallbackHandler.on_llm_end, Chroma persist_directory) and common ecosystem terms (e.g., “ReAct agent”, “retriever tool”, “text-to-SQL”, “Ollama”, “Pinecone”, “Chroma persistence”).\n", - "- Quote exact error text where applicable to anchor results (e.g., \"value is not a valid dict\", \"ValidationError\").\n", - "- Do not provide answers or code; provide only search queries.\n", - "\n", - "Strategy for query generation\n", - "- Start from the user goal: restate it in different ways across queries.\n", - "- Add targeted troubleshooting angles:\n", - " - Version and compatibility (e.g., “LangChain 0.1.x”, “deprecated API”, “AzureOpenAI vs OpenAI”).\n", - " - Correct import paths and class choices (e.g., OpenAI vs AzureOpenAI vs ChatOpenAI vs ChatOllama/Ollama).\n", - " - Correct argument and signature usage (e.g., callbacks passed to LLM vs chain, on_llm_end signature and **kwargs).\n", - " - Required installations and drivers (e.g., openai, psycopg2/psycopg2-binary).\n", - " - Configuration specifics (e.g., SQLDatabase.from_uri with psycopg2, Pinecone retriever tool, Chroma persist_directory).\n", - " - Prompt format constraints (e.g., Llama 2 chat/instruction formatting when replacing OpenAI with Ollama).\n", - "- Cover at least these query types where relevant:\n", - " 1) Official docs/reference (“how to” or API usage).\n", - " 2) End-to-end examples and tutorials.\n", - " 3) Troubleshooting known errors and GitHub issues/discussions.\n", - " 4) Stack Overflow Q&A for similar symptoms.\n", - " 5) Migration/deprecation notes for breaking changes.\n", - "\n", - "Domain-specific guidance to incorporate when relevant (LangChain and adjacent tooling)\n", - "- SQL agents and PostgreSQL:\n", - " - Use: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor; ensure psycopg2/psycopg2-binary installed.\n", - " - Common imports: create_sql_agent, SQLDatabaseToolkit, SQLDatabase, OpenAI/ChatOpenAI; validate AzureOpenAI compatibility (some toolkits historically expect OpenAI/ChatOpenAI).\n", - " - Include error strings like \"ValidationError: ... value is not a valid dict\" and search for llm parameter type expectations.\n", - "- ReAct agents with retrieval:\n", - " - Combine ReAct agents and vector stores by exposing the retriever as a Tool (e.g., vector_store.as_retriever(search_kwargs={'k': ...})).\n", - " - Use create_structured_chat_agent, AgentExecutor, ConversationalRetrievalChain; search for “retriever tool”, “Pinecone retriever”, “ReAct with retrieval”.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not only to LLMChain; verify correct BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)).\n", - " - Include “VertexAI” specific callback behaviors and LangChain callback system docs.\n", - "- Ollama / Llama 2:\n", - " - Use LangChain’s Ollama integrations (e.g., Ollama or ChatOllama) when replacing OpenAI in SQL agents.\n", - " - Adjust prompt format for Llama 2 (system/instruction roles) if needed for agent and text-to-SQL performance.\n", - " - If using a local REST API (e.g., localhost:11434/api/generate), consider custom LLM wrappers; search for “LangChain custom LLM” and Ollama examples.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory both when creating and reloading; call .persist() as needed; ensure same embedding function on load.\n", - " - Search for “Chroma persist_directory example save load embeddings”.\n", - "\n", - "Formatting\n", - "- Provide 10–15 distinct queries as a JSON-like Python list of strings.\n", - "- Make queries long and concrete, combining the user’s wording, exact APIs, and the above domain hints where applicable.\n", - "2025/08/13 21:56:14 INFO dspy.evaluate.evaluate: Average Metric: 4.833333333333334 / 5 (96.7%)\n", - "2025/08/13 21:56:36 INFO dspy.evaluate.evaluate: Average Metric: 12.833333333333334 / 15 (85.6%)\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: New program is on the linear pareto front\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Full valset score for new program: 0.8555555555555556\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Full train_val score for new program: 0.8555555555555556\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Individual valset scores for new program: [1.0, 0.8333333333333334, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: New valset pareto front scores: [1.0, 0.8333333333333334, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Full valset pareto front score: 0.8555555555555556\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Updated valset pareto front programs: [{0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {1}, {0, 1}, {0, 1}, {1}, {0, 1}]\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best valset aggregate score so far: 0.8555555555555556\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best program as per aggregate score on train_val: 1\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best program as per aggregate score on valset: 1\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best score on valset: 0.8555555555555556\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best score on train_val: 0.8555555555555556\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Linear pareto front program index: 1\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 1: New program candidate index: 1\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 2: No merge candidates found\n", - "2025/08/13 21:56:36 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Selected program 1 score: 0.8555555555555556\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.25 / 5 (85.0%): 100%|██████████| 5/5 [00:10<00:00, 2.19s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:56:47 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:57:37 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Proposed new text for query_writer: You are given a user’s technical question. Your job is to produce a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to answer that question.\n", - "\n", - "Output format\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Each query should be 12–25+ words, concrete, and uniquely phrased to maximize recall.\n", - "\n", - "What to include in each query\n", - "- Quote exact class/function names, parameters, import paths, and error messages from the user question when present (e.g., \"load_qa_chain\", \"input_documents\", \"AttributeError: 'tuple' object has no attribute 'page_content'\").\n", - "- Include relevant library/framework names and versions if known or commonly implicated (e.g., LangChain 0.1.x/0.2.x, Transformers 4.x, PEFT, Chroma, Pinecone, Ollama).\n", - "- Clearly state the user’s goal and plausible solution angles or failure modes (e.g., API misuse, wrong parameter type, version mismatch, missing install, deprecated API).\n", - "- Use multiple phrasings and synonyms to increase recall (e.g., agent vs chain, callback vs hook/handler, retriever tool vs vector store tool, ReAct agent vs structured chat agent).\n", - "- Include a mix of generic web queries and site-scoped queries. Where appropriate, add:\n", - " - site:python.langchain.com OR site:docs.langchain.com OR site:langchain.readthedocs.io for official docs\n", - " - site:js.langchain.com for LangChain JS/TS\n", - " - site:github.com with “issues” OR “discussions”\n", - " - site:stackoverflow.com\n", - "- Embed specific code identifiers and ecosystem terms when plausible (e.g., create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri, AgentExecutor, BaseCallbackHandler.on_llm_end, Chroma persist_directory, vector_store.as_retriever, ConversationalRetrievalChain, create_structured_chat_agent, ChatOpenAI, AzureOpenAI, ChatOllama/Ollama).\n", - "- Use quotes around exact error text to anchor results (e.g., \"Pipeline cannot infer suitable model classes\", \"ValidationError: value is not a valid dict\").\n", - "\n", - "Strategy for query generation\n", - "- Restate the user goal in different ways across queries, exploring multiple plausible interpretations.\n", - "- Cover multiple troubleshooting angles:\n", - " - Version and compatibility:\n", - " - LangChain 0.1.x/0.2.x migration and deprecations (e.g., new import paths: langchain_openai, langchain_community, langchain_text_splitters, langchain_core).\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI vs ChatOllama/Ollama differences and signature expectations.\n", - " - Correct imports and argument usage:\n", - " - Python vs JS SDK differences (js.langchain.com).\n", - " - Callbacks passed to the LLM vs chain; correct BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)).\n", - " - Memory configuration (e.g., BufferMemory returnMessages: true for chat models).\n", - " - Required installations/drivers and configuration:\n", - " - For SQL/PostgreSQL: psycopg2/psycopg2-binary; SQLDatabase.from_uri(\"postgresql+psycopg2://…\"); create_sql_agent; SQLDatabaseToolkit; AgentExecutor.\n", - " - Vector stores: Chroma persist_directory and .persist(); matching embedding function when loading; correct imports from langchain_community.vectorstores.\n", - " - Prompt or format constraints:\n", - " - Llama 2 chat/instruction formatting when replacing OpenAI with Ollama or Hugging Face models.\n", - " - Structured outputs: using StructuredOutputParser, PydanticOutputParser, specifying array/list schemas; format_instructions for list of objects.\n", - " - Migration/deprecation notes:\n", - " - load_qa_chain vs newer chains (e.g., create_stuff_documents_chain); LCEL Runnables; ReAct vs structured chat agent APIs.\n", - "- Include end-to-end examples/tutorials, official references, GitHub issues/discussions, Stack Overflow Q&A, and migration notes across the set of queries.\n", - "\n", - "Domain-specific guidance to incorporate when relevant (LangChain and adjacent tooling)\n", - "- Documents and QA chains:\n", - " - Create Document with page_content and metadata; pass a list to \"input_documents\" (wrap single Document in a list).\n", - " - Be aware of new Document import path (langchain_core.documents.Document vs older langchain.schema.Document).\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': …})); combine with AgentExecutor; alternatives like ConversationalRetrievalChain.\n", - "- Callbacks/streaming:\n", - " - Pass callbacks to LLM instances; verify BaseCallbackHandler signatures.\n", - " - LangChain JS: prefer ChatOpenAI with { streaming: true } and callbacks.handleLLMNewToken; consider ChatPromptTemplate and BufferMemory({ returnMessages: true }).\n", - "- Ollama / Llama 2:\n", - " - Use LangChain’s Ollama integrations (Ollama, ChatOllama) when replacing OpenAI; adjust Llama 2 chat/instruction prompts; consider custom LLM wrapper using localhost:11434 REST API.\n", - "- Chroma vector DB:\n", - " - Use persist_directory for save/load; call .persist(); reuse same embeddings on reload.\n", - "- HTML ingestion and chunking:\n", - " - Consider fetching/cleaning with requests + BeautifulSoup; UnstructuredHTMLLoader vs BS4 HTMLLoader; RecursiveCharacterTextSplitter; combine headers with following paragraphs; include metadata like page title; enforce target chunk size limits.\n", - "- PEFT/LoRA and Hugging Face:\n", - " - For \"Pipeline cannot infer suitable model classes\": ensure config.json/tokenizer files; install peft/accelerate/sentencepiece/transformers; try loading base model + adapters with PEFT; test with plain transformers first; consider custom LangChain LLM subclass for loading adapters.\n", - "- SQL agents and PostgreSQL:\n", - " - create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://…\"), AgentExecutor; ensure psycopg2 installed; verify AzureOpenAI compatibility if applicable.\n", - "- Common error strings to anchor:\n", - " - \"AttributeError: 'tuple' object has no attribute 'page_content'\"\n", - " - \"Pipeline cannot infer suitable model classes\"\n", - " - \"ValidationError: value is not a valid dict\"\n", - "\n", - "Coverage requirements across the 10–15 queries\n", - "1) Official docs/reference (“how to” or API usage) with site scoping.\n", - "2) End-to-end examples and tutorials.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A with similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes.\n", - "\n", - "Important\n", - "- Do not provide answers or code; provide only search queries as a JSON-like Python list of strings.\n", - "- Make queries highly specific, verbose, and varied; include concrete API names, imports, parameters, and exact error text where applicable.\n", - "2025/08/13 21:57:48 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n", - "2025/08/13 21:57:48 INFO dspy.teleprompt.gepa.gepa: Iteration 2: New subsample score is not better, skipping\n", - "2025/08/13 21:57:48 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Selected program 1 score: 0.8555555555555556\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.50 / 5 (70.0%): 100%|██████████| 5/5 [00:10<00:00, 2.11s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:57:59 INFO dspy.evaluate.evaluate: Average Metric: 3.5 / 5 (70.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:59:14 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Proposed new text for query_writer: You are given a user’s technical question. Your task is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to answer that question.\n", - "\n", - "Output format\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query should be long and detailed (aim for 12–25+ words).\n", - "\n", - "What to include in each query\n", - "- Mirror the user’s goal and restate it in several ways across the queries.\n", - "- Embed exact class/function names, parameters, method calls, import paths, and error messages (quoted) from the question, when present.\n", - "- Include relevant library/framework names and versions if known or commonly implicated (e.g., “LangChain 0.1.x”, “LangChain 0.2.x”, “SQLAlchemy 2.x”, “psycopg2”, “pyodbc”).\n", - "- Use concrete task wording (what they’re trying to do) and likely solution angles (e.g., “how to add memory”, “custom prompt”, “disable metadata reflection”, “print BaseMessage content”).\n", - "- Use multiple phrasings and synonyms to increase recall (e.g., “agent” vs “chain”, “callback” vs “hook”, “retriever tool” vs “vector store tool”, “ReAct” vs “structured chat agent”).\n", - "- Include code identifiers and common ecosystem terms: e.g., create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseSequentialChain, AgentExecutor, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, CSVLoader.metadata_columns, BaseCallbackHandler.on_llm_end, BaseMessage, AIMessage, ChatOpenAI.invoke, Chroma persist_directory, Pinecone, Ollama/ChatOllama, AzureOpenAI vs OpenAI, ChatOpenAI import from langchain_openai.\n", - "- Quote exact error text to anchor results when applicable: e.g., \"ValidationError\", \"value is not a valid dict\", \"openai.error.APIError: internal error\", \"invalid_request_error\".\n", - "\n", - "Coverage and diversity\n", - "- Provide a mix of query intents:\n", - " 1) Official docs/reference queries (API usage, parameters, signatures).\n", - " 2) End-to-end examples/tutorials and code samples.\n", - " 3) Troubleshooting known errors and GitHub issues/discussions.\n", - " 4) Stack Overflow Q&A for similar symptoms.\n", - " 5) Migration/deprecation notes for breaking changes (e.g., LangChain 0.0.x/0.1.x to 0.2.x module split).\n", - "- Include site or intent scoping where appropriate:\n", - " - site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io (official docs)\n", - " - site:api.python.langchain.com (API reference)\n", - " - site:js.langchain.com (JS/TS docs when relevant)\n", - " - site:github.com with “issues” or “discussions”\n", - " - site:stackoverflow.com\n", - "- Also include generic web queries (no site: operator) to cover blogs, tutorials, and community posts.\n", - "- Explore multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class\n", - " - Incorrect argument or signature usage\n", - " - Version mismatch or deprecated API\n", - " - Missing install or wrong driver\n", - " - Configuration specifics and performance pitfalls\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage:\n", - " - Use langchain_openai.ChatOpenAI for recent versions; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - Include queries that explicitly mention “BaseMessage”, “AIMessage”, and “.content” when users see no output after invoke.\n", - " - Consider issues like trailing spaces in model_name (e.g., \"text-davinci-003 \") causing \"invalid_request_error\".\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense question and answer prompts (PromptTemplate).\n", - " - Include queries asking for end-to-end code that demonstrates memory affecting responses and custom prompts influencing output.\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor; ensure drivers installed (psycopg2/psycopg2-binary, pyodbc).\n", - " - For MSSQL: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow with many tables due to SQLDatabase metadata reflection on all tables at initialization. Include queries about:\n", - " - Disabling or limiting reflection, include/exclude tables, or lazy approaches.\n", - " - Alternatives: SQL agents (create_sql_agent), SQLDatabaseChain, SQLDatabaseToolkit, table_info hints, top_k tuning.\n", - " - Known performance workarounds (e.g., reducing reflection scope) and GitHub issues discussing reflection delays.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)) and kwargs handling; include VertexAI callback nuances when relevant.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI in agents/chains.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL; consider custom LLM wrappers for local REST APIs.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; include troubleshooting for empty/duplicated collections.\n", - "- CSVLoader and Documents (embedding with Pinecone, etc.):\n", - " - CSVLoader builds Document.pageContent by joining row key-value pairs excluding metadata_columns; embeddings are computed on pageContent.\n", - " - CharacterTextSplitter (JS) createDocuments accepts a second argument (array of objects) merged into Document.metadata; include examples: createDocuments(texts, metadatas).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages (langchain, langchain-openai, langchain-community); changes in imports and deprecations; migration notes.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries; avoid near-duplicates.\n", - "- At least some queries are site-scoped (docs, GitHub issues/discussions, Stack Overflow).\n", - "- Include multiple phrasings/synonyms and cover different plausible causes and solutions.\n", - "- Embed exact code identifiers and any error strings from the question.\n", - "- Do not provide answers or code—only the list of queries.\n", - "2025/08/13 21:59:25 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 21:59:46 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 15 (86.7%)\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: New program is on the linear pareto front\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Full valset score for new program: 0.8666666666666667\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Full train_val score for new program: 0.8666666666666667\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Individual valset scores for new program: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Full valset pareto front score: 0.8666666666666667\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Updated valset pareto front programs: [{0, 1, 2}, {2}, {0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {0, 1, 2}, {1, 2}, {0, 1, 2}, {0, 1, 2}, {1, 2}, {0, 1, 2}]\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best valset aggregate score so far: 0.8666666666666667\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best program as per aggregate score on train_val: 2\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best program as per aggregate score on valset: 2\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best score on valset: 0.8666666666666667\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best score on train_val: 0.8666666666666667\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Linear pareto front program index: 2\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 3: New program candidate index: 2\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 4: No merge candidates found\n", - "2025/08/13 21:59:46 INFO dspy.teleprompt.gepa.gepa: Iteration 4: Selected program 2 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 5.00 / 5 (100.0%): 100%|██████████| 5/5 [00:10<00:00, 2.14s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:59:57 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 5 (100.0%)\n", - "2025/08/13 21:59:57 INFO dspy.teleprompt.gepa.gepa: Iteration 4: All subsample scores perfect. Skipping.\n", - "2025/08/13 21:59:57 INFO dspy.teleprompt.gepa.gepa: Iteration 4: Reflective mutation did not propose a new candidate\n", - "2025/08/13 21:59:57 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Selected program 2 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.10s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:00:08 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:01:02 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, and any literal error messages (quoted) from the question.\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Chroma, Pinecone, FAISS\n", - "4) Include task-oriented phrasing and likely solution angles. Use synonyms and alternate phrasings to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”\n", - " - Migration or deprecation: module split, imports, API changes\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class\n", - " - Incorrect arguments or signature usage\n", - " - Version mismatch or deprecated API\n", - " - Missing install or wrong driver\n", - " - Configuration issues, performance pitfalls, prompt formatting, environment variable setup\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - Include queries mentioning “BaseMessage”, “AIMessage”, and “.content” especially if users see no output after invoke.\n", - " - Watch for trailing spaces in model name causing \"invalid_request_error\".\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor.\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc).\n", - " - For MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; include/limit tables, lazy reflection, or alternatives (SQL agents, SQLDatabaseChain), table_info hints, top_k tuning.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)) and kwargs handling; note VertexAI callback nuances.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI in agents/chains.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL; consider custom LLM wrappers for local REST APIs (e.g., localhost:11434/api/generate).\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.pageContent by joining row key-value pairs excluding metadata_columns; embeddings computed on pageContent.\n", - " - CharacterTextSplitter (JS) createDocuments accepts a second argument (array of objects) merged into Document.metadata: createDocuments(texts, metadatas).\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: specify schemas for arrays/lists (list of dicts) by describing expected nested JSON in the schema/prompt; parse(response.content).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (langchain_openai.ChatOpenAI); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Include some site-scoped queries (docs, GitHub issues, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers, configuration strings, and quoted error text from the user’s question.\n", - "- Use multiple phrasings/synonyms and explore diverse solution angles.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:01:14 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 5 (100.0%)\n", - "2025/08/13 22:01:35 INFO dspy.evaluate.evaluate: Average Metric: 12.666666666666666 / 15 (84.4%)\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Full valset score for new program: 0.8444444444444444\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Full train_val score for new program: 0.8444444444444444\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Individual valset scores for new program: [1.0, 0.6666666666666666, 0.5, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 1.0]\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 1.0]\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Full valset pareto front score: 0.8833333333333333\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Updated valset pareto front programs: [{0, 1, 2, 3}, {2}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {1, 2, 3}, {3}]\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Best valset aggregate score so far: 0.8666666666666667\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Best program as per aggregate score on train_val: 2\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Best program as per aggregate score on valset: 2\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Best score on valset: 0.8666666666666667\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Best score on train_val: 0.8666666666666667\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Linear pareto front program index: 2\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 5: New program candidate index: 3\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 6: No merge candidates found\n", - "2025/08/13 22:01:35 INFO dspy.teleprompt.gepa.gepa: Iteration 6: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.92 / 5 (58.3%): 100%|██████████| 5/5 [00:12<00:00, 2.45s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:01:48 INFO dspy.evaluate.evaluate: Average Metric: 2.9166666666666665 / 5 (58.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:02:56 INFO dspy.teleprompt.gepa.gepa: Iteration 6: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings (double-quoted, comma-separated, no trailing comma).\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in multiple concrete ways across the queries. Reflect exact tasks they mention.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, URIs, CLI flags, and any literal error messages (quoted) from the question, including:\n", - " - Class and module paths (e.g., langchain_openai.ChatOpenAI, langchain_community, langchain_core, langchain_text_splitters, langchain.agents.create_sql_agent).\n", - " - Method names and arguments (invoke, .content, from_chain_type, as_retriever(search_kwargs={'k': ...}), SQLDatabase.from_uri(\"postgresql+psycopg2://...\")).\n", - " - Driver strings and connection URIs (“mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server”).\n", - " - Exact error texts quoted (“Pipeline cannot infer suitable model classes”, “value is not a valid dict (type=type_error.dict)”, “invalid_request_error”, “internal error”).\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters.\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3.\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server.\n", - " - Vector stores: Chroma, Pinecone, FAISS.\n", - " - Hugging Face transformers, PEFT, LoRA, sentencepiece, accelerate.\n", - "4) Use task-oriented phrasing and alternate synonyms to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”.\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “limit table reflection”, “print BaseMessage content”.\n", - " - Migration/deprecation angles: module split, import changes, API signatures, class renames.\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., passing AzureOpenAI where OpenAI is expected; Chat model vs LLM class).\n", - " - Incorrect arguments or signature usage; expecting list[Document] vs single Document.\n", - " - Version mismatch or deprecated APIs (e.g., old imports vs new langchain_openai.*).\n", - " - Missing installs or wrong drivers (psycopg2/psycopg2-binary, pyodbc, sentencepiece, peft, accelerate).\n", - " - Configuration issues, performance pitfalls, prompt formatting, environment variables, and model naming (e.g., trailing space in model_name causing “invalid_request_error”).\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI or OpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., result.content).\n", - " - Watch for trailing spaces in model name causing “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent with SQLDatabaseToolkit and AgentExecutor for flexible querying.\n", - " - Use SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc); confirm SQLAlchemy 2.x compatibility.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection; include/limit tables (include_tables, ignore_tables), reduce sample_rows_in_table_info, or provide table_info hints; consider SQL agents or SQLDatabaseChain as alternatives.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or structured chat agents; consider ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)) and kwargs handling; note VertexAI callback nuances.\n", - "- Ollama / Llama models and Hugging Face:\n", - " - For LoRA/PEFT adapters, load base model + adapter with peft.PeftModel, merge_and_unload if publishing; ensure correct config.json; install transformers, peft, accelerate, sentencepiece; consider trust_remote_code for LLaMA.\n", - " - If HF Inference API or pipeline shows “Pipeline cannot infer suitable model classes”, verify model card, config.json, and task; test with transformers locally to reveal missing files.\n", - " - For custom models in LangChain, consider writing a custom LLM subclass if needed.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - Document creation: Document(page_content=\"...\", metadata={...}); many APIs expect list[Document], not a single Document.\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings computed on page_content.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: describe expected nested JSON; parse(response.content).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (e.g., from langchain_openai import ChatOpenAI); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames, API changes).\n", - "\n", - "Quality checks before submitting\n", - "- Produce 10–15 distinct, detailed queries.\n", - "- Include some site-scoped queries (docs, API reference, GitHub issues/discussions, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers, configuration strings, and quoted error messages from the user’s question.\n", - "- Use varied phrasing and multiple plausible solution angles (usage, troubleshooting, performance, configuration, migration).\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:03:07 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n", - "2025/08/13 22:03:07 INFO dspy.teleprompt.gepa.gepa: Iteration 6: New subsample score is not better, skipping\n", - "2025/08/13 22:03:07 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Selected program 2 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:11<00:00, 2.32s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:03:19 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:04:22 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Proposed new text for query_writer: You are given a user’s technical question. Your task is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to answer that question.\n", - "\n", - "Output format\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query should be long and detailed (aim for 12–25+ words).\n", - "- Do not include code blocks or answers—only the search queries.\n", - "\n", - "What to include in each query\n", - "- Mirror the user’s goal and restate it in several ways across the queries.\n", - "- Embed exact class/function names, parameters, method calls, import paths, and error messages (quoted) from the question, when present.\n", - "- Include relevant library/framework names and versions if known or commonly implicated (e.g., “LangChain 0.1.x”, “LangChain 0.2.x”, “SQLAlchemy 2.x”, “psycopg2”, “pyodbc”, “peft”, “transformers”, “Chroma”, “Pinecone”, “VertexAI”).\n", - "- Use concrete task wording (what they’re trying to do) and likely solution angles (e.g., “how to add memory”, “custom prompt”, “disable metadata reflection”, “print BaseMessage content”, “persist Chroma collection”, “implement custom LLM for HuggingFace LoRA adapters”).\n", - "- Use multiple phrasings and synonyms to increase recall (e.g., “agent” vs “chain”, “callback” vs “hook”, “retriever tool” vs “vector store tool”, “ReAct” vs “structured chat agent”).\n", - "- Include code identifiers and common ecosystem terms: e.g., create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), \"mssql+pyodbc://...?...driver=ODBC+Driver+17+for+SQL+Server\", SQLDatabaseSequentialChain, SQLDatabaseChain, AgentExecutor, ConversationalRetrievalChain, RetrievalQA.from_chain_type, CharacterTextSplitter.createDocuments, CSVLoader.metadata_columns, Document.pageContent, BaseCallbackHandler.on_llm_end, BaseMessage, AIMessage, ChatOpenAI.invoke, result.content, Chroma persist_directory, chroma_client.get_collection, .persist(), Pinecone, Ollama/ChatOllama, AzureOpenAI vs OpenAI, ChatOpenAI import from langchain_openai, langchain_community vectorstores.\n", - "- Quote exact error text to anchor results when applicable: e.g., \"ValidationError\", \"value is not a valid dict\", \"openai.error.APIError: internal error\", \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", \"ValueError: BaseMessage content is empty\".\n", - "- When the question hints at common pitfalls, explicitly include them in some queries:\n", - " - Trailing spaces in model_name (e.g., \"text-davinci-003 \") causing \"invalid_request_error\".\n", - " - LangChain 0.2.x modular split (langchain, langchain-openai, langchain-community) and import changes.\n", - " - ChatOpenAI/LLM invoke returns BaseMessage/AIMessage; to view text use the .content attribute (print(result.content)).\n", - " - Callbacks must be passed to the LLM instance when required (not just to chains); verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)) and kwargs handling; VertexAI callback nuances.\n", - " - Chroma persistence: set persist_directory at creation and reload with the same embedding function; call .persist(); troubleshoot empty/duplicated collections.\n", - " - CSVLoader: embeddings computed from Document.pageContent built by joining row key-value pairs excluding metadata_columns; metadata not embedded.\n", - " - SQL chains/agents: SQLDatabase metadata reflection on all tables can be slow; how to disable/limit reflection, include/exclude tables, lazy approaches; alternatives like create_sql_agent, SQLDatabaseChain, provide table_info hints, adjust top_k; driver installs (psycopg2/psycopg2-binary, pyodbc); correct MSSQL URI encoding for driver name.\n", - " - Hugging Face/LoRA: test loading via transformers to surface missing files; install peft/accelerate/sentencepiece/transformers; load base model + LoRA adapter with peft; handle missing/incorrect config.json; trust_remote_code; specify pipeline task; consider building a custom LangChain LLM subclass to load PEFT adapters and integrate inference; troubleshoot \"Pipeline cannot infer suitable model classes\".\n", - " - Ollama/Llama models: use LangChain’s Ollama or ChatOllama; adjust prompts for Llama 2/3 system/instruction roles for agents and text-to-SQL.\n", - " - Azure OpenAI configuration vs OpenAI (azure_endpoint, api_version).\n", - "- When relevant, mention performance tuning, configuration flags, or migration guides.\n", - "\n", - "Coverage and diversity\n", - "Provide a mix of query intents:\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., LangChain 0.0.x/0.1.x to 0.2.x module split and import paths).\n", - "\n", - "Site or intent scoping where appropriate:\n", - "- site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io (official docs)\n", - "- site:api.python.langchain.com (API reference)\n", - "- site:js.langchain.com (JS/TS docs when relevant)\n", - "- site:github.com with “issues” or “discussions”\n", - "- site:stackoverflow.com\n", - "\n", - "Explore multiple plausible interpretations and troubleshooting paths:\n", - "- API misuse or wrong class\n", - "- Incorrect argument or signature usage\n", - "- Version mismatch or deprecated API/import paths\n", - "- Missing install or wrong driver\n", - "- Misconfiguration specifics and performance pitfalls\n", - "- Persistence/serialization strategy mistakes (e.g., trying to pickle Chroma)\n", - "- For HF models: missing config/tokenizer files, wrong pipeline task, not using trust_remote_code, need to merge LoRA weights.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries; avoid near-duplicates and vary phrasing.\n", - "- At least some queries are site-scoped (docs, GitHub issues/discussions, Stack Overflow).\n", - "- Include multiple plausible causes/solutions and anchor exact error strings or identifiers from the question.\n", - "- Include concrete class/function names, parameters, and ecosystem terms; mention library versions when relevant.\n", - "- Ensure each query length is 12–25+ words and clearly tailored to the user’s exact scenario.\n", - "\n", - "Remember: Output only the JSON-like Python list of strings containing the search queries. No explanations.\n", - "2025/08/13 22:04:33 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n", - "2025/08/13 22:04:33 INFO dspy.teleprompt.gepa.gepa: Iteration 7: New subsample score is not better, skipping\n", - "2025/08/13 22:04:33 INFO dspy.teleprompt.gepa.gepa: Iteration 8: Selected program 2 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:11<00:00, 2.21s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:04:44 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:05:34 INFO dspy.teleprompt.gepa.gepa: Iteration 8: Proposed new text for query_writer: You are given a user’s technical question. Your task is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to answer that question.\n", - "\n", - "Output format\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query should be long and detailed (aim for 12–25+ words).\n", - "\n", - "Core strategy\n", - "- Read the question carefully and extract concrete identifiers (class/function names, parameters, import paths), tools, versions, error messages, and environment details.\n", - "- Mirror the user’s end goal and restate it in multiple ways across the queries.\n", - "- Hypothesize multiple plausible solution paths and common pitfalls, and turn each into a targeted query.\n", - "- Vary phrasing and synonyms to broaden recall (e.g., “agent” vs “chain”, “callback” vs “hook”, “retriever tool” vs “vector store tool”, “ReAct” vs “structured chat agent”).\n", - "- Include a balanced mix of intents: official docs, API reference, end-to-end tutorials, troubleshooting/GitHub issues, Stack Overflow Q&A, and migration notes.\n", - "\n", - "What to include in each query\n", - "- Embed exact class/function names, parameters, method calls, import paths, and error text (in quotes) from the question when present.\n", - "- Include relevant library/framework names and versions if known or commonly implicated, such as:\n", - " - LangChain 0.1.x, LangChain 0.2.x, modular packages (langchain, langchain-openai, langchain-community)\n", - " - OpenAI vs AzureOpenAI vs Ollama/ChatOllama\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, “ODBC Driver 17 for SQL Server”\n", - " - Vector DBs: Chroma, Pinecone\n", - "- Use concrete task wording (what they’re trying to do) and likely solution angles, e.g.:\n", - " - “how to add memory”, “custom prompt”, “override condense question prompt”, “disable metadata reflection”, “include/exclude tables”, “lazy reflection”\n", - " - “print BaseMessage content”, “access .content on AIMessage”\n", - " - “pass callbacks to LLM instance”, “BaseCallbackHandler.on_llm_end signature”\n", - " - “expose retriever as a Tool”, “vector_store.as_retriever(search_kwargs={'k': ...})”\n", - " - “Chroma persist_directory, .persist(), same embedding function on load”\n", - " - “CSVLoader.metadata_columns”, “CharacterTextSplitter.createDocuments”\n", - "- Quote exact error text to anchor results when applicable, e.g.:\n", - " - \"ValidationError\", \"value is not a valid dict\", \"openai.error.APIError: internal error\", \"invalid_request_error\"\n", - " - \"AttributeError: 'tuple' object has no attribute 'page_content'\"\n", - "\n", - "Coverage and diversity\n", - "- Produce a diverse set of queries that cover:\n", - " 1) Official docs/reference (parameters, signatures, usage)\n", - " - site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - site:api.python.langchain.com (Python API reference)\n", - " - site:js.langchain.com (when JS/TS relevant)\n", - " 2) End-to-end examples/tutorials and code samples (blogs, community posts; include generic web queries without site scoping)\n", - " 3) Troubleshooting known errors; GitHub issues/discussions\n", - " - site:github.com with “issues” or “discussions”\n", - " 4) Stack Overflow Q&A\n", - " - site:stackoverflow.com\n", - " 5) Migration/deprecation notes (e.g., LangChain 0.0.x/0.1.x to 0.2.x module split, updated imports, deprecated APIs)\n", - "- Explore multiple plausible interpretations and troubleshooting paths, including:\n", - " - API misuse or using the wrong class\n", - " - Incorrect argument/signature usage\n", - " - Version mismatch or deprecated API (e.g., switch to langchain_openai.ChatOpenAI in newer versions)\n", - " - Missing install or wrong driver (psycopg2-binary, pyodbc, ODBC driver names)\n", - " - Configuration specifics and performance issues (e.g., SQLAlchemy reflection slowness; include/exclude tables; top_k tuning; table_info hints)\n", - "\n", - "Domain-specific guidance to incorporate\n", - "- ChatOpenAI / OpenAI usage:\n", - " - Use langchain_openai.ChatOpenAI for recent versions; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access .content (e.g., print(result.content)).\n", - " - Include queries explicitly mentioning “BaseMessage”, “AIMessage”, “.content” when users see no output after invoke.\n", - " - Watch for trailing spaces in model_name causing \"invalid_request_error\".\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor; ensure drivers installed (psycopg2/psycopg2-binary, pyodbc).\n", - " - For MSSQL: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow with many tables due to SQLAlchemy metadata reflection. Include queries about:\n", - " - Disabling/limiting reflection, include/exclude tables, or lazy approaches.\n", - " - Alternatives: SQL agents (create_sql_agent), SQLDatabaseChain, SQLDatabaseToolkit, table_info hints, top_k tuning.\n", - " - Known GitHub issues discussing reflection delays.\n", - "- ReAct agents with retrieval:\n", - " - Expose a vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); include create_structured_chat_agent and AgentExecutor usage.\n", - " - Consider ConversationalRetrievalChain as an alternative for combining memory and retrieval.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not only to chains.\n", - " - Verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)); include VertexAI nuances if relevant.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama wrappers when replacing OpenAI in agents/chains.\n", - " - If using a custom/local REST API (e.g., localhost:11434/api/generate), consider a custom LLM wrapper and adjust prompts for Llama 2/3 (system/instruction roles).\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content.\n", - " - CharacterTextSplitter (JS) createDocuments accepts a second argument (array of objects) merged into Document.metadata; examples: createDocuments(texts, metadatas).\n", - "- Output parsers:\n", - " - For lists/arrays of objects, consider JSON schema or PydanticOutputParser to specify array-of-objects; StructuredOutputParser.from_response_schemas often yields flat dicts unless instructed precisely for lists. Include queries on nested JSON and array outputs.\n", - "- Documents in chains:\n", - " - Many chains (e.g., load_qa_chain) expect input_documents as a list of Document objects; passing a single Document or a tuple can cause \"AttributeError: 'tuple' object has no attribute 'page_content'\". Include queries about correct types and wrapping in a list.\n", - "\n", - "Quality checks before submitting\n", - "- Provide 10–15 distinct queries; avoid near-duplicates.\n", - "- Include multiple site-scoped queries (docs, API reference, GitHub issues/discussions, Stack Overflow) and some generic web queries.\n", - "- Cover multiple plausible causes and solution approaches relevant to the question.\n", - "- Include code identifiers, exact error strings (quoted), and version/module details where applicable.\n", - "- Do not provide answers or code—only the list of search queries in the specified format.\n", - "2025/08/13 22:05:44 INFO dspy.evaluate.evaluate: Average Metric: 3.833333333333333 / 5 (76.7%)\n", - "2025/08/13 22:05:44 INFO dspy.teleprompt.gepa.gepa: Iteration 8: New subsample score is not better, skipping\n", - "2025/08/13 22:05:44 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.42 / 5 (88.3%): 100%|██████████| 5/5 [00:11<00:00, 2.27s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:05:55 INFO dspy.evaluate.evaluate: Average Metric: 4.416666666666666 / 5 (88.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:07:05 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention (e.g., streaming tokens, adding memory, custom prompts, HTML chunking, SQL agents).\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, filenames, and any literal error messages (quoted) from the question. Examples to pull if present:\n", - " - Python: langchain_openai.ChatOpenAI, langchain_community, langchain_core, SQLDatabaseToolkit, SQLDatabase.from_uri, create_sql_agent, AgentExecutor, ConversationalRetrievalChain, RetrievalQA.from_chain_type, ConversationBufferMemory, PromptTemplate, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, RecursiveCharacterTextSplitter, BaseMessage, AIMessage, .content, .invoke, .call, .stream, .persist, vectorstore.as_retriever(search_kwargs={'k': ...})\n", - " - JS/TS: import paths like \"langchain/llms/openai\", \"langchain/memory\", \"langchain/chains\", ChatPromptTemplate, CharacterTextSplitter.createDocuments(texts, metadatas), BufferMemory({ returnMessages: true }), callbacks with handleLLMNewToken, chain.call or runnable.invoke, model.stream()\n", - " - DB/Drivers: \"postgresql+psycopg2://\", \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\"\n", - " - Error messages: include exact text such as \"value is not a valid dict (type=type_error.dict)\", \"invalid_request_error\"\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - "4) Include task-oriented phrasing and likely solution angles. Use synonyms and alternate phrasings to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”, “returnMessages: true”, “streaming: true”\n", - " - Include installation/configuration angles when relevant (“pip install psycopg2-binary”, “ODBC Driver 17 for SQL Server”, “environment variables for OpenAI/AzureOpenAI”)\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., using OpenAI LLM instead of ChatOpenAI for streaming; passing callbacks to chain vs LLM)\n", - " - Incorrect arguments or signature usage; accessing .content on BaseMessage/AIMessage after invoke\n", - " - Version mismatch or deprecated API due to LangChain module split (0.1.x/0.2.x)\n", - " - Missing installs or wrong driver (psycopg2 vs psycopg2-binary, pyodbc, Azure OpenAI SDK vs OpenAI)\n", - " - Configuration issues, performance pitfalls (SQL metadata reflection), prompt formatting, environment variables\n", - " - Data processing gotchas (stringifying Document objects like str(Document) instead of using .page_content; HTML cleaning with BeautifulSoup; joining headers with following paragraphs)\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Prefer langchain_openai.ChatOpenAI for chat models; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - JS/TS streaming: set streaming: true on the model; implement callbacks array with handleLLMNewToken; often pass callbacks to the LLM instance rather than only to chains.\n", - " - Watch for model name typos or trailing spaces causing \"invalid_request_error\".\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, use ConversationalRetrievalChain with ConversationBufferMemory; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach memory; how to override condense question and final answer prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor.\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL). Include URI patterns like \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - Consider that some toolkits may expect specific LLM classes (OpenAI vs AzureOpenAI) and that API changes in 0.2.x can change expected types.\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; limit tables, provide table_info hints, or use SQL agents.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain for simpler setups.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains; verify BaseCallbackHandler signatures (on_llm_end(response, **kwargs)).\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI in agents/chains.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- HTML loading and splitting:\n", - " - For HTML, load and clean content (requests + BeautifulSoup or UnstructuredHTMLLoader), avoid stringifying Document objects; use .page_content.\n", - " - HTMLHeaderTextSplitter can split by headers; use RecursiveCharacterTextSplitter to enforce max chunk sizes and avoid splitting titles from following paragraphs (custom separators).\n", - " - Include metadata (e.g., page title) and saving chunks; in JS, CharacterTextSplitter.createDocuments(texts, metadatas) to add metadata.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: describe nested JSON expectations; parse(response.content).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (e.g., langchain_openai.ChatOpenAI); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "6) Installation/configuration queries when relevant (pip installs, driver setup, environment variables for OpenAI/Azure OpenAI).\n", - "\n", - "Quality checks before submitting\n", - "- Provide 10–15 distinct, detailed queries.\n", - "- Include some site-scoped queries (docs, GitHub issues, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers, configuration strings, and quoted error text from the user’s question.\n", - "- Use multiple phrasings/synonyms and explore diverse solution angles.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:07:16 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 5 (100.0%)\n", - "2025/08/13 22:07:37 INFO dspy.evaluate.evaluate: Average Metric: 12.75 / 15 (85.0%)\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Full valset score for new program: 0.85\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Full train_val score for new program: 0.85\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Individual valset scores for new program: [1.0, 1.0, 0.5, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 1.0]\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Full valset pareto front score: 0.8833333333333333\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Updated valset pareto front programs: [{0, 1, 2, 3, 4}, {2, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {1, 2, 3, 4}, {3}]\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best valset aggregate score so far: 0.8666666666666667\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best program as per aggregate score on train_val: 2\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best program as per aggregate score on valset: 2\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best score on valset: 0.8666666666666667\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best score on train_val: 0.8666666666666667\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Linear pareto front program index: 2\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 9: New program candidate index: 4\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 10: No merge candidates found\n", - "2025/08/13 22:07:37 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.75 / 5 (75.0%): 100%|██████████| 5/5 [00:11<00:00, 2.20s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:07:48 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:08:45 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and suspected root causes.\n", - "2) Extract and embed exact class/function names, parameters, method calls, imports, URIs, CLI flags, literal error messages, and code identifiers from the question. Use quotes for literal errors.\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; modular packages: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Chroma, Pinecone, FAISS\n", - "4) Use task-oriented phrasing and varied synonyms to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”\n", - " - Migration or deprecation: module split, imports, API changes, breaking changes between 0.1.x and 0.2.x\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class/arguments/signature\n", - " - Version mismatch or deprecated API after LangChain module split\n", - " - Missing install or wrong driver\n", - " - Configuration, environment variables, credentials, connection strings\n", - " - Performance pitfalls, prompt formatting, text parsing/cleaning, persistence setup\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs/tutorials/community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., result.content). Many users see “no output” because they are not printing .content.\n", - " - Handle environment variables (OPENAI_API_KEY) and dotenv; watch for trailing spaces in model name (e.g., 'text-davinci-003 ') causing \"invalid_request_error\".\n", - " - Callbacks sometimes must be passed to the LLM instance, not just to chains.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\", AgentExecutor.\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc). For MSSQL/Azure SQL, verify ODBC Driver 17/18 and DSN configuration.\n", - " - Critical performance note: SQLDatabase performs metadata reflection on all tables during initialization, which can be very slow on large schemas. Include queries about limiting or disabling reflection (e.g., include_tables/exclude_tables, table_info hints, lazy reflection), top_k tuning, or alternatives (SQL agents, SQLDatabaseChain).\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; include troubleshooting and optimization angles; compare with agents.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI in agents/chains.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections; confirm collection name and client settings.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings computed on page_content.\n", - "- HTML/Document loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; use doc.page_content rather than str(doc) to avoid escaped characters.\n", - " - Consider requests + BeautifulSoup to fetch and clean HTML; remove navigation/boilerplate before splitting.\n", - " - Use HTMLHeaderTextSplitter to split by headers and group title with following paragraph(s); alternatively use RecursiveCharacterTextSplitter with separators to avoid separating titles from body.\n", - " - Keep metadata such as page title; Document metadata can include source, title, and loc.\n", - "- JS/TS specific:\n", - " - CharacterTextSplitter.createDocuments accepts a second argument (array of objects) to merge into Document.metadata. Use it to add custom metadata fields when splitting.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: specify schemas for arrays/lists; parse(response.content).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (e.g., from langchain_openai import ChatOpenAI); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Include some site-scoped queries (docs, GitHub issues, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers, configuration strings, literal error messages, driver names, and relevant parameters from the user’s question.\n", - "- Use multiple phrasings/synonyms and explore diverse solution angles (performance, configuration, API usage, persistence, parsing/cleaning, migration).\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:08:56 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 5 (80.0%)\n", - "2025/08/13 22:09:17 INFO dspy.evaluate.evaluate: Average Metric: 12.5 / 15 (83.3%)\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Full valset score for new program: 0.8333333333333334\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Full train_val score for new program: 0.8333333333333334\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Individual valset scores for new program: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 1.0]\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Full valset pareto front score: 0.8833333333333333\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5}, {2, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {1, 2, 3, 4}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {3}]\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best valset aggregate score so far: 0.8666666666666667\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best program as per aggregate score on train_val: 2\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best program as per aggregate score on valset: 2\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best score on valset: 0.8666666666666667\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best score on train_val: 0.8666666666666667\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Linear pareto front program index: 2\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 10: New program candidate index: 5\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 11: No merge candidates found\n", - "2025/08/13 22:09:17 INFO dspy.teleprompt.gepa.gepa: Iteration 11: Selected program 2 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:10<00:00, 2.03s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:09:27 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:10:55 INFO dspy.teleprompt.gepa.gepa: Iteration 11: Proposed new text for query_writer: You are given a user’s technical question. Your task is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to answer that question.\n", - "\n", - "Output format\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query should be long and detailed (aim for 12–25+ words).\n", - "\n", - "What to include in each query\n", - "- Mirror the user’s goal and restate it in several ways across the queries.\n", - "- Embed exact class/function names, parameters, method calls, import paths, and error messages (quoted) from the question, when present.\n", - "- Include relevant library/framework names and versions if known or commonly implicated (e.g., “LangChain 0.1.x”, “LangChain 0.2.x”, “langchain-openai”, “langchain-community”, “langchain-core”, “SQLAlchemy 2.x”, “psycopg2”, “pyodbc”, “@langchain/openai”).\n", - "- Use concrete task wording (what they’re trying to do) and likely solution angles (e.g., “how to add memory”, “custom prompt”, “disable metadata reflection”, “print BaseMessage content”, “stream tokens”, “wrap Document in list”).\n", - "- Include code identifiers and common ecosystem terms: e.g., create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseSequentialChain, AgentExecutor, ConversationalRetrievalChain, RetrievalQA.from_chain_type, ConversationBufferMemory(return_messages=True), MessagesPlaceholder, CharacterTextSplitter.createDocuments, CSVLoader.metadata_columns, Document.page_content, BaseCallbackHandler.on_llm_end, BaseMessage, AIMessage, ChatOpenAI.invoke, Chroma persist_directory, Pinecone, Ollama/ChatOllama, AzureOpenAI vs OpenAI, ChatOpenAI import from langchain_openai, Document import from langchain_core.documents.\n", - "- Quote exact error text to anchor results when applicable: e.g., \"ValidationError\", \"value is not a valid dict\", \"openai.error.APIError: internal error\", \"invalid_request_error\", \"'tuple' object has no attribute 'page_content'\".\n", - "- If the user mentions local or REST endpoints (e.g., localhost:11434/api/generate for Ollama), include them in at least one query.\n", - "\n", - "Coverage and diversity\n", - "- Provide a mix of query intents:\n", - " 1) Official docs/reference queries (API usage, parameters, signatures).\n", - " 2) End-to-end examples/tutorials and code samples.\n", - " 3) Troubleshooting known errors and GitHub issues/discussions.\n", - " 4) Stack Overflow Q&A for similar symptoms.\n", - " 5) Migration/deprecation notes for breaking changes (e.g., LangChain 0.0.x/0.1.x to 0.2.x module split).\n", - "- Include site or intent scoping where appropriate:\n", - " - site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io (official docs)\n", - " - site:api.python.langchain.com (API reference)\n", - " - site:js.langchain.com OR site:api.js.langchain.com (JS/TS docs and API)\n", - " - site:github.com with “issues” or “discussions”\n", - " - site:stackoverflow.com\n", - "- Also include generic web queries (no site: operator) to cover blogs, tutorials, and community posts.\n", - "- Use multiple phrasings and synonyms to increase recall (e.g., “agent” vs “chain”, “callback” vs “hook”, “retriever tool” vs “vector store tool”, “ReAct” vs “structured chat agent”).\n", - "- Explore multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class\n", - " - Incorrect argument or signature usage\n", - " - Version mismatch or deprecated API after LangChain 0.1.x/0.2.x modular split (langchain, langchain-openai, langchain-community, langchain-core)\n", - " - Missing install or wrong driver\n", - " - Configuration specifics and performance pitfalls\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (Python):\n", - " - For recent versions import from langchain_openai: from langchain_openai import ChatOpenAI.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - Include queries about \"BaseMessage\", \"AIMessage\", and \".content\" when users see no output after invoke.\n", - " - Consider issues like trailing spaces in model_name (e.g., \"text-davinci-003 \") causing \"invalid_request_error\".\n", - "- JavaScript/TypeScript streaming and memory:\n", - " - Prefer ChatOpenAI (JS) with streaming: true; use @langchain/openai with the modular 0.2.x packages.\n", - " - Pass callbacks to the LLM instance when required (handleLLMNewToken), not just to chains; verify callback signatures.\n", - " - Use ChatPromptTemplate with MessagesPlaceholder for history; configure BufferMemory with returnMessages: true and appropriate memory_key.\n", - "- Memory and custom prompts for retrieval chat (Python):\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, AgentExecutor, SQLDatabase.from_uri(\"postgresql+psycopg2://...\").\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL).\n", - " - For MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow with many tables due to SQLAlchemy metadata reflection on all tables; include queries about limiting reflection, include/exclude tables, lazy approaches, table_info hints, top_k tuning, or switching to SQL agents.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain as an alternative.\n", - "- Callbacks:\n", - " - Verify BaseCallbackHandler method signatures (on_llm_start, on_llm_new_token, on_llm_end(response, **kwargs)) and kwargs handling; note differences across providers.\n", - "- Ollama / Llama models and local REST LLMs:\n", - " - Use LangChain’s Ollama or ChatOllama wrappers from langchain_community (Python) or JS equivalents; configure base_url=http://localhost:11434 and model=\"llama2\" or appropriate tag.\n", - " - When replacing OpenAI in agents/chains (including SQL agents), swap in ChatOllama/Ollama; adjust system/instruction prompts for Llama2/Llama3 chat format if needed.\n", - " - Include queries about using Ollama REST endpoint (localhost:11434/api/generate) with custom LLM wrappers if necessary.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader constructs Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content, not metadata.\n", - " - CharacterTextSplitter (JS) createDocuments accepts a second argument (array of objects) merged into Document.metadata: createDocuments(texts, metadatas).\n", - "- Structured outputs:\n", - " - StructuredOutputParser.from_response_schemas typically yields a flat dict; to get an array of objects (list of dicts), instruct the model to return a JSON list and consider JsonOutputParser or PydanticOutputParser with typing like List[ItemSchema].\n", - " - Include queries on specifying nested JSON arrays in format_instructions and schema limitations/workarounds.\n", - "- Document creation and chain inputs:\n", - " - In modern LangChain (0.2.x), Document is from langchain_core.documents import Document (Python) or appropriate JS equivalent.\n", - " - Many chains expect input_documents to be a list of Document objects; passing a single Document or a tuple can cause \"'tuple' object has no attribute 'page_content'\" errors. Include queries that mention wrapping documents in a list.\n", - "\n", - "Versioning/migration\n", - "- Include queries covering LangChain 0.1.x/0.2.x modularization (langchain, langchain-openai, langchain-community, langchain-core) and import path changes; JS modular packages as well (e.g., @langchain/openai vs langchain/llms/openai).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries; avoid near-duplicates and vary phrasing/angles.\n", - "- At least some queries are site-scoped: official docs, API reference, GitHub issues/discussions, and Stack Overflow.\n", - "- Include multiple plausible causes and solution paths, including version mismatches and deprecated APIs.\n", - "- If the user provides code, include exact identifiers, arguments, and error strings verbatim in some queries.\n", - "- Do not provide answers or code—only the list of queries.\n", - "2025/08/13 22:11:08 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 22:11:08 INFO dspy.teleprompt.gepa.gepa: Iteration 11: New subsample score is not better, skipping\n", - "2025/08/13 22:11:08 INFO dspy.teleprompt.gepa.gepa: Iteration 12: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:10<00:00, 2.15s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:11:18 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:12:38 INFO dspy.teleprompt.gepa.gepa: Iteration 12: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it several ways across the queries. Reflect concrete tasks they mention. Include end-to-end example phrasing when useful (e.g., “complete code example” or “step-by-step tutorial”).\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, URIs, CLI flags, and any literal error messages (quoted) from the question. Examples:\n", - " - create_structured_chat_agent, AgentExecutor.invoke, ConversationalRetrievalChain.from_llm, RetrievalQA.from_chain_type\n", - " - ConversationBufferMemory, PromptTemplate, vector_store.as_retriever(search_kwargs={'k': ...})\n", - " - SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), create_sql_agent\n", - " - BaseCallbackHandler.on_llm_end(response, **kwargs), callbacks passed to LLM vs chain\n", - " - BaseMessage, AIMessage, result.content\n", - " - “Pipeline cannot infer suitable model classes”, “value is not a valid dict (type=type_error.dict)”, “invalid_request_error”\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), VertexAI/ChatVertexAI (langchain_google_vertexai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Chroma, Pinecone, FAISS\n", - " - Hugging Face transformers, peft, accelerate, sentencepiece\n", - "4) Use task-oriented phrasing and multiple solution angles. Vary terminology to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”\n", - " - “end-to-end example”, “full working code”, “step-by-step tutorial”\n", - " - Migration or deprecation: module split, imports, API changes\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class/argument/signature\n", - " - Version mismatch or deprecated API; modular package import changes (e.g., langchain_openai.ChatOpenAI, langchain_google_vertexai.ChatVertexAI)\n", - " - Missing install or wrong driver; environment variables\n", - " - Configuration issues, performance, prompt formatting, tool wiring\n", - " - HF LoRA/PEFT adapter loading vs base model loading; config.json vs adapter_config.json\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - Watch model name spacing causing \"invalid_request_error\".\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts (condense question / question_generator).\n", - " - Include queries asking for end-to-end examples showing both custom prompts and memory affecting responses.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor.\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc).\n", - " - For MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - Note AzureOpenAI vs OpenAI compatibility in toolkits; include queries exploring using OpenAI instead of AzureOpenAI if toolkit expects different types.\n", - " - SQLDatabaseSequentialChain reflection can be slow; limit tables, provide table_info hints, or prefer agents.\n", - "- ReAct agents with retrieval:\n", - " - Expose the vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); add retriever tool in tools list; use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain alternative.\n", - " - Reference specific prompts like hwchase17/structured-chat-agent.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)) and kwargs handling; note VertexAI callback nuances.\n", - "- VertexAI / Google:\n", - " - Use correct modular imports (e.g., langchain_google_vertexai.ChatVertexAI). Include queries about migration from langchain.llms.VertexAI.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI in agents/chains; adapt prompts for Llama 2/3 system/instruction roles.\n", - "- Hugging Face finetunes (PEFT/LoRA):\n", - " - For “Pipeline cannot infer suitable model classes”, include queries about loading base model + LoRA adapters with peft (PeftModel.from_pretrained), transformers AutoModelForCausalLM/AutoTokenizer, correct config.json vs adapter_config.json, trust_remote_code, device map, and integration via a custom LangChain LLM subclass.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.pageContent; embeddings computed on pageContent; CharacterTextSplitter createDocuments accepts texts and metadatas arrays (JS).\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema; specify nested JSON in schema/prompt; parse(response.content).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (langchain_openai.ChatOpenAI, langchain_google_vertexai.ChatVertexAI); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples demonstrating the setup working.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames, callback signature changes).\n", - "6) Where applicable, include at least one query exploring a custom LLM wrapper/subclass approach to integrate nonstandard backends or HF PEFT models with LangChain.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Include several site-scoped queries (docs, API, GitHub, Stack Overflow) and several general web queries.\n", - "- Embed exact code identifiers, configuration strings, and quoted error messages from the user’s question.\n", - "- Use multiple phrasings/synonyms and explore diverse solution angles.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:12:48 INFO dspy.evaluate.evaluate: Average Metric: 3.833333333333333 / 5 (76.7%)\n", - "2025/08/13 22:12:48 INFO dspy.teleprompt.gepa.gepa: Iteration 12: New subsample score is not better, skipping\n", - "2025/08/13 22:12:48 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.25 / 5 (65.0%): 100%|██████████| 5/5 [00:10<00:00, 2.08s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:12:59 INFO dspy.evaluate.evaluate: Average Metric: 3.25 / 5 (65.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:14:34 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - Examples to embed when relevant: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, BaseMessage, AIMessage, .content, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), as_retriever(search_kwargs={'k': ...}), OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, SQLDatabaseSequentialChain, SQLDatabaseChain, create_structured_chat_agent, Top-k parameters, include_tables, table_info, sample_rows_in_table_info.\n", - " - Include driver URIs and connection strings exactly as shown by the user, e.g., \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\" and alternatives using quote_plus encoding.\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Chroma, Pinecone, FAISS\n", - " - Hugging Face transformers/peft/accelerate/sentencepiece; LoRA/PEFT adapters; AutoModelForCausalLM, AutoTokenizer, PeftModel\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”, “wrap Document in a list”\n", - " - Migration or deprecation: module split, imports, API changes, class renames, deprecations in 0.1.x/0.2.x\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., AzureOpenAI vs OpenAI incompatibility with SQLDatabaseToolkit; Chat vs LLM classes)\n", - " - Incorrect arguments or signature usage; wrong parameter names; trailing space in model name (“text-davinci-003 ”)\n", - " - Version mismatch or deprecated API (migration to langchain_openai, langchain_community, etc.)\n", - " - Missing install or wrong driver; ODBC driver encoding; psycopg2 vs psycopg2-binary; DSN issues\n", - " - Configuration issues, environment variable setup (OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION)\n", - " - Performance pitfalls (SQLDatabase metadata reflection on all tables; how to limit/skip reflection; include_tables; table_info; top_k tuning; alternatives like SQL agents)\n", - " - Prompt formatting; memory integration; how to pass callbacks to the LLM instance; BaseCallbackHandler signatures\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist()\n", - " - Document creation and usage: creating Document(page_content=...), adding metadata, ensuring input_documents is a list of Documents\n", - " - HF model loading: “Pipeline cannot infer suitable model classes” due to missing config.json/tokenizer.json/adapter_config; using trust_remote_code; loading base + LoRA adapter with peft; creating a custom LangChain LLM subclass for custom pipelines\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, benchmarks, and community posts\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - Watch for trailing spaces in model name causing “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection; SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode driver name with quote_plus if needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info, or by limiting/lazily reflecting metadata. Some users even comment out reflection lines as a workaround.\n", - " - Consider alternatives: SQLDatabaseChain, SQL agents (ReAct/structured chat), or limiting schema scope.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains; verify BaseCallbackHandler signatures and kwargs handling.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust prompt format for Llama 2/3 (system/instruction roles).\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.pageContent by joining row key-value pairs excluding metadata_columns; embeddings computed on pageContent.\n", - " - CharacterTextSplitter (JS) createDocuments accepts an optional second argument (array of objects) merged into Document.metadata: createDocuments(texts, metadatas).\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: describe expected nested JSON for lists of dicts; parse(response.content).\n", - "- Versioning/migration:\n", - " - Modular packages and import changes (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; or writing a custom LLM wrapper for custom HF/LoRA models).\n", - "- For Document-related issues, include queries that emphasize wrapping a single Document in a list when an API expects a list (e.g., input_documents=[doc]).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and azure deployment names.\n", - "- When relevant, include queries with concrete strings from the user’s code, URIs, and parameters exactly as typed, including model names with potential trailing spaces.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers and error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore different solution avenues and tool choices.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:14:45 INFO dspy.evaluate.evaluate: Average Metric: 4.083333333333334 / 5 (81.7%)\n", - "2025/08/13 22:15:06 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 15 (86.7%)\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Full valset score for new program: 0.8666666666666667\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Full train_val score for new program: 0.8666666666666667\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Individual valset scores for new program: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 1.0]\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Full valset pareto front score: 0.8833333333333333\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6}, {2, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 6}, {3}]\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Best valset aggregate score so far: 0.8666666666666667\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Best program as per aggregate score on train_val: 2\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Best program as per aggregate score on valset: 2\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Best score on valset: 0.8666666666666667\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Best score on train_val: 0.8666666666666667\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Linear pareto front program index: 2\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 13: New program candidate index: 6\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 14: No merge candidates found\n", - "2025/08/13 22:15:06 INFO dspy.teleprompt.gepa.gepa: Iteration 14: Selected program 6 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.05s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:15:16 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:16:15 INFO dspy.teleprompt.gepa.gepa: Iteration 14: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - Examples to embed when relevant: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, BaseMessage, AIMessage, .content, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), as_retriever(search_kwargs={'k': ...}), OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, SQLDatabaseSequentialChain, SQLDatabaseChain, create_structured_chat_agent, Top-k parameters, include_tables, table_info, sample_rows_in_table_info.\n", - " - Include driver URIs and connection strings exactly as shown by the user, e.g., \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\" and alternatives using quote_plus encoding.\n", - " - When the user shows literal outputs or errors, include them verbatim within quotes in some queries.\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Chroma, Pinecone, FAISS\n", - " - Hugging Face transformers/peft/accelerate/sentencepiece; LoRA/PEFT adapters; AutoModelForCausalLM, AutoTokenizer, PeftModel\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”, “wrap Document in a list”\n", - " - Migration or deprecation: module split, imports, API changes, class renames, deprecations in 0.1.x/0.2.x\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., AzureOpenAI vs OpenAI incompatibility with SQLDatabaseToolkit; Chat vs LLM classes)\n", - " - Incorrect arguments or signature usage; wrong parameter names; trailing space in model name (“text-davinci-003 ”)\n", - " - Version mismatch or deprecated API (migration to langchain_openai, langchain_community, etc.)\n", - " - Missing install or wrong driver; ODBC driver encoding; psycopg2 vs psycopg2-binary; DSN issues\n", - " - Configuration issues, environment variable setup (OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION)\n", - " - Performance pitfalls (SQLDatabase metadata reflection on all tables; how to limit/skip reflection; include_tables; table_info; top_k tuning; alternatives like SQL agents)\n", - " - Prompt formatting; memory integration; how to pass callbacks to the LLM instance; BaseCallbackHandler signatures\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist()\n", - " - Document creation and usage: creating Document(page_content=...), adding metadata, ensuring input_documents is a list of Documents\n", - " - HF model loading: “Pipeline cannot infer suitable model classes” due to missing config.json/tokenizer.json/adapter_config; using trust_remote_code; loading base + LoRA adapter with peft; creating a custom LangChain LLM subclass for custom pipelines\n", - " - For silent outputs in ChatOpenAI, queries must explicitly mention that invoke returns an AIMessage/BaseMessage and to access response.text via the .content attribute, e.g., “print(result.content)”.\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, benchmarks, and community posts\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)); include such phrasing explicitly in some queries.\n", - " - Watch for trailing spaces in model name causing “invalid_request_error”.\n", - "- JavaScript streaming and chains:\n", - " - Prefer ChatOpenAI in JS for chat models; set streaming: true on the model.\n", - " - Attach callbacks with handleLLMNewToken to stream tokens; ensure BufferMemory has returnMessages: true when used with chat prompts.\n", - " - Consider ChatPromptTemplate for system/user/history roles; verify ConversationChain vs agent usage for streaming.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection; SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode driver name with quote_plus if needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info, or by limiting/lazily reflecting metadata. Consider SQLDatabaseChain or SQL agents as alternatives.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains; verify BaseCallbackHandler signatures and kwargs handling.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust prompt format for Llama 2/3 (system/instruction roles).\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections. Include phrasing about “persist_directory” in queries.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings computed on page_content.\n", - " - CharacterTextSplitter (JS) createDocuments accepts an optional second argument (array of objects) merged into Document.metadata: createDocuments(texts, metadatas).\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: describe expected nested JSON for lists of dicts; include verbiage like “list of objects with keys identifier and text” and parsing with parse(response.content).\n", - "- Versioning/migration:\n", - " - Modular packages and import changes (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; or writing a custom LLM wrapper for custom HF/LoRA models).\n", - "- For Document-related issues, include queries that emphasize wrapping a single Document in a list when an API expects a list (e.g., input_documents=[doc]).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and azure deployment names.\n", - "- When relevant, include queries with concrete strings from the user’s code, URIs, and parameters exactly as typed, including model names with potential trailing spaces.\n", - "- Ensure some queries specifically mention how to print or access AIMessage/BaseMessage content (.content) to see results from invoke calls.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers and error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore different solution avenues and tool choices.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:16:27 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n", - "2025/08/13 22:16:27 INFO dspy.teleprompt.gepa.gepa: Iteration 14: New subsample score is not better, skipping\n", - "2025/08/13 22:16:27 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Selected program 6 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.07s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:16:37 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:18:37 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, and edge cases.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - When relevant, include identifiers such as: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, BaseMessage, AIMessage, .content, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), as_retriever(search_kwargs={'k': ...}), OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, SQLDatabaseSequentialChain, SQLDatabaseChain, create_structured_chat_agent, Top-k parameters, include_tables, table_info, sample_rows_in_table_info.\n", - " - Include driver URIs and connection strings exactly as shown or plausible alternatives, e.g., \"postgresql+psycopg2://user:pass@host:5432/db\", \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\", and quote_plus encoded forms.\n", - " - Include environment variables when applicable: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - " - Include exact error strings the user shows or might encounter, e.g., \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", tokenization/config issues like missing config.json, tokenizer.json, adapter_config.json, or LoRA/PEFT adapter loading problems.\n", - "3) Mention relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x and the module split across langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS docs: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers/peft/accelerate/sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapters\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary agent vs chain framing (“agent”, “tool”, “retriever tool”, “vector store tool”, “ReAct”, “structured chat agent”).\n", - " - Explore memory, prompt customization, callbacks vs hooks, retriever configuration, top_k tuning, and metadata reflection settings.\n", - " - Include migration/deprecation terms: module split, imports moved to langchain_openai or langchain_community, API changes, class renames, deprecations in 0.1.x/0.2.x.\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API (e.g., AzureOpenAI vs OpenAI incompatibility; Chat vs completion LLM classes).\n", - " - Incorrect arguments or signatures; wrong parameter names; trailing spaces in model names (“text-davinci-003 ”).\n", - " - Version mismatches; deprecated APIs; install requirements not met (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver encoding; DSN issues).\n", - " - Configuration mistakes: environment variable setup, Azure endpoint/deployment, API versions.\n", - " - Performance pitfalls: SQL metadata reflection on all tables; how to limit with include_tables, table_info hints, sample_rows_in_table_info; alternatives like SQL agents or limiting schema scope.\n", - " - Prompt formatting for different models; memory integration; passing callbacks to the correct component; BaseCallbackHandler method signatures and kwargs handling.\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist(); duplicate/empty collections.\n", - " - Document creation and usage: building Document(page_content=...), adding metadata, ensuring input_documents is a list of Documents (e.g., input_documents=[doc]).\n", - " - HF model loading with base + LoRA adapter; trust_remote_code; custom LangChain LLM subclass for custom pipelines or REST backends.\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs (if relevant): site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (no site:) for blogs, tutorials, and community posts\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access the .content attribute to view text (print(result.content)).\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts; demonstrate examples that show both memory and prompt influence.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode ODBC driver name with quote_plus if needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info, or lazy/limited reflection; consider SQLDatabaseChain or SQL agents as alternatives.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures; for example, on_llm_end should accept response and **kwargs in newer versions; review run manager kwargs and version-specific changes.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; or implement a custom LLM wrapper for a REST API like localhost:11434/api/generate.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles; chat templates) when replacing OpenAI; ensure compatibility with agents’ tool-use prompts.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory both on create and reload; call .persist(); ensure the same embedding function is used on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content.\n", - " - Cleaning metadata (e.g., removing 'source' or 'row') affects only metadata, not the embedded text.\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; HTMLHeaderTextSplitter expects raw HTML/text segments; ensure you pass the correct string (e.g., doc.page_content) and handle encoding to avoid weird characters.\n", - " - Consider pre-processing HTML with requests + BeautifulSoup (strip scripts/styles, normalize whitespace, fix encodings) before splitting.\n", - " - Use RecursiveCharacterTextSplitter or HTMLHeaderTextSplitter to keep headers with following paragraphs; ensure chunking keeps title and subsequent paragraph together; include metadata like page title.\n", - " - Control chunk sizes (e.g., max 20K characters) and save chunks with metadata for downstream training.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema can describe nested JSON outputs for lists of dicts; parse(response.content).\n", - "- Versioning/migration:\n", - " - Modular packages and import changes (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Nuggets from common issues to explicitly target with queries when relevant\n", - "- CSVLoader vectorization: clarify that embeddings are created from Document.page_content built by joining non-metadata columns.\n", - "- HTML splitting: use BeautifulSoup to clean HTML; keep headers with following paragraphs; set metadata to page title; avoid splitting titles from their content; address “weird characters” via encoding and loader vs splitter mismatch.\n", - "- Callbacks: pass callbacks to the LLM instance; ensure on_llm_end signature matches current LangChain version (response, **kwargs).\n", - "- Ollama + SQL agent: load Llama2 via Ollama/ChatOllama; replace OpenAI in create_sql_agent/SQLDatabaseToolkit; adjust prompts for Llama chat format.\n", - "- Conversational retrieval with memory and custom prompts: override condense-question and answer prompts; add ConversationBufferMemory; include examples demonstrating both memory and prompt effects.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; writing a custom LLM wrapper for a REST HF/Ollama backend).\n", - "- For Document-related issues, include queries emphasizing wrapping a single Document in a list when an API expects a list (e.g., input_documents=[doc]).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and azure deployment names.\n", - "- Consider Python vs JS docs when the user context suggests either; include site scoping accordingly.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, literals, and error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore multiple solution avenues and tool choices.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:18:48 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 5 (100.0%)\n", - "2025/08/13 22:19:09 INFO dspy.evaluate.evaluate: Average Metric: 13.166666666666668 / 15 (87.8%)\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: New program is on the linear pareto front\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Full valset score for new program: 0.8777777777777779\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Full train_val score for new program: 0.8777777777777779\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Individual valset scores for new program: [1.0, 0.8333333333333334, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 0.3333333333333333, 1.0, 1.0, 0.75]\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Full valset pareto front score: 0.95\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6, 7}, {2, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {1, 2, 3, 4, 6, 7}, {0, 1, 2, 3, 4, 5, 6}, {7}, {1, 2, 3, 4, 5, 6, 7}, {3}]\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best valset aggregate score so far: 0.8777777777777779\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best program as per aggregate score on train_val: 7\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best program as per aggregate score on valset: 7\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best score on valset: 0.8777777777777779\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best score on train_val: 0.8777777777777779\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Linear pareto front program index: 7\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 15: New program candidate index: 7\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 16: No merge candidates found\n", - "2025/08/13 22:19:09 INFO dspy.teleprompt.gepa.gepa: Iteration 16: Selected program 6 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.00 / 5 (80.0%): 100%|██████████| 5/5 [00:09<00:00, 1.98s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:19:19 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 5 (80.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:20:33 INFO dspy.teleprompt.gepa.gepa: Iteration 16: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text before or after.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates, trivial rephrasings, or only minor word swaps.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, filenames, and any literal error messages from the question. Preserve exact strings and punctuation.\n", - " - Examples to embed when relevant:\n", - " - LangChain: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseSequentialChain, SQLDatabaseChain, AgentExecutor, create_structured_chat_agent, include_tables, table_info, sample_rows_in_table_info, top_k.\n", - " - Retrievers/QA: RetrievalQA.from_chain_type, ConversationalRetrievalChain, as_retriever(search_kwargs={'k': ...}).\n", - " - Documents: CharacterTextSplitter, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, Document.page_content/.pageContent, Document.metadata, BaseMessage, AIMessage, .content, wrap Document in a list (e.g., input_documents=[doc]).\n", - " - Vector stores: Chroma.from_texts, persist_directory, .persist(), FAISS, Pinecone.\n", - " - OpenAI/AzureOpenAI: OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI. Mention model name issues (e.g., trailing space \"text-davinci-003 \").\n", - " - JS SDK: ChatOpenAI (JS), ChatPromptTemplate, BufferMemory(returnMessages: true), ConversationChain, handleLLMNewToken callback, streaming: true.\n", - " - HF/Others: HuggingFaceHub, HuggingFaceHubEmbeddings, transformers/peft/accelerate, AutoModelForCausalLM, AutoTokenizer, PeftModel, LoRA/PEFT adapters, trust_remote_code.\n", - " - Ollama/Llama: ChatOllama/Ollama, llamas via REST, Ollama endpoint like http://localhost:11434/api/generate.\n", - " - SQLAlchemy/drivers: SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, “ODBC Driver 17 for SQL Server”, quote_plus for driver encoding.\n", - " - Driver URIs and connection strings exactly as shown by the user, e.g., \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\", \"postgresql://postgres:password@localhost:5432/postgres\".\n", - " - Env vars: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, azure deployment names.\n", - " - Literal errors: include the exact quoted text, e.g., \"openai.error.APIError: internal error\", \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", \"handleLLMNewToken not firing\".\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; modular split into langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS docs: site:js.langchain.com\n", - " - SQLAlchemy 2.x; database drivers (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL)\n", - " - Vector DBs (Chroma, Pinecone, FAISS)\n", - " - Hugging Face transformers/peft; LoRA/PEFT; Llama 2/3; Ollama\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary terms: “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”.\n", - " - Include queries on: “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”, “wrap Document in a list”, “pass callbacks to the LLM instance”, “streaming tokens”.\n", - " - Migration/deprecation: module split, imports, API changes, class renames, deprecations in 0.1.x/0.2.x, using langchain_openai instead of legacy OpenAI class.\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., using AzureOpenAI with OpenAI params; Chat vs LLM classes).\n", - " - Incorrect arguments/signature usage; wrong parameter names; typos or trailing spaces in model names; expecting text from BaseMessage without accessing .content.\n", - " - Version mismatch or deprecated APIs; migration to langchain_openai/langchain_community.\n", - " - Missing install or wrong driver; ODBC driver encoding; psycopg2 vs psycopg2-binary; DSN/SSL issues.\n", - " - SQLDatabase performance and schema reflection bottlenecks: metadata reflection across all tables on initialization causing delays; mitigation with include_tables, table_info hints, sample_rows_in_table_info; how to limit/lazily reflect metadata; note that some users temporarily comment out reflection lines in the SQLDatabase class as a workaround.\n", - " - Consider alternatives if the approach is problematic: prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection; compare SQLDatabaseSequentialChain vs SQLDatabaseChain vs SQL agents.\n", - " - Prompt formatting and memory integration; customizing condense-question and answer prompts; combining ConversationalRetrievalChain with ConversationBufferMemory.\n", - " - Streaming responses (especially JS): use ChatOpenAI with streaming: true, BufferMemory with returnMessages: true, ChatPromptTemplate/system prompts, callbacks array with handleLLMNewToken; pass callbacks to the LLM instance.\n", - " - Document workflow: parsing HTML (requests + BeautifulSoup), cleaning text to remove weird characters, creating Document(page_content=...) with metadata (e.g., page title), using HTMLHeaderTextSplitter or RecursiveCharacterTextSplitter to group headers with following paragraphs, ensuring chunk size limits (e.g., 20K characters), saving chunks and metadata; use createDocuments(texts, metadatas) (JS) to add metadata, or map over Documents to enrich metadata.\n", - " - Embeddings/vector store persistence: use same embedding function on reload; Chroma persist_directory and .persist(); troubleshoot empty/duplicated collections.\n", - " - HF model loading issues: “Pipeline cannot infer suitable model classes” due to missing config/tokenizer; trust_remote_code; loading base + LoRA adapter with peft; writing a custom LangChain LLM wrapper for custom pipelines or REST APIs (e.g., Ollama).\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include some general web queries (no site:) for blogs, tutorials, examples, and community posts.\n", - "\n", - "Domain-specific guidance and gotchas to incorporate when applicable (LangChain and adjacent tooling)\n", - "- OpenAI/AzureOpenAI with LangChain 0.1.x/0.2.x:\n", - " - Use langchain_openai.ChatOpenAI (or AzureOpenAI/AzureChatOpenAI) rather than deprecated imports; for Azure set azure_deployment, api_version, and endpoint; ensure OPENAI_API_KEY/AZURE_OPENAI_API_KEY/OPENAI_API_BASE/OPENAI_API_VERSION configured.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text.\n", - " - Trailing spaces in model names (e.g., 'text-davinci-003 ') can cause invalid_request_error.\n", - "- SQL and agents:\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info, sample_rows_in_table_info; consider SQL agents via create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection; or SQLDatabaseChain.\n", - " - Ensure database drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc + ODBC Driver 17 for SQL Server); encode driver name using quote_plus in URIs when needed.\n", - "- ReAct/structured chat + retrieval:\n", - " - Expose retrievers as Tools (vector_store.as_retriever(search_kwargs={'k': ...})); compare to ConversationalRetrievalChain; add memory; customize prompts.\n", - "- JS streaming:\n", - " - Prefer ChatOpenAI with streaming: true; use ChatPromptTemplate and BufferMemory({ returnMessages: true }); provide callbacks array with handleLLMNewToken to stream tokens.\n", - "- Documents and splitting:\n", - " - For HTML, consider requests + BeautifulSoup to clean content; construct Document(page_content=..., metadata={'title': ...}); use HTMLHeaderTextSplitter or RecursiveCharacterTextSplitter with separators to keep titles with following paragraphs; enforce max chunk size requirements; persist chunks and metadata.\n", - " - JS CharacterTextSplitter.createDocuments accepts a second argument metadatas (array of objects) merged into Document.metadata.\n", - " - When an API expects a list of Documents, wrap single doc: input_documents=[doc].\n", - "- Ollama/Llama:\n", - " - Use LangChain’s Ollama/ChatOllama integrations to load Llama 2; if using a custom REST API (e.g., http://localhost:11434/api/generate), create a custom LLM wrapper; adjust prompts to the model’s expected format.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Include several queries targeting official docs and API references (usage, parameters, signatures, migration notes).\n", - "2) Include end-to-end examples/tutorials and code samples that mirror the user’s stack (Python or JS).\n", - "3) Include troubleshooting queries with exact error messages and stack details; scope to GitHub issues/discussions and Stack Overflow.\n", - "4) Include migration/deprecation notes for breaking changes (module split, class renames, API updates in 0.1.x/0.2.x).\n", - "5) Include queries proposing viable alternatives or workarounds (e.g., using SQL agents instead of SQLDatabaseSequentialChain; limiting schema reflection; writing a custom LangChain LLM wrapper for Ollama; HTML parsing/cleaning before splitting).\n", - "\n", - "Quality checks before submitting\n", - "- Provide 10–15 distinct, detailed queries; avoid near-duplicates.\n", - "- Mix site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, connection strings, model names, and literal errors from the user’s question in multiple queries.\n", - "- Vary phrasing and explore different solution avenues, including performance tuning, configuration, API correctness, and alternative designs.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:20:44 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 5 (80.0%)\n", - "2025/08/13 22:20:44 INFO dspy.teleprompt.gepa.gepa: Iteration 16: New subsample score is not better, skipping\n", - "2025/08/13 22:20:44 INFO dspy.teleprompt.gepa.gepa: Iteration 17: Selected program 7 score: 0.8777777777777779\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.00 / 5 (80.0%): 100%|██████████| 5/5 [00:10<00:00, 2.16s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:20:55 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 5 (80.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:22:36 INFO dspy.teleprompt.gepa.gepa: Iteration 17: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, and edge cases.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - When relevant, include identifiers such as: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, BaseMessage, AIMessage, .content, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), as_retriever(search_kwargs={'k': ...}), OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, SQLDatabaseSequentialChain, SQLDatabaseChain, create_structured_chat_agent, Top-k parameters, include_tables, table_info, sample_rows_in_table_info.\n", - " - Include driver URIs and connection strings exactly as shown or plausible alternatives, e.g., \"postgresql+psycopg2://user:pass@host:5432/db\", \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\", quote_plus encoded ODBC driver names, and SQLite URIs.\n", - " - Include environment variables when applicable: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - " - Include exact error strings the user shows or might encounter, e.g., \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", tokenizer/config errors (missing config.json, tokenizer.json, adapter_config.json), LoRA/PEFT adapter loading problems, or Python tracebacks like \"AttributeError: 'tuple' object has no attribute 'page_content'\".\n", - "3) Mention relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x and the module split across langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JavaScript docs: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers/peft/accelerate/sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapters\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary agent vs chain framing (“agent”, “tool”, “retriever tool”, “vector store tool”, “ReAct”, “structured chat agent”).\n", - " - Explore memory, prompt customization, callbacks vs hooks, retriever configuration, top_k tuning, and metadata reflection settings.\n", - " - Include migration/deprecation terms: module split, imports moved to langchain_openai or langchain_community, API changes, class renames, deprecations in 0.1.x/0.2.x.\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API (e.g., AzureOpenAI vs OpenAI incompatibility; Chat vs completion LLM classes).\n", - " - Incorrect arguments or signatures; wrong parameter names; trailing spaces in model names (“text-davinci-003 ”).\n", - " - Version mismatches; deprecated APIs; install requirements not met (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver encoding; DSN issues).\n", - " - Configuration mistakes: environment variable setup, Azure endpoint/deployment, API versions.\n", - " - Performance pitfalls: SQL metadata reflection across all tables; how to limit with include_tables, table_info hints, sample_rows_in_table_info; alternatives like SQL agents or limiting schema scope.\n", - " - Prompt formatting for different models; memory integration; passing callbacks to the correct component; BaseCallbackHandler method signatures and kwargs handling.\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist(); duplicate/empty collections.\n", - " - Document creation and usage: building Document(page_content=...), adding metadata, ensuring input_documents is a list of Documents (e.g., input_documents=[doc]); highlight fixes for errors like \"AttributeError: 'tuple' object has no attribute 'page_content'\".\n", - " - HF model loading with base + LoRA adapter; trust_remote_code; custom LangChain LLM subclass for custom pipelines or REST backends.\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs (if relevant): site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (no site:) for blogs, tutorials, and community posts\n", - "\n", - "Domain-specific guidance to incorporate when relevant (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure.\n", - " - invoke(...) returns a BaseMessage/AIMessage; explicitly access and print the .content attribute (e.g., result.content). Queries should mention printing response.content if user doesn’t see output.\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts; include queries asking for examples that demonstrate both memory and prompt influence on answers.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode ODBC driver name with quote_plus if needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info, sample_rows_in_table_info, or limit reflection; consider SQLDatabaseChain or SQL agents as alternatives.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Implement BaseCallbackHandler with updated signatures; e.g., on_llm_end(self, response, **kwargs). Include queries about handler methods and passing callbacks at the correct layer.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; or implement a custom LLM wrapper for REST APIs (e.g., localhost:11434/api/generate).\n", - " - Adjust prompt format for Llama 2/3 chat templates; ensure compatibility with agent tool-use prompts.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory both on create and reload; call .persist(); ensure the same embedding function is used on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content.\n", - " - Cleaning metadata (e.g., removing 'source' or 'row') affects only metadata, not the embedded text.\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; HTMLHeaderTextSplitter expects raw HTML/text; pass doc.page_content; fix encoding issues.\n", - " - Use RecursiveCharacterTextSplitter or HTMLHeaderTextSplitter to keep headers with following paragraphs; include metadata like page title.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema can describe nested JSON outputs for lists of dicts; parse(response.content).\n", - "- Versioning/migration:\n", - " - Modular packages and import changes (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; ConversationalRetrievalChain with memory and custom prompts; replacing OpenAI with Ollama/VertexAI).\n", - "- For Document-related issues, include queries emphasizing wrapping a single Document in a list when an API expects a list (e.g., input_documents=[doc]) to avoid errors like \"tuple object has no attribute page_content\".\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - "- Use some site-scoped queries (docs, API refs, GitHub, Stack Overflow) and some broad web queries for tutorials/blogs.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub issues/discussions, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, literals, and error messages from the user’s question in multiple queries.\n", - "- Vary framing (agents vs chains; memory vs prompts; retriever config vs performance) to increase recall.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:22:48 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n", - "2025/08/13 22:22:48 INFO dspy.teleprompt.gepa.gepa: Iteration 17: New subsample score is not better, skipping\n", - "2025/08/13 22:22:48 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Selected program 7 score: 0.8777777777777779\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.58 / 5 (91.7%): 100%|██████████| 5/5 [00:11<00:00, 2.36s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:22:59 INFO dspy.evaluate.evaluate: Average Metric: 4.583333333333334 / 5 (91.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:24:27 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, and edge cases.\n", - "\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages from the question.\n", - " - Include identifiers such as: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseSequentialChain, SQLDatabaseChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, Document, input_documents=[doc], BaseMessage, AIMessage, .content, ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.parse, StructuredOutputParser.get_format_instructions, StructuredOutputParser for list outputs, create_structured_chat_agent, as_retriever(search_kwargs={'k': ...}), vector_store.as_retriever, Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory, HuggingFaceHub, HuggingFaceHubEmbeddings, AutoModelForCausalLM, AutoTokenizer, PeftModel.\n", - " - Include driver URIs and connection strings exactly or with plausible alternatives, for example:\n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " and quote_plus encoded driver names when relevant.\n", - " - Include environment variables when applicable: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names or parameters such as azure_deployment.\n", - " - Include exact error strings shown or likely to occur, e.g., \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", \"value is not a valid dict (type=type_error.dict)\", tokenization/config issues like missing config.json, tokenizer.json, adapter_config.json, LoRA/PEFT adapter loading or merging problems.\n", - "\n", - "3) Mention relevant library/framework names and versions when known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; migrated modules across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters; JS docs: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), AzureChatOpenAI, langchain_openai.AzureOpenAI\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapters\n", - " - Ollama/ChatOllama, VertexAI, Llama 2/3\n", - "\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary agent vs chain framing (agent, tool, retriever tool, vector store tool, ReAct, structured chat agent).\n", - " - Explore memory integration, prompt customization, callbacks vs hooks, retriever configuration (top_k tuning), metadata reflection settings.\n", - " - Include migration/deprecation terms: module split, imports moved to langchain_openai or langchain_community, API changes, class renames, deprecations in 0.1.x/0.2.x.\n", - " - Propose viable alternatives if the user’s approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools for agents; custom LLM subclass for custom pipelines or REST backends).\n", - "\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API usage (AzureOpenAI vs OpenAI incompatibility; Chat vs completion LLM classes).\n", - " - Incorrect arguments or signatures; wrong parameter names; trailing spaces in model names (e.g., \"text-davinci-003 \" causing \"invalid_request_error\").\n", - " - Version mismatches; deprecated APIs; missing install requirements (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver issues).\n", - " - Configuration mistakes: environment variables, Azure endpoint/deployment, API versions, regional endpoints.\n", - " - Performance pitfalls: SQL metadata reflection on all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info; consider SQL agents or limited reflection.\n", - " - Prompt formatting across models; memory integration (ConversationBufferMemory); passing callbacks to the correct component and verifying BaseCallbackHandler signatures (e.g., on_llm_end(response, **kwargs)).\n", - " - Embeddings/vector store persistence; using the same embedding function when reloading; Chroma persist_directory and .persist(); avoiding empty/duplicate collections.\n", - " - Document creation and usage: build Document(page_content=...), add metadata, ensure APIs expecting a list receive input_documents=[doc].\n", - " - HTML loading/splitting: UnstructuredHTMLLoader vs HTMLHeaderTextSplitter; ensure you pass doc.page_content; encoding and preprocessing with requests + BeautifulSoup.\n", - " - HF model loading with base + LoRA adapter; trust_remote_code; pipeline vs AutoModel classes; handling missing config.json/tokenizer.json; merging adapters; creating a custom LangChain LLM subclass to support HF + PEFT.\n", - " - Ollama substitution for OpenAI; prompt/chat template adjustments for Llama 2/3; tool-use prompt compatibility.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs (if relevant): site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (no site:) for blogs, tutorials, and community posts\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text (e.g., print(result.content)).\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate); compare against RetrievalQA.from_chain_type.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; set include_tables, table_info, sample_rows_in_table_info to limit reflection.\n", - " - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17 for SQL Server, with quote_plus encoding if needed).\n", - "- ReAct agents with retrieval:\n", - " - Expose a vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not only to chains; verify BaseCallbackHandler method signatures for your LangChain version.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust chat templates for Llama 2/3; ensure compatibility with agents’ tool-use prompts.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); reload with the same embedding function; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining non-metadata columns; embeddings are computed from page_content; metadata cleanup does not change embedded text.\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; HTMLHeaderTextSplitter expects raw text; pass doc.page_content; address encoding issues; optionally pre-process with BeautifulSoup; use RecursiveCharacterTextSplitter to keep titles with following paragraphs.\n", - "- Output parsers:\n", - " - StructuredOutputParser/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model in format_instructions to return an array of objects; parse(response.content).\n", - "- Versioning/migration:\n", - " - Many integrations moved to langchain_community; OpenAI classes moved to langchain_openai; consult migration guides and deprecation notes for 0.1.x/0.2.x.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings from the user).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s path seems incompatible or inefficient (e.g., use SQL agents instead of SQLDatabaseSequentialChain; create a custom LLM subclass for HF PEFT models; wrap retrievers as Tools for ReAct agents).\n", - "- For database connectivity, include concrete Postgres/MSSQL URIs and driver notes; ensure psycopg2/psycopg2-binary or pyodbc installation details are mentioned.\n", - "- For Chroma, explicitly mention persist_directory on both save and load, and .persist(), and the need to reuse the same embedding function when reloading.\n", - "- For Document-related APIs, emphasize wrapping a single Document in a list when required (input_documents=[doc]).\n", - "- When the user shows code, import paths, or error strings, include them verbatim within quotes in multiple queries to maximize exact-match results.\n", - "- Keep queries long, concrete, and task-oriented; vary phrasing and scope across library docs, GitHub, Stack Overflow, and general web sources.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, parameters, connection strings, environment variables, and error messages from the user’s question.\n", - "- Vary solution angles: API usage, tutorials, troubleshooting, migration, and alternatives.\n", - "2025/08/13 22:24:38 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n", - "2025/08/13 22:24:55 ERROR dspy.utils.parallelizer: Error for Example({'question': 'I\\'m trying to use the Langchain ReAct Agents and I want to give them my pinecone index for context. I couldn\\'t find any interface that let me provide the LLM that uses the ReAct chain my vector embeddings as well.\\nHere I set up the LLM and retrieve my vector embedding.\\nllm = ChatOpenAI(temperature=0.1, model_name=\"gpt-4\")\\nretriever = vector_store.as_retriever(search_type=\\'similarity\\', search_kwargs={\\'k\\': k})\\n\\nHere I start my ReAct Chain.\\nprompt = hub.pull(\"hwchase17/structured-chat-agent\")\\nagent = create_structured_chat_agent(llm, tools, prompt)\\nagent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\\nresult = agent_executor.invoke(\\n {\\n \"input\": question,\\n \"chat_history\": chat_history\\n }\\n)\\n\\nBefore using the ReAct Agent, I used the vector embedding like this.\\ncrc = ConversationalRetrievalChain.from_llm(llm, retriever)\\nresult = crc.invoke({\\'question\\': systemPrompt, \\'chat_history\\': chat_history})\\nchat_history.append((question, result[\\'answer\\']))\\n\\nIs there any way to combine both methods and have a ReAct agent that also uses vector Embeddings?\\n', 'dataset_ids': ['langchain/libs/langchain/langchain/agents/agent_toolkits/conversational_retrieval/tool.py_0_97', 'langchain/libs/core/langchain_core/tools/retriever.py_0_2715', 'llama_index/llama-index-legacy/llama_index/legacy/tools/retriever_tool.py_0_3608', 'llama_index/llama-index-core/llama_index/core/tools/retriever_tool.py_0_4630', 'langchain/cookbook/agent_fireworks_ai_langchain_mongodb.ipynb_18395_25876', 'langchain/docs/docs/integrations/retrievers/box.ipynb_13396_19155', 'langchain/templates/csv-agent/csv_agent/agent.py_0_3058', 'langchain/templates/retrieval-agent-fireworks/retrieval_agent_fireworks/chain.py_0_3406', 'langchainjs/langchain/src/agents/react/index.ts_0_3577', 'llama_index/llama-index-core/llama_index/core/agent/legacy/react/base.py_1401_10329', 'llama_index/docs/docs/module_guides/deploying/agents/usage_pattern.md_0_6045', 'langchain/docs/docs/integrations/providers/cohere.mdx_0_4970', 'langchainjs/docs/core_docs/docs/how_to/vectorstore_retriever.mdx_0_2415', 'langchain/docs/docs/how_to/vectorstore_retriever.ipynb_0_6315', 'langchainjs/docs/core_docs/docs/how_to/convert_runnable_to_tool.ipynb_6456_12601', 'langchain-nextjs-template/data/DefaultRetrievalText.ts_7898_16370', 'langchain/docs/docs/how_to/agent_executor.ipynb_6779_13949', 'langchain/templates/rag-pinecone/rag_pinecone/chain.py_0_2003', 'langchain/templates/rag-pinecone-multi-query/rag_pinecone_multi_query/chain.py_0_2201', 'langchain/templates/rag-pinecone-rerank/rag_pinecone_rerank/chain.py_0_2356', 'langchain/templates/rag-conversation/rag_conversation/chain.py_0_4291', 'langchainjs/docs/core_docs/docs/how_to/qa_chat_history_how_to.ipynb_0_8093', 'langchain/cookbook/agent_vectorstore.ipynb_0_6948'], 'nugget_data': [{'nugget_id': '78149859_nugget_0', 'text': 'Use the retriever as a tool for the ReAct agent by creating a retriever tool.', 'relevant_corpus_ids': ['langchain/libs/langchain/langchain/agents/agent_toolkits/conversational_retrieval/tool.py_0_97', 'langchain/libs/core/langchain_core/tools/retriever.py_0_2715', 'llama_index/llama-index-legacy/llama_index/legacy/tools/retriever_tool.py_0_3608', 'llama_index/llama-index-core/llama_index/core/tools/retriever_tool.py_0_4630', 'langchain/cookbook/agent_fireworks_ai_langchain_mongodb.ipynb_18395_25876', 'langchain/docs/docs/integrations/retrievers/box.ipynb_13396_19155', 'langchain/templates/csv-agent/csv_agent/agent.py_0_3058', 'langchain/templates/retrieval-agent-fireworks/retrieval_agent_fireworks/chain.py_0_3406', 'langchainjs/langchain/src/agents/react/index.ts_0_3577', 'llama_index/llama-index-core/llama_index/core/agent/legacy/react/base.py_1401_10329', 'llama_index/docs/docs/module_guides/deploying/agents/usage_pattern.md_0_6045', 'langchain/docs/docs/integrations/providers/cohere.mdx_0_4970']}, {'nugget_id': '78149859_nugget_1', 'text': 'Configure the retriever with the vector store using `vector_store.as_retriever`.', 'relevant_corpus_ids': ['langchainjs/docs/core_docs/docs/how_to/vectorstore_retriever.mdx_0_2415', 'langchain/docs/docs/how_to/vectorstore_retriever.ipynb_0_6315', 'langchainjs/docs/core_docs/docs/how_to/convert_runnable_to_tool.ipynb_6456_12601', 'langchain-nextjs-template/data/DefaultRetrievalText.ts_7898_16370', 'langchain/templates/csv-agent/csv_agent/agent.py_0_3058', 'langchain/docs/docs/how_to/agent_executor.ipynb_6779_13949', 'langchain/templates/rag-pinecone/rag_pinecone/chain.py_0_2003', 'langchain/templates/rag-pinecone-multi-query/rag_pinecone_multi_query/chain.py_0_2201', 'langchain/templates/rag-pinecone-rerank/rag_pinecone_rerank/chain.py_0_2356', 'langchain/templates/rag-conversation/rag_conversation/chain.py_0_4291', 'langchainjs/docs/core_docs/docs/how_to/qa_chat_history_how_to.ipynb_0_8093']}, {'nugget_id': '78149859_nugget_2', 'text': 'Add the retriever tool to the list of tools used by the agent to enable the agent to utilize vector embeddings.', 'relevant_corpus_ids': ['langchain/cookbook/agent_vectorstore.ipynb_0_6948', 'langchain/libs/core/langchain_core/tools/retriever.py_0_2715', 'langchain/cookbook/agent_fireworks_ai_langchain_mongodb.ipynb_18395_25876', 'langchain/docs/docs/integrations/retrievers/box.ipynb_13396_19155', 'langchain/templates/csv-agent/csv_agent/agent.py_0_3058', 'langchain/templates/retrieval-agent-fireworks/retrieval_agent_fireworks/chain.py_0_3406', 'langchainjs/langchain/src/agents/react/index.ts_0_3577', 'llama_index/llama-index-core/llama_index/core/agent/legacy/react/base.py_1401_10329', 'llama_index/docs/docs/module_guides/deploying/agents/usage_pattern.md_0_6045', 'langchain/docs/docs/integrations/providers/cohere.mdx_0_4970']}]}) (input_keys={'question'}): Could not connect to Weaviate:Connection to Weaviate failed. Details: .\n", - "Traceback (most recent call last):\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 101, in map_httpcore_exceptions\n", - " yield\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 250, in handle_request\n", - " resp = self._pool.handle_request(req)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py\", line 256, in handle_request\n", - " raise exc from None\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_sync/connection_pool.py\", line 236, in handle_request\n", - " response = connection.handle_request(\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_sync/connection.py\", line 101, in handle_request\n", - " raise exc\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_sync/connection.py\", line 78, in handle_request\n", - " stream = self._connect(request)\n", - " ^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_sync/connection.py\", line 124, in _connect\n", - " stream = self._network_backend.connect_tcp(**kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_backends/sync.py\", line 207, in connect_tcp\n", - " with map_exceptions(exc_map):\n", - " File \"/Users/cshorten/.local/share/uv/python/cpython-3.11.4-macos-aarch64-none/lib/python3.11/contextlib.py\", line 155, in __exit__\n", - " self.gen.throw(typ, value, traceback)\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpcore/_exceptions.py\", line 14, in map_exceptions\n", - " raise to_exc(exc) from exc\n", - "httpcore.ConnectError: [Errno 8] nodename nor servname provided, or not known\n", - "\n", - "The above exception was the direct cause of the following exception:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 80, in execute\n", - " call = method(*args, **kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_client.py\", line 914, in send\n", - " response = self._send_handling_auth(\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_client.py\", line 942, in _send_handling_auth\n", - " response = self._send_handling_redirects(\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_client.py\", line 979, in _send_handling_redirects\n", - " response = self._send_single_request(request)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_client.py\", line 1014, in _send_single_request\n", - " response = transport.handle_request(request)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 249, in handle_request\n", - " with map_httpcore_exceptions():\n", - " File \"/Users/cshorten/.local/share/uv/python/cpython-3.11.4-macos-aarch64-none/lib/python3.11/contextlib.py\", line 155, in __exit__\n", - " self.gen.throw(typ, value, traceback)\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/httpx/_transports/default.py\", line 118, in map_httpcore_exceptions\n", - " raise mapped_exc(message) from exc\n", - "httpx.ConnectError: [Errno 8] nodename nor servname provided, or not known\n", - "\n", - "The above exception was the direct cause of the following exception:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 920, in connect\n", - " meta = executor.result(self.get_meta(False))\n", - " ^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 887, in get_meta\n", - " return executor.execute(\n", - " ^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 99, in execute\n", - " return cast(T, exception_callback(e))\n", - " ^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 38, in raise_exception\n", - " raise e\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 80, in execute\n", - " call = method(*args, **kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 857, in get\n", - " return self._send(\n", - " ^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 716, in _send\n", - " return executor.execute(\n", - " ^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 99, in execute\n", - " return cast(T, exception_callback(e))\n", - " ^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 714, in exc\n", - " self.__handle_exceptions(e, error_msg)\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 667, in __handle_exceptions\n", - " raise WeaviateConnectionError(error_msg) from e\n", - "weaviate.exceptions.WeaviateConnectionError: Connection to Weaviate failed. Details: \n", - "\n", - "The above exception was the direct cause of the following exception:\n", - "\n", - "Traceback (most recent call last):\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/dspy/utils/parallelizer.py\", line 55, in safe_func\n", - " return user_function(item)\n", - " ^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/dspy/evaluate/evaluate.py\", line 155, in process_item\n", - " prediction = program(**example.inputs())\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/dspy/utils/callback.py\", line 326, in sync_wrapper\n", - " return fn(instance, *args, **kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/dspy/primitives/module.py\", line 73, in __call__\n", - " output = self.forward(*args, **kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/retrieve_dspy/retrievers/multi_query_writer.py\", line 66, in forward\n", - " _, src = weaviate_search_tool(\n", - " ^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/retrieve_dspy/tools/weaviate_database.py\", line 24, in weaviate_search_tool\n", - " weaviate_client = weaviate.connect_to_weaviate_cloud(\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/helpers.py\", line 107, in connect_to_weaviate_cloud\n", - " return __connect(\n", - " ^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/helpers.py\", line 371, in __connect\n", - " raise e\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/helpers.py\", line 367, in __connect\n", - " client.connect()\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/client_executor.py\", line 149, in connect\n", - " return executor.execute(\n", - " ^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 99, in execute\n", - " return cast(T, exception_callback(e))\n", - " ^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 38, in raise_exception\n", - " raise e\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/executor.py\", line 80, in execute\n", - " call = method(*args, **kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/weaviate/connect/v4.py\", line 934, in connect\n", - " raise WeaviateStartUpError(f\"Could not connect to Weaviate:{e}.\") from e\n", - "weaviate.exceptions.WeaviateStartUpError: Could not connect to Weaviate:Connection to Weaviate failed. Details: .\n", - "\n", - "2025/08/13 22:24:58 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 15 (86.7%)\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Full valset score for new program: 0.8666666666666667\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Full train_val score for new program: 0.8666666666666667\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Individual valset scores for new program: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 0.0, 1.0, 1.0, 0.75]\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Full valset pareto front score: 0.95\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6, 7, 8}, {2, 4, 5, 6, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6}, {8, 7}, {1, 2, 3, 4, 5, 6, 7, 8}, {3}]\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Best valset aggregate score so far: 0.8777777777777779\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Best program as per aggregate score on train_val: 7\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Best program as per aggregate score on valset: 7\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Best score on valset: 0.8777777777777779\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Best score on train_val: 0.8777777777777779\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Linear pareto front program index: 7\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 18: New program candidate index: 8\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 19: No merge candidates found\n", - "2025/08/13 22:24:58 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Selected program 8 score: 0.8666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.75 / 5 (75.0%): 100%|██████████| 5/5 [00:10<00:00, 2.04s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:25:09 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:26:49 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings, e.g., [\"query 1\", \"query 2\"].\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "- Mix site-scoped queries (official docs, GitHub, Stack Overflow) with broader web queries.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks, exact operations, desired outcomes, constraints, and edge cases. Ask for end-to-end examples that include runnable code and visible outputs.\n", - "\n", - "2) Extract and embed exact identifiers from the question and common adjacent APIs. Include literal class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages. Examples to use verbatim or as plausible alternatives:\n", - " - LangChain classes/chains/agents/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, CSVLoader, vector_store.as_retriever(search_kwargs={'k': ...}), Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory.\n", - " - Prompting and output parsing: ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.get_format_instructions, StructuredOutputParser.parse, StructuredOutputParser for list outputs, create_structured_chat_agent.\n", - " - Messages/IO: BaseMessage, AIMessage, invoke(...), .content, input_documents=[doc], Document, Document.page_content.\n", - " - OpenAI/Azure/HF: ChatOpenAI (langchain_openai), OpenAI vs AzureOpenAI vs AzureChatOpenAI, langchain_openai.AzureOpenAI, azure_deployment (or deployment_name), OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n", - " - HF and PEFT: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapter loading/merging; trust_remote_code; pipeline vs AutoModel; missing config.json, tokenizer.json, adapter_config.json.\n", - " - Vector stores: Chroma, Pinecone, FAISS; embedding function reuse on reload.\n", - " - DB drivers/URIs: \n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server\" (quote_plus encoded).\n", - " - Exact/likely error strings: \"invalid_request_error\", \"internal error\", \"500\", \"value is not a valid dict (type=type_error.dict)\", \"Pipeline cannot infer suitable model classes\", tokenization/config missing files, LoRA/PEFT adapter issues, model name typos or trailing spaces like \"text-davinci-003 \" causing \"invalid_request_error\".\n", - "\n", - "3) Mention relevant library/framework names and versions when likely implicated:\n", - " - LangChain 0.1.x and 0.2.x; module split across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters.\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17/18 for SQL Server.\n", - " - Vector stores: Chroma, Pinecone, FAISS.\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel.\n", - " - Ollama/ChatOllama, VertexAI, Llama 2/3.\n", - "\n", - "4) Vary solution angles to increase recall:\n", - " - Compare agent vs chain framing (ReAct agent, SQL agent, retriever tool, vector store tool).\n", - " - Memory integration and prompt customization (ConversationBufferMemory; override condense_question_prompt and qa_prompt).\n", - " - Callbacks vs hooks; pass callbacks to correct components; BaseCallbackHandler signatures.\n", - " - Retriever configuration and performance tuning (search_kwargs={'k': ...}, metadata reflection limits).\n", - " - Migration/deprecations: imports moved to langchain_openai or langchain_community; API changes/renames in 0.1.x/0.2.x.\n", - " - Alternatives when the approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools; custom LLM subclass for HF+PEFT; Ollama substitution for OpenAI; different vector stores).\n", - "\n", - "5) Cover multiple interpretations and troubleshooting paths:\n", - " - Wrong class/API usage (OpenAI vs AzureOpenAI/AzureChatOpenAI incompatibilities; chat vs completion models).\n", - " - Incorrect arguments/signatures; trailing spaces/misspelled model names causing \"invalid_request_error\".\n", - " - Version mismatches; missing installs (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver setup).\n", - " - Cloud configuration mistakes: Azure endpoint/deployment, regional endpoints, api_version.\n", - " - SQL performance pitfalls: metadata reflection on all tables by SQLDatabase; mitigate with include_tables, table_info hints, sample_rows_in_table_info, and limiting reflection; consider SQL agents; optionally reflect specific schemas only.\n", - " - Prompt formatting across models; Llama 2/3 tool-use prompt compatibility.\n", - " - Chroma persistence specifics: set persist_directory on create and load; call .persist(); reload with the same embedding function; avoid empty/duplicate collections; verify collection name.\n", - " - Document creation/usage: build Document(page_content=...); add metadata; wrap single Document in a list as input_documents=[doc].\n", - " - HTML loading/splitting: UnstructuredHTMLLoader returns Document; HTMLHeaderTextSplitter expects raw text; pass doc.page_content (not str(Document)); handle encoding/unicode cleanup; remove boilerplate with BeautifulSoup; use RecursiveCharacterTextSplitter with keep_separator or custom separators to keep headings with following paragraphs; include page title in metadata.\n", - " - Output handling: invoke(...) returns BaseMessage/AIMessage; print(result); print(result.content) for text; include runnable examples that produce visible output.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (blogs, tutorials, community posts, “end-to-end example”, “step-by-step”, “full code sample”).\n", - "\n", - "Additional domain-specific guidance to incorporate in the queries\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text.\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare to RetrievalQA.from_chain_type; include concrete demo code with a sample user query and printed outputs showing memory/prompt effects.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; encode driver name with quote_plus; set include_tables, table_info, sample_rows_in_table_info to limit reflection; optionally restrict to schemas; discuss reflection performance.\n", - " - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17/18 for SQL Server); connection timeout, Encrypt, TrustServerCertificate parameters.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required; verify BaseCallbackHandler method signatures matching your LangChain version.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust chat templates for Llama 2/3; ensure agents’ tool-use prompt compatibility.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on both save and load; call .persist(); reload with the same embedding function; troubleshoot empty/duplicated collections.\n", - "- CSV/HTML loading and Documents:\n", - " - CSVLoader builds Document.page_content from non-metadata columns; embeddings derive from page_content.\n", - " - For HTML: prefer requests + BeautifulSoup to fetch/clean; ensure you pass doc.page_content; handle encoding and weird characters; keep headings with paragraphs; use RecursiveCharacterTextSplitter with appropriate chunk_size (e.g., 20000) and chunk_overlap; store metadata (page title).\n", - "- Output parsers:\n", - " - StructuredOutputParser/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model to return an array of objects; parse(response.content).\n", - "- Versioning/migration:\n", - " - Many integrations moved to langchain_community; OpenAI classes moved to langchain_openai; consult migration guides and deprecation notes for 0.1.x/0.2.x.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes) with site scoping.\n", - "2) End-to-end examples/tutorials and code samples, including queries that ask for runnable demos showing printed outputs (e.g., print(result.content)).\n", - "3) Troubleshooting known errors and GitHub issues/discussions, embedding exact error strings from the user.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Each query 12–25+ words, concrete, and task-oriented.\n", - "- Mix of site-scoped (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, parameters, environment variables, driver URIs, and error messages from the user’s question.\n", - "- Vary solution angles: API usage, end-to-end demos with visible output, troubleshooting, performance, and migration; propose viable alternatives where appropriate.\n", - "2025/08/13 22:27:01 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n", - "2025/08/13 22:27:23 INFO dspy.evaluate.evaluate: Average Metric: 13.5 / 15 (90.0%)\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: New program is on the linear pareto front\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Full valset score for new program: 0.9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Full train_val score for new program: 0.9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Individual valset scores for new program: [1.0, 1.0, 0.5, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.5]\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Full valset pareto front score: 0.95\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {2, 4, 5, 6, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {1, 2, 3, 4, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 9}, {8, 9, 7}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {3}]\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Best valset aggregate score so far: 0.9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Best program as per aggregate score on train_val: 9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Best program as per aggregate score on valset: 9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Best score on valset: 0.9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Best score on train_val: 0.9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Linear pareto front program index: 9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: New program candidate index: 9\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 20: No merge candidates found\n", - "2025/08/13 22:27:23 INFO dspy.teleprompt.gepa.gepa: Iteration 20: Selected program 7 score: 0.8777777777777779\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.58 / 5 (91.7%): 100%|██████████| 5/5 [00:10<00:00, 2.11s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:27:33 INFO dspy.evaluate.evaluate: Average Metric: 4.583333333333334 / 5 (91.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:29:09 INFO dspy.teleprompt.gepa.gepa: Iteration 20: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, and edge cases. Where relevant, include performance, configuration, and migration concerns.\n", - "\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (quote them verbatim) from the question. Include identifiers and realistic alternatives such as:\n", - " - Agents/chains/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor.invoke, RetrievalQA.from_chain_type, ConversationalRetrievalChain, create_structured_chat_agent, create_retriever_tool, vector_store.as_retriever(search_kwargs={'k': ...}), ReAct agents.\n", - " - LLMs/clients: OpenAI(model_name='text-davinci-003'), langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI/AzureChatOpenAI, ChatOpenAI(model_name=\"gpt-4\"), ChatOllama/Ollama, VertexAI, HuggingFaceHub, HuggingFacePipeline.\n", - " - Embeddings/vector stores: HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), persist_directory, Pinecone, FAISS, .as_retriever(search_kwargs={'k': ...}).\n", - " - Documents/splitting: CharacterTextSplitter, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, Document.page_content, Document(metadata=...), input_documents=[doc], UnstructuredHTMLLoader, CSVLoader.\n", - " - Callbacks/memory/prompts: BaseCallbackHandler.on_llm_end(self, response, **kwargs), callbacks=[...], handleLLMNewToken(token), BufferMemory(returnMessages=True), ChatPromptTemplate, custom PromptTemplate for condense-question and answer prompts.\n", - " - SQL URIs/drivers: \"postgresql+psycopg2://user:pass@host:5432/db\", \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\", quote_plus encoded driver names, include_tables, table_info, sample_rows_in_table_info.\n", - " - Environment variables: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, Azure deployment/endpoint names.\n", - " - HF/PEFT files and classes: transformers AutoModelForCausalLM, AutoTokenizer, pipeline, PeftModel, LoRA/PEFT adapters, trust_remote_code, required files (config.json, tokenizer.json, tokenizer_config.json, adapter_config.json).\n", - " - Error strings: \"ValidationError: 1 validation error for SQLDatabaseToolkit llm value is not a valid dict (type=type_error.dict)\", \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", tokenizer/config mismatch errors, adapter loading problems, missing config.json.\n", - "\n", - "3) Mention relevant library/framework names and versions commonly implicated, and module split/migration notes:\n", - " - LangChain 0.1.x and 0.2.x; imports moved to langchain_openai, langchain_community, langchain_core, langchain_text_splitters; JS docs: site:js.langchain.com\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers/peft/accelerate/sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel\n", - " - Use langchain_openai.ChatOpenAI vs AzureOpenAI/AzureChatOpenAI, ensuring proper Azure endpoint/deployment/api_version configuration\n", - "\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Agent vs chain framing (“agent”, “tool”, “retriever tool”, “vector store tool”, “ReAct”, “structured chat agent”, “RAG”).\n", - " - Explore memory integration, custom prompts, callback placement (LLM vs chain), retriever configuration (top_k/k tuning), schema limitation and reflection controls (include_tables, table_info, sample_rows_in_table_info).\n", - " - Migration/deprecation terms: module split, moved imports, class renames, API signature changes (e.g., BaseCallbackHandler.on_llm_end now takes response, **kwargs).\n", - " - Alternatives if the user’s approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retriever as a Tool for ReAct agents; custom LLM wrapper/HuggingFacePipeline for HF/PEFT models; Ollama/ChatOllama for local Llama).\n", - "\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API (e.g., using AzureOpenAI vs ChatOpenAI mismatch; JS OpenAI vs ChatOpenAI classes; chat vs completion model differences).\n", - " - Incorrect args/signatures or parameter names; trailing spaces in model names (“text-davinci-003 ”) causing \"invalid_request_error\".\n", - " - Version mismatches; module split in 0.1.x/0.2.x; install requirements not met (psycopg2 vs psycopg2-binary; pyodbc; sentencepiece/accelerate/peft/transformers).\n", - " - Configuration mistakes: environment variables, Azure endpoint/deployment/api_version; embedding/LLM mismatch on Chroma reload; persist_directory usage; duplicate/empty collections; using the same embedding function on reload.\n", - " - Performance pitfalls: SQL metadata reflection across all tables; how to limit scope; agent vs chain selection.\n", - " - Document handling: ensure input_documents is a list; build Document(page_content=...) correctly; CSVLoader embeds joined row text; HTML loader vs splitter mismatch and encoding issues; keep headers with paragraphs.\n", - " - Callback system: pass callbacks to the LLM instance where required; BaseCallbackHandler method signatures and kwargs in newer versions; JS streaming callbacks array and handleLLMNewToken.\n", - " - HF LoRA/PEFT: loading base + adapter via peft, merging adapters, missing config/tokenizer files, HF Inference API limitations when only adapters are pushed; trust_remote_code; creating a custom LangChain LLM wrapper or HuggingFacePipeline for adapter models.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (blogs, tutorials, community posts)\n", - "\n", - "Domain-specific guidance to explicitly incorporate (LangChain and adjacent tooling)\n", - "- Prefer langchain_openai.ChatOpenAI for Python; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (deployment name), api_version, and endpoint; ensure OPENAI_API_KEY/AZURE_OPENAI_API_KEY variables.\n", - "- invoke(...) returns a BaseMessage/AIMessage; access .content to view text (e.g., print(result.content)).\n", - "- JS streaming: prefer ChatOpenAI with streaming: true; use ChatPromptTemplate and BufferMemory({ returnMessages: true }); pass callbacks with handleLLMNewToken(token).\n", - "- SQL tools: Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection; connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; ensure drivers installed; encode ODBC driver name with quote_plus.\n", - "- SQL performance: SQLDatabaseSequentialChain can be slow due to reflection; mitigate with include_tables, table_info hints, sample_rows_in_table_info; consider SQLDatabaseChain or SQL agents.\n", - "- ReAct + retrieval: expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use create_structured_chat_agent or AgentExecutor; optionally use create_retriever_tool; compare to ConversationalRetrievalChain.\n", - "- Callbacks: pass callbacks to the LLM instance when required; BaseCallbackHandler.on_llm_end should accept response and **kwargs in newer versions.\n", - "- Ollama/Llama: use ChatOllama/Ollama; adjust prompt chat templates for Llama 2/3 when replacing OpenAI; verify agent/tool-use prompt compatibility.\n", - "- Chroma persistence: use persist_directory on create and reload; call .persist(); use the same embedding function on reload; troubleshoot empty/duplicated collections.\n", - "- CSV/HTML docs: CSVLoader builds Document.page_content by joining non-metadata columns; for HTML use BeautifulSoup to clean, then HTMLHeaderTextSplitter or RecursiveCharacterTextSplitter; keep headers with content; include page title metadata.\n", - "- Output parsers: StructuredOutputParser and ResponseSchema can enforce nested JSON outputs; parse(response.content).\n", - "- Migration: many integrations moved to langchain_community; imports moved to langchain_openai; review 0.1.x/0.2.x migration guides and deprecations.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames, callback signature updates).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., SQL agents instead of SQLDatabaseSequentialChain; retriever as a Tool for ReAct; custom LLM wrapper/HuggingFacePipeline for PEFT adapters; using ChatOpenAI vs AzureOpenAI where appropriate).\n", - "- For Document-related issues, include queries emphasizing wrapping a single Document in a list when an API expects a list (e.g., input_documents=[doc]).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - "- Surface common pitfalls: trailing spaces in model names causing \"invalid_request_error\"; incorrect callback placement; mis-specified retriever k/top_k; missing HF files (config.json, tokenizer.json); HF Hub models that only contain adapters require loading base + adapter with peft.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, literals, URIs, env vars, and error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore multiple solution avenues and tool choices.\n", - "- Output ONLY the JSON-like Python list of strings, with no extra commentary.\n", - "2025/08/13 22:29:20 INFO dspy.evaluate.evaluate: Average Metric: 4.583333333333334 / 5 (91.7%)\n", - "2025/08/13 22:29:20 INFO dspy.teleprompt.gepa.gepa: Iteration 20: New subsample score is not better, skipping\n", - "2025/08/13 22:29:20 INFO dspy.teleprompt.gepa.gepa: Iteration 21: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:10<00:00, 2.03s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:29:31 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:30:38 INFO dspy.teleprompt.gepa.gepa: Iteration 21: Proposed new text for query_writer: You are given a user’s technical question. Your task is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it several ways across the queries. Reflect concrete tasks they mention and probable next steps.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, filenames, and any literal error messages from the question. Put exact errors in quotes (e.g., \"tuple' object has no attribute 'page_content'\").\n", - "3) Include relevant library/framework names and versions when known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - "4) Use multiple phrasings and solution angles. Include synonyms: “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”. Include “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”, etc.\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class\n", - " - Incorrect arguments or signature usage\n", - " - Version mismatch or deprecated API (module split, import changes)\n", - " - Missing install or wrong driver\n", - " - Configuration issues, performance pitfalls, prompt formatting, environment variable setup\n", - " - Data structure shape issues (e.g., function expects List[Document], but user passed a single Document or tuple)\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., response.content). Include queries mentioning “BaseMessage”, “AIMessage”, and “.content” if users see no output after invoke.\n", - " - Watch for trailing spaces in model name causing \"invalid_request_error\".\n", - "- Output parsers and structured outputs:\n", - " - StructuredOutputParser and ResponseSchema: to get arrays/lists of dicts, instruct the model to return a JSON array and include that in format_instructions; consider PydanticOutputParser or JSON schema-based parsers for nested lists.\n", - " - Include queries like “StructuredOutputParser parse list of dictionaries,” “ResponseSchema array of objects,” “parse(response.content),” and comparisons with PydanticOutputParser.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- Documents and loaders:\n", - " - Document.page_content holds the text that gets embedded; embeddings run on page_content, not metadata.\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on this joined content. Include queries verifying this behavior and how to customize.\n", - " - If a chain or tool expects a list of Document (e.g., input_documents or add_documents), wrap single Document objects in a list (e.g., [doc]); errors like \"tuple' object has no attribute 'page_content'\" often indicate wrong input shape.\n", - " - In JS, CharacterTextSplitter.createDocuments(texts, metadatas?) accepts a second argument (array of objects) merged into Document.metadata; include queries referencing this when adding metadata fields.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor.\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL).\n", - " - For MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; include/limit tables, lazy reflection, table_info hints, or use SQL agents.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)) and kwargs handling; note VertexAI callback nuances.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama wrappers (langchain_community.llms.Ollama or chat_models.ChatOllama) when replacing OpenAI in agents/chains.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL; if using a raw REST API (localhost:11434/api/generate), consider a custom LLM wrapper.\n", - "- Vector DB persistence:\n", - " - Chroma persistence: use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- Provide 10–15 distinct, detailed queries, each 12–25+ words.\n", - "- Include some site-scoped queries (docs, GitHub, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers and quoted error strings from the user’s question.\n", - "- Include multiple plausible angles (usage, troubleshooting, migration, performance, config).\n", - "- Do not provide answers or code—only the list of search queries in a JSON-like Python list of strings.\n", - "2025/08/13 22:30:48 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 22:30:48 INFO dspy.teleprompt.gepa.gepa: Iteration 21: New subsample score is not better, skipping\n", - "2025/08/13 22:30:48 INFO dspy.teleprompt.gepa.gepa: Iteration 22: Selected program 9 score: 0.9\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.12s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:30:59 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:32:46 INFO dspy.teleprompt.gepa.gepa: Iteration 22: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings, e.g., [\"query 1\", \"query 2\"].\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "- Mix site-scoped queries (official docs, GitHub, Stack Overflow) with broader web queries.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks, exact operations, desired outcomes, constraints, and edge cases. Ask for end-to-end examples that include runnable code and visible outputs (e.g., print(result) or print(result.content)).\n", - "\n", - "2) Extract and embed exact identifiers from the question and common adjacent APIs. Include literal class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages. Examples to use verbatim or as plausible alternatives:\n", - " - LangChain classes/chains/agents/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, create_structured_chat_agent, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, CSVLoader, vector_store.as_retriever(search_kwargs={'k': ...}), Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory.\n", - " - Prompting and output parsing: ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.get_format_instructions, StructuredOutputParser.parse, StructuredOutputParser for list outputs.\n", - " - Messages/IO: BaseMessage, AIMessage, invoke(...), .content, input_documents=[doc], Document, Document.page_content, chat_history.\n", - " - OpenAI/Azure/HF: ChatOpenAI (langchain_openai), OpenAI vs AzureOpenAI vs AzureChatOpenAI, langchain_openai.AzureOpenAI, azure_deployment (or deployment_name), OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n", - " - HF and PEFT: transformers, peft, accelerate, sentencepiece, safetensors; AutoModelForCausalLM, AutoTokenizer, PeftModel, BitsAndBytesConfig; LoRA/PEFT adapter loading/merging (merge_and_unload); trust_remote_code; pipeline vs AutoModel; missing config.json, tokenizer.json, tokenizer.model, adapter_config.json.\n", - " - Vector stores: Chroma, Pinecone, FAISS; embedding function reuse on reload.\n", - " - DB drivers/URIs:\n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server\" (quote_plus encoded).\n", - " - Exact/likely error strings: \"invalid_request_error\", \"internal error\", \"500\", \"value is not a valid dict (type=type_error.dict)\", \"Pipeline cannot infer suitable model classes\", tokenization/config missing files, LoRA/PEFT adapter issues, model name typos or trailing spaces like \"text-davinci-003 \" causing \"invalid_request_error\".\n", - "\n", - "3) Mention relevant library/framework names and versions when likely implicated:\n", - " - LangChain 0.1.x and 0.2.x; module split across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters.\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17/18 for SQL Server.\n", - " - Vector stores: Chroma, Pinecone, FAISS.\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel.\n", - " - Ollama/ChatOllama, VertexAI, Llama 2/3.\n", - "\n", - "4) Vary solution angles to increase recall:\n", - " - Compare agent vs chain framing (ReAct agent, SQL agent, retriever tool, vector store tool).\n", - " - Memory integration and prompt customization (ConversationBufferMemory; override condense_question_prompt and qa_prompt).\n", - " - Callbacks vs hooks; pass callbacks to correct components; BaseCallbackHandler signatures.\n", - " - Retriever configuration and performance tuning (search_kwargs={'k': ...}, metadata reflection limits).\n", - " - Migration/deprecations: imports moved to langchain_openai or langchain_community; API changes/renames in 0.1.x/0.2.x.\n", - " - Alternatives when the approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools; custom LLM subclass for HF+PEFT; Ollama substitution for OpenAI; different vector stores).\n", - " - HTML/CSV loading nuances: pass doc.page_content (not str(Document)); handle encoding; remove boilerplate via BeautifulSoup; keep headings with following paragraphs; include page title metadata; chunk_size and chunk_overlap selection (e.g., ~20000).\n", - " - SQL performance and reflection: set include_tables, table_info, sample_rows_in_table_info, and schema limits; consider avoiding/monkeypatching full metadata reflection if necessary; trade-offs vs SQL agents.\n", - "\n", - "5) Cover multiple interpretations and troubleshooting paths:\n", - " - Wrong class/API usage (OpenAI vs AzureOpenAI/AzureChatOpenAI incompatibilities; chat vs completion models).\n", - " - Incorrect arguments/signatures; trailing spaces/misspelled model names causing \"invalid_request_error\".\n", - " - Version mismatches; missing installs (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver setup); Windows DSN vs driver string; quote_plus encoding for ODBC.\n", - " - Cloud configuration mistakes: Azure endpoint/deployment, regional endpoints, api_version; OPENAI_API_BASE and AZURE_OPENAI_API_KEY differences.\n", - " - SQL performance pitfalls: SQLDatabase metadata reflection on all tables; mitigate with include_tables, restrict schemas, table_info hints, sample_rows_in_table_info; consider SQL agents instead of SQLDatabaseSequentialChain; optional approaches to disable or defer reflection.\n", - " - Prompt formatting across models; Llama 2/3 tool-use prompt compatibility.\n", - " - Chroma persistence specifics: set persist_directory on create and load; call .persist(); reload with the same embedding function; avoid empty/duplicate collections; verify collection name; basic validation like collection.count().\n", - " - Document creation/usage: build Document(page_content=...); attach metadata such as title/URL/section; wrap single Document in a list as input_documents=[doc].\n", - " - HTML loading/splitting: UnstructuredHTMLLoader returns Document; feed raw HTML text to HTMLHeaderTextSplitter via doc.page_content; remove nav/boilerplate with BeautifulSoup; Unicode cleanup; use RecursiveCharacterTextSplitter with keep_separator or custom separators to keep headings with following paragraphs; large chunk sizes around 20000 characters.\n", - " - Output handling: invoke(...) returns BaseMessage/AIMessage; print(result); print(result.content) for text; parse structured outputs with StructuredOutputParser.parse(response.content).\n", - " - HF/PEFT specifics: reproduce \"Pipeline cannot infer suitable model classes\"; verify/configure config.json/tokenizer.json/tokenizer.model/adapter_config.json; load base model + adapter via PeftModel.from_pretrained and optionally merge_and_unload; trust_remote_code; ensure required packages installed (transformers, peft, accelerate, sentencepiece, safetensors); prefer AutoModelForCausalLM.from_pretrained vs pipeline when necessary; test with plain transformers outside LangChain to isolate issues; consider writing a custom LangChain LLM subclass wrapping transformers/PEFT loading and generation.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (blogs, tutorials, community posts, “end-to-end example”, “step-by-step”, “full code sample”).\n", - "\n", - "Additional domain-specific guidance to incorporate in the queries\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint (OPENAI_API_BASE).\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text.\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate); show messages and printed outputs.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection and to avoid heavy reflection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; encode driver name with quote_plus; set include_tables, table_info, sample_rows_in_table_info, and restrict schemas to limit reflection; optionally reduce or bypass reflection when necessary; discuss performance implications.\n", - " - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17/18 for SQL Server); connection timeout, Encrypt, TrustServerCertificate parameters for Azure SQL.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); include examples using AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain flows.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM or chain/agent as required; verify BaseCallbackHandler method signatures matching your LangChain version.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust chat templates for Llama 2/3; ensure agents’ tool-use prompt compatibility.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on both save and load; call .persist(); reload with the same embedding function; troubleshoot empty/duplicated collections and verify collection counts.\n", - "- CSV/HTML loading and Documents:\n", - " - CSVLoader builds Document.page_content from non-metadata columns; embeddings derive from page_content.\n", - " - For HTML: prefer requests + BeautifulSoup to fetch/clean; ensure you pass doc.page_content; handle encoding; remove boilerplate; keep headings with paragraphs; use RecursiveCharacterTextSplitter with appropriate chunk_size (~20000) and chunk_overlap; store metadata (page title) in Document.\n", - "- Output parsers:\n", - " - StructuredOutputParser/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model to return an array of objects; parse(response.content).\n", - "- Versioning/migration:\n", - " - Many integrations moved to langchain_community; OpenAI classes moved to langchain_openai; consult migration guides and deprecation notes for 0.1.x/0.2.x.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes) with site scoping.\n", - "2) End-to-end examples/tutorials and code samples, including queries that ask for runnable demos showing printed outputs (e.g., print(result.content)).\n", - "3) Troubleshooting known errors and GitHub issues/discussions, embedding exact error strings from the user.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (module split, class renames).\n", - "6) Include at least one query suggesting less-common but viable alternatives when stuck, such as:\n", - " - Building a custom LangChain LLM subclass to load HF+PEFT adapters properly.\n", - " - Using create_sql_agent instead of SQLDatabaseSequentialChain for large schemas.\n", - " - Disabling/limiting SQL metadata reflection and passing table_info/include_tables hints.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Each query 12–25+ words, concrete, and task-oriented.\n", - "- Mix of site-scoped (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, parameters, environment variables, driver URIs, and error messages from the user’s question.\n", - "- Vary solution angles: API usage, end-to-end demos with visible output, troubleshooting, performance, migration, and alternatives (e.g., custom LLM subclass for HF+PEFT).\n", - "2025/08/13 22:32:59 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n", - "2025/08/13 22:32:59 INFO dspy.teleprompt.gepa.gepa: Iteration 22: New subsample score is not better, skipping\n", - "2025/08/13 22:32:59 INFO dspy.teleprompt.gepa.gepa: Iteration 23: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.25 / 5 (85.0%): 100%|██████████| 5/5 [00:10<00:00, 2.05s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:33:09 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:34:12 INFO dspy.teleprompt.gepa.gepa: Iteration 23: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, and any literal error messages (quoted) from the question.\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Chroma, Pinecone, FAISS\n", - "4) Include task-oriented phrasing and likely solution angles. Use synonyms and alternate phrasings to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”\n", - " - Migration or deprecation: module split, imports, API changes\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class\n", - " - Incorrect arguments or signature usage\n", - " - Version mismatch or deprecated API\n", - " - Missing install or wrong driver\n", - " - Configuration issues, performance pitfalls, prompt formatting, environment variable setup\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate where applicable (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., print(result.content)).\n", - " - Watch for trailing spaces in model name causing \"invalid_request_error\".\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - " - Include queries that demonstrate end-to-end setup and a sample query showing memory and custom prompt effects.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor.\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc).\n", - " - For MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; include/limit tables, lazy reflection, or alternatives (SQL agents, SQLDatabaseChain), table_info hints, top_k tuning.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures; prefer on_llm_end(response, **kwargs) and kwargs handling; note VertexAI callback nuances.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI in agents/chains, or implement a custom LLM wrapper for local REST APIs (e.g., localhost:11434/api/generate).\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL; consider compatibility with ReAct-style prompts.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.pageContent by joining row key-value pairs excluding metadata_columns; embeddings computed on pageContent.\n", - " - CharacterTextSplitter (JS) createDocuments accepts a second argument (array of objects) merged into Document.metadata: createDocuments(texts, metadatas).\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: specify schemas for arrays/lists (list of dicts) by describing expected nested JSON in the schema/prompt; parse(response.content).\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (langchain_openai.ChatOpenAI); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples demonstrating the desired behavior.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Include some site-scoped queries (docs, GitHub issues, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers, configuration strings, and quoted error text from the user’s question.\n", - "- Use multiple phrasings/synonyms and explore diverse solution angles.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:34:23 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n", - "2025/08/13 22:34:23 INFO dspy.teleprompt.gepa.gepa: Iteration 23: New subsample score is not better, skipping\n", - "2025/08/13 22:34:23 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Selected program 9 score: 0.9\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.75 / 5 (75.0%): 100%|██████████| 5/5 [00:11<00:00, 2.27s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:34:35 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:36:09 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings, e.g., [\"query 1\", \"query 2\"].\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "- Mix site-scoped queries (official docs, GitHub, Stack Overflow) with broader web queries.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks, exact operations, desired outcomes, constraints, and edge cases. Ask for end-to-end examples that include runnable code and visible outputs, including explicit print(result) and print(result.content) where relevant.\n", - "\n", - "2) Extract and embed exact identifiers from the question and adjacent APIs. Include literal class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages. Examples to use verbatim or as plausible alternatives:\n", - " - LangChain classes/chains/agents/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, CSVLoader, vector_store.as_retriever(search_kwargs={'k': ...}), Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory.\n", - " - Prompting and output parsing: ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.get_format_instructions, StructuredOutputParser.parse, StructuredOutputParser for list outputs, create_structured_chat_agent.\n", - " - Messages/IO: BaseMessage, AIMessage, invoke(...), .content, input_documents=[doc], Document, Document.page_content.\n", - " - OpenAI/Azure/HF: ChatOpenAI (langchain_openai), OpenAI vs AzureOpenAI vs AzureChatOpenAI, langchain_openai.AzureOpenAI, azure_deployment (or deployment_name), OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n", - " - JS streaming: callbacks in chain.call(values, config), handleLLMNewToken(token), streaming: true, BufferMemory({ returnMessages: true }), ChatPromptTemplate, correct placement of callbacks in the config object.\n", - " - HF and PEFT: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapter loading/merging; trust_remote_code; pipeline vs AutoModel; missing config.json, tokenizer.json, adapter_config.json.\n", - " - Vector stores: Chroma, Pinecone, FAISS; embedding function reuse on reload.\n", - " - DB drivers/URIs:\n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server\" (quote_plus encoded).\n", - " - Exact/likely error strings: \"invalid_request_error\", \"internal error\", \"500\", \"value is not a valid dict (type=type_error.dict)\", \"Pipeline cannot infer suitable model classes\", tokenization/config missing files, LoRA/PEFT adapter issues, model name typos or trailing spaces like \"text-davinci-003 \" causing \"invalid_request_error\".\n", - " - Python Document types across versions: langchain.schema.Document vs langchain_core.documents.Document.\n", - "\n", - "3) Mention relevant library/framework names and versions when likely implicated:\n", - " - LangChain 0.1.x and 0.2.x; module split across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters.\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17/18 for SQL Server.\n", - " - Vector stores: Chroma, Pinecone, FAISS.\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel.\n", - " - Ollama/ChatOllama, VertexAI, Llama 2/3.\n", - "\n", - "4) Vary solution angles to increase recall:\n", - " - Compare agent vs chain framing (ReAct agent, SQL agent, retriever tool, vector store tool).\n", - " - Memory integration and prompt customization (ConversationBufferMemory; override condense_question_prompt and qa_prompt; BufferMemory with returnMessages: true in JS).\n", - " - Callbacks vs hooks; pass callbacks to the correct components and parameters; BaseCallbackHandler signatures; JS chain.call(values, { callbacks: [...] , signal }) vs passing callbacks incorrectly.\n", - " - Retriever configuration and performance tuning (search_kwargs={'k': ...}, metadata reflection limits).\n", - " - Migration/deprecations: imports moved to langchain_openai or langchain_community; API changes/renames in 0.1.x/0.2.x.\n", - " - Alternatives when the approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools; custom LLM subclass for HF+PEFT; Ollama substitution for OpenAI; different vector stores).\n", - "\n", - "5) Cover multiple interpretations and troubleshooting paths:\n", - " - Wrong class/API usage (OpenAI vs AzureOpenAI/AzureChatOpenAI incompatibilities; chat vs completion models).\n", - " - Incorrect arguments/signatures; trailing spaces/misspelled model names causing \"invalid_request_error\".\n", - " - Version mismatches; missing installs (pip install langchain openai psycopg2-binary; pyodbc; ODBC driver setup).\n", - " - Cloud configuration mistakes: Azure endpoint/deployment, regional endpoints, api_version.\n", - " - SQL performance pitfalls: metadata reflection on all tables by SQLDatabase; mitigate with include_tables, table_info hints, sample_rows_in_table_info, and limiting reflection; restrict schemas; prefer SQL agents for dynamic table selection.\n", - " - Prompt formatting across models; Llama 2/3 tool-use prompt compatibility.\n", - " - Chroma persistence specifics: set persist_directory on create and load; call .persist(); reload with the same embedding function; avoid empty/duplicate collections; verify collection name.\n", - " - Document creation/usage: build Document(page_content=...); add metadata; pass doc.page_content to splitters; wrap single Document in a list for inputs that expect a sequence, e.g., input_documents=[doc] or vectorstore.add_documents([doc]); avoid passing a tuple which triggers \"AttributeError: 'tuple' object has no attribute 'page_content'\".\n", - " - HTML loading/splitting: UnstructuredHTMLLoader returns Document; HTMLHeaderTextSplitter expects raw text; pass doc.page_content (not str(Document)); handle encoding/unicode cleanup; remove boilerplate with BeautifulSoup; use RecursiveCharacterTextSplitter with keep_separator or custom separators to keep headings with following paragraphs; include page title in metadata.\n", - " - Output handling: invoke(...) returns BaseMessage/AIMessage; print(result) to see the object; print(result.content) to see text; JS streaming: handleLLMNewToken(token) and verify tokens arrive.\n", - " - Structured outputs: StructuredOutputParser/ResponseSchema can describe nested JSON (arrays of objects); for lists, instruct the model to return an array of objects and parse(response.content).\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (blogs, tutorials, community posts, “end-to-end example”, “step-by-step”, “full code sample”).\n", - "\n", - "Additional domain-specific guidance to incorporate in the queries\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text; include examples showing print(result.content).\n", - " - Watch for trailing spaces in model names causing “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare to RetrievalQA.from_chain_type; include runnable code with a sample user query and printed outputs showing memory/prompt effects.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; encode driver name with quote_plus; set include_tables, table_info, sample_rows_in_table_info to limit reflection; optionally restrict to schemas.\n", - " - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17/18 for SQL Server); connection timeout, Encrypt, TrustServerCertificate parameters.\n", - " - If SQLDatabaseToolkit validation fails with AzureOpenAI (e.g., \"value is not a valid dict\"), try OpenAI or updated AzureChatOpenAI classes compatible with your LangChain version; include installation and configuration details (pip install langchain openai psycopg2-binary).\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required; verify BaseCallbackHandler method signatures matching your LangChain version.\n", - " - JavaScript: pass callbacks via the second config argument to chain.call(values, { callbacks, signal }); not as a separate positional parameter.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust chat templates for Llama 2/3; ensure agents’ tool-use prompt compatibility.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on both save and load; call .persist(); reload with the same embedding function; troubleshoot empty/duplicated collections.\n", - "- CSV/HTML loading and Documents:\n", - " - CSVLoader builds Document.page_content from non-metadata columns; embeddings derive from page_content.\n", - " - For HTML: prefer requests + BeautifulSoup to fetch/clean; ensure you pass doc.page_content; handle encoding and weird characters; keep headings with paragraphs; use RecursiveCharacterTextSplitter with appropriate chunk_size and chunk_overlap; store metadata (page title).\n", - "- Output parsers:\n", - " - StructuredOutputParser/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model to return an array of objects; parse(response.content); include examples phrased as “JSON array of objects with fields identifier and text”.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes) with site scoping.\n", - "2) End-to-end examples/tutorials and code samples, including queries that ask for runnable demos showing printed outputs (print(result); print(result.content)) and, for JS, streaming tokens printed in real time.\n", - "3) Troubleshooting known errors and GitHub issues/discussions, embedding exact error strings from the user.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Each query 12–25+ words, concrete, and task-oriented.\n", - "- Mix of site-scoped (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, parameters, environment variables, driver URIs, and error messages from the user’s question.\n", - "- Vary solution angles: API usage, end-to-end demos with visible output, troubleshooting, performance, and migration; propose viable alternatives where appropriate.\n", - "- Include specific “wrap Document in a list” phrasing when the question involves input_documents or add_documents; explicitly mention input_documents=[doc] to avoid \"tuple has no attribute page_content\".\n", - "2025/08/13 22:36:20 INFO dspy.evaluate.evaluate: Average Metric: 4.083333333333334 / 5 (81.7%)\n", - "2025/08/13 22:36:41 INFO dspy.evaluate.evaluate: Average Metric: 13.583333333333334 / 15 (90.6%)\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: New program is on the linear pareto front\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Full valset score for new program: 0.9055555555555556\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Full train_val score for new program: 0.9055555555555556\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Individual valset scores for new program: [1.0, 0.8333333333333334, 0.5, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75]\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Full valset pareto front score: 0.95\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {2, 4, 5, 6, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {1, 2, 3, 4, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 4, 5, 6, 9, 10}, {8, 9, 10, 7}, {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {3}]\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Best valset aggregate score so far: 0.9055555555555556\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Best program as per aggregate score on train_val: 10\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Best program as per aggregate score on valset: 10\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Best score on valset: 0.9055555555555556\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Best score on train_val: 0.9055555555555556\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Linear pareto front program index: 10\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 24: New program candidate index: 10\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 25: No merge candidates found\n", - "2025/08/13 22:36:41 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.58 / 5 (91.7%): 100%|██████████| 5/5 [00:10<00:00, 2.06s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:36:52 INFO dspy.evaluate.evaluate: Average Metric: 4.583333333333334 / 5 (91.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:38:36 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention (e.g., “parse nested JSON list of dicts with StructuredOutputParser”, “stream tokens in JS ConversationChain”).\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, variable names, driver URIs, CLI flags, and any literal error messages (quoted) from the question. Examples to embed when present:\n", - " - Python: StructuredOutputParser, ResponseSchema, PydanticOutputParser, JsonOutputParser, ChatOpenAI, AzureChatOpenAI, create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri, AgentExecutor, ConversationalRetrievalChain, RetrievalQA, ConversationBufferMemory, CharacterTextSplitter, RecursiveCharacterTextSplitter, UnstructuredHTMLLoader, BSHTMLLoader, HTMLHeaderTextSplitter, AIMessage, BaseMessage, .content, .invoke(), .call(), .parse(), .persist()\n", - " - JS/TS: ChatOpenAI, OpenAI (llms), BufferMemory, ConversationChain, ChatPromptTemplate, RunnableWithMessageHistory, callbacks, handleLLMNewToken, streaming: true, returnMessages: true, createDocuments(texts, metadatas)\n", - " - Imports and packages: from langchain_openai import ChatOpenAI / AzureChatOpenAI, from langchain_community.document_loaders import UnstructuredHTMLLoader / BSHTMLLoader, from langchain_text_splitters import HTMLHeaderTextSplitter / RecursiveCharacterTextSplitter, langchain, langchain-openai, langchain_community, langchain_core, js.langchain.com paths, @langchain/openai for JS\n", - " - Database URIs and drivers: \"postgresql+psycopg2://...\", \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\", psycopg2 vs psycopg2-binary, pyodbc\n", - " - Literal errors: include them verbatim in quotes, e.g., \"ValidationError: 1 validation error for SQLDatabaseToolkit llm value is not a valid dict (type=type_error.dict)\"\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai (pip) / langchain_openai (import), langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - "4) Use task-oriented phrasing and multiple solution angles. Vary synonyms to increase recall:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”; “ReAct” vs “structured chat agent”\n", - " - “how to add memory”, “custom prompt”, “override condense question”, “disable metadata reflection”, “print BaseMessage content”\n", - " - Migration/deprecation: module split, imports, API changes, 0.1.x → 0.2.x\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., using OpenAI text LLM vs ChatOpenAI; Python vs JS model classes)\n", - " - Incorrect arguments or signature usage; passing callbacks to chain vs LLM instance; BaseCallbackHandler method signatures\n", - " - Version mismatch or deprecated API; module split migrations (langchain_openai, langchain_community)\n", - " - Missing installs or wrong drivers (psycopg2-binary, pyodbc, unstructured HTML dependencies)\n", - " - Configuration issues (environment variables like OPENAI_API_KEY, Azure endpoint/deployment, model name typos/trailing spaces)\n", - " - Performance and behavior pitfalls (metadata reflection, table limits, chunking separators, keeping headers with body)\n", - " - Prompt formatting, output parsing, and schema specification\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate when relevant (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI (Python) or @langchain/openai ChatOpenAI (JS). Distinguish OpenAI (completion) vs ChatOpenAI (chat).\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., result.content). Include queries demonstrating print(result.content).\n", - " - For Azure, prefer langchain_openai.AzureChatOpenAI; configure azure_deployment (deployment_name), azure_endpoint, and azure_openai_api_key.\n", - " - Watch for trailing spaces in model name causing \"invalid_request_error\".\n", - "- Structured output parsing:\n", - " - StructuredOutputParser/ResponseSchema often yields a flat dict; to get arrays (list of dicts), describe a JSON array of objects in the schema/format instructions.\n", - " - Consider JsonOutputParser, PydanticOutputParser, or with_structured_output (LangChain 0.2.x) with TypedDict/Pydantic models for nested JSON arrays; parse(response.content).\n", - " - Include queries about “list of dictionaries,” “nested JSON,” and “format_instructions” customization.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor (Python).\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc).\n", - " - For MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; include/limit tables, lazy reflection, table_info hints, top_k tuning.\n", - " - Address ValidationError “llm value is not a valid dict” by using correct LLM classes (ChatOpenAI/AzureChatOpenAI) matching the toolkit’s expected types and current LangChain versions.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; consider ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains. Verify BaseCallbackHandler signatures (on_llm_end(response, **kwargs)).\n", - " - JS streaming: set streaming: true on ChatOpenAI; implement handleLLMNewToken in callbacks; BufferMemory with returnMessages: true; optionally use ChatPromptTemplate and RunnableWithMessageHistory.\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader/BSHTMLLoader return a list of Document objects; avoid str(Document) when splitting. Feed raw HTML/text to HTMLHeaderTextSplitter or use page_content.\n", - " - Consider BSHTMLLoader for cleaner parsing or BeautifulSoup to extract and clean content; preserve page title in Document.metadata.\n", - " - Use RecursiveCharacterTextSplitter to enforce max chunk size (e.g., 20K chars), keep separators, and avoid splitting titles from following paragraphs.\n", - " - In JS, CharacterTextSplitter.createDocuments(texts, metadatas) merges metadata into resulting Document objects.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- Versioning/migration:\n", - " - LangChain 0.1.x/0.2.x modular packages and import changes (langchain_openai.ChatOpenAI; move community integrations to langchain_community); deprecations and migration guides.\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures, module split).\n", - "2) End-to-end examples/tutorials and code samples (Python and/or JS as relevant).\n", - "3) Troubleshooting known errors and GitHub issues/discussions with exact error text from the question.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames, streaming callbacks).\n", - "\n", - "Quality checks before submitting\n", - "- Provide 10–15 distinct, detailed queries.\n", - "- Include some site-scoped queries (docs, GitHub issues, Stack Overflow) and some general web queries.\n", - "- Embed exact code identifiers, variable names, and literal error messages from the user’s question.\n", - "- Use multiple phrasings and explore diverse solution angles; avoid near-duplicates.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:38:48 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 5 (100.0%)\n", - "2025/08/13 22:39:09 INFO dspy.evaluate.evaluate: Average Metric: 12.583333333333334 / 15 (83.9%)\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Full valset score for new program: 0.8388888888888889\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Full train_val score for new program: 0.8388888888888889\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Individual valset scores for new program: [1.0, 0.8333333333333334, 0.5, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Full valset pareto front score: 0.95\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {2, 4, 5, 6, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {1, 2, 3, 4, 6, 7, 8, 9, 10, 11}, {0, 1, 2, 3, 4, 5, 6, 9, 10, 11}, {8, 9, 10, 7}, {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, {3}]\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Best valset aggregate score so far: 0.9055555555555556\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Best program as per aggregate score on train_val: 10\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Best program as per aggregate score on valset: 10\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Best score on valset: 0.9055555555555556\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Best score on train_val: 0.9055555555555556\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Linear pareto front program index: 10\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: New program candidate index: 11\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 26: No merge candidates found\n", - "2025/08/13 22:39:09 INFO dspy.teleprompt.gepa.gepa: Iteration 26: Selected program 9 score: 0.9\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 5.00 / 5 (100.0%): 100%|██████████| 5/5 [00:11<00:00, 2.29s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:39:21 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 5 (100.0%)\n", - "2025/08/13 22:39:21 INFO dspy.teleprompt.gepa.gepa: Iteration 26: All subsample scores perfect. Skipping.\n", - "2025/08/13 22:39:21 INFO dspy.teleprompt.gepa.gepa: Iteration 26: Reflective mutation did not propose a new candidate\n", - "2025/08/13 22:39:21 INFO dspy.teleprompt.gepa.gepa: Iteration 27: Selected program 7 score: 0.8777777777777779\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Average Metric: 4.25 / 5 (85.0%): 100%|██████████| 5/5 [00:12<00:00, 2.44s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:39:33 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:41:00 INFO dspy.teleprompt.gepa.gepa: Iteration 27: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, and edge cases.\n", - "\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - When relevant, include identifiers such as: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, BaseMessage, AIMessage, .content, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), as_retriever(search_kwargs={'k': ...}), OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, SQLDatabaseSequentialChain, SQLDatabaseChain, create_structured_chat_agent, include_tables, table_info, sample_rows_in_table_info, Top-k parameters.\n", - " - Include driver URIs and connection strings exactly as shown or plausible alternatives, e.g., \"postgresql+psycopg2://user:pass@host:5432/db\", \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\", and quote_plus encoded forms.\n", - " - Include environment variables when applicable: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - " - Include exact error strings the user shows or might encounter, e.g., \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", tokenization/config issues like missing config.json, tokenizer.json, adapter_config.json, or LoRA/PEFT adapter loading problems.\n", - "\n", - "3) Mention relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x and the module split across langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS docs: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers/peft/accelerate/sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapters\n", - "\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary agent vs chain framing (“agent”, “tool”, “retriever tool”, “vector store tool”, “ReAct”, “structured chat agent”).\n", - " - Explore memory, prompt customization, callbacks vs hooks, retriever configuration, top_k tuning, and metadata reflection settings.\n", - " - Include migration/deprecation terms: module split, imports moved to langchain_openai or langchain_community, API changes, class renames, deprecations in 0.1.x/0.2.x.\n", - "\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API (e.g., AzureOpenAI vs OpenAI incompatibility; Chat vs completion LLM classes).\n", - " - Incorrect arguments or signatures; wrong parameter names; trailing spaces in model names (“text-davinci-003 ”).\n", - " - Version mismatches; deprecated APIs; install requirements not met (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver encoding; DSN issues).\n", - " - Configuration mistakes: environment variable setup, Azure endpoint/deployment, API versions.\n", - " - Performance pitfalls: SQL metadata reflection on all tables; how to limit with include_tables, table_info hints, sample_rows_in_table_info; alternatives like SQL agents or limiting schema scope.\n", - " - Prompt formatting for different models; memory integration; passing callbacks to the correct component; BaseCallbackHandler method signatures and kwargs handling.\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist(); duplicate/empty collections.\n", - " - Document creation and usage: building Document(page_content=...), adding metadata, ensuring input_documents is a list of Documents (e.g., input_documents=[doc]).\n", - " - HF model loading with base + LoRA adapter; trust_remote_code; custom LangChain LLM subclass for custom pipelines or REST backends.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs (if relevant): site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (no site:) for blogs, tutorials, and community posts\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access the .content attribute to view text (print(result.content)).\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts; demonstrate examples that show both memory and prompt influence.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode ODBC driver name with quote_plus if needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info, or lazy/limited reflection; consider SQLDatabaseChain or SQL agents as alternatives.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures; for example, on_llm_end should accept response and **kwargs in newer versions; review run manager kwargs and version-specific changes.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; or implement a custom LLM wrapper for a REST API like localhost:11434/api/generate.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles; chat templates) when replacing OpenAI; ensure compatibility with agents’ tool-use prompts.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory both on create and reload; call .persist(); ensure the same embedding function is used on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content.\n", - " - Cleaning metadata (e.g., removing 'source' or 'row') affects only metadata, not the embedded text.\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; HTMLHeaderTextSplitter expects raw HTML/text segments; ensure you pass the correct string (e.g., doc.page_content) and handle encoding to avoid weird characters.\n", - " - Consider pre-processing HTML with requests + BeautifulSoup (strip scripts/styles, normalize whitespace, fix encodings) before splitting.\n", - " - Use RecursiveCharacterTextSplitter or HTMLHeaderTextSplitter to keep headers with following paragraphs; ensure chunking keeps title and subsequent paragraph together; include metadata like page title.\n", - " - Control chunk sizes (e.g., max 20K characters) and save chunks with metadata for downstream training.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema can describe nested JSON outputs for lists of dicts; parse(response.content).\n", - "- Versioning/migration:\n", - " - Modular packages and import changes (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Nuggets from common issues to explicitly target with queries when relevant\n", - "- Callbacks: pass callbacks to the LLM instance; ensure on_llm_end signature matches current LangChain version (response, **kwargs); access or log prompt/response appropriately.\n", - "- Hugging Face LoRA/PEFT loading: try loading with transformers directly to surface missing files; install peft, accelerate, sentencepiece, transformers; load base model + adapter via PEFT; consider trust_remote_code; create a custom LangChain LLM subclass if the Hub pipeline cannot infer classes; watch for \"Pipeline cannot infer suitable model classes\" and missing config.json/tokenizer.json/adapter_config.json.\n", - "- Document usage: wrap a single Document in a list when an API expects a list (e.g., input_documents=[doc]); ensure Document(page_content=..., metadata=...).\n", - "- ReAct + retriever: expose retriever as a Tool via vector_store.as_retriever(search_kwargs={'k': ...}); add to tools list passed to create_structured_chat_agent/AgentExecutor; compare with ConversationalRetrievalChain behavior.\n", - "- JS CharacterTextSplitter metadata: createDocuments accepts a second argument (array of objects) whose properties are merged into metadata for each returned Document; use it to add custom fields.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; writing a custom LLM wrapper for a REST HF/Ollama backend).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - "- Consider Python vs JS docs when the user context suggests either; include site scoping accordingly.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, literals, and error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore multiple solution avenues and tool choices.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:41:11 INFO dspy.evaluate.evaluate: Average Metric: 3.5833333333333335 / 5 (71.7%)\n", - "2025/08/13 22:41:11 INFO dspy.teleprompt.gepa.gepa: Iteration 27: New subsample score is not better, skipping\n", - "2025/08/13 22:41:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Selected program 7 score: 0.8777777777777779\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.06s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:41:21 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:42:37 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, edge cases, performance considerations, and configuration pitfalls. Include at least one query that explicitly asks for end-to-end code examples and demonstrations that validate the desired behavior (e.g., memory actually influences responses, custom prompts change outputs).\n", - "\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - When relevant, include identifiers such as:\n", - " create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, ConversationBufferMemory, PromptTemplate, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, Document.page_content, Document(metadata=...), BaseMessage, AIMessage, .content, StructuredOutputParser, ResponseSchema, BaseCallbackHandler.on_llm_end, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, Chroma(persist_directory=...), .persist(), as_retriever(search_kwargs={'k': ...}), vector_store.as_retriever, create_structured_chat_agent, create_retriever_tool, ReAct, ChatOpenAI, OpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, ChatOllama/Ollama.\n", - " - Include driver URIs and connection strings exactly as shown or plausible alternatives, e.g.:\n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " and quote_plus encoded forms for ODBC drivers.\n", - " - Include environment variables when applicable: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - " - Include exact error strings the user shows or might encounter, e.g., \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", \"value is not a valid dict (type=type_error.dict)\", missing config.json/tokenizer.json/adapter_config.json, LoRA/PEFT adapter loading problems, tokenization/config issues, DSN/ODBC driver errors.\n", - "\n", - "3) Mention relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x and the module split across langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS docs: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), AzureChatOpenAI\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapters\n", - " - Ollama / Llama 2/3, VertexAI\n", - "\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary agent vs chain framing (“agent”, “tool”, “retriever tool”, “vector store tool”, “ReAct”, “structured chat agent”).\n", - " - Explore memory integration, custom prompts, callbacks vs hooks, retriever configuration, top_k tuning, metadata reflection settings, prompt formatting for different models, and how to access AIMessage.content from invoke responses.\n", - " - Include migration/deprecation terms: module split, imports moved to langchain_openai or langchain_community, API changes, class renames, deprecations in 0.1.x/0.2.x.\n", - " - Include performance- and scope-related phrases: include_tables, table_info, sample_rows_in_table_info, limiting schema reflection, dynamic table selection via SQL agents.\n", - "\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API (e.g., AzureOpenAI vs OpenAI incompatibility; Chat vs completion LLM classes; langchain_openai.ChatOpenAI vs legacy imports).\n", - " - Incorrect arguments or signatures; wrong parameter names; trailing spaces in model names (“text-davinci-003 ”).\n", - " - Version mismatches; deprecated APIs; install requirements not met (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver name encoding; DSN issues).\n", - " - Configuration mistakes: environment variable setup, Azure endpoint/deployment name, API versions, base URLs.\n", - " - Performance pitfalls: SQL metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info; alternatives like create_sql_agent with SQLDatabaseToolkit.\n", - " - Prompt formatting for different models; memory integration (ConversationBufferMemory) with ConversationalRetrievalChain; customizing condense-question and answer prompts using PromptTemplate; ensuring the effect is demonstrable.\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist(); duplicate/empty collections after reload.\n", - " - Document creation and usage: building Document(page_content=...), adding metadata, ensuring inputs expecting a list get input_documents=[doc] not a single Document.\n", - " - HF model loading with base + LoRA adapter; trust_remote_code; custom LangChain LLM subclass for custom pipelines or REST backends; “Pipeline cannot infer suitable model classes” fixes.\n", - " - Callbacks: pass callbacks to the LLM instance when required, not just to chains; verify BaseCallbackHandler signatures (on_llm_end(response, **kwargs)) and version-specific changes.\n", - " - Ollama/Llama model swaps: ChatOllama usage; prompt formatting for Llama chat templates when replacing OpenAI; agent/tool-use prompt compatibility.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (no site:) for blogs, tutorials, and community posts\n", - "\n", - "Domain-specific guidance to explicitly incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure; set OPENAI_API_KEY or AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access result.content for the text.\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts; include examples showing both memory and prompt influence on outputs.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode ODBC driver name with quote_plus when needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info; consider SQLDatabaseChain or SQL agents as alternatives.\n", - "- ReAct agents with retrieval:\n", - " - Expose the vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain; include examples of adding a retriever tool.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance in newer versions; verify BaseCallbackHandler method signatures and kwargs handling.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust prompts to Llama chat templates for agents/tool-use compatibility.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory both on create and reload; call .persist(); ensure the same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content; cleaning metadata does not change embedded text.\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; HTMLHeaderTextSplitter expects raw HTML/text segments; pass doc.page_content; handle encoding; pre-process with requests + BeautifulSoup; keep headers with following paragraphs using RecursiveCharacterTextSplitter or HTMLHeaderTextSplitter; ensure chunking keeps title with subsequent paragraph; include metadata like page title; control chunk sizes (e.g., up to 20K chars).\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema can describe nested JSON outputs for lists of dicts; parse(response.content).\n", - "- Versioning/migration:\n", - " - Modular packages and import changes (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) At least one official docs/reference query (API usage, parameters, signatures, migration notes).\n", - "2) At least one end-to-end example/tutorials and code samples query demonstrating the setup and validating effects (e.g., memory + custom prompts change behavior).\n", - "3) At least one troubleshooting query targeting known errors and GitHub issues/discussions (include exact error strings).\n", - "4) At least one Stack Overflow Q&A query for similar symptoms.\n", - "5) At least one migration/deprecation notes query for breaking changes (e.g., module split, class renames, version pinning).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; wrapping a custom REST backend as a LangChain LLM; using ChatOllama in place of OpenAI).\n", - "- For Document-related issues, include queries emphasizing wrapping a single Document in a list when an API expects a list (e.g., input_documents=[doc]).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, literals, connection strings, environment variables, and error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore multiple solution avenues and tool choices.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:42:48 INFO dspy.evaluate.evaluate: Average Metric: 4.833333333333334 / 5 (96.7%)\n", - "2025/08/13 22:43:11 INFO dspy.evaluate.evaluate: Average Metric: 12.666666666666666 / 15 (84.4%)\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Full valset score for new program: 0.8444444444444444\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Full train_val score for new program: 0.8444444444444444\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Individual valset scores for new program: [1.0, 0.6666666666666666, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.0, 1.0, 0.75]\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: New valset pareto front scores: [1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Full valset pareto front score: 0.95\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Updated valset pareto front programs: [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {2, 4, 5, 6, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 5, 6, 7, 8, 12}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12}, {0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 12}, {8, 9, 10, 7}, {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, {3}]\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Best valset aggregate score so far: 0.9055555555555556\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Best program as per aggregate score on train_val: 10\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Best program as per aggregate score on valset: 10\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Best score on valset: 0.9055555555555556\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Best score on train_val: 0.9055555555555556\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Linear pareto front program index: 10\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 28: New program candidate index: 12\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 29: No merge candidates found\n", - "2025/08/13 22:43:11 INFO dspy.teleprompt.gepa.gepa: Iteration 29: Selected program 9 score: 0.9\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.25 / 5 (85.0%): 100%|██████████| 5/5 [00:10<00:00, 2.13s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:43:21 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:44:58 INFO dspy.teleprompt.gepa.gepa: Iteration 29: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings, e.g., [\"query 1\", \"query 2\"].\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "- Mix site-scoped queries (official docs, GitHub, Stack Overflow) with broader web queries.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks, exact operations, desired outcomes, constraints, and edge cases. Ask for end-to-end examples that include runnable code and visible outputs (e.g., print(result), print(result.content), streaming token prints).\n", - "2) Extract and embed exact identifiers from the question and common adjacent APIs. Include literal class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages. Examples to use verbatim or as plausible alternatives:\n", - " - LangChain classes/chains/agents/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, CSVLoader, vector_store.as_retriever(search_kwargs={'k': ...}), Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory.\n", - " - Prompting and output parsing: ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.get_format_instructions, StructuredOutputParser.parse, StructuredOutputParser for list outputs, create_structured_chat_agent.\n", - " - Messages/IO: BaseMessage, AIMessage, invoke(...), .content, input_documents=[doc], Document, Document.page_content.\n", - " - OpenAI/Azure/HF: ChatOpenAI (langchain_openai), OpenAI vs AzureOpenAI vs AzureChatOpenAI, langchain_openai.AzureOpenAI, azure_deployment (or deployment_name), OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n", - " - Streaming/callbacks specifics: streaming=True, callbacks=[{handleLLMNewToken(...) ...}], pass callbacks to the LLM instance (not just chain), BufferMemory(returnMessages=True), ConversationBufferMemory(memory_key=\"chat_history\", input_key=\"human_input\").\n", - " - HF and PEFT: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapter loading/merging; trust_remote_code; pipeline vs AutoModel; missing config.json, tokenizer.json, adapter_config.json; “Pipeline cannot infer suitable model classes”.\n", - " - Vector stores: Chroma, Pinecone, FAISS; embedding function reuse on reload.\n", - " - DB drivers/URIs:\n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server\" (quote_plus encoded).\n", - " - Exact/likely error strings: \"invalid_request_error\", \"internal error\", \"500\", \"value is not a valid dict (type=type_error.dict)\", \"Pipeline cannot infer suitable model classes\", tokenization/config missing files, LoRA/PEFT adapter issues, model name typos or trailing spaces like \"text-davinci-003 \" causing \"invalid_request_error\".\n", - "\n", - "3) Mention relevant library/framework names and versions when likely implicated:\n", - " - LangChain 0.1.x and 0.2.x; module split across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters.\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17/18 for SQL Server.\n", - " - Vector stores: Chroma, Pinecone, FAISS.\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel.\n", - " - Ollama/ChatOllama, VertexAI, Llama 2/3.\n", - "\n", - "4) Vary solution angles to increase recall:\n", - " - Compare agent vs chain framing (ReAct agent, SQL agent, retriever tool, vector store tool).\n", - " - Memory integration and prompt customization (ConversationBufferMemory with returnMessages=True; override condense_question_prompt and qa_prompt; use ChatPromptTemplate).\n", - " - Callbacks vs hooks; pass callbacks to the correct component (often the LLM instance); ensure BaseCallbackHandler method signatures for your LangChain version (e.g., on_llm_end(self, response, **kwargs)).\n", - " - Retriever configuration and performance tuning (search_kwargs={'k': ...}, metadata reflection limits).\n", - " - Migration/deprecations: imports moved to langchain_openai or langchain_community; API changes/renames in 0.1.x/0.2.x; chat vs completion models.\n", - " - Alternatives when the approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools; custom LLM subclass for Hugging Face + PEFT; use Ollama or VertexAI substitution).\n", - " - End-to-end streaming demos for JS/Python showing handleLLMNewToken callbacks and visible token output.\n", - "\n", - "5) Cover multiple interpretations and troubleshooting paths:\n", - " - Wrong class/API usage (OpenAI vs AzureOpenAI/AzureChatOpenAI incompatibilities; chat vs completion models; ChatOpenAI vs OpenAI in JS/Python).\n", - " - Incorrect arguments/signatures; trailing spaces/misspelled model names causing \"invalid_request_error\".\n", - " - Version mismatches; missing installs (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver setup; peft/accelerate/sentencepiece for HF).\n", - " - Cloud configuration mistakes: Azure endpoint/deployment, regional endpoints, api_version.\n", - " - SQL performance pitfalls: metadata reflection on all tables by SQLDatabase; mitigate with include_tables, table_info hints, sample_rows_in_table_info; restrict schemas.\n", - " - Prompt formatting across models; Llama 2/3 tool-use prompt compatibility.\n", - " - Chroma persistence specifics: set persist_directory on create and load; call .persist(); reload with the same embedding function; avoid empty/duplicate collections; verify collection name.\n", - " - Document creation/usage: build Document(page_content=...); include metadata; wrap single Document in a list as input_documents=[doc]; many chains expect a list, not a single Document/tuple.\n", - " - HTML loading/splitting: UnstructuredHTMLLoader returns Document; pass doc.page_content (not str(Document)) into HTMLHeaderTextSplitter; prefer requests + BeautifulSoup to fetch/clean; handle encoding/unicode cleanup; remove boilerplate; use RecursiveCharacterTextSplitter with separators/keep_separator to keep headings with following paragraphs; chunk_size around 20000 and chunk_overlap; include page title in metadata.\n", - " - Output handling: invoke(...) returns BaseMessage/AIMessage; print(result); print(result.content) for text outputs; show visible outputs in examples.\n", - " - HF/PEFT specifics: load base model + LoRA adapter with PeftModel; merge and save; properly generate config.json/tokenizer files; understand when HuggingFaceHub vs transformers pipeline is appropriate; fix “Pipeline cannot infer suitable model classes”.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (blogs, tutorials, community posts, “end-to-end example”, “step-by-step”, “full code sample”).\n", - "\n", - "Additional domain-specific guidance to incorporate in the queries\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint.\n", - " - For JS, use ChatOpenAI with streaming: true, configure callbacks array with handleLLMNewToken, and use ChatPromptTemplate and BufferMemory(returnMessages: true).\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text; demonstrate print(result.content).\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "\n", - "- Memory and custom prompts for retrieval chat:\n", - " - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate/ChatPromptTemplate).\n", - " - Compare to RetrievalQA.from_chain_type; include demo code with a sample user query and printed outputs showing memory/prompt effects.\n", - "\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; encode driver name with quote_plus; set include_tables, table_info, sample_rows_in_table_info to limit reflection; optionally restrict schemas; discuss reflection performance.\n", - " - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17/18 for SQL Server); connection timeout, Encrypt, TrustServerCertificate parameters.\n", - "\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required; verify BaseCallbackHandler method signatures matching your LangChain version (e.g., on_llm_end(self, response, **kwargs)); do not rely on chain-level callbacks when model-level is needed.\n", - "\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust chat templates for Llama 2/3; ensure agents’ tool-use prompt compatibility.\n", - "\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on both save and load; call .persist(); reload with the same embedding function; troubleshoot empty/duplicated collections.\n", - "\n", - "- CSV/HTML loading and Documents:\n", - " - CSVLoader builds Document.page_content from non-metadata columns; embeddings derive from page_content.\n", - " - For HTML: prefer requests + BeautifulSoup to fetch/clean; ensure you pass doc.page_content; handle encoding and weird characters; keep headings with following paragraphs; use RecursiveCharacterTextSplitter; store metadata (page title).\n", - "\n", - "- Output parsers:\n", - " - StructuredOutputParser/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model to return an array of objects; parse(response.content).\n", - "\n", - "- Versioning/migration:\n", - " - Many integrations moved to langchain_community; OpenAI classes moved to langchain_openai; consult migration guides and deprecation notes for 0.1.x/0.2.x.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Each query 12–25+ words, concrete, and task-oriented.\n", - "- Mix of site-scoped (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, parameters, environment variables, driver URIs, and error messages from the user’s question.\n", - "- Vary solution angles: API usage, end-to-end demos with visible output, troubleshooting, performance, and migration; propose viable alternatives where appropriate (e.g., custom LLM subclass for HF+PEFT, switch to ChatOpenAI for streaming).\n", - "2025/08/13 22:45:10 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n", - "2025/08/13 22:45:10 INFO dspy.teleprompt.gepa.gepa: Iteration 29: New subsample score is not better, skipping\n", - "2025/08/13 22:45:10 INFO dspy.teleprompt.gepa.gepa: Iteration 30: Selected program 9 score: 0.9\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.13s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:45:21 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:46:52 INFO dspy.teleprompt.gepa.gepa: Iteration 30: Proposed new text for query_writer: You are given a user’s technical question about LangChain, LLMs, databases, vector stores, parsers, or related integrations. Your job is to produce a diverse, highly targeted set of long search queries that will help find the most relevant, authoritative resources to solve the user’s problem end-to-end.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings, e.g., [\"query 1\", \"query 2\"].\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "- Mix site-scoped queries (official docs, GitHub, Stack Overflow) with broader web queries.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks, exact operations, desired outcomes, constraints, and edge cases. Ask for end-to-end examples that include runnable code and visible outputs (e.g., print(result.content), print(result)).\n", - "\n", - "2) Extract and embed exact identifiers from the question and common adjacent APIs. Include literal class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages. Examples to use verbatim or as plausible alternatives:\n", - " - LangChain classes/chains/agents/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, CSVLoader, vector_store.as_retriever(search_kwargs={'k': ...}), Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory.\n", - " - Prompting and output parsing: ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.get_format_instructions, StructuredOutputParser.parse, StructuredOutputParser for list outputs, create_structured_chat_agent.\n", - " - Messages/IO: BaseMessage, AIMessage, invoke(...), .content, input_documents=[doc], Document, Document.page_content.\n", - " - OpenAI/Azure/HF: ChatOpenAI (langchain_openai), OpenAI vs AzureOpenAI vs AzureChatOpenAI, langchain_openai.AzureOpenAI, azure_deployment (or deployment_name), OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n", - " - HF and PEFT: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapter loading/merging; trust_remote_code; pipeline vs AutoModel; missing config.json, tokenizer.json, adapter_config.json.\n", - " - Vector stores: Chroma, Pinecone, FAISS; embedding function reuse on reload.\n", - " - DB drivers/URIs:\n", - " \"postgresql+psycopg2://user:pass@host:5432/db\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\",\n", - " \"mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server\" (quote_plus encoded).\n", - " - Exact/likely error strings: \"invalid_request_error\", \"internal error\", \"500\", \"value is not a valid dict (type=type_error.dict)\", \"Pipeline cannot infer suitable model classes\", tokenization/config missing files, LoRA/PEFT adapter issues, model name typos or trailing spaces like \"text-davinci-003 \" causing \"invalid_request_error\".\n", - "\n", - "3) Mention relevant library/framework names and versions when likely implicated:\n", - " - LangChain 0.1.x and 0.2.x; module split across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters.\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17/18 for SQL Server.\n", - " - Vector stores: Chroma, Pinecone, FAISS.\n", - " - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel.\n", - " - Ollama/ChatOllama, VertexAI, Llama 2/3.\n", - "\n", - "4) Vary solution angles to increase recall:\n", - " - Agent vs chain framing (ReAct agent, SQL agent, retriever tool, vector store tool).\n", - " - Memory integration and prompt customization (ConversationBufferMemory; override condense_question_prompt and qa_prompt).\n", - " - Callbacks vs hooks; pass callbacks to correct components; BaseCallbackHandler signatures.\n", - " - Retriever configuration and performance tuning (search_kwargs={'k': ...}, metadata reflection limits).\n", - " - Migration/deprecations: imports moved to langchain_openai or langchain_community; API changes/renames in 0.1.x/0.2.x.\n", - " - Alternatives when the approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools; custom LLM subclass for HF+PEFT; Ollama substitution for OpenAI; different vector stores).\n", - "\n", - "5) Cover multiple interpretations and troubleshooting paths:\n", - " - Wrong class/API usage (OpenAI vs AzureOpenAI/AzureChatOpenAI incompatibilities; chat vs completion models).\n", - " - Incorrect arguments/signatures; trailing spaces/misspelled model names causing \"invalid_request_error\".\n", - " - Version mismatches; missing installs (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver setup).\n", - " - Cloud configuration mistakes: Azure endpoint/deployment, regional endpoints, api_version.\n", - " - SQL performance pitfalls: SQLDatabase metadata reflection on all tables causing slow init; mitigate with include_tables, table_info, sample_rows_in_table_info, and limiting reflection; restrict schemas; consider SQL agents; optionally reflect specific schemas only.\n", - " - Prompt formatting across models; Llama 2/3 tool-use prompt compatibility; adjust prompts for Ollama/ChatOllama.\n", - " - Chroma persistence specifics: set persist_directory on create and load; call .persist(); reload with the same embedding function; avoid empty/duplicate collections; verify collection name.\n", - " - Document creation/usage: build Document(page_content=...); add metadata; wrap single Document in a list as input_documents=[doc].\n", - " - HTML loading/splitting: UnstructuredHTMLLoader returns Document; HTMLHeaderTextSplitter expects raw text; pass doc.page_content (not str(Document)); handle encoding/unicode cleanup; BeautifulSoup cleanup; use RecursiveCharacterTextSplitter with keep_separator to keep headings with following paragraphs; include page title in metadata.\n", - " - Output handling: invoke(...) returns BaseMessage/AIMessage; print(result); print(result.content) for text; include runnable examples that produce visible outputs.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (blogs, tutorials, community posts, “end-to-end example”, “step-by-step”, “full code sample”).\n", - "\n", - "Domain-specific guidance to incorporate in the queries\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access .content for text. Include explicit queries about print(result.content) when users see “no output”.\n", - " - Trailing spaces in model names (e.g., \"text-davinci-003 \") can cause “invalid_request_error”.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare to RetrievalQA.from_chain_type; include demo code with a sample user query and printed outputs showing memory/prompt effects.\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\"; encode driver name with quote_plus; set include_tables, table_info, sample_rows_in_table_info to limit reflection; restrict to schemas; discuss reflection performance on large databases.\n", - " - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17/18 for SQL Server); connection timeout, Encrypt, TrustServerCertificate parameters.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required; verify BaseCallbackHandler method signatures matching the LangChain version.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; or implement a custom LLM subclass to call a local REST API like localhost:11434/api/generate; adjust prompts for Llama 2/3.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on both save and load; call .persist(); reload with the same embedding function; troubleshoot empty/duplicated collections.\n", - "- CSV/HTML loading and Documents:\n", - " - CSVLoader builds Document.page_content from non-metadata columns; embeddings derive from page_content.\n", - " - For HTML: prefer requests + BeautifulSoup to fetch/clean; ensure you pass doc.page_content; handle encoding; use RecursiveCharacterTextSplitter with appropriate chunk_size and chunk_overlap; store metadata (page title).\n", - "- Output parsers:\n", - " - StructuredOutputParser/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model to return an array of objects; include explicit queries about how to get a list of dictionaries and parse(response.content).\n", - "- JS Document metadata enrichment:\n", - " - CharacterTextSplitter.createDocuments accepts a second argument (array of metadata objects) to merge into each Document’s metadata; include queries showing how to add custom fields.\n", - "\n", - "Migration/deprecation notes to include\n", - "- Many integrations moved to langchain_community; OpenAI classes moved to langchain_openai; consult migration guides and deprecation notes for 0.1.x/0.2.x.\n", - "- Imports and signatures changed; ensure queries include exact new import paths and examples for both Python and JavaScript where relevant.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes) with site scoping.\n", - "2) End-to-end examples/tutorials and code samples, including queries that ask for runnable demos with visible outputs (e.g., print(result.content)).\n", - "3) Troubleshooting known errors and GitHub issues/discussions, embedding exact error strings from the user.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- Each query 12–25+ words, concrete, and task-oriented.\n", - "- Mix of site-scoped (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, parameters, environment variables, driver URIs, and error messages from the user’s question.\n", - "- Vary solution angles: API usage, end-to-end demos with visible output, troubleshooting, performance, and migration; propose viable alternatives where appropriate.\n", - "2025/08/13 22:47:02 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n", - "2025/08/13 22:47:02 INFO dspy.teleprompt.gepa.gepa: Iteration 30: New subsample score is not better, skipping\n", - "2025/08/13 22:47:02 INFO dspy.teleprompt.gepa.gepa: Iteration 31: Selected program 7 score: 0.8777777777777779\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:11<00:00, 2.21s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:47:14 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:48:54 INFO dspy.teleprompt.gepa.gepa: Iteration 31: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention and the exact operations they are trying to do, including desired outcomes, constraints, and edge cases.\n", - "\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages (use quotes) from the question.\n", - " - When relevant, include identifiers such as: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, Document.page_content, BaseMessage, AIMessage, .content, HuggingFaceHub, HuggingFaceHubEmbeddings, Chroma.from_texts, .persist(), as_retriever(search_kwargs={'k': ...}), OpenAI(model_name='text-davinci-003'), ChatOpenAI, AzureOpenAI/AzureChatOpenAI, langchain_openai.ChatOpenAI, langchain_openai.AzureOpenAI, SQLDatabaseSequentialChain, SQLDatabaseChain, create_structured_chat_agent, Top-k parameters, include_tables, table_info, sample_rows_in_table_info.\n", - " - Include driver URIs and connection strings exactly as shown or plausible alternatives, e.g., \"postgresql+psycopg2://user:pass@host:5432/db\", \"mssql+pyodbc://user:pass@server/db?driver=ODBC Driver 17 for SQL Server\", and quote_plus encoded forms.\n", - " - Include environment variables when applicable: OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and Azure deployment names.\n", - " - Include exact error strings the user shows or might encounter, e.g., \"invalid_request_error\", \"Pipeline cannot infer suitable model classes\", tokenization/config issues like missing config.json, tokenizer.json, adapter_config.json, or LoRA/PEFT adapter loading problems.\n", - "\n", - "3) Mention relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x and the module split across langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters; JS docs: site:js.langchain.com\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - " - Hugging Face: transformers/peft/accelerate/sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA/PEFT adapters\n", - "\n", - "4) Use task-oriented phrasing and multiple solution angles to increase recall:\n", - " - Vary agent vs chain framing (“agent”, “tool”, “retriever tool”, “vector store tool”, “ReAct”, “structured chat agent”).\n", - " - Explore memory, prompt customization, callbacks vs hooks, retriever configuration, top_k tuning, and metadata reflection settings.\n", - " - Include migration/deprecation terms: module split, imports moved to langchain_openai or langchain_community, API changes, class renames, deprecations in 0.1.x/0.2.x.\n", - "\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - Wrong class or API (e.g., AzureOpenAI vs OpenAI incompatibility; Chat vs completion LLM classes).\n", - " - Incorrect arguments or signatures; wrong parameter names; trailing spaces in model names (“text-davinci-003 ”).\n", - " - Version mismatches; deprecated APIs; install requirements not met (psycopg2 vs psycopg2-binary; pyodbc; ODBC driver encoding; DSN issues).\n", - " - Configuration mistakes: environment variable setup, Azure endpoint/deployment, API versions.\n", - " - Performance pitfalls: SQL metadata reflection on all tables; how to limit with include_tables, table_info hints, sample_rows_in_table_info; alternatives like SQL agents or limiting schema scope.\n", - " - Prompt formatting for different models; memory integration; passing callbacks to the correct component; BaseCallbackHandler method signatures and kwargs handling.\n", - " - Embeddings/vector store persistence; using the same embedding function on reload; Chroma persist_directory and .persist(); duplicate/empty collections.\n", - " - Document creation and usage: building Document(page_content=...), adding metadata, ensuring input_documents is a list of Documents (e.g., input_documents=[doc]).\n", - " - HF model loading with base + LoRA adapter; trust_remote_code; custom LangChain LLM subclass for custom pipelines or REST backends.\n", - "\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JavaScript docs (if relevant): site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include broader web queries (no site:) for blogs, tutorials, and community posts\n", - "\n", - "Domain-specific guidance to incorporate (LangChain and adjacent tooling)\n", - "- ChatOpenAI / OpenAI usage in LangChain 0.1.x/0.2.x:\n", - " - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI/AzureChatOpenAI; pass azure_deployment, api_version, and endpoint for Azure.\n", - " - invoke(...) returns a BaseMessage/AIMessage; access the .content attribute to view text (print(result.content)).\n", - " - Trailing spaces in model names can cause “invalid_request_error”.\n", - "\n", - "- JavaScript streaming and memory:\n", - " - For JS, use ChatOpenAI with streaming: true; pass callbacks with handleLLMNewToken at the model level if required by version.\n", - " - Use ChatPromptTemplate to structure system/user messages; configure BufferMemory with returnMessages: true to preserve and stream chat history.\n", - "\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, consider ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; attach ConversationBufferMemory; override prompts; demonstrate examples that show both memory and prompt influence.\n", - "\n", - "- SQL agents and databases:\n", - " - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n", - " - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2://...\") or MSSQL \"mssql+pyodbc://...\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary for Postgres, pyodbc for MSSQL); encode ODBC driver name with quote_plus if needed.\n", - " - SQLDatabaseSequentialChain can be slow due to metadata reflection across all tables; mitigate with include_tables, table_info hints, sample_rows_in_table_info, or lazy/limited reflection; consider SQLDatabaseChain or SQL agents as alternatives.\n", - "\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n", - "\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures; for example, on_llm_end should accept response and **kwargs in newer versions; review run manager kwargs and version-specific changes.\n", - "\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; or implement a custom LLM wrapper for a REST API like localhost:11434/api/generate.\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles; chat templates) when replacing OpenAI; ensure compatibility with agents’ tool-use prompts.\n", - "\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory both on create and reload; call .persist(); ensure the same embedding function is used on load; troubleshoot empty/duplicated collections.\n", - "\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs excluding metadata_columns; embeddings are computed on page_content.\n", - " - Cleaning metadata (e.g., removing 'source' or 'row') affects only metadata, not the embedded text.\n", - "\n", - "- HTML loading and splitting:\n", - " - UnstructuredHTMLLoader returns Document objects; HTMLHeaderTextSplitter expects raw HTML/text segments; ensure you pass the correct string (e.g., doc.page_content) and handle encoding to avoid weird characters.\n", - " - Consider pre-processing HTML with requests + BeautifulSoup (strip scripts/styles, normalize whitespace, fix encodings) before splitting.\n", - " - Use RecursiveCharacterTextSplitter or HTMLHeaderTextSplitter to keep headers with following paragraphs; ensure chunking keeps title and subsequent paragraph together; include metadata like page title.\n", - " - Control chunk sizes (e.g., max 20K characters) and save chunks with metadata for downstream training.\n", - "\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema can describe nested JSON outputs, including lists of dicts; instruct the model via format_instructions to return an array of objects.\n", - " - Consider PydanticOutputParser with List[YourModel] or JsonOutputParser for list-of-objects schemas; parse(response.content).\n", - "\n", - "- Documents and chains:\n", - " - When a method expects a list of Documents (e.g., input_documents), wrap a single Document as input_documents=[doc]; avoid passing a bare Document or tuple to prevent \"tuple object has no attribute 'page_content'\".\n", - " - Create Document(page_content=\"...\", metadata={...}) correctly and verify types before passing to chains.\n", - "\n", - "- JavaScript Document metadata tip:\n", - " - CharacterTextSplitter.createDocuments accepts a second argument (array of objects). Properties from that array are merged into each Document’s metadata, enabling custom fields without manual mutation.\n", - "\n", - "- Versioning/migration:\n", - " - Be explicit about the modular split and import moves (use langchain_openai.ChatOpenAI / AzureOpenAI; many integrations moved to langchain_community); consult migration guides and deprecation notes.\n", - "\n", - "Nuggets from common issues to explicitly target with queries when relevant\n", - "- CharacterTextSplitter.createDocuments (JS): pass a second argument array whose properties are merged into metadata for each returned Document.\n", - "- CSVLoader vectorization: embeddings are created from Document.page_content built by joining non-metadata columns.\n", - "- HTML splitting: preprocess with BeautifulSoup; keep headers with following paragraphs; handle encoding; pass correct string to splitters.\n", - "- Callbacks: pass callbacks to the LLM instance; ensure on_llm_end signature matches current LangChain version (response, **kwargs).\n", - "- Ollama + SQL agent: load Llama2 via Ollama/ChatOllama; replace OpenAI in create_sql_agent/SQLDatabaseToolkit; adjust prompts for Llama chat format.\n", - "- Conversational retrieval with memory and custom prompts: override condense-question and answer prompts; add ConversationBufferMemory; include examples demonstrating both memory and prompt effects.\n", - "- JS streaming: use ChatOpenAI with streaming: true; handle tokens via handleLLMNewToken; configure BufferMemory(returnMessages: true) and ChatPromptTemplate for structured chats.\n", - "\n", - "Coverage requirements for each set of queries\n", - "1) Official docs/reference queries (API usage, parameters, signatures, migration notes).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions (include exact error strings in quotes).\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., module split, class renames).\n", - "\n", - "Additional reminders\n", - "- Include queries that propose viable alternatives if the user’s approach is problematic (e.g., using SQL agents instead of SQLDatabaseSequentialChain; writing a custom LLM wrapper for a REST HF/Ollama backend).\n", - "- For APIs expecting lists, include queries emphasizing wrapping a single Document in a list (e.g., input_documents=[doc]).\n", - "- For OpenAI/AzureOpenAI issues, include environment variable setup and configuration queries; explicitly mention OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION, and azure deployment names.\n", - "- Consider Python vs JS docs when the user context suggests either; include site scoping accordingly.\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries.\n", - "- A mix of site-scoped queries (docs, GitHub, Stack Overflow) and general web queries.\n", - "- Embed exact code identifiers, literals, and any error messages from the user’s question in multiple queries.\n", - "- Use varied phrasing and explore multiple solution avenues and tool choices.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:49:05 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 22:49:05 INFO dspy.teleprompt.gepa.gepa: Iteration 31: New subsample score is not better, skipping\n", - "2025/08/13 22:49:05 INFO dspy.teleprompt.gepa.gepa: Iteration 32: Selected program 3 score: 0.8444444444444444\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:10<00:00, 2.06s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:49:15 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 22:50:34 INFO dspy.teleprompt.gepa.gepa: Iteration 32: Proposed new text for query_writer: You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n", - "\n", - "Output format rules\n", - "- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n", - "- Use a JSON-like Python list of strings.\n", - "- Each query must be long and detailed (aim for 12–25+ words).\n", - "- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n", - "\n", - "How to craft the queries\n", - "1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks they mention.\n", - "2) Extract and embed exact class/function names, parameters, method calls, import paths, driver URIs, CLI flags, and any literal error messages (quoted) from the question. Examples to embed when present:\n", - " - create_sql_agent, AgentExecutor, AgentType.ZERO_SHOT_REACT_DESCRIPTION, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\"\n", - " - from langchain_openai import ChatOpenAI, from langchain_community.document_loaders import UnstructuredHTMLLoader, from langchain_text_splitters import HTMLHeaderTextSplitter\n", - " - invoke(...), result.content, BaseMessage, AIMessage, ConversationBufferMemory, PromptTemplate\n", - " - Chroma persist_directory, .persist(), FAISS, Pinecone\n", - " - Ollama / ChatOllama endpoints like localhost:11434/api/generate\n", - " - Exact quoted error text (e.g., \"value is not a valid dict (type=type_error.dict)\", \"invalid_request_error\")\n", - "3) Include relevant library/framework names and versions if known or commonly implicated:\n", - " - LangChain 0.1.x, 0.2.x; module split: langchain, langchain-openai, langchain-community, langchain-core, langchain-text-splitters\n", - " - OpenAI vs AzureOpenAI vs ChatOpenAI (langchain_openai), ChatOllama/Ollama, VertexAI, Llama 2/3\n", - " - SQLAlchemy 2.x, psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17 for SQL Server\n", - " - Vector stores: Chroma, Pinecone, FAISS\n", - "4) Use task-oriented phrasing and multiple solution angles. Employ synonyms/alternate phrasings to increase recall, such as:\n", - " - “agent” vs “chain”; “callback” vs “hook”; “tool” vs “retriever tool” vs “vector store tool”\n", - " - “ReAct” vs “structured chat agent”; “add memory” vs “conversation history” vs “ConversationBufferMemory”\n", - " - “custom prompt” vs “override condense question prompt” vs “answer prompt template”\n", - " - “disable metadata reflection”, “limit tables”, “table_info hints”, “top_k tuning”\n", - " - “print BaseMessage content”, “AIMessage .content not showing”\n", - "5) Cover multiple plausible interpretations and troubleshooting paths:\n", - " - API misuse or wrong class (e.g., AzureOpenAI vs ChatOpenAI; ChatOllama vs OpenAI)\n", - " - Incorrect arguments or signature usage; wrong method return types (invoke returning BaseMessage/AIMessage; use .content)\n", - " - Version mismatch or deprecated API due to LangChain module split; import path changes\n", - " - Missing installs or wrong driver (psycopg2/psycopg2-binary, pyodbc, ODBC Driver 17)\n", - " - Configuration issues (OPENAI_API_KEY env var, Azure deployment_name, Ollama localhost endpoint)\n", - " - Performance pitfalls (SQLDatabaseSequentialChain reflection slowness; vector store indexing parameters)\n", - " - Prompt formatting for Llama 2/3 (system/instruction roles) and agents/text-to-SQL\n", - " - Document loading and splitting correctness (UnstructuredHTMLLoader vs BeautifulSoup; HTMLHeaderTextSplitter vs RecursiveCharacterTextSplitter; preserving metadata and header+paragraph grouping; 20K character chunking)\n", - "6) Scope some queries to authoritative sources and Q&A:\n", - " - Official docs/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n", - " - API reference: site:api.python.langchain.com\n", - " - JS/TS docs when relevant: site:js.langchain.com\n", - " - GitHub issues/discussions: site:github.com with “issues” or “discussions”\n", - " - Stack Overflow: site:stackoverflow.com\n", - " - Also include generic web queries (no site:) for blogs, tutorials, and community posts.\n", - "\n", - "Domain-specific guidance to incorporate when relevant\n", - "- ChatOpenAI / OpenAI usage (LangChain 0.1.x/0.2.x):\n", - " - Use langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI vs ChatOllama/Ollama.\n", - " - invoke(...) returns a BaseMessage/AIMessage; to view text, access the .content attribute (e.g., result = llm.invoke(...); print(result.content)).\n", - " - Watch for trailing spaces in model names causing \"invalid_request_error\".\n", - " - Ensure OPENAI_API_KEY is set or passed; verify environment variable loading with python-dotenv.\n", - "- Memory and custom prompts for retrieval chat:\n", - " - To combine memory and retrieval, prefer ConversationalRetrievalChain; customize condense-question and answer prompts via PromptTemplate.\n", - " - Compare RetrievalQA.from_chain_type vs ConversationalRetrievalChain; how to attach ConversationBufferMemory; how to override prompts and history handling.\n", - "- SQL agents and databases:\n", - " - Use create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2://...\"), AgentExecutor; for MSSQL URIs: \"mssql+pyodbc://...?...driver=ODBC Driver 17 for SQL Server\".\n", - " - Ensure drivers installed (psycopg2/psycopg2-binary, pyodbc). Include examples with correct DSNs and credentials placeholders.\n", - " - Beware ValidationError like \"value is not a valid dict (type=type_error.dict)\" from mis-typed llm/toolkit parameters or wrong LLM class; consider OpenAI vs AzureOpenAI compatibility.\n", - " - SQLDatabaseSequentialChain can be slow from metadata reflection; include/limit tables, lazy reflection, or use SQL agents; provide table_info hints; tune top_k.\n", - "- ReAct agents with retrieval:\n", - " - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; or compare with ConversationalRetrievalChain.\n", - "- Callbacks:\n", - " - Pass callbacks to the LLM instance when required, not just to chains.\n", - " - Verify BaseCallbackHandler method signatures (e.g., on_llm_end(response, **kwargs)) and kwargs handling; note provider-specific nuances like VertexAI.\n", - "- Ollama / Llama models:\n", - " - Use LangChain’s Ollama or ChatOllama wrappers when replacing OpenAI in agents/chains, or implement a custom LLM wrapper for local REST APIs (e.g., localhost:11434/api/generate).\n", - " - Adjust prompt format for Llama 2/3 (system/instruction roles) for agents and text-to-SQL; confirm model names and base_url configuration.\n", - "- Chroma vector DB persistence:\n", - " - Use persist_directory on create and reload; call .persist(); ensure same embedding function on load; troubleshoot empty/duplicated collections.\n", - "- CSVLoader and Documents:\n", - " - CSVLoader builds Document.page_content by joining row key-value pairs; embeddings computed on page_content; metadata_columns excluded.\n", - " - Character/RecursiveTextSplitter (JS/py) createDocuments accepts metadatas merged into Document.metadata.\n", - "- Output parsers:\n", - " - StructuredOutputParser and ResponseSchema: specify schemas for arrays/lists by describing expected nested JSON in the schema/prompt; parse(response.content).\n", - "- HTML/document splitting:\n", - " - For HTML, consider requests + BeautifulSoup to clean content; UnstructuredHTMLLoader vs bs4 parsing; HTMLHeaderTextSplitter configuration; RecursiveCharacterTextSplitter to enforce max 20K characters while keeping header + following paragraphs together; attach page title in Document.metadata; save chunks (JSONL/CSV/vector store).\n", - "\n", - "Coverage requirements for each set of queries (aim to include a mix)\n", - "1) Official docs/reference queries (API usage, parameters, signatures).\n", - "2) End-to-end examples/tutorials and code samples.\n", - "3) Troubleshooting known errors and GitHub issues/discussions.\n", - "4) Stack Overflow Q&A for similar symptoms.\n", - "5) Migration/deprecation notes for breaking changes (e.g., LangChain module split, class renames).\n", - "\n", - "Quality checks before submitting\n", - "- 10–15 distinct, detailed queries; each 12–25+ words.\n", - "- Include some site-scoped queries (docs, GitHub issues, Stack Overflow) and some generic web queries.\n", - "- Embed exact code identifiers, configuration strings, and quoted error messages from the user’s question.\n", - "- Use multiple phrasings/synonyms and explore diverse solution angles.\n", - "- Do not provide answers or code—only the list of search queries.\n", - "2025/08/13 22:50:46 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 22:50:46 INFO dspy.teleprompt.gepa.gepa: Iteration 32: New subsample score is not better, skipping\n" - ] - } - ], - "source": [ - "import dspy\n", - "\n", - "import logging\n", - "\n", - "# Simple setup for Jupyter\n", - "logging.basicConfig(level=logging.INFO, force=True)\n", - "logging.getLogger('dspy.teleprompt.gepa').setLevel(logging.INFO)\n", - "logging.getLogger('gepa').setLevel(logging.INFO)\n", - "\n", - "# SILENCE the noisy HTTP loggers\n", - "logging.getLogger('httpx').setLevel(logging.WARNING) # Only warnings and errors\n", - "logging.getLogger('openai').setLevel(logging.WARNING)\n", - "logging.getLogger('weaviate').setLevel(logging.WARNING)\n", - "logging.getLogger('httpcore').setLevel(logging.WARNING)\n", - "\n", - "reflection_lm = dspy.LM(\n", - " model=\"gpt-5\",\n", - " temperature=1.0,\n", - " max_tokens=32_000\n", - ")\n", - "\n", - "optimizer = dspy.GEPA(\n", - " metric=metric_for_gepa,\n", - " max_metric_calls=500,\n", - " reflection_lm=reflection_lm,\n", - " reflection_minibatch_size=5,\n", - " use_merge=True,\n", - " num_threads=8\n", - ")\n", - "\n", - "# there are 30 samples in `trainset` to begin with\n", - "trainset=trainset[:15] # these are randomly sampled for Reflective Prompt Mutation\n", - "valset=trainset[15:] # these samples create the pareto frontier\n", - "\n", - "optimized_query_expander = optimizer.compile(\n", - " query_writer,\n", - " trainset=trainset,\n", - " valset=valset\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "4af29b20", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "GEPA run is finished!\n" - ] - } - ], - "source": [ - "print(\"GEPA run is finished!\")" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "c4395f8d", - "metadata": {}, - "outputs": [], - "source": [ - "optimized_query_expander.save(\"gepa_optimized_multi_query_writer.json\")" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "67a037a9", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 14.81 / 20 (74.0%): 100%|██████████| 20/20 [00:44<00:00, 2.20s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 23:08:07 INFO dspy.evaluate.evaluate: Average Metric: 14.808333333333332 / 20 (74.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/plain": [ - "EvaluationResult(score=74.04, results=)" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "evaluator(optimized_query_expander, **dspy_evaluator_kwargs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": ".venv", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.4" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/optimization_runs/gepa_multi_query_writer_training_samples.jsonl b/optimization_runs/gepa_multi_query_writer_training_samples.jsonl deleted file mode 100644 index de7ddf3..0000000 --- a/optimization_runs/gepa_multi_query_writer_training_samples.jsonl +++ /dev/null @@ -1,30 +0,0 @@ -{"question": "I have the following LangChain code that checks the chroma vectorstore and extracts the answers from the stored docs - how do I incorporate a Prompt template to create some context , such as the following:\nsales_template = \"\"\"You are customer services and you need to help people.\n{context}\nQuestion: {question}\"\"\"\nSALES_PROMPT = PromptTemplate(\n template=sales_template, input_variables=[\"context\", \"question\"]\n)\n\nHow do I incorporate the above into the below?\n#Embedding Text Using Langchain\nfrom langchain.embeddings import SentenceTransformerEmbeddings\nembeddings = SentenceTransformerEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n\n# Creating Vector Store with Chroma DB\nfrom langchain.vectorstores import Chroma\n#db = Chroma.from_documents(docs, embeddings)\ndb = Chroma(persist_directory=\"./chroma_db\", embedding_function=embeddings)\n# docs = db3.similarity_search(query)\n# print(docs[0].page_content)\n\n#Using OpenAI Large Language Models (LLM) with Chroma DB\nimport os\nos.environ[\"OPENAI_API_KEY\"] = 'sk-12345678910'\n\nfrom langchain.chat_models import ChatOpenAI\nmodel_name = \"gpt-3.5-turbo\"\nllm = ChatOpenAI(model_name=model_name)\n\n#Extracting Answers from Documents\n\nfrom langchain.chains.question_answering import load_qa_chain\nchain = load_qa_chain(llm, chain_type=\"stuff\",verbose=True)\n\nquery = \"What does Neil do for work?\"\nmatching_docs = db.similarity_search(query)\nanswer = chain.run(input_documents=matching_docs, question=query)\nprint(answer)\n\n"} -{"question": "I'm trying to pass filters to redis retriever to do hybrid search on my embeddings (vector + metadata filtering). The following doesn't work! It fails to pass the filters and filters would always be None:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n search_kwargs=\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5}\",\n filter=\"(@launch:{false} @menu_text:(%%chicken%%))\"\n )\n\nI found another example and apparently filter expression should be pass as search_kwargs, but I can't figure out what should be the correct syntax. If I do it as follow:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n \"retriever_search_kwargs\":\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5, 'filter': '@menu_text:(%%chicken%%) @lunch:{true}'}\",\n}\n\nit generates this search query:\nsimilarity_search_by_vector > redis_query : (@content_vector:[VECTOR_RANGE $distance_threshold $vector] @menu_text:(%%chicken%%) @lunch:{true})=>{$yield_distance_as: distance}\nand fails with the following error:\nredis.exceptions.ResponseError: Invalid attribute yield_distance_as\nAny idea how to fix it?\nSystem Info:\nlangchain 0.0.346\nlangchain-core 0.0.10\npython 3.9.18\n"} -{"question": "I am generating chromba db which has vector embeddings for pdf different documents and I want to store them to avoid re computation every time for different inputs. Pickling and Json serialization does not seem to work for chroma object, importing from another file also makes the embedding script run again.\n"} -{"question": "I'm very new to LangChain, and I'm working with around 100-150 HTML files on my local disk that I need to upload to a server for NLP model training. However, I have to divide my information into chunks because each file is only permitted to have a maximum of 20K characters. I'm trying to use the LangChain library to do so, but I'm not being successful in splitting my files into my desired chunks.\nFor reference, I'm using this URL: http://www.hadoopadmin.co.in/faq/ Saved locally as HTML only.\nIt's a Hadoop FAQ page that I've downloaded as an HTML file onto my PC. There are many questions and answers there. I've noticed that sometimes, for some files, it gets split by a mere title, and another split is the paragraph following that title. But my desired output would be to have the title and the specific paragraph or following text from the body of the page, and as metadata, the title of the page.\nI'm using this code:\nfrom langchain_community.document_loaders import UnstructuredHTMLLoader\nfrom langchain_text_splitters import HTMLHeaderTextSplitter\n# Same Example with the URL http://www.hadoopadmin.co.in/faq/ Saved Locally as HTML Only\ndir_html_file='FAQ \u2013 BigData.html'\n\ndata_html = UnstructuredHTMLLoader(dir_html_file).load()\n\nheaders_to_split_on = [\n (\"h1\", \"Header 1\")]\nhtml_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)\nhtml_header_splits = html_splitter.split_text(str(data_html))\n\nBut is returning a bunch of weird characters and not splitting the document at all.\nThis is an output:\n[Document(page_content='[Document(page_content=\\'BigData\\\\n\\\\n\"You can have data without information, but you cannot have information without Big data.\"\\\\n\\\\nsaurabhmcakiet@gmail.com\\\\n\\\\n+91-8147644946\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n\\\\nToggle navigation\\\\n\\\\nHome\\\\n\\\\nBigData\\\\n\\\\n\\\\tOverview of BigData\\\\n\\\\tSources of BigData\\\\n\\\\tPros & Cons of BigData\\\\n\\\\tSolutions of BigData\\\\n\\\\nHadoop Admin\\\\n\\\\n\\\\tHadoop\\\\n\\\\t\\\\n\\\\t\\\\tOverview of HDFS\\\\n\\\\t\\\\tOverview of MapReduce\\\\n\\\\t\\\\tApache YARN\\\\n\\\\t\\\\tHadoop Architecture\\\\n\\\\t\\\\n\\\\n\\\\tPlanning of Hadoop Cluster\\\\n\\\\tAdministration and Maintenance\\\\n\\\\tHadoop Ecosystem\\\\n\\\\tSetup HDP cluster from scratch\\\\n\\\\tInstallation and Configuration\\\\n\\\\tAdvanced Cluster Configuration\\\\n\\\\tOverview of Ranger\\\\n\\\\tKerberos\\\\n\\\\t\\\\n\\\\t\\\\tInstalling kerberos/Configuring the KDC and Enabling Kerberos Security\\\\n\\\\t\\\\tConfigure SPNEGO Authentication for Hadoop\\\\n\\\\t\\\\tDisabled kerberos via ambari\\\\n\\\\t\\\\tCommon issues after Disabling kerberos via Ambari\\\\n\\\\t\\\\tEnable https for ambari Server\\\\n\\\\t\\\\tEnable SSL or HTTPS for Oozie Web UI\\\\n\\\\nHadoop Dev\\\\n\\\\n\\\\tSolr\\\\n\\\\t\\\\n\\\\t\\\\tSolr Installation\\\\n\\\\t\\\\tCommits and Optimizing in Solr and its use for NRT\\\\n\\\\t\\\\tSolr FAQ\\\\n\\\\t\\\\n\\\\n\\\\tApache Kafka\\\\n\\\\t\\\\n\\\\t\\\\tKafka QuickStart\\\\n\\\\t\\\\n\\\\n\\\\tGet last access time of hdfs files\\\\n\\\\tProcess hdfs data with Java\\\\n\\\\tProcess hdfs data with Pig\\\\n\\\\tProcess hdfs data with Hive\\\\n\\\\tProcess hdfs data with Sqoop/Flume\\\\n\\\\nBigData Architect\\\\n\\\\n\\\\tSolution Vs Enterprise Vs Technical Architect\u2019s Role and Responsibilities\\\\n\\\\tSolution architect certification\\\\n\\\\nAbout me\\\\n\\\\nFAQ\\\\n\\\\nAsk Questions\\\\n\\\\nFAQ\\\\n\\\\nHome\\\\n\\\\nFAQ\\\\n\\\\nFrequently\\\\xa0Asked Questions about Big Data\\\\n\\\\nMany questions about big data have yet to be answered in a vendor-neutral way. With so many definitions, opinions run the gamut. Here I will attempt to cut to the heart of the matter by addressing some key questions I often get from readers, clients and industry analysts.\\\\n\\\\n1) What is Big Data?\\\\n\\\\n1) What is Big Data?\\\\n\\\\nBig data\u201d is an all-inclusive term used to describe vast amounts of information. In contrast to traditional structured data which is typically stored in a relational database, big data varies in terms of volume, velocity, and variety.\\\\n\\\\nBig data\\\\xa0is characteristically generated in large volumes \u2013 on the order of terabytes or exabytes of data (starts with 1 and has 18 zeros after it, or 1 million terabytes) per individual data set.\\\\n\\\\nBig data\\\\xa0is also generated with high velocity \u2013 it is collected at frequent intervals \u2013 which makes it difficult to analyze (though analyzing it rapidly makes it more valuable).\\\\n\\\\nOr in simple words we can say \u201cBig Data includes data sets whose size is beyond the ability of traditional software tools to capture, manage, and process the data in a reasonable time.\u201d\\\\n\\\\n2) How much data does it take to be called Big Data?\\\\n\\\\nThis question cannot be easily answered absolutely. Based on the infrastructure on the market the lower threshold is at about 1 to 3 terabytes.\\\\n\\\\nBut using Big Data technologies can be sensible for smaller databases as well, for example if complex mathematiccal or statistical analyses are run against a database. Netezza offers about 200 built in functions and computer languages like Revolution R or Phyton which can be used in such cases.\\\\n\\\\\n\nMy Expected output will look something like this:\nOne chunk:\n\nFrequently Asked Questions about Big Data\n\nMany questions about big data have yet to be answered in a vendor-neutral way. With so many definitions, opinions run the gamut. Here I will attempt to cut to the heart of the matter by addressing some key questions I often get from readers, clients and industry analysts.\n\n1) What is Big Data?\n\u201cBig data\u201d is an all-inclusive term used to describe vast amounts of information. In contrast to traditional structured data which is typically stored in a relational database, big data varies in terms of volume, velocity, and variety. Big data is characteristically generated in large volumes \u2013 on the order of terabytes or exabytes of data (starts with 1 and has 18 zeros after it, or 1 million terabytes) per individual data set. Big data is also generated with high velocity \u2013 it is collected at frequent intervals \u2013 which makes it difficult to analyze (though analyzing it rapidly makes it more valuable).\nOr in simple words we can say \u201cBig Data includes data sets whose size is beyond the ability of traditional software tools to capture, manage, and process the data in a reasonable time.\u201d\n2) How much data does it take to be called Big Data?\nThis question cannot be easily answered absolutely. Based on the infrastructure on the market the lower threshold is at about 1 to 3 terabytes.\nBut using Big Data technologies can be sensible for smaller databases as well, for example if complex mathematical or statistical analyses are run against a database. Netezza offers about 200 built in functions and computer languages like Revolution R or Phyton which can be used in such cases.\n\nMetadata: FAQ\n\n\nAnother Chunck\n7) Where is the big data trend going?\nEventually the big data hype will wear off, but studies show that big data adoption will continue to grow. With a projected $16.9B market by 2015 (Wikibon goes even further to say $50B by 2017), it is clear that big data is here to stay. However, the big data talent pool is lagging behind and will need to catch up to the pace of the market. McKinsey & Company estimated in May 2011 that by 2018, the US alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.\nThe emergence of big data analytics has permanently altered many businesses\u2019 way of looking at data. Big data can take companies down a long road of staff, technology, and data storage augmentation, but the payoff \u2013 rapid insight into never-before-examined data \u2013 can be huge. As more use cases come to light over the coming years and technologies mature, big data will undoubtedly reach critical mass and will no longer be labeled a trend. Soon it will simply be another mechanism in the BI ecosystem.\n8) Who are some of the BIG DATA users?\nFrom cloud companies like Amazon to healthcare companies to financial firms, it seems as if everyone is developing a strategy to use big data. For example, every mobile phone user has a monthly bill which catalogs every call and every text; processing the sheer volume of that data can be challenging. Software logs, remote sensing technologies, information-sensing mobile devices all pose a challenge in terms of the volumes of data created. The size of Big Data can be relative to the size of the enterprise. For some, it may be hundreds of gigabytes, for others, tens or hundreds of terabytes to cause consideration.\n9) Data visualization is becoming more popular than ever.\nIn my opinion, it is absolutely essential for organizations to embrace interactive data visualization tools. Blame or thank big data for that and these tools are amazing. They are helping employees make sense of the never-ending stream of data hitting them faster than ever. Our brains respond much better to visuals than rows on a spreadsheet.\nCompanies like Amazon, Apple, Facebook, Google, Twitter, Netflix and many others understand the cardinal need to visualize data. And this goes way beyond Excel charts, graphs or even pivot tables. Companies like Tableau Software have allowed non-technical users to create very interactive and imaginative ways to visually represent information.\n\nMetadata: FAQ \n\nMy thought process is being able to gather all the information and split it into chunks, but I don't want titles without their following paragraphs separated, and I also want as much info as possible (max 20K characters) before creating another chunk.\nI would also like to save these chunks and their meta data. Is there a function in LangChain to do this?\nI am open to hearing not to do this in LangChain for efficiency reasons.\n"} -{"question": "I've searched all over langchain documentation on their official website but I didn't find how to create a langchain doc from a str variable in python so I searched in their GitHub code and I found this :\n doc=Document(\n page_content=\"text\",\n metadata={\"source\": \"local\"}\n )\n\n\nPS: I added the metadata attribute\nthen I tried using that doc with my chain:\nMemory and Chain:\nmemory = ConversationBufferMemory(memory_key=\"chat_history\", input_key=\"human_input\")\nchain = load_qa_chain(\n llm, chain_type=\"stuff\", memory=memory, prompt=prompt\n)\n\n\nthe call method:\n chain({\"input_documents\": doc, \"human_input\": query})\n\nprompt template:\ntemplate = \"\"\"You are a senior financial analyst analyzing the below document and having a conversation with a human.\n{context}\n{chat_history}\nHuman: {human_input}\nsenior financial analyst:\"\"\"\n\nprompt = PromptTemplate(\n input_variables=[\"chat_history\", \"human_input\", \"context\"], template=template\n)\n\nbut I am getting the following error:\nAttributeError: 'tuple' object has no attribute 'page_content'\n\n\nwhen I tried to check the type and the page content of the Document object before using it with the chain I got this\nprint(type(doc))\n\nprint(doc.page_content)\n\"text\"\n\n\n\n"} -{"question": "How should I add a field to the metadata of Langchain's Documents?\nFor example, using the CharacterTextSplitter gives a list of Documents:\nconst splitter = new CharacterTextSplitter({\n separator: \" \",\n chunkSize: 7,\n chunkOverlap: 3,\n});\nsplitter.createDocuments([text]);\n\nA document will have the following structure:\n{\n \"pageContent\": \"blablabla\",\n \"metadata\": {\n \"name\": \"my-file.pdf\",\n \"type\": \"application/pdf\",\n \"size\": 12012,\n \"lastModified\": 1688375715518,\n \"loc\": { \"lines\": { \"from\": 1, \"to\": 3 } }\n }\n}\n\nAnd I want to add a field to the metadata\n"} -{"question": "langchain python agent react differently, for one prompt, it can import scanpy library, but not for the other one. My question is how to make sure to import the correct library without problem.\nfrom dotenv import load_dotenv, find_dotenv\nimport openai\nimport os\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.agents.agent_types import AgentType\nfrom langchain_experimental.agents.agent_toolkits import create_python_agent\nfrom langchain_experimental.tools import PythonREPLTool\nimport scanpy as sc\n\nload_dotenv(find_dotenv())\nopenai.api_key = os.environ[\"OPENAI_API_KEY\"]\n\nagent_executor = create_python_agent(\n llm=ChatOpenAI(temperature=0, model=\"gpt-4-1106-preview\"),\n tool=PythonREPLTool(),\n verbose=True,\n agent_type=AgentType.OPENAI_FUNCTIONS,\n agent_executor_kwargs={\"handle_parsing_errors\": True},\n)\n\nif run the following,\nagent_executor.run(\"set scanpy setting verbosity = 3 \")\nI get\n> Entering new AgentExecutor chain...\n\nInvoking: Python_REPL with import scanpy as sc\nsc.settings.verbosity = 3\nprint(sc.settings.verbosity)\n\n\n3\nThe verbosity level of Scanpy has been set to 3.\n\n> Finished chain.\nThe verbosity level of Scanpy has been set to 3.\n\nbut, if run the following,\npbmc = sc.datasets.pbmc68k_reduced()\nagent_executor.run(\"use 'scanpy' library and 'pbmc' object to plot a umap\")\n\nI get,\n> Entering new AgentExecutor chain...\nPython REPL can execute arbitrary code. Use with caution.\n\nInvoking: Python_REPL with import scanpy as sc\n\n\n\nInvoking: Python_REPL with import scanpy as sc\nresponded: It seems there was an issue with the execution of the import statement for the 'scanpy' library. I will attempt to resolve this and proceed with the task. Let's try importing the library again.\n\nIt appears that there is an issue with importing the 'scanpy' library in this environment. Without being able to import the library, I cannot proceed with plotting a UMAP of the 'pbmc' object. If the library and the necessary data were available, I would typically load the data, preprocess it, and then use the sc.pl.umap function to plot the UMAP. However, since I cannot execute the code here, I'm unable to complete this task.\n\n"} -{"question": "I use the following line to add langchain documents to a chroma database: Chroma.from_documents(docs, embeddings, ids=ids, persist_directory='db')\nwhen ids are duplicates, I get this error: chromadb.errors.IDAlreadyExistsError\nhow do I catch the error? (duplicate ids are expected - I expect Chorma to not add them)\nI've tried identifying the error in langchain documentation. Not sure how to catch it.\n"} -{"question": "I am building a very simple rag application using Langchain. The problem I'm having is that when I use ChatOpenAI and ask a question. The model doesn't make any sentences when it answers, it doesn't behave like a \"chatbot\" unlike llama2 for example (see images below). When I switch from ChatOpenAI to llama2, I don't touch anything in my code except to comment on the model.\nMy data is based on openfoodfacts, which is why I ask for specific ingredients in the question.\nWhat's the problem and what can I do to get the same result as llama2 using ChatOpenAI ?\nChatOpenAI :\n\nLlama2:\n\nCode :\nfrom fastapi import FastAPI\nfrom langchain.vectorstores import FAISS\nfrom langchain_community.embeddings import HuggingFaceEmbeddings\nfrom langserve import add_routes\nfrom langchain_community.llms import Ollama\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain_core.prompts import ChatPromptTemplate\nfrom langchain_core.output_parsers import StrOutputParser\nfrom langchain_core.prompts import ChatPromptTemplate\nfrom langchain_core.runnables import RunnableLambda, RunnablePassthrough\nfrom langchain.embeddings import OpenAIEmbeddings\nimport os\n\nos.environ[\"OPENAI_API_KEY\"] = \"SECRET\"\n\n# model = Ollama(model=\"llama2\")\nmodel = ChatOpenAI(temperature=0.1)\n\nimport pandas as pd\nproducts = pd.read_csv('./data/products.csv')\nvectorstore = FAISS.from_texts(\n products['text'], embedding=OpenAIEmbeddings()\n)\nretriever = vectorstore.as_retriever()\n\n\napp = FastAPI(\n title=\"LangChain Server\",\n version=\"1.0\",\n description=\"Spin up a simple api server using Langchain's Runnable interfaces\",\n)\n\nANSWER_TEMPLATE = \"\"\"Answer the question based on the following context:\n{context}\n\nQuestion: {question}\n\"\"\"\n\nprompt = ChatPromptTemplate.from_template(ANSWER_TEMPLATE)\n\nchain = (\n {\"context\": retriever, \"question\": RunnablePassthrough()}\n | prompt\n | model\n | StrOutputParser()\n)\n\n# Adds routes to the app for using the retriever under:\n# /invoke\n# /batch\n# /stream\nadd_routes(app, chain)\n\nif __name__ == \"__main__\":\n import uvicorn\n\n uvicorn.run(app, host=\"localhost\", port=8000)\n\n"} -{"question": "using LangChain and OpenAI, how can I have the model return a specific default response? for instance, let's say I have these statement/responses\nStatement: Hi, I need to update my email address.\nAnswer: Thank you for updating us. Please text it here.\n\nStatement: Hi, I have a few questions regarding my case. Can you call me back?\nAnswer: Hi. Yes, one of our case managers will give you a call shortly. \n\nif the input is similar to one of the above statements, I would like to have OpenAI respond with the specific answer.\n"} -{"question": "I am writing a little application in JavaScript using the LangChain library. I have the following snippet:\n/* LangChain Imports */\nimport { OpenAI } from \"langchain/llms/openai\";\nimport { BufferMemory } from \"langchain/memory\";\nimport { ConversationChain } from \"langchain/chains\";\n\n// ========================================================================================= //\n // ============= Use LangChain to send request to OpenAi API =============================== //\n // ========================================================================================= //\n\n const openAILLMOptions = {\n modelName: chatModel.value,\n openAIApiKey: decryptedString,\n temperature: parseFloat(temperatureValue.value),\n topP: parseFloat(topP.value),\n maxTokens: parseInt(maxTokens.value),\n stop: stopSequences.value.length > 0 ? stopSequences.value : null,\n streaming: true,\n};\n\n const model = new OpenAI(openAILLMOptions);\n const memory = new BufferMemory();\n const chain = new ConversationChain({ llm: model, memory: memory });\n\n try {\n const response = await chain.call({ input: content.value, signal: signal }, undefined,\n [\n {\n\n handleLLMNewToken(token) {\n process.stdout.write(token);\n },\n },\n ]\n );\n\n// handle the response\n\n}\n\nThis does not work (I tried both using the token via TypeScript and without typing). I have scoured various forums and they are either implementing streaming with Python or their solution is not relevant to this problem. So to summarize, I can successfully pull the response from OpenAI via the LangChain ConversationChain() API call, but I can\u2019t stream the response. Is there a solution?\n"} -{"question": "I'm trying to use the Langchain ReAct Agents and I want to give them my pinecone index for context. I couldn't find any interface that let me provide the LLM that uses the ReAct chain my vector embeddings as well.\nHere I set up the LLM and retrieve my vector embedding.\nllm = ChatOpenAI(temperature=0.1, model_name=\"gpt-4\")\nretriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': k})\n\nHere I start my ReAct Chain.\nprompt = hub.pull(\"hwchase17/structured-chat-agent\")\nagent = create_structured_chat_agent(llm, tools, prompt)\nagent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\nresult = agent_executor.invoke(\n {\n \"input\": question,\n \"chat_history\": chat_history\n }\n)\n\nBefore using the ReAct Agent, I used the vector embedding like this.\ncrc = ConversationalRetrievalChain.from_llm(llm, retriever)\nresult = crc.invoke({'question': systemPrompt, 'chat_history': chat_history})\nchat_history.append((question, result['answer']))\n\nIs there any way to combine both methods and have a ReAct agent that also uses vector Embeddings?\n"} -{"question": "I am using Langchain with OpenAI API for getting the summary of PDF Files. Some of my PDFs have many pages (more than the max token allowed in ChatGPT). Im trying two approaches to reduce the tokens so that I can input longer texts, but is still not working for a 300 inch- PDF.\n\nRetrieval augmented generation: more specifically the text splitter\n\ntext_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 50)\n all_splits = text_splitter.split_documents(data)\n\n\nText summarisation: using stuff documents chain\n\n stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name=\"text\")\n\nI would like to understand what is the text splitter doing because is not helping me to input longer text in the prompt. How can do this?\n"} -{"question": "Not a coding question, but a documentation omission that is nowhere mentioned online at this point. When using the Langchain CSVLoader, which column is being vectorized via the OpenAI embeddings I am using?\nI ask because viewing this code below, I vectorized a sample CSV, did searches (on Pinecone) and consistently received back DISsimilar responses. How do know which column Langchain is actually identifying to vectorize?\nloader = CSVLoader(file_path=file, metadata_columns=['col2', 'col3', 'col4','col5'])\nlangchain_docs = loader.load()\ntext_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)\ndocs = text_splitter.split_documents(langchain_docs)\nfor doc in docs:\n doc.metadata.pop('source')\n doc.metadata.pop('row')\nmy_index = pc_store.from_documents(docs, embeddings, index_name=PINECONE_INDEX_NAME)\n\nI am assuming the CSVLoader is then identifying col1 to vectorize. But, searches of Pinecone are terrible, leading me to think some other column is being vectorized.\n"} -{"question": "The following code do not do what it is supposed to do:\nfrom langchain.callbacks.base import BaseCallbackHandler\nfrom langchain import PromptTemplate\nfrom langchain.chains import LLMChain\nfrom langchain.llms import VertexAI\n\n\nclass MyCustomHandler(BaseCallbackHandler):\n def on_llm_end(self, event, context):\n print(f\"Prompt: {event.prompt}\")\n print(f\"Response: {event.response}\")\n\n\nllm = VertexAI(\n model_name='text-bison@001',\n max_output_tokens=1024,\n temperature=0.3,\n verbose=False)\nprompt = PromptTemplate.from_template(\"1 + {number} = \")\nhandler = MyCustomHandler()\nchain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])\nresponse = chain.run(number=2)\nprint(response)\n\nBased on this documentation and this tutorial, the code should execute the custom handler callback on_llm_end but in fact it doesn't.\nCan anyone please tell me why?\n"} -{"question": "I have a quick question: I'm using the Chroma vector store with LangChain.\nAnd I brought up a simple docsearch with Chroma.from_texts. I was initially very confused because i thought the similarity_score_with_score would be higher for queries that are close to answers, but it seems from my testing the opposite is true. Is this becasue it's returning the 'distance' between the two vectors when it searches? I was looking at docs but it only says \"List of Documents most similar to the query and score for each\" but doesnt explain what 'score' is\nDoc reference https://python.langchain.com/en/latest/reference/modules/vectorstores.html?highlight=similarity_search#langchain.vectorstores.Annoy.similarity_search_with_score Can also give more info on the (small to start) dataset im using and queries i tested with.\n"} -{"question": "I am using LangChain to create embeddings and then ask a question to those embeddings like so:\nembeddings: OpenAIEmbeddings = OpenAIEmbeddings(disallowed_special=())\ndb = DeepLake(\n dataset_path=deeplake_url,\n read_only=True,\n embedding_function=embeddings,\n)\nretriever: VectorStoreRetriever = db.as_retriever()\nmodel = ChatOpenAI(model_name=\"gpt-3.5-turbo\") \nqa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)\nresult = qa({\"question\": question, \"chat_history\": chat_history})\n\nBut I am getting the following error:\nFile \"/xxxxx/openai/api_requestor.py\", line 763, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 13918 tokens. Please reduce the length of the messages.\n\nThe chat_history is empty and the question is quite small.\nHow can I reduce the size of tokens being passed to OpenAI?\nI'm assuming the response from the embeddings is too large being passed to openai. It might be easy enough to just figure out how to truncate the data being sent to openai.\n"} -{"question": "I'm going to learn LangChain and stumble upon their Getting Started section. Because it doesn't work and I'm curious if I am the only person where LangChain examples don't work.\nThis is their tutorial I am talking about. https://python.langchain.com/docs/get_started/quickstart/\nLet's use the very first example:\nllm = ChatOpenAI(openai_api_key=api_key)\nllm.invoke(\"how can langsmith help with testing?\")\n\nI wrote some initializing code as well to make ChatOpenAI work:\nimport os\nfrom langchain_openai import ChatOpenAI\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\napi_key = os.getenv(\"OPENAI_API_KEY\")\n\nllm = ChatOpenAI(openai_api_key=api_key)\nllm.invoke(\"how can langsmith help with testing?\")\n\nThe invoke function seems to be executed as I can't see any error message. But I also can't see any further output. Nothing happens.\nThey even wrote \"We can also guide its response with a prompt template.\". However, there is not response.\nWho can explain to me, what is happening here? And can you probably recommend me a better tutorial instead of that from LangChain?\n"} -{"question": "I am trying to create a chatbot with langchain and openAI that can query the database with large number of tables based on user query. I have used SQLDatabaseSequentialChain which is said to be best if you have large number of tables in the database.\nThe problem is when I run this code, it takes forever to establish the connection and at the end I get this error:\n raise self.handle_error_response(\nopenai.error.APIError: internal error {\n \"message\": \"internal error\",\n \"type\": \"invalid_request_error\",\n \"param\": null,\n \"code\": null\n }\n}\n 500 {'error': {'message': 'internal error', 'type': 'invalid_request_error', 'param': None, 'code': None}} {'Date': 'Wed, 21 Jun 2023 14:49:42 GMT', 'Content-Type': \n'application/json; charset=utf-8', 'Content-Length': '147', 'Connection': 'keep-alive', 'vary': 'Origin', 'x-request-id': '37d9d00a37ce69e68166317740bad7da', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'CF-Cache-Status': 'DYNAMIC', 'Server': 'cloudflare', 'CF-RAY': '7dad0f24fa9c6ec5-BOM', 'alt-svc': 'h3=\":443\"; ma=86400'}\n\n\nBelow is the code I found on the internet:\nfrom langchain import OpenAI, SQLDatabase\nfrom langchain.chains import SQLDatabaseSequentialChain\nimport pyodbc\n\nserver = 'XYZ'\ndatabase = 'XYZ'\nusername = 'XYZ'\npassword = 'XYZ'\ndriver = 'ODBC Driver 17 for SQL Server'\n\nconn_str = f\"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}\"\n\ntry:\n # Establish a connection to the database\n conn = SQLDatabase.from_uri(conn_str)\n\nexcept pyodbc.Error as e:\n # Handle any errors that occur during the connection or query execution\n print(f\"Error connecting to Azure SQL Database: {str(e)}\")\n\nOPENAI_API_KEY = \"XYZ key\"\n\nllm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, model_name='text-davinci-003 ')\n\nPROMPT = \"\"\" \nGiven an input question, first create a syntactically correct SQL query to run, \nthen look at the results of the query and return the answer. \nThe question: {question}\n\"\"\"\n\ndb_chain = SQLDatabaseSequentialChain.from_llm(llm, conn, verbose=True, top_k=3)\n\nquestion = \"What is the property code of Ambassador, 821?\"\n\ndb_chain.run(PROMPT.format(question=question))\n\n\nI have confirmed that my openAI API key is up and running.\nPlease help me out with this.\nAlso if you have suggestions for any other method that I should consider, please let me know. I am currently doing RnD on this project but didn't found any satisfactory solution.\nThank you\nI tried to check if my openAI API key is available and yes, it is. Expected to get a response from GPT model.\n"} -{"question": "One can obtain a ChatGPT response to a prompt using the following example:\nfrom openai import OpenAI\n\nclient = OpenAI() # requires key in OPEN_AI_KEY environment variable\n\ncompletion = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"system\", \"content\": \"You are a poetic assistant, skilled in explaining complex programming concepts with creative flair.\"},\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"}\n ]\n)\n\nprint(completion.choices[0].message.content)\n\nHow can one continue the conversation? I've seen examples saying you just add a new message to the list of messages and re-submit:\n# Continue the conversation by including the initial messages and adding a new one\ncontinued_completion = client.chat.completions.create(\n model=\"gpt-3.5-turbo\",\n messages=[\n {\"role\": \"system\", \"content\": \"You are a poetic assistant, skilled in explaining complex programming concepts with creative flair.\"},\n {\"role\": \"user\", \"content\": \"Compose a poem that explains the concept of recursion in programming.\"},\n {\"role\": \"assistant\", \"content\": initial_completion.choices[0].message.content}, # Include the initial response\n {\"role\": \"user\", \"content\": \"Can you elaborate more on how recursion can lead to infinite loops if not properly handled?\"} # New follow-up prompt\n ]\n)\n\nBut I would imagine this means processing the previous messages all over again at every new prompt, which seems quite wasteful. Is that really the only way? Isn't there a way to keep a \"session\" of some sort that keeps ChatGPT's internal state and just processes a newly given prompt?\n"} -{"question": "I'm experimenting with LangChain's AgentType.CHAT_ZERO_SHOT_REACT agent. By its name I'd assume this is an agent intended for chat use and I've given it memory but it doesn't seem able to access its memory. What else do I need to do so that this will access its memory? Or have I incorrectly assumed that this agent can handle chats?\nHere is my code and sample output:\nllm = ChatOpenAI(model_name=\"gpt-4\",\n temperature=0)\n\ntools = load_tools([\"llm-math\", \"wolfram-alpha\", \"wikipedia\"], llm=llm)\nmemory = ConversationBufferMemory(memory_key=\"chat_history\")\n\nagent_test = initialize_agent(\n tools=tools, \n llm=llm, \n agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, \n handle_parsing_errors=True,\n memory=memory, \n verbose=True\n)\n\n>>> agent_test.run(\"What is the height of the empire state building?\")\n'The Empire State Building stands a total of 1,454 feet tall, including its antenna.'\n>>> agent_test.run(\"What was the last question I asked?\")\n\"I'm sorry, but I can't provide the information you're looking for.\"\n\n"} -{"question": "Trying to conenct postgresql with langchain.llm used - AzureOpenAI\nfrom langchain.llms import AzureOpenAI\n\nllms = AzureOpenAI( temperature=0,deployment_name=\"gpt3turbo\".......)\n\ntoolkit = SQLDatabaseToolkit(db=db,llm=llms)\n\nError:\nValidationError: 1 validation error for SQLDatabaseToolkit\nllm\n value is not a valid dict (type=type_error.dict)\n\nTried different versions of langchain\n"} -{"question": "I was following a tutorial on langchain, and after using loader.load() to load a PDF file, it gave me an error and suggested that some dependencies are missing and I should install them using pip install unstructured[local-inference]. So, I did. But it is now installing a whole lot of packages. A whole lot of it includes some packages to do with nvidia-*. Can someone please explain what this command does? It took a good couple of hours for this command to complete.\n"} -{"question": "I am trying to use my llama2 model (exposed as an API using ollama). I want to chat with the llama agent and query my Postgres db (i.e. generate text to sql). I was able to find langchain code that uses open AI to do this. However, I am unable to find anything out there which fits my situation.\nAny pointers will be of great help.\nCode with openai\n# Create connection to postgres\nimport psycopg2 # Import the library\n\ndatabase = 'postgres'\nusername = 'postgres'\npassword = 'password'\nserver = 'localhost'\nport = '5432'\n\n# Establish the connection\nconn = psycopg2.connect(\n dbname=database,\n user=username,\n password=password,\n host=server,\n port=port\n)\n\ndb = SQLDatabase.from_uri(\n \"postgresql://postgres:password@localhost:5432/postgres\")\ntoolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))\n\nagent_executor = create_sql_agent(\n llm=OpenAI(temperature=0),\n toolkit=toolkit,\n verbose=True,\n agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n)\n\nagent_executor.run(\"Describe the transaction table\")\n\nI want to make the above code work for my llama2 model exposed via an API at localhost:11434/api/generate\n"} -{"question": "I loaded pdf files from a directory and I need to split them to smaller chunks to make a summary. The problem is that I can't iterate on documents object in a for loop and I get an error like this: AttributeError: 'tuple' object has no attribute 'page_content'\nHow can I iterate on my document items to call the summary function for each of them?\nHere is my code:\n# Load the documents\n\nfrom langchain.document_loaders import DirectoryLoader\ndocument_directory = \"pdf_files\"\nloader = DirectoryLoader(document_directory)\ndocuments = loader.load()\n\ntext_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=50)\n\n# Iterate on long pdf documents to make chunks (2 pdf files here)\nfor doc in documents:\n \n # it fails on this line \n texts = text_splitter.split_documents(doc) \n chain = load_summarize_chain(llm, chain_type=\"map_reduce\", map_prompt=prompt, combine_prompt=prompt)\n\n\n"} -{"question": "finetuned a model (https://huggingface.co/decapoda-research/llama-7b-hf) using peft and lora and saved as https://huggingface.co/lucas0/empath-llama-7b. Now im getting Pipeline cannot infer suitable model classes from when trying to use it along with with langchain and chroma vectordb:\nfrom langchain.embeddings import HuggingFaceHubEmbeddings\nfrom langchain import PromptTemplate, HuggingFaceHub, LLMChain\nfrom langchain.chains import RetrievalQA\nfrom langchain.prompts import PromptTemplate\nfrom langchain.vectorstores import Chroma\n\nrepo_id = \"sentence-transformers/all-mpnet-base-v2\"\nembedder = HuggingFaceHubEmbeddings(\n repo_id=repo_id,\n task=\"feature-extraction\",\n huggingfacehub_api_token=\"XXXXX\",\n)\ncomments = [\"foo\", \"bar\"]\nembeddings = embedder.embed_documents(texts=comments)\ndocsearch = Chroma.from_texts(comments, embedder).as_retriever()\n#docsearch = Chroma.from_documents(texts, embeddings)\n\nllm = HuggingFaceHub(repo_id='lucas0/empath-llama-7b', huggingfacehub_api_token='XXXXX')\nqa = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch, return_source_documents=False)\n\nq = input(\"input your query:\")\nresult = qa.run(query=q)\n\nprint(result[\"result\"])\n\n\nis anyone able to tell me how to fix this? Is it an issue with the model card? I was facing issues with the lack of the config.json file and ended up just placing the same config.json as the model I used as base for the lora fine-tuning. Could that be the origin of the issue? If so, how to generate the correct config.json without having to get the original llama weights?\nAlso, is there a way of loading several sentences into a custom HF model (not only OpenAi, as the tutorial show) without using vector dbs?\nThanks!\n\nThe same issue happens when trying to run the API on the model's HF page:\n\n"} -{"question": "LangChain's BaseMessage has a function toJSON that returns a Serialized.\nOnce I have a list of BaseMessages, I can use toJSON to serialize them, but how can I later deserialize them?\nconst messages = [\n new HumanMessage(\"hello\"),\n new AIMessage(\"foo\"),\n new HumanMessage(\"bar\"),\n new AIMessage(\"baz\"),\n];\n\nconst serialized = messages.map((message) => message.toJSON());\n\nconst deserialized = ???\n\n"} -{"question": "How do i add memory to RetrievalQA.from_chain_type? or, how do I add a custom prompt to ConversationalRetrievalChain?\nFor the past 2 weeks ive been trying to make a chatbot that can chat over documents (so not in just a semantic search/qa so with memory) but also with a custom prompt. I've tried every combination of all the chains and so far the closest I've gotten is ConversationalRetrievalChain, but without custom prompts, and RetrievalQA.from_chain_type but without memory\n"} -{"question": "I am using StructuredParser of Langchain library. I am getting flat dictionary from parser. Please guide me to get a list of dictionaries from output parser.\nPROMPT_TEMPLATE = \"\"\" \nYou are an android developer. \nParse this error message and provide me identifiers & texts mentioend in error message. \n--------\nError message is {msg}\n--------\n{format_instructions}\n\"\"\"\n\ndef get_output_parser():\n missing_id = ResponseSchema(name=\"identifier\", description=\"This is missing identifier.\")\n missing_text = ResponseSchema(name=\"text\", description=\"This is missing text.\")\n\n response_schemas = [missing_id, missing_text]\n output_parser = StructuredOutputParser.from_response_schemas(response_schemas)\n return output_parser\n\n\ndef predict_result(msg):\n model = ChatOpenAI(open_api_key=\"\", openai_api_base=\"\", model=\"llama-2-70b-chat-hf\", temperature=0, max_tokens=2000)\n output_parser = get_output_parser()\n format_instructions = output_parser.get_format_instructions()\n \n prompt = ChatPromptTemplate.from_template(template=PROMPT_TEMPLATE)\n message = prompt.format_messages(msg=msg, format_instructions=format_instructions)\n response = model.invoke(message)\n\n response_as_dict = output_parser.parse(response.content)\n print(response_as_dict)\n\n\npredict_result(\"ObjectNotFoundException AnyOf(AllOf(withId:identifier1, withText:text1),AllOf(withId:identifier2, withText:text1),AllOf(withId:identifier3, withText:text1))\")\n\nThe output I get is\n{\n \"identifier\":\"identifier1\",\n \"text\":\"text1\"\n}\n\n\nExpected output is\n[\n {\n \"identifier\":\"identifier1\",\n \"text\":\"text1\"\n },\n {\n \"identifier\":\"identifier2\",\n \"text\":\"text1\"\n },\n {\n \"identifier\":\"identifier3\",\n \"text\":\"text1\"\n }\n]\n\nHow to specify such nested JSON in OutputParser\n"} -{"question": "I am trying to put together a simple \"Q&A with sources\" using Langchain and a specific URL as the source data. The URL consists of a single page with quite a lot of information on it.\nThe problem is that RetrievalQAWithSourcesChain is only giving me the entire URL back as the source of the results, which is not very useful in this case.\nIs there a way to get more detailed source info?\nPerhaps the heading of the specific section on the page?\nA clickable URL to the correct section of the page would be even more helpful!\nI am slightly unsure whether the generating of the result source is a function of the language model, URL loader or simply RetrievalQAWithSourcesChain alone.\nI have tried using UnstructuredURLLoader and SeleniumURLLoader with the hope that perhaps more detailed reading and input of the data would help - sadly not.\nRelevant code excerpt:\nllm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')\nchain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=VectorStore.as_retriever())\n\nresult = chain({\"question\": question})\n\nprint(result['answer'])\nprint(\"\\n Sources : \",result['sources'] )\n\n"} diff --git a/optimization_runs/gepa_optimized_multi_query_writer.json b/optimization_runs/gepa_optimized_multi_query_writer.json deleted file mode 100644 index 0fbff95..0000000 --- a/optimization_runs/gepa_optimized_multi_query_writer.json +++ /dev/null @@ -1,28 +0,0 @@ -{ - "query_writer": { - "traces": [], - "train": [], - "demos": [], - "signature": { - "instructions": "You are given a user’s technical question. Your job is to output a diverse set of long, highly specific search queries that will help gather the most relevant information from search engines to solve the question.\n\nOutput format rules\n- Output only a flat list (array) of 10–15 search queries (strings). No explanations, no extra text.\n- Use a JSON-like Python list of strings, e.g., [\"query 1\", \"query 2\"].\n- Each query must be long and detailed (aim for 12–25+ words).\n- Queries must be distinct; avoid near-duplicates and trivial rephrasings.\n- Mix site-scoped queries (official docs, GitHub, Stack Overflow) with broader web queries.\n\nHow to craft the queries\n1) Mirror the user’s goal and restate it in several ways across the queries. Reflect concrete tasks, exact operations, desired outcomes, constraints, and edge cases. Ask for end-to-end examples that include runnable code and visible outputs, including explicit print(result) and print(result.content) where relevant.\n\n2) Extract and embed exact identifiers from the question and adjacent APIs. Include literal class\/function names, parameters, method calls, import paths, driver URIs, CLI flags, environment variables, file names, and any literal error messages. Examples to use verbatim or as plausible alternatives:\n - LangChain classes\/chains\/agents\/tools: create_sql_agent, SQLDatabaseToolkit, SQLDatabase.from_uri(\"postgresql+psycopg2:\/\/...\"), SQLDatabaseChain, SQLDatabaseSequentialChain, AgentExecutor, RetrievalQA.from_chain_type, ConversationalRetrievalChain, CharacterTextSplitter.createDocuments, RecursiveCharacterTextSplitter, HTMLHeaderTextSplitter, UnstructuredHTMLLoader, CSVLoader, vector_store.as_retriever(search_kwargs={'k': ...}), Chroma.from_texts, Chroma.from_documents, .persist(), persist_directory.\n - Prompting and output parsing: ChatPromptTemplate, PromptTemplate, StructuredOutputParser, ResponseSchema, StructuredOutputParser.from_response_schemas, StructuredOutputParser.get_format_instructions, StructuredOutputParser.parse, StructuredOutputParser for list outputs, create_structured_chat_agent.\n - Messages\/IO: BaseMessage, AIMessage, invoke(...), .content, input_documents=[doc], Document, Document.page_content.\n - OpenAI\/Azure\/HF: ChatOpenAI (langchain_openai), OpenAI vs AzureOpenAI vs AzureChatOpenAI, langchain_openai.AzureOpenAI, azure_deployment (or deployment_name), OPENAI_API_KEY, AZURE_OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_API_VERSION.\n - JS streaming: callbacks in chain.call(values, config), handleLLMNewToken(token), streaming: true, BufferMemory({ returnMessages: true }), ChatPromptTemplate, correct placement of callbacks in the config object.\n - HF and PEFT: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel; LoRA\/PEFT adapter loading\/merging; trust_remote_code; pipeline vs AutoModel; missing config.json, tokenizer.json, adapter_config.json.\n - Vector stores: Chroma, Pinecone, FAISS; embedding function reuse on reload.\n - DB drivers\/URIs:\n \"postgresql+psycopg2:\/\/user:pass@host:5432\/db\",\n \"mssql+pyodbc:\/\/user:pass@server\/db?driver=ODBC Driver 17 for SQL Server\",\n \"mssql+pyodbc:\/\/user:pass@server\/db?driver=ODBC+Driver+17+for+SQL+Server\" (quote_plus encoded).\n - Exact\/likely error strings: \"invalid_request_error\", \"internal error\", \"500\", \"value is not a valid dict (type=type_error.dict)\", \"Pipeline cannot infer suitable model classes\", tokenization\/config missing files, LoRA\/PEFT adapter issues, model name typos or trailing spaces like \"text-davinci-003 \" causing \"invalid_request_error\".\n - Python Document types across versions: langchain.schema.Document vs langchain_core.documents.Document.\n\n3) Mention relevant library\/framework names and versions when likely implicated:\n - LangChain 0.1.x and 0.2.x; module split across langchain, langchain-openai (langchain_openai), langchain-community (langchain_community), langchain-core, langchain-text-splitters.\n - SQLAlchemy 2.x, psycopg2\/psycopg2-binary, pyodbc, ODBC Driver 17\/18 for SQL Server.\n - Vector stores: Chroma, Pinecone, FAISS.\n - Hugging Face: transformers, peft, accelerate, sentencepiece; AutoModelForCausalLM, AutoTokenizer, PeftModel.\n - Ollama\/ChatOllama, VertexAI, Llama 2\/3.\n\n4) Vary solution angles to increase recall:\n - Compare agent vs chain framing (ReAct agent, SQL agent, retriever tool, vector store tool).\n - Memory integration and prompt customization (ConversationBufferMemory; override condense_question_prompt and qa_prompt; BufferMemory with returnMessages: true in JS).\n - Callbacks vs hooks; pass callbacks to the correct components and parameters; BaseCallbackHandler signatures; JS chain.call(values, { callbacks: [...] , signal }) vs passing callbacks incorrectly.\n - Retriever configuration and performance tuning (search_kwargs={'k': ...}, metadata reflection limits).\n - Migration\/deprecations: imports moved to langchain_openai or langchain_community; API changes\/renames in 0.1.x\/0.2.x.\n - Alternatives when the approach is problematic (e.g., use create_sql_agent instead of SQLDatabaseSequentialChain; wrap retrievers as Tools; custom LLM subclass for HF+PEFT; Ollama substitution for OpenAI; different vector stores).\n\n5) Cover multiple interpretations and troubleshooting paths:\n - Wrong class\/API usage (OpenAI vs AzureOpenAI\/AzureChatOpenAI incompatibilities; chat vs completion models).\n - Incorrect arguments\/signatures; trailing spaces\/misspelled model names causing \"invalid_request_error\".\n - Version mismatches; missing installs (pip install langchain openai psycopg2-binary; pyodbc; ODBC driver setup).\n - Cloud configuration mistakes: Azure endpoint\/deployment, regional endpoints, api_version.\n - SQL performance pitfalls: metadata reflection on all tables by SQLDatabase; mitigate with include_tables, table_info hints, sample_rows_in_table_info, and limiting reflection; restrict schemas; prefer SQL agents for dynamic table selection.\n - Prompt formatting across models; Llama 2\/3 tool-use prompt compatibility.\n - Chroma persistence specifics: set persist_directory on create and load; call .persist(); reload with the same embedding function; avoid empty\/duplicate collections; verify collection name.\n - Document creation\/usage: build Document(page_content=...); add metadata; pass doc.page_content to splitters; wrap single Document in a list for inputs that expect a sequence, e.g., input_documents=[doc] or vectorstore.add_documents([doc]); avoid passing a tuple which triggers \"AttributeError: 'tuple' object has no attribute 'page_content'\".\n - HTML loading\/splitting: UnstructuredHTMLLoader returns Document; HTMLHeaderTextSplitter expects raw text; pass doc.page_content (not str(Document)); handle encoding\/unicode cleanup; remove boilerplate with BeautifulSoup; use RecursiveCharacterTextSplitter with keep_separator or custom separators to keep headings with following paragraphs; include page title in metadata.\n - Output handling: invoke(...) returns BaseMessage\/AIMessage; print(result) to see the object; print(result.content) to see text; JS streaming: handleLLMNewToken(token) and verify tokens arrive.\n - Structured outputs: StructuredOutputParser\/ResponseSchema can describe nested JSON (arrays of objects); for lists, instruct the model to return an array of objects and parse(response.content).\n\n6) Scope some queries to authoritative sources and Q&A:\n - Official docs\/reference: site:docs.langchain.com OR site:python.langchain.com OR site:langchain.readthedocs.io\n - API reference: site:api.python.langchain.com\n - JavaScript docs when relevant: site:js.langchain.com\n - GitHub issues\/discussions: site:github.com with “issues” or “discussions”\n - Stack Overflow: site:stackoverflow.com\n - Also include broader web queries (blogs, tutorials, community posts, “end-to-end example”, “step-by-step”, “full code sample”).\n\nAdditional domain-specific guidance to incorporate in the queries\n- ChatOpenAI \/ OpenAI usage in LangChain 0.1.x\/0.2.x:\n - Prefer langchain_openai.ChatOpenAI; distinguish OpenAI vs AzureOpenAI\/AzureChatOpenAI; for Azure pass azure_deployment (or deployment_name), api_version, and endpoint.\n - invoke(...) returns a BaseMessage\/AIMessage; access .content for text; include examples showing print(result.content).\n - Watch for trailing spaces in model names causing “invalid_request_error”.\n- Memory and custom prompts for retrieval chat:\n - Combine memory and retrieval via ConversationalRetrievalChain; customize condense-question and answer prompts (PromptTemplate).\n - Compare to RetrievalQA.from_chain_type; include runnable code with a sample user query and printed outputs showing memory\/prompt effects.\n- SQL agents and databases:\n - Prefer create_sql_agent + SQLDatabaseToolkit + AgentExecutor for dynamic table selection.\n - Connect via SQLDatabase.from_uri(\"postgresql+psycopg2:\/\/...\") or MSSQL \"mssql+pyodbc:\/\/...\"; encode driver name with quote_plus; set include_tables, table_info, sample_rows_in_table_info to limit reflection; optionally restrict to schemas.\n - Ensure drivers installed (psycopg2 or psycopg2-binary for Postgres; pyodbc and ODBC Driver 17\/18 for SQL Server); connection timeout, Encrypt, TrustServerCertificate parameters.\n - If SQLDatabaseToolkit validation fails with AzureOpenAI (e.g., \"value is not a valid dict\"), try OpenAI or updated AzureChatOpenAI classes compatible with your LangChain version; include installation and configuration details (pip install langchain openai psycopg2-binary).\n- ReAct agents with retrieval:\n - Expose vector store retriever as a Tool (vector_store.as_retriever(search_kwargs={'k': ...})); use AgentExecutor or create_structured_chat_agent; compare to ConversationalRetrievalChain.\n- Callbacks:\n - Pass callbacks to the LLM instance when required; verify BaseCallbackHandler method signatures matching your LangChain version.\n - JavaScript: pass callbacks via the second config argument to chain.call(values, { callbacks, signal }); not as a separate positional parameter.\n- Ollama \/ Llama models:\n - Use LangChain’s Ollama or ChatOllama when replacing OpenAI; adjust chat templates for Llama 2\/3; ensure agents’ tool-use prompt compatibility.\n- Chroma vector DB persistence:\n - Use persist_directory on both save and load; call .persist(); reload with the same embedding function; troubleshoot empty\/duplicated collections.\n- CSV\/HTML loading and Documents:\n - CSVLoader builds Document.page_content from non-metadata columns; embeddings derive from page_content.\n - For HTML: prefer requests + BeautifulSoup to fetch\/clean; ensure you pass doc.page_content; handle encoding and weird characters; keep headings with paragraphs; use RecursiveCharacterTextSplitter with appropriate chunk_size and chunk_overlap; store metadata (page title).\n- Output parsers:\n - StructuredOutputParser\/ResponseSchema can describe nested JSON outputs; for lists of dicts, instruct the model to return an array of objects; parse(response.content); include examples phrased as “JSON array of objects with fields identifier and text”.\n\nCoverage requirements for each set of queries\n1) Official docs\/reference queries (API usage, parameters, signatures, migration notes) with site scoping.\n2) End-to-end examples\/tutorials and code samples, including queries that ask for runnable demos showing printed outputs (print(result); print(result.content)) and, for JS, streaming tokens printed in real time.\n3) Troubleshooting known errors and GitHub issues\/discussions, embedding exact error strings from the user.\n4) Stack Overflow Q&A for similar symptoms.\n5) Migration\/deprecation notes for breaking changes (module split, class renames).\n\nQuality checks before submitting\n- 10–15 distinct, detailed queries.\n- Each query 12–25+ words, concrete, and task-oriented.\n- Mix of site-scoped (docs, GitHub, Stack Overflow) and general web queries.\n- Embed exact code identifiers, parameters, environment variables, driver URIs, and error messages from the user’s question.\n- Vary solution angles: API usage, end-to-end demos with visible output, troubleshooting, performance, and migration; propose viable alternatives where appropriate.\n- Include specific “wrap Document in a list” phrasing when the question involves input_documents or add_documents; explicitly mention input_documents=[doc] to avoid \"tuple has no attribute page_content\".", - "fields": [ - { - "prefix": "Question:", - "description": "${question}" - }, - { - "prefix": "Search Queries:", - "description": "${search_queries}" - } - ] - }, - "lm": null - }, - "metadata": { - "dependency_versions": { - "python": "3.11", - "dspy": "3.0.0", - "cloudpickle": "3.1" - } - } -} \ No newline at end of file diff --git a/optimization_runs/gepa_query_expander.ipynb b/optimization_runs/gepa_query_expander.ipynb deleted file mode 100644 index e854395..0000000 --- a/optimization_runs/gepa_query_expander.ipynb +++ /dev/null @@ -1,3224 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 2, - "id": "4eaa3f75", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "3.0.0\n" - ] - } - ], - "source": [ - "import dspy\n", - "print(dspy.__version__)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d3666516", - "metadata": {}, - "outputs": [], - "source": [ - "import retrieve_dspy\n", - "\n", - "query_writer = retrieve_dspy.QueryExpander(\n", - " collection_name=\"FreshstackLangchain\",\n", - " target_property_name=\"docs_text\",\n", - " retrieved_k=20,\n", - " verbose=False\n", - ")\n", - "\n", - "query_writer(\"How can I use Weaviate with LangChain?\")" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "1efb0a90", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/Desktop/retrieve-dspy/.venv/lib/python3.11/site-packages/datasets/utils/py_utils.py:335: ResourceWarning: unclosed \n", - " yield key, tuple(d[key] for d in dicts)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "{'path': 'gepa_query_expander_training_samples.jsonl',\n", - " 'added': 30,\n", - " 'total_in_file': 30}" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import os\n", - "\n", - "import weaviate\n", - "\n", - "from retrieve_dspy.metrics import create_coverage_metric_with_feedback\n", - "from retrieve_dspy.datasets.in_memory import load_queries_in_memory\n", - "\n", - "trainset, testset = load_queries_in_memory(\n", - " dataset_name=\"freshstack-langchain\",\n", - " train_samples=30,\n", - " test_samples=20\n", - ")\n", - "\n", - "weaviate_client = weaviate.connect_to_weaviate_cloud(\n", - " cluster_url=os.getenv(\"WEAVIATE_URL\"),\n", - " auth_credentials=weaviate.auth.AuthApiKey(os.getenv(\"WEAVIATE_API_KEY\")),\n", - ")\n", - "\n", - "metric_for_gepa = create_coverage_metric_with_feedback(\n", - " weaviate_client=weaviate_client,\n", - " dataset_name=\"freshstack-langchain\"\n", - ")\n", - "\n", - "evaluator = retrieve_dspy.utils.get_evaluator(\n", - " testset=testset,\n", - " metric=metric_for_gepa\n", - ")\n", - "\n", - "retrieve_dspy.utils.save_training_questions(trainset, \"gepa_query_expander_training_samples.jsonl\")" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "f6317ace", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Example({'question': 'I have been reading the documentation all day and can\\'t seem to wrap my head around how I can create a VectorStoreIndex with llama_index and use the created embeddings as supplemental information for a RAG application/chatbot that can communicate with a user. I want to use llama_index because they have some cool ways to perform more advanced retrieval techniques like sentence window retrieval and auto-merging retrieval (to be fair I have not investigated if Langchain also supports these types of vector retrieval methods). I want to use LangChain because of its functionality for developing more complex prompt templates (similarly I have not really investigated if llama_index supports this).\\nMy goal is to ultimately evaluate how these different retrieval methods perform within the context of the application/chatbot. I know how to evaluate them with a separate evaluation questions file, but I would like to do things like compare the speed and humanness of responses, token usage, etc.\\nThe code for a minimal reproducible example would be as follows\\n1) LangChain ChatBot initiation \\n from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\\n from langchain.memory import ChatMessageHistory\\n \\n \\n prompt = ChatPromptTemplate.from_messages(\\n [\\n (\\n \"system\",\\n \"\"\"You are the world\\'s greatest... \\\\\\n Use this document base to help you provide the best support possible to everyone you engage with. \\n \"\"\",\\n ),\\n MessagesPlaceholder(variable_name=\"messages\"),\\n ]\\n )\\n \\n chat = ChatOpenAI(model=llm_model, temperature=0.7)\\n \\n \\n \\n chain = prompt | chat\\n \\n \\n chat_history = ChatMessageHistory()\\n \\n while True:\\n user_input = input(\"You: \")\\n chat_history.add_user_message(user_input)\\n \\n response = chain.invoke({\"messages\": chat_history.messages})\\n \\n if user_input.lower() == \\'exit\\':\\n break\\n \\n print(\"AI:\", response)\\n chat_history.add_ai_message(response)\\n\\n\\nLlama index sentence window retrieval\\n\\nfrom llama_index.core.node_parser import SentenceWindowNodeParser\\n from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor\\n from llama_index.core.postprocessor import LLMRerank\\n \\n class SentenceWindowUtils:\\n def __init__(self, documents, llm, embed_model, sentence_window_size):\\n self.documents = documents\\n self.llm = llm\\n self.embed_model = embed_model\\n self.sentence_window_size = sentence_window_size\\n # self.save_dir = save_dir\\n \\n self.node_parser = SentenceWindowNodeParser.from_defaults(\\n window_size=self.sentence_window_size,\\n window_metadata_key=\"window\",\\n original_text_metadata_key=\"original_text\",\\n )\\n \\n self.sentence_context = ServiceContext.from_defaults(\\n llm=self.llm,\\n embed_model=self.embed_model,\\n node_parser=self.node_parser,\\n )\\n \\n def build_sentence_window_index(self, save_dir):\\n if not os.path.exists(save_dir):\\n os.makedirs(save_dir)\\n sentence_index = VectorStoreIndex.from_documents(\\n self.documents, service_context=self.sentence_context\\n )\\n sentence_index.storage_context.persist(persist_dir=save_dir)\\n else:\\n sentence_index = load_index_from_storage(\\n StorageContext.from_defaults(persist_dir=save_dir),\\n service_context=self.sentence_context,\\n )\\n \\n return sentence_index\\n \\n def get_sentence_window_query_engine(self, sentence_index, similarity_top_k=6, rerank_top_n=3):\\n postproc = MetadataReplacementPostProcessor(target_metadata_key=\"window\")\\n rerank = LLMRerank(top_n=rerank_top_n, service_context=self.sentence_context)\\n \\n sentence_window_engine = sentence_index.as_query_engine(\\n similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]\\n )\\n \\n return sentence_window_engine\\n \\n \\n sentence_window = SentenceWindowUtils(documents=documents, llm = llm, embed_model=embed_model, sentence_window_size=1)\\n sentence_window_1 = sentence_window.build_sentence_window_index(save_dir=\\'./indexes/sentence_window_index_1\\')\\n sentence_window_engine_1 = sentence_window.get_sentence_window_query_engine(sentence_window_1)\\n\\nBoth blocks of code independently will run. But the goal is that when a query is performed that warrants a retrieval to the existing document base, I can use the sentence_window_engine that was built. I suppose I could retrieve relevant information based on the query and then pass that information into a subsequent prompt for the chatbot, but I would like to try and avoid including the document data in a prompt.\\nAny suggestions?\\n', 'dataset_ids': ['llama_index/llama-index-packs/llama-index-packs-sentence-window-retriever/README.md_0_2079', 'llama_index/llama-index-core/llama_index/core/chat_engine/context.py_1617_11175', 'llama_index/llama-index-packs/llama-index-packs-sentence-window-retriever/llama_index/packs/sentence_window_retriever/base.py_0_2491', 'llama_index/llama-index-core/llama_index/core/chat_engine/condense_plus_context.py_0_2774', 'llama_index/llama-index-core/llama_index/core/chat_engine/context.py_0_1614', 'langchainjs/examples/src/memory/summary_buffer.ts_0_3832', 'langchainjs/examples/src/memory/combined.ts_0_2041', 'langchainjs/examples/src/memory/getting_started.ts_0_2673', 'llama_index/docs/docs/api_reference/memory/chat_memory_buffer.md_0_47', 'llama_index/docs/docs/examples/memory/ChatSummaryMemoryBuffer.ipynb_0_7218', 'langchainjs/langchain/src/memory/summary_buffer.ts_0_5515'], 'nugget_data': [{'nugget_id': '78216871_nugget_0', 'text': 'Use the sentence_window_engine from llama_index to query the document base for relevant information.', 'relevant_corpus_ids': ['llama_index/llama-index-packs/llama-index-packs-sentence-window-retriever/README.md_0_2079', 'llama_index/llama-index-core/llama_index/core/chat_engine/context.py_1617_11175', 'llama_index/llama-index-packs/llama-index-packs-sentence-window-retriever/llama_index/packs/sentence_window_retriever/base.py_0_2491']}, {'nugget_id': '78216871_nugget_1', 'text': \"Inject the retrieved relevant information into the chatbot's context to improve response quality.\", 'relevant_corpus_ids': ['llama_index/llama-index-core/llama_index/core/chat_engine/condense_plus_context.py_0_2774', 'llama_index/llama-index-core/llama_index/core/chat_engine/context.py_0_1614', 'llama_index/llama-index-core/llama_index/core/chat_engine/context.py_1617_11175']}, {'nugget_id': '78216871_nugget_2', 'text': 'Modify the conversation prompt to include placeholders for relevant information, previous conversation summary, and current user input.', 'relevant_corpus_ids': ['langchainjs/examples/src/memory/summary_buffer.ts_0_3832', 'langchainjs/examples/src/memory/combined.ts_0_2041', 'langchainjs/examples/src/memory/getting_started.ts_0_2673']}, {'nugget_id': '78216871_nugget_3', 'text': 'Manage memory and token usage to prevent overflow, potentially using a ConversationBufferMemoryHistory.', 'relevant_corpus_ids': ['llama_index/docs/docs/api_reference/memory/chat_memory_buffer.md_0_47', 'llama_index/docs/docs/examples/memory/ChatSummaryMemoryBuffer.ipynb_0_7218', 'langchainjs/langchain/src/memory/summary_buffer.ts_0_5515']}]}) (input_keys={'question'})" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "trainset[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "59fe1d30", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 11.03 / 20 (55.2%): 100%|██████████| 20/20 [00:16<00:00, 1.23it/s]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:27:28 INFO dspy.evaluate.evaluate: Average Metric: 11.033333333333333 / 20 (55.2%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/plain": [ - "EvaluationResult(score=55.17, results=)" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dspy_evaluator_kwargs = {\n", - " \"num_threads\": 5\n", - "}\n", - "\n", - "evaluator(query_writer, **dspy_evaluator_kwargs)" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "c7ef8690", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:32:56 INFO dspy.teleprompt.gepa.gepa: Running GEPA for approx 500 metric calls of the program. This amounts to 33.33 full evals on the train+val set.\n", - "2025/08/13 20:32:56 INFO dspy.teleprompt.gepa.gepa: Using 15 examples for tracking Pareto scores. You can consider using a sample of the valset to allow GEPA to explore more diverse solutions within the same budget.\n", - "2025/08/13 20:33:06 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 15 (53.3%)\n", - "2025/08/13 20:33:06 INFO dspy.teleprompt.gepa.gepa: Iteration 0: Base program full valset score: 0.5333333333333333\n", - "2025/08/13 20:33:06 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Selected program 0 score: 0.5333333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.83 / 5 (76.7%): 100%|██████████| 5/5 [00:04<00:00, 1.14it/s] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:33:10 INFO dspy.evaluate.evaluate: Average Metric: 3.833333333333333 / 5 (76.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:35:13 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Proposed new text for expand_query: You are given a user’s technical question. Your task is to expand it into a search-engine-optimized query that will help find authoritative answers, fixes, and examples.\n", - "\n", - "How to write the expanded query:\n", - "- Preserve the user’s core ask, and add precise technical keywords: library/framework names, class/function/method names, parameters, config flags, environment variables, file/dir names, API endpoints, error messages, and exact version numbers when present.\n", - "- Include likely adjacent terms: “how to,” “best practices,” “known issues/bugs,” “workaround,” “version compatibility,” “examples,” “code snippets,” “configuration,” “persistence,” “performance,” “billing/costs,” “API semantics,” “syntax.”\n", - "- If the user provided code, extract and reference exact symbols (e.g., VectorstoreIndexCreator, RetrievalQA, as_retriever, JSONLoader, PromptTemplate, BaseCallbackHandler.on_llm_end, Redis similarity_distance_threshold, ChromaVectorStoreIndex, StorageContext.persist, PersistentClient, etc.) and any error strings as quoted text.\n", - "- Combine the main question with sub-questions that probe root causes, correct usage, and alternatives. Ask about correct configuration, correct API usage, troubleshooting steps, and minimal reproducible examples.\n", - "- Keep the output concise: one short paragraph or a compact bullet list. Do not add explanations or meta commentary—only the expanded query content.\n", - "\n", - "Domain-specific guidance to ensure coverage of important “nuggets” when relevant:\n", - "\n", - "LangChain vector stores, embeddings, and persistence:\n", - "- Include searches about VectorstoreIndexCreator defaults and persistence. Surface that a common setup involves Chroma (often backed by duckdb+parquet); DuckDB in-memory behavior vs persistence; how to reload persisted stores with persist_directory; and how to use .pkl or DB files for RetrievalQA retrievers.\n", - "- Cover OpenAI embedding costs: embeddings incur charges when created; loading a persisted vector store does not re-embed or re-charge; how to avoid re-embedding each run.\n", - "- Ask for exact code patterns to save/load re-usable indexes and retrievers, and how to ensure the embedding function matches on load.\n", - "\n", - "Chroma + LlamaIndex + LangChain integration issues:\n", - "- Note that Chroma persistence is handled by the Chroma client (prefer PersistentClient); manual StorageContext.persist may be unnecessary if the client persists.\n", - "- Include compatibility/usage with versions (e.g., chromadb 0.3.x, llama-index 0.6.x, langchain 0.0.245). Consider suggesting using LlamaIndex alone (LangChain not required), proper PromptTemplate usage, using OpenAIEmbeddings, and using OpenAI as the LLM for querying (or correct Azure OpenAI configuration if used).\n", - "- Ask why query_engine/query returns None despite embeddings being created; include debugging steps (collection counts, embedding function continuity, metadata, document IDs, and reload paths).\n", - "\n", - "Redis + LangChain retriever hybrid search:\n", - "- Include the exact search_type (similarity_distance_threshold), how to pass filters in search_kwargs, and correct RediSearch filter syntax.\n", - "- Surface a known issue around LangChain’s _prepare_range_query building invalid queries (e.g., “Invalid attribute yield_distance_as”), and that filters should be placed before the vector range clause. Ask for fixes or upgrading to a version where this is resolved, or for correct manual query construction.\n", - "- Include examples of correct search_kwargs and FT.SEARCH syntax.\n", - "\n", - "langchain.js JSONLoader and JSON structure:\n", - "- Ask how to configure JSONLoader so each array element/object becomes one document (e.g., JSON pointer/selectors), not each property.\n", - "- Include checks for JSON syntax errors (missing commas) that cause misparsing, and mapping which field becomes pageContent vs metadata.\n", - "- Request examples for querying data like “2 years experience” or “javascript” with proper schema.\n", - "\n", - "LangChain callbacks with VertexAI:\n", - "- Ask how to correctly attach callbacks to LLMs (pass callbacks to the LLM instance, not only to LLMChain) and the correct BaseCallbackHandler method signatures (on_llm_end(response, **kwargs)).\n", - "- Include version considerations and examples that print prompts/responses.\n", - "\n", - "Output format:\n", - "- Return only the expanded query as a single paragraph or a short bullet list. No headings or extra formatting.\n", - "2025/08/13 20:35:18 INFO dspy.evaluate.evaluate: Average Metric: 4.333333333333333 / 5 (86.7%)\n", - "2025/08/13 20:35:26 INFO dspy.evaluate.evaluate: Average Metric: 7.999999999999999 / 15 (53.3%)\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Full valset score for new program: 0.5333333333333333\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Full train_val score for new program: 0.5333333333333333\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Individual valset scores for new program: [0.25, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 0.0, 1.0, 1.0, 0.3333333333333333, 0.0, 0.5, 0.25]\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: New valset pareto front scores: [0.25, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 0.0, 0.5, 0.25]\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Full valset pareto front score: 0.6\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Updated valset pareto front programs: [{1}, {0, 1}, {0, 1}, {1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {1}]\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best valset aggregate score so far: 0.5333333333333333\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best program as per aggregate score on train_val: 0\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best program as per aggregate score on valset: 0\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best score on valset: 0.5333333333333333\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Best score on train_val: 0.5333333333333333\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Linear pareto front program index: 0\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 1: New program candidate index: 1\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 2: No merge candidates found\n", - "2025/08/13 20:35:26 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Selected program 0 score: 0.5333333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.83 / 5 (56.7%): 100%|██████████| 5/5 [00:04<00:00, 1.11it/s] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:35:31 INFO dspy.evaluate.evaluate: Average Metric: 2.833333333333333 / 5 (56.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:36:27 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Proposed new text for expand_query: You expand user questions into a single, concise search query paragraph that helps a search engine retrieve high-signal answers. Follow these rules:\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings, labels, or extra commentary).\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, task, and any error messages from the question. Quote error messages verbatim.\n", - "2) Add synonyms, alternative module/package names, related APIs, and common misconfigurations.\n", - "3) Include specific, likely root causes and fixes, especially known version and compatibility issues.\n", - "4) Mention the expected correct approach and key keywords developers would search for (correct class names, functions, parameters, install commands, version pins).\n", - "5) Keep it precise and targeted (2–5 sentences). Avoid generic fluff.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline currently supports only text-generation, text2text-generation, and summarization tasks; it does not support automatic-speech-recognition (ASR). Trying to use it for ASR (e.g., with Whisper) can cause errors like AttributeError: 'WhisperProcessor' object has no attribute 'config'. Search for the correct transformers.pipeline usage for ASR (e.g., Whisper with AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration and WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - "\n", - "- LangChain imports:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note that the correct import for ChatOpenAI can be from langchain_community.chat_models (and that package naming and splits changed). Include queries about installing/using langchain-community, langchain-core, and correct import paths.\n", - "\n", - "- Chroma vector store:\n", - " - Chroma’s .get() excludes embeddings by default; to retrieve them, use include=['embeddings']. Make clear that seeing 'embeddings': None is expected unless include is provided. Include keywords about persist(), persist_directory, and verifying chroma-embeddings.parquet when relevant.\n", - "\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) with Pydantic and seeing ValidationError about BaseModel subclass, note that downgrading to pydantic==1.10.10 often resolves it (Pydantic v2 vs LangChain compatibility). Include exact pin and pip install command.\n", - "\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; switching to Python 3.10 commonly fixes chromadb issues. Include environment/version troubleshooting (Python 3.10, virtualenv/conda).\n", - "\n", - "General best practices to weave into the query\n", - "- Include exact model IDs (e.g., openai/whisper-large-v2), class/function names (ChatOpenAI, Chroma.from_documents, .get(include=...), @tool, args_schema), and relevant frameworks (LangChain, transformers, chromadb, Pydantic).\n", - "- Add “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” and “breaking changes” as search intents.\n", - "- When errors are present, include “root cause,” “why,” and “how to fix” phrasing and alternative APIs or modules.\n", - "\n", - "Example shaping (do not output examples; this is guidance)\n", - "- Include: “Is HuggingFacePipeline limited to text-generation and not ASR? Correct transformers pipeline for Whisper openai/whisper-large-v2.”\n", - "- Include: “Correct import path for ChatOpenAI (langchain_community.chat_models) vs langchain_openai; installation steps.”\n", - "- Include: “Chroma .get embeddings None by default; use include=['embeddings'].”\n", - "- Include: “@tool args_schema Pydantic ValidationError; pin pydantic==1.10.10; pip install command.”\n", - "- Include: “chromadb ImportError resolved by Python 3.10.”\n", - "\n", - "Your goal: craft a search query that already anticipates these pitfalls and brings back authoritative fixes and examples.\n", - "2025/08/13 20:36:32 INFO dspy.evaluate.evaluate: Average Metric: 3.3333333333333335 / 5 (66.7%)\n", - "2025/08/13 20:36:47 INFO dspy.evaluate.evaluate: Average Metric: 7.833333333333333 / 15 (52.2%)\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Full valset score for new program: 0.5222222222222223\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Full train_val score for new program: 0.5222222222222223\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Individual valset scores for new program: [0.25, 1.0, 0.0, 1.0, 1.0, 0.6666666666666666, 0.3333333333333333, 0.3333333333333333, 0.0, 1.0, 1.0, 0.0, 0.0, 0.5, 0.75]\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: New valset pareto front scores: [0.25, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 0.0, 0.5, 0.75]\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Full valset pareto front score: 0.6333333333333333\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Updated valset pareto front programs: [{1, 2}, {0, 1, 2}, {0, 1, 2}, {1, 2}, {0, 1, 2}, {0, 1}, {0, 1, 2}, {0, 1, 2}, {0}, {0, 1, 2}, {0, 1, 2}, {0, 1}, {0, 1, 2}, {0, 1, 2}, {2}]\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Best valset aggregate score so far: 0.5333333333333333\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Best program as per aggregate score on train_val: 0\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Best program as per aggregate score on valset: 0\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Best score on valset: 0.5333333333333333\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Best score on train_val: 0.5333333333333333\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Linear pareto front program index: 0\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 2: New program candidate index: 2\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 3: No merge candidates found\n", - "2025/08/13 20:36:47 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Selected program 2 score: 0.5222222222222223\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.53 / 5 (50.7%): 100%|██████████| 5/5 [00:05<00:00, 1.03s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:36:52 INFO dspy.evaluate.evaluate: Average Metric: 2.5333333333333337 / 5 (50.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:38:28 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters, and any error messages (quote errors verbatim). Identify the user’s task and where it’s failing.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline), plus common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins.\n", - "4) Include keywords for the “expected correct approach” developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal repro patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API does not maintain server-side session state; you must resend conversation history each call. Include best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing or truncating history, token limits, and caching. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” and “message history management.”\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Errors like AttributeError: 'WhisperProcessor' object has no attribute 'config' can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include flags/keywords like device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes/4-bit or 8-bit quantization, and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note the package splits and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai; alternatively ChatOpenAI from langchain_community.chat_models in older setups). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant.\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include extra commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior” as fixes/alternatives. Mention model_name=\"gpt-4-0613” and differences vs text-davinci-003.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff.\n", - "2025/08/13 20:38:46 INFO dspy.evaluate.evaluate: Average Metric: 3.5833333333333335 / 5 (71.7%)\n", - "2025/08/13 20:39:08 INFO dspy.evaluate.evaluate: Average Metric: 10.25 / 15 (68.3%)\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: New program is on the linear pareto front\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Full valset score for new program: 0.6833333333333333\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Full train_val score for new program: 0.6833333333333333\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Individual valset scores for new program: [0.5, 1.0, 0.0, 0.5, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: New valset pareto front scores: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Full valset pareto front score: 0.7166666666666667\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Updated valset pareto front programs: [{3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {1, 2}, {0, 1, 2, 3}, {0, 1, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 3}, {3}, {0, 1, 2, 3}, {2, 3}]\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best valset aggregate score so far: 0.6833333333333333\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best program as per aggregate score on train_val: 3\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best program as per aggregate score on valset: 3\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best score on valset: 0.6833333333333333\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Best score on train_val: 0.6833333333333333\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Linear pareto front program index: 3\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 3: New program candidate index: 3\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 4: No merge candidates found\n", - "2025/08/13 20:39:08 INFO dspy.teleprompt.gepa.gepa: Iteration 4: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.25 / 5 (85.0%): 100%|██████████| 5/5 [00:07<00:00, 1.47s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:39:16 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:40:11 INFO dspy.teleprompt.gepa.gepa: Iteration 4: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters, flags, and quote error messages verbatim. Identify the user’s goal and where it’s failing.\n", - "2) Add synonyms/aliases and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI/OpenAI wrappers; HuggingFacePipeline/transformers.pipeline; ChatHuggingFace/HuggingFacePipeline; VertexAI PaLM “text-bison@001”). Include likely misconfigs and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports after package splits, supported tasks, proper parameters/flags, required env vars, install commands, and version pins. Prefer concrete, likely solutions over generic advice.\n", - "4) Include keywords for the expected correct approach: exact class/function names, model IDs, flags, example code terms, correct API/module names, minimal repro patterns, and where relevant “upgrade/pin to version X”, “correct import path”, or “local-only setup”.\n", - "5) Keep it targeted and precise; avoid fluff.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API does not maintain server-side session state; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, and best practices for client-side memory (ConversationBufferMemory, ConversationSummaryMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), plus summarizing/truncating history and caching.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy OPENAI_API_KEY to satisfy validation. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL/OPENAI_API_BASE) and parameters (model/model_name). Emphasize “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" occur when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR stack (AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, “local weights,” “no token,” and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” and integration with RetrievalQA.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note recent package splits and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; alternatively ChatOpenAI from langchain_community.chat_models in older setups). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- Redis + LangChain retriever (hybrid/vector search):\n", - " - Pass search_kwargs as a dict (not a string). For similarity_distance_threshold, include {'k': ..., 'distance_threshold': ..., 'include_metadata': True, 'filter': '...'} or use retriever_search_kwargs appropriately. Known issue in older LangChain (e.g., ~0.0.346): _prepare_range_query may generate invalid syntax like \"=>{$yield_distance_as: distance}\", causing \"redis.exceptions.ResponseError: Invalid attribute yield_distance_as\". Include fixes: upgrade langchain/redis-py/Redisearch, ensure RediSearch module version compatibility, and correct query construction with the metadata filter preceding the vector clause. Include keywords: \"similarity_distance_threshold\", \"retriever_search_kwargs\", \"_prepare_range_query\", \"Invalid attribute yield_distance_as\", “correct Redis query syntax”.\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}); include memory/token management terms and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include extra commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior,” and mention model_name=\"gpt-4-0613\" and differences vs text-davinci-003.\n", - "- Callback handlers (VertexAI and LangChain callbacks):\n", - " - Correct signatures: on_llm_start(self, prompts, **kwargs), on_llm_end(self, response, **kwargs), on_llm_error(self, error, **kwargs). Pass callbacks to the LLM instance (e.g., VertexAI(..., callbacks=[handler])) or at call time, not only to LLMChain, due to propagation differences and breaking changes. Include keywords: BaseCallbackHandler, CallbackManagerForLLMRun, langchain.callbacks.base vs langchain_core.callbacks, “callbacks on LLM vs Chain,” “breaking changes,” and example code terms.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs/classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “upgrade langchain,” “correct import path.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, error strings, and troubleshooting terms developers actually search for.\n", - "2025/08/13 20:40:21 INFO dspy.evaluate.evaluate: Average Metric: 3.95 / 5 (79.0%)\n", - "2025/08/13 20:40:21 INFO dspy.teleprompt.gepa.gepa: Iteration 4: New subsample score is not better, skipping\n", - "2025/08/13 20:40:21 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.83 / 5 (56.7%): 100%|██████████| 5/5 [00:07<00:00, 1.44s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:40:28 INFO dspy.evaluate.evaluate: Average Metric: 2.833333333333333 / 5 (56.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:41:31 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters/flags, environment variables, and any error messages (quote errors verbatim). Identify the user’s goal, where it’s failing, and the minimal repro pattern.\n", - "2) Add synonyms/aliases and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; SQLDatabaseChain vs create_sql_agent/SQLDatabaseToolkit).\n", - "3) Anticipate root causes and fixes: version/compatibility, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and exact version pins. Prefer specific, likely solutions over generic advice.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal repro terms.\n", - "5) Keep it targeted and precise; focus on actionable fixes, correct usage patterns, and authoritative examples.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, summarizing/truncating history, and client-side memory/caching (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory).\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" arise if misused for ASR. Use transformers.pipeline('automatic-speech-recognition') with ASR models (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor). LangChain’s HuggingFacePipeline isn’t appropriate for ASR; for diarization use pyannote.audio or HF diarization pipelines.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, “local weights,” “no token,” avoid HfHubHTTPError: 401 by not using HuggingFaceHub, and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note the package split and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai). Mention older import paths (e.g., ChatOpenAI from langchain_community.chat_models) when relevant.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI/OpenAI with custom base URL pointing to localhost (api_base/base_url/openai_api_base). Set a dummy OPENAI_API_KEY if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL), parameters (model/model_name), and “OpenAI-compatible server,” “works out of the box,” “no custom client.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- LangChain tools + Pydantic:\n", - " - For @tool(args_schema=...) ValidationError about BaseModel subclass, call out Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 with pip install pydantic==1.10.10.\n", - "- chromadb installation/runtime:\n", - " - Import/install issues can be Python version related; Python 3.10 commonly resolves chromadb problems. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}); don’t dump entire documents. Include “how to maintain context,” memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory), and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain can include commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” output parser/enforcing raw SQL-only, prompt templates that constrain output, and note use_query_checker behavior. Mention model_name=\"gpt-4-0613\" differences vs text-davinci-003.\n", - "\n", - "General best-practice keywords to weave in when applicable\n", - "- Exact model IDs and classes: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA.\n", - "- Correct functions and flags: transformers.pipeline('automatic-speech-recognition'), device_map=\"auto\", torch_dtype, trust_remote_code=True, include=['embeddings'], persist_directory, dummy API key, custom base URL.\n", - "- Install/version pins and env: pip install commands, pydantic==1.10.10, Python 3.10, OPENAI_API_KEY, OPENAI_BASE_URL.\n", - "- Prompt/context patterns: ChatPromptTemplate, MessagesPlaceholder, {messages}, {context}, {summary}, RunnableWithMessageHistory.\n", - "\n", - "Style and quality checklist\n", - "- 2–5 sentences, one paragraph, no extra lines or headings.\n", - "- Quote error messages verbatim.\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, flags, and troubleshooting terms.\n", - "- Include likely root cause(s) and the “expected correct approach” with exact keywords developers would search for.\n", - "2025/08/13 20:41:38 INFO dspy.evaluate.evaluate: Average Metric: 2.833333333333333 / 5 (56.7%)\n", - "2025/08/13 20:41:38 INFO dspy.teleprompt.gepa.gepa: Iteration 5: New subsample score is not better, skipping\n", - "2025/08/13 20:41:38 INFO dspy.teleprompt.gepa.gepa: Iteration 6: Selected program 1 score: 0.5333333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.17 / 5 (63.3%): 100%|██████████| 5/5 [00:03<00:00, 1.39it/s]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:41:41 INFO dspy.evaluate.evaluate: Average Metric: 3.1666666666666665 / 5 (63.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:42:42 INFO dspy.teleprompt.gepa.gepa: Iteration 6: Proposed new text for expand_query: You are given a user’s technical question. Your task is to expand it into a search-engine-optimized query that will find authoritative answers, fixes, and examples.\n", - "\n", - "How to write the expanded query:\n", - "- Preserve the user’s core ask, and add precise technical keywords: library/framework names, classes/functions/methods, parameters/flags, env vars, file/dir names, API endpoints, error strings (quoted), and exact version numbers when present.\n", - "- Include adjacent terms users actually search for: “how to,” “best practices,” “known issues/bugs,” “workaround,” “version compatibility,” “migration,” “examples,” “code snippets,” “configuration,” “persistence,” “performance,” “billing/costs,” “API semantics,” “syntax,” “installation,” “pip/conda.”\n", - "- If the user provided code or errors, extract and reference exact symbols and strings (e.g., VectorstoreIndexCreator, RetrievalQA, as_retriever, JSONLoader, PromptTemplate, BaseCallbackHandler.on_llm_end, Chroma.from_documents, Chroma.get(include=['embeddings']), StorageContext.persist, chromadb.PersistentClient, GPTVectorStoreIndex, as_query_engine, FT.SEARCH, search_type=\"similarity_distance_threshold\", search_kwargs, “ModuleNotFoundError: No module named 'langchain_openai'”, “Invalid attribute yield_distance_as”).\n", - "- Combine the main question with sub-questions that probe root causes, correct configuration and API usage, troubleshooting steps, minimal reproducible examples, and viable alternatives. Ask explicitly for correct save/load flows, version-specific pitfalls, and verification/debug steps.\n", - "- Keep output concise: one short paragraph or a compact bullet list. Do not add explanations or meta commentary—only the expanded query content.\n", - "\n", - "Domain-specific guidance to ensure you cover important nuggets when relevant:\n", - "\n", - "LangChain vector stores, embeddings, and persistence:\n", - "- Surface defaults and persistence: VectorstoreIndexCreator often uses Chroma (backed by duckdb+parquet); DuckDB in-memory vs persistent behavior with persist_directory; how to reload persisted stores; how to use saved DB files or pickles with RetrievalQA retrievers.\n", - "- OpenAI embedding costs: embeddings are billed when created; loading a persisted store does not re-embed or re-charge; how to avoid re-embedding on each run.\n", - "- Ask for exact patterns to save/load reusable indexes and retrievers, ensuring the same embedding function on load to avoid mismatch.\n", - "\n", - "Chroma + LlamaIndex + LangChain integration:\n", - "- Chroma persistence is handled by the Chroma client; prefer PersistentClient; manual StorageContext.persist may be unnecessary if the client persists.\n", - "- Include version/compatibility (e.g., chromadb 0.3.x, llama-index 0.6.x, langchain 0.0.245). Consider suggesting using LlamaIndex alone (LangChain not required), using proper Prompt/PromptTemplate, using OpenAIEmbeddings, and using OpenAI as the LLM for querying (or correct Azure OpenAI configuration: deployment_name, openai_api_version).\n", - "- When query_engine/query returns None despite embeddings: include debugging steps (collection counts, include=['embeddings'] in Chroma.get, embedding function continuity, metadata/IDs, reload paths, persist_directory, ensuring db.persist() if using LangChain wrapper).\n", - "\n", - "Chroma + LangChain embeddings visibility:\n", - "- The default Chroma .get does not include embeddings; include=['embeddings'] is required to see them. Ask for examples showing correct usage and verification that embeddings are stored in chroma-embeddings.parquet.\n", - "\n", - "Redis + LangChain retriever hybrid search:\n", - "- Include exact search_type (e.g., similarity_distance_threshold), passing filters in search_kwargs, and correct RediSearch FT.SEARCH filter syntax.\n", - "- Surface the known issue with LangChain’s _prepare_range_query building invalid queries (e.g., “Invalid attribute yield_distance_as”) and that filters should be placed before the vector range clause; ask for fixes/upgrades or correct manual query construction; include examples of working search_kwargs.\n", - "\n", - "langchain.js JSONLoader and JSON structure:\n", - "- Ask how to configure JSONLoader so each array element/object becomes one document (JSON pointer/selectors); mapping which fields become pageContent vs metadata; checking for JSON syntax errors; examples for querying fields like “2 years experience” or “javascript.”\n", - "\n", - "LangChain callbacks with VertexAI:\n", - "- Ask how to attach callbacks to LLMs (pass callbacks to the LLM instance, not only LLMChain) and correct BaseCallbackHandler signatures (on_llm_end(response, **kwargs)); include version considerations and examples printing prompts/responses.\n", - "\n", - "LangChain packaging/modularization and imports:\n", - "- Include breaking changes and correct imports across versions: ChatOpenAI may be in langchain_openai (requires pip install langchain-openai) or in langchain_community.chat_models in older setups; verify langchain-core, langchain-community, langchain-openai install and version compatibility; include Jupyter/venv install issues (pip in correct interpreter, kernel restart); provide minimal working import examples.\n", - "\n", - "Pydantic compatibility with LangChain tools:\n", - "- For @tool(args_schema=...), check Pydantic v1 vs v2 compatibility; known fixes include downgrading to pydantic==1.10.x for older LangChain versions or upgrading LangChain; include exact error strings and install commands.\n", - "\n", - "Output format:\n", - "- Return only the expanded query as a single paragraph or a short bullet list. No headings, no extra commentary.\n", - "2025/08/13 20:42:49 INFO dspy.evaluate.evaluate: Average Metric: 3.1666666666666665 / 5 (63.3%)\n", - "2025/08/13 20:42:49 INFO dspy.teleprompt.gepa.gepa: Iteration 6: New subsample score is not better, skipping\n", - "2025/08/13 20:42:49 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Selected program 1 score: 0.5333333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.67 / 5 (53.3%): 100%|██████████| 5/5 [00:05<00:00, 1.09s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:42:55 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:44:13 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Proposed new text for expand_query: You are given a user’s technical question. Your task is to expand it into a search-engine-optimized query that will help find authoritative answers, fixes, and examples.\n", - "\n", - "How to write the expanded query:\n", - "- Preserve the user’s core ask and add precise technical keywords: library/framework names, class/function/method names, parameters, config flags, environment variables, file/dir names, API endpoints, error messages (quoted exactly), and exact version numbers when present.\n", - "- Include likely adjacent terms: “how to,” “best practices,” “known issues/bugs,” “workaround,” “version compatibility,” “examples,” “code snippets,” “configuration,” “persistence,” “performance,” “billing/costs,” “API semantics,” “syntax,” “minimal reproducible example,” “MRE.”\n", - "- If the user provided code, extract and reference exact symbols (e.g., VectorstoreIndexCreator, RetrievalQA, as_retriever, JSONLoader, PromptTemplate, BaseCallbackHandler.on_llm_end, Redis similarity_distance_threshold, ChromaVectorStoreIndex, StorageContext.persist, PersistentClient, ChatHuggingFace, HuggingFacePipeline, AutoTokenizer, AutoModelForCausalLM, create_sql_agent, SQLDatabaseToolkit, etc.) and any error strings as quoted text.\n", - "- Combine the main question with sub-questions that probe root causes, correct configuration and API usage, troubleshooting steps, version compatibility, and alternatives. Ask for concrete, correct configuration parameters and minimal reproducible code snippets.\n", - "- Keep the output concise: one short paragraph or a compact bullet list. Do not add explanations or meta commentary—only the expanded query content. No headings.\n", - "\n", - "Domain-specific guidance and nuggets to include when relevant:\n", - "\n", - "LangChain vector stores, embeddings, persistence:\n", - "- Ask about VectorstoreIndexCreator defaults and persistence. Surface that a common setup involves Chroma (duckdb+parquet); DuckDB in-memory behavior vs persistence; how to reload persisted stores with persist_directory; how to reuse persisted stores (DB files, .pkl) for RetrievalQA retrievers.\n", - "- Include OpenAI embedding costs: embeddings incur charges when created; loading a persisted vector store does not re-embed or re-charge; how to avoid re-embedding each run.\n", - "- Request exact code patterns to save/load reusable indexes and retrievers, and ensuring the same embedding function is supplied on load.\n", - "\n", - "Chroma + LlamaIndex + LangChain integration issues:\n", - "- Note Chroma persistence is handled by the Chroma client (prefer chromadb.PersistentClient); manual StorageContext.persist may be unnecessary if the client persists.\n", - "- Include version compatibility/usage (e.g., chromadb 0.3.x, llama-index 0.6.x, langchain 0.0.245). Suggest using LlamaIndex alone when appropriate, proper PromptTemplate usage, OpenAIEmbeddings, and correct OpenAI/Azure OpenAI configuration.\n", - "- Ask why query_engine/query returns None despite embeddings being created; include debugging steps (collection counts, embedding function continuity, metadata schema, document IDs, reload paths).\n", - "\n", - "Redis + LangChain retriever hybrid search:\n", - "- Include exact search_type (similarity_distance_threshold), how to pass filters in search_kwargs (not a separate filter arg unless documented), and correct RediSearch FT.SEARCH filter syntax.\n", - "- Surface a known bug in LangChain’s _prepare_range_query producing invalid queries with “Invalid attribute yield_distance_as”; filters should be placed before the vector range clause. Ask for fixes/upgrading to a version where this is resolved, or for correct manual query construction.\n", - "- Request examples of correct search_kwargs and full FT.SEARCH queries.\n", - "\n", - "langchain.js JSONLoader and JSON structure:\n", - "- Ask how to configure JSONLoader so each array element/object becomes one document (JSON pointer/selectors), not each property.\n", - "- Include checks for JSON syntax errors (e.g., missing commas) that cause misparsing, and mapping which field becomes pageContent vs metadata. Request examples for querying fields like “2 years experience” or “javascript” with a proper schema.\n", - "\n", - "LangChain callbacks with VertexAI:\n", - "- Clarify that callbacks must be attached to the LLM instance (not only LLMChain) and the correct BaseCallbackHandler signature is on_llm_end(response, **kwargs). Include version considerations and examples that print prompts/responses.\n", - "\n", - "HuggingFace + Falcon 40B Instruct + LangChain:\n", - "- If seeing “HfHubHTTPError: 401 Client Error: Unauthorized,” note it can be due to lack of access to the Hugging Face Inference Endpoints or invalid HUGGINGFACEHUB_API_TOKEN; ask how to resolve permissions.\n", - "- For fully local inference, ask how to load and run Falcon (e.g., tiiuae/falcon-40b-instruct) via transformers: AutoTokenizer, AutoModelForCausalLM, and a text-generation pipeline; then wrap with LangChain’s HuggingFacePipeline (instead of relying on ChatHuggingFace for Hub inference). Include model_kwargs (max_new_tokens, top_k, temperature, repetition_penalty), GPU/quantization (bitsandbytes), and minimal examples that integrate RetrievalQA.\n", - "\n", - "LangChain SQL with GPT-4 chat models:\n", - "- Note that SQLDatabaseChain with ChatOpenAI (gpt-4-0613) can return extra text causing sqlite3.OperationalError; ask about using create_sql_agent with SQLDatabaseToolkit to handle tool execution and strict SQL parsing/output.\n", - "- Include prompts/templates or OutputParsers that force SQL-only output, and version compatibility notes for langchain, openai, sqlalchemy. Request MREs that work with GPT-4.\n", - "\n", - "Pydantic + @tool args_schema:\n", - "- If encountering “ValidationError: args_schema subclass of BaseModel expected,” surface Pydantic v1 vs v2 compatibility; suggest downgrading to pydantic==1.10.10 if needed. Include install commands (pip install pydantic==1.10.10) or pinning in requirements.txt.\n", - "\n", - "Output format:\n", - "- Return only the expanded query as a single paragraph or a compact bullet list. No extra commentary, no headings, no heavy formatting.\n", - "2025/08/13 20:44:17 INFO dspy.evaluate.evaluate: Average Metric: 3.0 / 5 (60.0%)\n", - "2025/08/13 20:44:25 INFO dspy.evaluate.evaluate: Average Metric: 8.5 / 15 (56.7%)\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Full valset score for new program: 0.5666666666666667\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Full train_val score for new program: 0.5666666666666667\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Individual valset scores for new program: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 0.0, 1.0, 1.0, 0.3333333333333333, 0.0, 0.5, 0.5]\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: New valset pareto front scores: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Full valset pareto front score: 0.7166666666666667\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Updated valset pareto front programs: [{3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {1, 2, 4}, {0, 1, 2, 3, 4}, {0, 1, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 3}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4}, {0, 1, 3, 4}, {3}, {0, 1, 2, 3, 4}, {2, 3}]\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Best valset aggregate score so far: 0.6833333333333333\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Best program as per aggregate score on train_val: 3\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Best program as per aggregate score on valset: 3\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Best score on valset: 0.6833333333333333\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Best score on train_val: 0.6833333333333333\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: Linear pareto front program index: 3\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 7: New program candidate index: 4\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 8: No merge candidates found\n", - "2025/08/13 20:44:25 INFO dspy.teleprompt.gepa.gepa: Iteration 8: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:04<00:00, 1.06it/s] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:44:29 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:45:46 INFO dspy.teleprompt.gepa.gepa: Iteration 8: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters, file paths, and any error messages (quote errors verbatim). Identify the user’s goal and the failure point.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; OpenAI Python SDK 1.x “from openai import OpenAI” vs legacy “import openai”), plus common misconfigs and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, required parameters/flags, environment variables, install commands and version pins, and minimal repro patterns. Include both modern and legacy import paths where relevant.\n", - "4) Include keywords for the “expected correct approach”: exact class/function names, model IDs, flags, correct API/module names, loader/retriever/vector store usage, and example code terms likely to surface canonical docs and GitHub issues.\n", - "5) Keep it targeted and precise; emphasize specific, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API maintains no server-side session state; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, summarizing/truncating history, and client-side memory patterns like ConversationBufferMemory, ConversationSummaryMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory, and caching.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY plus OPENAI_BASE_URL or OPENAI_API_BASE), parameters (model/model_name), “OpenAI-compatible server,” “no custom client,” and “works out of the box.”\n", - "- LangChain imports and package split:\n", - " - For errors like ModuleNotFoundError: No module named 'langchain_openai', note package splits and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; older: ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" occur when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit quantization via bitsandbytes, avoiding HfHubHTTPError: 401 by not using HuggingFaceHub, and keywords like “example code,” “local weights,” “no token,” “integration with RetrievalQA.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise embeddings: None). Include persist(), persist_directory, and verifying files (e.g., chroma-embeddings.parquet/chroma.sqlite) if persistence matters.\n", - "- Vector store persistence and reloading (LangChain):\n", - " - Avoid re-embedding by persisting and reloading the same vector store (e.g., Chroma.from_documents with persist_directory + persist(); later Chroma(persist_directory=..., embedding_function=...)). Mention VectorstoreIndexCreator usage, how to supply persist_directory, and reusing retrievers in RetrievalQA. Clarify that embedding calls incur cost; loading/querying persisted vectors does not re-embed.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops often stem from Python version incompatibility; Python 3.10 typically resolves chromadb issues. Include virtualenv/conda guidance, exact pip install commands, and checking conflicting dependencies.\n", - "- LangChain tools + Pydantic:\n", - " - For @tool(args_schema=...) ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 (pip install pydantic==1.10.10).\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into LangChain prompts via placeholders (e.g., {context}, {summary}, {messages}); include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may output commentary causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” strict output parsers/prompt templates enforcing raw SQL, and “use_query_checker” behavior; mention model_name=\"gpt-4-0613” vs older models.\n", - "- LangChain.js JSON loading:\n", - " - JSONLoader/DirectoryLoader can split each property into separate Documents if not configured. Include fixing invalid JSON (missing commas), and configuring JSONLoader to treat each array element as one document using pointer/jq (e.g., pointer: '/[]' or jq selectors), textKey/metadata pointers, or a custom loader. Mention querying by skills/experience and returning full objects vs fields.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), VectorstoreIndexCreator, @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “token limits,” “no server-side session,” “message history,” “conversation ID not supported.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; use concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff. One paragraph, 2–5 sentences.\n", - "2025/08/13 20:45:52 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n", - "2025/08/13 20:45:52 INFO dspy.teleprompt.gepa.gepa: Iteration 8: New subsample score is not better, skipping\n", - "2025/08/13 20:45:52 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Selected program 4 score: 0.5666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 1.83 / 5 (36.7%): 100%|██████████| 5/5 [00:03<00:00, 1.27it/s]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:45:56 INFO dspy.evaluate.evaluate: Average Metric: 1.8333333333333333 / 5 (36.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:46:54 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Proposed new text for expand_query: You expand a user’s technical question into a search-engine-optimized query that reliably surfaces authoritative answers, fixes, and examples.\n", - "\n", - "How to write the expanded query:\n", - "- Preserve the user’s core ask. Add exact technical keywords and artifacts found in the question/code: library/framework names, class/function/method names, parameters and flags, environment variables, file/dir names, API endpoints, error messages (quoted exactly), and version numbers when present.\n", - "- Include likely adjacent and intent-revealing terms: “how to,” “best practices,” “known issues/bugs,” “workaround,” “version compatibility,” “examples,” “code snippets,” “configuration,” “persistence,” “performance,” “billing/costs,” “API semantics,” “syntax,” “minimal reproducible example,” “MRE,” “debugging,” “troubleshooting.”\n", - "- If the user provided code, extract and include exact symbols and identifiers (e.g., VectorstoreIndexCreator, RetrievalQA, as_retriever, JSONLoader, PromptTemplate, BaseCallbackHandler.on_llm_end, Redis similarity_distance_threshold, ChromaVectorStoreIndex, StorageContext.persist, PersistentClient, ChatHuggingFace, HuggingFacePipeline, AutoTokenizer, AutoModelForCausalLM, create_sql_agent, SQLDatabaseToolkit, SentenceWindowNodeParser, LLMRerank, MetadataReplacementPostProcessor, SimpleDirectoryReader, GPTVectorStoreIndex, ServiceContext, Chroma.from_documents, db.get(include=['embeddings']), etc.) and any error strings quoted exactly.\n", - "- Combine the main ask with sub-questions that probe root causes, correct configuration and API usage, debugging steps, version compatibility, and viable alternatives. Ask for concrete configuration parameters and minimal reproducible code snippets/patterns that work.\n", - "\n", - "Domain-specific guidance to include when relevant:\n", - "- LangChain vector stores, embeddings, persistence:\n", - " - VectorstoreIndexCreator defaults and persistence; Chroma (duckdb+parquet) common setup; DuckDB in-memory vs persist; how to reload persisted stores with persist_directory; how to reuse persisted DB files and retrievers without re-embedding.\n", - " - OpenAI embedding costs: embeddings incur charges on creation; loading a persisted vector store does not re-embed or re-charge; how to avoid re-embedding each run by passing the same embedding function on load.\n", - " - Request exact save/load patterns for reusable indexes/retrievers and ensuring the same embedding function at load time.\n", - "- Chroma + LlamaIndex + LangChain integration:\n", - " - Chroma persistence is handled by the Chroma client (prefer chromadb.PersistentClient); manual StorageContext.persist may be unnecessary if the client persists.\n", - " - Version compatibility/usage (e.g., chromadb 0.3.x, llama-index 0.6.x, langchain 0.0.245 or 0.1.x+).\n", - " - Consider using LlamaIndex alone when appropriate; proper PromptTemplate usage; using OpenAIEmbeddings and OpenAI/Azure OpenAI LLMs with correct configuration.\n", - " - Debugging “query returns None” despite embeddings: check collection counts, ensure embedding function continuity across save/load, validate metadata schema and document IDs, verify reload paths and persist directories.\n", - "- Chroma embeddings visibility:\n", - " - By default Chroma .get() omits vectors; use .get(include=['embeddings']) to retrieve embeddings; verify chroma-embeddings.parquet and collection counts.\n", - "- Redis + LangChain hybrid search:\n", - " - Use correct search_type (e.g., similarity_distance_threshold); pass filters via search_kwargs; ensure valid RediSearch FT.SEARCH filter syntax.\n", - " - Known bug: LangChain’s _prepare_range_query can emit “Invalid attribute yield_distance_as”; place filters before vector range clause; upgrade to a fixed version or construct FT.SEARCH manually; include working search_kwargs and full FT.SEARCH examples.\n", - "- langchain.js JSONLoader:\n", - " - Configure JSON pointers/selectors so each array element/object becomes one document; map pageContent vs metadata; check JSON syntax (e.g., missing commas); include examples for querying fields like “2 years experience” or “javascript.”\n", - "- LangChain callbacks with VertexAI:\n", - " - Attach callbacks to the LLM instance (not only LLMChain); handler signature BaseCallbackHandler.on_llm_end(response, **kwargs); include version considerations and examples that print prompts/responses.\n", - "- HuggingFace + Falcon 40B Instruct + LangChain:\n", - " - “HfHubHTTPError: 401 Client Error: Unauthorized” can indicate missing access or invalid HUGGINGFACEHUB_API_TOKEN; how to resolve permissions.\n", - " - For local inference: load via transformers (AutoTokenizer, AutoModelForCausalLM) and a text-generation pipeline, then wrap with LangChain’s HuggingFacePipeline; include model_kwargs (max_new_tokens, top_k, temperature, repetition_penalty), GPU/quantization (bitsandbytes), and minimal RetrievalQA integration.\n", - "- LangChain SQL with GPT-4 chat models:\n", - " - SQLDatabaseChain with ChatOpenAI (e.g., gpt-4-0613) may return extra text causing sqlite3.OperationalError; prefer create_sql_agent with SQLDatabaseToolkit; include prompts/templates or OutputParsers enforcing SQL-only; note langchain/openai/sqlalchemy version compatibility and provide MREs.\n", - "- Pydantic + @tool args_schema:\n", - " - “ValidationError: args_schema subclass of BaseModel expected” can be Pydantic v1 vs v2; suggest pinning pydantic==1.10.10 if needed; include exact install commands.\n", - "- LangChain packaging/imports:\n", - " - If “ModuleNotFoundError: No module named 'langchain_openai'”, clarify versioned imports: either install the provider package (pip install langchain-openai) and use from langchain_openai import ChatOpenAI (new packaging), or use from langchain_community.chat_models import ChatOpenAI / legacy imports depending on LangChain version; verify installed packages and Python environment.\n", - "- LlamaIndex + LangChain RAG chatbot integration:\n", - " - Use llama_index retrieval engines (e.g., sentence_window_engine with SentenceWindowNodeParser, MetadataReplacementPostProcessor, LLMRerank) to fetch context; inject retrieved context into the chatbot prompt; modify prompts to include placeholders for relevant context, prior summary, and current user input; manage memory and token usage (e.g., ConversationBufferMemory/summary).\n", - " - Persist/reload indexes with StorageContext.persist and load_index_from_storage, maintaining consistent ServiceContext and embed model.\n", - "- Whisper ASR and diarization:\n", - " - LangChain’s HuggingFacePipeline supports text-generation, text2text-generation, summarization—“automatic-speech-recognition” is not supported there.\n", - " - For Whisper ASR use transformers pipeline(\"automatic-speech-recognition\") with Whisper model (e.g., AutoModelForSeq2SeqLM.from_pretrained('openai/whisper-large-v2')) plus WhisperProcessor; WhisperProcessor has no .config (the error’s cause).\n", - " - For speaker diarization, use pyannote.audio or dedicated diarization pipelines and combine with ASR outputs.\n", - "\n", - "Output format:\n", - "- Return only the expanded query as a single short paragraph or a compact bullet list. No explanations, no meta commentary, no headings, and avoid heavy formatting.\n", - "2025/08/13 20:47:00 INFO dspy.evaluate.evaluate: Average Metric: 2.333333333333333 / 5 (46.7%)\n", - "2025/08/13 20:47:09 INFO dspy.evaluate.evaluate: Average Metric: 8.5 / 15 (56.7%)\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Full valset score for new program: 0.5666666666666667\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Full train_val score for new program: 0.5666666666666667\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Individual valset scores for new program: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 0.0, 1.0, 1.0, 0.3333333333333333, 0.0, 0.5, 0.5]\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: New valset pareto front scores: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Full valset pareto front score: 0.7166666666666667\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Updated valset pareto front programs: [{3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {1, 2, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 3}, {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3, 4, 5}, {0, 1, 3, 4, 5}, {3}, {0, 1, 2, 3, 4, 5}, {2, 3}]\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best valset aggregate score so far: 0.6833333333333333\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best program as per aggregate score on train_val: 3\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best program as per aggregate score on valset: 3\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best score on valset: 0.6833333333333333\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Best score on train_val: 0.6833333333333333\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: Linear pareto front program index: 3\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 9: New program candidate index: 5\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 10: No merge candidates found\n", - "2025/08/13 20:47:09 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Selected program 5 score: 0.5666666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 1.92 / 5 (38.3%): 100%|██████████| 5/5 [00:05<00:00, 1.18s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:47:15 INFO dspy.evaluate.evaluate: Average Metric: 1.9166666666666665 / 5 (38.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:48:28 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Proposed new text for expand_query: You expand a user’s technical question into a search‑engine‑optimized query that reliably surfaces authoritative answers, fixes, and examples.\n", - "\n", - "How to write the expanded query:\n", - "- Preserve the user’s core ask and context. Include exact technical keywords and artifacts present in the question/code: library/framework names, package names, classes/functions/methods, parameters/flags, environment variables, file/dir names, API endpoints, CLI commands, error messages (quoted exactly), stack traces, and version numbers when present.\n", - "- Extract and include exact identifiers from any code: e.g., VectorStoreIndex, StorageContext.persist, load_index_from_storage, ServiceContext, SentenceWindowNodeParser, MetadataReplacementPostProcessor, LLMRerank, as_query_engine, ChatPromptTemplate, MessagesPlaceholder, ChatMessageHistory, ChatOpenAI(model=\"gpt-4-0613\"), SQLDatabaseChain.from_llm, create_sql_agent, SQLDatabaseToolkit, RetrievalQA.from_chain_type, HuggingFacePipeline, ChatHuggingFace, AutoTokenizer, AutoModelForCausalLM, JSONLoader, DirectoryLoader, .get(include=['embeddings']), PersistentClient, similarity_distance_threshold, _prepare_range_query, pipeline(\"automatic-speech-recognition\"), BaseCallbackHandler.on_llm_end, etc.\n", - "- Add intent-revealing and adjacent terms to improve recall: “how to,” “best practices,” “known issues/bugs,” “workaround,” “version compatibility,” “examples,” “code snippets,” “configuration,” “persistence,” “performance,” “billing/costs,” “API semantics,” “syntax,” “minimal reproducible example,” “MRE,” “debugging,” “troubleshooting,” “error handling.”\n", - "- Combine the main ask with concrete sub‑questions that probe root causes, correct configuration and API usage, debugging steps, version compatibility, and viable alternatives. Ask for:\n", - " - Minimal reproducible code snippets/patterns that work end‑to‑end.\n", - " - Exact configuration parameters and save/load patterns.\n", - " - Version pins and install/upgrade/downgrade commands when relevant.\n", - " - How to test/verify success (e.g., collection counts, persisted files present, embeddings visibility).\n", - "\n", - "Domain‑specific guidance (include when relevant):\n", - "- LangChain vector stores, embeddings, persistence:\n", - " - Chroma persistence: prefer chromadb.PersistentClient with persist_directory; DuckDB in‑memory vs persisted (duckdb+parquet). Reload persisted stores via persist_directory without re‑embedding. Loading a persisted vector store does not re‑embed or re‑charge (OpenAI embeddings); avoid re‑embedding by passing the same embedding function on load. Request exact save/load patterns for reusable indexes/retrievers and ensuring the same embedding function at load time.\n", - " - Chroma embeddings visibility: Chroma .get() omits vectors by default; use .get(include=['embeddings']); verify chroma-collections.parquet / chroma-embeddings.parquet and collection counts.\n", - "- Chroma + LlamaIndex + LangChain integration:\n", - " - When using Chroma via LlamaIndex, persistence is handled by the Chroma client; manual StorageContext.persist may be unnecessary if the Chroma client persists. Note version compatibility (e.g., chromadb 0.3.x, llama-index 0.6.x+, langchain 0.0.245 or 0.1.x+).\n", - " - Prefer using LlamaIndex retrieval engines directly when appropriate; correct PromptTemplate usage in LangChain; using OpenAIEmbeddings with OpenAI/Azure OpenAI LLMs.\n", - " - Debug “query returns None” despite embeddings: check collection counts, ensure embedding function continuity across save/load, validate metadata schema and document IDs, verify reload paths and persist directories.\n", - "- LlamaIndex + LangChain RAG chatbot integration:\n", - " - Use llama_index retrieval engines (e.g., sentence_window_engine with SentenceWindowNodeParser, MetadataReplacementPostProcessor, LLMRerank) to query the document base for relevant context; then inject the retrieved snippets into the chatbot prompt.\n", - " - Modify the chat prompt to include placeholders for retrieved context, prior conversation summary/memory, and current user input; manage memory and token usage (e.g., ConversationBufferMemory/summary).\n", - " - Ask for working examples that pipe llama_index retrieval outputs into LangChain’s ChatPromptTemplate/ChatMessageHistory.\n", - "- Redis + LangChain hybrid search:\n", - " - Use the correct search_type (e.g., \"similarity_distance_threshold\"); pass filters via search_kwargs with valid RediSearch FT.SEARCH syntax.\n", - " - Known bug: LangChain’s _prepare_range_query can emit “Invalid attribute yield_distance_as”; place filters before the vector range clause; upgrade to a fixed version or construct FT.SEARCH manually; include working search_kwargs and full FT.SEARCH examples.\n", - "- LangChain SQL with GPT‑4 chat models:\n", - " - SQLDatabaseChain with ChatOpenAI (e.g., model_name=\"gpt-4-0613\") may return extra explanatory text alongside SQL, causing sqlite3.OperationalError near \"The\": syntax error. Prefer create_sql_agent with SQLDatabaseToolkit to reliably execute SQL; or enforce SQL‑only via prompts/output parsers. Include MREs and version compatibility for langchain/openai/sqlalchemy.\n", - "- langchain.js JSONLoader:\n", - " - Configure JSON pointers/selectors so each array element/object becomes one Document; avoid splitting each key into a separate Document. Fix JSON syntax issues (e.g., missing commas). Include examples mapping pageContent vs metadata for queries like “2 years experience” or “javascript.”\n", - "- LangChain callbacks with VertexAI:\n", - " - Attach callbacks to the LLM instance (not only LLMChain). Handler signature BaseCallbackHandler.on_llm_end(response, **kwargs). Include version considerations and examples.\n", - "- HuggingFace + Falcon 40B Instruct + LangChain:\n", - " - “HfHubHTTPError: 401 Client Error: Unauthorized” often means missing access or invalid HUGGINGFACEHUB_API_TOKEN. For fully local inference, load via transformers: AutoTokenizer, AutoModelForCausalLM, and a text-generation pipeline; wrap with LangChain’s HuggingFacePipeline or ChatHuggingFace. Include model_kwargs (max_new_tokens, top_k, temperature, repetition_penalty), GPU/quantization (bitsandbytes), and minimal RetrievalQA integration.\n", - "- Pydantic + @tool args_schema:\n", - " - “ValidationError: args_schema subclass of BaseModel expected” can be a Pydantic v1 vs v2 mismatch. Suggest pinning pydantic==1.10.10; include exact commands (e.g., pip install \"pydantic==1.10.10\") or add to requirements.txt.\n", - "- LangChain packaging/imports:\n", - " - If “ModuleNotFoundError: No module named 'langchain_openai'”, clarify versioned imports: install provider package (pip install langchain-openai) and use from langchain_openai import ChatOpenAI (new packaging), or use legacy imports matching the installed LangChain version.\n", - "- Whisper ASR and diarization:\n", - " - LangChain’s HuggingFacePipeline supports text-generation/text2text/summarization; “automatic-speech-recognition” is not supported there. For Whisper ASR use transformers pipeline(\"automatic-speech-recognition\") with Whisper models and WhisperProcessor (which has no .config). For diarization, use pyannote.audio and merge with ASR outputs.\n", - "\n", - "Quality checklist before you output:\n", - "- Restate the user’s exact goal plus environment (models, versions, frameworks) and artifacts (functions/classes/params/paths).\n", - "- Explicitly request MREs, correct configuration, persistence patterns, verification steps, and known issues/bugs/workarounds.\n", - "- For chatbot + retrieval questions, explicitly mention: using the retrieval engine to fetch context, injecting it into the prompt, and editing prompts to include placeholders for retrieved context, prior summary, and current input.\n", - "- For SQL with GPT‑4 issues, explicitly mention the failure mode (extra text in SQL) and the recommended create_sql_agent + SQLDatabaseToolkit workaround.\n", - "- For auth/version/tooling issues, include exact install/downgrade/upgrade commands and package names.\n", - "\n", - "Output format:\n", - "- Return only the expanded query as a single short paragraph or a compact bullet list. No explanations, no headings, no meta‑commentary, and avoid heavy formatting.\n", - "2025/08/13 20:48:31 INFO dspy.evaluate.evaluate: Average Metric: 2.1666666666666665 / 5 (43.3%)\n", - "2025/08/13 20:48:39 INFO dspy.evaluate.evaluate: Average Metric: 9.25 / 15 (61.7%)\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Full valset score for new program: 0.6166666666666667\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Full train_val score for new program: 0.6166666666666667\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Individual valset scores for new program: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 0.0, 0.5, 0.25]\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: New valset pareto front scores: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Full valset pareto front score: 0.7166666666666667\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Updated valset pareto front programs: [{3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {1, 2, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 3, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 2, 3, 4, 5, 6}, {0, 1, 3, 4, 5, 6}, {3}, {0, 1, 2, 3, 4, 5, 6}, {2, 3}]\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best valset aggregate score so far: 0.6833333333333333\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best program as per aggregate score on train_val: 3\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best program as per aggregate score on valset: 3\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best score on valset: 0.6833333333333333\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Best score on train_val: 0.6833333333333333\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: Linear pareto front program index: 3\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 10: New program candidate index: 6\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 11: No merge candidates found\n", - "2025/08/13 20:48:39 INFO dspy.teleprompt.gepa.gepa: Iteration 11: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.50 / 5 (70.0%): 100%|██████████| 5/5 [00:05<00:00, 1.03s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:48:44 INFO dspy.evaluate.evaluate: Average Metric: 3.5 / 5 (70.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:49:58 INFO dspy.teleprompt.gepa.gepa: Iteration 11: Proposed new text for expand_query: You expand developer questions into a single, concise search‑query paragraph that helps a search engine retrieve high‑signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters, and any error messages (quote errors verbatim). Identify the user’s task and where it’s failing.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline), plus common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins.\n", - "4) Include keywords for the “expected correct approach” developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal repro patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API does not maintain server-side session state; you must resend conversation history each call. Include best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” and “message history management.”\n", - "- Hugging Face + LangChain (ASR vs text generation):\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Misuse can cause errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\".\n", - " - For ASR use transformers.pipeline(\"automatic-speech-recognition\") with the correct ASR components (AutoProcessor/WhisperProcessor + AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration) and do not wrap ASR with LangChain’s HuggingFacePipeline. Add related terms like WhisperX, pyannote.audio for speaker diarization if diarization is the goal.\n", - "- Fully local inference with transformers (e.g., Falcon 40B instruct):\n", - " - Load with transformers (AutoTokenizer, AutoModelForCausalLM) and create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit quantization via bitsandbytes, “local weights,” “no token,” and avoid HfHubHTTPError: 401 by not using HuggingFaceHub. Mention “example code” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - Recent package splits require correct installs/imports: langchain, langchain-core, langchain-community, langchain-openai. Fix ModuleNotFoundError: No module named 'langchain_openai' by installing langchain-openai or using the correct import path; alternatively import ChatOpenAI from langchain_community.chat_models in older setups. Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL), parameters (model/model_name), and “OpenAI-compatible server,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant.\n", - "- VectorstoreIndexCreator persistence and billing:\n", - " - VectorstoreIndexCreator defaults to in-memory DuckDB (transient). To persist and reload, use Chroma with persist_directory and reload via Chroma(persist_directory=..., embedding_function=...). Embeddings are billed by OpenAI when created; loading from a persisted vector store does not re-embed or incur extra embedding costs (ensure the same embedding model on reload).\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}); include memory/token management terms and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may produce commentary causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior,” and note model_name=\"gpt-4-0613” differences vs older models.\n", - "- LangChain callbacks:\n", - " - Callbacks should be attached to the LLM instance (e.g., VertexAI, ChatOpenAI) or via with_config(callbacks=[...]), not only to LLMChain. Implement BaseCallbackHandler with correct signatures, e.g., on_llm_end(self, response: LLMResult, **kwargs). “verbose” does not control callback invocation.\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, error strings in quotes, version pins, and troubleshooting terms developers actually search for.\n", - "- Include minimal but precise “expected correct approach” keywords (e.g., AutoModelForSpeechSeq2Seq, WhisperForConditionalGeneration, transformers.pipeline('automatic-speech-recognition'), HuggingFacePipeline, ChatOpenAI, Chroma.get(include=['embeddings']), persist_directory, langchain-openai install).\n", - "2025/08/13 20:50:02 INFO dspy.evaluate.evaluate: Average Metric: 3.5 / 5 (70.0%)\n", - "2025/08/13 20:50:02 INFO dspy.teleprompt.gepa.gepa: Iteration 11: New subsample score is not better, skipping\n", - "2025/08/13 20:50:02 INFO dspy.teleprompt.gepa.gepa: Iteration 12: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.08 / 5 (81.7%): 100%|██████████| 5/5 [00:10<00:00, 2.19s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:50:13 INFO dspy.evaluate.evaluate: Average Metric: 4.083333333333333 / 5 (81.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:51:25 INFO dspy.teleprompt.gepa.gepa: Iteration 12: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters, flags, environment variables, and any error messages (quote errors verbatim). State the user’s task and where it’s failing.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI from langchain-openai vs legacy langchain_community.chat_models; HuggingFacePipeline vs transformers.pipeline; OpenAI vs AzureOpenAI; Redis/RediSearch). Include relevant version splits/renames and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, version pins, and minimal repro snippets keywords. Prefer concrete, likely solutions over generic advice.\n", - "4) Include keywords for the “expected correct approach” developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal repro patterns.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include flags like device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub, plus “example code” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If you see \"ModuleNotFoundError: No module named 'langchain_openai'\", include details on the package split and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai) and alternatives for older setups (ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base pointing to localhost). Set a dummy API key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL or OPENAI_API_BASE) and parameters (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box,” “dummy API key.”\n", - "- Redis/RediSearch + LangChain retrievers:\n", - " - Pass search_kwargs as a dict (not a string). Include correct keys: search_type, search_kwargs, filter. Known LangChain bug: _prepare_range_query can produce incorrect RediSearch syntax; ensure filter placement/order is correct relative to the vector clause and that the query compiles for VECTOR_RANGE/KNN. Include the exact error \"redis.exceptions.ResponseError: Invalid attribute yield_distance_as\", case sensitivity (\"=>{$YIELD_DISTANCE_AS: distance}\"), and RediSearch version compatibility (YIELD_DISTANCE_AS availability). Include “hybrid search,” “FT.SEARCH,” “filter expression syntax,” “retriever_search_kwargs vs search_kwargs,” and “example query.\"\n", - "- Chroma vector store:\n", - " - .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-*.parquet if persistence is relevant. Prefer PersistentClient for managed persistence.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops are often Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + Chroma + LangChain:\n", - " - Be consistent about stacks: if using LlamaIndex with Chroma, LangChain wrappers may be unnecessary. Use LlamaIndex’s VectorStoreIndex/GPTVectorStoreIndex with ChromaVectorStore and PersistentClient. Ensure the same embedding model at index and query time (e.g., OpenAIEmbedding), proper ServiceContext/StorageContext, and correct Prompt/PromptTemplate usage. Include “index.from_documents vs from_vector_store,” “as_query_engine,” “response None troubleshooting,” and “persistence reload.”\n", - "- LlamaIndex sentence window retrieval:\n", - " - Use SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Mention injecting snippets into LangChain chat via prompt placeholders ({context}, {summary}, {messages}), memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory), and evaluation (latency, token usage).\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 and include the exact pip command.\n", - "- SQL with LangChain:\n", - " - ChatOpenAI with SQLDatabaseChain may emit commentary around SQL causing \"sqlite3.OperationalError near 'The': syntax error.\" Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior.” Mention model_name=\"gpt-4-0613” differences vs text-davinci-003.\n", - "\n", - "Style\n", - "- Be specific and action-oriented; use concrete class/param names, model IDs, flags, and troubleshooting terms developers search for. Quote errors verbatim. Include version pins and install commands when likely. Keep it targeted and precise to likely solutions. Avoid fluff.\n", - "2025/08/13 20:51:36 INFO dspy.evaluate.evaluate: Average Metric: 4.083333333333333 / 5 (81.7%)\n", - "2025/08/13 20:51:36 INFO dspy.teleprompt.gepa.gepa: Iteration 12: New subsample score is not better, skipping\n", - "2025/08/13 20:51:36 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Selected program 6 score: 0.6166666666666667\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.33 / 5 (86.7%): 100%|██████████| 5/5 [00:03<00:00, 1.34it/s] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:51:40 INFO dspy.evaluate.evaluate: Average Metric: 4.333333333333334 / 5 (86.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:52:50 INFO dspy.teleprompt.gepa.gepa: Iteration 13: Proposed new text for expand_query: You expand a user’s technical question into a search‑engine‑optimized query that reliably surfaces authoritative answers, fixes, and examples.\n", - "\n", - "Your output:\n", - "- Return only the expanded query as either:\n", - " - a single short paragraph, or\n", - " - a compact bullet list.\n", - "- No explanations, no headings, no meta‑commentary, and avoid heavy formatting.\n", - "\n", - "How to write the expanded query:\n", - "- Preserve the user’s core ask and context.\n", - "- Include exact technical keywords and artifacts present in the question/code: library/framework names, package names, classes/functions/methods, parameters/flags, environment variables, file/dir names, API endpoints, CLI commands, error messages (quoted exactly), stack traces, and version numbers when present.\n", - "- Extract and include exact identifiers from any code (keep exact casing/quotes): e.g., VectorStoreIndex, StorageContext.persist, load_index_from_storage, ServiceContext, SentenceWindowNodeParser, MetadataReplacementPostProcessor, LLMRerank, as_query_engine, ChatPromptTemplate, MessagesPlaceholder, ChatMessageHistory, ChatOpenAI(model=\"gpt-4-0613\"), SQLDatabaseChain.from_llm, create_sql_agent, SQLDatabaseToolkit, RetrievalQA.from_chain_type, HuggingFacePipeline, ChatHuggingFace, AutoTokenizer, AutoModelForCausalLM, JSONLoader, DirectoryLoader, .get(include=['embeddings']), PersistentClient, similarity_distance_threshold, _prepare_range_query, pipeline(\"automatic-speech-recognition\"), BaseCallbackHandler.on_llm_end, etc.\n", - "- Add intent‑revealing and adjacent terms to improve recall: “how to,” “best practices,” “known issues/bugs,” “workaround,” “version compatibility,” “examples,” “code snippets,” “configuration,” “persistence,” “performance,” “billing/costs,” “API semantics,” “syntax,” “minimal reproducible example,” “MRE,” “debugging,” “troubleshooting,” “error handling.”\n", - "- Combine the main ask with concrete sub‑questions that probe root causes, correct configuration and API usage, debugging steps, version compatibility, and viable alternatives. Ask for:\n", - " - Minimal reproducible code snippets/patterns that work end‑to‑end.\n", - " - Exact configuration parameters and save/load patterns.\n", - " - Version pins and install/upgrade/downgrade commands when relevant.\n", - " - How to test/verify success (e.g., collection counts, persisted files present, embeddings visibility, query results).\n", - "\n", - "Domain‑specific guidance (include when relevant):\n", - "- LangChain vector stores, embeddings, persistence:\n", - " - VectorstoreIndexCreator defaults can use in‑memory backends (e.g., DuckDB/FAISS). To persist, use Chroma with persist_directory and/or chromadb.PersistentClient. Distinguish DuckDB in‑memory vs persisted (duckdb+parquet).\n", - " - Chroma persistence: prefer chromadb.PersistentClient with persist_directory; reload persisted stores via persist_directory without re‑embedding. Loading a persisted vector store does not re‑embed or re‑charge OpenAI embeddings; avoid re‑embedding by passing the same embedding function on load.\n", - " - Chroma embeddings visibility: Chroma .get() omits vectors by default; use .get(include=['embeddings']); verify chroma-collections.parquet / chroma-embeddings.parquet and collection counts.\n", - " - Ask for exact save/load patterns for reusable indexes/retrievers and ensuring the same embedding function at load time; clarify that billing for embeddings is incurred only when embeddings are created, not when loading existing persisted vectors.\n", - "\n", - "- Chroma + LlamaIndex + LangChain integration:\n", - " - When using Chroma via LlamaIndex, persistence is handled by the Chroma client; manual StorageContext.persist may be unnecessary if the Chroma client persists.\n", - " - Note version compatibility (e.g., chromadb 0.3.x, llama-index 0.6.x+, langchain 0.0.245 or 0.1.x+).\n", - " - Prefer LlamaIndex retrieval engines directly when appropriate; correct PromptTemplate usage in LangChain; using OpenAIEmbeddings with OpenAI/Azure OpenAI LLMs.\n", - " - Debug “query returns None” despite embeddings: check collection counts, ensure embedding function continuity across save/load, validate metadata schema and document IDs, verify reload paths and persist directories.\n", - "\n", - "- LlamaIndex + LangChain RAG chatbot integration:\n", - " - Use retrieval engines (e.g., sentence_window_engine with SentenceWindowNodeParser, MetadataReplacementPostProcessor, LLMRerank) to fetch context; inject retrieved snippets into the chat prompt.\n", - " - Edit the chat prompt to include placeholders for retrieved context, prior conversation summary/memory, and current user input; manage memory/token usage (e.g., ConversationBufferMemory/summary).\n", - "\n", - "- OpenAI chat conversations (Python SDK client.chat.completions.create):\n", - " - The API is stateless; you must resend some context. Use strategies like conversation summarization (e.g., ConversationSummaryMemory) and truncation to reduce tokens; include examples and best practices for efficient multi‑turn chat.\n", - "\n", - "- Redis + LangChain hybrid search:\n", - " - Use the correct search_type (e.g., \"similarity_distance_threshold\").\n", - " - Pass filters via search_kwargs as a dict (not a string) with valid RediSearch FT.SEARCH syntax.\n", - " - Known bug: LangChain’s _prepare_range_query can emit “Invalid attribute yield_distance_as”; place filters before the vector range clause; upgrade to a fixed version or construct FT.SEARCH manually; include working search_kwargs and full FT.SEARCH examples.\n", - "\n", - "- LangChain SQL with GPT‑4 chat models:\n", - " - SQLDatabaseChain with ChatOpenAI (e.g., model_name=\"gpt-4-0613\") may return extra explanatory text alongside SQL causing sqlite3.OperationalError near \"The\": syntax error. Prefer create_sql_agent with SQLDatabaseToolkit; or enforce SQL‑only via prompts/output parsers. Include MREs and version pins.\n", - "\n", - "- langchain.js JSONLoader:\n", - " - Configure JSON pointers/selectors so each array element/object becomes one Document; avoid splitting each key/value into separate Documents.\n", - " - Fix JSON syntax issues (e.g., missing commas).\n", - " - Include examples mapping pageContent vs metadata for queries like “2 years experience” or “javascript” and verify docs.length equals the number of objects.\n", - "\n", - "- LangChain callbacks with VertexAI:\n", - " - Attach callbacks to the LLM instance (not only LLMChain). Handler signature BaseCallbackHandler.on_llm_end(response, **kwargs). Include version considerations and examples.\n", - "\n", - "- HuggingFace + Falcon 40B Instruct + LangChain:\n", - " - “HfHubHTTPError: 401 Client Error: Unauthorized” often means missing access or invalid HUGGINGFACEHUB_API_TOKEN.\n", - " - For fully local inference, load via transformers: AutoTokenizer, AutoModelForCausalLM, text-generation pipeline; wrap with LangChain’s HuggingFacePipeline or ChatHuggingFace. Include model_kwargs (max_new_tokens, top_k, temperature, repetition_penalty), GPU/quantization (bitsandbytes), and minimal RetrievalQA integration.\n", - "\n", - "- Pydantic + @tool args_schema:\n", - " - “ValidationError: args_schema subclass of BaseModel expected” can be a Pydantic v1 vs v2 mismatch. Suggest pinning pydantic==1.10.10; include exact commands (e.g., pip install \"pydantic==1.10.10\") or add to requirements.txt.\n", - "\n", - "- LangChain packaging/imports:\n", - " - If “ModuleNotFoundError: No module named 'langchain_openai'”, clarify versioned imports: install provider package (pip install langchain-openai) and use from langchain_openai import ChatOpenAI (new packaging), or use legacy imports matching the installed LangChain version.\n", - "\n", - "- Whisper ASR and diarization:\n", - " - LangChain’s HuggingFacePipeline doesn’t support pipeline(\"automatic-speech-recognition\"). For Whisper ASR use transformers pipeline(\"automatic-speech-recognition\") with Whisper models; WhisperProcessor has no .config. For diarization, use pyannote.audio and merge with ASR outputs.\n", - "\n", - "Quality checklist before you output:\n", - "- Restate the user’s exact goal plus environment (models, versions, frameworks) and artifacts (functions/classes/params/paths) inside the query itself.\n", - "- Explicitly request MREs, correct configuration, persistence patterns, verification steps, and known issues/bugs/workarounds.\n", - "- For chatbot + retrieval questions, explicitly mention: using the retrieval engine to fetch context, injecting it into the prompt, and editing prompts to include placeholders for retrieved context, prior summary, and current input.\n", - "- For SQL with GPT‑4 issues, explicitly mention the failure mode (extra text in SQL) and the recommended create_sql_agent + SQLDatabaseToolkit workaround.\n", - "- For auth/version/tooling issues, include exact install/downgrade/upgrade commands and package names.\n", - "\n", - "Output format:\n", - "- Return only the expanded query as a single short paragraph or a compact bullet list. No explanations, no headings, no meta‑commentary, and avoid heavy formatting.\n", - "2025/08/13 20:52:54 INFO dspy.evaluate.evaluate: Average Metric: 4.333333333333334 / 5 (86.7%)\n", - "2025/08/13 20:52:54 INFO dspy.teleprompt.gepa.gepa: Iteration 13: New subsample score is not better, skipping\n", - "2025/08/13 20:52:54 INFO dspy.teleprompt.gepa.gepa: Iteration 14: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.33 / 5 (66.7%): 100%|██████████| 5/5 [00:08<00:00, 1.79s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:53:03 INFO dspy.evaluate.evaluate: Average Metric: 3.3333333333333335 / 5 (66.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:54:07 INFO dspy.teleprompt.gepa.gepa: Iteration 14: Proposed new text for expand_query: You expand developer questions into one precise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as a single paragraph (no headings or commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters/flags, environment variables, and quote error messages verbatim. Identify the user’s intent, the failing step, and where it breaks.\n", - "2) Add synonyms and near-equivalents (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI from langchain_openai vs langchain_community.chat_models vs legacy langchain.chat_models; HuggingFacePipeline vs transformers.pipeline; OpenAI/ChatCompletions vs Assistants API; Chroma/Chromadb) and common breaking changes or misconfigurations.\n", - "3) Anticipate root causes and concrete fixes: version/compatibility pins, correct imports after package splits, supported tasks, proper params/flags, environment variables and install commands, model IDs, local vs hosted endpoints, persistence settings, and minimal repro snippets/keywords.\n", - "4) Include “expected correct approach” keywords devs would search for: exact class/function names, model IDs, flags, env vars, correct API/module names, and minimal repro patterns.\n", - "5) Keep it specific and action-oriented; prefer likely fixes over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- LangChain package split and imports:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', include that ChatOpenAI now lives in langchain_openai (requires pip install langchain-openai) and alternatives for older setups (from langchain_community.chat_models import ChatOpenAI, or legacy langchain.chat_models). Include install/upgrade commands and version compatibility notes (langchain, langchain-core, langchain-community, langchain-openai).\n", - "- LangChain callbacks:\n", - " - Correct callback signature is on_llm_end(self, response: LLMResult, **kwargs) (not custom event/context). Attach callbacks to the underlying LLM (e.g., VertexAI(..., callbacks=[...])) rather than only to LLMChain. Mention on_chain_end vs on_llm_end and version-specific behavior.\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend history every call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with base_url/api_base pointing to localhost; set a dummy OPENAI_API_KEY if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL/openai_api_base), parameters (model/model_name), and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, summarization; not ASR. For ASR use transformers.pipeline('automatic-speech-recognition') with Whisper: AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration + WhisperProcessor/feature extractor. Mention errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" when misused. For fully local LLM inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace; include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, “local weights,” “no token,” avoid HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, chroma-embeddings.parquet verification if persistence is relevant.\n", - "- chromadb installation/runtime:\n", - " - ImportError/install loops often due to Python version; Python 3.10 frequently resolves chromadb issues. Include environment/version troubleshooting (virtualenv/conda), pip install commands, and version pins.\n", - "- LlamaIndex + Chroma:\n", - " - Prefer chromadb.PersistentClient(path=...) to manage persistence; Chroma itself handles on-disk persistence—storage_context.persist is not required for Chroma. Ensure correct embedding classes (OpenAIEmbedding in llama_index or LangchainEmbedding(OpenAIEmbeddings()) as appropriate and version-compatible), proper Prompt/PromptTemplate usage, and that LangChain is not required when using LlamaIndex end-to-end. Include from_documents vs from_vector_store usage and query_engine patterns.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, sentence_window_engine; inject retrieved snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}); include memory/token management terms and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may emit commentary causing sqlite3.OperationalError near \"The\": syntax error. Include create_sql_agent, SQLDatabaseToolkit, stricter output parser, prompts enforcing raw SQL, use_query_checker behavior, and model_name=\"gpt-4-0613” differences vs text-davinci-003.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact class/model IDs and params: ChatOpenAI, OpenAI, VertexAI, LLMChain, BaseCallbackHandler, on_llm_end, LLMResult, OpenAIEmbeddings, OpenAIEmbedding, LangchainEmbedding, HuggingFacePipeline, ChatHuggingFace, transformers.pipeline, AutoTokenizer, AutoModelForCausalLM, AutoModelForSpeechSeq2Seq, WhisperForConditionalGeneration, WhisperProcessor, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, chroma-embeddings.parquet, PersistentClient, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit. Include flags like device_map=\"auto\", torch_dtype, trust_remote_code=True, model/model_name, base_url/openai_api_base, dummy API key, Python 3.10, pydantic==1.10.10 if Pydantic v2 issues arise.\n", - "\n", - "Style\n", - "- Be specific and action-oriented with class names, import paths, error strings, version pins, and install commands. Include likely root causes and the correct approach developers would search for. Avoid fluff.\n", - "2025/08/13 20:54:10 INFO dspy.evaluate.evaluate: Average Metric: 3.3333333333333335 / 5 (66.7%)\n", - "2025/08/13 20:54:10 INFO dspy.teleprompt.gepa.gepa: Iteration 14: New subsample score is not better, skipping\n", - "2025/08/13 20:54:10 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.33 / 5 (46.7%): 100%|██████████| 5/5 [00:07<00:00, 1.59s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:54:18 INFO dspy.evaluate.evaluate: Average Metric: 2.333333333333333 / 5 (46.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:55:19 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, and quote error messages verbatim. Identify the user’s task, where it’s failing, and whether it’s runtime, import, configuration, or behavioral.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI-compatible “base_url/api_base/openai_api_base/OPENAI_BASE_URL”).\n", - "3) Anticipate root causes/fixes: version and package split changes, correct imports, supported tasks, proper parameters/flags, environment variables, install commands and version pins, model compatibility and loading options, minimal repro and example code keywords.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, key flags, correct APIs/modules, prompt/memory placeholders, and minimal repro patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. Include “why,” “how to fix,” and “example code” terms when helpful.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; “no server-side session,” “conversation ID not supported.” You must resend conversation history each call.\n", - " - Best practices: client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, caching, and how to maintain context.\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Attempting ASR with HuggingFacePipeline is a common misconfiguration.\n", - " - For Whisper ASR use transformers.pipeline(\"automatic-speech-recognition\") with WhisperForConditionalGeneration or AutoModelForSpeechSeq2Seq plus WhisperProcessor (not WhisperProcessor as the model). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" come from misusing processor vs model.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, and “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', note the package splits and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; older setups may use ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (api_base/base_url/openai_api_base/OPENAI_BASE_URL). Set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “no custom client,” and specify model/model_name.\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet for persistence.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pip install command/version pin keywords.\n", - "- chromadb installation/runtime:\n", - " - Install/import issues can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb problems. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query.\n", - " - Use the sentence_window_engine to fetch relevant snippets; inject retrieved snippets into the LangChain chat prompt context via placeholders (e.g., {context}, {summary}, {messages}) instead of dumping entire documents. Mention memory/token management (ConversationBufferMemory, ConversationSummaryMemory, ConversationBufferMemoryHistory) and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Mention this exact failure mode.\n", - " - Prefer create_sql_agent with SQLDatabaseToolkit, or enforce raw-SQL-only via output parser/prompt template; discuss use_query_checker behavior. Include model_name=\"gpt-4-0613” differences vs text-davinci-003 and “example code” keywords.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact models/classes/functions: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, transformers.pipeline('automatic-speech-recognition'), ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, @tool args_schema, ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “version pin,” “breaking changes,” “root cause,” “why,” “how to fix,” “example code,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff. Keep 2–5 sentences, one paragraph.\n", - "2025/08/13 20:55:28 INFO dspy.evaluate.evaluate: Average Metric: 2.583333333333333 / 5 (51.7%)\n", - "2025/08/13 20:55:40 INFO dspy.evaluate.evaluate: Average Metric: 10.216666666666667 / 15 (68.1%)\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Full valset score for new program: 0.6811111111111111\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Full train_val score for new program: 0.6811111111111111\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Individual valset scores for new program: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.0, 0.8, 0.5, 0.75]\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: New valset pareto front scores: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Full valset pareto front score: 0.7166666666666667\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Updated valset pareto front programs: [{3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {1, 2, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 3, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7}, {0, 1, 3, 4, 5, 6}, {3}, {0, 1, 2, 3, 4, 5, 6, 7}, {2, 3, 7}]\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best valset aggregate score so far: 0.6833333333333333\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best program as per aggregate score on train_val: 3\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best program as per aggregate score on valset: 3\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best score on valset: 0.6833333333333333\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Best score on train_val: 0.6833333333333333\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: Linear pareto front program index: 3\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 15: New program candidate index: 7\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 16: No merge candidates found\n", - "2025/08/13 20:55:40 INFO dspy.teleprompt.gepa.gepa: Iteration 16: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.75 / 5 (95.0%): 100%|██████████| 5/5 [00:07<00:00, 1.57s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:55:48 INFO dspy.evaluate.evaluate: Average Metric: 4.75 / 5 (95.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:57:12 INFO dspy.teleprompt.gepa.gepa: Iteration 16: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters/flags, environment variables, and any error messages (quote errors verbatim). Identify the user’s intended task and exactly where it fails.\n", - "2) Add synonyms/related names and breaking-change variants (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; Chroma.from_documents vs Chroma(persist_directory=...)/from_existing_collection; api_base/base_url/openai_api_base; OPENAI_BASE_URL/OPENAI_API_BASE). Include old/new import paths from LangChain package splits (langchain, langchain-community, langchain-core, langchain-openai).\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Mention minimal repro patterns and “expected correct approach” keywords (exact class/function names, model IDs, flags, and example code terms).\n", - "4) Keep it targeted and precise; prefer specific, likely solutions over generic advice. Always quote exact error strings and include concrete parameter names the developer should try. If a known library bug or breaking change is implicated, name it and suggest the working pattern or version pin/upgrade.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API does not maintain server-side session state; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, and best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include flags/keywords like device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If \"ModuleNotFoundError: No module named 'langchain_openai'\" or similar occurs, note the package splits and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai; alternatively ChatOpenAI from langchain_community.chat_models in older setups). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY with a dummy value, OPENAI_BASE_URL/OPENAI_API_BASE) and parameters (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant. Mention reloading with Chroma(persist_directory=...) or from_existing_collection to avoid re-embedding.\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management terms and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include extra commentary around SQL, causing errors like \"sqlite3.OperationalError near 'The': syntax error.\" Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior,” plus model_name=\"gpt-4-0613”.\n", - "- Redis retriever with LangChain:\n", - " - For redis.as_retriever(search_type=\"similarity_distance_threshold\"), pass search_kwargs as a Python dict (not a string). Put the filter expression under the 'filter' key (e.g., {'include_metadata': True, 'distance_threshold': 0.8, 'k': 5, 'filter': \"@menu_text:(%%chicken%%) @lunch:{true}\"}). Known issue: \"redis.exceptions.ResponseError: Invalid attribute yield_distance_as\" can stem from a bug in LangChain’s _prepare_range_query() constructing incorrect RediSearch syntax; the correct pattern places the metadata filter before the vector range segment. Include keywords like \"FT.SEARCH\", \"RediSearch filter syntax\", \"similarity_distance_threshold\", \"_prepare_range_query\", and suggest upgrading LangChain/redis stack or adjusting query construction.\n", - "- VertexAI + callbacks in LangChain:\n", - " - Pass callbacks to the LLM (e.g., VertexAI(callbacks=[...])) or via a CallbackManager, not just to LLMChain. The handler signature should be on_llm_end(self, response: LLMResult, **kwargs) (not event/context), and you can inspect prompts via on_llm_start. Include “BaseCallbackHandler,” “LLMResult,” “correct callback signature,” and “callbacks on LLM vs chain.”\n", - "- VectorstoreIndexCreator persistence and billing:\n", - " - VectorstoreIndexCreator typically uses Chroma (DuckDB under the hood). To persist and reuse without re-embedding, specify persist_directory and later reload the vector store from that directory (Chroma(persist_directory=...) or from_existing_collection) rather than using .pkl files. OpenAI Embeddings are billed when created; reloading a persisted vector store does not re-embed or incur extra embedding charges.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, Chroma(..., persist_directory=...), .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented. Use concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings exactly. Include the most probable fixes (correct imports/installs, params, version pins/upgrades, and example code keywords) in 2–5 concise sentences.\n", - "2025/08/13 20:57:20 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n", - "2025/08/13 20:57:20 INFO dspy.teleprompt.gepa.gepa: Iteration 16: New subsample score is not better, skipping\n", - "2025/08/13 20:57:20 INFO dspy.teleprompt.gepa.gepa: Iteration 17: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.63 / 5 (72.7%): 100%|██████████| 5/5 [00:08<00:00, 1.69s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:57:28 INFO dspy.evaluate.evaluate: Average Metric: 3.6333333333333337 / 5 (72.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:58:37 INFO dspy.teleprompt.gepa.gepa: Iteration 17: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract and name the exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, config keys, environment variables, and quote error messages verbatim. State the concrete task and where it fails.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; Chroma PersistentClient/Client; VectorStoreIndex/GPTVectorStoreIndex) and mention breaking changes or package split renames.\n", - "3) Anticipate root causes and precise fixes: version/compatibility issues, correct imports/installs, supported tasks, right APIs, required params/flags, environment variables, install commands and version pins. Prefer the most likely, specific solutions over generic advice.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, example code patterns, correct API/module names, and minimal repro snippets. Where relevant, mention persistence paths, device/precision flags, base URLs, and memory/token strategies.\n", - "5) Keep it targeted and action-oriented. Avoid fluff and generalities; emphasize authoritative terms, concrete error strings, and solution patterns.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” and “message history management.” Add best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" arise when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B Instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), and optional 4-bit/8-bit quantization via bitsandbytes. Avoid HfHubHTTPError: 401 by not using HuggingFaceHub (no token) or by ensuring proper Hugging Face token/private model access if using Hub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If \"ModuleNotFoundError: No module named 'langchain_openai'\" or similar occurs, mention the split packages and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai; alternatively ChatOpenAI from langchain_community.chat_models in older setups). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - .get excludes embeddings by default; use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, chromadb.PersistentClient vs Client, and verifying chroma-embeddings.parquet if persistence matters. Note embedding_function expectations (e.g., SentenceTransformerEmbeddingFunction/OpenAIEmbeddingFunction) and potential mismatch when passing wrappers.\n", - "- chromadb installation/runtime:\n", - " - ImportError/install loops often stem from Python version incompatibility; Python 3.10 typically resolves issues. Include virtualenv/conda tips and version checks.\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and getting a ValidationError about BaseModel subclasses, it’s commonly Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pip install command.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, include llama_index’s SentenceWindowNodeParser, VectorStoreIndex (formerly GPTVectorStoreIndex/GPTIndex), MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Inject retrieved snippets into the LangChain chat prompt via placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; mention memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory). Include evaluation metrics: latency, token usage. If “None” answers appear, check query engine configs, prompt templates (text_qa_template), embedding/LLM setup, and persistence/load paths.\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may emit commentary causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” prompts enforcing raw SQL, and “use_query_checker behavior,” plus model_name=\"gpt-4-0613\" guidance.\n", - "- Azure OpenAI specifics (when present):\n", - " - Include AzureOpenAI/AzureChatOpenAI usage, deployment_name/model, api_version/openai_api_version, and env vars (AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, OPENAI_API_VERSION). Note differences vs OpenAI’s standard endpoints.\n", - "- General best-practice keywords to weave in:\n", - " - Exact model IDs and classes: gpt-4-turbo, gpt-3.5-turbo, openai/whisper-large-v2, AutoTokenizer, AutoModelForCausalLM, transformers.pipeline, HuggingFacePipeline, ChatHuggingFace, ChatOpenAI, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), PersistentClient, SentenceWindowNodeParser, VectorStoreIndex/GPTVectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit, @tool args_schema.\n", - " - Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “example code,” “correct import path,” “minimal repro.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; favor concrete class/param names, model IDs, flags, and troubleshooting terms developers actually search for. Quote error strings verbatim. Keep the query narrowly focused on likely root causes and fixes, not generic explanations.\n", - "2025/08/13 20:58:47 INFO dspy.evaluate.evaluate: Average Metric: 2.8333333333333335 / 5 (56.7%)\n", - "2025/08/13 20:58:47 INFO dspy.teleprompt.gepa.gepa: Iteration 17: New subsample score is not better, skipping\n", - "2025/08/13 20:58:47 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Selected program 7 score: 0.6811111111111111\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.67 / 5 (53.3%): 100%|██████████| 5/5 [00:07<00:00, 1.44s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 20:58:54 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:00:24 INFO dspy.teleprompt.gepa.gepa: Iteration 18: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, code snippets, and quote error messages verbatim. Identify the user’s task, where it’s failing, and whether it’s runtime, import, configuration, or behavioral.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI-compatible “base_url/api_base/openai_api_base/OPENAI_BASE_URL”).\n", - "3) Anticipate root causes/fixes: version and package split changes, correct imports, supported tasks, proper parameters/flags, environment variables, install commands and version pins, model compatibility and loading options, minimal repro/example code keywords.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, key flags, correct APIs/modules, prompt/output parser constraints, memory/placeholders, and minimal repro patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. Include “why,” “how to fix,” “install command,” “version pin,” and “example code” terms when helpful.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; “no server-side session,” “conversation ID not supported.” You must resend conversation history each call.\n", - " - Best practices: client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, caching, and how to maintain context.\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Attempting ASR with HuggingFacePipeline is a common misconfiguration.\n", - " - For Whisper ASR use transformers.pipeline(\"automatic-speech-recognition\") with WhisperForConditionalGeneration or AutoModelForSpeechSeq2Seq plus WhisperProcessor (not WhisperProcessor as the model). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" come from misusing processor vs model.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, and “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain imports and package split:\n", - " - Breaking changes require correct installs/imports: langchain, langchain-community, langchain-core, langchain-openai. If ModuleNotFoundError: No module named 'langchain_openai', install langchain-openai and import from langchain_openai; older setups may use ChatOpenAI from langchain_community.chat_models. Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (api_base/base_url/openai_api_base/OPENAI_BASE_URL). Set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “no custom client,” and specify model/model_name.\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet for persistence.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include exact pip install command/version pin keywords (e.g., pip install pydantic==1.10.10 or add to requirements.txt).\n", - "- chromadb installation/runtime:\n", - " - Install/import issues can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb problems. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query.\n", - " - Inject retrieved snippets into LangChain prompts via placeholders (e.g., {context}, {summary}, {messages}); discuss memory/token management (ConversationBufferMemory, ConversationSummaryMemory, ConversationBufferMemoryHistory) and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Mention this exact failure mode and the “SQLQuery:” prefix returning extra text.\n", - " - Prefer create_sql_agent with SQLDatabaseToolkit to handle execution robustly, or enforce raw-SQL-only via output parser/prompt template; discuss use_query_checker behavior and differences vs text-davinci-003. Include “example code,” “why,” and “how to fix.”\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact models/classes/functions: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, transformers.pipeline(\"automatic-speech-recognition\"), ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, @tool args_schema, ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “version pin,” “breaking changes,” “root cause,” “why,” “how to fix,” “example code,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff. Keep 2–5 sentences, one paragraph, and return only the expanded search query.\n", - "2025/08/13 21:00:34 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n", - "2025/08/13 21:00:34 INFO dspy.teleprompt.gepa.gepa: Iteration 18: New subsample score is not better, skipping\n", - "2025/08/13 21:00:34 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.00 / 5 (60.0%): 100%|██████████| 5/5 [00:08<00:00, 1.79s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:00:43 INFO dspy.evaluate.evaluate: Average Metric: 3.0 / 5 (60.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:02:10 INFO dspy.teleprompt.gepa.gepa: Iteration 19: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters, flags, config fields, and quote error messages verbatim. Identify the user’s task, where it’s failing, and the environment (Python/OS versions).\n", - "2) Add synonyms/related names and old/new import paths (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; LangChain package splits), plus common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Include “example code,” “correct usage,” “supported tasks,” “minimal repro,” “install command,” “version pin,” “breaking changes,” and “why/how to fix.”\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, correct API/module names, minimal repro patterns, and known-good alternatives (e.g., switching chains/agents, different retriever args). Prefer specific, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, summarizing/truncating, and client-side memory options (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), plus caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" happen when misused. Include “transformers.pipeline('automatic-speech-recognition')”, proper ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load via transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit quantization (bitsandbytes), and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note the split and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; or older ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL), parameters (model/model_name), and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, verifying chroma-embeddings.parquet, and preferring chromadb.PersistentClient for persistence.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) raises \"ValidationError ... args_schema subclass of BaseModel expected\", it’s often Pydantic v2 vs LangChain. Pin pydantic==1.10.10. Include the exact pin and install commands (pip install pydantic==1.10.10) and mention adding to requirements.txt.\n", - "- chromadb installation/runtime:\n", - " - Import errors or install loops can stem from Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- Redis + LangChain retrievers (hybrid search):\n", - " - Pass search_kwargs as a dict, not a string; include filter inside search_kwargs for redis.as_retriever(search_type=..., search_kwargs={...}). Use correct RediSearch syntax (@field:{value} and @field:(text)). Note known LangChain bug around _prepare_range_query() producing \"redis.exceptions.ResponseError: Invalid attribute yield_distance_as\"; workarounds include placing the filter before the vector range clause, ensuring proper kwargs, or upgrading/downgrading LangChain to a fixed version. Include “correct syntax,” “hybrid search,” and “distance_threshold/k/include_metadata.”\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into LangChain prompts via placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation (latency, token usage).\n", - " - When persisting with ChromaVectorStore via LlamaIndex, prefer PersistentClient; you typically don’t persist LlamaIndex storage_context manually as Chroma handles persistence. Avoid unnecessary mixing of LangChain when LlamaIndex already manages the vector store; use LlamaIndex’s OpenAIEmbedding and OpenAI LLM where appropriate; ensure PromptTemplate/Prompt usage is correct.\n", - "- SQL with LangChain:\n", - " - ChatOpenAI (e.g., gpt-4-0613) with SQLDatabaseChain may add commentary, causing sqlite errors like sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “prompt template enforcing raw SQL,” output parser strategies, and notes on use_query_checker behavior as fixes/alternatives. Mention model_name=\"gpt-4-0613” and differences vs text-davinci-003.\n", - "\n", - "Style\n", - "- Be specific and action-oriented; weave in concrete class/param names, model IDs, flags, correct API/module names, and troubleshooting terms developers actually search for. Quote error strings exactly. Include targeted keywords like “correct usage,” “version compatibility,” “install command,” “example code,” “root cause,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “PersistentClient,” “Python 3.10,” and “pydantic==1.10.10.”\n", - "2025/08/13 21:02:23 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n", - "2025/08/13 21:02:23 INFO dspy.teleprompt.gepa.gepa: Iteration 19: New subsample score is not better, skipping\n", - "2025/08/13 21:02:23 INFO dspy.teleprompt.gepa.gepa: Iteration 20: Selected program 7 score: 0.6811111111111111\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.75 / 5 (75.0%): 100%|██████████| 5/5 [00:08<00:00, 1.63s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:02:31 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:03:53 INFO dspy.teleprompt.gepa.gepa: Iteration 20: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract and include exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, and quote error messages verbatim. Identify the user’s goal, where it fails (runtime, import, configuration, behavioral), and the stack (Python/JS/TS, notebook/CLI/server).\n", - "2) Add synonyms and related names to cover API/rename variants (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI-compatible “base_url/api_base/openai_api_base/OPENAI_BASE_URL”).\n", - "3) Anticipate the most likely root causes and fixes: breaking changes, package splits and correct imports, supported tasks, right parameters/flags, environment variables, install commands/version pins, model compatibility/loading options, minimal repro and “example code” keywords.\n", - "4) Weave in keywords for the expected correct approach (exact class/function names, model IDs, key flags, correct APIs/modules, prompt/memory placeholders, minimal repro patterns). Include “why,” “how to fix,” and “example code” when helpful.\n", - "5) Keep it targeted and specific; prefer concrete, likely solutions over generic advice. Quote the exact error string(s).\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; “no server-side session,” “conversation ID not supported.” You must resend conversation history each call.\n", - " - Best practices: client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, caching, and how to maintain context.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (api_base/base_url/openai_api_base/OPENAI_BASE_URL). Set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “no custom client,” and specify model/model_name.\n", - "- LangChain imports and package split:\n", - " - Package splits and breaking changes: langchain, langchain-community, langchain-core, langchain-openai. If ModuleNotFoundError: No module named 'langchain_openai', either install langchain-openai or, for older setups, import ChatOpenAI from langchain_community.chat_models. Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- LangChain callbacks:\n", - " - Some providers/fire paths require callbacks to be attached to the LLM instance (e.g., VertexAI) not only the chain; ensure the handler method signatures match the current API (e.g., on_llm_end(self, response, **kwargs), not on_llm_end(self, event, context)). Mention “correct signature,” “callbacks on LLM vs chain,” and “CallbackManager.”\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Attempting ASR with HuggingFacePipeline is a common misconfiguration.\n", - " - For Whisper ASR use transformers.pipeline(\"automatic-speech-recognition\") with WhisperForConditionalGeneration or AutoModelForSpeechSeq2Seq plus WhisperProcessor (processor ≠ model). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" come from misusing processor vs model. Do not wrap ASR in HuggingFacePipeline; show transformers-only example.\n", - " - For fully local LLM inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then optionally wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, and “local weights,” “no token,” and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain.js JSON loading:\n", - " - JSONLoader/DirectoryLoader in JS/TS can split each leaf value into separate Documents unless configured. Use jqSchema/jsonPointer/JSONPath-style options (e.g., jqSchema='.[]') and contentKey/textKey to treat each array element as one Document (one object per doc) instead of per-property. Mention fixing JSON syntax errors (missing commas) and show “correct usage,” “example code,” and “why it returns more docs than expected.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet for persistence.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include exact pip command/version pin keywords.\n", - "- chromadb installation/runtime:\n", - " - Install/import issues can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb problems. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}); mention memory/token management and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Prefer create_sql_agent with SQLDatabaseToolkit, or enforce raw-SQL-only via output parser/prompt template; discuss use_query_checker behavior and model differences (model=\"gpt-4-0613\").\n", - "- General best-practice keywords:\n", - " - Exact models/classes/functions: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, transformers.pipeline(\"automatic-speech-recognition\"), ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, @tool args_schema, ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - " - Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “version pin,” “breaking changes,” “root cause,” “why,” “how to fix,” “example code,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented. Prefer concrete class/param names, model IDs, environment variables, and troubleshooting terms developers actually search for. Keep to 2–5 sentences in one paragraph, no extra formatting or commentary.\n", - "2025/08/13 21:04:12 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n", - "2025/08/13 21:04:12 INFO dspy.teleprompt.gepa.gepa: Iteration 20: New subsample score is not better, skipping\n", - "2025/08/13 21:04:12 INFO dspy.teleprompt.gepa.gepa: Iteration 21: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.00 / 5 (80.0%): 100%|██████████| 5/5 [00:17<00:00, 3.49s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:04:30 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 5 (80.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:05:37 INFO dspy.teleprompt.gepa.gepa: Iteration 21: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters/flags, environment variables, file paths, and any error messages (quote errors verbatim). Identify the user’s intent (task) and where it’s failing.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAIEmbeddings vs text-embedding-3-large; Chroma/Chromadb; VectorStoreIndex/VectorstoreIndexCreator).\n", - "3) Anticipate likely root causes and fixes: version/compatibility issues, correct imports after package splits, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Include minimal repro keywords and “example code,” “correct usage,” and “breaking changes.”\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal repro patterns. Prefer specific, likely solutions over generic advice. Quote exact errors verbatim.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B Instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4/8-bit quantization, and “local weights,” “offline,” “no token.” If you see \"HfHubHTTPError: 401\", mention avoiding HuggingFaceHub or ensuring correct token/access; prefer loading local weights. Include “integration with RetrievalQA” and “example code.”\n", - "- LangChain imports and package split:\n", - " - If you see \"ModuleNotFoundError: No module named 'langchain_openai'\", note the split packages and correct imports/installs: langchain, langchain-core, langchain-community, langchain-openai (or older ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model/model_name) and “OpenAI-compatible server,” “works out of the box,” “no custom client.”\n", - "- Chroma vector store:\n", - " - Chroma.get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying files like chroma-embeddings.parquet if persistence is relevant.\n", - "- Vectorstore persistence and billing:\n", - " - VectorstoreIndexCreator by default may use in-memory backends (e.g., DuckDB transient); to persist across runs, use Chroma with persist_directory and reload via Chroma(persist_directory=...). OpenAI Embeddings billing: charged once per embedding call; loading a persisted vector store does not re-embed or re-charge if you reuse stored vectors. Include “avoid re-embedding,” “persist/load,” and “cost control.”\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting, virtualenv/conda tips, and reinstall commands.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat prompt via placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may produce commentary that breaks SQL (e.g., sqlite3.OperationalError near \"The\"). Include “create_sql_agent,” “SQLDatabaseToolkit,” “prompt enforcing raw SQL,” “output parser,” and “use_query_checker behavior.” Mention model_name=\"gpt-4-0613\" vs older completions.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAIEmbeddings, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), VectorstoreIndexCreator, @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “pip install,” “pin versions,” “example code,” “root cause,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “breaking changes,” “minimal reproducible example.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote errors verbatim. No extra commentary; just one targeted paragraph.\n", - "2025/08/13 21:05:47 INFO dspy.evaluate.evaluate: Average Metric: 2.75 / 5 (55.0%)\n", - "2025/08/13 21:05:47 INFO dspy.teleprompt.gepa.gepa: Iteration 21: New subsample score is not better, skipping\n", - "2025/08/13 21:05:47 INFO dspy.teleprompt.gepa.gepa: Iteration 22: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.92 / 5 (58.3%): 100%|██████████| 5/5 [00:09<00:00, 1.95s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:05:57 INFO dspy.evaluate.evaluate: Average Metric: 2.9166666666666665 / 5 (58.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:07:25 INFO dspy.teleprompt.gepa.gepa: Iteration 22: Proposed new text for expand_query: You expand developer questions into a single, concise search‑query paragraph that helps a search engine retrieve high‑signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models/IDs, classes/functions, parameters/flags, environment variables, and quote error messages verbatim. Identify the user’s task, where it fails, and the minimal repro context.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline), plus common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports and package splits, supported tasks, proper parameters/flags, env vars, install commands and version pins (include concrete pip commands like “pip install pydantic==1.10.10”), alternative modules/paths, and minimal repro patterns.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and phrases like “example code,” “correct usage,” “version compatibility,” “how to fix,” and “root cause.”\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. Do not invent versions; use those provided.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, caching, and client-side memory options (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), plus summarizing/truncating history.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like “AttributeError: 'WhisperProcessor' object has no attribute 'config'” appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub, and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If “ModuleNotFoundError: No module named 'langchain_openai'”, note package splits and correct imports/installs: install langchain-openai separately (pip install langchain-openai) and ensure langchain, langchain-core, langchain-community are installed; alternatively import ChatOpenAI from langchain_community.chat_models in older setups. Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/OPENAI_BASE_URL) pointing to localhost; set a dummy OPENAI_API_KEY if required; pass model/model_name. Include “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you'll see 'embeddings': None). Include persist(), persist_directory, verifying chroma-embeddings.parquet, and prefer chromadb.PersistentClient for persistence.\n", - " - When using LlamaIndex + ChromaVectorStore, avoid passing embedding_function directly to the Chroma collection; provide embeddings via LlamaIndex’s embed_model/ServiceContext. Chroma persistence is handled by the client; you typically don’t call index.storage_context.persist() manually.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers “ValidationError: ... args_schema subclass of BaseModel expected ...”, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and install command: pip install pydantic==1.10.10 (or add to requirements.txt).\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat prompt via placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include commentary that breaks SQL (e.g., sqlite3.OperationalError near \"The\": syntax error). Include “create_sql_agent,” “SQLDatabaseToolkit,” output parser enforcing raw SQL, and “use_query_checker behavior,” plus model_name=\"gpt-4-0613\" differences.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “PersistentClient,” “Python 3.10,” “pydantic==1.10.10.”\n", - "\n", - "Style\n", - "- Be specific and action‑oriented; prefer concrete class/param names, model IDs, commands, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff.\n", - "2025/08/13 21:07:29 INFO dspy.evaluate.evaluate: Average Metric: 2.1666666666666665 / 5 (43.3%)\n", - "2025/08/13 21:07:29 INFO dspy.teleprompt.gepa.gepa: Iteration 22: New subsample score is not better, skipping\n", - "2025/08/13 21:07:29 INFO dspy.teleprompt.gepa.gepa: Iteration 23: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.75 / 5 (75.0%): 100%|██████████| 5/5 [00:08<00:00, 1.69s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:07:37 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:08:46 INFO dspy.teleprompt.gepa.gepa: Iteration 23: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters, env vars, and quote any error messages verbatim. Identify the user’s task and where it’s failing; include their system info (Python version, package versions, OS/GPU if given).\n", - "2) Add synonyms and related names for APIs/classes/params (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; base_url/api_base/openai_api_base; model/model_name), plus common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports and install packages, supported tasks, proper flags/params, env vars, install commands and pins, and minimal repro patterns. Prefer concrete, likely solutions over generic advice.\n", - "4) Include keywords for the “expected correct approach” devs search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, retriever/vector store loading APIs, and troubleshooting terms (“why,” “how to fix,” “root cause,” “breaking changes”).\n", - "5) Keep it targeted and specific; avoid fluff; use the exact error strings, class names, and config keys devs will paste into search.\n", - "\n", - "Domain-specific nuggets to always weave in when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; you must resend conversation history every call (“no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management”).\n", - " - Best practices: client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, caching.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (base_url/api_base/openai_api_base); set a dummy OPENAI_API_KEY if required (“no custom client,” “OpenAI-compatible server,” “works out of the box”). Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL) and params (model/model_name).\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', note the package split and correct installs/imports: langchain, langchain-core, langchain-community, langchain-openai; older setups may use ChatOpenAI from langchain_community.chat_models. Include “correct import path,” “installation steps,” “breaking changes.”\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" appear when misused for ASR.\n", - " - For ASR use transformers.pipeline('automatic-speech-recognition') with AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration and WhisperProcessor/feature extractor; do not wrap ASR with HuggingFacePipeline.\n", - " - For speaker diarization, Whisper alone doesn’t do diarization; combine with pyannote.audio or whisperx (“speaker diarization pipeline,” “pyannote/speaker-diarization”).\n", - "- Fully local HF models (e.g., Falcon 40B instruct):\n", - " - Load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, “local weights,” “no token,” avoid HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “integration with RetrievalQA.”\n", - "- Chroma vector store:\n", - " - Chroma.get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None).\n", - " - Persistence: use persist() and persist_directory; verify files (e.g., chroma-embeddings.parquet). VectorstoreIndexCreator defaults can be transient (DuckDB/in-memory); for durable reuse, instantiate Chroma(persist_directory=...) or Chroma.from_documents(..., persist_directory=...).\n", - " - Loading persisted stores avoids re-embedding charges; embeddings are computed (and billed) once via OpenAIEmbeddings (text-embedding-ada-002, text-embedding-3-large) and reused on load.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) raises ValidationError about BaseModel subclass, it’s likely Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 (pip install \"pydantic==1.10.10\").\n", - "- chromadb installation/runtime:\n", - " - Import/install loops or native deps failures can be Python version incompatibility; Python 3.10 commonly resolves. Mention virtualenv/conda, wheel availability.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine; inject snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}). Include memory/token management and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - sqlite3.OperationalError near \"The\": syntax error occurs when models add commentary; use create_sql_agent, SQLDatabaseToolkit, stricter output parsing, prompt templates enforcing raw SQL, or use_query_checker. Mention model_name=\"gpt-4-0613” vs older completions.\n", - "- Redis retriever hybrid search:\n", - " - Pass filters via dict search_kwargs/retriever_search_kwargs, not strings; include {'k': ..., 'distance_threshold': ..., 'filter': '...'}.\n", - " - Error \"redis.exceptions.ResponseError: Invalid attribute yield_distance_as\" can stem from a bug in LangChain’s _prepare_range_query() or query construction; ensure the filter precedes the vector range in the query, upgrade langchain/redis integrations, or adjust search_type (similarity vs similarity_distance_threshold). Include terms: “_prepare_range_query bug,” “filter before vector clause,” “correct RediSearch query syntax.”\n", - "\n", - "General best-practice keywords to include\n", - "- Exact model IDs and classes: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, ChatOpenAI, OpenAIEmbeddings, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “version pin,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “example code,” “minimal reproducible example.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff. Output only the expanded search query paragraph.\n", - "2025/08/13 21:08:59 INFO dspy.evaluate.evaluate: Average Metric: 3.75 / 5 (75.0%)\n", - "2025/08/13 21:08:59 INFO dspy.teleprompt.gepa.gepa: Iteration 23: New subsample score is not better, skipping\n", - "2025/08/13 21:08:59 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Selected program 7 score: 0.6811111111111111\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.33 / 5 (86.7%): 100%|██████████| 5/5 [00:09<00:00, 1.95s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:09:09 INFO dspy.evaluate.evaluate: Average Metric: 4.333333333333333 / 5 (86.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:10:30 INFO dspy.teleprompt.gepa.gepa: Iteration 24: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract and include exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, env vars, and quote error messages verbatim. Identify the user’s task, what fails, and whether it’s runtime, import, configuration, or behavioral.\n", - "2) Add synonyms/related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI-compatible “base_url/api_base/openai_api_base/OPENAI_BASE_URL”). Include install commands/version pins where relevant.\n", - "3) Anticipate root causes/fixes: recent package split changes, correct imports, supported tasks, proper parameters/flags, environment variables, model compatibility/loading options, minimal repro keywords, and example code terms. Prefer specific, likely solutions over generic advice.\n", - "4) Include keywords for the expected correct approach: exact class/function names, model IDs, key flags, correct APIs/modules, prompt/memory placeholders, and minimal repro patterns. Weave in “why,” “how to fix,” and “example code” terms when helpful. Keep the query targeted and precise.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; “no server-side session,” “conversation ID not supported.” You must resend conversation history each call. Mention client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating, token limits, caching, and how to maintain context.\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Attempting ASR with HuggingFacePipeline is a common misconfiguration.\n", - " - For Whisper ASR use transformers.pipeline(\"automatic-speech-recognition\") with WhisperForConditionalGeneration or AutoModelForSpeechSeq2Seq plus WhisperProcessor (not WhisperProcessor as the model). Quote errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" and note the fix.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, “local weights,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', include the split packages and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; older setups may use ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (api_base/base_url/openai_api_base/OPENAI_BASE_URL). Set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “no custom client,” and specify model/model_name.\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet for persistence.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 (include pip command).\n", - "- chromadb installation/runtime:\n", - " - Import/install failures often stem from Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Mention injecting retrieved snippets into LangChain prompts via placeholders (e.g., {context}, {summary}, {messages}), and memory/token management and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain often includes commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Quote this exact failure mode and that SQLDatabaseChain may emit “SQLQuery:” plus extra text.\n", - " - Prefer create_sql_agent with SQLDatabaseToolkit to execute clean SQL, or enforce raw-SQL-only via output parser/prompt template; discuss use_query_checker behavior. Include example code/search terms.\n", - "- Callbacks in LangChain:\n", - " - Pass callbacks to the LLM/client instance (e.g., VertexAI, ChatOpenAI) rather than only to LLMChain. The BaseCallbackHandler.on_llm_end signature is on_llm_end(self, response, **kwargs); don’t use (event, context). Mention version-specific callback API changes and “callback manager” keywords.\n", - "- LangChain.js JSON loading:\n", - " - For JSON files, configure JSONLoader to treat each object as a single document via a JSON Pointer/jq path (e.g., \"/items/*\", \"$[*]\") and set textKey to choose the field for pageContent. Note that invalid JSON (missing commas) causes incorrect parsing; include “fix JSON syntax,” “JSON pointer,” “textKey,” “one document per object,” and DirectoryLoader mapping examples.\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, error strings, install/version pin commands, and troubleshooting terms developers actually search for. Quote errors verbatim. Keep 2–5 sentences, one paragraph.\n", - "2025/08/13 21:10:38 INFO dspy.evaluate.evaluate: Average Metric: 3.833333333333333 / 5 (76.7%)\n", - "2025/08/13 21:10:38 INFO dspy.teleprompt.gepa.gepa: Iteration 24: New subsample score is not better, skipping\n", - "2025/08/13 21:10:38 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:06<00:00, 1.35s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:10:45 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:12:09 INFO dspy.teleprompt.gepa.gepa: Iteration 25: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters, environment variables, and quote error messages verbatim. Identify the user’s task and where it’s failing.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; ChatHuggingFace vs HuggingFacePipeline; transformers.pipeline vs HuggingFaceHub), plus common misconfigurations and breaking changes.\n", - "3) Anticipate likely root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment and install commands, version pins, and minimal repro patterns.\n", - "4) Include keywords for the “expected correct approach” developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and canonical integration patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. If multiple versions/import paths might apply, include both.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., tiiuae/falcon-40b-instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), and optional 4-bit/8-bit quantization with bitsandbytes; avoid HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note the package splits and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai). Include “correct import path,” “pip install langchain-openai,” and “breaking changes.” For older setups, mention ChatOpenAI from langchain_community.chat_models as an alternative.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- LangChain tools + Pydantic:\n", - " - With @tool(args_schema=...) ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include extra commentary in SQL, causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior” as fixes/alternatives. Mention model_name=\"gpt-4-0613” and differences vs text-davinci-003.\n", - "- LangChain.js JSON ingestion:\n", - " - JSONLoader/DirectoryLoader can split into multiple docs; configure to treat each array element/object as one document (e.g., jqSchema/JSON Pointer, JSONLinesLoader vs JSONLoader, pageContentFields/flattening). Include “correct JSON syntax,” “missing commas,” and examples for mapping fields, combining properties into a single pageContent, and avoiding per-field line-based docs. Include “example code,” “jqSchema,” “pointer,” and “LangChain.js JSONLoader configuration.”\n", - "\n", - "Best-practice keywords to weave in\n", - "- Exact model IDs/classes and patterns: tiiuae/falcon-40b-instruct, openai/whisper-large-v2, AutoTokenizer, AutoModelForCausalLM, AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration, transformers.pipeline, HuggingFacePipeline, ChatHuggingFace, ChatOpenAI, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “how to fix,” “dummy API key,” “custom base URL,” “no server-side session,” “message history management,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “local weights,” “no token,” “401 Unauthorized,” “device_map='auto', torch_dtype, trust_remote_code=True,” “quantization 4-bit/8-bit,” and “RetrievalQA integration.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; use concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote errors exactly. Avoid fluff.\n", - "2025/08/13 21:12:17 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 21:12:17 INFO dspy.teleprompt.gepa.gepa: Iteration 25: New subsample score is not better, skipping\n", - "2025/08/13 21:12:17 INFO dspy.teleprompt.gepa.gepa: Iteration 26: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.83 / 5 (56.7%): 100%|██████████| 5/5 [00:07<00:00, 1.57s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:12:25 INFO dspy.evaluate.evaluate: Average Metric: 2.833333333333333 / 5 (56.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:13:57 INFO dspy.teleprompt.gepa.gepa: Iteration 26: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "- Quote error messages verbatim. Include exact versions, model IDs, class/function names, parameters, and flags.\n", - "\n", - "How to expand\n", - "1) Identify the user’s goal, the exact tech stack (libraries, versions, models, classes/functions), configuration (env vars, flags), and where it fails (errors, unexpected behavior). Extract precise strings: e.g., \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\".\n", - "2) Add synonyms and related names and their import paths: LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI (LangChain, langchain-openai); HuggingFacePipeline vs transformers.pipeline; Chroma/chromadb; AutoModelForSpeechSeq2Seq vs WhisperForConditionalGeneration; Azure OpenAI vs OpenAI SDK; RetrievalQA vs VectorStoreIndex/VectorStoreIndexCreator; create_sql_agent vs SQLDatabaseChain.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Include minimal repro and “expected correct approach” keywords (exact class/function names, model IDs, flags).\n", - "4) Prefer specific, likely solutions over generic advice. Mention breaking changes and correct modules for the user’s versions, and provide canonical “how to do it right” search terms (e.g., \"Chroma .get(include=['embeddings']) example\", \"PersistentClient duckdb+parquet\").\n", - "5) Keep it targeted and precise; do not add fluff or unrelated speculation.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI Chat Completions context:\n", - " - The API is stateless: “no server-side session,” “conversation ID not supported.” You must resend message history each call. Include “how to maintain context,” “message history management,” “ConversationBufferMemory,” “ConversationSummaryMemory,” “ConversationBufferMemoryHistory,” “RunnableWithMessageHistory,” “token limits,” “summarize/truncate history,” and “caching.”\n", - "- Hugging Face + LangChain + Whisper/ASR:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Using it with ASR leads to errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\".\n", - " - Correct ASR setup uses transformers.pipeline('automatic-speech-recognition') with AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration and WhisperProcessor/AutoProcessor; do not pass WhisperProcessor as the model.\n", - " - Whisper does not perform speaker diarization by itself; include “pyannote.audio speaker diarization,” “segment audio then transcribe,” and examples combining pyannote with Whisper.\n", - "- Fully local inference (e.g., Falcon 40B instruct):\n", - " - Load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline/ChatHuggingFace. Include device_map=\"auto\", torch_dtype (float16/bfloat16), trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, “local weights,” “no token,” avoid HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' (or similar), note the package splits and correct imports/installs: langchain, langchain-core, langchain-community, langchain-openai (or older ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL), parameters (model/model_name), and “OpenAI-compatible server,” “no custom client.”\n", - "- Chroma vector store usage:\n", - " - .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, verifying chroma-embeddings.parquet/chroma-collections.parquet if persistence is relevant.\n", - " - Prefer chromadb.PersistentClient (duckdb+parquet) for persistence; Chroma handles persistence—avoid unnecessary manual llama_index StorageContext.persist when using Chroma.\n", - "- LlamaIndex + Chroma + LangChain:\n", - " - When pairing LlamaIndex with Chroma, pass the embedding function on collection creation and ensure the same embed model on reload (embedding_function in Chroma; embed_model in LlamaIndex ServiceContext). Use VectorStoreIndex (formerly GPTVectorStoreIndex). Mention breaking changes (gpt_index → llama_index; index/query APIs renamed).\n", - " - LangChain is not required if you are already using LlamaIndex end to end; simplify stack unless you need LangChain chains/tools.\n", - " - Use a proper Prompt/PromptTemplate with LlamaIndex; include “text_qa_template,” “query_engine example.”\n", - " - Prefer OpenAIEmbedding/OpenAI as defaults for embedding/LLM unless Azure or local models are intended; for Azure OpenAI include azure_endpoint, api_version, and deployment_name.\n", - "- Vectorstore persistence and billing:\n", - " - VectorstoreIndexCreator by default may create an in-memory store; to persist, use Chroma with persist_directory and call db.persist(), then reload with Chroma(persist_directory=..., embedding=...). Do not rely on incidental .pkl artifacts; instead, load the vectorstore properly (e.g., FAISS.save_local/load_local for FAISS).\n", - " - OpenAI billing: Embeddings incur cost when created; loading from a persisted vectorstore does not re-embed or re-bill. Avoid re-embedding unless data changed.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 incompatibility; pin pydantic==1.10.10 or align with compatible LangChain versions. Include exact pip pins.\n", - "- chromadb installation/runtime:\n", - " - Import/build issues are often Python version related; Python 3.10 commonly resolves them. Include virtualenv/conda hints and version troubleshooting.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, include SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, sentence_window_engine; inject retrieved snippets into prompts via placeholders ({context}, {summary}, {messages}); include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory), and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may emit prose causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser enforcing raw SQL,” “use_query_checker behavior,” and model_name=\"gpt-4-0613\".\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers search for (e.g., \"example code,\" \"correct usage,\" \"breaking changes,\" \"install command,\" \"version pin,\" \"root cause,\" \"why,\" \"how to fix\").\n", - "- Include likely fixes and correct API/module names the user should adopt for their exact versions.\n", - "2025/08/13 21:14:03 INFO dspy.evaluate.evaluate: Average Metric: 2.833333333333333 / 5 (56.7%)\n", - "2025/08/13 21:14:03 INFO dspy.teleprompt.gepa.gepa: Iteration 26: New subsample score is not better, skipping\n", - "2025/08/13 21:14:03 INFO dspy.teleprompt.gepa.gepa: Iteration 27: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.92 / 5 (58.3%): 100%|██████████| 5/5 [00:05<00:00, 1.12s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:14:09 INFO dspy.evaluate.evaluate: Average Metric: 2.9166666666666665 / 5 (58.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:15:05 INFO dspy.teleprompt.gepa.gepa: Iteration 27: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters/flags, and quote error messages verbatim. Identify the user’s task and where it’s failing.\n", - "2) Add synonyms/aliases and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; model vs model_name; OPENAI_BASE_URL vs openai_api_base).\n", - "3) Anticipate root causes and fixes: version/compat issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Prefer specific, likely solutions over generic advice.\n", - "4) Include keywords for the expected correct approach: exact class/function names, model IDs, flags, example code terms, correct API/module names, minimal repro patterns, and known breaking changes.\n", - "5) Keep it targeted and precise; include concrete commands (e.g., pip install ...), environment hints, and config keys devs search for.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; resend conversation history every call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, summarizing/truncating, client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; not ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" indicate misuse. Include transformers.pipeline('automatic-speech-recognition'), correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit bitsandbytes, avoid HfHubHTTPError: 401 by not using HuggingFaceHub, “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', include the package split and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; or legacy ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (openai_api_base/base_url/OPENAI_BASE_URL) to localhost; set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “model/model_name,” and env vars (OPENAI_API_KEY, OPENAI_BASE_URL/OPENAI_API_BASE).\n", - "- Chroma vector store:\n", - " - .get excludes embeddings by default; use include=['embeddings'] (else 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- LangChain tools + Pydantic:\n", - " - ValidationError for @tool(args_schema=...) is often Pydantic v2 vs LangChain. Include downgrading to pydantic==1.10.10 with exact commands (pip install pydantic==1.10.10; add to requirements.txt).\n", - "- chromadb install/runtime:\n", - " - Import/installation loops often due to Python version; Python 3.10 commonly resolves. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG:\n", - " - For sentence window retrieval, include llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine; inject retrieved snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}); include memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain can emit commentary around SQL causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “AgentExecutor,” “output parser,” “prompt template enforcing raw SQL,” differences vs text-davinci-003, and “use_query_checker behavior.” Mention switching to agents or strict output parsing to ensure raw SQL only.\n", - "- Redis retriever hybrid search:\n", - " - Pass search_kwargs as a dict, not a string; include filter inside search_kwargs (e.g., {'include_metadata': True, 'distance_threshold': 0.8, 'k': 5, 'filter': '@menu_text:(%%chicken%%) @lunch:{true}'}). Note known LangChain bug in _prepare_range_query() causing bad RediSearch syntax like \"Invalid attribute yield_distance_as\"; the filter must appear before the vector range fragment, and avoid the erroneous =>{$yield_distance_as: distance}. Include terms “RediSearch FT.SEARCH,” “VECTOR_RANGE,” “similarity_distance_threshold,” correct query construction, and suggest upgrading/downgrading LangChain if needed.\n", - "- LangChain callbacks (VertexAI, etc.):\n", - " - Callback method signatures: on_llm_end(response, **kwargs) (not event/context). Include that handlers often must be attached to the LLM instance (e.g., VertexAI(callbacks=[handler])) rather than only to LLMChain, and note callback API changes across versions (CallbackManager, CallbackManagerForLLMRun).\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs/classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit, AgentExecutor.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “example code,” “minimal reproducible example.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, env vars, and troubleshooting terms developers actually search for. Keep it focused, with likely fixes and exact commands/flags. Avoid fluff.\n", - "2025/08/13 21:15:10 INFO dspy.evaluate.evaluate: Average Metric: 2.9166666666666665 / 5 (58.3%)\n", - "2025/08/13 21:15:10 INFO dspy.teleprompt.gepa.gepa: Iteration 27: New subsample score is not better, skipping\n", - "2025/08/13 21:15:10 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.67 / 5 (53.3%): 100%|██████████| 5/5 [00:06<00:00, 1.29s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:15:16 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:16:37 INFO dspy.teleprompt.gepa.gepa: Iteration 28: Proposed new text for expand_query: You expand developer questions into a single, concise search‑query paragraph that helps a search engine retrieve high‑signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters, flags, environment variables, and quote error messages verbatim. State the user’s goal and the precise failure point.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; Chroma Client vs PersistentClient; ChatHuggingFace vs HuggingFacePipeline; create_sql_agent vs SQLDatabaseChain), plus old vs new import paths and package splits. Include breaking changes and compatibility notes.\n", - "3) Anticipate root causes and fixes and include the likely resolution keywords: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Prefer specific, likely solutions (e.g., pin pydantic==1.10.10; Python 3.10 for chromadb; .get(include=['embeddings']) in Chroma).\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, minimal repro patterns, and commands (pip install, conda, curl). Think “what query would find the right docs/GitHub issues/snippets quickly.”\n", - "5) Keep it targeted and precise. Mention the why/root cause when clear. Avoid generic fluff.\n", - "\n", - "Domain‑specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - “No server‑side session,” “conversation ID not supported,” must resend full/trimmed history each call. Mention client‑side memory patterns and LangChain helpers (ConversationBufferMemory, ConversationSummaryMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching. Include “how to maintain context,” “message history management,” and example patterns for continuing conversations.\n", - "- Hugging Face + LangChain (ASR and Whisper):\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" arise from misusing processor as model or trying ASR via HuggingFacePipeline. Use transformers.pipeline('automatic-speech-recognition') with Whisper models (AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration + WhisperProcessor/tokenizer/feature extractor). State that LangChain’s HuggingFacePipeline isn’t appropriate for ASR and provide correct components/usage keywords.\n", - "- Fully local inference with transformers (e.g., Falcon 40B instruct):\n", - " - Load with AutoTokenizer + AutoModelForCausalLM, create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4‑bit/8‑bit quantization, and “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Add “RetrievalQA integration” and “example code” keywords.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', note the package split and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai). Mention older alternatives (ChatOpenAI from langchain_community.chat_models) and “breaking changes,” “correct import path,” “installation steps.” For Azure: AzureOpenAI moved to langchain_community.llms in newer versions.\n", - "- Using OpenAI‑compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI/OpenAI with custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set dummy openai_api_key if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model/model_name). Keywords: “OpenAI‑compatible server,” “works out of the box,” “no custom client.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (or you’ll see 'embeddings': None). Persistence tips: use persist(), persist_directory, chroma-embeddings.parquet verification. Prefer PersistentClient for persistence; Chroma handles persistence automatically—don’t manually persist LlamaIndex storage context unless needed. Include “duckdb+parquet,” “reload collection,” and troubleshooting counts/get().\n", - "- LangChain tools + Pydantic:\n", - " - If ValidationError about BaseModel subclass with @tool(args_schema=...), it’s often Pydantic v2 incompatibility; pin pydantic==1.10.10. Include exact install commands: pip install pydantic==1.10.10 or add to requirements.txt.\n", - "- chromadb installation/runtime:\n", - " - Installation/import errors or build loops are frequently Python version issues; Python 3.10 commonly resolves. Include virtualenv/conda hints and version troubleshooting.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Inject retrieved snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}) rather than dumping entire documents. Mention memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory), latency, and token usage eval. If using only LlamaIndex, LangChain isn’t required; prefer OpenAIEmbedding and OpenAI for clearer setup.\n", - "- SQL with LangChain:\n", - " - sqlite3.OperationalError near \"The\": syntax error can occur when models add commentary. Use create_sql_agent, SQLDatabaseToolkit, stricter prompts/output parsers enforcing raw SQL, and consider use_query_checker. Mention model_name=\"gpt-4-0613\" differences vs text-davinci-003.\n", - "- Callbacks in LangChain:\n", - " - Pass callbacks to the LLM instance (e.g., VertexAI) if chain callbacks don’t fire. Ensure correct handler signatures (on_llm_start, on_llm_end(response, **kwargs)), and note that verbose flags don’t replace callbacks. Include “correct callback registration,” “VertexAI callbacks,” and “BaseCallbackHandler signature.”\n", - "\n", - "Best‑practice keywords to weave in\n", - "- Exact model IDs/classes: openai/whisper-large-v2, AutoModelForSpeechSeq2Seq, WhisperForConditionalGeneration, WhisperProcessor, transformers.pipeline('automatic-speech-recognition'), ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), PersistentClient, persist_directory, @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “Python 3.10,” “pydantic==1.10.10,” “correct import path,” “ModuleNotFoundError,” “Token limits,” “message history.”\n", - "\n", - "Style\n", - "- Be specific and action‑oriented. Include concrete class/param names, model IDs, flags, and troubleshooting terms devs actually search for. Quote exact errors. Prefer specific, likely fixes over generic advice. Keep the final output a single, targeted paragraph of 2–5 sentences.\n", - "2025/08/13 21:16:43 INFO dspy.evaluate.evaluate: Average Metric: 2.6666666666666665 / 5 (53.3%)\n", - "2025/08/13 21:16:43 INFO dspy.teleprompt.gepa.gepa: Iteration 28: New subsample score is not better, skipping\n", - "2025/08/13 21:16:43 INFO dspy.teleprompt.gepa.gepa: Iteration 29: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.25 / 5 (85.0%): 100%|██████████| 5/5 [00:05<00:00, 1.09s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:16:48 INFO dspy.evaluate.evaluate: Average Metric: 4.25 / 5 (85.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:17:51 INFO dspy.teleprompt.gepa.gepa: Iteration 29: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, models, classes/functions, parameters, config flags, environment variables, and any error messages (quote errors verbatim). Identify the user’s task, where it’s failing, and what behavior is expected.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline), plus common misconfigurations and recent breaking changes/import path splits.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins.\n", - "4) Include keywords for the “expected correct approach” developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal repro patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. If ambiguous, include the top 2–3 likely paths. Quote error strings exactly as given.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” “message history management,” token limits, summarizing/truncating history, client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), and caching.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) and parameters (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note package splits and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai; alternatively ChatOpenAI from langchain_community.chat_models in older setups). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" can appear when misused. Include “transformers.pipeline('automatic-speech-recognition')”, correct ASR components (AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub for local weights. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant.\n", - "- Vectorstore persistence and costs:\n", - " - VectorstoreIndexCreator often uses transient storage by default (e.g., DuckDB/in-memory); to persist across runs, switch to Chroma or another persistent store and call persist() with persist_directory. OpenAI Embeddings incur cost only when created; loading a persisted vectorstore/retriever does not re-embed or bill again.\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and a sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}) rather than dumping entire documents; include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include extra commentary around SQL, causing errors like 'sqlite3.OperationalError near \"The\": syntax error'. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser enforcing raw SQL,” and “use_query_checker behavior” as fixes/alternatives. Mention model_name=\"gpt-4-0613” and differences vs text-davinci-003.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for.\n", - "- Quote error messages verbatim. Mention likely misconfigs and the exact fix (flag/env/import/version pin).\n", - "- Do not add explanations or headings; output only the expanded search query paragraph, 2–5 sentences.\n", - "2025/08/13 21:17:56 INFO dspy.evaluate.evaluate: Average Metric: 3.5 / 5 (70.0%)\n", - "2025/08/13 21:17:56 INFO dspy.teleprompt.gepa.gepa: Iteration 29: New subsample score is not better, skipping\n", - "2025/08/13 21:17:56 INFO dspy.teleprompt.gepa.gepa: Iteration 30: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.13 / 5 (82.7%): 100%|██████████| 5/5 [00:08<00:00, 1.74s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:18:05 INFO dspy.evaluate.evaluate: Average Metric: 4.133333333333333 / 5 (82.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:19:30 INFO dspy.teleprompt.gepa.gepa: Iteration 30: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libs, versions, models, classes/functions, params/flags, and any error messages (quote errors verbatim). State the user’s task and where it fails.\n", - "2) Add synonyms/aliases and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; HuggingFaceHub vs local transformers). Include common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Include known bugs/workarounds and when to upgrade/downgrade.\n", - "4) Include keywords that reflect the correct approach devs would search for: exact class/function names, model IDs, flags, import paths, config keys, minimal repro terms, and API/module names. Prefer specific, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - “No server-side session,” “conversation ID not supported,” Chat Completions API requires resending full history each call. Include client-side memory options (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, summarization; not ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" imply misuse. Include transformers.pipeline('automatic-speech-recognition'), AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor, and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - Fully local inference (e.g., Falcon 40B instruct): load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit quantization (bitsandbytes), and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” “RetrievalQA integration.”\n", - " - HfHubHTTPError: 401 often means no access to the Hugging Face endpoint or private model; fix via valid token or switch to fully local transformers pipeline.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' occurs, note the package splits and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai). Include “correct import path,” “installation steps,” “breaking changes,” and alternatives (older ChatOpenAI import paths).\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL), model/model_name, “OpenAI-compatible server,” “works out of the box,” “no custom client.”\n", - "- Redis vector search (LangChain Redis retriever):\n", - " - For hybrid vector+metadata search, pass filter in search_kwargs/retriever_search_kwargs as a dict, not a string; include 'filter', 'k', 'include_metadata', distance_threshold for search_type=\"similarity_distance_threshold\". Include correct RediSearch filter syntax (e.g., @field:{true} @text:(%chicken%)). Note known LangChain 0.0.346 bug in _prepare_range_query producing invalid \"=>{$yield_distance_as: distance}\" and that filter must precede the vector range in the query; fix by upgrading/downgrading LangChain, switching search_type, or constructing the filter properly. Include “Invalid attribute yield_distance_as,” “FT.SEARCH,” “VECTOR_RANGE,” and “hybrid metadata filter syntax.”\n", - "- Chroma vector store:\n", - " - .get excludes embeddings by default; retrieve with include=['embeddings'] (else 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- LangChain tools + Pydantic:\n", - " - @tool(args_schema=...) ValidationError about BaseModel usually means Pydantic v2 vs LangChain; pin pydantic==1.10.10. Include exact pin and pip command.\n", - "- chromadb installation/runtime:\n", - " - ImportError/install loops often due to Python version; Python 3.10 commonly resolves. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Inject snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}); include memory/token management and evaluation terms.\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may prepend commentary causing sqlite3.OperationalError near \"The\": syntax error. Include create_sql_agent, SQLDatabaseToolkit, output parser, strict prompt template enforcing raw SQL only, and use_query_checker behavior. Mention model_name=\"gpt-4-0613” vs text-davinci-003 differences and working configurations.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs/classes: openai/whisper-large-v2, ChatOpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10.”\n", - "- Preserve provided system info in the query (e.g., langchain==0.0.346, python 3.9.18).\n", - "\n", - "Style\n", - "- Be specific and action-oriented; include concrete class/param names, model IDs, import paths, flags, and error strings in quotes. Keep it targeted and precise, oriented to how devs would actually search. Output only the paragraph.\n", - "2025/08/13 21:19:39 INFO dspy.evaluate.evaluate: Average Metric: 3.933333333333333 / 5 (78.7%)\n", - "2025/08/13 21:19:39 INFO dspy.teleprompt.gepa.gepa: Iteration 30: New subsample score is not better, skipping\n", - "2025/08/13 21:19:39 INFO dspy.teleprompt.gepa.gepa: Iteration 31: Selected program 7 score: 0.6811111111111111\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:11<00:00, 2.31s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:19:50 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:20:57 INFO dspy.teleprompt.gepa.gepa: Iteration 31: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, and quote error messages verbatim. Identify the user’s task, where it’s failing, and whether it’s runtime, import, configuration, or behavioral.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI-compatible “base_url/api_base/openai_api_base/OPENAI_BASE_URL”).\n", - "3) Anticipate root causes/fixes: version and package split changes, correct imports, supported tasks, proper parameters/flags, environment variables, install commands and version pins, model compatibility and loading options, minimal repro and example code keywords.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, key flags, correct APIs/modules, prompt/memory placeholders, and minimal repro patterns.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. Include “why,” “how to fix,” and “example code” terms when helpful.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; “no server-side session,” “conversation ID not supported.” You must resend conversation history each call.\n", - " - Best practices: client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, caching, and how to maintain context.\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Attempting ASR with HuggingFacePipeline is a common misconfiguration.\n", - " - For Whisper ASR use transformers.pipeline(\"automatic-speech-recognition\") with WhisperForConditionalGeneration or AutoModelForSpeechSeq2Seq plus WhisperProcessor (not WhisperProcessor as the model). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" come from misusing processor vs model.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, and “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', note the package splits and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; older setups may use ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (api_base/base_url/openai_api_base/OPENAI_BASE_URL). Set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “no custom client,” and specify model/model_name.\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet for persistence.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pip install command/version pin keywords.\n", - "- chromadb installation/runtime:\n", - " - Install/import issues can be due to Python version incompatibility; Python 3.10 commonly resolves chromadb problems. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query.\n", - " - Use the sentence_window_engine to fetch relevant snippets; inject retrieved snippets into the LangChain chat prompt context via placeholders (e.g., {context}, {summary}, {messages}) instead of dumping entire documents. Mention memory/token management (ConversationBufferMemory, ConversationSummaryMemory, ConversationBufferMemoryHistory) and evaluation terms (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Mention this exact failure mode.\n", - " - Prefer create_sql_agent with SQLDatabaseToolkit, or enforce raw-SQL-only via output parser/prompt template; discuss use_query_checker behavior. Include model_name=\"gpt-4-0613” differences vs text-davinci-003 and “example code” keywords.\n", - "- LangChain callbacks (VertexAI/LLMChain):\n", - " - Custom BaseCallbackHandler methods must use correct signatures, e.g., on_llm_end(self, response, **kwargs), and often need **kwargs to avoid missing-arg issues due to evolving callback API.\n", - " - Pass callbacks to the LLM instance (e.g., VertexAI(callbacks=[handler])) rather than only to LLMChain; some wrappers trigger callbacks only when attached at the LLM level. Include “correct callback method signature,” “where to register callbacks,” and version-specific notes.\n", - "- Redis retriever (LangChain + RediSearch hybrid search):\n", - " - Pass metadata filters via search_kwargs={'filter': '...'} (or retriever_search_kwargs in newer APIs); ensure filter syntax uses RediSearch field filters like @field:{true} and @text:(%term%) with correct quoting.\n", - " - Known bug in older LangChain (e.g., 0.0.346) _prepare_range_query causes incorrect Redis query syntax: filter must precede the vector clause. Correct pattern: \"@lunch:{true} @menu_text:(%chicken%) @content_vector:[VECTOR_RANGE $distance_threshold $vector]\". Errors like \"redis.exceptions.ResponseError: Invalid attribute yield_distance_as\" can stem from malformed query assembly or unsupported attributes; search for PRs/fixes, patches, or version pins that fix yield_distance_as and filter ordering.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact models/classes/functions: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, transformers.pipeline('automatic-speech-recognition'), ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, @tool args_schema, ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit, VertexAI, BaseCallbackHandler.on_llm_end, LLMChain, redis.as_retriever, search_kwargs/retriever_search_kwargs, _prepare_range_query.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “version pin,” “breaking changes,” “root cause,” “why,” “how to fix,” “example code,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “minimal reproducible example.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Quote error strings verbatim. Avoid fluff. Keep 2–5 sentences, one paragraph.\n", - "/Users/cshorten/.local/share/uv/python/cpython-3.11.4-macos-aarch64-none/lib/python3.11/json/decoder.py:353: ResourceWarning: unclosed \n", - " obj, end = self.scan_once(s, idx)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/.local/share/uv/python/cpython-3.11.4-macos-aarch64-none/lib/python3.11/json/decoder.py:353: ResourceWarning: unclosed \n", - " obj, end = self.scan_once(s, idx)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Users/cshorten/.local/share/uv/python/cpython-3.11.4-macos-aarch64-none/lib/python3.11/json/decoder.py:353: ResourceWarning: unclosed \n", - " obj, end = self.scan_once(s, idx)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/08/13 21:21:07 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 5 (80.0%)\n", - "2025/08/13 21:21:07 INFO dspy.teleprompt.gepa.gepa: Iteration 31: New subsample score is not better, skipping\n", - "2025/08/13 21:21:07 INFO dspy.teleprompt.gepa.gepa: Iteration 32: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.33 / 5 (86.7%): 100%|██████████| 5/5 [00:11<00:00, 2.25s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:21:18 INFO dspy.evaluate.evaluate: Average Metric: 4.333333333333334 / 5 (86.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:22:21 INFO dspy.teleprompt.gepa.gepa: Iteration 32: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters/flags, install/import paths, environment variables, and quote any error messages verbatim. Identify the user’s goal (task) and where it fails.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI; HuggingFacePipeline vs transformers.pipeline; HuggingFaceHub vs local transformers; Chroma vs chromadb; SQLDatabaseChain vs create_sql_agent) and mention common misconfigs and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports after package splits, supported tasks, proper parameters, environment variables, model access/authorization, install commands, version pins, and persistence/loading patterns.\n", - "4) Include keywords that reflect the expected correct approach devs would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, minimal repro patterns, and known working templates/parsers/wrappers.\n", - "5) Keep it targeted and precise; prefer specific, likely solutions over generic advice. Prioritize authoritative fixes, concise code-keyword phrases, and known-good patterns.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API maintains no server-side session; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” and “message history management.” Add best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Errors like “AttributeError: 'WhisperProcessor' object has no attribute 'config'” can appear when misused. For ASR, use transformers.pipeline('automatic-speech-recognition') with AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor; LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, “local weights,” “no token,” and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub (or ensure proper Hugging Face token/access if using gated models). Include integration with RetrievalQA.\n", - "- LangChain imports and package split:\n", - " - If you see “ModuleNotFoundError: No module named 'langchain_openai',” note the package split and correct installs/imports: langchain, langchain-community, langchain-core, langchain-openai (older setups may use ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include OPENAI_API_KEY and OPENAI_BASE_URL environment variables, model/model_name, and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Mention persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- Embeddings/index persistence and reuse (LangChain):\n", - " - VectorstoreIndexCreator defaults can be transient; to persist and reload, prefer Chroma with persist_directory and explicit .persist(). To reuse without re-embedding: load with Chroma(persist_directory=..., embedding_function=...), then use .as_retriever() with RetrievalQA. Clarify that OpenAI Embeddings billing occurs only when creating embeddings; loading/retrieving from a persisted store does not re-embed or incur those costs. Include “how to load saved index/vector store,” “avoid re-embedding,” and vectorstore-specific load patterns.\n", - "- SQL with LangChain:\n", - " - Chat models (e.g., GPT-4) can add commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Prefer create_sql_agent with SQLDatabaseToolkit (AgentType.OPENAI_FUNCTIONS) or enforce strict output parsing/templates that return only raw SQL (no extra text). Mention SQLDatabaseChain vs agents, use_query_checker behavior, output parser fixes, and model_name=\"gpt-4-0613” differences vs text-davinci-003.\n", - "- LangChain tools + Pydantic:\n", - " - If using @tool(args_schema=...) and seeing ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pin and pip install command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops can be due to Python version compatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, use llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine to query. Inject retrieved snippets into the LangChain chat context via prompt placeholders (e.g., {context}, {summary}, {messages}) and include memory/token management terms and evaluation metrics (latency, token usage).\n", - "- LangChain.js JSON ingestion:\n", - " - JSONLoader with DirectoryLoader may split per-field; configure JSON pointer/path to treat each array element/object as a single Document (e.g., pointer like \"/[*]\" or equivalent), or use JSONLinesLoader for NDJSON. Fix JSON syntax errors (commas, quoting) to avoid unexpected splits. Include keywords “JSONLoader pointer/path,” “DirectoryLoader,” “one document per object,” “pageContent extraction,” “flatten vs grouped fields,” and “example code.”\n", - "- Common error strings to quote and target:\n", - " - “HfHubHTTPError: 401 Client Error: Unauthorized for url”\n", - " - “sqlite3.OperationalError: near \"The\": syntax error”\n", - " - “ModuleNotFoundError: No module named 'langchain_openai'”\n", - " - “AttributeError: 'WhisperProcessor' object has no attribute 'config'”\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs/classes: gpt-4-0613, gpt-3.5-turbo, openai/whisper-large-v2, Falcon 40B instruct; ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, VectorstoreIndexCreator, Chroma.from_documents, Chroma(..., persist_directory=...), .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit, AgentType.OPENAI_FUNCTIONS.\n", - "- Search intents and fix terms: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “breaking changes,” “root cause,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “example code,” “local weights,” “no token,” “OpenAI-compatible server,” “avoid re-embedding,” “load persisted vectorstore,” “strict output parser.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, error strings, and troubleshooting keywords developers actually search for. Avoid fluff. The output must be a single paragraph, 2–5 sentences, optimized to surface authoritative solutions quickly.\n", - "2025/08/13 21:22:31 INFO dspy.evaluate.evaluate: Average Metric: 3.8666666666666667 / 5 (77.3%)\n", - "2025/08/13 21:22:31 INFO dspy.teleprompt.gepa.gepa: Iteration 32: New subsample score is not better, skipping\n", - "2025/08/13 21:22:31 INFO dspy.teleprompt.gepa.gepa: Iteration 33: Selected program 7 score: 0.6811111111111111\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.08 / 5 (41.7%): 100%|██████████| 5/5 [00:11<00:00, 2.34s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:22:43 INFO dspy.evaluate.evaluate: Average Metric: 2.0833333333333335 / 5 (41.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:24:02 INFO dspy.teleprompt.gepa.gepa: Iteration 33: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract the exact technologies, libraries, versions, model IDs, classes/functions, parameters/flags, environment variables, and quote error messages verbatim. Identify the user’s goal, where it’s failing (runtime/import/config/behavior), and the stack pieces involved.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI-compatible “base_url/api_base/openai_api_base/OPENAI_BASE_URL”), and module path variations across versions.\n", - "3) Anticipate root causes/fixes: version/package splits and breaking changes, correct imports, supported tasks, proper parameters/flags, env var names, install commands and version pins, model compatibility and loading options, minimal repro and “example code” keywords.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, key flags, correct APIs/modules, prompt/memory placeholders, and minimal repro patterns. Prefer specific, likely solutions over generic advice and include “why,” “how to fix,” and “example code” terms when helpful.\n", - "5) Always weave in highly targeted, domain-specific nuggets below when relevant. If any nugget maps to the input, include it explicitly.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; “no server-side session,” “conversation ID not supported.” You must resend conversation history each call.\n", - " - Best practices: client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, caching, and how to maintain context.\n", - "\n", - "- Hugging Face + LangChain + Whisper:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Attempting ASR with HuggingFacePipeline is a common misconfiguration.\n", - " - For Whisper ASR use transformers.pipeline(\"automatic-speech-recognition\") with WhisperForConditionalGeneration or AutoModelForSpeechSeq2Seq plus WhisperProcessor (not WhisperProcessor as the model). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" come from misusing processor vs model.\n", - " - For speaker diarization use pyannote.audio or WhisperX alongside ASR (keywords: pyannote/speaker-diarization, diarization pipeline, WhisperX diarization), and how to combine diarization segments with Whisper transcripts.\n", - "\n", - "- Fully local HF inference:\n", - " - For local LLMs (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, and “local weights,” “no token,” avoiding HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "\n", - "- LangChain imports and package split:\n", - " - Breaking changes: packages split into langchain, langchain-core, langchain-community, and langchain-openai. Correct install/imports vary by version.\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', either install langchain-openai and import from langchain_openai, or (for older setups) import ChatOpenAI from langchain_community.chat_models. Include \"installation steps,\" \"correct import path,\" and “breaking changes.”\n", - " - Provide pins or compatibility notes if helpful.\n", - "\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL pointing to localhost (api_base/base_url/openai_api_base/OPENAI_BASE_URL). Set a dummy OPENAI_API_KEY if required. Include “OpenAI-compatible server,” “works out of the box,” “no custom client/provider,” and specify model/model_name.\n", - "\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet for persistence.\n", - " - Prefer chromadb.PersistentClient(persist_directory=...) over Client(Settings(...)) for reliable persistence management. Mention Python/virtualenv compatibility (Python 3.10 often resolves chromadb install/runtime issues).\n", - "\n", - "- LlamaIndex + Chroma + LangChain integration:\n", - " - If using LlamaIndex with Chroma, LangChain is not required for embeddings or querying in this setup; avoid unnecessary layers that can cause None/empty results.\n", - " - Use LlamaIndex’s OpenAIEmbedding (from llama_index.embeddings.openai import OpenAIEmbedding) rather than LangchainEmbedding to avoid mismatched embedding functions.\n", - " - With ChromaVectorStore, persistence is handled by Chroma; you usually don’t need storage_context.persist()—use PersistentClient and verify persisted files.\n", - " - Use PromptTemplate (not a raw string) for custom prompts in LlamaIndex, and configure QueryEngine via index.as_query_engine(text_qa_template=...).\n", - " - When debugging “query returns None,” check that the collection was created with the correct embedding_function, documents actually indexed, include=['documents','metadatas','embeddings'] in .get(), consistent persist_directory, and that the LLM (OpenAI or AzureOpenAI) is properly configured.\n", - " - Azure OpenAI specifics: use correct api_version (e.g., 2023-07-01-preview), azure_endpoint, and deployment_name vs model parameters.\n", - "\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers a ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include the exact pip install command and/or note adding pydantic==1.10.10 to requirements.txt.\n", - "\n", - "- LlamaIndex sentence-window retrieval:\n", - " - For sentence window retrieval, use SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine; inject retrieved snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}) and manage memory/token limits.\n", - "\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may include commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Prefer create_sql_agent with SQLDatabaseToolkit, or enforce raw-SQL-only via output parser/prompt template; mention use_query_checker behavior and include “example code” keywords.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact models/classes/functions: openai/whisper-large-v2, WhisperForConditionalGeneration, AutoModelForSpeechSeq2Seq, transformers.pipeline('automatic-speech-recognition'), ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), persist_directory, PersistentClient, @tool args_schema, ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “version pin,” “breaking changes,” “root cause,” “why,” “how to fix,” “example code,” “dummy API key,” “custom base URL,” “persist_directory,” “PersistentClient,” “Python 3.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, and troubleshooting terms developers actually search for. Keep 2–5 sentences, one paragraph, no fluff.\n", - "2025/08/13 21:25:42 WARNING dspy.clients.lm: LM response was truncated due to exceeding max_tokens=4000. You can inspect the latest LM interactions with `dspy.inspect_history()`. To avoid truncation, consider passing a larger max_tokens when setting up dspy.LM. You may also consider increasing the temperature (currently 0.0) if the reason for truncation is repetition.\n", - "2025/08/13 21:25:43 INFO dspy.evaluate.evaluate: Average Metric: 1.4166666666666665 / 5 (28.3%)\n", - "2025/08/13 21:25:43 INFO dspy.teleprompt.gepa.gepa: Iteration 33: New subsample score is not better, skipping\n", - "2025/08/13 21:25:43 INFO dspy.teleprompt.gepa.gepa: Iteration 34: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.58 / 5 (71.7%): 100%|██████████| 5/5 [00:21<00:00, 4.22s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:26:04 INFO dspy.evaluate.evaluate: Average Metric: 3.5833333333333335 / 5 (71.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:26:48 INFO dspy.teleprompt.gepa.gepa: Iteration 34: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters/flags, environment variables, and quote error messages verbatim. State the user’s goal and where it fails (what changed, what broke).\n", - "2) Add synonyms/aliases and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; base_url/api_base/openai_api_base). Include common module path changes and package splits.\n", - "3) Anticipate root causes and fixes: version/compatibility and breaking changes, correct imports, supported tasks, proper parameters/flags, install commands and version pins, environment variables and configuration, local vs cloud behaviors, minimal repro patterns, and when persistence is required.\n", - "4) Include keywords for the expected correct approach developers would search for: exact class/function names, model IDs, flags, example code terms, correct API/module names, and minimal reproducible snippets (conceptually).\n", - "5) Keep it targeted and precise; prioritize likely solutions over generic advice. Prefer explicit param names, environment variable keys, and install commands. Quote the exact error string.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - “No server-side session,” “conversation ID not supported,” and that the Chat Completions API requires resending conversation history each call. Include client-side memory options (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching tips.\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; not automatic-speech-recognition (ASR). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" indicate misuse. Include transformers.pipeline('automatic-speech-recognition'), correct ASR components (AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration + WhisperProcessor/feature extractor), and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit quantization via bitsandbytes, “local weights,” “no token,” and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Mention integration with RetrievalQA and “example code.”\n", - "- LangChain imports and package split:\n", - " - If \"ModuleNotFoundError: No module named 'langchain_openai'\" or similar, mention recent package splits and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai). Include alternative historical imports (e.g., ChatOpenAI from langchain_community.chat_models) and installation steps. Call out breaking changes explicitly.\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) to localhost; set model/model_name and a dummy openai_api_key if required. Include environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma.get excludes embeddings by default; use include=['embeddings'] (otherwise you’ll see 'embeddings': None). Include persist()/persist_directory and verifying chroma-embeddings.parquet if persistence matters. Note that VectorstoreIndexCreator defaults may be transient (e.g., DuckDB/in-memory); use Chroma explicitly for persistence and reload from disk to avoid re-embedding costs.\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) raises a ValidationError about BaseModel subclass, it’s often Pydantic v2 incompatibility; pin pydantic==1.10.10 and provide the pip command.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops often stem from Python version incompatibility; Python 3.10 typically resolves. Include virtualenv/conda hints and exact version checks.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, include SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Show how to inject retrieved snippets into LangChain prompts via placeholders ({context}, {summary}, {messages}), with memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may output commentary around SQL, causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” stricter output parsing/templates enforcing raw SQL only, and mention use_query_checker behavior. Include model_name=\"gpt-4-0613” and differences vs text-davinci-003.\n", - "\n", - "Always weave in concrete keywords developers search for\n", - "- Exact IDs/classes/functions: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Configuration/env keys and params: OPENAI_API_KEY, OPENAI_BASE_URL, openai_api_base/base_url/api_base, model/model_name, persist_directory, device_map=\"auto\", torch_dtype, trust_remote_code=True, bitsandbytes/4-bit/8-bit, Python 3.10, pydantic==1.10.10, “dummy API key,” “no token,” “local weights.”\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “breaking changes,” “root cause,” “how to fix,” “example code,” “why,” “minimal repro.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prioritize high-precision terms, exact error strings, class/param names, and install/config lines. Avoid fluff. Keep 2–5 sentences in one paragraph.\n", - "2025/08/13 21:27:07 INFO dspy.evaluate.evaluate: Average Metric: 3.5833333333333335 / 5 (71.7%)\n", - "2025/08/13 21:27:07 INFO dspy.teleprompt.gepa.gepa: Iteration 34: New subsample score is not better, skipping\n", - "2025/08/13 21:27:07 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 2.83 / 5 (56.7%): 100%|██████████| 5/5 [00:18<00:00, 3.69s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:27:25 INFO dspy.evaluate.evaluate: Average Metric: 2.833333333333333 / 5 (56.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:29:02 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters, flags, environment variables, and quote error messages verbatim. State the user’s task and where it’s failing.\n", - "2) Add synonyms/related names and split-package variants (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; langchain vs langchain-community/langchain-core/langchain-openai).\n", - "3) Anticipate root causes and fixes: version/compatibility issues and breaking changes, correct imports, supported tasks, proper params/flags, env vars, install commands and version pins, correct API/module names, and minimal repro patterns.\n", - "4) Include keywords for the “expected correct approach” developers would search for: exact class/function names, model IDs, flags, example code terms, correct module paths, and known-good snippets/patterns.\n", - "5) Keep it targeted and precise; prioritize specific, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; conversation ID not supported. You must resend message history on every call. Include how to maintain context client-side (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- Hugging Face + LangChain:\n", - " - LangChain’s HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support automatic-speech-recognition (ASR). Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" occur when misused.\n", - " - For ASR use transformers.pipeline('automatic-speech-recognition') with AutoModelForSpeechSeq2Seq or WhisperForConditionalGeneration + WhisperProcessor (not HuggingFacePipeline). For speaker diarization, mention pyannote.audio or whisperx.\n", - " - For fully local LLM inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), 4-bit/8-bit quantization via bitsandbytes, and avoiding HfHubHTTPError: 401 by not using HuggingFaceHub. Include “example code,” “local weights,” “no token,” and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', include correct imports/installs after package splits (langchain, langchain-community, langchain-core, langchain-openai; older: ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI/OpenAI with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy OPENAI_API_KEY if required. Mention env vars (OPENAI_API_KEY, OPENAI_BASE_URL) and params (model/model_name) and “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise you'll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence is relevant.\n", - " - Prefer chromadb.PersistentClient for persistence; Chroma handles persistence—avoid manually persisting LlamaIndex storage_context for Chroma-managed data.\n", - "- LlamaIndex + Chroma:\n", - " - Ensure the same embedding model/function is used at index and query time; mismatches cause empty/None results. When integrating via LlamaIndex’s ChromaVectorStore, let LlamaIndex handle embeddings (don’t pass embedding_function directly to Chroma unless you control both sides consistently).\n", - " - Use current class names (VectorStoreIndex vs legacy GPTVectorStoreIndex) and compatible versions; be aware of 0.6.x vs newer breaking changes (ServiceContext/Settings, prompts).\n", - " - Don’t double-wrap LLMPredictor; ensure a valid LLM is configured for query.\n", - " - If using Azure OpenAI, the API version must be YYYY-MM-DD (e.g., \"2023-08-30\", not \"30/08/2023\"); include required Azure env vars (OPENAI_API_TYPE=azure, OPENAI_API_BASE, OPENAI_API_KEY/AZURE_OPENAI_API_KEY, OPENAI_API_VERSION) and deployment_name/model mapping.\n", - " - LlamaIndex can be used standalone; LangChain is not required for basic indexing/querying.\n", - "- LlamaIndex + LangChain RAG:\n", - " - For sentence window retrieval include SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and sentence_window_engine. Inject retrieved snippets into the chat prompt via placeholders ({context}, {summary}, {messages}); include memory/token management terms (ConversationBufferMemory, ConversationBufferMemoryHistory) and evaluation metrics (latency, token usage).\n", - "- LangChain tools + Pydantic:\n", - " - If @tool(args_schema=...) triggers ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 (include exact pip command).\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops often due to Python version; Python 3.10 commonly resolves issues. Include env/version troubleshooting and virtualenv/conda hints.\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain can emit commentary causing sqlite3.OperationalError near \"The\": syntax error. Include create_sql_agent, SQLDatabaseToolkit, stricter output parsers, prompts enforcing raw SQL, and use_query_checker behavior; mention model_name=\"gpt-4-0613\" vs legacy models.\n", - "- LangChain callbacks:\n", - " - Callback signatures can change across versions. on_llm_end typically receives a response/LLMResult and **kwargs (not event/context). Handlers often need to be attached to the LLM instance (VertexAI/ChatOpenAI) or passed at call time; chain-level callbacks may not propagate in older versions. Mention langchain-core callbacks and verbose/debug flags if relevant.\n", - "- LangChain.js JSON loading:\n", - " - When a JSON file is an array of objects, configure JSONLoader to treat each array element as one document (e.g., jq-like schema \".[]\", JSON Pointer/pointer options). DirectoryLoader extension mapping is fine; ensure valid JSON syntax (commas between fields). Include tips to merge fields into a single pageContent and preserve metadata.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, AzureOpenAI, AutoModelForSpeechSeq2Seq, WhisperForConditionalGeneration, WhisperProcessor, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), PersistentClient, @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “Azure OpenAI API version YYYY-MM-DD.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; include concrete class/param names, model IDs, env var names, and troubleshooting terms devs search for. Quote error strings verbatim. Avoid fluff.\n", - "\n", - "Remember: Output only the expanded search query as one paragraph (2–5 sentences), no extra commentary.\n", - "2025/08/13 21:29:12 INFO dspy.evaluate.evaluate: Average Metric: 3.333333333333333 / 5 (66.7%)\n", - "2025/08/13 21:29:23 INFO dspy.evaluate.evaluate: Average Metric: 9.75 / 15 (65.0%)\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Full valset score for new program: 0.65\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Full train_val score for new program: 0.65\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Individual valset scores for new program: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 0.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: New valset pareto front scores: [0.5, 1.0, 0.0, 1.0, 1.0, 1.0, 0.3333333333333333, 0.3333333333333333, 1.0, 1.0, 1.0, 0.3333333333333333, 1.0, 0.5, 0.75]\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Full valset pareto front score: 0.7166666666666667\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Updated valset pareto front programs: [{3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 3, 6, 7}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {0, 1, 3, 4, 5, 6, 8}, {8, 3}, {0, 1, 2, 3, 4, 5, 6, 7, 8}, {8, 2, 3, 7}]\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Best valset aggregate score so far: 0.6833333333333333\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Best program as per aggregate score on train_val: 3\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Best program as per aggregate score on valset: 3\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Best score on valset: 0.6833333333333333\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Best score on train_val: 0.6833333333333333\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: Linear pareto front program index: 3\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 35: New program candidate index: 8\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 36: No merge candidates found\n", - "2025/08/13 21:29:23 INFO dspy.teleprompt.gepa.gepa: Iteration 36: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 3.83 / 5 (76.7%): 100%|██████████| 5/5 [00:11<00:00, 2.23s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:29:34 INFO dspy.evaluate.evaluate: Average Metric: 3.8333333333333335 / 5 (76.7%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:30:24 INFO dspy.teleprompt.gepa.gepa: Iteration 36: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters, flags, and quote error messages verbatim. State the user’s goal, where it’s failing, environment details (Python version, package versions), and minimal repro hints.\n", - "2) Add synonyms/related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline), and known breaking changes or package splits.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, parameters/flags, env vars, install commands, version pins, and when to upgrade/downgrade specific packages. Include example-code keywords that reflect the “expected correct approach.”\n", - "4) Include targeted keywords a developer would search: exact class/function names, model IDs, flags, correct API/module names, minimal repro patterns, and precise error strings. Prefer concrete, likely solutions over generic advice.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; you must resend conversation history each call. Include “no server-side session,” “conversation ID not supported,” “how to maintain context,” and “message history management.” Add best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching.\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval, mention llama_index’s SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, and building a sentence_window_engine. Emphasize injecting retrieved snippets/summaries into the LangChain chat prompt (avoid dumping entire documents) using prompt placeholders such as {context}, {summary}, and {messages}. Include memory/token management (ConversationBufferMemory, ConversationBufferMemoryHistory), and evaluation metrics (latency, token usage).\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, and summarization; it does not support ASR. Include correct ASR components (transformers.pipeline('automatic-speech-recognition'), AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor) and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR. Mention misuses like AttributeError: 'WhisperProcessor' object has no attribute 'config'.\n", - " - For fully local inference (e.g., Falcon 40B instruct), load with transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline or ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit quantization, offloading, and “local weights,” “no token,” “example code,” and “integration with RetrievalQA.” Avoid HfHubHTTPError: 401 by not using HuggingFaceHub.\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai', include the package splits and correct imports/installs (langchain, langchain-community, langchain-core, langchain-openai; or older ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL), parameters (model/model_name), “OpenAI-compatible server,” and “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them, use include=['embeddings'] (otherwise 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops are often Python version incompatibility; recommend Python 3.10, virtualenv/conda setup, and verifying correct pip env. Include “Python 3.10,” “compatibility,” and minimal repro import checks.\n", - "- LangChain tools + Pydantic:\n", - " - For ValidationError about BaseModel in @tool(args_schema=...), note Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10. Include explicit commands: pip install \"pydantic==1.10.10\" or add to requirements.txt.\n", - "- Redis retriever (LangChain) hybrid search:\n", - " - Pass filters inside search_kwargs as a dict (not strings), using the 'filter' key with RediSearch syntax (e.g., '@field:(pattern) @flag:{true}'). Highlight known issues where LangChain’s _prepare_range_query() produced invalid RediSearch syntax like ')=>{$yield_distance_as: distance}'; suggest upgrading/downgrading LangChain, constructing a correct manual query, or ensuring the filter precedes the vector condition (filters first, then VECTOR_RANGE or KNN). Include “similarity_distance_threshold,” “search_kwargs dict,” “correct filter placement,” and “Invalid attribute yield_distance_as” error keywords.\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain can add commentary around SQL causing sqlite3.OperationalError near \"The\": syntax error. Include “create_sql_agent,” “SQLDatabaseToolkit,” “output parser,” “prompt template enforcing raw SQL,” and “use_query_checker behavior.”\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit, MessagesPlaceholder, RunnableWithMessageHistory.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10,” “filter precedes vector query.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; include concrete class/param names, model IDs, flags, and troubleshooting terms developers search for. Quote errors exactly. Avoid fluff and keep the query sharply targeted at most likely solutions and correct patterns.\n", - "2025/08/13 21:30:30 INFO dspy.evaluate.evaluate: Average Metric: 3.0333333333333337 / 5 (60.7%)\n", - "2025/08/13 21:30:30 INFO dspy.teleprompt.gepa.gepa: Iteration 36: New subsample score is not better, skipping\n", - "2025/08/13 21:30:30 INFO dspy.teleprompt.gepa.gepa: Iteration 37: Selected program 3 score: 0.6833333333333333\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 4.50 / 5 (90.0%): 100%|██████████| 5/5 [00:08<00:00, 1.64s/it] " - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:30:39 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:31:30 INFO dspy.teleprompt.gepa.gepa: Iteration 37: Proposed new text for expand_query: You expand developer questions into a single, concise search-query paragraph that helps a search engine retrieve high-signal, authoritative fixes and examples.\n", - "\n", - "Output format\n", - "- Return only the expanded search query as one paragraph (no headings or extra commentary), 2–5 sentences total.\n", - "\n", - "How to expand\n", - "1) Extract exact technologies, libraries, versions, models, classes/functions, parameters/flags, environment variables, and verbatim error messages. State the user’s task and where it fails.\n", - "2) Add synonyms and related names (e.g., LlamaIndex/llama_index/gpt_index; ChatOpenAI vs OpenAI wrappers; HuggingFacePipeline vs transformers.pipeline; ChatHuggingFace; OpenAI vs OpenAI-compatible server), plus common misconfigurations and breaking changes.\n", - "3) Anticipate root causes and fixes: version/compatibility issues, correct imports, supported tasks, proper parameters/flags, environment variables, install commands, and version pins. Include minimal repro patterns and “example code,” “correct usage,” and “why/how to fix.”\n", - "4) Include keywords for the “expected correct approach”: exact class/function names, model IDs, flags, API/module names, and working configuration patterns. Prefer specific, likely solutions over generic advice.\n", - "5) Be explicit about known defaults and breaking changes (e.g., defaults that hide data, renamed packages, split modules, deprecated params). Quote error strings exactly.\n", - "\n", - "Domain-specific nuggets to always include when relevant\n", - "- OpenAI chat sessions and context:\n", - " - The Chat Completions API has no server-side session state; conversation ID not supported. You must resend conversation history each call. Include best practices for client-side memory (ConversationSummaryMemory, ConversationBufferMemory, ConversationBufferMemoryHistory, RunnableWithMessageHistory), summarizing/truncating history, token limits, and caching. Add keywords like “no server-side session,” “how to maintain context,” and “message history management.”\n", - "- Hugging Face + LangChain:\n", - " - HuggingFacePipeline supports only text-generation, text2text-generation, summarization; it does not support ASR. Errors like \"AttributeError: 'WhisperProcessor' object has no attribute 'config'\" appear when misused. Include transformers.pipeline('automatic-speech-recognition') with AutoModelForSpeechSeq2Seq/WhisperForConditionalGeneration + WhisperProcessor/feature extractor, and that LangChain’s HuggingFacePipeline isn’t appropriate for ASR.\n", - " - For fully local inference (e.g., Falcon 40B instruct): use transformers (AutoTokenizer, AutoModelForCausalLM), create a text-generation pipeline, then wrap with HuggingFacePipeline/ChatHuggingFace. Include device_map=\"auto\", torch_dtype, trust_remote_code=True (Falcon), bitsandbytes 4-bit/8-bit, local weights/no token, avoiding HfHubHTTPError: 401 by not using HuggingFaceHub, and “integration with RetrievalQA.”\n", - "- LangChain imports and package split:\n", - " - If ModuleNotFoundError: No module named 'langchain_openai' (or similar): note the package split and correct installs/imports (langchain, langchain-community, langchain-core, langchain-openai; older: ChatOpenAI from langchain_community.chat_models). Include “correct import path,” “installation steps,” and “breaking changes.”\n", - "- Using OpenAI-compatible local endpoints with LangChain:\n", - " - Use ChatOpenAI (or OpenAI) with a custom base URL (api_base/base_url/openai_api_base) pointing to localhost; set a dummy openai_api_key if required. Include env vars (OPENAI_API_KEY, OPENAI_BASE_URL) and params (model/model_name). “OpenAI-compatible server,” “no custom client,” “works out of the box.”\n", - "- Chroma vector store:\n", - " - Chroma .get excludes embeddings by default; to retrieve them use include=['embeddings'] (otherwise you'll see 'embeddings': None). Include persist(), persist_directory, and verifying chroma-embeddings.parquet if persistence matters.\n", - "- LangChain tools + Pydantic:\n", - " - With @tool(args_schema=...) ValidationError about BaseModel subclass, it’s often Pydantic v2 vs LangChain compatibility; pin pydantic==1.10.10 (pip install -U \"pydantic==1.10.10\").\n", - "- chromadb installation/runtime:\n", - " - ImportError or install loops are often Python version incompatibility; Python 3.10 commonly resolves chromadb issues. Include environment/version troubleshooting and virtualenv/conda hints.\n", - "- Redis retriever hybrid search:\n", - " - search_kwargs/retriever_search_kwargs must be dicts, not strings; include 'filter' as a proper string inside the dict. Known issue: in some LangChain versions (e.g., 0.0.346) _prepare_range_query may generate invalid Redis syntax like \"Invalid attribute yield_distance_as\"; fix by ensuring the filter comes before the vector range clause and by upgrading/pinning to a fixed version. Include correct query shape, example code, and parameter names (search_type=\"similarity_distance_threshold\", k, distance_threshold, include_metadata).\n", - "- LlamaIndex + LangChain RAG integration:\n", - " - For sentence window retrieval: SentenceWindowNodeParser, VectorStoreIndex, MetadataReplacementPostProcessor, LLMRerank, sentence_window_engine. Inject retrieved snippets into LangChain prompts via placeholders (e.g., {context}, {summary}, {messages}); include memory/token management terms and evaluation metrics (latency, token usage).\n", - "- SQL with LangChain:\n", - " - GPT-4/ChatOpenAI with SQLDatabaseChain may wrap SQL in prose causing sqlite3.OperationalError near \"The\": syntax error. Include create_sql_agent, SQLDatabaseToolkit, output parser, prompt template enforcing raw SQL, and use_query_checker behavior. Mention model_name=\"gpt-4-0613” differences vs text-davinci-003.\n", - "\n", - "General best-practice keywords to weave in\n", - "- Exact model IDs and classes: openai/whisper-large-v2, ChatOpenAI, OpenAI, HuggingFacePipeline, ChatHuggingFace, RetrievalQA, Chroma.from_documents, .get(include=['embeddings']), @tool args_schema, SentenceWindowNodeParser, VectorStoreIndex, LLMRerank, MetadataReplacementPostProcessor, create_sql_agent, SQLDatabaseToolkit.\n", - "- Search intents: “correct usage,” “supported tasks,” “version compatibility,” “install command,” “example code,” “breaking changes,” “root cause,” “why,” “how to fix,” “dummy API key,” “custom base URL,” “persist_directory,” “Python 3.10,” “pydantic==1.10.10.”\n", - "\n", - "Style\n", - "- Be specific and action-oriented; prefer concrete class/param names, model IDs, error strings, and troubleshooting terms developers actually search for.\n", - "- Keep it targeted and precise; 2–5 sentences. No headings, no extra commentary.\n", - "2025/08/13 21:31:35 INFO dspy.evaluate.evaluate: Average Metric: 4.5 / 5 (90.0%)\n", - "2025/08/13 21:31:35 INFO dspy.teleprompt.gepa.gepa: Iteration 37: New subsample score is not better, skipping\n" - ] - } - ], - "source": [ - "import dspy\n", - "\n", - "import logging\n", - "\n", - "# Simple setup for Jupyter\n", - "logging.basicConfig(level=logging.INFO, force=True)\n", - "logging.getLogger('dspy.teleprompt.gepa').setLevel(logging.INFO)\n", - "logging.getLogger('gepa').setLevel(logging.INFO)\n", - "\n", - "# SILENCE the noisy HTTP loggers\n", - "logging.getLogger('httpx').setLevel(logging.WARNING) # Only warnings and errors\n", - "logging.getLogger('openai').setLevel(logging.WARNING)\n", - "logging.getLogger('weaviate').setLevel(logging.WARNING)\n", - "logging.getLogger('httpcore').setLevel(logging.WARNING)\n", - "\n", - "reflection_lm = dspy.LM(\n", - " model=\"gpt-5\",\n", - " temperature=1.0,\n", - " max_tokens=32_000\n", - ")\n", - "\n", - "optimizer = dspy.GEPA(\n", - " metric=metric_for_gepa,\n", - " max_metric_calls=500,\n", - " reflection_lm=reflection_lm,\n", - " reflection_minibatch_size=5,\n", - " use_merge=True,\n", - " num_threads=8\n", - ")\n", - "\n", - "# there are 30 samples in `trainset` to begin with\n", - "trainset=trainset[:15] # these are randomly sampled for Reflective Prompt Mutation\n", - "valset=trainset[15:] # these samples create the pareto frontier\n", - "\n", - "optimized_query_expander = optimizer.compile(\n", - " query_writer,\n", - " trainset=trainset,\n", - " valset=valset\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "5c56e265", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "GEPA run is finished!\n" - ] - } - ], - "source": [ - "print(\"GEPA run is finished!\")" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "c2e1c0ee", - "metadata": {}, - "outputs": [], - "source": [ - "optimized_query_expander.save(\"gepa_optimized_query_expander.json\")" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "835a1505", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 12.78 / 20 (63.9%): 100%|██████████| 20/20 [00:21<00:00, 1.06s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/08/13 21:32:33 INFO dspy.evaluate.evaluate: Average Metric: 12.783333333333333 / 20 (63.9%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/plain": [ - "EvaluationResult(score=63.92, results=)" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "evaluator(optimized_query_expander, **dspy_evaluator_kwargs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": ".venv", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.4" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/optimization_runs/mipro_listwise_reranker.ipynb b/optimization_runs/mipro_listwise_reranker.ipynb deleted file mode 100644 index 050b3fa..0000000 --- a/optimization_runs/mipro_listwise_reranker.ipynb +++ /dev/null @@ -1,442 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 6, - "id": "5373a30a", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:453: UserWarning: Pydantic serializer warnings:\n", - " PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='[[ ## re...: None}, annotations=[]), input_type=Message])\n", - " PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n", - " return self.__pydantic_serializer__.to_python(\n" - ] - }, - { - "data": { - "text/plain": [ - "Prediction(\n", - " final_answer='',\n", - " sources=[Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='ca6fcd08-ef78-4118-b132-757f71cfd1ca'), Source(object_id='a878bad7-1e66-427d-9672-aa96164bb41b'), Source(object_id='542bf62c-7fa7-414d-88b5-2485d11012b1')],\n", - " searches=['How can I use Weaviate with LangChain?'],\n", - " aggregations=None,\n", - " usage={}\n", - ")" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import retrieve_dspy\n", - "\n", - "listwise_reranker = retrieve_dspy.ListwiseReranker(\n", - " collection_name=\"FreshstackLangchain\",\n", - " target_property_name=\"docs_text\",\n", - " diverse_ranker=True,\n", - " retrieved_k=50,\n", - " reranked_k=20\n", - ")\n", - "\n", - "listwise_reranker(\"How can I use Weaviate with LangChain?\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "0f2b8e58", - "metadata": {}, - "outputs": [], - "source": [ - "from retrieve_dspy.metrics import create_metric\n", - "from retrieve_dspy.datasets.in_memory import load_queries_in_memory\n", - "\n", - "trainset, testset = load_queries_in_memory(\n", - " dataset_name=\"freshstack-langchain\",\n", - " train_samples=10,\n", - " test_samples=10\n", - ")\n", - "\n", - "metric = create_metric(\n", - " metric_type=\"coverage\",\n", - " dataset_name=\"freshstack-langchain\"\n", - ")\n", - "\n", - "evaluator = retrieve_dspy.utils.get_evaluator(\n", - " testset=testset,\n", - " metric=metric\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "fae47331", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Metric type: \n", - "Metric contents: .coverage_metric at 0x330629900>\n" - ] - } - ], - "source": [ - "# Add this cell to debug\n", - "print(\"Metric type:\", type(metric))\n", - "print(\"Metric contents:\", metric)" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "1b79425b", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/10 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "49.17" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dspy_evaluator_kwargs = {\n", - " \"num_threads\": 4\n", - "}\n", - "\n", - "evaluator(listwise_reranker, **dspy_evaluator_kwargs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b460fc82", - "metadata": {}, - "outputs": [], - "source": [ - "import dspy\n", - "\n", - "optimizer = dspy.MIPROv2(\n", - " metric=metric,\n", - " auto=\"heavy\",\n", - " verbose=True\n", - ")\n", - "\n", - "optimized_listwise_reranker = optimizer.compile(\n", - " listwise_reranker,\n", - " trainset=trainset,\n", - " requires_permission_to_run=False\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "1e6d0414", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "MIPRO run is finished!\n" - ] - } - ], - "source": [ - "print(\"MIPRO run is finished!\")" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "fd7ec029", - "metadata": {}, - "outputs": [], - "source": [ - "optimized_listwise_reranker.save(\"mipro_optimized_listwise_reranker.json\")" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "5d53855c", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/10 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "60.83" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - } - ], - "source": [ - "evaluator(optimized_listwise_reranker, **dspy_evaluator_kwargs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/optimization_runs/mipro_optimized_listwise_reranker.json b/optimization_runs/mipro_optimized_listwise_reranker.json deleted file mode 100644 index 99d7eab..0000000 --- a/optimization_runs/mipro_optimized_listwise_reranker.json +++ /dev/null @@ -1,353 +0,0 @@ -{ - "reranker": { - "traces": [], - "train": [], - "demos": [ - { - "augmented": true, - "query": "I'm trying to load 6b 128b 8bit llama based model from file (note the model itself is an example, I tested others and got similar problems), the pipeline is completely eating up my 8gb of vram:\n\n\nMy code:\nfrom langchain.llms import HuggingFacePipeline\nfrom langchain import PromptTemplate, LLMChain\n\nimport torch\nfrom transformers import LlamaTokenizer, LlamaForCausalLM, LlamaConfig, pipeline\n\ntorch.cuda.set_device(torch.device(\"cuda:0\"))\n\nPATH = '.\/models\/wizardLM-7B-GPTQ-4bit-128g'\nconfig = LlamaConfig.from_json_file(f'{PATH}\/config.json')\nbase_model = LlamaForCausalLM(config=config).half()\n\ntorch.cuda.empty_cache()\ntokenizer = LlamaTokenizer.from_pretrained(\n pretrained_model_name_or_path=PATH,\n low_cpu_mem_usage=True,\n local_files_only=True\n)\ntorch.cuda.empty_cache()\n\npipe = pipeline(\n \"text-generation\",\n model=base_model,\n tokenizer=tokenizer,\n batch_size=1,\n device=0,\n max_length=100,\n temperature=0.6,\n top_p=0.95,\n repetition_penalty=1.2\n)\n\nHow can I make the pipeline initiation consume less vram?\ngpu: AMD® Radeon rx 6600 (8gb vram, rocm 5.4.2 & torch)\nI want to mention that I managed to load the same model on other frameworks like \"KoboldAI\" or \"text-generation-webui\" so I know it should be possible.\nTo load the model \"wizardLM-7B-GPTQ-4bit-128g\" downloaded from huggingface and run it using with langchain on python.\npip list output:\n Package Version\n------------------------ ----------------\naccelerate 0.19.0\naiofiles 23.1.0\naiohttp 3.8.4\naiosignal 1.3.1\naltair 5.0.0\nanyio 3.6.2\nargilla 1.7.0\nasync-timeout 4.0.2\nattrs 23.1.0\nbackoff 2.2.1\nbeautifulsoup4 4.12.2\nbitsandbytes 0.39.0\ncertifi 2022.12.7\ncffi 1.15.1\nchardet 5.1.0\ncharset-normalizer 2.1.1\nchromadb 0.3.23\nclick 8.1.3\nclickhouse-connect 0.5.24\ncmake 3.25.0\ncolorclass 2.2.2\ncommonmark 0.9.1\ncompressed-rtf 1.0.6\ncontourpy 1.0.7\ncryptography 40.0.2\ncycler 0.11.0\ndataclasses-json 0.5.7\ndatasets 2.12.0\nDeprecated 1.2.13\ndill 0.3.6\nduckdb 0.8.0\neasygui 0.98.3\nebcdic 1.1.1\net-xmlfile 1.1.0\nextract-msg 0.41.1\nfastapi 0.95.2\nffmpy 0.3.0\nfilelock 3.9.0\nfonttools 4.39.4\nfrozenlist 1.3.3\nfsspec 2023.5.0\ngradio 3.28.3\ngradio_client 0.2.5\ngreenlet 2.0.2\nh11 0.14.0\nhnswlib 0.7.0\nhttpcore 0.16.3\nhttptools 0.5.0\nhttpx 0.23.3\nhuggingface-hub 0.14.1\nidna 3.4\nIMAPClient 2.3.1\nJinja2 3.1.2\njoblib 1.2.0\njsonschema 4.17.3\nkiwisolver 1.4.4\nlangchain 0.0.171\nlark-parser 0.12.0\nlinkify-it-py 2.0.2\nlit 15.0.7\nllama-cpp-python 0.1.50\nloralib 0.1.1\nlxml 4.9.2\nlz4 4.3.2\nMarkdown 3.4.3\nmarkdown-it-py 2.2.0\nMarkupSafe 2.1.2\nmarshmallow 3.19.0\nmarshmallow-enum 1.5.1\nmatplotlib 3.7.1\nmdit-py-plugins 0.3.3\nmdurl 0.1.2\nmonotonic 1.6\nmpmath 1.2.1\nmsg-parser 1.2.0\nmsoffcrypto-tool 5.0.1\nmultidict 6.0.4\nmultiprocess 0.70.14\nmypy-extensions 1.0.0\nnetworkx 3.0\nnltk 3.8.1\nnumexpr 2.8.4\nnumpy 1.24.1\nnvidia-cublas-cu11 11.10.3.66\nnvidia-cuda-cupti-cu11 11.7.101\nnvidia-cuda-nvrtc-cu11 11.7.99\nnvidia-cuda-runtime-cu11 11.7.99\nnvidia-cudnn-cu11 8.5.0.96\nnvidia-cufft-cu11 10.9.0.58\nnvidia-curand-cu11 10.2.10.91\nnvidia-cusolver-cu11 11.4.0.1\nnvidia-cusparse-cu11 11.7.4.91\nnvidia-nccl-cu11 2.14.3\nnvidia-nvtx-cu11 11.7.91\nolefile 0.46\noletools 0.60.1\nopenai 0.27.7\nopenapi-schema-pydantic 1.2.4\nopenpyxl 3.1.2\norjson 3.8.12\npackaging 23.1\npandas 1.5.3\npandoc 2.3\npcodedmp 1.2.6\npdfminer.six 20221105\nPillow 9.3.0\npip 23.0.1\nplumbum 1.8.1\nply 3.11\nposthog 3.0.1\npsutil 5.9.5\npyarrow 12.0.0\npycparser 2.21\npydantic 1.10.7\npydub 0.25.1\nPygments 2.15.1\npygpt4all 1.1.0\npygptj 2.0.3\npyllamacpp 2.3.0\npypandoc 1.11\npyparsing 2.4.7\npyrsistent 0.19.3\npython-dateutil 2.8.2\npython-docx 0.8.11\npython-dotenv 1.0.0\npython-magic 0.4.27\npython-multipart 0.0.6\npython-pptx 0.6.21\npytorch-triton-rocm 2.0.1\npytz 2023.3\npytz-deprecation-shim 0.1.0.post0\nPyYAML 6.0\nred-black-tree-mod 1.20\nregex 2023.5.5\nrequests 2.28.1\nresponses 0.18.0\nrfc3986 1.5.0\nrich 13.0.1\nRTFDE 0.0.2\nscikit-learn 1.2.2\nscipy 1.10.1\nsemantic-version 2.10.0\nsentence-transformers 2.2.2\nsentencepiece 0.1.99\nsetuptools 66.0.0\nsix 1.16.0\nsniffio 1.3.0\nsoupsieve 2.4.1\nSQLAlchemy 2.0.15\nstarlette 0.27.0\nsympy 1.11.1\ntabulate 0.9.0\ntenacity 8.2.2\nthreadpoolctl 3.1.0\ntokenizers 0.13.3\ntoolz 0.12.0\ntorch 2.0.1+rocm5.4.2\ntorchaudio 2.0.2+rocm5.4.2\ntorchvision 0.15.2+rocm5.4.2\ntqdm 4.65.0\ntransformers 4.30.0.dev0\ntriton 2.0.0\ntyper 0.9.0\ntyping_extensions 4.4.0\ntyping-inspect 0.8.0\ntzdata 2023.3\ntzlocal 4.2\nuc-micro-py 1.0.2\nunstructured 0.6.6\nurllib3 1.26.13\nuvicorn 0.22.0\nuvloop 0.17.0\nwatchfiles 0.19.0\nwebsockets 11.0.3\nwheel 0.38.4\nwikipedia 1.4.0\nwrapt 1.14.1\nXlsxWriter 3.1.0\nxxhash 3.2.0\nyarl 1.9.2\nzstandard 0.21.0\n\n", - "search_results": [ - { - "id": 1, - "initial_rank": 1, - "content": ". Lower Precision\n\nMemory requirements of LLMs can be best understood by seeing the LLM as a set of weight matrices and vectors and the text inputs as a sequence of vectors. In the following, the definition *weights* will be used to signify all model weight matrices and vectors.\n\nAt the time of writing this guide, LLMs consist of at least a couple billion parameters. Each parameter thereby is made of a decimal number, e.g. `4.5689` which is usually stored in either [float32](https:\/\/en.wikipedia.org\/wiki\/Single-precision_floating-point_format), [bfloat16](https:\/\/en.wikipedia.org\/wiki\/Bfloat16_floating-point_format), or [float16](https:\/\/en.wikipedia.org\/wiki\/Half-precision_floating-point_format) format. This allows us to easily compute the memory requirement to load the LLM into memory:\n\n> *Loading the weights of a model having X billion parameters requires roughly 4 * X GB of VRAM in float32 precision*\n\nNowadays, models are however rarely trained in full float32 precision, but usually in bfloat16 precision or less frequently in float16 precision. Therefore the rule of thumb becomes:\n\n> *Loading the weights of a model having X billion parameters requires roughly 2 * X GB of VRAM in bfloat16\/float16 precision*\n\nFor shorter text inputs (less than 1024 tokens), the memory requirement for inference is very much dominated by the memory requirement to load the weights. Therefore, for now, let's assume that the memory requirement for inference is equal to the memory requirement to load the model into the GPU VRAM.\n\nTo give some examples of how much VRAM it roughly takes to load a model in bfloat16:\n\n- **GPT3** requires 2 \\* 175 GB = **350 GB** VRAM\n- [**Bloom**](https:\/\/huggingface.co\/bigscience\/bloom) requires 2 \\* 176 GB = **352 GB** VRAM\n- [**Llama-2-70b**](https:\/\/huggingface.co\/meta-llama\/Llama-2-70b-hf) requires 2 \\* 70 GB = **140 GB** VRAM\n- [**Falcon-40b**](https:\/\/huggingface.co\/tiiuae\/falcon-40b) requires 2 \\* 40 GB = **80 GB** VRAM\n- [**MPT-30b**](https:\/\/huggingface.co\/mosaicml\/mpt-30b) requires 2 \\* 30 GB = **60 GB** VRAM\n- [**bigcode\/starcoder**](https:\/\/huggingface.co\/bigcode\/starcoder) requires 2 \\* 15.5 = **31 GB** VRAM\n\nAs of writing this document, the largest GPU chip on the market is the A100 & H100 offering 80GB of VRAM. Most of the models listed before require more than 80GB just to be loaded and therefore necessarily require [tensor parallelism](https:\/\/huggingface.co\/docs\/transformers\/perf_train_gpu_many#tensor-parallelism) and\/or [pipeline parallelism](https:\/\/huggingface.co\/docs\/transformers\/perf_train_gpu_many#naive-model-parallelism-vertical-and-pipeline-parallelism).\n\n🤗 Transformers does not support tensor parallelism out of the box as it requires the model architecture to be written in a specific way. If you're interested in writing models in a tensor-parallelism-friendly way, feel free to have a look at [the text-generation-inference library](https:\/\/github.com\/huggingface\/text-generation-inference\/tree\/main\/server\/text_generation_server\/models\/custom_modeling).\n\nNaive pipeline parallelism is supported out of the box. For this, simply load the model with `device=\"auto\"` which will automatically place the different layers on the available GPUs as explained [here](https:\/\/huggingface.co\/docs\/accelerate\/v0.22.0\/en\/concept_guides\/big_model_inference).\nNote, however that while very effective, this naive pipeline parallelism does not tackle the issues of GPU idling. For this more advanced pipeline parallelism is required as explained [here](https:\/\/huggingface.co\/docs\/transformers\/en\/perf_train_gpu_many#naive-model-parallelism-vertical-and-pipeline-parallelism).\n\nIf you have access to an 8 x 80GB A100 node, you could load BLOOM as follows\n\n```bash\n!pip install transformers accelerate bitsandbytes optimum\n```\n```python\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\"bigscience\/bloom\", device_map=\"auto\", pad_token_id=0)\n```\n\nBy using `device_map=\"auto\"` the attention layers would be equally distributed over all available GPUs.\n\nIn this guide, we will use [bigcode\/octocoder](https:\/\/huggingface.co\/bigcode\/octocoder) as it can be run on a single 40 GB A100 GPU device chip. Note that all memory and speed optimizations that we will apply going forward, are equally applicable to models that require model or tensor parallelism.\n\nSince the model is loaded in bfloat16 precision, using our rule of thumb above, we would expect the memory requirement to run inference with `bigcode\/octocoder` to be around 31 GB VRAM. Let's give it a try.\n\nWe first load the model and tokenizer and then pass both to Transformers' [pipeline](https:\/\/huggingface.co\/docs\/transformers\/main_classes\/pipelines) object.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\nimport torch\n\nmodel = AutoModelForCausalLM.from_pretrained(\"bigcode\/octocoder\", torch_dtype=torch.bfloat16, device_map=\"auto\", pad_token_id=0)\ntokenizer = AutoTokenizer.from_pretrained(\"bigcode\/octocoder\")\n\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n```\n\n```python\nprompt = \"Question: Please write a function in Python that transforms bytes to Giga bytes.\\n\\nAnswer:\"\n\nresult = pipe(prompt, max_new_tokens=60)[0][\"generated_text\"][len(prompt):]\nresult\n```\n\n**Output**:\n```\nHere is a Python function that transforms bytes to Giga bytes:\\n\\n```python\\ndef bytes_to_giga_bytes(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n```\\n\\nThis function takes a single\n```\n\nNice, we can now directly use the result to convert bytes into Gigabytes.\n\n```python\ndef bytes_to_giga_bytes(bytes):\n return bytes \/ 1024 \/ 1024 \/ 1024\n```\n\nLet's call [`torch.cuda.max_memory_allocated`](https:\/\/pytorch.org\/docs\/stable\/generated\/torch.cuda.max_memory_allocated.html) to measure the peak GPU memory allocation.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**Output**:\n```bash\n29.0260648727417\n```\n\nClose enough to our back-of-the-envelope computation! We can see the number is not exactly correct as going from bytes to kilobytes requires a multiplication of 1024 instead of 1000. Therefore the back-of-the-envelope formula can also be understood as an \"at most X GB\" computation.\nNote that if we had tried to run the model in full float32 precision, a whopping 64 GB of VRAM would have been required.\n\n> Almost all models are trained in bfloat16 nowadays, there is no reason to run the model in full float32 precision if [your GPU supports bfloat16](https:\/\/discuss.pytorch.org\/t\/bfloat16-native-support\/117155\/5). Float32 won't give better inference results than the precision that was used to train the model.\n\nIf you are unsure in which format the model weights are stored on the Hub, you can always look into the checkpoint's config under `\"torch_dtype\"`, *e.g.* [here](https:\/\/huggingface.co\/meta-llama\/Llama-2-7b-hf\/blob\/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9\/config.json#L21). It is recommended to set the model to the same precision type as written in the config when loading with `from_pretrained(..., torch_dtype=...)` except when the original type is float32 in which case one can use both `float16` or `bfloat16` for inference.\n\n\nLet's define a `flush(...)` function to free all allocated memory so that we can accurately measure the peak allocated GPU memory.\n\n```python\ndel pipe\ndel model\n\nimport gc\nimport torch\n\ndef flush():\n gc.collect()\n torch.cuda.empty_cache()\n torch.cuda.reset_peak_memory_stats()\n```\n\nLet's call it now for the next experiment.\n\n```python\nflush()\n```\nIn the recent version of the accelerate library, you can also use a utility method called `release_memory()`\n\n```python\nfrom accelerate.utils import release_memory\n# ...\n\nrelease_memory(model)\n```\n\nNow wh" - }, - { - "id": 2, - "initial_rank": 2, - "content": "t if your GPU does not have 32 GB of VRAM? It has been found that model weights can be quantized to 8-bit or 4-bits without a significant loss in performance (see [Dettmers et al.](https:\/\/arxiv.org\/abs\/2208.07339)).\nModel can be quantized to even 3 or 2 bits with an acceptable loss in performance as shown in the recent [GPTQ paper](https:\/\/arxiv.org\/abs\/2210.17323) 🤯.\n\nWithout going into too many details, quantization schemes aim at reducing the precision of weights while trying to keep the model's inference results as accurate as possible (*a.k.a* as close as possible to bfloat16).\nNote that quantization works especially well for text generation since all we care about is choosing the *set of most likely next tokens* and don't really care about the exact values of the next token *logit* distribution.\nAll that matters is that the next token *logit* distribution stays roughly the same so that an `argmax` or `topk` operation gives the same results.\n\nThere are various quantization techniques, which we won't discuss in detail here, but in general, all quantization techniques work as follows:\n\n- 1. Quantize all weights to the target precision\n- 2. Load the quantized weights, and pass the input sequence of vectors in bfloat16 precision\n- 3. Dynamically dequantize weights to bfloat16 to perform the computation with their input vectors in bfloat16 precision\n\nIn a nutshell, this means that *inputs-weight matrix* multiplications, with \\\\( X \\\\) being the *inputs*, \\\\( W \\\\) being a weight matrix and \\\\( Y \\\\) being the output:\n\n$$ Y = X * W $$\n\nare changed to\n\n$$ Y = X * \\text{dequantize}(W) $$\n\nfor every matrix multiplication. Dequantization and re-quantization is performed sequentially for all weight matrices as the inputs run through the network graph.\n\nTherefore, inference time is often **not** reduced when using quantized weights, but rather increases.\nEnough theory, let's give it a try! To quantize the weights with Transformers, you need to make sure that\nthe [`bitsandbytes`](https:\/\/github.com\/bitsandbytes-foundation\/bitsandbytes) library is installed.\n\n```bash\n!pip install bitsandbytes\n```\n\nWe can then load models in 8-bit quantization by simply adding a `load_in_8bit=True` flag to `from_pretrained`.\n\n```python\nmodel = AutoModelForCausalLM.from_pretrained(\"bigcode\/octocoder\", load_in_8bit=True, pad_token_id=0)\n```\n\nNow, let's run our example again and measure the memory usage.\n\n```python\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n\nresult = pipe(prompt, max_new_tokens=60)[0][\"generated_text\"][len(prompt):]\nresult\n```\n\n**Output**:\n```\nHere is a Python function that transforms bytes to Giga bytes:\\n\\n```python\\ndef bytes_to_giga_bytes(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n```\\n\\nThis function takes a single\n```\n\nNice, we're getting the same result as before, so no loss in accuracy! Let's look at how much memory was used this time.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**Output**:\n```\n15.219234466552734\n```\n\nSignificantly less! We're down to just a bit over 15 GBs and could therefore run this model on consumer GPUs like the 4090.\nWe're seeing a very nice gain in memory efficiency and more or less no degradation to the model's output. However, we can also notice a slight slow-down during inference.\n\n\nWe delete the models and flush the memory again.\n```python\ndel model\ndel pipe\n```\n\n```python\nflush()\n```\n\nLet's see what peak GPU memory consumption 4-bit quantization gives. Quantizing the model to 4-bit can be done with the same API as before - this time by passing `load_in_4bit=True` instead of `load_in_8bit=True`.\n\n```python\nmodel = AutoModelForCausalLM.from_pretrained(\"bigcode\/octocoder\", load_in_4bit=True, low_cpu_mem_usage=True, pad_token_id=0)\n\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n\nresult = pipe(prompt, max_new_tokens=60)[0][\"generated_text\"][len(prompt):]\nresult\n```\n\n**Output**:\n```\nHere is a Python function that transforms bytes to Giga bytes:\\n\\n```\\ndef bytes_to_gigabytes(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n```\\n\\nThis function takes a single argument\n```\n\nWe're almost seeing the same output text as before - just the `python` is missing just before the code snippet. Let's see how much memory was required.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**Output**:\n```\n9.543574333190918\n```\n\nJust 9.5GB! That's really not a lot for a >15 billion parameter model.\n\nWhile we see very little degradation in accuracy for our model here, 4-bit quantization can in practice often lead to different results compared to 8-bit quantization or full `bfloat16` inference. It is up to the user to try it out.\n\nAlso note that inference here was again a bit slower compared to 8-bit quantization which is due to the more aggressive quantization method used for 4-bit quantization leading to \\\\( \\text{quantize} \\\\) and \\\\( \\text{dequantize} \\\\) taking longer during inference.\n\n```python\ndel model\ndel pipe\n```\n```python\nflush()\n```\n\nOverall, we saw that running OctoCoder in 8-bit precision reduced the required GPU VRAM from 32G GPU VRAM to only 15GB and running the model in 4-bit precision further reduces the required GPU VRAM to just a bit over 9GB.\n\n4-bit quantization allows the model to be run on GPUs such as RTX3090, V100, and T4 which are quite accessible for most people.\n\nFor more information on quantization and to see how one can quantize models to require even less GPU VRAM memory than 4-bit, we recommend looking into the [`AutoGPTQ`](https:\/\/huggingface.co\/docs\/transformers\/main\/en\/main_classes\/quantization#autogptq-integration%60) implementation.\n\n> As a conclusion, it is important to remember that model quantization trades improved memory efficiency against accuracy and in some cases inference time.\n\nIf GPU memory is not a constraint for your use case, there is often no need to look into quantization. However many GPUs simply can't run LLMs without quantization methods and in this case, 4-bit and 8-bit quantization schemes are extremely useful tools.\n\nFor more in-detail usage information, we strongly recommend taking a look at the [Transformers Quantization Docs](https:\/\/huggingface.co\/docs\/transformers\/main_classes\/quantization#general-usage).\nNext, let's look into how we can improve computational and memory efficiency by using better algorithms and an improved model architecture.\n\n## 2. Fla" - }, - { - "id": 3, - "initial_rank": 3, - "content": "\n\n# Model training anatomy\n\nTo understand performance optimization techniques that one can apply to improve efficiency of model training \nspeed and memory utilization, it's helpful to get familiar with how GPU is utilized during training, and how compute \nintensity varies depending on an operation performed.\n\nLet's start by exploring a motivating example of GPU utilization and the training run of a model. For the demonstration, \nwe'll need to install a few libraries: \n\n```bash\npip install transformers datasets accelerate nvidia-ml-py3\n```\n\nThe `nvidia-ml-py3` library allows us to monitor the memory usage of the models from within Python. You might be familiar \nwith the `nvidia-smi` command in the terminal - this library allows to access the same information in Python directly.\n\nThen, we create some dummy data: random token IDs between 100 and 30000 and binary labels for a classifier. \nIn total, we get 512 sequences each with length 512 and store them in a [`~datasets.Dataset`] with PyTorch format.\n\n\n```py\n>>> import numpy as np\n>>> from datasets import Dataset\n\n\n>>> seq_len, dataset_size = 512, 512\n>>> dummy_data = {\n... \"input_ids\": np.random.randint(100, 30000, (dataset_size, seq_len)),\n... \"labels\": np.random.randint(0, 2, (dataset_size)),\n... }\n>>> ds = Dataset.from_dict(dummy_data)\n>>> ds.set_format(\"pt\")\n```\n\nTo print summary statistics for the GPU utilization and the training run with the [`Trainer`] we define two helper functions:\n\n```py\n>>> from pynvml import *\n\n\n>>> def print_gpu_utilization():\n... nvmlInit()\n... handle = nvmlDeviceGetHandleByIndex(0)\n... info = nvmlDeviceGetMemoryInfo(handle)\n... print(f\"GPU memory occupied: {info.used\/\/1024**2} MB.\")\n\n\n>>> def print_summary(result):\n... print(f\"Time: {result.metrics['train_runtime']:.2f}\")\n... print(f\"Samples\/second: {result.metrics['train_samples_per_second']:.2f}\")\n... print_gpu_utilization()\n```\n\nLet's verify that we start with a free GPU memory:\n\n```py\n>>> print_gpu_utilization()\nGPU memory occupied: 0 MB.\n```\n\nThat looks good: the GPU memory is not occupied as we would expect before we load any models. If that's not the case on \nyour machine make sure to stop all processes that are using GPU memory. However, not all free GPU memory can be used by \nthe user. When a model is loaded to the GPU the kernels are also loaded, which can take up 1-2GB of memory. To see how \nmuch it is we load a tiny tensor into the GPU which triggers the kernels to be loaded as well.\n\n```py\n>>> import torch\n\n\n>>> torch.ones((1, 1)).to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 1343 MB.\n```\n\nWe see that the kernels alone take up 1.3GB of GPU memory. Now let's see how much space the model uses.\n\n## Load Model\n\nFirst, we load the `google-bert\/bert-large-uncased` model. We load the model weights directly to the GPU so that we can check \nhow much space just the weights use.\n\n\n```py\n>>> from transformers import AutoModelForSequenceClassification\n\n\n>>> model = AutoModelForSequenceClassification.from_pretrained(\"google-bert\/bert-large-uncased\").to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 2631 MB.\n```\n\nWe can see that the model weights alone take up 1.3 GB of GPU memory. The exact number depends on the specific \nGPU you are using. Note that on newer GPUs a model can sometimes take up more space since the weights are loaded in an \noptimized fashion that speeds up the usage of the model. Now we can also quickly check if we get the same result \nas with `nvidia-smi` CLI:\n\n\n```bash\nnvidia-smi\n```\n\n```bash\nTue Jan 11 08:58:05 2022\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |\n|-------------------------------+----------------------+----------------------+\n| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n| Fan Temp Perf Pwr:Usage\/Cap| Memory-Usage | GPU-Util Compute M. |\n| | | MIG M. |\n|===============================+======================+======================|\n| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |\n| N\/A 37C P0 39W \/ 300W | 2631MiB \/ 16160MiB | 0% Default |\n| | | N\/A |\n+-------------------------------+----------------------+----------------------+\n\n+-----------------------------------------------------------------------------+\n| Processes: |\n| GPU GI CI PID Type Process name GPU Memory |\n| ID ID Usage |\n|=============================================================================|\n| 0 N\/A N\/A 3721 C ...nvs\/codeparrot\/bin\/python 2629MiB |\n+-----------------------------------------------------------------------------+\n```\n\nWe get the same number as before and you can also see that we are using a V100 GPU with 16GB of memory. So now we can \nstart training the model and see how the GPU memory consumption changes. First, we set up a few standard training \narguments:\n\n```py\ndefault_args = {\n \"output_dir\": \"tmp\",\n \"eval_strategy\": \"steps\",\n \"num_train_epochs\": 1,\n \"log_level\": \"error\",\n \"report_to\": \"none\",\n}\n```\n\n\n\n If you plan to run multiple experiments, in order to properly clear the memory between experiments, restart the Python \n kernel between experiments.\n\n<\/Tip>\n\n## Memory utilization at vanilla training\n\nLet's use the [`Trainer`] and train the model without using any GPU performance optimization techniques and a batch size of 4:\n\n```py\n>>> from transformers import TrainingArguments, Trainer, logging\n\n>>> logging.set_verbosity_error()\n\n\n>>> training_args = TrainingArguments(per_device_train_batch_size=4, **default_args)\n>>> trainer = Trainer(model=model, args=training_args, train_dataset=ds)\n>>> result = trainer.train()\n>>> print_summary(result)\n```\n\n```\nTime: 57.82\nSamples\/second: 8.86\nGPU memory occupied: 14949 MB.\n```\n\nWe see that already a relatively small batch size almost fills up our GPU's entire memory. However, a larger batch size \ncan often result in faster model convergence or better end performance. So ideally we want to tune the batch size to our\nmodel's needs and not to the GPU limitations. What's interesting is that we use much more memory than the size of the model. \nTo understand a bit better why this is the case let's have a look at a model's operations and memory needs.\n\n## Anatomy of Model's Operations\n\nTransformers architecture includes 3 main groups of operations grouped below by compute-intensity.\n\n1. **Tensor Contractions**\n\n Linear layers and components of Multi-Head Attention all do batched **matrix-matrix multiplications**. These operations are the most compute-intensive part of training a transformer.\n\n2. **Statistical Normalizations**\n\n Softmax and layer normalization are less compute-intensive than tensor contractions, and involve one or more **reduction operations**, the result of which is then applied via a map.\n\n3. **Element-wise Operators**\n\n These are the remaining operators: **biases, dropout, activations, and residual connections**. These are the least compute-intensive operations.\n\nThis knowledge can be helpful to know when analyzing performance bottlenecks.\n\nThis summary is derived from [Data Movement Is All You Need: A Case Study on Optimizing Transformers 2020](https:\/\/arxiv.org\/abs\/2007.00072)\n\n\n" - }, - { - "id": 4, - "initial_rank": 4, - "content": "데 필요한 메모리 요구 사항과 같다고 가정합시다.\n\n모델을 bfloat16으로 로드하는 데 대략 얼마나 많은 VRAM이 필요한지 몇 가지 예를 들어보겠습니다:\n\n- **GPT3**는 2 \\* 175 GB = **350 GB** VRAM이 필요합니다.\n- [**Bloom**](https:\/\/huggingface.co\/bigscience\/bloom)은 2 \\* 176 GB = **352 GB** VRAM이 필요합니다.\n- [**Llama-2-70b**](https:\/\/huggingface.co\/meta-llama\/Llama-2-70b-hf)는 2 \\* 70 GB = **140 GB** VRAM이 필요합니다.\n- [**Falcon-40b**](https:\/\/huggingface.co\/tiiuae\/falcon-40b)는 2 \\* 40 GB = **80 GB** VRAM이 필요합니다.\n- [**MPT-30b**](https:\/\/huggingface.co\/mosaicml\/mpt-30b)는 2 * 30 GB = **60 GB** VRAM이 필요합니다.\n- [**bigcode\/starcoder**](https:\/\/huggingface.co\/bigcode\/starcoder)는 2 * 15.5 GB = **31 GB** VRAM이 필요합니다.\n\n이 문서를 작성하는 시점에서, 현재 시장에서 가장 큰 GPU 칩은 80GB의 VRAM을 제공하는 A100과 H100입니다. 앞서 언급된 대부분의 모델들은 로드하기 위해서는 최소 80GB 이상의 용량을 필요로 하며, 따라서 [텐서 병렬 처리](https:\/\/huggingface.co\/docs\/transformers\/perf_train_gpu_many#tensor-parallelism) 및\/또는 [파이프라인 병렬 처리](https:\/\/huggingface.co\/docs\/transformers\/perf_train_gpu_many#naive-model-parallelism-vertical-and-pipeline-parallelism)를 반드시 필요로 합니다.\n\n🤗 Transformers는 텐서 병렬 처리를 바로 지원하지 않습니다. 이는 모델 아키텍처가 특정 방식으로 작성되어야 하기 때문입니다. 텐서 병렬 처리를 지원하는 방식으로 모델을 작성하는 데 관심이 있다면 [the text-generation-inference library](https:\/\/github.com\/huggingface\/text-generation-inference\/tree\/main\/server\/text_generation_server\/models\/custom_modeling)를 참조해 보시기 바랍니다.\n\n기본적인 파이프라인 병렬 처리는 바로 지원됩니다. 이를 위해 단순히 모델을 `device=\"auto\"`로 로드하면 [여기](https:\/\/huggingface.co\/docs\/accelerate\/v0.22.0\/en\/concept_guides\/big_model_inference)에 설명된 대로 사용 가능한 GPU에 모델의 서로 다른 레이어를 자동으로 배치합니다. 이것은 매우 효과적이긴 하지만 이러한 기본 파이프라인 병렬 처리는 GPU 유휴 문제를 해결하지 못한다는 점을 유의해야 합니다. 더 발전된 파이프라인 병렬 처리가 필요하며, 이에 대한 설명은 [여기](https:\/\/huggingface.co\/docs\/transformers\/en\/perf_train_gpu_many#naive-model-parallelism-vertical-and-pipeline-parallelism)에서 확인할 수 있습니다.\n\n80GB A100 GPU 8개를 가진 노드에 접근할 수 있다면, BLOOM을 다음과 같이 로드할 수 있습니다.\n\n```bash\n!pip install transformers accelerate bitsandbytes optimum\n```\n```python\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\"bigscience\/bloom\", device_map=\"auto\", pad_token_id=0)\n```\n\n`device_map=\"auto\"`를 사용하면 모든 사용 가능한 GPU에 어텐션 레이어가 고르게 분산됩니다.\n\n이 가이드에서는 [bigcode\/octocoder](https:\/\/huggingface.co\/bigcode\/octocoder)를 사용할 것입니다. 이 모델은 단일 40GB A100 GPU 장치에서 실행할 수 있습니다. 앞으로 적용할 모든 메모리 및 속도 최적화는 모델 또는 텐서 병렬 처리를 필요로 하는 다른 모델에도 동일하게 적용될 수 있습니다.\n\n모델이 bfloat16 정밀도로 로드되기 때문에, 위의 경험적으로 알아낸 법칙을 사용하면 `bigcode\/octocoder`를 사용하여 추론을 실행하기 위한 메모리 요구 사항이 약 31GB VRAM일 것으로 예상됩니다. 한 번 시도해 보겠습니다.\n\n먼저 모델과 토크나이저를 로드한 다음, 둘 다 Transformers의 [파이프라인](https:\/\/huggingface.co\/docs\/transformers\/main_classes\/pipelines) 객체에 전달합니다.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\nimport torch\n\nmodel = AutoModelForCausalLM.from_pretrained(\"bigcode\/octocoder\", torch_dtype=torch.bfloat16, device_map=\"auto\", pad_token_id=0)\ntokenizer = AutoTokenizer.from_pretrained(\"bigcode\/octocoder\")\n\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n```\n\n```python\nprompt = \"Question: Please write a function in Python that transforms bytes to Giga bytes.\\n\\nAnswer:\"\n\nresult = pipe(prompt, max_new_tokens=60)[0][\"generated_text\"][len(prompt):]\nresult\n```\n\n**출력**:\n```\nHere is a Python function that transforms bytes to Giga bytes:\\n\\n```python\\ndef bytes_to_giga_bytes(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n```\\n\\nThis function takes a single\n```\n\n좋습니다. 이제 결과를 직접 사용하여 바이트를 기가바이트로 변환할 수 있습니다.\n\n```python\ndef bytes_to_giga_bytes(bytes):\n return bytes \/ 1024 \/ 1024 \/ 1024\n```\n\n[`torch.cuda.max_memory_allocated`](https:\/\/pytorch.org\/docs\/stable\/generated\/torch.cuda.max_memory_allocated.html)를 호출하여 최대 GPU 메모리 할당을 측정해 보겠습니다.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**출력**:\n```bash\n29.0260648727417\n```\n\n대략적으로 계산한 결과와 거의 일치합니다! 바이트에서 킬로바이트로 변환할 때 1000이 아닌 1024로 곱해야 하므로 숫자가 정확하지 않은 것을 알 수 있습니다. 따라서 대략적으로 계산할 때 공식은 \"최대 X GB\"으로 이해할 수 있습니다. 만약 우리가 모델을 float32 정밀도로 실행하려고 했다면 더 큰 크기인 64GB의 VRAM이 필요했을 것입니다.\n\n> 거의 모든 모델이 요즘 bfloat16으로 학습되므로, [GPU가 bfloat16을 지원](https:\/\/discuss.pytorch.org\/t\/bfloat16-native-support\/117155\/5)한다면 모델을 float32 정밀도로 실행할 이유가 없습니다. float32로 돌리는 모델은 학습할 때 사용했던 정밀도보다 더 나은 추론 결과를 제공" - }, - { - "id": 5, - "initial_rank": 5, - "content": "\n\n# Anatomía del entrenamiento de los modelos\n\nPara entender las técnicas de optimización del rendimiento que se pueden aplicar para mejorar la eficiencia en la velocidad del entrenamiento de los modelos y la utilización de la memoria, es útil familiarizarse con cómo se utiliza la GPU durante el entrenamiento y cómo varía la intensidad de cálculo según la operación realizada.\n\nEmpecemos explorando un ejemplo enfocado en la utilización de la GPU y la ejecución del entrenamiento de un modelo. Para la demostración, necesitaremos instalar algunas bibliotecas:\n\n```bash\npip install transformers datasets accelerate nvidia-ml-py3\n```\n\nLa biblioteca `nvidia-ml-py3` nos permite monitorear la utilización de memoria de los modelos desde Python. Es posible que estés familiarizado con el comando `nvidia-smi` en la terminal, esta biblioteca nos permite acceder a la misma información en Python directamente.\n\nLuego, creamos algunos datos ficticios: IDs de tokens aleatorios entre 100 y 30000 y etiquetas binarias para un clasificador. En total, obtenemos 512 secuencias cada una con longitud 512 y las almacenamos en un [`~datasets.Dataset`] con formato PyTorch.\n\n\n```py\n>>> import numpy as np\n>>> from datasets import Dataset\n\n\n>>> seq_len, dataset_size = 512, 512\n>>> dummy_data = {\n... \"input_ids\": np.random.randint(100, 30000, (dataset_size, seq_len)),\n... \"labels\": np.random.randint(0, 1, (dataset_size)),\n... }\n>>> ds = Dataset.from_dict(dummy_data)\n>>> ds.set_format(\"pt\")\n```\n\nPara imprimir estadísticas resumidas para la utilización de la GPU y la ejecución del entrenamiento con [`Trainer`](https:\/\/huggingface.co\/docs\/transformers\/en\/main_classes\/trainer#transformers.Trainer), definimos dos funciones auxiliares:\n\n```py\n>>> from pynvml import *\n\n\n>>> def print_gpu_utilization():\n... nvmlInit()\n... handle = nvmlDeviceGetHandleByIndex(0)\n... info = nvmlDeviceGetMemoryInfo(handle)\n... print(f\"GPU memory occupied: {info.used\/\/1024**2} MB.\")\n\n\n>>> def print_summary(result):\n... print(f\"Time: {result.metrics['train_runtime']:.2f}\")\n... print(f\"Samples\/second: {result.metrics['train_samples_per_second']:.2f}\")\n... print_gpu_utilization()\n```\n\nComencemos comprobando que la memoria GPU este libre:\n\n```py\n>>> print_gpu_utilization()\nGPU memory occupied: 0 MB.\n```\n\nParece estar bien: la memoria de la GPU no está ocupada como esperaríamos antes de cargar cualquier modelo. Si no es el caso en tu máquina, asegúrate de detener todos los procesos que estén utilizando la memoria de la GPU. Sin embargo, no toda la memoria libre de la GPU puede ser utilizada por el usuario. Cuando se carga un modelo en la GPU, también se cargan los kernels, lo que puede ocupar 1-2GB de memoria. Para ver cuánta memoria será ocupada por defecto, cargemos un tensor diminuto en la GPU, lo que también desencadena la carga de los kernels.\n\n```py\n>>> import torch\n\n\n>>> torch.ones((1, 1)).to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 1343 MB.\n```\n\nVemos que los kernels solos ocupan 1,3GB de memoria de la GPU. Ahora, veamos cuánto espacio ocupa el modelo.\n\n## Cargar el Modelo\n\nPrimero, cargamos el modelo `google-bert\/bert-large-uncased`. Los pesos del modelo son cargados directamente en la GPU para que podamos verificar cuánto espacio ocupan solo los pesos.\n\n```py\n>>> from transformers import AutoModelForSequenceClassification\n\n\n>>> model = AutoModelForSequenceClassification.from_pretrained(\"google-bert\/bert-large-uncased\").to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 2631 MB.\n```\n\nPodemos ver que los pesos del modelo solos ocupan 1,3 GB de memoria de la GPU. El número exacto depende de la GPU específica que estés utilizando. Ten en cuenta que en GPUs más modernas, un modelo puede ocupar más espacio ya que los pesos se cargan de manera optimizada lo cual acelera el uso del modelo. Ahora también podemos verificar rápidamente si obtenemos el mismo resultado que con la CLI de `nvidia-smi`:\n\n```bash\nnvidia-smi\n```\n\n```bash\nTue Jan 11 08:58:05 2022\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |\n|-------------------------------+----------------------+----------------------+\n| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n| Fan Temp Perf Pwr:Usage\/Cap| Memory-Usage | GPU-Util Compute M. |\n| | | MIG M. |\n|===============================+======================+======================|\n| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |\n| N\/A 37C P0 39W \/ 300W | 2631MiB \/ 16160MiB | 0% Default |\n| | | N\/A |\n+-------------------------------+----------------------+----------------------+\n\n+-----------------------------------------------------------------------------+\n| Processes: |\n| GPU GI CI PID Type Process name GPU Memory |\n| ID ID Usage |\n|=============================================================================|\n| 0 N\/A N\/A 3721 C ...nvs\/codeparrot\/bin\/python 2629MiB |\n+-----------------------------------------------------------------------------+\n```\n\nObtenemos el mismo número que antes y también puedes ver que estamos utilizando una GPU V100 con 16GB de memoria. Ahora podemos empezar a entrenar el modelo y ver cómo cambia el consumo de memoria de la GPU. Primero, configuramos algunos argumentos de entrenamiento estándar:\n\n```py\ndefault_args = {\n \"output_dir\": \"tmp\",\n \"eval_strategy\": \"steps\",\n \"num_train_epochs\": 1,\n \"log_level\": \"error\",\n \"report_to\": \"none\",\n}\n```\n\n\n\nSi planeas ejecutar varias pruebas, reinicie el kernel de Python entre cada prueba para borrar correctamente la memoria.\n\n<\/Tip>\n\n## Utilización de la memoria en el entrenamiento\n\nVamos a utilizar el [`Trainer`](https:\/\/huggingface.co\/docs\/transformers\/en\/main_classes\/trainer#transformers.Trainer) y entrenar el modelo sin utilizar ninguna técnica de optimización del rendimiento de la GPU y un tamaño de lote de 4:\n\n```py\n>>> from transformers import TrainingArguments, Trainer, logging\n\n>>> logging.set_verbosity_error()\n\n\n>>> training_args = TrainingArguments(per_device_train_batch_size=4, **default_args)\n>>> trainer = Trainer(model=model, args=training_args, train_dataset=ds)\n>>> result = trainer.train()\n>>> print_summary(result)\n```\n\n```\nTime: 57.82\nSamples\/second: 8.86\nGPU memory occupied: 14949 MB.\n```\n\nVemos que incluso un tamaño de lote relativamente pequeño casi llena toda la memoria de nuestra GPU. Sin embargo, un tamaño de lote más grande a menudo puede resultar en una convergencia del modelo más rápida o un mejor rendimiento final. Así que idealmente queremos ajustar el tamaño del lote a las necesidades del modelo y no a las limitaciones de la GPU. Lo interesante es que utilizamos mucha más memoria que el tamaño del modelo. \nPara entender un poco mejor por qué es el caso, echemos un vistazo a las operaciones y necesidades de memoria de un modelo.\n\n## Anatomía de las Operaciones del Modelo\n\nLa arquitectur" - }, - { - "id": 6, - "initial_rank": 6, - "content": "\n\n# 모델 학습 해부하기 [[model-training-anatomy]]\n\n모델 훈련 속도와 메모리 활용의 효율성을 향상시키기 위해 적용할 수 있는 성능 최적화 기술을 이해하려면 GPU가 훈련 중에 어떻게 활용되는지, 그리고 수행되는 연산에 따라 연산 강도가 어떻게 변하는지에 익숙해져야 합니다.\n\n먼저 GPU 활용과 모델 훈련 실행에 대한 예시를 살펴보겠습니다. 데모를 위해 몇몇 라이브러리를 설치해야 합니다:\n\n```bash\npip install transformers datasets accelerate nvidia-ml-py3\n```\n\n`nvidia-ml-py3` 라이브러리는 Python 내부에서 모델의 메모리 사용량을 모니터링할 수 있게 해줍니다. 터미널의 `nvidia-smi` 명령어에 익숙할 수 있는데, 이 라이브러리는 Python에서 직접 동일한 정보에 접근할 수 있게 해줍니다.\n\n그 다음, 100과 30000 사이의 무작위 토큰 ID와 분류기를 위한 이진 레이블인 더미 데이터를 생성합니다.\n길이가 각각 512인 총 512개의 시퀀스를 가져와 PyTorch 형식의 [`~datasets.Dataset`]에 저장합니다.\n\n\n```py\n>>> import numpy as np\n>>> from datasets import Dataset\n\n\n>>> seq_len, dataset_size = 512, 512\n>>> dummy_data = {\n... \"input_ids\": np.random.randint(100, 30000, (dataset_size, seq_len)),\n... \"labels\": np.random.randint(0, 1, (dataset_size)),\n... }\n>>> ds = Dataset.from_dict(dummy_data)\n>>> ds.set_format(\"pt\")\n```\n\nGPU 활용 및 [`Trainer`]로 실행한 훈련 과정에 대한 요약 통계를 출력하기 위해 두 개의 도우미 함수를 정의하겠습니다:\n\n```py\n>>> from pynvml import *\n\n\n>>> def print_gpu_utilization():\n... nvmlInit()\n... handle = nvmlDeviceGetHandleByIndex(0)\n... info = nvmlDeviceGetMemoryInfo(handle)\n... print(f\"GPU memory occupied: {info.used\/\/1024**2} MB.\")\n\n\n>>> def print_summary(result):\n... print(f\"Time: {result.metrics['train_runtime']:.2f}\")\n... print(f\"Samples\/second: {result.metrics['train_samples_per_second']:.2f}\")\n... print_gpu_utilization()\n```\n\n시작할 때 GPU 메모리가 비어 있는지 확인해 봅시다:\n\n```py\n>>> print_gpu_utilization()\nGPU memory occupied: 0 MB.\n```\n\n좋습니다. 모델을 로드하기 전에는 예상대로 GPU 메모리가 점유되지 않았습니다. 그렇지 않다면 사용자의 기기에서 GPU 메모리를 사용하는 모든 프로세스를 중단해야 합니다. 그러나 사용자는 모든 여유 GPU 메모리를 사용할 수는 없습니다. 모델이 GPU에 로드될 때 커널도 로드되므로 1-2GB의 메모리를 차지할 수 있습니다. 얼마나 되는지 확인하기 위해 GPU에 작은 텐서를 로드하여 커널이 로드되도록 트리거합니다.\n\n```py\n>>> import torch\n\n\n>>> torch.ones((1, 1)).to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 1343 MB.\n```\n\n커널만으로도 GPU 메모리의 1.3GB를 차지합니다. 이제 모델이 얼마나 많은 공간을 사용하는지 확인해 보겠습니다.\n\n## 모델 로드 [[load-model]]\n\n우선, `google-bert\/bert-large-uncased` 모델을 로드합니다. 모델의 가중치를 직접 GPU에 로드해서 가중치만이 얼마나 많은 공간을 차지하는지 확인할 수 있습니다.\n\n\n```py\n>>> from transformers import AutoModelForSequenceClassification\n\n\n>>> model = AutoModelForSequenceClassification.from_pretrained(\"google-bert\/bert-large-uncased\").to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 2631 MB.\n```\n\n모델의 가중치만으로도 GPU 메모리를 1.3 GB 차지하는 것을 볼 수 있습니다. 정확한 숫자는 사용하는 GPU에 따라 다릅니다. 최신 GPU에서는 모델 사용 속도를 높이는 최적화된 방식으로 가중치가 로드되므로, 모델이 더 많은 공간을 차지할 수 있습니다. 이제 `nvidia-smi` CLI와 동일한 결과를 얻는지 빠르게 확인할 수 있습니다:\n\n\n```bash\nnvidia-smi\n```\n\n```bash\nTue Jan 11 08:58:05 2022\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |\n|-------------------------------+----------------------+----------------------+\n| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n| Fan Temp Perf Pwr:Usage\/Cap| Memory-Usage | GPU-Util Compute M. |\n| | " - }, - { - "id": 7, - "initial_rank": 7, - "content": "지 않습니다.\n\n모델 가중치가 어떤 정밀도 형식으로 Hub에 저장되어 있는지 확실하지 않은 경우, HuggingFace Hub에서 해당 체크포인트 config의 `\"torch_dtype\"`을 확인하면 됩니다, *예*를 들어 [여기](https:\/\/huggingface.co\/meta-llama\/Llama-2-7b-hf\/blob\/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9\/config.json#L21)를 확인하세요. 모델을 `from_pretrained(..., torch_dtype=...)`로 로드할 때는 config에 명시된 정밀도 유형과 동일한 정밀도로 설정하는 것이 권장됩니다. 단, 원래 유형이 float32인 경우 추론을 위해 `float16` 또는 `bfloat16`을 둘 다 사용할 수 있습니다.\n\n이제 `flush(...)` 함수를 정의하여 모든 메모리를 해제하고, GPU 메모리의 최대 할당량을 정확하게 측정하도록 합시다.\n\n\n```python\ndel pipe\ndel model\n\nimport gc\nimport torch\n\ndef flush():\n gc.collect()\n torch.cuda.empty_cache()\n torch.cuda.reset_peak_memory_stats()\n```\n\n다음 실험을 위해 바로 호출해 봅시다.\n\n```python\nflush()\n```\n최근 버전의 accelerate 라이브러리에서는 `release_memory()`라는 유틸리티 메소드도 사용할 수 있습니다.\n\n```python\nfrom accelerate.utils import release_memory\n# ...\n\nrelease_memory(model)\n```\n\n만약 GPU에 32GB의 VRAM이 없다면 어떻게 될까요? 모델 가중치를 성능에 큰 손실 없이 8비트 또는 4비트로 양자화할 수 있다는 것이 밝혀졌습니다(참고: [Dettmers et al.](https:\/\/arxiv.org\/abs\/2208.07339)). 최근의 [GPTQ 논문](https:\/\/arxiv.org\/abs\/2210.17323) 에서는 모델을 3비트 또는 2비트로 양자화해도 성능 손실이 허용 가능한 수준임을 보여주었습니다🤯.\n\n너무 자세한 내용은 다루지 않고 설명하자면, 양자화는 가중치의 정밀도를 줄이면서 모델의 추론 결과를 가능한 한 정확하게(즉, bfloat16과 최대한 가깝게) 유지하려고 합니다. 양자화는 특히 텍스트 생성에 잘 작동하는데, 이는 우리가 *가장 가능성 있는 다음 토큰 집합*을 선택하는 것에 초점을 두고 있기 때문이며, 다음 토큰의 *logit* 분포값을 정확하게 예측할 필요는 없기 때문입니다. 핵심은 다음 토큰 *logit* 분포가 대략적으로 동일하게 유지되어 `argmax` 또는 `topk` 연산이 동일한 결과를 제공하는 것입니다.\n\n다양한 양자화 기법이 존재하지만, 자세히 다루지는 않을 것입니다. 일반적으로 모든 양자화 기법은 다음과 같이 작동합니다:\n\n- 1. 모든 가중치를 목표 정밀도로 양자화합니다.\n- 2. 양자화된 가중치를 로드하고, bfloat16 정밀도의 입력 벡터 시퀀스를 모델에 전달합니다.\n- 3. 가중치를 동적으로 bfloat16으로 반대로 양자화(dequantize)하여 입력 벡터와 함께 bfloat16 정밀도로 계산을 수행합니다.\n\n간단히 말해서, *입력-가중치 행렬* 곱셈은, \\\\( X \\\\)가 *입력*, \\\\( W \\\\)가 가중치 행렬, \\\\( Y \\\\)가 출력인 경우 다음과 같습니다:\n\n$$ Y = X * W $$\n\n위 공식이 다음과 같이 변경됩니다\n\n$$ Y = X * \\text{dequantize}(W) $$\n\n모든 행렬 곱셈에 대해 위와 같이 수행됩니다. 입력이 네트워크 그래프를 통과하면서 모든 가중치 행렬에 대해 역양자화(dequantization)와 재양자화(re-quantization)가 순차적으로 수행됩니다.\n\n따라서, 양자화된 가중치를 사용할 때 추론 시간이 감소하지 **않고** 오히려 증가하는 경우가 많습니다. 이제 이론은 충분하니 실제로 시도해 봅시다! Transformers를 사용하여 가중치를 양자화하려면 [`bitsandbytes`](https:\/\/github.com\/TimDettmers\/bitsandbytes) 라이브러리가 설치되어 있는지 확인해야 합니다.\n\n```bash\n!pip install bitsandbytes\n```\n\n그런 다음 `from_pretrained`에 `load_in_8bit=True` 플래그를 추가하여 8비트 양자화로 모델을 로드할 수 있습니다.\n\n```python\nmodel = AutoModelForCausalLM.from_pretrained(\"bigcode\/octocoder\", load_in_8bit=True, pad_token_id=0)\n```\n\n이제 예제를 다시 실행하고 메모리 사용량을 측정해 봅시다.\n\n```python\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n\nresult = pipe(prompt, max_new_tokens=60)[0][\"generated_text\"][len(prompt):]\nresult\n```\n\n**출력**:\n```\nHere is a Python function that transforms bytes to Giga bytes:\\n\\n```python\\ndef bytes_to_giga_bytes(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n```\\n\\nThis function takes a single\n```\n\n좋습니다. 정확도 손실 없이 이전과 동일한 결과를 얻고 있습니다! 이번에는 사용된 메모리 양을 확인해 봅시다.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**출력**:\n```\n15.219234466552734\n```\n\n훨씬 적네요! 메모리 사용량이 15GB를 조금 넘는 수준으로 줄어들어 4090과 같은 소비자용 GPU에서도 이 모델을 실행할 수 있습니다. 메모리 효율성에서 매우 큰 향상을 보이고 있으며 모델 출력의 품질 저하도 거의 없습니다. 그러나 추론 중에 약간의 속도 저하가 발생한 것을 확인할 수 있습니다.\n\n\n모델을 삭제하고 메모리를 다시 초기화합니다.\n\n```python\ndel model\ndel pipe\n```\n\n```python\nflush()\n```\n\n이제 4비트 양자화가 제공하는 최대 GPU 메모리 사용량을 확인해 봅시다. 4비트로 모델을 양자화하려면 이전과 동일한 API를 사용하되 이번에는 `load_in_8bit=True` 대신 `load_in_4bit=True`를 전달하면 됩니다.\n\n```python\nm" - }, - { - "id": 8, - "initial_rank": 8, - "content": "es\n\nbitsandbytes is a quantization library that includes support for 4-bit and 8-bit quantization. Quantization reduces your model size compared to its native full precision version, making it easier to fit large models onto GPUs with limited memory.\n\nMake sure you have bitsandbytes and 🤗 Accelerate installed:\n\n```bash\n# these versions support 8-bit and 4-bit\npip install bitsandbytes>=0.39.0 accelerate>=0.20.0\n\n# install Transformers\npip install transformers\n```\n\n### 4-bit\n\nTo load a model in 4-bit for inference, use the `load_in_4bit` parameter. The `device_map` parameter is optional, but we recommend setting it to `\"auto\"` to allow 🤗 Accelerate to automatically and efficiently allocate the model given the available resources in the environment.\n\n```py\nfrom transformers import AutoModelForCausalLM\n\nmodel_name = \"bigscience\/bloom-2b5\"\nmodel_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\", load_in_4bit=True)\n```\n\nTo load a model in 4-bit for inference with multiple GPUs, you can control how much GPU RAM you want to allocate to each GPU. For example, to distribute 600MB of memory to the first GPU and 1GB of memory to the second GPU:\n\n```py\nmax_memory_mapping = {0: \"600MB\", 1: \"1GB\"}\nmodel_name = \"bigscience\/bloom-3b\"\nmodel_4bit = AutoModelForCausalLM.from_pretrained(\n model_name, device_map=\"auto\", load_in_4bit=True, max_memory=max_memory_mapping\n)\n```\n\n### 8-bit\n\n\n\nIf you're curious and interested in learning more about the concepts underlying 8-bit quantization, read the [Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes](https:\/\/huggingface.co\/blog\/hf-bitsandbytes-integration) blog post.\n\n<\/Tip>\n\nTo load a model in 8-bit for inference, use the `load_in_8bit` parameter. The `device_map` parameter is optional, but we recommend setting it to `\"auto\"` to allow 🤗 Accelerate to automatically and efficiently allocate the model given the available resources in the environment:\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nmodel_name = \"bigscience\/bloom-2b5\"\nmodel_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))\n```\n\nIf you're loading a model in 8-bit for text generation, you should use the [`~transformers.GenerationMixin.generate`] method instead of the [`Pipeline`] function which is not optimized for 8-bit models and will be slower. Some sampling strategies, like nucleus sampling, are also not supported by the [`Pipeline`] for 8-bit models. You should also place all inputs on the same device as the model:\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\nmodel_name = \"bigscience\/bloom-2b5\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))\n\nprompt = \"Hello, my llama is cute\"\ninputs = tokenizer(prompt, return_tensors=\"pt\").to(\"cuda\")\ngenerated_ids = model.generate(**inputs)\noutputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)\n```\n\nTo load a model in 4-bit for inference with multiple GPUs, you can control how much GPU RAM you want to allocate to each GPU. For example, to distribute 1GB of memory to the first GPU and 2GB of memory to the second GPU:\n\n```py\nmax_memory_mapping = {0: \"1GB\", 1: \"2GB\"}\nmodel_name = \"bigscience\/bloom-3b\"\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n model_name, device_map=\"auto\", load_in_8bit=True, max_memory=max_memory_mapping\n)\n```\n\n\n\nFeel free to try running a 11 billion parameter [T5 model](https:\/\/colab.research.google.com\/drive\/1YORPWx4okIHXnjW7MSAidXN29mPVNT7F?usp=sharing) or the 3 billion parameter [BLOOM model](https:\/\/colab.research.google.com\/drive\/1qOjXfQIAULfKvZqwCen8-MoWKGdSatZ4?usp=sharing) for inference on Google Colab's free tier GPUs!\n\n<\/Tip>\n\n## 🤗 Optimum\n\n\n\nLearn more details about using ORT with 🤗 Optimum in the [Accelerated inference on NVIDIA GPUs](https:\/\/huggingface.co\/docs\/optimum\/onnxruntime\/usage_guides\/gpu#accelerated-inference-on-nvidia-gpus) and [Accelerated inference on AMD GPUs](https:\/\/huggingface.co\/docs\/optimum\/onnxruntime\/usage_guides\/amdgpu#accelerated-inference-on-amd-gpus) guides. This section only provides a brief and simple example.\n\n<\/Tip>\n\nONNX Runtime (ORT) is a model accelerator that supports accelerated inference on Nvidia GPUs, and AMD GPUs that use [ROCm](https:\/\/www.amd.com\/en\/products\/software\/rocm.html) stack. ORT uses optimization techniques like fusing common operations into a single node and constant folding to reduce the number of computations performed and speedup inference. ORT also places the most computationally intensive operations on the GPU and the rest on the CPU to intelligently distribute the workload between the two devices.\n\nORT is supported by 🤗 Optimum which can be used in 🤗 Transformers. You'll need to use an [`~optimum.onnxruntime.ORTModel`] for the task you're solving, and specify the `provider` parameter which can be set to either [`CUDAExecutionProvider`](https:\/\/huggingface.co\/docs\/optimum\/onnxruntime\/usage_guides\/gpu#cudaexecutionprovider), [`ROCMExecutionProvider`](https:\/\/huggingface.co\/docs\/optimum\/onnxruntime\/usage_guides\/amdgpu) or [`TensorrtExecutionProvider`](https:\/\/huggingface.co\/docs\/optimum\/onnxruntime\/usage_guides\/gpu#tensorrtexecutionprovider). If you want to load a model that was not yet exported to ONNX, you can set `export=True` to convert your model on-the-fly to the ONNX format:\n\n```py\nfrom optimum.onnxruntime import ORTModelForSequenceClassification\n\nort_model = ORTModelForSequenceClassification.from_pretrained(\n \"distilbert\/distilbert-base-uncased-finetuned-sst-2-english\",\n export=True,\n provider=\"CUDAExecutionProvider\",\n)\n```\n\nNow you're free to use the model for inference:\n\n```py\nfrom optimum.pipelines import pipeline\nfrom transformers import AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"distilbert\/distilbert-base-uncased-finetuned-sst-2-english\")\n\npipeline = pipeline(task=\"text-classification\", model=ort_model, tokenizer=tokenizer, device=\"cuda:0\")\nresult = pipeline(\"Both the music and visual were astounding, not to mention the actors performance.\")\n```\n\n## Combine optimizations\n\nIt is often possible to combine several of the optimization techniques described above to get the best inference performance possible for your model. For example, you can load a model in 4-bit, and then enable BetterTransformer with FlashAttention:\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\n# load model in 4-bit\nquantization_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_compute_dtype=torch.float16\n)\n\ntokenizer = AutoTokenizer.from_pretrained(\"facebook\/opt-350m\")\nmodel = AutoModelForCausalLM.from_pretrained(\"facebook\/opt-350m\", quantization_config=quantization_config)\n\n# enable BetterTransformer\nmodel = model.to_bettertransformer()\n\ninput_text = \"Hello my dog is cute and\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(\"cuda\")\n\n# enable FlashAttention\nwith torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):\n outputs = model.generate(**inputs)\n\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n" - }, - { - "id": 9, - "initial_rank": 9, - "content": "이제 더 저수준(Low-Level) 접근 방식을 통해 대화에 포함된 각 단계를 살펴보겠습니다. \n코드 샘플로 시작한 후 이를 분석해 보겠습니다:\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nimport torch\n\n# 입력값을 사전에 준비해 놓습니다\nchat = [\n {\"role\": \"system\", \"content\": \"You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986.\"},\n {\"role\": \"user\", \"content\": \"Hey, can you tell me any fun things to do in New York?\"}\n]\n\n# 1: 모델과 토크나이저를 불러옵니다\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama\/Meta-Llama-3-8B-Instruct\", device_map=\"auto\", torch_dtype=torch.bfloat16)\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama\/Meta-Llama-3-8B-Instruct\")\n\n# 2: 채팅 템플릿에 적용합니다\nformatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)\nprint(\"Formatted chat:\\n\", formatted_chat)\n\n# 3: 채팅을 토큰화합니다 (바로 이전 과정에서 tokenized=True로 설정하면 한꺼번에 처리할 수 있습니다)\ninputs = tokenizer(formatted_chat, return_tensors=\"pt\", add_special_tokens=False)\n# 토큰화된 입력값을 모델이 올라와 있는 기기(CPU\/GPU)로 옮깁니다.\ninputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}\nprint(\"Tokenized inputs:\\n\", inputs)\n\n# 4: 모델로부터 응답을 생성합니다\noutputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)\nprint(\"Generated tokens:\\n\", outputs)\n\n# 5: 모델이 출력한 토큰을 다시 문자열로 디코딩합니다\ndecoded_output = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):], skip_special_tokens=True)\nprint(\"Decoded output:\\n\", decoded_output)\n```\n여기에는 각 부분이 자체 문서가 될 수 있을 만큼 많은 내용이 담겨 있습니다! \n너무 자세히 설명하기보다는 넓은 개념을 다루고, 세부 사항은 링크된 문서에서 다루겠습니다. \n주요 단계는 다음과 같습니다:\n\n1. [모델](https:\/\/huggingface.co\/learn\/nlp-course\/en\/chapter2\/3)과 [토크나이저](https:\/\/huggingface.co\/learn\/nlp-course\/en\/chapter2\/4?fw=pt)를 Hugging Face Hub에서 로드합니다.\n2. 대화는 토크나이저의 [채팅 템플릿](https:\/\/huggingface.co\/docs\/transformers\/main\/en\/chat_templating)을 사용하여 양식을 구성합니다.\n3. 구성된 채팅은 토크나이저를 사용하여 [토큰화](https:\/\/huggingface.co\/learn\/nlp-course\/en\/chapter2\/4)됩니다.\n4. 모델에서 응답을 [생성](https:\/\/huggingface.co\/docs\/transformers\/en\/llm_tutorial)합니다.\n5. 모델이 출력한 토큰을 다시 문자열로 디코딩합니다.\n\n## 성능, 메모리와 하드웨어[[performance-memory-and-hardware]]\n이제 대부분의 머신 러닝 작업이 GPU에서 실행된다는 것을 아실 겁니다. \n다소 느리기는 해도 CPU에서 채팅 모델이나 언어 모델로부터 텍스트를 생성하는 것도 가능합니다. \n하지만 모델을 GPU 메모리에 올려놓을 수만 있다면, GPU를 사용하는 것이 일반적으로 더 선호되는 방식입니다.\n\n### 메모리 고려사항[[memory-considerations]]\n\n기본적으로, [`TextGenerationPipeline`]이나 [`AutoModelForCausalLM`]과 같은 \nHugging Face 클래스는 모델을 `float32` 정밀도(Precision)로 로드합니다. \n이는 파라미터당 4바이트(32비트)를 필요로 하므로, \n80억 개의 파라미터를 가진 \"8B\" 모델은 약 32GB의 메모리를 필요로 한다는 것을 의미합니다. \n하지만 이는 낭비일 수 있습니다! \n대부분의 최신 언어 모델은 파라미터당 2바이트를 사용하는 \"bfloat16\" 정밀도(Precision)로 학습됩니다. \n하드웨어가 이를 지원하는 경우(Nvidia 30xx\/Axxx 이상), \n`torch_dtype` 파라미터로 위와 같이 `bfloat16` 정밀도(Precision)로 모델을 로드할 수 있습니다.\n\n또한, 16비트보다 더 낮은 정밀도(Precision)로 모델을 압축하는 \n\"양자화(quantization)\" 방법을 사용할 수도 있습니다. \n이 방법은 모델의 가중치를 손실 압축하여 각 파라미터를 8비트, \n4비트 또는 그 이하로 줄일 수 있습니다. \n특히 4비트에서 모델의 출력이 부정적인 영향을 받을 수 있지만, \n더 크고 강력한 채팅 모델을 메모리에 올리기 위해 이 같은 트레이드오프를 감수할 가치가 있습니다. \n이제 `bitsandbytes`를 사용하여 이를 실제로 확인해 보겠습니다:\n\n```python\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True) # You can also try load_in_4bit\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama\/Meta-Llama-3-8B-Instruct\", device_map=\"auto\", quantization_config=quantization_config)\n```\n\n위의 작업은 `pipeline` API에도 적용 가능합니다:\n\n```python\nfrom transformers import pipeline, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True) # You can al" - }, - { - "id": 10, - "initial_rank": 10, - "content": "\n\n# Model training anatomy\n\nモデルトレーニングの効率を向上させるために適用できるパフォーマンス最適化テクニックを理解するには、トレーニング中にGPUがどのように利用されるか、および実行される操作に応じて計算強度がどのように変化するかを理解することが役立ちます。\n\nまずは、GPUの利用例とモデルのトレーニング実行に関する示唆に富む例を探求することから始めましょう。デモンストレーションのために、いくつかのライブラリをインストールする必要があります:\n\n```bash\npip install transformers datasets accelerate nvidia-ml-py3\n```\n\n`nvidia-ml-py3` ライブラリは、Python内からモデルのメモリ使用状況をモニターすることを可能にします。おそらく、ターミナルでの `nvidia-smi` コマンドについてはお聞きかもしれませんが、このライブラリを使用すると、Pythonから同じ情報にアクセスできます。\n\nそれから、いくつかのダミーデータを作成します。100から30000の間のランダムなトークンIDと、分類器のためのバイナリラベルです。合計で、512のシーケンスがあり、それぞれの長さは512で、PyTorchフォーマットの [`~datasets.Dataset`] に格納されます。\n\n\n```py\n>>> import numpy as np\n>>> from datasets import Dataset\n\n\n>>> seq_len, dataset_size = 512, 512\n>>> dummy_data = {\n... \"input_ids\": np.random.randint(100, 30000, (dataset_size, seq_len)),\n... \"labels\": np.random.randint(0, 1, (dataset_size)),\n... }\n>>> ds = Dataset.from_dict(dummy_data)\n>>> ds.set_format(\"pt\")\n```\n\n\n[`Trainer`]を使用してGPU利用率とトレーニング実行の要約統計情報を表示するために、2つのヘルパー関数を定義します。\n\n\n```py\n>>> from pynvml import *\n\n\n>>> def print_gpu_utilization():\n... nvmlInit()\n... handle = nvmlDeviceGetHandleByIndex(0)\n... info = nvmlDeviceGetMemoryInfo(handle)\n... print(f\"GPU memory occupied: {info.used\/\/1024**2} MB.\")\n\n\n>>> def print_summary(result):\n... print(f\"Time: {result.metrics['train_runtime']:.2f}\")\n... print(f\"Samples\/second: {result.metrics['train_samples_per_second']:.2f}\")\n... print_gpu_utilization()\n```\n\n以下は、無料のGPUメモリから開始していることを確認しましょう:\n\n\n```py\n>>> print_gpu_utilization()\nGPU memory occupied: 0 MB.\n```\n\nGPUメモリがモデルを読み込む前のように占有されていないように見えます。これがお使いのマシンでの状況でない場合は、GPUメモリを使用しているすべてのプロセスを停止してください。ただし、すべての空きGPUメモリをユーザーが使用できるわけではありません。モデルがGPUに読み込まれると、カーネルも読み込まれ、1〜2GBのメモリを使用することがあります。それがどれくらいかを確認するために、GPUに小さなテンソルを読み込むと、カーネルも読み込まれます。\n\n\n```py\n>>> import torch\n\n\n>>> torch.ones((1, 1)).to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 1343 MB.\n```\n\nカーネルだけで1.3GBのGPUメモリを使用していることがわかります。次に、モデルがどれだけのスペースを使用しているかを見てみましょう。\n\n## Load Model\n\nまず、`google-bert\/bert-large-uncased` モデルを読み込みます。モデルの重みを直接GPUに読み込むことで、重みだけがどれだけのスペースを使用しているかを確認できます。\n\n\n```py\n>>> from transformers import AutoModelForSequenceClassification\n\n\n>>> model = AutoModelForSequenceClassification.from_pretrained(\"google-bert\/bert-large-uncased\").to(\"cuda\")\n>>> print_gpu_utilization()\nGPU memory occupied: 2631 MB.\n```\n\nモデルの重みだけで、GPUメモリを1.3 GB使用していることがわかります。正確な数値は、使用している具体的なGPUに依存します。新しいGPUでは、モデルの重みが最適化された方法で読み込まれるため、モデルの使用を高速化することがあるため、モデルがより多くのスペースを占有することがあります。さて、`nvidia-smi` CLIと同じ結果が得られるかを簡単に確認することもできます。\n\n\n```bash\nnvidia-smi\n```\n\n```bash\nTue Jan 11 08:58:05 2022\n+-----------------------------------------------------------------------------+\n| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |\n|-------------------------------+----------------------+----------------------+\n| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n| Fan Temp Perf Pwr:Usage\/Cap| Memory-Usage | GPU-Util Compute M. |\n| | | MIG M. |\n|===============================+======================+======================|\n| 0 Tesla V100-SXM2... On | 00000000:00:04.0 Off | 0 |\n| N\/A 37C P0 39W \/ 300W | 2631MiB \/ 16160MiB | 0% Default |\n| | | N\/A |\n+-------------------------------+----------------------+---------" - }, - { - "id": 11, - "initial_rank": 11, - "content": "erformance, memory and hardware\n\nYou probably know by now that most machine learning tasks are run on GPUs. However, it is entirely possible\nto generate text from a chat model or language model on a CPU, albeit somewhat more slowly. If you can fit\nthe model in GPU memory, though, this will usually be the preferable option.\n\n### Memory considerations\n\nBy default, Hugging Face classes like [`TextGenerationPipeline`] or [`AutoModelForCausalLM`] will load the model in \n`float32` precision. This means that it will need 4 bytes (32 bits) per parameter, so an \"8B\" model with 8 billion\nparameters will need ~32GB of memory. However, this can be wasteful! Most modern language models are trained in \n\"bfloat16\" precision, which uses only 2 bytes per parameter. If your hardware supports it (Nvidia 30xx\/Axxx\nor newer), you can load the model in `bfloat16` precision, using the `torch_dtype` argument as we did above.\n\nIt is possible to go even lower than 16-bits using \"quantization\", a method to lossily compress model weights. This\nallows each parameter to be squeezed down to 8 bits, 4 bits or even less. Note that, especially at 4 bits,\nthe model's outputs may be negatively affected, but often this is a tradeoff worth making to fit a larger and more\ncapable chat model in memory. Let's see this in action with `bitsandbytes`:\n\n```python\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True) # You can also try load_in_4bit\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama\/Meta-Llama-3-8B-Instruct\", device_map=\"auto\", quantization_config=quantization_config)\n```\n\nOr we can do the same thing using the `pipeline` API:\n\n```python\nfrom transformers import pipeline, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True) # You can also try load_in_4bit\npipe = pipeline(\"text-generation\", \"meta-llama\/Meta-Llama-3-8B-Instruct\", device_map=\"auto\", model_kwargs={\"quantization_config\": quantization_config})\n```\n\nThere are several other options for quantizing models besides `bitsandbytes` - please see the [Quantization guide](.\/quantization)\nfor more information.\n\n### Performance considerations\n\n\n\nFor a more extensive guide on language model performance and optimization, check out [LLM Inference Optimization](.\/llm_optims) .\n\n<\/Tip>\n\n\nAs a general rule, larger chat models will be slower in addition to requiring more memory. It's possible to be\nmore concrete about this, though: Generating text from a chat model is unusual in that it is bottlenecked by\n**memory bandwidth** rather than compute power, because every active parameter must be read from memory for each\ntoken that the model generates. This means that number of tokens per second you can generate from a chat\nmodel is generally proportional to the total bandwidth of the memory it resides in, divided by the size of the model.\n\nIn our quickstart example above, our model was ~16GB in size when loaded in `bfloat16` precision. \nThis means that 16GB must be read from memory for every token generated by the model. Total memory bandwidth can\nvary from 20-100GB\/sec for consumer CPUs to 200-900GB\/sec for consumer GPUs, specialized CPUs like\nIntel Xeon, AMD Threadripper\/Epyc or high-end Apple silicon, and finally up to 2-3TB\/sec for data center GPUs like\nthe Nvidia A100 or H100. This should give you a good idea of the generation speed you can expect from these different\nhardware types.\n\nTherefore, if you want to improve the speed of text generation, the easiest solution is to either reduce the\nsize of the model in memory (usually by quantization), or get hardware with higher memory bandwidth. For advanced users, \nseveral other techniques exist to get around this bandwidth bottleneck. The most common are variants on \n[assisted generation](https:\/\/huggingface.co\/blog\/assisted-generation), also known as \"speculative\nsampling\". These techniques try to guess multiple future tokens at once, often using a smaller \"draft model\", and then\nconfirm these generations with the chat model. If the guesses are validated by the chat model, more than one token can\nbe generated per forward pass, which greatly alleviates the bandwidth bottleneck and improves generation speed. \n\nFinally, we should also note the impact of \"Mixture of Experts\" (MoE) models here. Several popular chat models,\nsuch as Mixtral, Qwen-MoE and DBRX, are MoE models. In these models, not every parameter is active for every token generated.\nAs a result, MoE models generally have much lower memory bandwidth requirements, even though their total size\ncan be quite large. They can therefore be several times faster than a normal \"dense\" model of the same size. However,\ntechniques like assisted generation are generally ineffective for these models because more parameters will become\nactive with each new speculated token, which will negate the bandwidth and speed benefits that the MoE architecture\nprovides.\n\n" - }, - { - "id": 12, - "initial_rank": 12, - "content": "\n\n# GPTQ\n\n\n\nTry GPTQ quantization with PEFT in this [notebook](https:\/\/colab.research.google.com\/drive\/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) and learn more about it's details in this [blog post](https:\/\/huggingface.co\/blog\/gptq-integration)!\n\n<\/Tip>\n\nThe [AutoGPTQ](https:\/\/github.com\/PanQiWei\/AutoGPTQ) library implements the GPTQ algorithm, a post-training quantization technique where each row of the weight matrix is quantized independently to find a version of the weights that minimizes the error. These weights are quantized to int4, but they're restored to fp16 on the fly during inference. This can save your memory-usage by 4x because the int4 weights are dequantized in a fused kernel rather than a GPU's global memory, and you can also expect a speedup in inference because using a lower bitwidth takes less time to communicate.\n\nBefore you begin, make sure the following libraries are installed:\n\n```bash\npip install auto-gptq\npip install --upgrade accelerate optimum transformers\n```\n\nTo quantize a model (currently only supported for text models), you need to create a [`GPTQConfig`] class and set the number of bits to quantize to, a dataset to calibrate the weights for quantization, and a tokenizer to prepare the dataset.\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig\n\nmodel_id = \"facebook\/opt-125m\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\ngptq_config = GPTQConfig(bits=4, dataset=\"c4\", tokenizer=tokenizer)\n```\n\nYou could also pass your own dataset as a list of strings, but it is highly recommended to use the same dataset from the GPTQ paper.\n\n```py\ndataset = [\"auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm.\"]\ngptq_config = GPTQConfig(bits=4, dataset=dataset, tokenizer=tokenizer)\n```\n\nLoad a model to quantize and pass the `gptq_config` to the [`~AutoModelForCausalLM.from_pretrained`] method. Set `device_map=\"auto\"` to automatically offload the model to a CPU to help fit the model in memory, and allow the model modules to be moved between the CPU and GPU for quantization.\n\n```py\nquantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", quantization_config=gptq_config)\n```\n\nIf you're running out of memory because a dataset is too large, disk offloading is not supported. If this is the case, try passing the `max_memory` parameter to allocate the amount of memory to use on your device (GPU and CPU):\n\n```py\nquantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", max_memory={0: \"30GiB\", 1: \"46GiB\", \"cpu\": \"30GiB\"}, quantization_config=gptq_config)\n```\n\n\n\nDepending on your hardware, it can take some time to quantize a model from scratch. It can take ~5 minutes to quantize the [facebook\/opt-350m](https:\/\/huggingface.co\/facebook\/opt-350m) model on a free-tier Google Colab GPU, but it'll take ~4 hours to quantize a 175B parameter model on a NVIDIA A100. Before you quantize a model, it is a good idea to check the Hub if a GPTQ-quantized version of the model already exists.\n\n<\/Tip>\n\nOnce your model is quantized, you can push the model and tokenizer to the Hub where it can be easily shared and accessed. Use the [`~PreTrainedModel.push_to_hub`] method to save the [`GPTQConfig`]:\n\n```py\nquantized_model.push_to_hub(\"opt-125m-gptq\")\ntokenizer.push_to_hub(\"opt-125m-gptq\")\n```\n\nYou could also save your quantized model locally with the [`~PreTrainedModel.save_pretrained`] method. If the model was quantized with the `device_map` parameter, make sure to move the entire model to a GPU or CPU before saving it. For example, to save the model on a CPU:\n\n```py\nquantized_model.save_pretrained(\"opt-125m-gptq\")\ntokenizer.save_pretrained(\"opt-125m-gptq\")\n\n# if quantized with device_map set\nquantized_model.to(\"cpu\")\nquantized_model.save_pretrained(\"opt-125m-gptq\")\n```\n\nReload a quantized model with the [`~PreTrainedModel.from_pretrained`] method, and set `device_map=\"auto\"` to automatically distribute the model on all available GPUs to load the model faster without using more memory than needed.\n\n```py\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\")\n```\n\n## ExLlama\n\n[ExLlama](https:\/\/github.com\/turboderp\/exllama) is a Python\/C++\/CUDA implementation of the [Llama](model_doc\/llama) model that is designed for faster inference with 4-bit GPTQ weights (check out these [benchmarks](https:\/\/github.com\/huggingface\/optimum\/tree\/main\/tests\/benchmark#gptq-benchmark)). The ExLlama kernel is activated by default when you create a [`GPTQConfig`] object. To boost inference speed even further, use the [ExLlamaV2](https:\/\/github.com\/turboderp\/exllamav2) kernels by configuring the `exllama_config` parameter:\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM, GPTQConfig\n\ngptq_config = GPTQConfig(bits=4, exllama_config={\"version\":2})\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\", quantization_config=gptq_config)\n```\n\n\n\nOnly 4-bit models are supported, and we recommend deactivating the ExLlama kernels if you're finetuning a quantized model with PEFT.\n\n<\/Tip>\n\nThe ExLlama kernels are only supported when the entire model is on the GPU. If you're doing inference on a CPU with AutoGPTQ (version > 0.4.2), then you'll need to disable the ExLlama kernel. This overwrites the attributes related to the ExLlama kernels in the quantization config of the config.json file.\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM, GPTQConfig\ngptq_config = GPTQConfig(bits=4, use_exllama=False)\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"cpu\", quantization_config=gptq_config)\n```" - }, - { - "id": 13, - "initial_rank": 13, - "content": "uct\", device_map=\"auto\", torch_dtype=torch.bfloat16)\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama\/Meta-Llama-3-8B-Instruct\")\n\n# 2: تطبيق قالب الدردشة\nformatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)\nprint(\"Formatted chat:\\n\", formatted_chat)\n\n# 3: تحليل الدردشة (يمكن دمج هذه الخطوة مع الخطوة السابقة باستخدام tokenize=True)\ninputs = tokenizer(formatted_chat, return_tensors=\"pt\", add_special_tokens=False)\n# نقل المدخلات المحللة إلى نفس الجهاز الموجود عليه النموذج (GPU\/CPU)\ninputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}\nprint(\"Tokenized inputs:\\n\", inputs)\n\n# 4: إنشاء نص من النموذج\noutputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1)\nprint(\"Generated tokens:\\n\", outputs)\n\n# 5: فك تشفير الإخراج مرة أخرى إلى سلسلة\ndecoded_output = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):], skip_special_tokens=True)\nprint(\"Decoded output:\\n\", decoded_output)\n```\n\nهناك الكثير هنا، ويمكن أن تكون كل قطعة وثيقة خاصة بها! بدلاً من الدخول في الكثير من التفاصيل، سأغطي\nالأفكار العامة، وأترك التفاصيل للوثائق المرتبطة بها. الخطوات الرئيسية هي:\n1. يتم تحميل [النماذج](https:\/\/huggingface.co\/learn\/nlp-course\/en\/chapter2\/3) و [المُجزّئات اللغوية](https:\/\/huggingface.co\/learn\/nlp-course\/en\/chapter2\/4?fw=pt) من Hugging Face Hub.\n2. يتم تنسيق الدردشة باستخدام [قالب الدردشة](https:\/\/huggingface.co\/docs\/transformers\/main\/en\/chat_templating) للمحلل\n3. يتم [تحليل](https:\/\/huggingface.co\/learn\/nlp-course\/en\/chapter2\/4) الدردشة المنسقة باستخدام مُجزّئ اللغوي.\n4. نقوم [بتوليد](https:\/\/huggingface.co\/docs\/transformers\/en\/llm_tutorial) استجابة من النموذج.\n5. يتم فك تشفير الرموز التي ينتجها النموذج مرة أخرى إلى سلسلة\n\n## الأداء والذاكرة والأجهزة\n\nمن المحتمل أنك تعرف الآن أن معظم مهام التعلم الآلي يتم تشغيلها على وحدات معالجة الرسومات (GPU). ومع ذلك، من الممكن تمامًا\nإنشاء نص من نموذج دردشة أو نموذج لغة على وحدة المعالجة المركزية (CPU)، على الرغم من أن ذلك أبطأ إلى حد ما. إذا كان بإمكانك وضع\nالنموذج في ذاكرة وحدة معالجة الرسومات (GPU)، فهذا عادة ما يكون الخيار المفضل.\n\n### اعتبارات الذاكرة\n\nبشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة \"float32\". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج \"8B\" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة \"bfloat16\"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx\/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة \"bfloat16\"، باستخدام معامل \"torch_dtype\" كما فعلنا أعلاه.\n\nومن الممكن أيضًا النزول إلى أقل من 16 بت باستخدام \"التكميم\"، وهي طريقة لضغط أوزان النموذج بطريقة تفقد بعض المعلومات. يسمح هذا بضغط كل معلمة إلى 8 بتات أو 4 بتات أو حتى أقل. لاحظ أنه، خاصة في 4 بتات، قد تتأثر جودة ناتج النموذج سلبًا، ولكن غالبًا ما يكون هذا مقايضة تستحق القيام بها لتناسب نموذج محادثة أكبر وأكثر قدرة في الذاكرة. دعنا كيف يمكننا تطبيق ذلك باستخدام مكتبة `bitsandbytes`:\n\n```python\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True) # يمكنك أيضًا تجربة load_in_4bit\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama\/Meta-Llama-3-8B-Instruct\", device_map=\"auto\", quantization_config=quantization_config)\n```\n\nأو يمكننا القيام بنفس الشيء باستخدام واجهة برمجة التطبيقات \"pipeline\":\n\n```python\nfrom transformers import pipeline, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True) # يمكنك أيضًا تجربة load_in_4bit\npipe = pipeline(\"text-generation\", \"meta-llama\/Meta-Llama-3-8B-Instruct\", device_map=\"auto\", model_kwargs={\"quantization_config\": quantization_config})\n```\n\nهناك عدة خيارات أخرى لكمية نم" - }, - { - "id": 14, - "initial_rank": 14, - "content": "\n\n# Inferenza efficiente su GPU singola\n\nQuesto documento sarà presto completato con informazioni su come effetture l'inferenza su una singola GPU. Nel frattempo è possibile consultare [la guida per l'addestramento su una singola GPU](perf_train_gpu_one) e [la guida per l'inferenza su CPU](perf_infer_cpu).\n\n## `BetterTransformer` per l'inferenza più veloce\n\nAbbiamo recentemente integrato `BetterTransformer` per velocizzare l'inferenza su GPU per modelli di testo, immagini e audio. Per maggiori dettagli, consultare la documentazione su questa integrazione [qui](https:\/\/huggingface.co\/docs\/optimum\/bettertransformer\/overview).\n\n## Integrazione di `bitsandbytes` per Int8 mixed-precision matrix decomposition\n\n\n\nNota che questa funzione può essere utilizzata anche nelle configurazioni multi GPU.\n\n<\/Tip>\n\nDal paper [`LLM.int8() : 8-bit Matrix Multiplication for Transformers at Scale`](https:\/\/arxiv.org\/abs\/2208.07339), noi supportiamo l'integrazione di Hugging Face per tutti i modelli dell'Hub con poche righe di codice.\nIl metodo `nn.Linear` riduce la dimensione di 2 per i pesi `float16` e `bfloat16` e di 4 per i pesi `float32`, con un impatto quasi nullo sulla qualità, operando sugli outlier in half-precision.\n\n![HFxbitsandbytes.png](https:\/\/cdn-uploads.huggingface.co\/production\/uploads\/1659861207959-62441d1d9fdefb55a0b7d12c.png)\n\nIl metodo Int8 mixed-precision matrix decomposition funziona separando la moltiplicazione tra matrici in due flussi: (1) una matrice di flusso di outlier di caratteristiche sistematiche moltiplicata in fp16, (2) in flusso regolare di moltiplicazione di matrici int8 (99,9%). Con questo metodo, è possibile effettutare inferenza int8 per modelli molto grandi senza degrado predittivo.\nPer maggiori dettagli sul metodo, consultare il [paper](https:\/\/arxiv.org\/abs\/2208.07339) o il nostro [blogpost sull'integrazione](https:\/\/huggingface.co\/blog\/hf-bitsandbytes-integration).\n\n![MixedInt8.gif](https:\/\/cdn-uploads.huggingface.co\/production\/uploads\/1660567469965-62441d1d9fdefb55a0b7d12c.gif)\n\nNota che è necessaria una GPU per eseguire modelli di tipo mixed-8bit, poiché i kernel sono stati compilati solo per le GPU. Prima di utilizzare questa funzione, assicurarsi di disporre di memoria sufficiente sulla GPU per memorizzare un quarto del modello (o la metà se i pesi del modello sono in mezza precisione).\nDi seguito sono riportate alcune note per aiutarvi a utilizzare questo modulo, oppure seguite le dimostrazioni su [Google colab](#colab-demos).\n\n### Requisiti\n\n- Se si dispone di `bitsandbytes<0.37.0`, assicurarsi di eseguire su GPU NVIDIA che supportano tensor cores a 8 bit (Turing, Ampere o architetture più recenti - ad esempio T4, RTX20s RTX30s, A40-A100). Per `bitsandbytes>=0.37.0`, tutte le GPU dovrebbero essere supportate.\n- Installare la versione corretta di `bitsandbytes` eseguendo:\n`pip install bitsandbytes>=0.31.5`.\n- Installare `accelerate`\n`pip install accelerate>=0.12.0`\n\n### Esecuzione di modelli mixed-Int8 - configurazione per singola GPU\n\nDopo aver installato le librerie necessarie, per caricare il tuo modello mixed 8-bit è il seguente:\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nmodel_name = \"bigscience\/bloom-2b5\"\nmodel_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))\n```\n\nPer la generazione di testo, si consiglia di:\n\n* utilizzare il metodo `generate()` del modello invece della funzione `pipeline()`. Sebbene l'inferenza sia possibile con la funzione `pipeline()`, essa non è ottimizzata per i modelli mixed-8bit e sarà più lenta rispetto all'uso del metodo `generate()`. Inoltre, alcune strategie di campionamento, come il campionamento nucleaus, non sono supportate dalla funzione `pipeline()` per i modelli mixed-8bit.\n* collocare tutti gli ingressi sullo stesso dispositivo del modello.\n\nEcco un semplice esempio:\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\nmodel_name = \"bigscience\/bloom-2b5\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))\n\ntext = \"Hello, my llama is cute\"\ninputs = tokenizer(prompt, return_tensors=\"pt\").to(\"cuda\")\ngenerated_ids = model.generate(**inputs)\noutputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)\n```\n\n\n### Esecuzione di modelli mixed-8bit - configurazione multi GPU\n\nUsare il seguente modo caricare il modello mixed-8bit su più GPU (stesso comando della configurazione a GPU singola):\n```py\nmodel_name = \"bigscience\/bloom-2b5\"\nmodel_8bit = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_8bit=True))\n```\nPuoi controllare la RAM della GPU che si vuole allocare su ogni GPU usando `accelerate`. Utilizzare l'argomento `max_memory` come segue:\n\n```py\nmax_memory_mapping = {0: \"1GB\", 1: \"2GB\"}\nmodel_name = \"bigscience\/bloom-3b\"\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n model_name, device_map=\"auto\", load_in_8bit=True, max_memory=max_memory_mapping\n)\n```\nIn questo esempio, la prima GPU utilizzerà 1 GB di memoria e la seconda 2 GB.\n\n### Colab demos\n\nCon questo metodo è possibile inferire modelli che prima non era possibile inferire su Google Colab.\nGuardate la demo per l'esecuzione di T5-11b (42GB in fp32)! Utilizzo la quantizzazione a 8 bit su Google Colab:\n\n[![Open In Colab: T5-11b demo](https:\/\/colab.research.google.com\/assets\/colab-badge.svg)](https:\/\/colab.research.google.com\/drive\/1YORPWx4okIHXnjW7MSAidXN29mPVNT7F?usp=sharing)\n\nOppure questa demo di BLOOM-3B:\n\n[![Open In Colab: BLOOM-3b demo](https:\/\/colab.research.google.com\/assets\/colab-badge.svg)](https:\/\/colab.research.google.com\/drive\/1qOjXfQIAULfKvZqwCen8-MoWKGdSatZ4?usp=sharing)" - }, - { - "id": 15, - "initial_rank": 15, - "content": " Requirement already satisfied: packaging>=20.9 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from huggingface-hub>=0.4.0->sentence-transformers>=2.2.2->chromadb) (23.1)\n Requirement already satisfied: anyio<5,>=3.4.0 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from starlette<0.27.0,>=0.26.1->fastapi>=0.85.1->chromadb) (3.6.2)\n Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from torch>=1.6.0->sentence-transformers>=2.2.2->chromadb) (11.7.99)\n Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from torch>=1.6.0->sentence-transformers>=2.2.2->chromadb) (8.5.0.96)\n Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from torch>=1.6.0->sentence-transformers>=2.2.2->chromadb) (11.10.3.66)\n Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from torch>=1.6.0->sentence-transformers>=2.2.2->chromadb) (11.7.99)\n Requirement already satisfied: setuptools in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.6.0->sentence-transformers>=2.2.2->chromadb) (67.7.1)\n Requirement already satisfied: wheel in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.6.0->sentence-transformers>=2.2.2->chromadb) (0.40.0)\n Requirement already satisfied: regex!=2019.12.17 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=2.2.2->chromadb) (2023.3.23)\n Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=2.2.2->chromadb) (0.13.3)\n Requirement already satisfied: joblib in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from nltk->sentence-transformers>=2.2.2->chromadb) (1.2.0)\n Requirement already satisfied: threadpoolctl>=2.0.0 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from scikit-learn->sentence-transformers>=2.2.2->chromadb) (3.1.0)\n Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from torchvision->sentence-transformers>=2.2.2->chromadb) (9.5.0)\n Requirement already satisfied: sniffio>=1.1 in \/workspace\/langchain\/.venv\/lib\/python3.9\/site-packages (from anyio<5,>=3.4.0->starlette<0.27.0,>=0.26.1->fastapi>=0.85.1->chromadb) (1.3.0)\n```" - }, - { - "id": 16, - "initial_rank": 16, - "content": "の設定(ハードウェア、問題サイズ)で使用可能かどうかを確認するには、[`torch.backends.cuda.sdp_kernel`](https:\/\/pytorch.org\/docs\/master\/backends.html#torch.backends.cuda.sdp_kernel)をコンテキストマネージャとして使用します。\n\n\n```diff\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"facebook\/opt-350m\")\nmodel = AutoModelForCausalLM.from_pretrained(\"facebook\/opt-350m\", torch_dtype=torch.float16).to(\"cuda\")\n# convert the model to BetterTransformer\nmodel.to_bettertransformer()\n\ninput_text = \"Hello my dog is cute and\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(\"cuda\")\n\n+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):\n outputs = model.generate(**inputs)\n\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\nもしトレースバックにバグが表示された場合\n\n```bash\nRuntimeError: No available kernel. Aborting execution.\n```\n\nFlash Attention の広範なカバレッジを持つかもしれない PyTorch のナイトリーバージョンを試してみることをお勧めします。\n\n```bash\npip3 install -U --pre torch torchvision torchaudio --index-url https:\/\/download.pytorch.org\/whl\/nightly\/cu118\n```\n\nOr make sure your model is correctly casted in float16 or bfloat16\n\nモデルが正しくfloat16またはbfloat16にキャストされていることを確認してください。\n\nHave a look at [this detailed blogpost](https:\/\/pytorch.org\/blog\/out-of-the-box-acceleration\/) to read more about what is possible to do with `BetterTransformer` + SDPA API.\n\n`BetterTransformer` + SDPA APIを使用して何が可能かについて詳しく読むには、[この詳細なブログポスト](https:\/\/pytorch.org\/blog\/out-of-the-box-acceleration\/)をご覧ください。\n\n## `bitsandbytes` integration for FP4 mixed-precision inference\n\nFP4混合精度推論のための`bitsandbytes`統合\n\nYou can install `bitsandbytes` and benefit from easy model compression on GPUs. Using FP4 quantization you can expect to reduce up to 8x the model size compared to its native full precision version. Check out below how to get started.\n\n`bitsandbytes`をインストールし、GPUで簡単なモデルの圧縮を利用できます。FP4量子化を使用すると、ネイティブのフルプレシジョンバージョンと比較してモデルサイズを最大8倍削減できることが期待できます。以下を確認して、どのように始めるかをご覧ください。\n\n\n\nNote that this feature can also be used in a multi GPU setup.\n\nこの機能は、マルチGPUセットアップでも使用できることに注意してください。\n\n<\/Tip>\n\n### Requirements [[requirements-for-fp4-mixedprecision-inference]]\n\n- Latest `bitsandbytes` library\n`pip install bitsandbytes>=0.39.0`\n\n- Install latest `accelerate` from source\n`pip install git+https:\/\/github.com\/huggingface\/accelerate.git`\n\n- Install latest `transformers` from source\n`pip install git+https:\/\/github.com\/huggingface\/transformers.git`\n\n\n### Running FP4 models - single GPU setup - Quickstart\n\n以下のコードを実行することで、簡単に単一のGPUでFP4モデルを実行できます:\n\n\n```py\nfrom transformers import AutoModelForCausalLM\n\nmodel_name = \"bigscience\/bloom-2b5\"\nmodel_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\", load_in_4bit=True)\n```\n\n注意: `device_map`はオプションですが、推論時に `device_map = 'auto'` を設定することが推奨されています。これにより、利用可能なリソースに効率的にモデルがディスパッチされます。\n\n### Running FP4 models - multi GPU setup\n\n混合4ビットモデルを複数のGPUにロードする方法は、単一GPUセットアップと同じです(単一GPUセットアップと同じコマンドです):\n\n```py\nmodel_name = \"bigscience\/bloom-2b5\"\nmodel_4bit = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\", load_in_4bit=True)\n```\n\nしか" - }, - { - "id": 17, - "initial_rank": 17, - "content": "del = AutoModelForCausalLM.from_pretrained(\"bigcode\/octocoder\", load_in_4bit=True, low_cpu_mem_usage=True, pad_token_id=0)\n\npipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)\n\nresult = pipe(prompt, max_new_tokens=60)[0][\"generated_text\"][len(prompt):]\nresult\n```\n\n**출력**:\n```\nHere is a Python function that transforms bytes to Giga bytes:\\n\\n```\\ndef bytes_to_gigabytes(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n```\\n\\nThis function takes a single argument\n```\n\n바로 전 코드 스니펫에서 `python`만 누락되고, 이 전과 거의 동일한 출력 텍스트를 보고 있습니다. 이제 얼마나 많은 메모리가 필요했는지 확인해 봅시다.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**출력**:\n```\n9.543574333190918\n```\n\n9.5GB밖에 되지 않습니다! 150억 개 이상의 파라미터를 가진 모델인 것을 감안하면 매우 적은 양입니다.\n\n여기서는 모델의 정확도 저하가 거의 없음을 확인할 수 있지만, 실제로는 4비트 양자화를 8비트 양자화나 `bfloat16`를 사용한 추론 결과와 비교하면 결과가 다를 수 있습니다. 사용자가 직접 시도해 보는 것이 좋겠습니다.\n\n또한 4비트 양자화에 사용된 더 공격적인 양자화 방법으로 인해 추론 시 \\\\( \\text{quantize} \\\\)와 \\\\( \\text{dequantize} \\\\) 과정이 더 오래 걸리므로 여기서도 8비트 양자화와 비교하여 추론 속도가 약간 느려졌음을 유의하세요.\n\n```python\ndel model\ndel pipe\n```\n```python\nflush()\n```\n\n전체적으로 OctoCoder를 8비트 정밀도로 실행하면 필요한 GPU VRAM이 32GB에서 15GB로 줄어들었고, 4비트 정밀도로 모델을 실행하면 필요한 GPU VRAM이 9GB로 더 줄어드는 것을 확인했습니다.\n\n4비트 양자화는 RTX3090, V100, T4와 같은 GPU에서 모델을 실행할 수 있게 해주며, 이는 대부분의 사람들이 접근할 수 있는 GPU입니다.\n\n양자화에 대한 더 많은 정보를 확인하고 4비트보다 더 적은 GPU VRAM 메모리로 모델을 양자화하거나, 더 많은 양자화 관련 정보를 보려면 [`AutoGPTQ`](https:\/\/huggingface.co\/docs\/transformers\/main\/en\/main_classes\/quantization#autogptq-integration%60) 구현을 참조하는 것을 추천합니다.\n\n> 결론적으로, 모델 양자화는 향상된 메모리 효율성과 모델 정확성 간의 균형을 맞추는 것이며, 경우에 따라 추론 시간에도 영향을 미칠 수 있습니다.\n\n실제 사례에서 GPU 메모리가 충분하다면, 양자화를 고려할 필요가 없습니다. 그러나 많은 GPU는 양자화 없이 대규모 언어 모델을 실행할 수 없으며, 이 경우 4비트 및 8비트 양자화가 매우 유용한 도구입니다.\n\n사용과 관련한 더 자세한 정보는 [트랜스포머 양자화 문서](https:\/\/huggingface.co\/docs\/transformers\/main_classes\/quantization#general-usage)를 참고하는 것을 강력히 추천합니다. 다음으로, 더 나은 알고리즘과 개선된 모델 아키텍처를 사용하여 계산 및 메모리 효율성을 향상시키는 방법을 살펴보겠습니다.\n\n## 2. 플래시 어텐션 [[2-flash-attention]]\n\n오늘날의 최고 성능을 자랑하는 대규모 언어 모델은 대체로 피드포워드 레이어(feed-forward layer), 활성화 레이어(activation layer), 레이어 정규화 레이어(layer normalization layer), 그리고 가장 중요한 셀프 어텐션 레이어(self-attention layer)로 구성된 아키텍처를 공유하고 있습니다.\n\n셀프 어텐션 레이어는 입력 토큰 간의 문맥적 관계를 이해할 수 있게 해 주기 때문에 대규모 언어 모델의 핵심 요소입니다.\n하지만 셀프 어텐션 레이어의 최대 GPU 메모리 소비는 입력 토큰의 수(이하 \\\\( N \\\\)으로 표기)와 함께 계산 및 메모리 복잡성이 *2차적*으로 증가합니다. 입력 시퀀스가 짧은 경우(최대 1000개)에는 크게 눈에 띄지 않지만, 더 긴 입력 시퀀스(약 16000개)에서는 심각한 문제가 됩니다.\n\n자세히 한 번 들여다 봅시다. 길이 \\\\( N \\\\)의 입력 \\\\( \\mathbf{X} \\\\)에 대한 셀프 어텐션 레이어의 출력 \\\\( \\mathbf{O} \\\\)을 계산하는 공식은 다음과 같습니다:\n\n$$ \\textbf{O} = \\text{Attn}(" - }, - { - "id": 18, - "initial_rank": 18, - "content": "T-J\n\n## Overview\n\nThe GPT-J model was released in the [kingoflolz\/mesh-transformer-jax](https:\/\/github.com\/kingoflolz\/mesh-transformer-jax) repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like\ncausal language model trained on [the Pile](https:\/\/pile.eleuther.ai\/) dataset.\n\nThis model was contributed by [Stella Biderman](https:\/\/huggingface.co\/stellaathena).\n\n## Usage tips\n\n- To load [GPT-J](https:\/\/huggingface.co\/EleutherAI\/gpt-j-6B) in float32 one would need at least 2x model size\n RAM: 1x for initial weights and another 1x to load the checkpoint. So for GPT-J it would take at least 48GB\n RAM to just load the model. To reduce the RAM usage there are a few options. The `torch_dtype` argument can be\n used to initialize the model in half-precision on a CUDA device only. There is also a fp16 branch which stores the fp16 weights,\n which could be used to further minimize the RAM usage:\n\n```python\n>>> from transformers import GPTJForCausalLM\n>>> import torch\n\n>>> device = \"cuda\"\n>>> model = GPTJForCausalLM.from_pretrained(\n... \"EleutherAI\/gpt-j-6B\",\n... revision=\"float16\",\n... torch_dtype=torch.float16,\n... ).to(device)\n```\n\n- The model should fit on 16GB GPU for inference. For training\/fine-tuning it would take much more GPU RAM. Adam\n optimizer for example makes four copies of the model: model, gradients, average and squared average of the gradients.\n So it would need at least 4x model size GPU memory, even with mixed precision as gradient updates are in fp32. This\n is not including the activations and data batches, which would again require some more GPU RAM. So one should explore\n solutions such as DeepSpeed, to train\/fine-tune the model. Another option is to use the original codebase to\n train\/fine-tune the model on TPU and then convert the model to Transformers format for inference. Instructions for\n that could be found [here](https:\/\/github.com\/kingoflolz\/mesh-transformer-jax\/blob\/master\/howto_finetune.md)\n\n- Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra\n tokens are added for the sake of efficiency on TPUs. To avoid the mismatch between embedding matrix size and vocab\n size, the tokenizer for [GPT-J](https:\/\/huggingface.co\/EleutherAI\/gpt-j-6B) contains 143 extra tokens\n `<|extratoken_1|>... <|extratoken_143|>`, so the `vocab_size` of tokenizer also becomes 50400.\n\n## Usage examples\n\nThe [`~generation.GenerationMixin.generate`] method can be used to generate text using GPT-J\nmodel.\n\n```python\n>>> from transformers import AutoModelForCausalLM, AutoTokenizer\n\n>>> model = AutoModelForCausalLM.from_pretrained(\"EleutherAI\/gpt-j-6B\")\n>>> tokenizer = AutoTokenizer.from_pretrained(\"EleutherAI\/gpt-j-6B\")\n\n>>> prompt = (\n... \"In a shocking finding, scientists discovered a herd of unicorns living in a remote, \"\n... \"previously unexplored valley, in the Andes Mountains. Even more surprising to the \"\n... \"researchers was the fact that the unicorns spoke perfect English.\"\n... )\n\n>>> input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n\n>>> gen_tokens = model.generate(\n... input_ids,\n... do_sample=True,\n... temperature=0.9,\n... max_length=100,\n... )\n>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]\n```\n\n...or in float16 precision:\n\n```python\n>>> from transformers import GPTJForCausalLM, AutoTokenizer\n>>> import torch\n\n>>> device = \"cuda\"\n>>> model = GPTJForCausalLM.from_pretrained(\"EleutherAI\/gpt-j-6B\", torch_dtype=torch.float16).to(device)\n>>> tokenizer = AutoTokenizer.from_pretrained(\"EleutherAI\/gpt-j-6B\")\n\n>>> prompt = (\n... \"In a shocking finding, scientists discovered a herd of unicorns living in a remote, \"\n... \"previously unexplored valley, in the Andes Mountains. Even more surprising to the \"\n... \"researchers was the fact that the unicorns spoke perfect English.\"\n... )\n\n>>> input_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids.to(device)\n\n>>> gen_tokens = model.generate(\n... input_ids,\n... do_sample=True,\n... temperature=0.9,\n... max_length=100,\n... )\n>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]\n```\n\n## Resources\n\nA list of official Hugging Face and community (indicated by 🌎) resources to help you get started with GPT-J. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.\n\n\n\n- Description of [GPT-J](https:\/\/huggingface.co\/EleutherAI\/gpt-j-6B).\n- A blog on how to [Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker](https:\/\/huggingface.co\/blog\/gptj-sagemaker).\n- A blog on how to [Accelerate GPT-J inference with DeepSpeed-Inference on GPUs](https:\/\/www.philschmid.de\/gptj-deepspeed-inference).\n- A blog post introducing [GPT-J-6B: 6B JAX-Based Transformer](https:\/\/arankomatsuzaki.wordpress.com\/2021\/06\/04\/gpt-j\/). 🌎\n- A notebook for [GPT-J-6B Inference Demo](https:\/\/colab.research.google.com\/github\/kingoflolz\/mesh-transformer-jax\/blob\/master\/colab_demo.ipynb). 🌎\n- Another notebook demonstrating [Inference with GPT-J-6B](https:\/\/colab.research.google.com\/github\/NielsRogge\/Transformers-Tutorials\/blob\/master\/GPT-J-6B\/Inference_with_GPT_J_6B.ipynb). \n- [Causal language modeling](https:\/\/huggingface.co\/course\/en\/chapter7\/6?fw=pt#training-a-causal-language-model-from-scratch) chapter of the 🤗 Hugging Face Course.\n- [`GPTJForCausalLM`] is supported by this [causal language modeling example script](https:\/\/github.com\/huggingface\/transformers\/tree\/main\/examples\/pytorch\/language-modeling#gpt-2gpt-and-causal-language-modeling), [text generation example script](https:\/\/github.com\/huggingface\/transformers\/tree\/main\/examples\/pytorch\/text-generation), and [notebook](https:\/\/colab.research.google.com\/github\/huggingface\/notebooks\/blob\/main\/examples\/language_modeling.ipynb).\n- [`TFGPTJForCausalLM`] is supported by this [causal language modeling example script](https:\/\/github.com\/huggingface\/transformers\/tree\/main\/examples\/tensorflow\/language-modeling#run_clmpy) and [notebook](https:\/\/colab.research.google.com\/github\/huggingface\/notebooks\/blob\/main\/examples\/language_modeling-tf.ipynb).\n- [`FlaxGPTJForCausalLM`] is supported by this [causal language modeling example script](https:\/\/github.com\/huggingface\/transformers\/tree\/main\/examples\/flax\/language-modeling#causal-language-modeling) and [notebook](https:\/\/colab.research.google.com\/github\/huggingface\/notebooks\/blob\/main\/examples\/causal_language_modeling_flax.ipynb).\n\n**Documentation resources**\n- [Text classification task guide](..\/tasks\/sequence_classification)\n- [Question answering task guide](..\/tasks\/question_answering)\n- [Causal language modeling task guide](..\/tasks\/language_modeling)\n\n## GPTJConfig\n\n[[autodoc]] GPTJConfig\n - all\n\n\n\n\n## GPTJModel\n\n[[autodoc]] GPTJModel\n - forward\n\n## GPTJForCausalLM\n\n[[autodoc]] GPTJForCausalLM\n - forward\n\n## GPTJForSequenceClassification\n\n[[autodoc]] GPTJForSequenceClassification\n - forward\n\n## GPTJForQuestionAnswering\n\n[[autodoc]] GPTJForQuestionAnswering\n - forward\n\n<\/pt>\n\n\n## TFGPTJModel\n\n[[autodoc]] TFGPTJModel\n - call\n\n## TFGPTJForCausalLM\n\n[[autodoc]] TFGPTJForCausalLM\n - call\n\n## TFGPTJForSequ" - }, - { - "id": 19, - "initial_rank": 19, - "content": "def __init__(\n self,\n model_path: Optional[str] = None,\n engine_name: Optional[str] = None,\n tokenizer_dir: Optional[str] = None,\n temperature: float = 0.1,\n max_new_tokens: int = DEFAULT_NUM_OUTPUTS,\n context_window: int = DEFAULT_CONTEXT_WINDOW,\n messages_to_prompt: Optional[Callable] = None,\n completion_to_prompt: Optional[Callable] = None,\n callback_manager: Optional[CallbackManager] = None,\n generate_kwargs: Optional[Dict[str, Any]] = None,\n model_kwargs: Optional[Dict[str, Any]] = None,\n verbose: bool = False,\n ) -> None:\n try:\n import tensorrt_llm\n from tensorrt_llm.runtime import ModelConfig, SamplingConfig\n except ImportError:\n print(\n \"Unable to import `tensorrt_llm` module. Please ensure you have\\\n `tensorrt_llm` installed in your environment. You can run\\\n `pip3 install tensorrt_llm -U --extra-index-url https:\/\/pypi.nvidia.com` to install.\"\n )\n\n model_kwargs = model_kwargs or {}\n model_kwargs.update({\"n_ctx\": context_window, \"verbose\": verbose})\n max_new_tokens = max_new_tokens\n verbose = verbose\n # check if model is cached\n if model_path is not None:\n if not os.path.exists(model_path):\n raise ValueError(\n \"Provided model path does not exist. \"\n \"Please check the path or provide a model_url to download.\"\n )\n else:\n engine_dir = model_path\n engine_dir_path = Path(engine_dir)\n config_path = engine_dir_path \/ \"config.json\"\n\n # config function\n with open(config_path) as f:\n config = json.load(f)\n use_gpt_attention_plugin = config[\"plugin_config\"][\n \"gpt_attention_plugin\"\n ]\n remove_input_padding = config[\"plugin_config\"][\"remove_input_padding\"]\n tp_size = config[\"builder_config\"][\"tensor_parallel\"]\n pp_size = 1\n if \"pipeline_parallel\" in config[\"builder_config\"]:\n pp_size = config[\"builder_config\"][\"pipeline_parallel\"]\n world_size = tp_size * pp_size\n assert (\n world_size == tensorrt_llm.mpi_world_size()\n ), f\"Engine world size ({world_size}) != Runtime world size ({tensorrt_llm.mpi_world_size()})\"\n num_heads = config[\"builder_config\"][\"num_heads\"] \/\/ tp_size\n hidden_size = config[\"builder_config\"][\"hidden_size\"] \/\/ tp_size\n vocab_size = config[\"builder_config\"][\"vocab_size\"]\n num_layers = config[\"builder_config\"][\"num_layers\"]\n num_kv_heads = config[\"builder_config\"].get(\"num_kv_heads\", num_heads)\n paged_kv_cache = config[\"plugin_config\"][\"paged_kv_cache\"]\n if config[\"builder_config\"].get(\"multi_query_mode\", False):\n tensorrt_llm.logger.warning(\n \"`multi_query_mode` config is deprecated. Please rebuild the engine.\"\n )\n num_kv_heads = 1\n num_kv_heads = (num_kv_heads + tp_size - 1) \/\/ tp_size\n\n model_config = ModelConfig(\n num_heads=num_heads,\n num_kv_heads=num_kv_heads,\n hidden_size=hidden_size,\n vocab_size=vocab_size,\n num_layers=num_layers,\n gpt_attention_plugin=use_gpt_attention_plugin,\n paged_kv_cache=paged_kv_cache,\n remove_input_padding=remove_input_padding,\n max_batch_size=config[\"builder_config\"][\"max_batch_size\"],\n )\n\n assert (\n pp_size == 1\n ), \"Python runtime does not support pipeline parallelism\"\n world_size = tp_size * pp_size\n\n runtime_rank = tensorrt_llm.mpi_rank()\n runtime_mapping = tensorrt_llm.Mapping(\n world_size, runtime_rank, tp_size=tp_size, pp_size=pp_size\n )\n\n # TensorRT-LLM must run on a GPU.\n assert (\n torch.cuda.is_available()\n ), \"LocalTensorRTLLM requires a Nvidia CUDA enabled GPU to operate\"\n torch.cuda.set_device(runtime_rank % runtime_mapping.gpus_per_node)\n tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir, legacy=False)\n sampling_config = SamplingConfig(\n end_id=EOS_TOKEN,\n pad_id=PAD_TOKEN,\n num_beams=1,\n temperature=temperature,\n )\n\n serialize_path = engine_dir_path \/ (engine_name if engine_name else \"\")\n with open(serialize_path, \"rb\") as f:\n engine_buffer = f.read()\n decoder = tensorrt_llm.runtime.GenerationSession(\n model_config, engine_buffer, runtime_mapping, debug_mode=False\n )\n model = decoder\n\n generate_kwargs = generate_kwargs or {}\n generate_kwargs.update(\n {\"temperature\": temperature, \"max_tokens\": max_new_tokens}\n )\n\n super().__init__(\n model_path=model_path,\n temperature=temperature,\n context_window=context_window,\n max_new_tokens=max_new_tokens,\n messages_to_prompt=messages_to_prompt,\n completion_to_prompt=completion_to_prompt,\n callback_manager=callback_manager,\n generate_kwargs=generate_kwargs,\n model_kwargs=model_kwargs,\n verbose=verbose,\n )\n self._model = model\n self._model_config = model_config\n self._tokenizer = tokenizer\n self._sampling_config = sampling_config\n self._max_new_tokens = max_new_tokens\n self._verbose = verbose\n\n @classmethod\n def class_name(cls) -> str:\n \"\"\"Get class name.\"\"\"\n return \"LocalTensorRTLLM\"\n\n @property\n def metadata(self) -> LLMMetadata:\n \"\"\"LLM metadata.\"\"\"\n return LLMMetadata(\n context_window=self.context_window,\n num_output=self.max_new_tokens,\n model_name=self.model_path,\n )\n\n @llm_chat_callback()\n def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:\n prompt = self.messages_to_prompt(messages)\n completion_response = self.complete(prompt, formatted=True, **kwargs)\n return completion_response_to_chat_response(completion_response)\n\n @llm_completion_callback()\n def complete(\n self, prompt: str, formatted: bool = False, **kwargs: Any\n ) -> CompletionResponse:\n try:\n import torch\n except ImportError:\n raise ImportError(\"nvidia_tensorrt requires `pip install torch`.\")\n\n self.generate_kwargs.update({\"stream\": False})\n\n if not formatted:\n prompt = self.completion_to_prompt(prompt)\n\n input_text = prompt\n input_ids, input_lengths = parse_input(\n input_text, self._tokenizer, EOS_TOKEN, self._model_config\n )\n\n max_input_length = torch.max(input_lengths).item()\n self._model.setup(\n input_lengths.size(0), max_input_length, self._max_new_tokens, 1\n ) # beam size is set to 1\n if self._verbose:\n start_time = time.time()\n\n output_ids = self._model.decode(input_ids, input_lengths, self._sampling_config)\n torch.cuda.synchronize()\n\n elapsed_time = -1.0\n if self._verbose:\n end_time = time.time()\n elapsed_time = end_time - start_time\n\n output_txt, output_token_ids = get_output(\n output_ids, input_lengths, self._max_new_tokens, self._tokenizer\n )\n\n if self._verbose:\n print(f\"Input context length : {input_ids.shape[1]}\")\n print(f\"Inference time : {elapsed_time:.2f} seconds\")\n print(f\"Output context length : {len(output_token_ids)} \")\n print(\n f\"Inference token\/sec : {(len(output_token_ids) \/ elapsed_time):2f}\"\n )\n\n # call garbage collected after inference\n torch.cuda.empty_cache()\n gc.collect()\n\n return CompletionResponse(\n text=output_txt,\n raw=generate_completion_dict(output_txt, self._model, self.model_path),\n )\n\n @llm_completion_callback()\n def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:\n raise NotImplementedError(\n \"Nvidia TensorRT-LLM does not currently support streaming completion.\"\n )" - }, - { - "id": 20, - "initial_rank": 20, - "content": "a new compile function that doesn't require any modification to existing PyTorch code but can \noptimize your code by adding a single line of code: `model = torch.compile(model)`.\n\nIf using [`Trainer`], you only need `to` pass the `torch_compile` option in the [`TrainingArguments`]: \n\n```python\ntraining_args = TrainingArguments(torch_compile=True, **default_args)\n```\n\n`torch.compile` uses Python's frame evaluation API to automatically create a graph from existing PyTorch programs. After \ncapturing the graph, different backends can be deployed to lower the graph to an optimized engine. \nYou can find more details and benchmarks in [PyTorch documentation](https:\/\/pytorch.org\/get-started\/pytorch-2.0\/).\n\n`torch.compile` has a growing list of backends, which can be found in by calling `torchdynamo.list_backends()`, each of which with its optional dependencies.\n\nChoose which backend to use by specifying it via `torch_compile_backend` in the [`TrainingArguments`]. Some of the most commonly used backends are:\n\n**Debugging backends**:\n* `dynamo.optimize(\"eager\")` - Uses PyTorch to run the extracted GraphModule. This is quite useful in debugging TorchDynamo issues.\n* `dynamo.optimize(\"aot_eager\")` - Uses AotAutograd with no compiler, i.e, just using PyTorch eager for the AotAutograd's extracted forward and backward graphs. This is useful for debugging, and unlikely to give speedups.\n\n**Training & inference backends**:\n* `dynamo.optimize(\"inductor\")` - Uses TorchInductor backend with AotAutograd and cudagraphs by leveraging codegened Triton kernels [Read more](https:\/\/dev-discuss.pytorch.org\/t\/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes\/747)\n* `dynamo.optimize(\"nvfuser\")` - nvFuser with TorchScript. [Read more](https:\/\/dev-discuss.pytorch.org\/t\/tracing-with-primitives-update-1-nvfuser-and-its-primitives\/593)\n* `dynamo.optimize(\"aot_nvfuser\")` - nvFuser with AotAutograd. [Read more](https:\/\/dev-discuss.pytorch.org\/t\/tracing-with-primitives-update-1-nvfuser-and-its-primitives\/593)\n* `dynamo.optimize(\"aot_cudagraphs\")` - cudagraphs with AotAutograd. [Read more](https:\/\/github.com\/pytorch\/torchdynamo\/pull\/757)\n\n**Inference-only backend**s:\n* `dynamo.optimize(\"ofi\")` - Uses TorchScript optimize_for_inference. [Read more](https:\/\/pytorch.org\/docs\/stable\/generated\/torch.jit.optimize_for_inference.html)\n* `dynamo.optimize(\"fx2trt\")` - Uses NVIDIA TensorRT for inference optimizations. [Read more](https:\/\/pytorch.org\/TensorRT\/tutorials\/getting_started_with_fx_path.html)\n* `dynamo.optimize(\"onnxrt\")` - Uses ONNXRT for inference on CPU\/GPU. [Read more](https:\/\/onnxruntime.ai\/)\n* `dynamo.optimize(\"ipex\")` - Uses IPEX for inference on CPU. [Read more](https:\/\/github.com\/intel\/intel-extension-for-pytorch)\n\nFor an example of using `torch.compile` with 🤗 Transformers, check out this [blog post on fine-tuning a BERT model for Text Classification using the newest PyTorch 2.0 features](https:\/\/www.philschmid.de\/getting-started-pytorch-2-0-transformers)\n\n## Using 🤗 PEFT\n\n[Parameter-Efficient Fine Tuning (PEFT)](https:\/\/huggingface.co\/blog\/peft) methods freeze the pretrained model parameters during fine-tuning and add a small number of trainable parameters (the adapters) on top of it.\n\nAs a result the [memory associated to the optimizer states and gradients](https:\/\/huggingface.co\/docs\/transformers\/model_memory_anatomy#anatomy-of-models-memory) are greatly reduced.\n\nFor example with a vanilla AdamW, the memory requirement for the optimizer state would be:\n* fp32 copy of parameters: 4 bytes\/param\n* Momentum: 4 bytes\/param\n* Variance: 4 bytes\/param\n\nSuppose a model with 7B parameters and 200 million parameters injected with [Low Rank Adapters](https:\/\/huggingface.co\/docs\/peft\/conceptual_guides\/lora).\n\nThe memory requirement for the optimizer state of the plain model would be 12 * 7 = 84 GB (assuming 7B trainable parameters).\n\nAdding Lora increases slightly the memory associated to the model weights and substantially decreases memory requirement for the optimizer state to 12 * 0.2 = 2.4GB.\n\nRead more about PEFT and its detailed usage in [the PEFT documentation](https:\/\/huggingface.co\/docs\/peft\/) or [PEFT repository](https:\/\/github.com\/huggingface\/peft).\n\n## Using 🤗 Accelerate\n\nWith [🤗 Accelerate](https:\/\/huggingface.co\/docs\/accelerate\/index) you can use the above methods while gaining full \ncontrol over the training loop and can essentially write the loop in pure PyTorch with some minor modifications. \n\nSuppose you have combined the methods in the [`TrainingArguments`] like so:\n\n```py\ntraining_args = TrainingArguments(\n per_device_train_batch_size=1,\n gradient_accumulation_steps=4,\n gradient_checkpointing=True,\n fp16=True,\n **default_args,\n)\n```\n\nThe full example training loop with 🤗 Accelerate is only a handful of lines of code long:\n\n```py\nfrom accelerate import Accelerator\nfrom torch.utils.data.dataloader import DataLoader\n\ndataloader = DataLoader(ds, batch_size=training_args.per_device_train_batch_size)\n\nif training_args.gradient_checkpointing:\n model.gradient_checkpointing_enable()\n\naccelerator = Accelerator(fp16=training_args.fp16)\nmodel, optimizer, dataloader = accelerator.prepare(model, adam_bnb_optim, dataloader)\n\nmodel.train()\nfor step, batch in enumerate(dataloader, start=1):\n loss = model(**batch).loss\n loss = loss \/ training_args.gradient_accumulation_steps\n accelerator.backward(loss)\n if step % training_args.gradient_accumulation_steps == 0:\n optimizer.step()\n optimizer.zero_grad()\n```\n\nFirst we wrap the dataset in a [`DataLoader`](https:\/\/pytorch.org\/docs\/stable\/data.html#torch.utils.data.DataLoader). \nThen we can enable gradient checkpointing by calling the model's [`~PreTrainedModel.gradient_checkpointing_enable`] method. \nWhen we initialize the [`Accelerator`](https:\/\/huggingface.co\/docs\/accelerate\/package_reference\/accelerator#accelerate.Accelerator) \nwe can specify if we want to use mixed precision training and it will take care of it for us in the [`prepare`] call. \nDuring the [`prepare`](https:\/\/huggingface.co\/docs\/accelerate\/package_reference\/accelerator#accelerate.Accelerator.prepare) \ncall the dataloader will also be distributed across workers should we use multiple GPUs. We use the same [8-bit optimizer](#8-bit-adam) from the earlier example.\n\nFinally, we can add the main training loop. Note that the `backward` call is handled by 🤗 Accelerate. We can also see\nhow gradient accumulation works: we normalize the loss, so we get the average at the end of accumulation and once we have \nenough steps we run the optimization. \n\nImplementing these optimization techniques with 🤗 Accelerate only takes a handful of lines of code and comes with the \nbenefit of more flexibility in the training loop. For a full documentation of all features have a look at the \n[Accelerate documentation](https:\/\/huggingface.co\/docs\/accelerate\/index).\n\n\n## Efficient Software Prebuilds\n\nPyTorch's [pip and conda builds](ht" - }, - { - "id": 21, - "initial_rank": 21, - "content": "uantization\n\nQuantization reduces the size of the LLM weights by storing them in a lower precision. This translates to lower memory usage and makes loading LLMs for inference more accessible if you're constrained by your GPUs memory. If you aren't limited by your GPU, you don't necessarily need to quantize your model because it can incur a small latency cost (except for AWQ and fused AWQ modules) due to the extra step required to quantize and dequantize the weights.\n\n> [!TIP]\n> There are many quantization libraries (see the [Quantization](.\/quantization) guide for more details) available, such as Quanto, AQLM, AWQ, and AutoGPTQ. Feel free to try them out and see which one works best for your use case. We also recommend reading the [Overview of natively supported quantization schemes in 🤗 Transformers](https:\/\/hf.co\/blog\/overview-quantization-transformers) blog post which compares AutoGPTQ and bitsandbytes.\n\nUse the Model Memory Calculator below to estimate and compare how much memory is required to load a model. For example, try estimating how much memory it costs to load [Mistral-7B-v0.1](https:\/\/huggingface.co\/mistralai\/Mistral-7B-v0.1).\n\n<\/iframe>\n\nTo load Mistral-7B-v0.1 in half-precision, set the `torch_dtype` parameter in the [`~transformers.AutoModelForCausalLM.from_pretrained`] method to `torch.bfloat16`. This requires 13.74GB of memory.\n\n```py\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\nmodel = AutoModelForCausalLM.from_pretrained(\n \"mistralai\/Mistral-7B-v0.1\", torch_dtype=torch.bfloat16, device_map=\"auto\",\n)\n```\n\nTo load a quantized model (8-bit or 4-bit) for inference, try [bitsandbytes](https:\/\/hf.co\/docs\/bitsandbytes) and set the `load_in_4bit` or `load_in_8bit` parameters to `True`. Loading the model in 8-bits only requires 6.87 GB of memory.\n\n```py\nfrom transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\nimport torch\n\nquant_config = BitsAndBytesConfig(load_in_8bit=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n \"mistralai\/Mistral-7B-v0.1\", quantization_config=quant_config, device_map=\"auto\"\n)\n```\n" - }, - { - "id": 22, - "initial_rank": 22, - "content": "class T5ModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin, unittest.TestCase):\n all_model_classes = (\n (T5Model, T5ForConditionalGeneration, T5ForSequenceClassification, T5ForQuestionAnswering)\n if is_torch_available()\n else ()\n )\n all_generative_model_classes = (T5ForConditionalGeneration,) if is_torch_available() else ()\n pipeline_model_mapping = (\n {\n \"feature-extraction\": T5Model,\n \"question-answering\": T5ForQuestionAnswering,\n \"summarization\": T5ForConditionalGeneration,\n \"text-classification\": T5ForSequenceClassification,\n \"text2text-generation\": T5ForConditionalGeneration,\n \"translation\": T5ForConditionalGeneration,\n \"zero-shot\": T5ForSequenceClassification,\n }\n if is_torch_available()\n else {}\n )\n all_parallelizable_model_classes = (T5Model, T5ForConditionalGeneration) if is_torch_available() else ()\n fx_compatible = True\n test_pruning = False\n test_resize_embeddings = True\n test_model_parallel = True\n is_encoder_decoder = True\n # The small T5 model needs higher percentages for CPU\/MP tests\n model_split_percents = [0.5, 0.8, 0.9]\n\n def setUp(self):\n self.model_tester = T5ModelTester(self)\n self.config_tester = ConfigTester(self, config_class=T5Config, d_model=37)\n\n # `QAPipelineTests` is not working well with slow tokenizers (for some models) and we don't want to touch the file\n # `src\/transformers\/data\/processors\/squad.py` (where this test fails for this model)\n def is_pipeline_test_to_skip(\n self, pipeline_test_case_name, config_class, model_architecture, tokenizer_name, processor_name\n ):\n if tokenizer_name is None:\n return True\n if pipeline_test_case_name == \"QAPipelineTests\" and not tokenizer_name.endswith(\"Fast\"):\n return True\n\n return False\n\n def _create_and_check_torch_fx_tracing(self, config, inputs_dict, output_loss=False):\n if not is_torch_fx_available() or not self.fx_compatible:\n self.skipTest(reason=\"torch.fx is not available or not compatible with this model\")\n\n configs_no_init = _config_zero_init(config) # To be sure we have no Nan\n configs_no_init.return_dict = False\n\n for model_class in self.all_model_classes:\n if model_class.__name__ == \"T5ForSequenceClassification\":\n continue\n model = model_class(config=configs_no_init)\n model.to(torch_device)\n model.eval()\n inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=output_loss)\n\n try:\n if model.config.is_encoder_decoder:\n model.config.use_cache = False # FSTM still requires this hack -> FSTM should probably be refactored similar to BART afterward\n labels = inputs.get(\"labels\", None)\n input_names = [\n \"attention_mask\",\n \"decoder_attention_mask\",\n \"decoder_input_ids\",\n \"input_features\",\n \"input_ids\",\n \"input_values\",\n ]\n if labels is not None:\n input_names.append(\"labels\")\n\n filtered_inputs = {k: v for (k, v) in inputs.items() if k in input_names}\n input_names = list(filtered_inputs.keys())\n\n model_output = model(**filtered_inputs)\n\n traced_model = symbolic_trace(model, input_names)\n traced_output = traced_model(**filtered_inputs)\n else:\n input_names = [\n \"attention_mask\",\n \"bbox\",\n \"input_features\",\n \"input_ids\",\n \"input_values\",\n \"pixel_values\",\n \"token_type_ids\",\n \"visual_feats\",\n \"visual_pos\",\n ]\n\n labels = inputs.get(\"labels\", None)\n start_positions = inputs.get(\"start_positions\", None)\n end_positions = inputs.get(\"end_positions\", None)\n if labels is not None:\n input_names.append(\"labels\")\n if start_positions is not None:\n input_names.append(\"start_positions\")\n if end_positions is not None:\n input_names.append(\"end_positions\")\n\n filtered_inputs = {k: v for (k, v) in inputs.items() if k in input_names}\n input_names = list(filtered_inputs.keys())\n\n if model.__class__.__name__ in set(MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES.values()) and (\n not hasattr(model.config, \"problem_type\") or model.config.problem_type is None\n ):\n model.config.problem_type = \"single_label_classification\"\n\n traced_model = symbolic_trace(model, input_names)\n traced_output = traced_model(**filtered_inputs)\n model_output = model(**filtered_inputs)\n\n except Exception as e:\n self.fail(f\"Couldn't trace module: {e}\")\n\n def flatten_output(output):\n flatten = []\n for x in output:\n if isinstance(x, (tuple, list)):\n flatten += flatten_output(x)\n elif not isinstance(x, torch.Tensor):\n continue\n else:\n flatten.append(x)\n return flatten\n\n model_output = flatten_output(model_output)\n traced_output = flatten_output(traced_output)\n num_outputs = len(model_output)\n\n for i in range(num_outputs):\n self.assertTrue(\n torch.allclose(model_output[i], traced_output[i]),\n f\"traced {i}th output doesn't match model {i}th output for {model_class}\",\n )\n\n # Test that the model can be serialized and restored properly\n with tempfile.TemporaryDirectory() as tmp_dir_name:\n pkl_file_name = os.path.join(tmp_dir_name, \"model.pkl\")\n try:\n with open(pkl_file_name, \"wb\") as f:\n pickle.dump(traced_model, f)\n with open(pkl_file_name, \"rb\") as f:\n loaded = pickle.load(f)\n except Exception as e:\n self.fail(f\"Couldn't serialize \/ deserialize the traced model: {e}\")\n\n loaded_output = loaded(**filtered_inputs)\n loaded_output = flatten_output(loaded_output)\n\n for i in range(num_outputs):\n self.assertTrue(\n torch.allclose(model_output[i], loaded_output[i]),\n f\"serialized model {i}th output doesn't match model {i}th output for {model_class}\",\n )\n\n # Avoid memory leak. Without this, each call increase RAM usage by ~20MB.\n # (Even with this call, there are still memory leak by ~0.04MB)\n self.clear_torch_jit_class_registry()\n\n def test_config(self):\n self.config_tester.run_common_tests()\n\n def test_shift_right(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.check_prepare_lm_labels_via_shift_left(*config_and_inputs)\n\n def test_model(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_model(*config_and_inputs)\n\n def test_model_v1_1(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n # check that gated gelu feed forward and different word embeddings work\n config = config_and_inputs[0]\n config.tie_word_embeddings = False\n config.feed_forward_proj = \"gated-gelu\"\n self.model_tester.create_and_check_model(config, *config_and_inputs[1:])\n\n # T5ForSequenceClassification does not support inputs_embeds\n def test_inputs_embeds(self):\n config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()\n\n for model_class in (T5Model, T5ForConditionalGeneration, T5ForQuestionAnswering):\n model = model_class(config)\n model.to(torch_device)\n model.eval()\n\n inputs = copy.deepcopy(self._prepare_for_class(inputs_dict, model_class))\n\n if not self.is_encoder_decoder:\n input_ids = inputs[\"input_ids\"]\n del inputs[\"input_ids\"]\n else:\n encoder_input_ids = inputs[\"input_ids\"]\n decoder_input_ids = inputs.get(\"decoder_input_ids\", encoder_input_ids)\n del inputs[\"input_ids\"]\n inputs.pop(\"decoder_input_ids\", None)\n\n wte = model.get_input_embeddings()\n if not self.is_encoder_decoder:\n inputs[\"inputs_embeds\"] = wte(input_ids)\n else:\n inputs[\"inputs_embeds\"] = wte(encoder_input_ids)\n inputs[\"decoder_inputs_embeds\"] = wte(decoder_input_ids)\n\n with torch.no_grad():\n model(**inputs)[0]\n\n def test_config_and_model_silu_gated(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n config = config_and_inputs[0]\n config.feed_forward_proj = \"gated-silu\"\n self.model_tester.create_and_check_model(*config_and_inputs)\n\n def test_with_lm_head(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_with_lm_head(*config_and_inputs)\n\n def test_with_sequence_classification_head(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_with_sequence_classification_head(*config_and_inputs)" - }, - { - "id": 23, - "initial_rank": 23, - "content": "e, temperature=0.7)\nprint(tokenizer.batch_decode(outputs, skip_special_tokens=True))\n[\"The second law of thermodynamics states that energy cannot be created nor destroyed. It's not a\"]\n```\n\n<\/hfoption>\n<\/hfoptions>\n\n## 어텐션 최적화 [[attention-optimizations]]\n\n트랜스포머 모델의 알려진 문제는 셀프 어텐션 메커니즘이 입력 토큰 수와 함께 계산 및 메모리가 제곱으로 증가한다는 것입니다. 이 제한은 훨씬 더 긴 시퀀스를 처리하는 LLM에서는 더욱 커집니다. 이를 해결하기 위해 FlashAttention2 또는 PyTorch의 스케일된 점곱 어텐션을 사용해 보십시오. 이들은 더 메모리 효율적인 어텐션 구현으로 추론을 가속화할 수 있습니다.\n\n### FlashAttention-2 [[flashattention-2]]\n\nFlashAttention과 [FlashAttention-2](.\/perf_infer_gpu_one#flashattention-2)는 어텐션 계산을 더 작은 청크로 나누고 중간 읽기\/쓰기 작업을 줄여 추론 속도를 높입니다. FlashAttention-2는 원래 FlashAttention 알고리즘을 개선하여 시퀀스 길이 차원에서도 병렬 처리를 수행하고 하드웨어에서 작업을 더 잘 분할하여 동기화 및 통신 오버헤드를 줄입니다.\n\nFlashAttention-2를 사용하려면 [`~PreTrainedModel.from_pretrained`] 메서드에서 `attn_implementation=\"flash_attention_2\"`를 설정하십시오.\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquant_config = BitsAndBytesConfig(load_in_8bit=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n \"google\/gemma-2b\",\n quantization_config=quant_config,\n torch_dtype=torch.bfloat16,\n attn_implementation=\"flash_attention_2\",\n)\n```\n\n### PyTorch 스케일된 점곱 어텐션(scaled dot product attention) [[pytorch-scaled-dot-product-attention]]\n\n스케일된 점곱 어텐션(SDPA)는 PyTorch 2.0에서 자동으로 활성화되며, FlashAttention, xFormers, PyTorch의 C++ 구현을 지원합니다. SDPA는 CUDA 백엔드를 사용하는 경우 가장 성능이 좋은 어텐션 알고리즘을 선택합니다. 다른 백엔드에서는 SDPA가 PyTorch C++ 구현으로 기본 설정됩니다.\n\n> [!TIP]\n> SDPA는 최신 PyTorch 버전이 설치되어 있으면 FlashAttention-2도 지원합니다.\n\n세 가지 어텐션 알고리즘 중 하나를 명시적으로 활성화하거나 비활성화하려면 [torch.backends.cuda.sdp_kernel](https:\/\/pytorch.org\/docs\/master\/generated\/torch.nn.functional.scaled_dot_product_attention.html) 컨텍스트 관리자를 사용하십시오. 예를 들어 FlashAttention을 활성화하려면 `enable_flash=True`로 설정하십시오.\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\n \"google\/gemma-2b\",\n torch_dtype=torch.bfloat16,\n)\n\nwith torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):\n outputs = model.generate(**inputs)\n```\n\n## 양자화 [[quantization]]\n\n양자화는 LLM 가중치를 더 낮은 정밀도로 저장하여 크기를 줄입니다. 이는 메모리 사용량을 줄이며 GPU 메모리에 제약이 있는 경우 추론을 위해 LLM을 로드하는 것을 더 용이하게 합니다. GPU가 충분하다면, 모델을 양자화할 필요는 없습니다. 추가적인 양자화 및 양자화 해제 단계로 인해 약간의 지연이 발생할 수 있기 때문입니다(AWQ 및 융합 AWQ 모듈 제외).\n\n> [!TIP]\n> 다양한 양자화 라이브러리(자세한 내용은 [Quantization](.\/quantization) 가이드를 참조하십시오)가 있습니다. 여기에는 Quanto, AQLM, AWQ 및 AutoGPTQ가 포함됩니다. 사용 사례에 가장 잘 맞는 라이브러리를 사용해 보십시오. 또한 AutoGPTQ와 bitsandbytes를 비교하는 [Overview of natively supported quantization schemes in 🤗 Transformers](https:\/\/hf.co\/blog\/overview-quantization-transformers) 블로그 게시물을 읽어보는 것을 추천합니다.\n\n아래의 모델 메모리 계산기를 사용하여 모델을 로드하는 데 필요한 메모리를 추정하고 비교해 보십시오. 예를 들어 [Mistral-7B-v0.1](https:\/\/huggingface.co\/mistralai\/Mistral-7B-v0.1)를 로드하는 데 필요한 메모리를 추정해 보십시오.\n\n<\/iframe>\n\nMistral-7B-v0.1을 반정밀도로 로드하려면 [`~transformers.AutoModelForCausalLM.from_pretrained`] 메서드에서 `torch_dtype` 매개변수를 `torch.bfloat16`으로 설정하십시오. 이 경우 13.74GB의 메모리가 필요합니다.\n\n```py\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\n\nmodel = AutoModelForCausalLM.from_pretrained(\n \"mistralai\/Mistral-7B-v0.1\", torch_dtype=torch.bfloat16, device_map=\"auto\",\n)\n```\n\n추론을 위해 양자화된 모델(8비트 또는 4비트)을 로드하려면 [bitsandbytes](https:\/\/hf.co\/docs\/bitsandbytes)를 사용하고 `load_in_4bit` 또는 `load_in_8bit` 매개변수를 `True`로 설정하십시오. 모델을 8비트로 로드하는 데는 6.87GB의 메모리만 필요합니다.\n\n```py\nfrom transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\nimport torch\n\nquant_config = BitsAndBytesConfig(load_in_8bit=True)\nmodel = AutoModelForCausalLM.from_pretrained(\n \"mistralai\/Mistral-7B-v0.1\", quantization_config=quant_config, device_map=\"auto\"\n)\n```\n" - }, - { - "id": 24, - "initial_rank": 24, - "content": "all_model_classes = (\n (MT5Model, MT5ForConditionalGeneration, MT5ForSequenceClassification, MT5ForQuestionAnswering)\n if is_torch_available()\n else ()\n )\n all_generative_model_classes = (MT5ForConditionalGeneration,) if is_torch_available() else ()\n pipeline_model_mapping = (\n {\n \"feature-extraction\": MT5Model,\n \"question-answering\": MT5ForQuestionAnswering,\n \"summarization\": MT5ForConditionalGeneration,\n \"text-classification\": MT5ForSequenceClassification,\n \"text2text-generation\": MT5ForConditionalGeneration,\n \"translation\": MT5ForConditionalGeneration,\n \"zero-shot\": MT5ForSequenceClassification,\n }\n if is_torch_available()\n else {}\n )\n all_parallelizable_model_classes = (MT5Model, MT5ForConditionalGeneration) if is_torch_available() else ()\n fx_compatible = True\n test_pruning = False\n test_resize_embeddings = True\n test_model_parallel = True\n is_encoder_decoder = True\n # The small MT5 model needs higher percentages for CPU\/MP tests\n model_split_percents = [0.5, 0.8, 0.9]\n\n def setUp(self):\n self.model_tester = MT5ModelTester(self)\n self.config_tester = ConfigTester(self, config_class=MT5Config, d_model=37)\n\n # `QAPipelineTests` is not working well with slow tokenizers (for some models) and we don't want to touch the file\n # `src\/transformers\/data\/processors\/squad.py` (where this test fails for this model)\n def is_pipeline_test_to_skip(\n self, pipeline_test_case_name, config_class, model_architecture, tokenizer_name, processor_name\n ):\n if tokenizer_name is None:\n return True\n if pipeline_test_case_name == \"QAPipelineTests\" and not tokenizer_name.endswith(\"Fast\"):\n return True\n\n return False\n\n def _create_and_check_torch_fx_tracing(self, config, inputs_dict, output_loss=False):\n if not is_torch_fx_available() or not self.fx_compatible:\n self.skipTest(reason=\"torch.fx is not available or not compatible with this model\")\n\n configs_no_init = _config_zero_init(config) # To be sure we have no Nan\n configs_no_init.return_dict = False\n\n for model_class in self.all_model_classes:\n if model_class.__name__ == \"MT5ForSequenceClassification\":\n continue\n model = model_class(config=configs_no_init)\n model.to(torch_device)\n model.eval()\n inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=output_loss)\n\n try:\n if model.config.is_encoder_decoder:\n model.config.use_cache = False # FSTM still requires this hack -> FSTM should probably be refactored similar to BART afterward\n labels = inputs.get(\"labels\", None)\n input_names = [\n \"attention_mask\",\n \"decoder_attention_mask\",\n \"decoder_input_ids\",\n \"input_features\",\n \"input_ids\",\n \"input_values\",\n ]\n if labels is not None:\n input_names.append(\"labels\")\n\n filtered_inputs = {k: v for (k, v) in inputs.items() if k in input_names}\n input_names = list(filtered_inputs.keys())\n\n model_output = model(**filtered_inputs)\n\n traced_model = symbolic_trace(model, input_names)\n traced_output = traced_model(**filtered_inputs)\n else:\n input_names = [\n \"attention_mask\",\n \"bbox\",\n \"input_features\",\n \"input_ids\",\n \"input_values\",\n \"pixel_values\",\n \"token_type_ids\",\n \"visual_feats\",\n \"visual_pos\",\n ]\n\n labels = inputs.get(\"labels\", None)\n start_positions = inputs.get(\"start_positions\", None)\n end_positions = inputs.get(\"end_positions\", None)\n if labels is not None:\n input_names.append(\"labels\")\n if start_positions is not None:\n input_names.append(\"start_positions\")\n if end_positions is not None:\n input_names.append(\"end_positions\")\n\n filtered_inputs = {k: v for (k, v) in inputs.items() if k in input_names}\n input_names = list(filtered_inputs.keys())\n\n if model.__class__.__name__ in set(MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES.values()) and (\n not hasattr(model.config, \"problem_type\") or model.config.problem_type is None\n ):\n model.config.problem_type = \"single_label_classification\"\n\n traced_model = symbolic_trace(model, input_names)\n traced_output = traced_model(**filtered_inputs)\n model_output = model(**filtered_inputs)\n\n except Exception as e:\n self.fail(f\"Couldn't trace module: {e}\")\n\n def flatten_output(output):\n flatten = []\n for x in output:\n if isinstance(x, (tuple, list)):\n flatten += flatten_output(x)\n elif not isinstance(x, torch.Tensor):\n continue\n else:\n flatten.append(x)\n return flatten\n\n model_output = flatten_output(model_output)\n traced_output = flatten_output(traced_output)\n num_outputs = len(model_output)\n\n for i in range(num_outputs):\n self.assertTrue(\n torch.allclose(model_output[i], traced_output[i]),\n f\"traced {i}th output doesn't match model {i}th output for {model_class}\",\n )\n\n # Test that the model can be serialized and restored properly\n with tempfile.TemporaryDirectory() as tmp_dir_name:\n pkl_file_name = os.path.join(tmp_dir_name, \"model.pkl\")\n try:\n with open(pkl_file_name, \"wb\") as f:\n pickle.dump(traced_model, f)\n with open(pkl_file_name, \"rb\") as f:\n loaded = pickle.load(f)\n except Exception as e:\n self.fail(f\"Couldn't serialize \/ deserialize the traced model: {e}\")\n\n loaded_output = loaded(**filtered_inputs)\n loaded_output = flatten_output(loaded_output)\n\n for i in range(num_outputs):\n self.assertTrue(\n torch.allclose(model_output[i], loaded_output[i]),\n f\"serialized model {i}th output doesn't match model {i}th output for {model_class}\",\n )\n\n # Avoid memory leak. Without this, each call increase RAM usage by ~20MB.\n # (Even with this call, there are still memory leak by ~0.04MB)\n self.clear_torch_jit_class_registry()\n\n def test_config(self):\n self.config_tester.run_common_tests()\n\n def test_shift_right(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.check_prepare_lm_labels_via_shift_left(*config_and_inputs)\n\n def test_model(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_model(*config_and_inputs)\n\n def test_model_v1_1(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n # check that gated gelu feed forward and different word embeddings work\n config = config_and_inputs[0]\n config.tie_word_embeddings = False\n config.feed_forward_proj = \"gated-gelu\"\n self.model_tester.create_and_check_model(config, *config_and_inputs[1:])\n\n # MT5ForSequenceClassification does not support inputs_embeds\n def test_inputs_embeds(self):\n config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()\n\n for model_class in (MT5Model, MT5ForConditionalGeneration, MT5ForQuestionAnswering):\n model = model_class(config)\n model.to(torch_device)\n model.eval()\n\n inputs = copy.deepcopy(self._prepare_for_class(inputs_dict, model_class))\n\n if not self.is_encoder_decoder:\n input_ids = inputs[\"input_ids\"]\n del inputs[\"input_ids\"]\n else:\n encoder_input_ids = inputs[\"input_ids\"]\n decoder_input_ids = inputs.get(\"decoder_input_ids\", encoder_input_ids)\n del inputs[\"input_ids\"]\n inputs.pop(\"decoder_input_ids\", None)\n\n wte = model.get_input_embeddings()\n if not self.is_encoder_decoder:\n inputs[\"inputs_embeds\"] = wte(input_ids)\n else:\n inputs[\"inputs_embeds\"] = wte(encoder_input_ids)\n inputs[\"decoder_inputs_embeds\"] = wte(decoder_input_ids)\n\n with torch.no_grad():\n model(**inputs)[0]\n\n def test_config_and_model_silu_gated(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n config = config_and_inputs[0]\n config.feed_forward_proj = \"gated-silu\"\n self.model_tester.create_and_check_model(*config_and_inputs)\n\n def test_with_lm_head(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_with_lm_head(*config_and_inputs)\n\n def test_with_sequence_classification_head(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_with_sequence_classification_head(*config_and_inputs)" - }, - { - "id": 25, - "initial_rank": 25, - "content": "# coding=utf-8\n# Copyright 2018 the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n# http:\/\/www.apache.org\/licenses\/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\nimport dataclasses\nimport gc\nimport importlib\nimport json\nimport math\nimport os\nimport random\nimport re\nimport subprocess\nimport sys\nimport tempfile\nimport unittest\nfrom functools import partial\nfrom itertools import product\nfrom pathlib import Path\nfrom typing import Dict, List\nfrom unittest.mock import Mock, patch\n\nimport numpy as np\nfrom huggingface_hub import HfFolder, ModelCard, create_branch, delete_repo, list_repo_commits, list_repo_files\nfrom packaging import version\nfrom parameterized import parameterized\nfrom requests.exceptions import HTTPError\n\nfrom transformers import (\n AutoFeatureExtractor,\n AutoImageProcessor,\n AutoProcessor,\n AutoTokenizer,\n IntervalStrategy,\n PretrainedConfig,\n TrainerCallback,\n TrainingArguments,\n get_polynomial_decay_schedule_with_warmup,\n is_torch_available,\n logging,\n)\nfrom transformers.hyperparameter_search import ALL_HYPERPARAMETER_SEARCH_BACKENDS\nfrom transformers.testing_utils import (\n ENDPOINT_STAGING,\n TOKEN,\n USER,\n CaptureLogger,\n LoggingLevel,\n TestCasePlus,\n backend_device_count,\n execute_subprocess_async,\n get_gpu_count,\n get_tests_dir,\n is_staging_test,\n require_accelerate,\n require_bitsandbytes,\n require_deepspeed,\n require_galore_torch,\n require_grokadamw,\n require_intel_extension_for_pytorch,\n require_liger_kernel,\n require_lomo,\n require_non_xpu,\n require_optuna,\n require_peft,\n require_ray,\n require_safetensors,\n require_schedulefree,\n require_sentencepiece,\n require_sigopt,\n require_tensorboard,\n require_tokenizers,\n require_torch,\n require_torch_accelerator,\n require_torch_bf16,\n require_torch_gpu,\n require_torch_multi_accelerator,\n require_torch_non_multi_accelerator,\n require_torch_non_multi_gpu,\n require_torch_tensorrt_fx,\n require_torch_tf32,\n require_torch_up_to_2_accelerators,\n require_torchdynamo,\n require_wandb,\n slow,\n torch_device,\n)\nfrom transformers.trainer_utils import PREFIX_CHECKPOINT_DIR, HPSearchBackend, check_target_module_exists\nfrom transformers.training_args import OptimizerNames\nfrom transformers.utils import (\n SAFE_WEIGHTS_INDEX_NAME,\n SAFE_WEIGHTS_NAME,\n WEIGHTS_INDEX_NAME,\n WEIGHTS_NAME,\n is_accelerate_available,\n is_apex_available,\n is_bitsandbytes_available,\n is_safetensors_available,\n is_torchao_available,\n is_torchdistx_available,\n)\nfrom transformers.utils.hp_naming import TrialShortNamer\n\n\nif is_torch_available():\n import torch\n from torch import nn\n from torch.utils.data import IterableDataset\n\n import transformers.optimization\n from transformers import (\n AutoModelForCausalLM,\n AutoModelForSequenceClassification,\n EarlyStoppingCallback,\n GlueDataset,\n GlueDataTrainingArguments,\n GPT2Config,\n GPT2LMHeadModel,\n LineByLineTextDataset,\n LlamaConfig,\n LlamaForCausalLM,\n PreTrainedModel,\n Trainer,\n TrainerState,\n )\n from transformers.trainer_pt_utils import AcceleratorConfig\n\n if is_safetensors_available():\n import safetensors.torch\n\n\n# for version specific tests in TrainerIntegrationTest\nrequire_accelerate_version_min_0_28 = partial(require_accelerate, min_version=\"0.28\")\nrequire_accelerate_version_min_0_30 = partial(require_accelerate, min_version=\"0.30\")\nGRAD_ACCUM_KWARGS_VERSION_AVAILABLE = is_accelerate_available(\"0.28\")\nif is_accelerate_available():\n from accelerate import Accelerator\n from accelerate.state import AcceleratorState\n\n\nPATH_SAMPLE_TEXT = f\"{get_tests_dir()}\/fixtures\/sample_text.txt\"\n\n\nclass MockCudaOOMCallback(TrainerCallback):\n \"\"\"\n Simple callback to simulate CUDA OOM error if\n the batch size is >= to `batch_size_limit`.\n \"\"\"\n\n def __init__(self, batch_size_limit=16):\n self.batch_size_limit = batch_size_limit\n\n def on_step_end(self, args, state, control, **kwargs):\n # simulate OOM on the first step\n if state.train_batch_size >= self.batch_size_limit:\n raise RuntimeError(\"CUDA out of memory.\")\n\n\nclass RegressionDataset:\n def __init__(self, a=2, b=3, length=64, seed=42, label_names=None):\n np.random.seed(seed)\n self.label_names = [\"labels\"] if label_names is None else label_names\n self.length = length\n self.x = np.random.normal(size=(length,)).astype(np.float32)\n self.ys = [a * self.x + b + np.random.normal(scale=0.1, size=(length,)) for _ in self.label_names]\n self.ys = [y.astype(np.float32) for y in self.ys]\n\n def __len__(self):\n return self.length\n\n def __getitem__(self, i):\n result = {name: y[i] for name, y in zip(self.label_names, self.ys)}\n result[\"input_x\"] = self.x[i]\n return result\n\n\n# Converting Bytes to Megabytes\ndef bytes2megabytes(x):\n return int(x \/ 2**20)\n\n\n# Copied from acclerate: https:\/\/github.com\/huggingface\/accelerate\/blob\/ee163b66fb7848892519e804688cb4ae981aacbe\/src\/accelerate\/test_utils\/scripts\/external_deps\/test_peak_memory_usage.py#L40C1-L73C68\nclass TorchTracemalloc:\n def __enter__(self):\n gc.collect()\n if torch.cuda.is_available():\n torch.cuda.empty_cache()\n torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero\n self.begin = torch.cuda.memory_allocated()\n return self\n\n def __exit__(self, *exc):\n gc.collect()\n if torch.cuda.is_available():\n torch.cuda.empty_cache()\n self.end = torch.cuda.memory_allocated()\n self.peak = torch.cuda.max_memory_allocated()\n self.used = bytes2megabytes(self.end - self.begin)\n self.peaked = bytes2megabytes(self.peak - self.begin)\n\n\n@dataclasses.dataclass\nclass RegressionTrainingArguments(TrainingArguments):\n a: float = 0.0\n b: float = 0.0\n keep_report_to: bool = False\n\n def __post_init__(self):\n super().__post_init__()\n # save resources not dealing with reporting unless specified (also avoids the warning when it's not set)\n # can be explicitly disabled via `keep_report_to`\n if not self.keep_report_to:\n self.report_to = []\n\n\nclass RepeatDataset:\n def __init__(self, x, length=64):\n self.x = x\n self.length = length\n\n def __len__(self):\n return self.length\n\n def __getitem__(self, i):\n return {\"input_ids\": self.x, \"labels\": self.x}\n\n\nclass DynamicShapesDataset:\n def __init__(self, length=64, seed=42, batch_size=8):\n self.length = length\n np.random.seed(seed)\n sizes = np.random.randint(1, 20, (length \/\/ batch_size,))\n # For easy batching, we make every batch_size consecutive samples the same size.\n self.xs = [np.random.normal(size=(s,)).astype(np.float32) for s in sizes.repeat(batch_size)]\n self.ys = [np.random.normal(size=(s,)).astype(np.float32) for s in sizes.repeat(batch_size)]\n\n def __len__(self):\n return self.length\n\n def __getitem__(self, i):\n return {\"input_x\": self.xs[i], \"labels\": self.ys[i]}\n\n\nclass AlmostAccuracy:\n def __init__(self, thresh=0.25):\n self.thresh = thresh\n\n def __call__(self, eval_pred):\n predictions, labels = eval_pred\n true = np.abs(predictions - labels) <= self.thresh\n return {\"accuracy\": true.astype(np.float32).mean().item()}" - }, - { - "id": 26, - "initial_rank": 26, - "content": "class LocalTensorRTLLM(CustomLLM):\n model_path: Optional[str] = Field(description=\"The path to the trt engine.\")\n temperature: float = Field(description=\"The temperature to use for sampling.\")\n max_new_tokens: int = Field(description=\"The maximum number of tokens to generate.\")\n context_window: int = Field(\n description=\"The maximum number of context tokens for the model.\"\n )\n messages_to_prompt: Callable = Field(\n description=\"The function to convert messages to a prompt.\", exclude=True\n )\n completion_to_prompt: Callable = Field(\n description=\"The function to convert a completion to a prompt.\", exclude=True\n )\n generate_kwargs: Dict[str, Any] = Field(\n default_factory=dict, description=\"Kwargs used for generation.\"\n )\n model_kwargs: Dict[str, Any] = Field(\n default_factory=dict, description=\"Kwargs used for model initialization.\"\n )\n verbose: bool = Field(description=\"Whether to print verbose output.\")\n\n _model: Any = PrivateAttr()\n _model_config: Any = PrivateAttr()\n _tokenizer: Any = PrivateAttr()\n _max_new_tokens = PrivateAttr()\n _sampling_config = PrivateAttr()\n _verbose = PrivateAttr()\n\n def __init__(\n self,\n model_path: Optional[str] = None,\n engine_name: Optional[str] = None,\n tokenizer_dir: Optional[str] = None,\n temperature: float = 0.1,\n max_new_tokens: int = DEFAULT_NUM_OUTPUTS,\n context_window: int = DEFAULT_CONTEXT_WINDOW,\n messages_to_prompt: Optional[Callable] = None,\n completion_to_prompt: Optional[Callable] = None,\n callback_manager: Optional[CallbackManager] = None,\n generate_kwargs: Optional[Dict[str, Any]] = None,\n model_kwargs: Optional[Dict[str, Any]] = None,\n verbose: bool = False,\n ) -> None:\n try:\n import torch\n from transformers import AutoTokenizer\n except ImportError:\n raise ImportError(\n \"nvidia_tensorrt requires `pip install torch` and `pip install transformers`.\"\n )\n\n try:\n import tensorrt_llm\n from tensorrt_llm.runtime import ModelConfig, SamplingConfig\n except ImportError:\n print(\n \"Unable to import `tensorrt_llm` module. Please ensure you have\\\n `tensorrt_llm` installed in your environment. You can run\\\n `pip3 install tensorrt_llm -U --extra-index-url https:\/\/pypi.nvidia.com` to install.\"\n )\n\n model_kwargs = model_kwargs or {}\n model_kwargs.update({\"n_ctx\": context_window, \"verbose\": verbose})\n self._max_new_tokens = max_new_tokens\n self._verbose = verbose\n # check if model is cached\n if model_path is not None:\n if not os.path.exists(model_path):\n raise ValueError(\n \"Provided model path does not exist. \"\n \"Please check the path or provide a model_url to download.\"\n )\n else:\n engine_dir = model_path\n engine_dir_path = Path(engine_dir)\n config_path = engine_dir_path \/ \"config.json\"\n\n # config function\n with open(config_path) as f:\n config = json.load(f)\n use_gpt_attention_plugin = config[\"plugin_config\"][\n \"gpt_attention_plugin\"\n ]\n remove_input_padding = config[\"plugin_config\"][\"remove_input_padding\"]\n tp_size = config[\"builder_config\"][\"tensor_parallel\"]\n pp_size = config[\"builder_config\"][\"pipeline_parallel\"]\n world_size = tp_size * pp_size\n assert (\n world_size == tensorrt_llm.mpi_world_size()\n ), f\"Engine world size ({world_size}) != Runtime world size ({tensorrt_llm.mpi_world_size()})\"\n num_heads = config[\"builder_config\"][\"num_heads\"] \/\/ tp_size\n hidden_size = config[\"builder_config\"][\"hidden_size\"] \/\/ tp_size\n vocab_size = config[\"builder_config\"][\"vocab_size\"]\n num_layers = config[\"builder_config\"][\"num_layers\"]\n num_kv_heads = config[\"builder_config\"].get(\"num_kv_heads\", num_heads)\n paged_kv_cache = config[\"plugin_config\"][\"paged_kv_cache\"]\n if config[\"builder_config\"].get(\"multi_query_mode\", False):\n tensorrt_llm.logger.warning(\n \"`multi_query_mode` config is deprecated. Please rebuild the engine.\"\n )\n num_kv_heads = 1\n num_kv_heads = (num_kv_heads + tp_size - 1) \/\/ tp_size\n\n self._model_config = ModelConfig(\n num_heads=num_heads,\n num_kv_heads=num_kv_heads,\n hidden_size=hidden_size,\n vocab_size=vocab_size,\n num_layers=num_layers,\n gpt_attention_plugin=use_gpt_attention_plugin,\n paged_kv_cache=paged_kv_cache,\n remove_input_padding=remove_input_padding,\n )\n\n assert (\n pp_size == 1\n ), \"Python runtime does not support pipeline parallelism\"\n world_size = tp_size * pp_size\n\n runtime_rank = tensorrt_llm.mpi_rank()\n runtime_mapping = tensorrt_llm.Mapping(\n world_size, runtime_rank, tp_size=tp_size, pp_size=pp_size\n )\n\n # TensorRT-LLM must run on a GPU.\n assert (\n torch.cuda.is_available()\n ), \"LocalTensorRTLLM requires a Nvidia CUDA enabled GPU to operate\"\n torch.cuda.set_device(runtime_rank % runtime_mapping.gpus_per_node)\n self._tokenizer = AutoTokenizer.from_pretrained(\n tokenizer_dir, legacy=False\n )\n self._sampling_config = SamplingConfig(\n end_id=EOS_TOKEN,\n pad_id=PAD_TOKEN,\n num_beams=1,\n temperature=temperature,\n )\n\n serialize_path = engine_dir_path \/ (engine_name if engine_name else \"\")\n with open(serialize_path, \"rb\") as f:\n engine_buffer = f.read()\n decoder = tensorrt_llm.runtime.GenerationSession(\n self._model_config, engine_buffer, runtime_mapping, debug_mode=False\n )\n self._model = decoder\n\n generate_kwargs = generate_kwargs or {}\n generate_kwargs.update(\n {\"temperature\": temperature, \"max_tokens\": max_new_tokens}\n )\n\n super().__init__(\n model_path=model_path,\n temperature=temperature,\n context_window=context_window,\n max_new_tokens=max_new_tokens,\n messages_to_prompt=messages_to_prompt,\n completion_to_prompt=completion_to_prompt,\n callback_manager=callback_manager,\n generate_kwargs=generate_kwargs,\n model_kwargs=model_kwargs,\n verbose=verbose,\n )\n\n @classmethod\n def class_name(cls) -> str:\n \"\"\"Get class name.\"\"\"\n return \"LocalTensorRTLLM\"\n\n @property\n def metadata(self) -> LLMMetadata:\n \"\"\"LLM metadata.\"\"\"\n return LLMMetadata(\n context_window=self.context_window,\n num_output=self.max_new_tokens,\n model_name=self.model_path,\n )\n\n @llm_chat_callback()\n def chat(self, messages: Sequence[ChatMessage], **kwargs: Any) -> ChatResponse:\n prompt = self.messages_to_prompt(messages)\n completion_response = self.complete(prompt, formatted=True, **kwargs)\n return completion_response_to_chat_response(completion_response)\n\n @llm_completion_callback()\n def complete(\n self, prompt: str, formatted: bool = False, **kwargs: Any\n ) -> CompletionResponse:\n try:\n import torch\n except ImportError:\n raise ImportError(\"nvidia_tensorrt requires `pip install torch`.\")\n\n self.generate_kwargs.update({\"stream\": False})\n\n if not formatted:\n prompt = self.completion_to_prompt(prompt)\n\n input_text = prompt\n input_ids, input_lengths = parse_input(\n input_text, self._tokenizer, EOS_TOKEN, self._model_config\n )\n\n max_input_length = torch.max(input_lengths).item()\n self._model.setup(\n input_lengths.size(0), max_input_length, self._max_new_tokens, 1\n ) # beam size is set to 1\n if self._verbose:\n start_time = time.time()\n\n output_ids = self._model.decode(input_ids, input_lengths, self._sampling_config)\n torch.cuda.synchronize()\n\n elapsed_time = -1.0\n if self._verbose:\n end_time = time.time()\n elapsed_time = end_time - start_time\n\n output_txt, output_token_ids = get_output(\n output_ids, input_lengths, self._max_new_tokens, self._tokenizer\n )\n\n if self._verbose:\n print(f\"Input context length : {input_ids.shape[1]}\")\n print(f\"Inference time : {elapsed_time:.2f} seconds\")\n print(f\"Output context length : {len(output_token_ids)} \")\n print(\n f\"Inference token\/sec : {(len(output_token_ids) \/ elapsed_time):2f}\"\n )\n\n # call garbage collected after inference\n torch.cuda.empty_cache()\n gc.collect()\n\n return CompletionResponse(\n text=output_txt,\n raw=generate_completion_dict(output_txt, self._model, self.model_path),\n )\n\n @llm_completion_callback()\n def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:\n raise NotImplementedError(\n \"Nvidia TensorRT-LLM does not currently support streaming completion.\"\n )" - }, - { - "id": 27, - "initial_rank": 27, - "content": "ate with Cache\n\nIn 🤗 Transformers, we support various Cache types to optimize the performance across different models and tasks. By default, all models generate with caching,\nwith the [`~DynamicCache`] class being the default cache for most models. It allows us to dynamically grow cache size, by saving more and more keys and values as we generate. If for some reason you don't want to use caches, you can pass `use_cache=False` into the `generate()` method.\n\nRefer to the table below to see the difference between cache types and choose the one that suits best for your use-case. Models for which initialization is recommended should be initialized before calling the model and passed to model as a kwarg. In all other cases you can simply define desired `cache_implementation` and we take care of the rest for you.\n\n| Cache Type | Memory Efficient | Supports torch.compile() | Initialization Recommended | Latency | Long Context Generation |\n|------------------------|------------------|--------------------------|----------------------------|---------|-------------------------|\n| Dynamic Cache | No | No | No | Mid | No |\n| Static Cache | No | Yes | Yes | High | No |\n| Offloaded Cache | Yes | No | No | Low | Yes |\n| Offloaded Static Cache | No | Yes | Yes | High | Yes |\n| Quantized Cache | Yes | No | No | Low | Yes |\n| Sliding Window Cache | No | Yes | Yes | High | No |\n| Sink Cache | Yes | No | Yes | Mid | Yes |\n\n\nThese cache classes can be set with a `cache_implementation` argument when generating. To learn about the available options for the cache_implementation flag, please refer to the [API Documentation](.\/main_classes\/text_generation#transformers.GenerationConfig). Now, let's explore each cache type in detail and see how to use them. Note that the below examples are for decoder-only Tranformer-based models. We also support [\"Model-Specific Cache\"] classes for models such as Mamba or Jamba, keep reading for more details.\n\n### Quantized Cache\n\nThe key and value cache can occupy a large portion of memory, becoming a [bottleneck for long-context generation](https:\/\/huggingface.co\/blog\/llama31#inference-memory-requirements), especially for Large Language Models.\nQuantizing the cache when using `generate()` can significantly reduce memory requirements at the cost of speed.\n\nKV Cache quantization in `transformers` is largely inspired by the paper [\"KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache\"](https:\/\/arxiv.org\/abs\/2402.02750) and currently supports [`~QuantoQuantizedCache`] and [`~HQQQuantizedCache`] classes. For more information on the inner workings see the paper.\n\nTo enable quantization of the key-value cache, one needs to indicate `cache_implementation=\"quantized\"` in the `generation_config`.\nQuantization related arguments should be passed to the `generation_config` either as a `dict` or an instance of a [`~QuantizedCacheConfig`] class.\nOne has to indicate which quantization backend to use in the [`~QuantizedCacheConfig`], the default is `quanto`.\n\nIt is recommended to set `axis-key\/axis-value` parameters in the cache config to `0` if you're using the `quanto` backend and to `1` if you're using the `HQQ` backend. For other config values, please use the defaults unless you're running out of memory. In that case, you may consider decreasing the residual length.\n\n\n\nCache quantization can be detrimental in terms of latency if the context length is short and there is enough GPU VRAM available to run without cache quantization. It is recommended to seek balance between memory efficiency and latency.\n<\/Tip>\n\n\n```python\n>>> import torch\n>>> from transformers import AutoTokenizer, AutoModelForCausalLM\n\n>>> tokenizer = AutoTokenizer.from_pretrained(\"meta-llama\/Llama-2-7b-chat-hf\")\n>>> model = AutoModelForCausalLM.from_pretrained(\"meta-llama\/Llama-2-7b-chat-hf\", torch_dtype=torch.float16).to(\"cuda:0\")\n>>> inputs = tokenizer(\"I like rock music because\", return_tensors=\"pt\").to(model.device)\n\n>>> out = model.generate(**inputs, do_sample=False, max_new_tokens=20, cache_implementation=\"quantized\", cache_config={\"nbits\": 4, \"backend\": \"quanto\"})\n>>> print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])\nI like rock music because it's loud and energetic. It's a great way to express myself and rel\n\n>>> out = model.generate(**inputs, do_sample=False, max_new_tokens=20)\n>>> print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])\nI like rock music because it's loud and energetic. I like to listen to it when I'm feeling\n```\n\n### Offloaded Cache\n\nSimilarly to KV cache quantization, [`~OffloadedCache`] strategy aims to reduce GPU VRAM usage.\nIt does so by moving the KV cache for most layers to the CPU.\nAs the model's `forward()` method iterates over the layers, this strategy maintains the current layer cache on the GPU.\nAt the same time it asynchronously prefetches the next layer cache as well as sending the previous layer cache back to the CPU.\nUnlike KV cache quantization, this strategy always produces the same result as the default KV cache implementation.\nThus, it can serve as a drop-in replacement or a fallback for it.\n\nDepending on your model and the characteristics of your generation task (size of context, number of generated tokens, number of beams, etc.)\nyou may notice a small degradation in generation throughput compared to the default KV cache implementation.\n\nTo enable KV cache offloading, pass `cache_implementation=\"offloaded\"` in the `generation_config` or directly to the `generate()` call.\nUse `cache_implementation=\"offloaded_static\"` for an offloaded static cache (see also [Offloaded Static Cache](#offloaded-static-cache) below).\n\n```python\n>>> import torch\n>>> from transformers import AutoTokenizer, AutoModelForCausalLM\n>>> ckpt = \"microsoft\/Phi-3-mini-4k-instruct\"\n\n>>> tokenizer = AutoTokenizer.from_pretrained(ckpt)\n>>> model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).to(\"cuda:0\")\n>>> inputs = tokenizer(\"Fun fact: The shortest\", return_tensors=\"pt\").to(model.device)\n\n>>> out = model.generate(**inputs, do_sample=False, max_new_tokens=23, cache_implementation=\"offloaded\")\n>>> print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])\nFun fact: The shortest war in history was between Britain and Zanzibar on August 27, 1896.\n\n>>> out = model.generate(**inputs, do_sample=False, max_new_tokens=23)\n>>> print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])\nFun fact: The shortest war in history was between Britain and Zanzibar on August 27, 1896.\n```\n\n\n\nCache offloading requires a GPU and can be slower than dynamic KV cache. Use it if you are getting CUDA out of memory errors.\n\n<\/Tip>\n\nThe example below shows how KV cache offloading can be used as a fallback strategy.\n```python\n>>> import torch\n>>> from transformers import AutoTokenizer, AutoModelForCausalLM\n>>> def resilient_generate(model, *args, **kwargs):\n... oom = False\n... try:\n... return model.generate(*args, **kwargs)\n... except torch.cuda.OutOfMemoryError as e:\n... print(e)\n... print(\"retrying with cache_implementation='offloaded'\")\n... oom = True\n... if oom:\n... torch.cuda.empty_cache()\n... kwargs[\"cache_implementation\"] = \"offloaded\"\n... return model.generate(*args, **kwargs)\n...\n...\n>>> ckpt = \"microsoft\/Phi-3-mini-4k-instruct\"\n>>> tokenizer = AutoTokenizer.from_pretrained(ckpt)\n>>> model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).to(\"cuda:0\")\n>>> prompt = [\"okay \"*1000 + \"Fun fact: The most\"]\n>>> inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n>>> beams = { \"num_beams\": 40, \"num_beam_groups\": 40, \"num_return_sequences\": 40, \"diversity_penalty\": 1.0, \"max_new_tokens\": 23, \"early_stopping\": True, }\n>>> out = resilient_generate(model, **inputs, **beams)\n>>> responses = tokenizer.batch_decode(out[:,-28:], skip_special_tokens=True)\n```\n\nOn a GPU with 50 GB of RAM, running this code will print\n```\nCUDA out of memory. Tried to allocate 4.83 GiB. GPU\nretrying with cache_implementation='offloaded'\n```\nbefore successfully generating 40 beams.\n\n\n### Static " - }, - { - "id": 28, - "initial_rank": 28, - "content": "\n\n# bitsandbytes\n\n[bitsandbytes](https:\/\/github.com\/TimDettmers\/bitsandbytes) is the easiest option for quantizing a model to 8 and 4-bit. 8-bit quantization multiplies outliers in fp16 with non-outliers in int8, converts the non-outlier values back to fp16, and then adds them together to return the weights in fp16. This reduces the degradative effect outlier values have on a model's performance. 4-bit quantization compresses a model even further, and it is commonly used with [QLoRA](https:\/\/hf.co\/papers\/2305.14314) to finetune quantized LLMs.\n\nTo use bitsandbytes, make sure you have the following libraries installed:\n\n\n\n\n```bash\npip install transformers accelerate bitsandbytes>0.37.0\n```\n\n<\/hfoption>\n\n\n```bash\npip install bitsandbytes>=0.39.0\npip install --upgrade accelerate transformers\n```\n\n<\/hfoption>\n<\/hfoptions>\n\n\n\nbitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4\/Q1. For installation instructions and the latest backend updates, visit [this link](https:\/\/huggingface.co\/docs\/bitsandbytes\/main\/en\/installation#multi-backend).\n\nWe value your feedback to help identify bugs before the full release! Check out [these docs](https:\/\/huggingface.co\/docs\/bitsandbytes\/main\/en\/non_cuda_backends) for more details and feedback links.\n\n<\/Tip>\n\nNow you can quantize a model by passing a `BitsAndBytesConfig` to [`~PreTrainedModel.from_pretrained`] method. This works for any model in any modality, as long as it supports loading with Accelerate and contains `torch.nn.Linear` layers.\n\n\n\n\nQuantizing a model in 8-bit halves the memory-usage, and for large models, set `device_map=\"auto\"` to efficiently use the GPUs available:\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True)\n\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n \"bigscience\/bloom-1b7\", \n quantization_config=quantization_config\n)\n```\n\nBy default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter if you want:\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True)\n\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n \"facebook\/opt-350m\", \n quantization_config=quantization_config, \n torch_dtype=torch.float32\n)\nmodel_8bit.model.decoder.layers[-1].final_layer_norm.weight.dtype\n```\n\nOnce a model is quantized to 8-bit, you can't push the quantized weights to the Hub unless you're using the latest version of Transformers and bitsandbytes. If you have the latest versions, then you can push the 8-bit model to the Hub with the [`~PreTrainedModel.push_to_hub`] method. The quantization config.json file is pushed first, followed by the quantized model weights.\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_8bit=True)\n\nmodel = AutoModelForCausalLM.from_pretrained(\n \"bigscience\/bloom-560m\", \n quantization_config=quantization_config\n)\ntokenizer = AutoTokenizer.from_pretrained(\"bigscience\/bloom-560m\")\n\nmodel.push_to_hub(\"bloom-560m-8bit\")\n```\n\n<\/hfoption>\n\n\nQuantizing a model in 4-bit reduces your memory-usage by 4x, and for large models, set `device_map=\"auto\"` to efficiently use the GPUs available:\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_4bit=True)\n\nmodel_4bit = AutoModelForCausalLM.from_pretrained(\n \"bigscience\/bloom-1b7\",\n quantization_config=quantization_config\n)\n```\n\nBy default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter if you want:\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_4bit=True)\n\nmodel_4bit = AutoModelForCausalLM.from_pretrained(\n \"facebook\/opt-350m\",\n quantization_config=quantization_config, \n torch_dtype=torch.float32\n)\nmodel_4bit.model.decoder.layers[-1].final_layer_norm.weight.dtype\n```\n\nIf you have `bitsandbytes>=0.41.3`, you can serialize 4-bit models and push them on Hugging Face Hub. Simply call `model.push_to_hub()` after loading it in 4-bit precision. You can also save the serialized 4-bit models locally with `model.save_pretrained()` command. \n\n<\/hfoption>\n<\/hfoptions>\n\n\n\nTraining with 8-bit and 4-bit weights are only supported for training *extra* parameters.\n\n<\/Tip>\n\nYou can check your memory footprint with the `get_memory_footprint` method:\n\n```py\nprint(model.get_memory_footprint())\n```\n\nQuantized models can be loaded from the [`~PreTrainedModel.from_pretrained`] method without needing to specify the `load_in_8bit` or `load_in_4bit` parameters:\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/bloom-560m-8bit\", device_map=\"auto\")\n```\n\n## " - }, - { - "id": 29, - "initial_rank": 29, - "content": ":\/\/arxiv.org\/abs\/2208.07339) 論文、または [ブログ投稿](https:\/\/huggingface.co\/blog\/hf-bitsandbytes-) をご覧ください。統合)コラボレーションについて。\n\n`0.39.0`リリース以降、FP4 データ型を活用し、4 ビット量子化を使用して`device_map`をサポートする任意のモデルをロードできます。\n\n独自の pytorch モデルを量子化したい場合は、🤗 Accelerate ライブラリの [ドキュメント](https:\/\/huggingface.co\/docs\/accelerate\/main\/en\/usage_guides\/quantization) をチェックしてください。\n\n`bitsandbytes`統合を使用してできることは次のとおりです\n\n### General usage\n\nモデルが 🤗 Accelerate による読み込みをサポートし、`torch.nn.Linear` レイヤーが含まれている限り、 [`~PreTrainedModel.from_pretrained`] メソッドを呼び出すときに `load_in_8bit` または `load_in_4bit` 引数を使用してモデルを量子化できます。これはどのようなモダリティでも同様に機能するはずです。\n\n```python\nfrom transformers import AutoModelForCausalLM\n\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\"facebook\/opt-350m\", load_in_8bit=True)\nmodel_4bit = AutoModelForCausalLM.from_pretrained(\"facebook\/opt-350m\", load_in_4bit=True)\n```\n\nデフォルトでは、他のすべてのモジュール (例: `torch.nn.LayerNorm`) は `torch.float16` に変換されますが、その `dtype` を変更したい場合は、`torch_dtype` 引数を上書きできます。\n\n```python\n>>> import torch\n>>> from transformers import AutoModelForCausalLM\n\n>>> model_8bit = AutoModelForCausalLM.from_pretrained(\"facebook\/opt-350m\", load_in_8bit=True, torch_dtype=torch.float32)\n>>> model_8bit.model.decoder.layers[-1].final_layer_norm.weight.dtype\ntorch.float32\n```\n\n### FP4 quantization \n\n#### Requirements\n\n以下のコード スニペットを実行する前に、以下の要件がインストールされていることを確認してください。\n\n- 最新の`bitsandbytes`ライブラリ\n`pip install bitsandbytes>=0.39.0`\n\n- 最新の`accelerate`をインストールする\n`pip install --upgrade accelerate`\n\n- 最新の `transformers` をインストールする\n`pip install --upgrade transformers`\n\n#### Tips and best practices\n\n- **高度な使用法:** 可能なすべてのオプションを使用した 4 ビット量子化の高度な使用法については、[この Google Colab ノートブック](https:\/\/colab.research.google.com\/drive\/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf) を参照してください。\n\n- **`batch_size=1` による高速推論 :** bitsandbytes の `0.40.0` リリース以降、`batch_size=1` では高速推論の恩恵を受けることができます。 [これらのリリース ノート](https:\/\/github.com\/TimDettmers\/bitsandbytes\/releases\/tag\/0.40.0) を確認し、この機能を活用するには`0.40.0`以降のバージョンを使用していることを確認してください。箱の。\n\n- **トレーニング:** [QLoRA 論文](https:\/\/arxiv.org\/abs\/2305.14314) によると、4 ビット基本モデルをトレーニングする場合 (例: LoRA アダプターを使用)、`bnb_4bit_quant_type='nf4'` を使用する必要があります。 。\n\n- **推論:** 推論の場合、`bnb_4bit_quant_type` はパフォーマンスに大きな影響を与えません。ただし、モデルの重みとの一貫性を保つために、必ず同じ `bnb_4bit_compute_dtype` および `torch_dtype` 引数を使用してください。\n\n\n#### Load a large model in 4bit\n\n`.from_pretrained` メソッドを呼び出すときに `load_in_4bit=True` を使用すると、メモリ使用量を (おおよそ) 4 で割ることができます。\n\n```python\n# pip install transformers accelerate bitsandbytes\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_id = \"bigscience\/bloom-1b7\"\n\ntokenizer = AutoTokenizer.from_pretrained(model_id)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", load_in_4bit=True)\n```\n\n\n\nモデルが 4 ビットでロードされると、現時点では量子化された重みをハブにプッシュすることはできないことに注意してください。 4 ビットの重みはまだサポートされていないため、トレーニングできないことにも注意してください。ただし、4 ビット モデルを使用して追加のパラメーターをトレーニングすることもできます。これについては次のセクションで説明します。\n\n<\/Tip>\n\n### Load a large model in 8bit\n\n`.from_pretrained` メソッドを呼び出すときに `load_in_8bit=True` 引数を使用すると、メモリ要件をおよそ半分にしてモデルをロードできます。\n\n```python\n# pip install transformers accelerate bitsandbytes\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\nmodel_id = \"bigscience\/bloom-1b7\"\n\ntokenizer = AutoTokenizer.from_pretrained(model_id)\nmodel = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True))\n```\n\n次に、通常 [`PreTrainedModel`] を使用するのと同じようにモデルを使用します。\n\n`get_memory_footprint` メソッドを使用して、モデルのメモリ フットプリントを確認できます。\n\n```python\nprint(model.get_memory_foot" - }, - { - "id": 30, - "initial_rank": 30, - "content": "```python\nfrom transformers import AutoModelForCausalLM\nmodel = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=gptq_config)\n\n```\n\n请注意,您需要一个GPU来量化模型。我们将模型放在cpu中,并将模块来回移动到gpu中,以便对其进行量化。\n\n如果您想在使用 CPU 卸载的同时最大化 GPU 使用率,您可以设置 `device_map = \"auto\"`。\n\n\n```python\nfrom transformers import AutoModelForCausalLM\nmodel = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", quantization_config=gptq_config)\n```\n\n请注意,不支持磁盘卸载。此外,如果由于数据集而内存不足,您可能需要在`from_pretrained`中设置`max_memory`。查看这个[指南](https:\/\/huggingface.co\/docs\/accelerate\/usage_guides\/big_modeling#designing-a-device-map)以了解有关`device_map`和`max_memory`的更多信息。\n\n\n\n目前,GPTQ量化仅适用于文本模型。此外,量化过程可能会花费很多时间,具体取决于硬件性能(175B模型在NVIDIA A100上需要4小时)。请在Hub上检查是否有模型的GPTQ量化版本。如果没有,您可以在GitHub上提交需求。 \n<\/Tip>\n\n### 推送量化模型到 🤗 Hub\n\n您可以使用`push_to_hub`将量化模型像任何模型一样推送到Hub。量化配置将与模型一起保存和推送。\n\n```python\nquantized_model.push_to_hub(\"opt-125m-gptq\")\ntokenizer.push_to_hub(\"opt-125m-gptq\")\n```\n\n如果您想在本地计算机上保存量化模型,您也可以使用`save_pretrained`来完成:\n\n```python\nquantized_model.save_pretrained(\"opt-125m-gptq\")\ntokenizer.save_pretrained(\"opt-125m-gptq\")\n```\n\n请注意,如果您量化模型时想使用`device_map`,请确保在保存之前将整个模型移动到您的GPU或CPU之一。\n\n```python\nquantized_model.to(\"cpu\")\nquantized_model.save_pretrained(\"opt-125m-gptq\")\n```\n\n### 从 🤗 Hub 加载一个量化模型\n\n您可以使用`from_pretrained`从Hub加载量化模型。\n请确保推送权重是量化的,检查模型配置对象中是否存在`quantization_config`属性。\n\n\n```python\nfrom transformers import AutoModelForCausalLM\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\")\n```\n\n如果您想更快地加载模型,并且不需要分配比实际需要内存更多的内存,量化模型也使用`device_map`参数。确保您已安装`accelerate`库。\n\n```python\nfrom transformers import AutoModelForCausalLM\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\")\n```\n\n### Exllama内核加快推理速度\n\n保留格式:对于 4 位模型,您可以使用 exllama 内核来提高推理速度。默认情况下,它处于启用状态。您可以通过在 [`GPTQConfig`] 中传递 `use_exllama` 来更改此配置。这将覆盖存储在配置中的量化配置。请注意,您只能覆盖与内核相关的属性。此外,如果您想使用 exllama 内核,整个模型需要全部部署在 gpus 上。此外,您可以使用 版本 > 0.4.2 的 Auto-GPTQ 并传递 `device_map` = \"cpu\" 来执行 CPU 推理。对于 CPU 推理,您必须在 `GPTQConfig` 中传递 `use_exllama = False`。\n\n```py\nimport torch\ngptq_config = GPTQConfig(bits=4)\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\", quantization_config=gptq_config)\n```\n\n随着 exllamav2 内核的发布,与 exllama 内核相比,您可以获得更快的推理速度。您只需在 [`GPTQConfig`] 中传递 `exllama_config={\"version\": 2}`:\n\n```py\nimport torch\ngptq_config = GPTQConfig(bits=4, exllama_config={\"version\":2})\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\", quantization_config = gptq_config)\n```\n\n请注意,目前仅支持 4 位模型。此外,如果您正在使用 peft 对量化模型进行微调,建议禁用 exllama 内核。 \n\n您可以在此找到这些内核的基准测试 [这里](https:\/\/github.com\/huggingface\/optimum\/tree\/main\/tests\/benchmark#gptq-benchmark)\n\n\n#### 微调一个量化模型\n\n在Hugging Face生态系统的官方支持下,您可以使用GPTQ进行量化后的模型进行微调。 \n请查看`peft`库了解更多详情。\n\n### 示例演示\n\n请查看 Google Colab [notebook](https:\/\/colab.research.google.com\/drive\/1_TIrmuKOFhuRRiTWN94ilkUFu6ZX4ceb?usp=sharing),了解如何使用GPTQ量化您的模型以及如何使用peft微调量化模型。\n\n### GPTQConfig\n\n[[autodoc]] GPTQConfig\n\n\n## `bitsandbytes` 集成\n\n🤗 Transformers 与 `bitsandbytes` 上最常用的模块紧密集成。您可以使用几行代码以 8 位精度加载您的模型。\n自bitsandbytes的0.37.0版本发布以来,大多数GPU硬件都支持这一点。\n\n在[LLM.int8()](https:\/\/arxiv.org\/abs\/2208.07339)论文中了解更多关于量化方法的信息,或者在[博客文章](https:\/\/huggingface.co\/blog\/hf-bitsandbytes-integration)中了解关于合作的更多信息。\n\n自其“0.39.0”版本发布以来,您可以使用FP4数据类型,通过4位量化加载任何支持“device_map”的模型。\n\n如果您想量化自己的 pytorch 模型,请查看 🤗 Accelerate 的[文档](https:\/\/huggingface.co\/docs\/accelerate\/main\/en\/usage_guides\/quantization)。\n\n以下是您可以使用“bitsandbytes”集成完成的事情\n\n### 通用用法\n\n只要您的模型支持使用 🤗 Accelerate 进行加载并包含 `torch.nn.Linear` 层,您可以在调用 [`~PreTrainedModel.from_pretrained`] 方法时使用 `load_in_8bit` 或 `load_in_4bit` 参数来量化模型。这也应该适用于任何模态。\n\n```python\nfrom transformers import AutoModelForCausalLM\n\nmod" - }, - { - "id": 31, - "initial_rank": 31, - "content": "absl-py==1.0.0\naiohttp==3.10.2\naiosignal==1.2.0\nalembic==1.7.7\nappdirs==1.4.4\nAPScheduler==3.9.1\narrow==1.2.2\nasttokens==2.0.5\nastunparse==1.6.3\nasync-timeout==4.0.2\nattrs==21.4.0\naudioread==2.1.9\nautopage==0.5.0\nbackcall==0.2.0\nbackoff==1.11.1\nbackports.zoneinfo==0.2.1\nbinaryornot==0.4.4\nblack==24.3.0\nboto3==1.16.34\nbotocore==1.19.63\nBrotli==1.0.9\ncachetools==5.0.0\ncertifi==2024.7.4\ncffi==1.15.0\nchardet==4.0.0\ncharset-normalizer==2.0.12\nchex==0.1.1\nclick==8.0.4\ncliff==3.10.1\nclldutils==3.11.1\ncloudpickle==2.0.0\ncmaes==0.8.2\ncmd2==2.4.0\ncodecarbon==1.2.0\ncolorlog==6.6.0\ncookiecutter==2.1.1\ncryptography==43.0.1\ncsvw==2.0.0\ncycler==0.11.0\nCython==0.29.28\ndash==2.15.0\ndash-bootstrap-components==1.0.3\ndash-core-components==2.0.0\ndash-html-components==2.0.0\ndash-table==5.0.0\ndatasets==2.0.0\ndecorator==5.1.1\nDeprecated==1.2.13\ndill==0.3.4\ndlinfo==1.2.1\ndm-tree==0.1.6\ndocker==4.4.4\nexecnet==1.9.0\nexecuting==0.8.3\nfaiss-cpu==1.7.2\nfasteners==0.17.3\nfilelock==3.6.0\nfire==0.4.0\nflake8==4.0.1\nFlask==2.3.2\nFlask-Compress==1.11\nflatbuffers==2.0\nflax==0.4.0\nfonttools==4.43.0\nfrozenlist==1.3.0\nfsspec==2022.2.0\nfugashi==1.1.2\ngast==0.5.3\ngitdb==4.0.9\nGitPython==3.1.41\nglfw==2.5.1\ngoogle-auth==2.6.2\ngoogle-auth-oauthlib==0.4.6\ngoogle-pasta==0.2.0\ngreenlet==1.1.2\ngrpcio==1.53.2\ngym==0.23.1\ngym-notices==0.0.6\nh5py==3.6.0\nhuggingface-hub==0.4.0\nhypothesis==6.39.4\nidna==3.7\nimageio==2.16.1\nimportlib-metadata==4.11.3\nimportlib-resources==5.4.0\niniconfig==1.1.1\nipadic==1.0.0\nipython==8.10.0\nisodate==0.6.1\nisort==5.10.1\nitsdangerous==2.1.1\njax==0.3.4\njaxlib==0.3.2\njedi==0.18.1\nJinja2==3.1.4\njinja2-time==0.2.0\njmespath==0.10.0\njoblib==1.2.0\njsonschema==4.4.0\nkeras==2.13.1\nKeras-Preprocessing==1.1.2\nkiwisolver==1.4.0\nkubernetes==12.0.1\nlibclang==13.0.0\nlibrosa==0.9.1\nllvmlite==0.38.0\nMako==1.2.2\nMarkdown==3.3.6\nMarkupSafe==1.1.1\nmatplotlib==3.5.1\nmatplotlib-inline==0.1.3\nmccabe==0.6.1\nmsgpack==1.0.3\nmujoco-py==2.1.2.14\nmultidict==6.0.2\nmultiprocess==0.70.12.2\nmypy-extensions==0.4.3\nnltk==3.9\nnumba==0.55.1\nnumpy==1.22.3\noauthlib==3.2.2\nonnx>=1.15.0\nonnxconverter-common==1.9.0\nopt-einsum==3.3.0\noptax==0.1.1\noptuna==2.10.0\npackaging==21.3\npandas==1.4.1\nparameterized==0.8.1\nparso==0.8.3\npathspec==0.9.0\npbr==5.8.1\npexpect==4.8.0\nphonemizer==3.0.1\npickleshare==0.7.5\nPillow==10.3.0\nPint==0.16.1\nplac==1.3.4\nplatformdirs==2.5.1\nplotly==5.6.0\npluggy==1.0.0\npooch==1.6.0\nportalocker==2.0.0\npoyo==0.5.0\nprettytable==3.2.0\nprompt-toolkit==3.0.28\nprotobuf==3.19.5\npsutil==5.9.0\nptyprocess==0.7.0\npure-eval==0.2.2\npy==1.11.0\npy-cpuinfo==8.0.0\npyarrow==15.0.0\npyasn1==0.4.8\npyasn1-modules==0.2.8\npycodestyle==2.8.0\npycparser==2.21\npyctcdecode==0.3.0\npyflakes==2.4.0\nPygments==2.15.0\npygtrie==2.4.2\npynvml==11.4.1\npyOpenSSL==22.0.0\npyparsing==3.0.7\npyperclip==1.8.2\npypng==0.0.21\npyrsistent==0.18.1\npytest==7.1.1\npytest-forked==1.4.0\npytest-timeout==2.1.0\npytest-xdist==2.5.0\npython-dateutil==2.8.2\npython-slugify==6.1.1\npytz==2022.1\npytz-deprecation-shim==0.1.0.post0\nPyYAML==6.0\nray>2.6.3\nredis==4.5.4\nregex==2022.3.15\nrequests==2.32.0\nrequests-oauthlib==1.3.1\nresampy==0.2.2\nresponses==0.18.0\nrfc3986==1.5.0\nrouge-score==0.0.4\nrsa==4.8\ns3transfer==0.3.7\nsacrebleu==1.5.1\nsacremoses==0.0.49\nscikit-learn==1.5.0\nscipy==1.8.0\nsegments==2.2.0\nsentencepiece==0.1.96\nsigopt==8.2.0\nsix==1.16.0\nsmmap==5.0.0\nsortedcontainers==2.4.0\nSoundFile==0.10.3.post1\nSQLAlchemy==1.4.32\nstack-data==0.2.0\nstevedore==3.5.0\ntabulate==0.8.9\ntenacity==8.0.1\ntensorboard==2.8.0\ntensorboard-data-server==0.6.1\ntensorboard-plugin-wit==1.8.1\ntensorboardX==2.5\ntensorflow==2.12.1\ntensorflow-io-gcs-filesystem==0.24.0\ntermcolor==1.1.0\ntext-unidecode==1.3\ntf-estimator-nightly==2.8.0.dev2021122109\ntf2onnx==1.9.3" - }, - { - "id": 32, - "initial_rank": 32, - "content": "ive Model Parallelism to Pipeline Parallelism\n\nTo explain Pipeline parallelism, we'll first look into Naive Model Parallelism (MP), also known as Vertical MP. This approach\ninvolves distributing groups of model layers across multiple GPUs by assigning specific layers to specific GPUs with `.to()`. \nAs data flows through these layers, it is moved to the same GPU as the layer, while the other layers remain untouched.\n\nWe refer to this Model parallelism as \"Vertical\" because of how models are typically visualized. For example, the \nfollowing diagram shows an 8-layer model split vertically into two slices, placing layers 0-3 onto \nGPU0 and 4-7 to GPU1:\n\n```\n================\n| Layer | |\n| 0 | |\n| 1 | GPU0 |\n| 2 | |\n| 3 | |\n================\n| Layer | |\n| 4 | |\n| 5 | GPU1 |\n| 6 | |\n| 7 | |\n================\n```\n\nIn this example, when data moves from layer 0 to 3, it's no different from regular forward pass. However, passing data \nfrom layer 3 to 4 requires moving it from GPU0 to GPU1, introducing a communication overhead. If the participating \nGPUs are on the same compute node (e.g. same physical machine) this copying is fast, but if the GPUs are distributed \nacross different compute nodes (e.g. multiple machines), the communication overhead could be substantially greater.\n\nFollowing that, layers 4 to 7 work as they would in the original model. Upon completion of the 7th layer, there is often \na need to send the data back to layer 0 where the labels are (or alternatively send the labels to the last layer). Now the loss can be \ncomputed and the optimizer can do its work.\n\nNaive Model Parallelism comes several shortcomings:\n- **All but one GPU are idle at any given moment**: if 4 GPUs are used, it's nearly identical to quadrupling the amount of memory of a single GPU, and ignoring the rest of the hardware. \n- **Overhead in data transfer between devices**: E.g. 4x 6GB cards will be able to accommodate the same size as 1x 24GB card using naive MP, but a single 24GB card will complete the training faster, because it doesn't have the data copying overhead. But, say, if you have 40GB cards and need to fit a 45GB model you can with 4x 40GB cards (but barely because of the gradient and optimizer states)\n- **Copying shared embeddings**: Shared embeddings may need to get copied back and forth between GPUs.\n\nNow that you are familiar with how the naive approach to model parallelism works and its shortcomings, let's look at Pipeline Parallelism (PP).\nPP is almost identical to a naive MP, but it solves the GPU idling problem by chunking the incoming batch into micro-batches \nand artificially creating a pipeline, which allows different GPUs to concurrently participate in the computation process.\n\nThe following illustration from the [GPipe paper](https:\/\/ai.googleblog.com\/2019\/03\/introducing-gpipe-open-source-library.html) \nshows the naive MP on the top, and PP on the bottom:\n\n
\n \"MP\n<\/div>\n\nAt the bottom of the diagram, you can observe that the Pipeline Parallelism (PP) approach minimizes the number of idle \nGPU zones, referred to as 'bubbles'. Both parts of the diagram show a parallelism level of degree 4, meaning that 4 GPUs \nare involved in the pipeline. You can see that there's a forward path of 4 pipe stages (F0, F1, F2 and F3) followed by \na backward path in reverse order (B3, B2, B1, and B0).\n\nPP introduces a new hyperparameter to tune - `chunks`, which determines how many data chunks are sent in a sequence \nthrough the same pipe stage. For example, in the bottom diagram you can see `chunks=4`. GPU0 performs the same \nforward path on chunk 0, 1, 2 and 3 (F0,0, F0,1, F0,2, F0,3) and then it waits for other GPUs to do complete their work. \nOnly when the other GPUs begin to complete their work, GPU0 starts to work again doing the backward path for chunks \n3, 2, 1 and 0 (B0,3, B0,2, B0,1, B0,0).\n\nNote that this is the same concept as gradient accumulation steps. PyTorch uses `chunks`, while DeepSpeed refers \nto the same hyperparameter as gradient accumulation steps.\n\nBecause of the chunks, PP introduces the notion of micro-batches (MBS). DP splits the global data batch size into \nmini-batches, so if you have a DP degree of 4, a global batch size of 1024 gets split up into 4 mini-batches of \n256 each (1024\/4). And if the number of `chunks` (or GAS) is 32 we end up with a micro-batch size of 8 (256\/32). Each \nPipeline stage works with a single micro-batch at a time. To calculate the global batch size of the DP + PP setup, \nuse the formula: `mbs * chunks * dp_degree` (`8 * 32 * 4 = 1024`).\nWith `chunks=1` you end up with the naive MP, which is inefficient. With a large `chunks` value you end up with \ntiny micro-batch sizes which is also inefficient. For this reason, we encourage to experiment with the `chunks` value to \nfind the one that leads to the most efficient GPUs utilization.\n\nYou may notice a bubble of \"dead\" time on the diagram that can't be parallelized because the last `forward` stage \nhas to wait for `backward` to complete the pipeline. The purpose of finding the best value for `chunks` is to enable a high \nconcurrent GPU utilization across all participating GPUs which translates to minimizing the size of the bubble.\n\nPipeline API solutions have been implemented in:\n- PyTorch\n- DeepSpeed\n- Megatron-LM\n\nThese come with some shortcomings:\n- They have to modify the model quite heavily, because Pipeline requires one to rewrite the normal flow of modules into a `nn.Sequential` sequence of the same, which may require changes to the design of the model.\n- Currently the Pipeline API is very restricted. If you had a bunch of Python variables being passed in the very first stage of the Pipeline, you will have to find a way around it. Currently, the pipeline interface requires either a single Tensor or a tuple of Tensors as the only input and output. These tensors must have a batch size as the very first dimension, since pipeline is going to chunk the mini batch into micro-batches. Possible improvements are being discussed here https:\/\/github.com\/pytorch\/pytorch\/pull\/50693\n- Conditional control flow at the level of pipe stages is not possible - e.g., Encoder-Decoder models like T5 require special workarounds to handle a conditional encoder stage.\n- They have to arrange each layer so that the output of one layer becomes an input to the other layer.\n\nMore recent solutions include:\n- Varuna\n- Sagemaker\n\nWe have not experimented with Varuna and SageMaker but their papers report that they have overcome the list of problems \nmentioned above and that they require smaller changes to the user's model.\n\nImplementations:\n- [PyTorch](https:\/\/pytorch.org\/docs\/stable\/pipeline.html) (initial support in pytorch-1.8, and progressively getting improved in 1.9 and more so in 1.10). Some [examples](https:\/\/github.com\/pytorch\/pytorch\/blob\/master\/benchmarks\/distributed\/pipeline\/pipe.py)\n- [DeepSpeed](https:\/\/www.deepspeed.ai\/tutorials\/pipeline\/)\n- [Megatron-LM](https:\/\/github.com\/NVIDIA\/Megatron-LM) has an internal implementation - no API.\n- [Varuna](https:\/\/github.com\/microsoft\/varuna)\n- [SageMaker](https:\/\/arxiv.org\/abs\/2111.05972) - this is a proprietary solution that can only be used on AWS.\n- [OSLO](https:\/\/github.com\/tunib-ai\/oslo) - this is implemented based on the Hugging Face Transformers.\n\n🤗 Transformers status: as of this writing none of the models supports full-PP. GPT2 and T5 models have naive MP support. \nThe main obstacle is being unable to convert the models to `nn.Sequential` and have all the inputs to be Tensors. This \nis because currently the models include many features that make the conversion very complicated, and will need to be removed to accomplish that.\n\nDeepSpeed and Megatron-LM integrations are available in [🤗 Accelerate](https:\/\/huggingface.co\/docs\/accelerate\/main\/en\/usage_guides\/deepspeed)\n\nOther approache" - }, - { - "id": 33, - "initial_rank": 33, - "content": "aiohttp==3.8.4\naiosignal==1.3.1\nasttokens==2.2.1\nasync-timeout==4.0.2\nattrs==22.2.0\nbackcall==0.2.0\nblack==23.1.0\ncertifi==2022.12.7\ncharset-normalizer==3.1.0\nclick==8.1.3\ncolorama==0.4.6\ncontourpy==1.0.7\ncycler==0.11.0\ndecorator==5.1.1\nexecuting==1.2.0\nfonttools==4.39.0\nfrozenlist==1.3.3\nidna==3.4\nipython==8.11.0\njedi==0.18.2\njoblib==1.2.0\nkiwisolver==1.4.4\nmatplotlib==3.7.1\nmatplotlib-inline==0.1.6\nmultidict==6.0.4\nmypy-extensions==1.0.0\nnumpy==1.24.2\nopenai==0.27.2\npackaging==23.0\npandas==1.5.3\nparso==0.8.3\npathspec==0.11.0\npickleshare==0.7.5\nPillow==9.4.0\nplatformdirs==3.1.0\nprompt-toolkit==3.0.38\npure-eval==0.2.2\nPygments==2.14.0\npyparsing==3.0.9\npython-dateutil==2.8.2\npytz==2022.7.1\nrequests==2.28.2\nscikit-learn==1.2.2\nscipy==1.10.1\nseaborn==0.12.2\nsix==1.16.0\nstack-data==0.6.2\nthreadpoolctl==3.1.0\ntokenize-rt==5.0.0\ntqdm==4.65.0\ntraitlets==5.9.0\nurllib3==1.26.15\nwcwidth==0.2.6\nyarl==1.8.2\n" - }, - { - "id": 34, - "initial_rank": 34, - "content": "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 225.1\/225.1 kB 11.7 MB\/s eta 0:00:00\n[19:24:01+0000] Downloading PyJWT-2.8.0-py3-none-any.whl (22 kB)\n[19:24:07+0000] Installing collected packages: pytz, fixedint, azure-common, zipp, wrapt, urllib3, tzdata, typing-extensions, types-pytz, types-pillow, tqdm, tenacity, sniffio, six, regex, pyjwt, pycparser, pyasn1, priority, portalocker, pillow, packaging, opentelemetry-util-http, opentelemetry-semantic-conventions, oauthlib, numpy, multidict, markupsafe, itsdangerous, idna, hyperframe, hpack, h11, frozenlist, distro, click, charset-normalizer, certifi, blinker, attrs, asgiref, annotated-types, aiofiles, yarl, wsproto, werkzeug, uvicorn, rsa, requests, python-dateutil, pydantic-core, pandas-stubs, jinja2, isodate, importlib-metadata, httpcore, h2, ecdsa, deprecated, cffi, anyio, aiosignal, tiktoken, requests-oauthlib, python-jose, pydantic, pandas, opentelemetry-api, hypercorn, httpx, flask, cryptography, azure-core, aiohttp, quart, opentelemetry-sdk, opentelemetry-instrumentation, openai, msrest, azure-storage-blob, azure-search-documents, azure-keyvault-secrets, azure-core-tracing-opentelemetry, quart-cors, opentelemetry-resource-detector-azure, opentelemetry-instrumentation-wsgi, opentelemetry-instrumentation-urllib3, opentelemetry-instrumentation-urllib, opentelemetry-instrumentation-requests, opentelemetry-instrumentation-httpx, opentelemetry-instrumentation-dbapi, opentelemetry-instrumentation-asgi, opentelemetry-instrumentation-aiohttp-client, msal, azure-monitor-opentelemetry-exporter, opentelemetry-instrumentation-psycopg2, opentelemetry-instrumentation-flask, opentelemetry-instrumentation-fastapi, opentelemetry-instrumentation-django, msal-extensions, azure-monitor-opentelemetry, azure-identity\n[19:25:31+0000] Successfully installed aiofiles-23.2.1 aiohttp-3.9.3 aiosignal-1.3.1 annotated-types-0.6.0 anyio-4.2.0 asgiref-3.7.2 attrs-23.2.0 azure-common-1.1.28 azure-core-1.29.7 azure-core-tracing-opentelemetry-1.0.0b11 azure-identity-1.15.0 azure-keyvault-secrets-4.7.0 azure-monitor-opentelemetry-1.2.0 azure-monitor-opentelemetry-exporter-1.0.0b21 azure-search-documents-11.6.0b1 azure-storage-blob-12.19.0 blinker-1.7.0 certifi-2023.11.17 cffi-1.16.0 charset-normalizer-3.3.2 click-8.1.7 cryptography-42.0.1 deprecated-1.2.14 distro-1.9.0 ecdsa-0.18.0 fixedint-0.1.6 flask-3.0.1 frozenlist-1.4.1 h11-0.14.0 h2-4.1.0 hpack-4.0.0 httpcore-1.0.2 httpx-0.26.0 hypercorn-0.16.0 hyperframe-6.0.1 idna-3.6 importlib-metadata-6.11.0 isodate-0.6.1 itsdangerous-2.1.2 jinja2-3.1.3 markupsafe-2.1.4 msal-1.26.0 msal-extensions-1.1.0 msrest-0.7.1 multidict-6.0.4 numpy-1.26.3 oauthlib-3.2.2 openai-1.10.0 opentelemetry-api-1.22.0 opentelemetry-instrumentation-0.43b0 opentelemetry-instrumentation-aiohttp-client-0.43b0 opentelemetry-instrumentation-asgi-0.43b0 opentelemetry-instrumentation-dbapi-0.43b0 opentelemetry-instrumentation-django-0.43b0 opentelemetry-instrumentation-fastapi-0.43b0 opentelemetry-instrumentation-flask-0.43b0 opentelemetry-instrumentation-httpx-0.43b0 opentelemetry-instrumentation-psycopg2-0.43b0 opentelemetry-instrumentation-requests-0.43b0 opentelemetry-instrumentation-urllib-0.43b0 opentelemetry-instrumentation-urllib3-0.43b0 opentelemetry-instrumentation-wsgi-0.43b0 opentelemetry-resource-detector-azure-0.1.3 opentelemetry-sdk-1.22.0 opentelemetry-semantic-conventions-0.43b0 opentelemetry-util-http-0.43b0 packaging-23.2 pandas-2.2.0 pandas-stubs-2.1.4.231227 pillow-10.2.0 portalocker-2.8.2 priority-2.0.0 pyasn1-0.5.1 pycparser-2.21 pydantic-2.6.0 pydantic-core-2.16.1 pyjwt-2.8.0 python-dateutil-2.8.2 python-jose-3.3.0 pytz-2023.4 quart-0.19.4 quart-cors-0.7.0 regex-2023.12.25 requests-2.31.0 requests-oauthlib-1.3.1 rsa-4.9 six-1.16.0 sniffio-1.3.0 tenacity-8.2.3 tiktoken-0.5.2 tqdm-4.66.1 types-pillow-10.2.0.20240206 types-pytz-2023.4.0.20240130 typing-extensions-4.9.0 tzdata-2023.4 urllib3-2.1.0 uvicorn-0.27.0.post1 werkzeug-3.0.1 wrapt-1.16.0 wsproto-1.2.0 yarl-1.9.4 zipp-3.17.0\n\n[notice] A new release of pip is available: 23.2.1 -> 24.0\n[notice] To update, run: pip install --upgrade pip\nNot a vso image, so not writing build commands\nPreparing output...\n\nCopying files to destination directory '\/tmp\/_preCompressedDestinationDir'...\nDone in 48 sec(s).\nCompressing content of directory '\/tmp\/_preCompressedDestinationDir'...\nCopied the compressed output to '\/home\/site\/wwwroot'\n\nRemoving existing manifest file\nCreating a manifest file...\nManifest file created.\nCopying .ostype to manifest output directory.\n\nDone in 522 sec(s).\n```\n\n<\/details>\n\nLook for these important steps in the Oryx build:\n\n- _Detected following platforms: python: 3.11.7_\n That should match your runtime in the App Service configuration.\n- _Running pip install..._\n That should install all the requirements in your requirements.txt - if it didn't find your requirements.txt, then you won't see the packages installed.\n\nIf you see all those steps in the Oryx build, then that's a good sign that the build went well, and you can move on to checking the App Service logs.\n\n## Checking the app logs for errors\n\nSelect _Advanced Tools_ from the side nav:\n\n![Advanced Tools](images\/screenshot_appservice_tools.png)\n\nSelect _Go_ to open the Kudu website.\n\nWhen the Kudu website loads, find the _Current Docker Logs_ link and select _Download as zip_ next to it:\n\n![Screenshot of section with Download logs links](images\/screenshot_appservice_dockerlogs.png)\n\nIn the downloaded zip file, find the filename that starts with the most recent date and ends with \"_default_docker.log\":\n\n![Screenshot of downloaded logs](images\/screenshot_appservice_downloadedlogs.png)\n\nOpen that file to see the full logs, with the most recent logs at the bottom.\n\n
\nHere are the full logs for the app successfully starting:<\/summary>\n\n```plaintext\n\n2024-02-08T19:30:27.900249002Z _____\n2024-02-08T19:30:27.900282702Z \/ _ \\ __________ _________ ____\n2024-02-08T19:30:27.90" - }, - { - "id": 35, - "initial_rank": 35, - "content": "def main():\n parser = argparse.ArgumentParser()\n\n # Required parameters\n parser.add_argument(\n \"--data_dir\",\n default=None,\n type=str,\n required=True,\n help=\"The input data dir. Should contain the .jsonl files for MMIMDB.\",\n )\n parser.add_argument(\n \"--model_name_or_path\",\n default=None,\n type=str,\n required=True,\n help=\"Path to pretrained model or model identifier from huggingface.co\/models\",\n )\n parser.add_argument(\n \"--output_dir\",\n default=None,\n type=str,\n required=True,\n help=\"The output directory where the model predictions and checkpoints will be written.\",\n )\n\n # Other parameters\n parser.add_argument(\n \"--config_name\", default=\"\", type=str, help=\"Pretrained config name or path if not the same as model_name\"\n )\n parser.add_argument(\n \"--tokenizer_name\",\n default=\"\",\n type=str,\n help=\"Pretrained tokenizer name or path if not the same as model_name\",\n )\n parser.add_argument(\n \"--cache_dir\",\n default=None,\n type=str,\n help=\"Where do you want to store the pre-trained models downloaded from huggingface.co\",\n )\n parser.add_argument(\n \"--max_seq_length\",\n default=128,\n type=int,\n help=(\n \"The maximum total input sequence length after tokenization. Sequences longer \"\n \"than this will be truncated, sequences shorter will be padded.\"\n ),\n )\n parser.add_argument(\n \"--num_image_embeds\", default=1, type=int, help=\"Number of Image Embeddings from the Image Encoder\"\n )\n parser.add_argument(\"--do_train\", action=\"store_true\", help=\"Whether to run training.\")\n parser.add_argument(\"--do_eval\", action=\"store_true\", help=\"Whether to run eval on the dev set.\")\n parser.add_argument(\n \"--evaluate_during_training\", action=\"store_true\", help=\"Rul evaluation during training at each logging step.\"\n )\n parser.add_argument(\n \"--do_lower_case\", action=\"store_true\", help=\"Set this flag if you are using an uncased model.\"\n )\n\n parser.add_argument(\"--per_gpu_train_batch_size\", default=8, type=int, help=\"Batch size per GPU\/CPU for training.\")\n parser.add_argument(\n \"--per_gpu_eval_batch_size\", default=8, type=int, help=\"Batch size per GPU\/CPU for evaluation.\"\n )\n parser.add_argument(\n \"--gradient_accumulation_steps\",\n type=int,\n default=1,\n help=\"Number of updates steps to accumulate before performing a backward\/update pass.\",\n )\n parser.add_argument(\"--learning_rate\", default=5e-5, type=float, help=\"The initial learning rate for Adam.\")\n parser.add_argument(\"--weight_decay\", default=0.0, type=float, help=\"Weight decay if we apply some.\")\n parser.add_argument(\"--adam_epsilon\", default=1e-8, type=float, help=\"Epsilon for Adam optimizer.\")\n parser.add_argument(\"--max_grad_norm\", default=1.0, type=float, help=\"Max gradient norm.\")\n parser.add_argument(\n \"--num_train_epochs\", default=3.0, type=float, help=\"Total number of training epochs to perform.\"\n )\n parser.add_argument(\"--patience\", default=5, type=int, help=\"Patience for Early Stopping.\")\n parser.add_argument(\n \"--max_steps\",\n default=-1,\n type=int,\n help=\"If > 0: set total number of training steps to perform. Override num_train_epochs.\",\n )\n parser.add_argument(\"--warmup_steps\", default=0, type=int, help=\"Linear warmup over warmup_steps.\")\n\n parser.add_argument(\"--logging_steps\", type=int, default=50, help=\"Log every X updates steps.\")\n parser.add_argument(\"--save_steps\", type=int, default=50, help=\"Save checkpoint every X updates steps.\")\n parser.add_argument(\n \"--eval_all_checkpoints\",\n action=\"store_true\",\n help=\"Evaluate all checkpoints starting with the same prefix as model_name ending and ending with step number\",\n )\n parser.add_argument(\"--no_cuda\", action=\"store_true\", help=\"Avoid using CUDA when available\")\n parser.add_argument(\"--num_workers\", type=int, default=8, help=\"number of worker threads for dataloading\")\n parser.add_argument(\n \"--overwrite_output_dir\", action=\"store_true\", help=\"Overwrite the content of the output directory\"\n )\n parser.add_argument(\n \"--overwrite_cache\", action=\"store_true\", help=\"Overwrite the cached training and evaluation sets\"\n )\n parser.add_argument(\"--seed\", type=int, default=42, help=\"random seed for initialization\")\n\n parser.add_argument(\n \"--fp16\",\n action=\"store_true\",\n help=\"Whether to use 16-bit (mixed) precision (through NVIDIA apex) instead of 32-bit\",\n )\n parser.add_argument(\n \"--fp16_opt_level\",\n type=str,\n default=\"O1\",\n help=(\n \"For fp16: Apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3']. \"\n \"See details at https:\/\/nvidia.github.io\/apex\/amp.html\"\n ),\n )\n parser.add_argument(\"--local_rank\", type=int, default=-1, help=\"For distributed training: local_rank\")\n parser.add_argument(\"--server_ip\", type=str, default=\"\", help=\"For distant debugging.\")\n parser.add_argument(\"--server_port\", type=str, default=\"\", help=\"For distant debugging.\")\n args = parser.parse_args()\n\n if (\n os.path.exists(args.output_dir)\n and os.listdir(args.output_dir)\n and args.do_train\n and not args.overwrite_output_dir\n ):\n raise ValueError(\n \"Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.\".format(\n args.output_dir\n )\n )\n\n # Setup distant debugging if needed\n if args.server_ip and args.server_port:\n # Distant debugging - see https:\/\/code.visualstudio.com\/docs\/python\/debugging#_attach-to-a-local-script\n import ptvsd\n\n print(\"Waiting for debugger attach\")\n ptvsd.enable_attach(address=(args.server_ip, args.server_port), redirect_output=True)\n ptvsd.wait_for_attach()\n\n # Setup CUDA, GPU & distributed training\n if args.local_rank == -1 or args.no_cuda:\n device = torch.device(\"cuda\" if torch.cuda.is_available() and not args.no_cuda else \"cpu\")\n args.n_gpu = 0 if args.no_cuda else torch.cuda.device_count()\n else: # Initializes the distributed backend which will take care of synchronizing nodes\/GPUs\n torch.cuda.set_device(args.local_rank)\n device = torch.device(\"cuda\", args.local_rank)\n torch.distributed.init_process_group(backend=\"nccl\")\n args.n_gpu = 1\n\n args.device = device\n\n # Setup logging\n logging.basicConfig(\n format=\"%(asctime)s - %(levelname)s - %(name)s - %(message)s\",\n datefmt=\"%m\/%d\/%Y %H:%M:%S\",\n level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,\n )\n logger.warning(\n \"Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s\",\n args.local_rank,\n device,\n args.n_gpu,\n bool(args.local_rank != -1),\n args.fp16,\n )\n # Set the verbosity to info of the Transformers logger (on main process only):\n if is_main_process(args.local_rank):\n transformers.utils.logging.set_verbosity_info()\n transformers.utils.logging.enable_default_handler()\n transformers.utils.logging.enable_explicit_format()\n # Set seed\n set_seed(args)\n\n # Load pretrained model and tokenizer\n if args.local_rank not in [-1, 0]:\n torch.distributed.barrier() # Make sure only the first process in distributed training will download model & vocab\n\n # Setup model\n labels = get_mmimdb_labels()\n num_labels = len(labels)\n transformer_config = AutoConfig.from_pretrained(args.config_name if args.config_name else args.model_name_or_path)\n tokenizer = AutoTokenizer.from_pretrained(\n args.tokenizer_name if args.tokenizer_name else args.model_name_or_path,\n do_lower_case=args.do_lower_case,\n cache_dir=args.cache_dir,\n )\n transformer = AutoModel.from_pretrained(\n args.model_name_or_path, config=transformer_config, cache_dir=args.cache_dir\n )\n img_encoder = ImageEncoder(args)\n config = MMBTConfig(transformer_config, num_labels=num_labels)\n model = MMBTForClassification(config, transformer, img_encoder)\n\n if args.local_rank == 0:\n torch.distributed.barrier() # Make sure only the first process in distributed training will download model & vocab\n\n model.to(args.device)\n\n logger.info(\"Training\/evaluation parameters %s\", args)\n\n # Training" - }, - { - "id": 36, - "initial_rank": 36, - "content": "AE](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/vit_mae#transformers.ViTMAEModel)\n* [ViTMSN](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/vit_msn#transformers.ViTMSNModel)\n* [VideoMAE](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/videomae#transformers.VideoMAEModell)\n* [wav2vec2](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/wav2vec2#transformers.Wav2Vec2Model)\n* [Whisper](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/whisper#transformers.WhisperModel)\n* [XLM-RoBERTa](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/xlm-roberta#transformers.XLMRobertaModel)\n* [XLM-RoBERTa-XL](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/xlm-roberta-xl#transformers.XLMRobertaXLModel)\n* [YOLOS](https:\/\/huggingface.co\/docs\/transformers\/model_doc\/yolos#transformers.YolosModel)\n\n\n\nFlashAttention can only be used for models with the `fp16` or `bf16` torch type, so make sure to cast your model to the appropriate type first. The memory-efficient attention backend is able to handle `fp32` models.\n\n<\/Tip>\n\n\n\nSDPA does not support certain sets of attention parameters, such as `head_mask` and `output_attentions=True`.\nIn that case, you should see a warning message and we will fall back to the (slower) eager implementation.\n\n<\/Tip>\n\nBy default, SDPA selects the most performant kernel available but you can check whether a backend is available in a given setting (hardware, problem size) with [`torch.backends.cuda.sdp_kernel`](https:\/\/pytorch.org\/docs\/master\/backends.html#torch.backends.cuda.sdp_kernel) as a context manager:\n\n```diff\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"facebook\/opt-350m\")\nmodel = AutoModelForCausalLM.from_pretrained(\"facebook\/opt-350m\", torch_dtype=torch.float16).to(\"cuda\")\n\ninput_text = \"Hello my dog is cute and\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(\"cuda\")\n\n+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):\n outputs = model.generate(**inputs)\n\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```\n\nIf you see a bug with the traceback below, try using the nightly version of PyTorch which may have broader coverage for FlashAttention:\n\n```bash\nRuntimeError: No available kernel. Aborting execution.\n\n# install PyTorch nightly\npip3 install -U --pre torch torchvision torchaudio --index-url https:\/\/download.pytorch.org\/whl\/nightly\/cu118\n```\n\n## BetterTransformer\n\n\n\nSome BetterTransformer features are being upstreamed to Transformers with default support for native `torch.nn.scaled_dot_product_attention`. BetterTransformer still has a wider coverage than the Transformers SDPA integration, but you can expect more and more architectures to natively support SDPA in Transformers.\n\n<\/Tip>\n\n\n\nCheck out our benchmarks with BetterTransformer and scaled dot product attention in the [Out of the box acceleration and memory savings of 🤗 decoder models with PyTorch 2.0](https:\/\/pytorch.org\/blog\/out-of-the-box-acceleration\/) and learn more about the fastpath execution in the [BetterTransformer](https:\/\/medium.com\/pytorch\/bettertransformer-out-of-the-box-performance-for-huggingface-transformers-3fbe27d50ab2) blog post.\n\n<\/Tip>\n\nBetterTransformer accelerates inference with its fastpath (native PyTorch specialized implementation of Transformer functions) execution. The two optimizations in the fastpath execution are:\n\n1. fusion, which combines multiple sequential operations into a single \"kernel\" to reduce the number of computation steps\n2. skipping the inherent sparsity of padding tokens to avoid unnecessary computation with nested tensors\n\nBetterTransformer also converts all attention operations to use the more memory-efficient [scaled dot product attention (SDPA)](https:\/\/pytorch.org\/docs\/master\/generated\/torch.nn.functional.scaled_dot_product_attention), and it calls optimized kernels like [FlashAttention](https:\/\/huggingface.co\/papers\/2205.14135) under the hood.\n\nBefore you start, make sure you have 🤗 Optimum [installed](https:\/\/huggingface.co\/docs\/optimum\/installation).\n\nThen you can enable BetterTransformer with the [`PreTrainedModel.to_bettertransformer`] method:\n\n```python\nmodel = model.to_bettertransformer()\n```\n\nYou can return the original Transformers model with the [`~PreTrainedModel.reverse_bettertransformer`] method. You should use this before saving your model to use the canonical Transformers modeling:\n\n```py\nmodel = model.reverse_bettertransformer()\nmodel.save_pretrained(\"saved_model\")\n```\n\n## bitsandbyt" - }, - { - "id": 37, - "initial_rank": 37, - "content": "def main():\n parser = argparse.ArgumentParser()\n\n # Required parameters\n parser.add_argument(\n \"--data_dir\",\n default=None,\n type=str,\n required=True,\n help=\"The input data dir. Should contain the .tsv files (or other data files) for the task.\",\n )\n parser.add_argument(\n \"--model_type\",\n default=None,\n type=str,\n required=True,\n help=\"Model type selected in the list: \" + \", \".join(MODEL_CLASSES.keys()),\n )\n parser.add_argument(\n \"--model_name_or_path\",\n default=None,\n type=str,\n required=True,\n help=\"Path to pre-trained model or shortcut name.\",\n )\n parser.add_argument(\n \"--task_name\",\n default=None,\n type=str,\n required=True,\n help=\"The name of the task to train selected in the list: \" + \", \".join(processors.keys()),\n )\n parser.add_argument(\n \"--output_dir\",\n default=None,\n type=str,\n required=True,\n help=\"The output directory where the model predictions and checkpoints will be written.\",\n )\n parser.add_argument(\n \"--patience\",\n default=\"0\",\n type=str,\n required=False,\n )\n parser.add_argument(\n \"--regression_threshold\",\n default=0,\n type=float,\n required=False,\n )\n\n # Other parameters\n parser.add_argument(\n \"--config_name\",\n default=\"\",\n type=str,\n help=\"Pretrained config name or path if not the same as model_name\",\n )\n parser.add_argument(\n \"--tokenizer_name\",\n default=\"\",\n type=str,\n help=\"Pretrained tokenizer name or path if not the same as model_name\",\n )\n parser.add_argument(\n \"--cache_dir\",\n default=\"\",\n type=str,\n help=\"Where do you want to store the pre-trained models downloaded from huggingface.co\",\n )\n parser.add_argument(\n \"--max_seq_length\",\n default=128,\n type=int,\n help=(\n \"The maximum total input sequence length after tokenization. Sequences longer \"\n \"than this will be truncated, sequences shorter will be padded.\"\n ),\n )\n parser.add_argument(\"--do_train\", action=\"store_true\", help=\"Whether to run training.\")\n parser.add_argument(\"--do_eval\", action=\"store_true\", help=\"Whether to run eval on the dev set.\")\n parser.add_argument(\n \"--evaluate_during_training\",\n action=\"store_true\",\n help=\"Run evaluation during training at each logging step.\",\n )\n parser.add_argument(\n \"--do_lower_case\",\n action=\"store_true\",\n help=\"Set this flag if you are using an uncased model.\",\n )\n\n parser.add_argument(\n \"--per_gpu_train_batch_size\",\n default=8,\n type=int,\n help=\"Batch size per GPU\/CPU for training.\",\n )\n parser.add_argument(\n \"--per_gpu_eval_batch_size\",\n default=1,\n type=int,\n help=\"Batch size per GPU\/CPU for evaluation.\",\n )\n parser.add_argument(\n \"--gradient_accumulation_steps\",\n type=int,\n default=1,\n help=\"Number of updates steps to accumulate before performing a backward\/update pass.\",\n )\n parser.add_argument(\n \"--learning_rate\",\n default=5e-5,\n type=float,\n help=\"The initial learning rate for Adam.\",\n )\n parser.add_argument(\"--weight_decay\", default=0.0, type=float, help=\"Weight decay if we apply some.\")\n parser.add_argument(\"--adam_epsilon\", default=1e-8, type=float, help=\"Epsilon for Adam optimizer.\")\n parser.add_argument(\"--max_grad_norm\", default=1.0, type=float, help=\"Max gradient norm.\")\n parser.add_argument(\n \"--num_train_epochs\",\n default=3.0,\n type=float,\n help=\"Total number of training epochs to perform.\",\n )\n parser.add_argument(\n \"--max_steps\",\n default=-1,\n type=int,\n help=\"If > 0: set total number of training steps to perform. Override num_train_epochs.\",\n )\n parser.add_argument(\"--warmup_steps\", default=0, type=int, help=\"Linear warmup over warmup_steps.\")\n\n parser.add_argument(\"--logging_steps\", type=int, default=500, help=\"Log every X updates steps.\")\n parser.add_argument(\n \"--save_steps\",\n type=int,\n default=500,\n help=\"Save checkpoint every X updates steps.\",\n )\n parser.add_argument(\n \"--eval_all_checkpoints\",\n action=\"store_true\",\n help=\"Evaluate all checkpoints starting with the same prefix as model_name ending and ending with step number\",\n )\n parser.add_argument(\"--no_cuda\", action=\"store_true\", help=\"Avoid using CUDA when available\")\n parser.add_argument(\n \"--overwrite_output_dir\",\n action=\"store_true\",\n help=\"Overwrite the content of the output directory\",\n )\n parser.add_argument(\n \"--overwrite_cache\",\n action=\"store_true\",\n help=\"Overwrite the cached training and evaluation sets\",\n )\n parser.add_argument(\"--seed\", type=int, default=42, help=\"random seed for initialization\")\n\n parser.add_argument(\n \"--fp16\",\n action=\"store_true\",\n help=\"Whether to use 16-bit (mixed) precision (through NVIDIA apex) instead of 32-bit\",\n )\n parser.add_argument(\n \"--fp16_opt_level\",\n type=str,\n default=\"O1\",\n help=(\n \"For fp16: Apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3']. \"\n \"See details at https:\/\/nvidia.github.io\/apex\/amp.html\"\n ),\n )\n parser.add_argument(\n \"--local_rank\",\n type=int,\n default=-1,\n help=\"For distributed training: local_rank\",\n )\n parser.add_argument(\"--server_ip\", type=str, default=\"\", help=\"For distant debugging.\")\n parser.add_argument(\"--server_port\", type=str, default=\"\", help=\"For distant debugging.\")\n args = parser.parse_args()\n\n if (\n os.path.exists(args.output_dir)\n and os.listdir(args.output_dir)\n and args.do_train\n and not args.overwrite_output_dir\n ):\n raise ValueError(\n \"Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.\".format(\n args.output_dir\n )\n )\n\n # Setup distant debugging if needed\n if args.server_ip and args.server_port:\n # Distant debugging - see https:\/\/code.visualstudio.com\/docs\/python\/debugging#_attach-to-a-local-script\n import ptvsd\n\n print(\"Waiting for debugger attach\")\n ptvsd.enable_attach(address=(args.server_ip, args.server_port), redirect_output=True)\n ptvsd.wait_for_attach()\n\n # Setup CUDA, GPU & distributed training\n if args.local_rank == -1 or args.no_cuda:\n device = torch.device(\"cuda\" if torch.cuda.is_available() and not args.no_cuda else \"cpu\")\n args.n_gpu = torch.cuda.device_count()\n else: # Initializes the distributed backend which will take care of synchronizing nodes\/GPUs\n torch.cuda.set_device(args.local_rank)\n device = torch.device(\"cuda\", args.local_rank)\n torch.distributed.init_process_group(backend=\"nccl\")\n args.n_gpu = 1\n args.device = device\n\n # Setup logging\n logging.basicConfig(\n format=\"%(asctime)s - %(levelname)s - %(name)s - %(message)s\",\n datefmt=\"%m\/%d\/%Y %H:%M:%S\",\n level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,\n )\n logger.warning(\n \"Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s\",\n args.local_rank,\n device,\n args.n_gpu,\n bool(args.local_rank != -1),\n args.fp16,\n )\n # Set the verbosity to info of the Transformers logger (on main process only):\n if is_main_process(args.local_rank):\n transformers.utils.logging.set_verbosity_info()\n transformers.utils.logging.enable_default_handler()\n transformers.utils.logging.enable_explicit_format()\n # Set seed\n set_seed(args)\n\n # Prepare GLUE task\n args.task_name = args.task_name.lower()\n if args.task_name not in processors:\n raise ValueError(\"Task not found: %s\" % (args.task_name))\n processor = processors[args.task_name]()\n args.output_mode = output_modes[args.task_name]\n label_list = processor.get_labels()\n num_labels = len(label_list)\n\n if args.patience != \"0\" and args.per_gpu_eval_batch_size != 1:\n raise ValueError(\"The eval batch size must be 1 with PABEE inference on.\")\n\n # Load pretrained model and tokenizer\n if args.local_rank not in [-1, 0]:\n torch.distributed.barrier() # Make sure only the first process in distributed training will download model & vocab\n\n args.model_type = args.model_type.lower()" - }, - { - "id": 38, - "initial_rank": 38, - "content": "appdirs==1.4.3\nargon2-cffi==20.1.0\nasync-generator==1.10\nattrs==20.2.0\nbackcall==0.2.0\nCacheControl==0.12.6\ncertifi==2024.7.4\ncffi==1.14.2\nchardet==3.0.4\nclick==7.1.2\ncolorama==0.4.3\ncontextlib2==0.6.0\ncycler==0.10.0\ndatasets==1.0.0\ndecorator==4.4.2\ndefusedxml==0.6.0\ndill==0.3.2\ndistlib==0.3.0\ndistro==1.4.0\nentrypoints==0.3\nfilelock==3.0.12\nfuture==0.18.3\nhtml5lib==1.0.1\nidna==3.7\nipaddr==2.2.0\nipykernel==5.3.4\nipython\nipython-genutils==0.2.0\nipywidgets==7.5.1\njedi==0.17.2\nJinja2>=2.11.3\njoblib==1.2.0\njsonschema==3.2.0\njupyter==1.0.0\njupyter-client==6.1.7\njupyter-console==6.2.0\njupyter-core==4.11.2\njupyterlab-pygments==0.1.1\nkiwisolver==1.2.0\nlockfile==0.12.2\nMarkupSafe==1.1.1\nmatplotlib==3.3.1\nmistune==2.0.3\nmsgpack==0.6.2\nnbclient==0.5.0\nnbconvert==6.5.1\nnbformat==5.0.7\nnest-asyncio==1.4.0\nnotebook==6.4.12\nnumpy==1.22.0\nopencv-python==4.8.1.78\npackaging==20.3\npandas==1.1.2\npandocfilters==1.4.2\nparso==0.7.1\npep517==0.8.2\npexpect==4.8.0\npickleshare==0.7.5\nPillow>=8.1.1\nprogress==1.5\nprometheus-client==0.8.0\nprompt-toolkit==3.0.7\nptyprocess==0.6.0\npyaml==20.4.0\npyarrow==15.0.0\npycparser==2.20\nPygments>=2.7.4\npyparsing==2.4.6\npyrsistent==0.16.0\npython-dateutil==2.8.1\npytoml==0.1.21\npytz==2020.1\nPyYAML>=5.4\npyzmq==19.0.2\nqtconsole==4.7.7\nQtPy==1.9.0\nregex==2020.7.14\nrequests==2.32.2\nretrying==1.3.3\nsacremoses==0.0.43\nSend2Trash==1.5.0\nsentencepiece==0.1.91\nsix==1.14.0\nterminado==0.8.3\ntestpath==0.4.4\ntokenizers==0.8.1rc2\ntorch==2.2.0\ntorchvision==0.7.0\ntornado==6.4.1\ntqdm==4.66.3\ntraitlets\ngit+https:\/\/github.com\/huggingface\/transformers.git\nurllib3==1.26.19\nwcwidth==0.2.5\nwebencodings==0.5.1\nwget==3.2\nwidgetsnbextension==3.5.1\nxxhash==2.0.0\n" - }, - { - "id": 39, - "initial_rank": 39, - "content": "appdirs==1.4.3\nargon2-cffi==20.1.0\nasync-generator==1.10\nattrs==20.2.0\nbackcall==0.2.0\nCacheControl==0.12.6\ncertifi==2024.7.4\ncffi==1.14.2\nchardet==3.0.4\nclick==7.1.2\ncolorama==0.4.3\ncontextlib2==0.6.0\ncycler==0.10.0\ndatasets==1.0.0\ndecorator==4.4.2\ndefusedxml==0.6.0\ndill==0.3.2\ndistlib==0.3.0\ndistro==1.4.0\nentrypoints==0.3\nfilelock==3.0.12\nfuture==0.18.3\nhtml5lib==1.0.1\nidna==3.7\nipaddr==2.2.0\nipykernel==5.3.4\nipython\nipython-genutils==0.2.0\nipywidgets==7.5.1\njedi==0.17.2\nJinja2>=2.11.3\njoblib==1.2.0\njsonschema==3.2.0\njupyter==1.0.0\njupyter-client==6.1.7\njupyter-console==6.2.0\njupyter-core==4.11.2\njupyterlab-pygments==0.1.1\nkiwisolver==1.2.0\nlockfile==0.12.2\nMarkupSafe==1.1.1\nmatplotlib==3.3.1\nmistune==2.0.3\nmsgpack==0.6.2\nnbclient==0.5.0\nnbconvert==6.5.1\nnbformat==5.0.7\nnest-asyncio==1.4.0\nnotebook==6.4.12\nnumpy==1.22.0\nopencv-python==4.8.1.78\npackaging==20.3\npandas==1.1.2\npandocfilters==1.4.2\nparso==0.7.1\npep517==0.8.2\npexpect==4.8.0\npickleshare==0.7.5\nPillow>=8.1.1\nprogress==1.5\nprometheus-client==0.8.0\nprompt-toolkit==3.0.7\nptyprocess==0.6.0\npyaml==20.4.0\npyarrow==15.0.0\npycparser==2.20\nPygments>=2.7.4\npyparsing==2.4.6\npyrsistent==0.16.0\npython-dateutil==2.8.1\npytoml==0.1.21\npytz==2020.1\nPyYAML>=5.4\npyzmq==19.0.2\nqtconsole==4.7.7\nQtPy==1.9.0\nregex==2020.7.14\nrequests==2.32.2\nretrying==1.3.3\nsacremoses==0.0.43\nSend2Trash==1.5.0\nsentencepiece==0.1.91\nsix==1.14.0\nterminado==0.8.3\ntestpath==0.4.4\ntokenizers==0.8.1rc2\ntorch==2.2.0\ntorchvision==0.7.0\ntornado==6.4.1\ntqdm==4.66.3\ntraitlets\ngit+https:\/\/github.com\/huggingface\/transformers.git\nurllib3==1.26.19\nwcwidth==0.2.5\nwebencodings==0.5.1\nwget==3.2\nwidgetsnbextension==3.5.1\nxxhash==2.0.0\n" - }, - { - "id": 40, - "initial_rank": 40, - "content": "@require_torch\nclass UMT5ModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin, unittest.TestCase):\n all_model_classes = (\n (UMT5Model, UMT5ForConditionalGeneration, UMT5ForSequenceClassification, UMT5ForQuestionAnswering)\n if is_torch_available()\n else ()\n )\n all_generative_model_classes = (UMT5ForConditionalGeneration,) if is_torch_available() else ()\n pipeline_model_mapping = (\n {\n \"feature-extraction\": UMT5Model,\n \"question-answering\": UMT5ForQuestionAnswering,\n \"summarization\": UMT5ForConditionalGeneration,\n \"text-classification\": UMT5ForSequenceClassification,\n \"text2text-generation\": UMT5ForConditionalGeneration,\n \"translation\": UMT5ForConditionalGeneration,\n \"zero-shot\": UMT5ForSequenceClassification,\n }\n if is_torch_available()\n else {}\n )\n is_encoder_decoder = True\n fx_compatible = False\n test_pruning = False\n test_missing_keys = True\n test_torchscript = True\n # The small UMT5 model needs higher percentages for CPU\/MP tests\n model_split_percents = [0.5, 0.8, 0.9]\n\n def setUp(self):\n self.model_tester = UMT5ModelTester(self)\n\n # `QAPipelineTests` is not working well with slow tokenizers (for some models) and we don't want to touch the file\n # `src\/transformers\/data\/processors\/squad.py` (where this test fails for this model)\n def is_pipeline_test_to_skip(\n self, pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name\n ):\n if pipeline_test_casse_name == \"QAPipelineTests\" and not tokenizer_name.endswith(\"Fast\"):\n return True\n\n return False\n\n def _create_and_check_torch_fx_tracing(self, config, inputs_dict, output_loss=False):\n if not is_torch_fx_available() or not self.fx_compatible:\n self.skipTest(reason=\"torch fx is not available or not compatible with this model\")\n\n configs_no_init = _config_zero_init(config) # To be sure we have no Nan\n configs_no_init.return_dict = False\n\n for model_class in self.all_model_classes:\n if model_class.__name__ == \"UMT5ForSequenceClassification\":\n continue\n model = model_class(config=configs_no_init)\n model.to(torch_device)\n model.eval()\n inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=output_loss)\n\n try:\n if model.config.is_encoder_decoder:\n model.config.use_cache = False # FSTM still requires this hack -> FSTM should probably be refactored similar to BART afterward\n labels = inputs.get(\"labels\", None)\n input_names = [\n \"attention_mask\",\n \"decoder_attention_mask\",\n \"decoder_input_ids\",\n \"input_features\",\n \"input_ids\",\n \"input_values\",\n ]\n if labels is not None:\n input_names.append(\"labels\")\n\n filtered_inputs = {k: v for (k, v) in inputs.items() if k in input_names}\n input_names = list(filtered_inputs.keys())\n\n model_output = model(**filtered_inputs)\n\n traced_model = symbolic_trace(model, input_names)\n traced_output = traced_model(**filtered_inputs)\n else:\n input_names = [\n \"attention_mask\",\n \"bbox\",\n \"input_features\",\n \"input_ids\",\n \"input_values\",\n \"pixel_values\",\n \"token_type_ids\",\n \"visual_feats\",\n \"visual_pos\",\n ]\n\n labels = inputs.get(\"labels\", None)\n start_positions = inputs.get(\"start_positions\", None)\n end_positions = inputs.get(\"end_positions\", None)\n if labels is not None:\n input_names.append(\"labels\")\n if start_positions is not None:\n input_names.append(\"start_positions\")\n if end_positions is not None:\n input_names.append(\"end_positions\")\n\n filtered_inputs = {k: v for (k, v) in inputs.items() if k in input_names}\n input_names = list(filtered_inputs.keys())\n\n if model.__class__.__name__ in set(MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES.values()) and (\n not hasattr(model.config, \"problem_type\") or model.config.problem_type is None\n ):\n model.config.problem_type = \"single_label_classification\"\n\n traced_model = symbolic_trace(model, input_names)\n traced_output = traced_model(**filtered_inputs)\n model_output = model(**filtered_inputs)\n\n except Exception as e:\n self.fail(f\"Couldn't trace module: {e}\")\n\n def flatten_output(output):\n flatten = []\n for x in output:\n if isinstance(x, (tuple, list)):\n flatten += flatten_output(x)\n elif not isinstance(x, torch.Tensor):\n continue\n else:\n flatten.append(x)\n return flatten\n\n model_output = flatten_output(model_output)\n traced_output = flatten_output(traced_output)\n num_outputs = len(model_output)\n\n for i in range(num_outputs):\n self.assertTrue(\n torch.allclose(model_output[i], traced_output[i]),\n f\"traced {i}th output doesn't match model {i}th output for {model_class}\",\n )\n\n # Test that the model can be serialized and restored properly\n with tempfile.TemporaryDirectory() as tmp_dir_name:\n pkl_file_name = os.path.join(tmp_dir_name, \"model.pkl\")\n try:\n with open(pkl_file_name, \"wb\") as f:\n pickle.dump(traced_model, f)\n with open(pkl_file_name, \"rb\") as f:\n loaded = pickle.load(f)\n except Exception as e:\n self.fail(f\"Couldn't serialize \/ deserialize the traced model: {e}\")\n\n loaded_output = loaded(**filtered_inputs)\n loaded_output = flatten_output(loaded_output)\n\n for i in range(num_outputs):\n self.assertTrue(\n torch.allclose(model_output[i], loaded_output[i]),\n f\"serialized model {i}th output doesn't match model {i}th output for {model_class}\",\n )\n\n # Avoid memory leak. Without this, each call increase RAM usage by ~20MB.\n # (Even with this call, there are still memory leak by ~0.04MB)\n self.clear_torch_jit_class_registry()\n\n # UMT5ForSequenceClassification does not support inputs_embeds\n def test_inputs_embeds(self):\n config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()\n\n for model_class in (UMT5Model, UMT5ForConditionalGeneration, UMT5ForQuestionAnswering):\n model = model_class(config)\n model.to(torch_device)\n model.eval()\n\n inputs = copy.deepcopy(self._prepare_for_class(inputs_dict, model_class))\n\n if not self.is_encoder_decoder:\n input_ids = inputs[\"input_ids\"]\n del inputs[\"input_ids\"]\n else:\n encoder_input_ids = inputs[\"input_ids\"]\n decoder_input_ids = inputs.get(\"decoder_input_ids\", encoder_input_ids)\n del inputs[\"input_ids\"]\n inputs.pop(\"decoder_input_ids\", None)\n\n wte = model.get_input_embeddings()\n if not self.is_encoder_decoder:\n inputs[\"inputs_embeds\"] = wte(input_ids)\n else:\n inputs[\"inputs_embeds\"] = wte(encoder_input_ids)\n inputs[\"decoder_inputs_embeds\"] = wte(decoder_input_ids)\n\n with torch.no_grad():\n model(**inputs)[0]\n\n def test_with_sequence_classification_head(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_with_sequence_classification_head(*config_and_inputs)\n\n @unittest.skip(reason=\"Test has a segmentation fault on torch 1.8.0\")\n def test_export_to_onnx(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n model = UMT5Model(config_and_inputs[0]).to(torch_device)\n with tempfile.TemporaryDirectory() as tmpdirname:\n torch.onnx.export(\n model,\n (config_and_inputs[1], config_and_inputs[3], config_and_inputs[2]),\n f\"{tmpdirname}\/t5_test.onnx\",\n export_params=True,\n opset_version=9,\n input_names=[\"input_ids\", \"decoder_input_ids\"],\n )\n\n @unittest.skipIf(torch_device == \"cpu\", \"Cant do half precision\")\n def test_model_fp16_forward(self):\n config_and_inputs = self.model_tester.prepare_config_and_inputs()\n self.model_tester.create_and_check_model_fp16_forward(*config_and_inputs)" - }, - { - "id": 41, - "initial_rank": 41, - "content": "-bit (LLM.int8() algorithm)\n\n\n\nLearn more about the details of 8-bit quantization in this [blog post](https:\/\/huggingface.co\/blog\/hf-bitsandbytes-integration)!\n\n<\/Tip>\n\nThis section explores some of the specific features of 8-bit models, such as offloading, outlier thresholds, skipping module conversion, and finetuning.\n\n### Offloading\n\n8-bit models can offload weights between the CPU and GPU to support fitting very large models into memory. The weights dispatched to the CPU are actually stored in **float32**, and aren't converted to 8-bit. For example, to enable offloading for the [bigscience\/bloom-1b7](https:\/\/huggingface.co\/bigscience\/bloom-1b7) model, start by creating a [`BitsAndBytesConfig`]:\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)\n```\n\nDesign a custom device map to fit everything on your GPU except for the `lm_head`, which you'll dispatch to the CPU:\n\n```py\ndevice_map = {\n \"transformer.word_embeddings\": 0,\n \"transformer.word_embeddings_layernorm\": 0,\n \"lm_head\": \"cpu\",\n \"transformer.h\": 0,\n \"transformer.ln_f\": 0,\n}\n```\n\nNow load your model with the custom `device_map` and `quantization_config`:\n\n```py\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n \"bigscience\/bloom-1b7\",\n device_map=device_map,\n quantization_config=quantization_config,\n)\n```\n\n### Outlier threshold\n\nAn \"outlier\" is a hidden state value greater than a certain threshold, and these values are computed in fp16. While the values are usually normally distributed ([-3.5, 3.5]), this distribution can be very different for large models ([-60, 6] or [6, 60]). 8-bit quantization works well for values ~5, but beyond that, there is a significant performance penalty. A good default threshold value is 6, but a lower threshold may be needed for more unstable models (small models or finetuning).\n\nTo find the best threshold for your model, we recommend experimenting with the `llm_int8_threshold` parameter in [`BitsAndBytesConfig`]:\n\n```py\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig\n\nmodel_id = \"bigscience\/bloom-1b7\"\n\nquantization_config = BitsAndBytesConfig(\n llm_int8_threshold=10,\n)\n\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n model_id,\n device_map=device_map,\n quantization_config=quantization_config,\n)\n```\n\n### Skip module conversion\n\nFor some models, like [Jukebox](model_doc\/jukebox), you don't need to quantize every module to 8-bit which can actually cause instability. With Jukebox, there are several `lm_head` modules that should be skipped using the `llm_int8_skip_modules` parameter in [`BitsAndBytesConfig`]:\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n\nmodel_id = \"bigscience\/bloom-1b7\"\n\nquantization_config = BitsAndBytesConfig(\n llm_int8_skip_modules=[\"lm_head\"],\n)\n\nmodel_8bit = AutoModelForCausalLM.from_pretrained(\n model_id,\n device_map=\"auto\",\n quantization_config=quantization_config,\n)\n```\n\n### Finetuning\n\nWith the [PEFT](https:\/\/github.com\/huggingface\/peft) library, you can finetune large models like [flan-t5-large](https:\/\/huggingface.co\/google\/flan-t5-large) and [facebook\/opt-6.7b](https:\/\/huggingface.co\/facebook\/opt-6.7b) with 8-bit quantization. You don't need to pass the `device_map` parameter for training because it'll automatically load your model on a GPU. However, you can still customize the device map with the `device_map` parameter if you want to (`device_map=\"auto\"` should only be used for inference).\n\n## 4-bit (QLoRA algorithm)\n\n\n\nTry 4-bit quantization in this [notebook](https:\/\/colab.research.google.com\/drive\/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf) and learn more about it's details in this [blog post](https:\/\/huggingface.co\/blog\/4bit-transformers-bitsandbytes).\n\n<\/Tip>\n\nThis section explores some of the specific features of 4-bit models, such as changing the compute data type, using the Normal Float 4 (NF4) data type, and using nested quantization.\n\n\n### Compute data type\n\nTo speedup computation, you can change the data type from float32 (the default value) to bf16 using the `bnb_4bit_compute_dtype` parameter in [`BitsAndBytesConfig`]:\n\n```py\nimport torch\nfrom transformers import BitsAndBytesConfig\n\nquantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)\n```\n\n### Normal Float 4 (NF4)\n\nNF4 is a 4-bit data type from the [QLoRA](https:\/\/hf.co\/papers\/2305.14314) paper, adapted for weights initialized from a normal distribution. You should use NF4 for training 4-bit base models. This can be configured with the `bnb_4bit_quant_type` parameter in the [`BitsAndBytesConfig`]:\n\n```py\nfrom transformers import BitsAndBytesConfig\n\nnf4_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_quant_type=\"nf4\",\n)\n\nmodel_nf4 = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config)\n```\n\nFor inference, the `bnb_4bit_quant_type` does not have a huge impact on performance. However, to remain consistent with the model weights, you should use the `bnb_4bit_compute_dtype` and `torch_dtype` values.\n\n### Nested quantization\n\nNested quantization is a technique that can save additional memory at no additional performance cost. This feature performs a second quantization of the already quantized weights to save an additional 0.4 bits\/parameter. For example, with nested quantization, you can finetune a [Llama-13b](https:\/\/huggingface.co\/meta-llama\/Llama-2-13b) model on a 16GB NVIDIA T4 GPU with a sequence length of 1024, a batch size of 1, and enabling gradient accumulation with 4 steps.\n\n```py\nfrom transformers import BitsAndBytesConfig\n\ndouble_quant_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n)\n\nmodel_double_quant = AutoModelForCausalLM.from_pretrained(\"meta-llama\/Llama-2-13b\", quantization_config=double_quant_config)\n```\n\n## Dequantizing `bitsandbytes` models\n\nOnce quantized, you can dequantize the model to the original precision but this might result in a small quality loss of the model. Make sure you have enough GPU RAM to fit the dequantized model. \n\n```python\nfrom transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer\n\nmodel_id = \"facebook\/opt-125m\"\n\nmodel = AutoModelForCausalLM.from_pretrained(model_id, BitsAndBytesConfig(load_in_4bit=True))\ntokenizer = AutoTokenizer.from_pretrained(model_id)\n\nmodel.dequantize()\n\ntext = tokenizer(\"Hello my name is\", return_tensors=\"pt\").to(0)\n\nout = model.generate(**text)\nprint(tokenizer.decode(out[0]))\n```" - }, - { - "id": 42, - "initial_rank": 42, - "content": " },\n {\n \"name\": \"stdout\",\n \"output_type\": \"stream\",\n \"text\": [\n \"[Jan 04, 16:02:56] Loading filter_pids_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\\n\"\n ]\n },\n {\n \"name\": \"stderr\",\n \"output_type\": \"stream\",\n \"text\": [\n \"\\n\"\n ]\n },\n {\n \"name\": \"stdout\",\n \"output_type\": \"stream\",\n \"text\": [\n \"[Jan 04, 16:02:56] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...\\n\",\n \"Searcher loaded!\\n\",\n \"\\n\",\n \"#> QueryTokenizer.tensorize(batch_text[0], batch_background[0], bsize) ==\\n\",\n \"#> Input: . How does ColBERTv2 compare with SPLADEv2?, \\t\\t True, \\t\\t None\\n\",\n \"#> Output IDs: torch.Size([32]), tensor([ 101, 1, 2129, 2515, 23928, 2615, 2475, 12826, 2007, 11867,\\n\",\n \" 27266, 6777, 2475, 1029, 102, 103, 103, 103, 103, 103,\\n\",\n \" 103, 103, 103, 103, 103, 103, 103, 103, 103, 103,\\n\",\n \" 103, 103])\\n\",\n \"#> Output Mask: torch.Size([32]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,\\n\",\n \" 0, 0, 0, 0, 0, 0, 0, 0])\\n\",\n \"\\n\"\n ]\n },\n {\n \"name\": \"stderr\",\n \"output_type\": \"stream\",\n \"text\": [\n \"\/Users\/jerryliu\/Programming\/llama-hub\/.venv\/lib\/python3.10\/site-packages\/torch\/amp\/autocast_mode.py:250: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling\\n\",\n \" warnings.warn(\\n\"\n ]\n },\n {\n \"data\": {\n \"text\/markdown\": [\n \"**Node ID:** 5e4028f7-fbb5-4440-abd0-0d8270cc8979
**Similarity:** 17.003997802734375
**Text:** While highly competitive in effec-\\n\",\n \"tiveness, ColBERT is orders of magnitude cheaper than BERT base...
\"\n ],\n \"text\/plain\": [\n \"\"\n ]\n },\n \"metadata\": {},\n \"output_type\": \"display_data\"\n },\n {\n \"data\": {\n \"text\/markdown\": [\n \"**Node ID:** d6240a29-0a5e-458f-86f1-abe570e13200
**Similarity:** 16.764663696289062
**Text:** Note that any BERT-based model\\n\",\n \"must incur the computational cost of processing each document\\n\",\n \"at l...
\"\n ],\n \"text\/plain\": [\n \"\"\n ]\n },\n \"metadata\": {},\n \"output_type\": \"display_data\"\n },\n {\n \"data\": {\n \"text\/markdown\": [\n \"**Node ID:** d19c0fe7-bdb7-4a51-ae89-00cd746b2d3a
**Similarity:** 16.70589828491211
**Text:** For instance,\\n\",\n \"its Recall@50 actually exceeds the official BM25’s Recall@1000 and\\n\",\n \"even all but docTT...
\"\n ],\n \"text\/plain\": [\n \"\"\n ]\n },\n \"metadata\": {},\n \"output_type\": \"display_data\"\n },\n {\n \"data\": {\n \"text\/markdown\": [\n \"**Node ID:** 38e84e5b-4345-4b08-a7fd-de2de4fa645a
**Similarity:** 16.577777862548828
**Text:** \/T_his layer serves to control the dimension\\n\",\n \"of ColBERT’s embeddings, producing m-dimensional emb...
\"\n ],\n \"text\/plain\": [\n \"\"\n ]\n },\n \"metadata\": {},\n \"output_type\": \"display_data\"\n },\n {\n \"data\": {\n \"text\/markdown\": [\n \"**Node ID:** c82df506-412a-40c2-baf3-df51ab43e434
**Similarity:** 16.252092361450195
**Text:** For instance, at k=10, BERT requires nearly\\n\",\n \"180\\u0002more FLOPs than ColBERT; at k=1000, BERT’s overhe...
\"\n ],\n \"text\/plain\": [\n \"\"\n ]\n },\n \"metadata\": {},\n \"output_type\": \"display_data\"\n }\n ],\n \"source\": [\n \"from llama_index.core.response.notebook_utils import display_source_node\\n\",\n \"\\n\",\n \"retriever = ragatouille_pack.get_modules()[\\\"retriever\\\"]\\n\",\n \"nodes = retriever.retrieve(\\\"How does ColBERTv2 compare with other BERT models?\\\")\\n\",\n \"\\n\",\n \"for node in nodes:\\n\",\n \" display_source_node(node)\"\n ]\n },\n {\n \"cell_type\": \"code\",\n \"execution_count\": null,\n \"id\": \"b206fab9-a980-44c8-8e76-7e340f7d08eb\",\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\": \"stderr\",\n \"output_type\": \"stream\",\n \"text\": [\n \"\/Users\/jerryliu\/Programming\/llama-hub\/.venv\/lib\/python3.10\/site-packages\/torch\/amp\/autocast_mode.py:250: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling\\n\",\n \" warnings.warn(\\n\"\n ]\n },\n {\n \"data\": {\n \"text\/plain\": [\n \"[{'content': 'While highly competitive in effec-\\\\ntiveness, ColBERT is orders of magnitude cheaper than BERT base,\\\\nin particular, by over 170 \\\\x02in latency and 13,900 \\\\x02in FLOPs. \/T_his\\\\nhighlights the expressiveness of our proposed late interaction mech-\\\\nanism, particularly when coupled with a powerful pre-trained LM\\\\nlike BERT. While ColBERT’s re-ranking latency is slightly higher\\\\nthan the non-BERT re-ranking models shown (i.e., by 10s of mil-\\\\nliseconds), this difference is explained by the time it takes to gather,\\\\nstack, and transfer the document embeddings to the GPU. In partic-\\\\nular, the query encoding and interaction in ColBERT consume only\\\\n13 milliseconds of its total execution time. We note that ColBERT’s\\\\nlatency and FLOPs can be considerably reduced by padding queries\\\\nto a shorter length, using smaller vector dimensions (the MRR@10\\\\nof which is tested in §4.5), employing quantization of the document\\\\n6h\/t_tps:\/\/github.com\/mit-han-lab\/torchpro\/f_ile',\\n\",\n \" 'score': 17.003997802734375,\\n\",\n \" 'rank': 1},\\n\"," - }, - { - "id": 43, - "initial_rank": 43, - "content": "TQ [[gptq]]\n\n\n\nPEFT를 활용한 GPTQ 양자화를 사용해보시려면 이 [노트북](https:\/\/colab.research.google.com\/drive\/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb)을 참고하시고, 자세한 내용은 이 [블로그 게시물](https:\/\/huggingface.co\/blog\/gptq-integration)에서 확인하세요!\n\n<\/Tip>\n\n[AutoGPTQ](https:\/\/github.com\/PanQiWei\/AutoGPTQ) 라이브러리는 GPTQ 알고리즘을 구현합니다. 이는 훈련 후 양자화 기법으로, 가중치 행렬의 각 행을 독립적으로 양자화하여 오차를 최소화하는 가중치 버전을 찾습니다. 이 가중치는 int4로 양자화되지만, 추론 중에는 실시간으로 fp16으로 복원됩니다. 이는 int4 가중치가 GPU의 전역 메모리 대신 결합된 커널에서 역양자화되기 때문에 메모리 사용량을 4배 절약할 수 있으며, 더 낮은 비트 너비를 사용함으로써 통신 시간이 줄어들어 추론 속도가 빨라질 것으로 기대할 수 있습니다.\n\n시작하기 전에 다음 라이브러리들이 설치되어 있는지 확인하세요:\n\n```bash\npip install auto-gptq\npip install --upgrade accelerate optimum transformers\n```\n\n모델을 양자화하려면(현재 텍스트 모델만 지원됨) [`GPTQConfig`] 클래스를 생성하고 양자화할 비트 수, 양자화를 위한 가중치 교정 데이터셋, 그리고 데이터셋을 준비하기 위한 토크나이저를 설정해야 합니다.\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig\n\nmodel_id = \"facebook\/opt-125m\"\ntokenizer = AutoTokenizer.from_pretrained(model_id)\ngptq_config = GPTQConfig(bits=4, dataset=\"c4\", tokenizer=tokenizer)\n```\n\n자신의 데이터셋을 문자열 리스트 형태로 전달할 수도 있지만, GPTQ 논문에서 사용한 동일한 데이터셋을 사용하는 것을 강력히 권장합니다.\n\n```py\ndataset = [\"auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm.\"]\ngptq_config = GPTQConfig(bits=4, dataset=dataset, tokenizer=tokenizer)\n```\n\n양자화할 모델을 로드하고 `gptq_config`을 [`~AutoModelForCausalLM.from_pretrained`] 메소드에 전달하세요. 모델을 메모리에 맞추기 위해 `device_map=\"auto\"`를 설정하여 모델을 자동으로 CPU로 오프로드하고, 양자화를 위해 모델 모듈이 CPU와 GPU 간에 이동할 수 있도록 합니다.\n\n```py\nquantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", quantization_config=gptq_config)\n```\n\n데이터셋이 너무 커서 메모리가 부족한 경우를 대비한 디스크 오프로드는 현재 지원하지 않고 있습니다. 이럴 때는 `max_memory` 매개변수를 사용하여 디바이스(GPU 및 CPU)에서 사용할 메모리 양을 할당해 보세요:\n\n```py\nquantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", max_memory={0: \"30GiB\", 1: \"46GiB\", \"cpu\": \"30GiB\"}, quantization_config=gptq_config)\n```\n\n\n\n하드웨어와 모델 매개변수량에 따라 모델을 처음부터 양자화하는 데 드는 시간이 서로 다를 수 있습니다. 예를 들어, 무료 등급의 Google Colab GPU로 비교적 가벼운 [facebook\/opt-350m](https:\/\/huggingface.co\/facebook\/opt-350m) 모델을 양자화하는 데 약 5분이 걸리지만, NVIDIA A100으로 175B에 달하는 매개변수를 가진 모델을 양자화하는 데는 약 4시간에 달하는 시간이 걸릴 수 있습니다. 모델을 양자화하기 전에, Hub에서 해당 모델의 GPTQ 양자화 버전이 이미 존재하는지 확인하는 것이 좋습니다.\n\n<\/Tip>\n\n모델이 양자화되면, 모델과 토크나이저를 Hub에 푸시하여 쉽게 공유하고 접근할 수 있습니다. [`GPTQConfig`]를 저장하기 위해 [`~PreTrainedModel.push_to_hub`] 메소드를 사용하세요:\n\n```py\nquantized_model.push_to_hub(\"opt-125m-gptq\")\ntokenizer.push_to_hub(\"opt-125m-gptq\")\n```\n\n양자화된 모델을 로컬에 저장하려면 [`~PreTrainedModel.save_pretrained`] 메소드를 사용할 수 있습니다. 모델이 `device_map` 매개변수로 양자화되었을 경우, 저장하기 전에 전체 모델을 GPU나 CPU로 이동해야 합니다. 예를 들어, 모델을 CPU에 저장하려면 다음과 같이 합니다:\n\n```py\nquantized_model.save_pretrained(\"opt-125m-gptq\")\ntokenizer.save_pretrained(\"opt-125m-gptq\")\n\n# device_map이 설정된 상태에서 양자화된 경우\nquantized_model.to(\"cpu\")\nquantized_model.save_pretrained(\"opt-125m-gptq\")\n```\n\n양자화된 모델을 다시 로드하려면 [`~PreTrainedModel.from_pretrained`] 메소드를 사용하고, `device_map=\"auto\"`를 설정하여 모든 사용 가능한 GPU에 모델을 자동으로 분산시켜 더 많은 메모리를 사용하지 않으면서 모델을 더 빠르게 로드할 수 있습니다.\n\n```py\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\")\n```\n\n## ExLlama [[exllama]]\n\n[ExLlama](https:\/\/github.com\/turboderp\/exllama)은 [Llama](model_doc\/llama) 모델의 Python\/C++\/CUDA 구현체로, 4비트 GPTQ 가중치를 사용하여 더 빠른 추론을 위해 설계되었습니다(이 [벤치마크](https:\/\/github.com\/huggingface\/optimum\/tree\/main\/tests\/benchmark#gptq-benchmark)를 참고하세요). ['GPTQConfig'] 객체를 생성할 때 ExLlama 커널이 기본적으로 활성화됩니다. 추론 속도를 더욱 높이기 위해, `exllama_config` 매개변수를 구성하여 [ExLlamaV2](https:\/\/github.com\/turboderp\/exllamav2) 커널을 사용할 수 있습니다:\n\n```py\nimport torch\nfrom transformers import AutoModelForCausalLM, GPTQConfig\n\ngptq_config = GPTQConfig(bits=4, exllama_config={\"version\":2})\nmodel = AutoModelForCausalLM.from_pretrained(\"{your_username}\/opt-125m-gptq\", device_map=\"auto\", quantization_config=" - }, - { - "id": 44, - "initial_rank": 44, - "content": "oubleshoot\n\nSometimes errors occur, but we are here to help! This guide covers some of the most common issues we've seen and how you can resolve them. However, this guide isn't meant to be a comprehensive collection of every 🤗 Transformers issue. For more help with troubleshooting your issue, try:\n\n\n\n1. Asking for help on the [forums](https:\/\/discuss.huggingface.co\/). There are specific categories you can post your question to, like [Beginners](https:\/\/discuss.huggingface.co\/c\/beginners\/5) or [🤗 Transformers](https:\/\/discuss.huggingface.co\/c\/transformers\/9). Make sure you write a good descriptive forum post with some reproducible code to maximize the likelihood that your problem is solved!\n\n\n\n2. Create an [Issue](https:\/\/github.com\/huggingface\/transformers\/issues\/new\/choose) on the 🤗 Transformers repository if it is a bug related to the library. Try to include as much information describing the bug as possible to help us better figure out what's wrong and how we can fix it.\n\n3. Check the [Migration](migration) guide if you use an older version of 🤗 Transformers since some important changes have been introduced between versions.\n\nFor more details about troubleshooting and getting help, take a look at [Chapter 8](https:\/\/huggingface.co\/course\/chapter8\/1?fw=pt) of the Hugging Face course.\n\n\n## Firewalled environments\n\nSome GPU instances on cloud and intranet setups are firewalled to external connections, resulting in a connection error. When your script attempts to download model weights or datasets, the download will hang and then timeout with the following message:\n\n```\nValueError: Connection error, and we cannot find the requested files in the cached path.\nPlease try again or make sure your Internet connection is on.\n```\n\nIn this case, you should try to run 🤗 Transformers on [offline mode](installation#offline-mode) to avoid the connection error.\n\n## CUDA out of memory\n\nTraining large models with millions of parameters can be challenging without the appropriate hardware. A common error you may encounter when the GPU runs out of memory is:\n\n```\nCUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.17 GiB total capacity; 9.70 GiB already allocated; 179.81 MiB free; 9.85 GiB reserved in total by PyTorch)\n```\n\nHere are some potential solutions you can try to lessen memory use:\n\n- Reduce the [`per_device_train_batch_size`](main_classes\/trainer#transformers.TrainingArguments.per_device_train_batch_size) value in [`TrainingArguments`].\n- Try using [`gradient_accumulation_steps`](main_classes\/trainer#transformers.TrainingArguments.gradient_accumulation_steps) in [`TrainingArguments`] to effectively increase overall batch size.\n\n\n\nRefer to the Performance [guide](performance) for more details about memory-saving techniques.\n\n<\/Tip>\n\n## Unable to load a saved TensorFlow model\n\nTensorFlow's [model.save](https:\/\/www.tensorflow.org\/tutorials\/keras\/save_and_load#save_the_entire_model) method will save the entire model - architecture, weights, training configuration - in a single file. However, when you load the model file again, you may run into an error because 🤗 Transformers may not load all the TensorFlow-related objects in the model file. To avoid issues with saving and loading TensorFlow models, we recommend you:\n\n- Save the model weights as a `h5` file extension with [`model.save_weights`](https:\/\/www.tensorflow.org\/tutorials\/keras\/save_and_load#save_the_entire_model) and then reload the model with [`~TFPreTrainedModel.from_pretrained`]:\n\n```py\n>>> from transformers import TFPreTrainedModel\n>>> from tensorflow import keras\n\n>>> model.save_weights(\"some_folder\/tf_model.h5\")\n>>> model = TFPreTrainedModel.from_pretrained(\"some_folder\")\n```\n\n- Save the model with [`~TFPretrainedModel.save_pretrained`] and load it again with [`~TFPreTrainedModel.from_pretrained`]:\n\n```py\n>>> from transformers import TFPreTrainedModel\n\n>>> model.save_pretrained(\"path_to\/model\")\n>>> model = TFPreTrainedModel.from_pretrained(\"path_to\/model\")\n```\n\n## ImportError\n\nAnother common error you may encounter, especially if it is a newly released model, is `ImportError`:\n\n```\nImportError: cannot import name 'ImageGPTImageProcessor' from 'transformers' (unknown location)\n```\n\nFor these error types, check to make sure you have the latest version of 🤗 Transformers installed to access the most recent models:\n\n```bash\npip install transformers --upgrade\n```\n\n## CUDA error: device-side assert triggered\n\nSometimes you may run into a generic CUDA error about an error in the device code.\n\n```\nRuntimeError: CUDA error: device-side assert triggered\n```\n\nYou should try to run the code on a CPU first to get a more descriptive error message. Add the following environment variable to the beginning of your code to switch to a CPU:\n\n```py\n>>> import os\n\n>>> os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"\"\n```\n\nAnother option is to get a better traceback from the GPU. Add the following environment variable to the beginning of your code to get the traceback to point to the source of the error:\n\n```py\n>>> import os\n\n>>> os.environ[\"CUDA_LAUNCH_BLOCKING\"] = \"1\"\n```\n\n## Incorrect output when padding tokens aren't masked\n\nIn some cases, the output `hidden_state` may be incorrect if the `input_ids` include padding tokens. To demonstrate, load a model and tokenizer. You can access a model's `pad_token_id` to see its value. The `pad_token_id` may be `None` for some models, but you can always manually set it.\n\n```py\n>>> from transformers import AutoModelForSequenceClassification\n>>> import torch\n\n>>> model = AutoModelForSequenceClassification.from_pretrained(\"google-bert\/bert-base-uncased\")\n>>> model.config.pad_token_id\n0\n```\n\nThe following example shows the output without masking the padding tokens:\n\n```py\n>>> input_ids = torch.tensor([[7592, 2057, 2097, 2393, 9611, 2115], [7592, 0, 0, 0, 0, 0]])\n>>> output = model(input_ids)\n>>> print(output.logits)\ntensor([[ 0.0082, -0.2307],\n [ 0.1317, -0.1683]], grad_fn=)\n```\n\nHere is the actual output of the second sequence:\n\n```py\n>>> input_ids = torch.tensor([[7592]])\n>>> output = model(input_ids)\n>>> print(output.logits)\ntensor([[-0.1008, -0.4061]], grad_fn=)\n```\n\nMost of the time, you should provide an `attention_mask` to your model to ignore the padding tokens to avoid this silent error. Now the output of the second sequence matches its actual output:\n\n\n\nBy default, the tokenizer creates an `attention_mask` for you based on your specific tokenizer's defaults.\n\n<\/Tip>\n\n```py\n>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])\n>>> output = model(input_ids, attention_mask=attention_mask)\n>>> print(output.logits)\ntensor([[ 0.0082, -0.2307],\n [-0.1008, -0.4061]], grad_fn=)\n```\n\n🤗 Transformers doesn't automatically create an `attention_mask` to mask a padding token if it is provided because:\n\n- Some models don't have a padding token.\n- For some use-cases, users want a model to attend to a padding token.\n\n## ValueError: Unrecognized configuration class XYZ for this kind of AutoModel\n\nGenerally, we recommend using the [`AutoModel`] class to load pretrained instances of models. This class\ncan automatically infer and load the correct architecture from a given checkpoint based on the configuration. If you see\nthis `ValueError` when loading a model from a checkpoint, this means the Auto class couldn't find a mapping from\nthe configuration in the given checkpoint to the kind of model you are trying to load. Most commonly, this happens when a\ncheckpoint doesn't support a given task.\nFor instance, you'll see this error in the following example because there is no GPT2 for question answering:\n\n```py\n>>> from transformers import AutoProcessor, AutoModelForQuestionAnswering\n\n>>> processor = AutoProcessor.from_pretrained(\"openai-community\/gpt2-medium\")\n>>> model = AutoModelForQuestionAnswering.from_pretrained(\"openai-community\/gpt2-medium\")\nValueError: Unrecognized configuration class for this kind of AutoModel: AutoModelForQuestionAnswering.\nModel type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BloomConfig, ...\n```\n" - }, - { - "id": 45, - "initial_rank": 45, - "content": "run the model just like before *without Flash Attention* and measure the peak GPU memory requirement and inference time.\n\n```python\nimport time\n\nstart_time = time.time()\nresult = pipe(long_prompt, max_new_tokens=60)[0][\"generated_text\"][len(long_prompt):]\n\nprint(f\"Generated in {time.time() - start_time} seconds.\")\nresult\n```\n\n**Output**:\n```\nGenerated in 10.96854019165039 seconds.\nSure. Here is a function that does that.\\n\\ndef bytes_to_giga(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n\\nAnswer: Sure. Here is a function that does that.\\n\\ndef\n````\n\nWe're getting the same output as before, however this time, the model repeats the answer multiple times until it's 60 tokens cut-off. This is not surprising as we've repeated the system prompt ten times for demonstration purposes and thus cued the model to repeat itself.\n\n**Note** that the system prompt should not be repeated ten times in real-world applications - one time is enough!\n\nLet's measure the peak GPU memory requirement.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**Output**:\n```bash\n37.668193340301514\n```\n\nAs we can see the peak GPU memory requirement is now significantly higher than in the beginning, which is largely due to the longer input sequence. Also the generation takes a little over a minute now.\n\nWe call `flush()` to free GPU memory for our next experiment.\n\n```python\nflush()\n```\n\nFor comparison, let's run the same function, but enable Flash Attention instead.\nTo do so, we convert the model to [BetterTransformer](https:\/\/huggingface.co\/docs\/optimum\/bettertransformer\/overview) and by doing so enabling PyTorch's [SDPA self-attention](https:\/\/pytorch.org\/docs\/master\/generated\/torch.nn.functional.scaled_dot_product_attention) which in turn is able to use Flash Attention.\n\n```python\nmodel.to_bettertransformer()\n```\n\nNow we run the exact same code snippet as before and under the hood Transformers will make use of Flash Attention.\n\n```py\nstart_time = time.time()\nwith torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):\n result = pipe(long_prompt, max_new_tokens=60)[0][\"generated_text\"][len(long_prompt):]\n\nprint(f\"Generated in {time.time() - start_time} seconds.\")\nresult\n```\n\n**Output**:\n```\nGenerated in 3.0211617946624756 seconds.\n Sure. Here is a function that does that.\\n\\ndef bytes_to_giga(bytes):\\n return bytes \/ 1024 \/ 1024 \/ 1024\\n\\nAnswer: Sure. Here is a function that does that.\\n\\ndef\n```\n\nWe're getting the exact same result as before, but can observe a very significant speed-up thanks to Flash Attention.\n\nLet's measure the memory consumption one last time.\n\n```python\nbytes_to_giga_bytes(torch.cuda.max_memory_allocated())\n```\n\n**Output**:\n```\n32.617331981658936\n```\n\nAnd we're almost back to our original 29GB peak GPU memory from the beginning.\n\nWe can observe that we only use roughly 100MB more GPU memory when passing a very long input sequence with Flash Attention compared to passing a short input sequence as done in the beginning.\n\n```py\nflush()\n```\n\nFor more information on how to use Flash Attention, please have a look at [this doc page](https:\/\/huggingface.co\/docs\/transformers\/en\/perf_infer_gpu_one#flashattention-2).\n\n## 3. Architectural Innovations\n\nSo far we have looked into improving computational and memory efficiency by:\n\n- Casting the weights to a lower precision format\n- Replacing the self-attention algorithm with a more memory- and compute efficient version\n\nLet's now look into how we can change the architecture of an LLM so that it is most effective and efficient for task that require long text inputs, *e.g.*:\n- Retrieval augmented Questions Answering,\n- Summarization,\n- Chat\n\nNote that *chat* not only requires the LLM to handle long text inputs, but it also necessitates that the LLM is able to efficiently handle the back-and-forth dialogue between user and assistant (such as ChatGPT).\n\nOnce trained, the fundamental LLM architecture is difficult to change, so it is important to make considerations about the LLM's tasks beforehand and accordingly optimize the model's architecture.\nThere are two important components of the model architecture that quickly become memory and\/or performance bottlenecks for large input sequences.\n\n- The positional embeddings\n- The key-value cache\n\nLet's go over each component in more detail\n\n### 3.1 I" - }, - { - "id": 46, - "initial_rank": 46, - "content": "[[package]]\nname = \"langchain-community\"\nversion = \"0.0.28\"\ndescription = \"Community contributed LangChain integrations.\"\noptional = false\npython-versions = \">=3.8.1,<4.0\"\nfiles = [\n {file = \"langchain_community-0.0.28-py3-none-any.whl\", hash = \"sha256:bdb015ac455ae68432ea104628717583dce041e1abdfcefe86e39f034f5e90b8\"},\n {file = \"langchain_community-0.0.28.tar.gz\", hash = \"sha256:8664d243a90550fc5ddc137b712034e02c8d43afc8d4cc832ba5842b44c864ce\"},\n]\n\n[package.dependencies]\naiohttp = \">=3.8.3,<4.0.0\"\ndataclasses-json = \">=0.5.7,<0.7\"\nlangchain-core = \">=0.1.31,<0.2.0\"\nlangsmith = \">=0.1.0,<0.2.0\"\nnumpy = \">=1,<2\"\nPyYAML = \">=5.3\"\nrequests = \">=2,<3\"\nSQLAlchemy = \">=1.4,<3\"\ntenacity = \">=8.1.0,<9.0.0\"\n\n[package.extras]\ncli = [\"typer (>=0.9.0,<0.10.0)\"]\nextended-testing = [\"aiosqlite (>=0.19.0,<0.20.0)\", \"aleph-alpha-client (>=2.15.0,<3.0.0)\", \"anthropic (>=0.3.11,<0.4.0)\", \"arxiv (>=1.4,<2.0)\", \"assemblyai (>=0.17.0,<0.18.0)\", \"atlassian-python-api (>=3.36.0,<4.0.0)\", \"azure-ai-documentintelligence (>=1.0.0b1,<2.0.0)\", \"beautifulsoup4 (>=4,<5)\", \"bibtexparser (>=1.4.0,<2.0.0)\", \"cassio (>=0.1.0,<0.2.0)\", \"chardet (>=5.1.0,<6.0.0)\", \"cloudpickle (>=2.0.0)\", \"cohere (>=4,<5)\", \"databricks-vectorsearch (>=0.21,<0.22)\", \"datasets (>=2.15.0,<3.0.0)\", \"dgml-utils (>=0.3.0,<0.4.0)\", \"elasticsearch (>=8.12.0,<9.0.0)\", \"esprima (>=4.0.1,<5.0.0)\", \"faiss-cpu (>=1,<2)\", \"feedparser (>=6.0.10,<7.0.0)\", \"fireworks-ai (>=0.9.0,<0.10.0)\", \"friendli-client (>=1.2.4,<2.0.0)\", \"geopandas (>=0.13.1,<0.14.0)\", \"gitpython (>=3.1.32,<4.0.0)\", \"google-cloud-documentai (>=2.20.1,<3.0.0)\", \"gql (>=3.4.1,<4.0.0)\", \"gradientai (>=1.4.0,<2.0.0)\", \"hdbcli (>=2.19.21,<3.0.0)\", \"hologres-vector (>=0.0.6,<0.0.7)\", \"html2text (>=2020.1.16,<2021.0.0)\", \"httpx (>=0.24.1,<0.25.0)\", \"javelin-sdk (>=0.1.8,<0.2.0)\", \"jinja2 (>=3,<4)\", \"jq (>=1.4.1,<2.0.0)\", \"jsonschema (>1)\", \"lxml (>=4.9.2,<5.0.0)\", \"markdownify (>=0.11.6,<0.12.0)\", \"motor (>=3.3.1,<4.0.0)\", \"msal (>=1.25.0,<2.0.0)\", \"mwparserfromhell (>=0.6.4,<0.7.0)\", \"mwxml (>=0.3.3,<0.4.0)\", \"newspaper3k (>=0.2.8,<0.3.0)\", \"numexpr (>=2.8.6,<3.0.0)\", \"nvidia-riva-client (>=2.14.0,<3.0.0)\", \"oci (>=2.119.1,<3.0.0)\", \"openai (<2)\", \"openapi-pydantic (>=0.3.2,<0.4.0)\", \"oracle-ads (>=2.9.1,<3.0.0)\", \"pandas (>=2.0.1,<3.0.0)\", \"pdfminer-six (>=20221105,<20221106)\", \"pgvector (>=0.1.6,<0.2.0)\", \"praw (>=7.7.1,<8.0.0)\", \"psychicapi (>=0.8.0,<0.9.0)\", \"py-trello (>=0.19.0,<0.20.0)\", \"pymupdf (>=1.22.3,<2.0.0)\", \"pypdf (>=3.4.0,<4.0.0)\", \"pypdfium2 (>=4.10.0,<5.0.0)\", \"pyspark (>=3.4.0,<4.0.0)\", \"rank-bm25 (>=0.2.2,<0.3.0)\", \"rapidfuzz (>=3.1.1,<4.0.0)\", \"rapidocr-onnxruntime (>=1.3.2,<2.0.0)\", \"rdflib (==7.0.0)\", \"requests-toolbelt (>=1.0.0,<2.0.0)\", \"rspace_client (>=2.5.0,<3.0.0)\", \"scikit-learn (>=1.2.2,<2.0.0)\", \"sqlite-vss (>=0.1.2,<0.2.0)\", \"streamlit (>=1.18.0,<2.0.0)\", \"sympy (>=1.12,<2.0)\", \"telethon (>=1.28.5,<2.0.0)\", \"tidb-vector (>=0.0.3,<1.0.0)\", \"timescale-vector (>=0.0.1,<0.0.2)\", \"tqdm (>=4.48.0)\", \"tree-sitter (>=0.20.2,<0.21.0)\", \"tree-sitter-languages (>=1.8.0,<2.0.0)\", \"upstash-redis (>=0.15.0,<0.16.0)\", \"xata (>=1.0.0a7,<2.0.0)\", \"xmltodict (>=0.13.0,<0.14.0)\", \"zhipuai (>=1.0.7,<2.0.0)\"]\n\n[[package]]\nname = \"langchain-core\"\nversion = \"0.1.31\"\ndescription = \"Building applications with LLMs through composability\"\noptional = false\npython-versions = \">=3.8.1,<4.0\"\nfiles = []\ndevelop = true\n\n[package.dependencies]\nanyio = \">=3,<5\"\njsonpatch = \"^1.33\"\nlangsmith = \"^0.1.0\"\npackaging = \"^23.2\"\npydantic = \">=1,<3\"\nPyYAML = \">=5.3\"\nrequests = \"^2\"\ntenacity = \"^8.1.0\"\n\n[package.extras]\nextended-testing = [\"jinja2 (>=3,<4)\"]\n\n[package.source]\ntype = \"directory\"\nurl = \"..\/libs\/core\"\n\n" - }, - { - "id": 47, - "initial_rank": 47, - "content": "d=\"M1227.57,1375.94c-3.92-4.37-7.85-8.74-11.77-13.1-27.64-30.21-41.35-65.1-30.46-111.35-104.67,33.94-205.19,41.62-308.99,17.88-2.72,31.05-3.81,59.43-8.02,87.34-5.15,34.16-13.02,67.9-19.47,101.86-2.25,11.83-4.72,23.74-5.46,35.71-2.2,35.64-1.41,71.63-6.21,106.88-2.47,18.13-14.33,34.99-21.3,50.93h-77.71l.42-.97c4.15-11.29,8.29-22.58,12.44-33.87,0,0-.21.27-.21.27,4.2-4.21,8.39-8.43,12.59-12.64l-.27.27c2.12-2.13,4.24-4.25,6.36-6.38l11.44-1.32c1.11-18.89.94-11.95,1.11-18.89.83-33.72.34-67.53-1.47-101.21-.67-12.36-6.26-24.46-9.6-36.68-2.32.06-4.64.12-6.96.18-12.39,44.7-24.78,89.41-37.17,134.11,0,0,.55-.24.55-.24-8.87,16.08-21.35,47.27-30.22,63.35-11.03,1-42.13-.92-67.88-.77,2.1-21.24,9.98-32.37,23.37-44.39,28.79-25.84,36.54-63.75,36.43-100.05-.15-50.7-6.92-101.37-10.86-152.06,5.59-10.19,8.83-23.19,17.14-30.11,56.24-46.85,102.12-102.83,144.95-161.48,41.46-56.78,83.19-113.45,122.26-171.88,43.03-64.36,87.61-128.6,97.48-208.71,72.88,7.68,142.46,25.43,207.78,60.42,19.96,10.69,45.01,11.89,67.72,17.44,8.42,6.5,20.9,11.26,24.57,19.8,18.5,43.03,36.6,86.42,51.23,130.87,17.99,54.67,22.8,110.74,13.5,168.57-11.14,69.24-41.63,128.83-80.9,185.36-6.53,9.41-8.65,21.88-12.81,32.94,0,0,.15-.18.15-.18-6.82,4.82-14.03,9.18-20.38,14.54-17.18,14.49-21.33,42.61-48.67,46.55-.37-1.49-.55-2.99-.52-4.53-1.44-2.72-2.87-5.43-4.31-8.15-9.96-18.77-19.93-37.53-29.89-56.3ZM517.28,570.6c-10.46-.44-21.29,7.69-31.95,11.86,0,0-9.76,80.82-9.76,121.49s13.91,75.15,16.27,113.41c1.68,27.25,3.01,56.38,13.57,80.73,12.68,29.21,18.19,57.44,16.61,88.5-4.38,86.3,40.37,144.69,108.96,189.16,30.85-33.52,64.34-65,91.99-100.98,55.34-72.02,102.6-149.18,128.62-237.37,5.42-18.37,8.99-37.27,14.28-59.58,2.88-2.94,8.79-8.99,14.7-15.04l-141.94-.4c-2.09-1.89-3.55-3.22-5.64-5.11l-17.96-175.16c-65.91-4.05-131.8-8.74-197.77-11.52ZM665.84,361.83c-23.76-47.96-58.65-85.02-111.51-106.14-.84,14.47-1.52,26.06-2.26,38.88-17.66-12.73-33.28-23.98-53.45-38.52,2.58,25.98,5.08,44.11,6.04,62.33,1.89,36.02-13.58,54.11-49.86,59.15-7.07.98-14.41,0-21.52.83-21.28,2.46-32.65,14.43-35.51,35.63-3.74,27.68,10.41,45.81,33.06,56.83,23.57,11.47,48.7,19.75,77.45,31.13-4.21,14.19-8.49,28.22-12.51,42.32-3.62,12.69-6.95,25.46-10.41,38.2,10.66-4.17,21.48-12.3,31.95-11.86,65.97,2.78,131.86,7.46,197.77,11.52-3.72-76.27-14.91-151.03-49.22-220.28ZM1363.96,1424.58c1.96,2.05,3.91,4.09,5.87,6.14,2.05,2.06,4.1,4.12,6.15,6.17,2.04,1.96,4.07,3.92,6.11,5.88,7.5,4.99,15,9.99,22.49,14.98,3.64-19.55,25.85-47.67,44.43-61.31,7.67-5.63,11.05-16.91,16.99-25.18,3.32-4.62,7.89-8.34,11.9-12.46-1.62-13.04-4.17-26.03-4.67-39.11-1.43-37.53.92-74.57-10.84-112.04-14.13-45-8.64-92.52-6.02-140.17,43.12-8.67,61.47-36.79,63.89-74.64,2.92-45.72-5.65-89.82-34.29-127.23-20.53-26.81-48.34-41.86-82.84-36.43-22.85,3.59-45.04,11.37-67.53,17.28,8.42,6.5,20.9,11.26,24.57,19.8,18.5,43.03,36.6,86.42,51.23,130.87,17.99,54.67,22.8,110.74,13.5,168.57-11.14,69.24-41.63,128.83-80.9,185.36-6.53,9.41-8.65,21.88-12.81,32" - }, - { - "id": 48, - "initial_rank": 48, - "content": "createElement(\"path\",{className:\"cls-2\",d:\"M2133.97,104.73h-49.33c-48.36,0-90.91,25.48-115.75,64.1-14.52,22.58-22.99,49.63-22.99,78.73,0,44.89,20.13,84.92,51.59,111.1,2.93,2.6,6.05,4.98,9.31,7.14,12.86,8.49,28.11,13.47,44.52,13.47,1.23,0,2.46-.03,3.68-.09,.36-.02,.71-.05,1.07-.07,.87-.05,1.75-.11,2.62-.2,.34-.03,.68-.08,1.02-.12,.91-.1,1.82-.21,2.73-.34,.21-.03,.42-.07,.63-.1,32.89-5.07,61.56-30.82,70.9-62.81v57.83c0,3.26,2.64,5.9,5.9,5.9h50.42c3.26,0,5.9-2.64,5.9-5.9V110.63c0-3.26-2.64-5.9-5.9-5.9h-56.32Zm0,206.92c-12.2,10.16-27.97,13.98-44.84,15.12-.16,.01-.33,.03-.49,.04-1.12,.07-2.24,.1-3.36,.1-42.24,0-77.12-35.89-77.12-79.37,0-10.25,1.96-20.01,5.42-28.98,11.22-29.12,38.77-49.74,71.06-49.74h49.33v142.83Z\"}),Be.createElement(\"path\",{className:\"cls-2\",d:\"M1314.05,104.73h-49.33c-48.36,0-90.91,25.48-115.75,64.1-11.79,18.34-19.6,39.64-22.11,62.59-.58,5.3-.88,10.68-.88,16.14s.31,11.15,.93,16.59c4.28,38.09,23.14,71.61,50.66,94.52,2.93,2.6,6.05,4.98,9.31,7.14,12.86,8.49,28.11,13.47,44.52,13.47h0c17.99,0,34.61-5.93,48.16-15.97,16.29-11.58,28.88-28.54,34.48-47.75v50.26h-.11v11.08c0,21.84-5.71,38.27-17.34,49.36-11.61,11.08-31.04,16.63-58.25,16.63-11.12,0-28.79-.59-46.6-2.41-2.83-.29-5.46,1.5-6.27,4.22l-12.78,43.11c-1.02,3.46,1.27,7.02,4.83,7.53,21.52,3.08,42.52,4.68,54.65,4.68,48.91,0,85.16-10.75,108.89-32.21,21.48-19.41,33.15-48.89,35.2-88.52V110.63c0-3.26-2.64-5.9-5.9-5.9h-56.32Zm0,64.1s.65,139.13,0,143.36c-12.08,9.77-27.11,13.59-43.49,14.7-.16,.01-.33,.03-.49,.04-1.12,.07-2.24,.1-3.36,.1-1.32,0-2.63-.03-3.94-.1-40.41-2.11-74.52-37.26-74.52-79.38,0-10.25,1.96-20.01,5.42-28.98,11.22-29.12,38.77-49.74,71.06-49.74h49.33Z\"}),Be.createElement(\"path\",{className:\"cls-1\",d:\"M249.83,0C113.3,0,2,110.09,.03,246.16c-2,138.19,110.12,252.7,248.33,253.5,42.68,.25,83.79-10.19,120.3-30.03,3.56-1.93,4.11-6.83,1.08-9.51l-23.38-20.72c-4.75-4.21-11.51-5.4-17.36-2.92-25.48,10.84-53.17,16.38-81.71,16.03-111.68-1.37-201.91-94.29-200.13-205.96,1.76-110.26,92-199.41,202.67-199.41h202.69V407.41l-115-102.18c-3.72-3.31-9.42-2.66-12.42,1.31-18.46,24.44-48.53,39.64-81.93,37.34-46.33-3.2-83.87-40.5-87.34-86.81-4.15-55.24,39.63-101.52,94-101.52,49.18,0,89.68,37.85,93.91,85.95,.38,4.28,2.31,8.27,5.52,11.12l29.95,26.55c3.4,3.01,8.79,1.17,9.63-3.3,2.16-11.55,2.92-23.58,2.07-35.92-4.82-70.34-61.8-126.93-132.17-131.26-80.68-4.97-148.13,58.14-150.27,137.25-2.09,77.1,61.08,143.56,138.19,145.26,32.19,.71,62.03-9.41,86.14-26.95l150.26,133.2c6.44,5.71,16.61,1.14,16.61-7.47V9.48C499.66,4.25,495.42,0,490.18,0H249.83Z\"})))}function sr(e){return Be.createElement(\"svg\",{width:\"15\",height:\"15\",\"aria-label\":e.ariaLabel,role:\"img\"},Be.createElement(\"g\",{fill:\"none\",stroke:\"currentColor\",strokeLinecap:\"round\",strokeLinejoin:\"round\",strokeWidth:\"1.2\"},e.children))}fun" - }, - { - "id": 49, - "initial_rank": 49, - "content": "createElement(\"path\",{className:\"cls-2\",d:\"M2133.97,104.73h-49.33c-48.36,0-90.91,25.48-115.75,64.1-14.52,22.58-22.99,49.63-22.99,78.73,0,44.89,20.13,84.92,51.59,111.1,2.93,2.6,6.05,4.98,9.31,7.14,12.86,8.49,28.11,13.47,44.52,13.47,1.23,0,2.46-.03,3.68-.09,.36-.02,.71-.05,1.07-.07,.87-.05,1.75-.11,2.62-.2,.34-.03,.68-.08,1.02-.12,.91-.1,1.82-.21,2.73-.34,.21-.03,.42-.07,.63-.1,32.89-5.07,61.56-30.82,70.9-62.81v57.83c0,3.26,2.64,5.9,5.9,5.9h50.42c3.26,0,5.9-2.64,5.9-5.9V110.63c0-3.26-2.64-5.9-5.9-5.9h-56.32Zm0,206.92c-12.2,10.16-27.97,13.98-44.84,15.12-.16,.01-.33,.03-.49,.04-1.12,.07-2.24,.1-3.36,.1-42.24,0-77.12-35.89-77.12-79.37,0-10.25,1.96-20.01,5.42-28.98,11.22-29.12,38.77-49.74,71.06-49.74h49.33v142.83Z\"}),Be.createElement(\"path\",{className:\"cls-2\",d:\"M1314.05,104.73h-49.33c-48.36,0-90.91,25.48-115.75,64.1-11.79,18.34-19.6,39.64-22.11,62.59-.58,5.3-.88,10.68-.88,16.14s.31,11.15,.93,16.59c4.28,38.09,23.14,71.61,50.66,94.52,2.93,2.6,6.05,4.98,9.31,7.14,12.86,8.49,28.11,13.47,44.52,13.47h0c17.99,0,34.61-5.93,48.16-15.97,16.29-11.58,28.88-28.54,34.48-47.75v50.26h-.11v11.08c0,21.84-5.71,38.27-17.34,49.36-11.61,11.08-31.04,16.63-58.25,16.63-11.12,0-28.79-.59-46.6-2.41-2.83-.29-5.46,1.5-6.27,4.22l-12.78,43.11c-1.02,3.46,1.27,7.02,4.83,7.53,21.52,3.08,42.52,4.68,54.65,4.68,48.91,0,85.16-10.75,108.89-32.21,21.48-19.41,33.15-48.89,35.2-88.52V110.63c0-3.26-2.64-5.9-5.9-5.9h-56.32Zm0,64.1s.65,139.13,0,143.36c-12.08,9.77-27.11,13.59-43.49,14.7-.16,.01-.33,.03-.49,.04-1.12,.07-2.24,.1-3.36,.1-1.32,0-2.63-.03-3.94-.1-40.41-2.11-74.52-37.26-74.52-79.38,0-10.25,1.96-20.01,5.42-28.98,11.22-29.12,38.77-49.74,71.06-49.74h49.33Z\"}),Be.createElement(\"path\",{className:\"cls-1\",d:\"M249.83,0C113.3,0,2,110.09,.03,246.16c-2,138.19,110.12,252.7,248.33,253.5,42.68,.25,83.79-10.19,120.3-30.03,3.56-1.93,4.11-6.83,1.08-9.51l-23.38-20.72c-4.75-4.21-11.51-5.4-17.36-2.92-25.48,10.84-53.17,16.38-81.71,16.03-111.68-1.37-201.91-94.29-200.13-205.96,1.76-110.26,92-199.41,202.67-199.41h202.69V407.41l-115-102.18c-3.72-3.31-9.42-2.66-12.42,1.31-18.46,24.44-48.53,39.64-81.93,37.34-46.33-3.2-83.87-40.5-87.34-86.81-4.15-55.24,39.63-101.52,94-101.52,49.18,0,89.68,37.85,93.91,85.95,.38,4.28,2.31,8.27,5.52,11.12l29.95,26.55c3.4,3.01,8.79,1.17,9.63-3.3,2.16-11.55,2.92-23.58,2.07-35.92-4.82-70.34-61.8-126.93-132.17-131.26-80.68-4.97-148.13,58.14-150.27,137.25-2.09,77.1,61.08,143.56,138.19,145.26,32.19,.71,62.03-9.41,86.14-26.95l150.26,133.2c6.44,5.71,16.61,1.14,16.61-7.47V9.48C499.66,4.25,495.42,0,490.18,0H249.83Z\"})))}function sr(e){return Be.createElement(\"svg\",{width:\"15\",height:\"15\",\"aria-label\":e.ariaLabel,role:\"img\"},Be.createElement(\"g\",{fill:\"none\",stroke:\"currentColor\",strokeLinecap:\"round\",strokeLinejoin:\"round\",strokeWidth:\"1.2\"},e.children))}fun" - }, - { - "id": 50, - "initial_rank": 50, - "content": "# THIS FILE HAS BEEN AUTOGENERATED. To update:\n# 1. modify the `_deps` dict in setup.py\n# 2. run `make deps_table_update``\ndeps = {\n \"Pillow\": \"Pillow>=10.0.1,<=15.0\",\n \"accelerate\": \"accelerate>=0.26.0\",\n \"av\": \"av==9.2.0\",\n \"beautifulsoup4\": \"beautifulsoup4\",\n \"blobfile\": \"blobfile\",\n \"codecarbon\": \"codecarbon==1.2.0\",\n \"cookiecutter\": \"cookiecutter==1.7.3\",\n \"dataclasses\": \"dataclasses\",\n \"datasets\": \"datasets!=2.5.0\",\n \"decord\": \"decord==0.6.0\",\n \"deepspeed\": \"deepspeed>=0.9.3\",\n \"diffusers\": \"diffusers\",\n \"dill\": \"dill<0.3.5\",\n \"evaluate\": \"evaluate>=0.2.0\",\n \"faiss-cpu\": \"faiss-cpu\",\n \"fastapi\": \"fastapi\",\n \"filelock\": \"filelock\",\n \"flax\": \"flax>=0.4.1,<=0.7.0\",\n \"fsspec\": \"fsspec<2023.10.0\",\n \"ftfy\": \"ftfy\",\n \"fugashi\": \"fugashi>=1.0\",\n \"GitPython\": \"GitPython<3.1.19\",\n \"hf-doc-builder\": \"hf-doc-builder>=0.3.0\",\n \"huggingface-hub\": \"huggingface-hub>=0.23.2,<1.0\",\n \"importlib_metadata\": \"importlib_metadata\",\n \"ipadic\": \"ipadic>=1.0.0,<2.0\",\n \"isort\": \"isort>=5.5.4\",\n \"jax\": \"jax>=0.4.1,<=0.4.13\",\n \"jaxlib\": \"jaxlib>=0.4.1,<=0.4.13\",\n \"jieba\": \"jieba\",\n \"jinja2\": \"jinja2>=3.1.0\",\n \"kenlm\": \"kenlm\",\n \"keras\": \"keras>2.9,<2.16\",\n \"keras-nlp\": \"keras-nlp>=0.3.1,<0.14.0\",\n \"librosa\": \"librosa\",\n \"nltk\": \"nltk<=3.8.1\",\n \"natten\": \"natten>=0.14.6,<0.15.0\",\n \"numpy\": \"numpy>=1.17\",\n \"onnxconverter-common\": \"onnxconverter-common\",\n \"onnxruntime-tools\": \"onnxruntime-tools>=1.4.2\",\n \"onnxruntime\": \"onnxruntime>=1.4.0\",\n \"opencv-python\": \"opencv-python\",\n \"optimum-benchmark\": \"optimum-benchmark>=0.3.0\",\n \"optuna\": \"optuna\",\n \"optax\": \"optax>=0.0.8,<=0.1.4\",\n \"packaging\": \"packaging>=20.0\",\n \"parameterized\": \"parameterized\",\n \"phonemizer\": \"phonemizer\",\n \"protobuf\": \"protobuf\",\n \"psutil\": \"psutil\",\n \"pyyaml\": \"pyyaml>=5.1\",\n \"pydantic\": \"pydantic\",\n \"pytest\": \"pytest>=7.2.0,<8.0.0\",\n \"pytest-timeout\": \"pytest-timeout\",\n \"pytest-xdist\": \"pytest-xdist\",\n \"python\": \"python>=3.8.0\",\n \"ray[tune]\": \"ray[tune]>=2.7.0\",\n \"regex\": \"regex!=2019.12.17\",\n \"requests\": \"requests\",\n \"rhoknp\": \"rhoknp>=1.1.0,<1.3.1\",\n \"rjieba\": \"rjieba\",\n \"rouge-score\": \"rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1\",\n \"ruff\": \"ruff==0.5.1\",\n \"sacrebleu\": \"sacrebleu>=1.4.12,<2.0.0\",\n \"sacremoses\": \"sacremoses\",\n \"safetensors\": \"safetensors>=0.4.1\",\n \"sagemaker\": \"sagemaker>=2.31.0\",\n \"schedulefree\": \"schedulefree>=1.2.6\",\n \"scikit-learn\": \"scikit-learn\",\n \"scipy\": \"scipy<1.13.0\",\n \"sentencepiece\": \"sentencepiece>=0.1.91,!=0.1.92\",\n \"sigopt\": \"sigopt\",\n \"starlette\": \"starlette\",\n \"sudachipy\": \"sudachipy>=0.6.6\",\n \"sudachidict_core\": \"sudachidict_core>=20220729\",\n \"tensorboard\": \"tensorboard\",\n \"tensorflow-cpu\": \"tensorflow-cpu>2.9,<2.16\",\n \"tensorflow\": \"tensorflow>2.9,<2.16\",\n \"tensorflow-text\": \"tensorflow-text<2.16\",\n \"tensorflow-probability\": \"tensorflow-probability<0.24\",\n \"tf2onnx\": \"tf2onnx\",\n \"timeout-decorator\": \"timeout-decorator\",\n \"tiktoken\": \"tiktoken\",\n \"timm\": \"timm<=0.9.16\",\n \"tokenizers\": \"tokenizers>=0.20,<0.21\",\n \"torch\": \"torch\",\n \"torchaudio\": \"torchaudio\",\n \"torchvision\": \"torchvision\",\n \"pyctcdecode\": \"pyctcdecode>=0.4.0\",\n \"tqdm\": \"tqdm>=4.27\",\n \"unidic\": \"unidic>=1.0.2\",\n \"unidic_lite\": \"unidic_lite>=1.0.7\",\n \"urllib3\": \"urllib3<2.0.0\",\n \"uvicorn\": \"uvicorn\",\n \"pytest-rich\": \"pytest-rich\",\n \"libcst\": \"libcst\",\n \"rich\": \"rich\",\n}\n" - } - ], - "top_k": 20, - "reranked_ids": [ - 2, - 1, - 8, - 12, - 26, - 11, - 23, - 45, - 9, - 43, - 3, - 21, - 17, - 5, - 6, - 16, - 28, - 4, - 15, - 19 - ] - }, - { - "question": "I'm trying to create an embedding vector database with some .txt documents in my local folder. In particular I'm following this tutorial from the official page of LangChain: LangChain - Azure Cognitive Search and Azure OpenAI.\nI have followed all the steps of the tutorial and this is my Python script:\n# From https:\/\/python.langchain.com\/docs\/integrations\/vectorstores\/azuresearch\n\n\nimport openai\nimport os\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.vectorstores.azuresearch import AzureSearch\n\n\nos.environ[\"OPENAI_API_TYPE\"] = \"azure\"\nos.environ[\"OPENAI_API_BASE\"] = \"https:\/\/xxxxxx.openai.azure.com\"\nos.environ[\"OPENAI_API_KEY\"] = \"xxxxxxxxx\"\nos.environ[\"OPENAI_API_VERSION\"] = \"2023-05-15\"\n\nmodel: str = \"text-embedding-ada-002\"\n\n\nvector_store_address: str = \"https:\/\/xxxxxxx.search.windows.net\"\nvector_store_password: str = \"xxxxxxx\"\n\n\n\nembeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)\nindex_name: str = \"cognitive-search-openai-exercise-index\"\nvector_store: AzureSearch = AzureSearch(\n azure_search_endpoint=vector_store_address,\n azure_search_key=vector_store_password,\n index_name=index_name,\n embedding_function=embeddings.embed_query,\n)\n\n\nfrom langchain.document_loaders import TextLoader\nfrom langchain.text_splitter import CharacterTextSplitter\n\nloader = TextLoader(\"C:\/Users\/xxxxxxxx\/azure_openai_cognitive_search_exercise\/data\/qna\/a.txt\", encoding=\"utf-8\")\n\ndocuments = loader.load()\ntext_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\ndocs = text_splitter.split_documents(documents)\n\nvector_store.add_documents(documents=docs)\n\n\n\n\n# Perform a similarity search\ndocs = vector_store.similarity_search(\n query=\"Who is Pippo Franco?\",\n k=3,\n search_type=\"similarity\",\n)\nprint(docs[0].page_content)\n\nNow, when I run the script I get the following error:\n\n\nvector_search_configuration is not a known attribute of class and will be ignored\nalgorithm_configurations is not a known attribute of class and will be ignored\nTraceback (most recent call last):\n File \"C:\\Users\\xxxxxxxxx\\venv\\Lib\\site-packages\\langchain\\vectorstores\\azuresearch.py\", line 105, in _get_search_client\n index_client.get_index(name=index_name)\n File \"C:\\Users\\xxxxxxx\\venv\\Lib\\site-packages\\azure\\core\\tracing\\decorator.py\", line 78, in wrapper_use_tracer\n return func(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxxx\\KYF\\venv\\Lib\\site-packages\\azure\\search\\documents\\indexes\\_search_index_client.py\", line 145, in get_index\n result = self._client.indexes.get(name, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxx\\venv\\Lib\\site-packages\\azure\\core\\tracing\\decorator.py\", line 78, in wrapper_use_tracer\n return func(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxx\\KYF\\venv\\Lib\\site-packages\\azure\\search\\documents\\indexes\\_generated\\operations\\_indexes_operations.py\", \nline 864, in get\n map_error(status_code=response.status_code, response=response, error_map=error_map)\n File \"C:\\Users\\xxxxxxxx\\venv\\Lib\\site-packages\\azure\\core\\exceptions.py\", line 165, in map_error\n raise error\nazure.core.exceptions.ResourceNotFoundError: () No index with the name 'cognitive-search-openai-exercise-index' was found in the service 'cognitive-search-openai-exercise'.\nCode:\nMessage: No index with the name 'cognitive-search-openai-exercise-index' was found in the service 'cognitive-search-openai-exercise'. \n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"c:\\Users\\xxxxxxx\\venv\\azure_openai_cognitive_search_exercise\\test.py\", line 25, in \n vector_store: AzureSearch = AzureSearch(\n ^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxxx\\venv\\Lib\\site-packages\\langchain\\vectorstores\\azuresearch.py\", line 237, in __init__\n self.client = _get_search_client(\n ^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxxxx\\venv\\Lib\\site-packages\\langchain\\vectorstores\\azuresearch.py\", line 172, in _get_search_client \n index_client.create_index(index)\n File \"C:\\Users\\xxxxxxx\\venv\\Lib\\site-packages\\azure\\core\\tracing\\decorator.py\", line 78, in wrapper_use_tracer\n return func(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxxx\\venv\\Lib\\site-packages\\azure\\search\\documents\\indexes\\_search_index_client.py\", line 220, in create_index\n result = self._client.indexes.create(patched_index, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxxx\\venv\\Lib\\site-packages\\azure\\core\\tracing\\decorator.py\", line 78, in wrapper_use_tracer\n return func(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\Users\\xxxxxx\\venv\\Lib\\site-packages\\azure\\search\\documents\\indexes\\_generated\\operations\\_indexes_operations.py\", \nline 402, in create\n raise HttpResponseError(response=response, model=error)\nazure.core.exceptions.HttpResponseError: (InvalidRequestParameter) The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set.\nCode: InvalidRequestParameter\nMessage: The request is invalid. Details: definition : The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set.\nException Details: (InvalidField) The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set. Parameters: definition\n Code: InvalidField\n Message: The vector field 'content_vector' must have the property 'vectorSearchConfiguration' set. Parameters: definition\n\n\n\nI have created an index manually from the Azure Cognitive Search Console, but I don't think this is the correct approach, as the script should automatically create a new index.\n", - "dataset_ids": [ - "langchain\/libs\/community\/langchain_community\/vectorstores\/azuresearch.py_0_9431", - "openai-cookbook\/examples\/vector_databases\/azuresearch\/Getting_started_with_azure_ai_search_and_openai.ipynb_18981_29339", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_19617_26731", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_0_6995", - "langchain\/templates\/rag-azure-search\/pyproject.toml_0_523", - "langchain\/cookbook\/rag_semantic_chunking_azureaidocintelligence.ipynb_0_7455" - ], - "nugget_data": [ - { - "nugget_id": "77345121_nugget_0", - "text": "The error involves the 'content_vector' field needing 'vectorSearchConfiguration' to be set.", - "relevant_corpus_ids": [ - "langchain\/libs\/community\/langchain_community\/vectorstores\/azuresearch.py_0_9431", - "openai-cookbook\/examples\/vector_databases\/azuresearch\/Getting_started_with_azure_ai_search_and_openai.ipynb_18981_29339", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_19617_26731" - ] - }, - { - "nugget_id": "77345121_nugget_1", - "text": "The solution involves ensuring compatibility between the Azure Cognitive Search SDK and LangChain.", - "relevant_corpus_ids": [ - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_0_6995", - "langchain\/templates\/rag-azure-search\/pyproject.toml_0_523" - ] - }, - { - "nugget_id": "77345121_nugget_2", - "text": "Installing azure-search-documents==11.4.0b8 is a recommended step to resolve the issue.", - "relevant_corpus_ids": [ - "langchain\/cookbook\/rag_semantic_chunking_azureaidocintelligence.ipynb_0_7455" - ] - } - ] - } - ], - "signature": { - "instructions": "Select a diverse set of relevant passages that cover different aspects of the query.\n\nYour task is to analyze ALL passages simultaneously and select a subset that:\n1. Covers different relevant topics\/aspects related to the query\n2. Avoids redundant\/duplicate information about the same topic\n3. Excludes passages about irrelevant topics\n\nInstructions:\n1. Read the query carefully and understand the key topics\/aspects needed\n2. Group passages by the topics they cover\n3. For each relevant topic:\n - Keep the highest quality passage\n - Remove redundant passages about that same topic\n4. Exclude passages about topics not relevant to the query\n5. Return EXACTLY `top_k` passage IDs representing diverse relevant topics\n\nCRITICAL: You must return exactly `top_k` IDs - no more, no less.", - "fields": [ - { - "prefix": "Query:", - "description": "The user's question or information need" - }, - { - "prefix": "Search Results:", - "description": "List of passages to analyze. Each contains: id, text, initial_rank, and hybrid_score" - }, - { - "prefix": "Top K:", - "description": "Exact number of passage IDs to return (strict requirement)" - }, - { - "prefix": "Reranked Ids:", - "description": "List of exactly `top_k` passage IDs representing diverse relevant topics. Must match IDs from search_results." - } - ] - }, - "lm": null - }, - "metadata": { - "dependency_versions": { - "python": "3.10", - "dspy": "2.6.27", - "cloudpickle": "3.1" - } - } -} \ No newline at end of file diff --git a/optimization_runs/mipro_optimized_query_expander.json b/optimization_runs/mipro_optimized_query_expander.json deleted file mode 100644 index 8daa6ba..0000000 --- a/optimization_runs/mipro_optimized_query_expander.json +++ /dev/null @@ -1,155 +0,0 @@ -{ - "expand_query": { - "traces": [], - "train": [], - "demos": [ - { - "augmented": true, - "question": "Kind of new to Langchain\/Qdrant but I'm building a recommendation engine to recommend users based on the contents of their associated PDF files, and I need to process PDFs and store their chunks in a vector database (I'm using Qdrant) for establishing context for the RAG agent. I don't exactly understand if this error is pertaining to some sort of version requirement, since the only prior error I found had to do with Langchain versions before 0.1.x:\nFound this prior issue\nHowever that issue was closed, and downgrading to versions below 0.1.x given the current releases of langchain doesn't seem feasible given what most of my current environment has recent dependencies.\nI tried different versions of langchain and different versions all of the corresponding langchain third-party libraries. Currently, these are the important parts of my requirements file (I think):\nlangchain==0.2.1\nlangchain-community==0.2.1\nlangchain-core==0.2.1\nlangchain-experimental==0.0.59\nlangchain-openai==0.1.7\nlangchain-text-splitters==0.2.0\nlangcodes==3.4.0\nlangsmith==0.1.57\n\nopenai==1.28.1 \npython==3.12.3\n\nLooking for some sort of workaround, or a diagnosis as to what may package may be causing the problem. My current program output:\nTraceback (most recent call last):\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/main.py\", line 28, in \n main()\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/main.py\", line 17, in main\n processor = PDFResumeProcessor(openai_api_key)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/gpt_class.py\", line 16, in __init__\n self.model = ChatOpenAI(api_key=openai_api_key, temperature=0, model_name='gpt-3.5-turbo-16k-0613')\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/.venv\/lib\/python3.12\/site-packages\/pydantic\/v1\/main.py\", line 339, in __init__\n values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/.venv\/lib\/python3.12\/site-packages\/pydantic\/v1\/main.py\", line 1064, in validate_model\n value = field.get_default()\n ^^^^^^^^^^^^^^^^^^^\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/.venv\/lib\/python3.12\/site-packages\/pydantic\/v1\/fields.py\", line 437, in get_default\n return smart_deepcopy(self.default) if self.default_factory is None else self.default_factory()\n ^^^^^^^^^^^^^^^^^^^^^^\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/.venv\/lib\/python3.12\/site-packages\/langchain_core\/language_models\/base.py\", line 72, in _get_verbosity\n return get_verbose()\n ^^^^^^^^^^^^^\n File \"\/Users\/danielperlov\/dperlov\/JobsMatch\/backend\/ml_model\/resume_preprocessor\/.venv\/lib\/python3.12\/site-packages\/langchain_core\/globals.py\", line 72, in get_verbose\n old_verbose = langchain.verbose\n ^^^^^^^^^^^^^^^^^\nAttributeError: module 'langchain' has no attribute 'verbose'\n\n", - "expanded_query": "Langchain version compatibility issues with langchain-core and langchain-experimental; troubleshooting AttributeError: module 'langchain' has no attribute 'verbose' in Langchain 0.2.1 environment; diagnosing version conflicts between langchain, langchain-core, langchain-community, langchain-experimental, and langchain-openai packages; best practices for installing Langchain and related packages for PDF processing and vector database integration with Qdrant; known bugs or breaking changes in Langchain versions 0.1.x to 0.2.1 related to ChatOpenAI initialization and pydantic validation errors; workarounds or fixes for Langchain verbose attribute missing error; compatibility of Langchain 0.2.1 with Python 3.12.3 and OpenAI 1.28.1; how to properly set up environment and dependencies for Langchain-based RAG agents using Qdrant vector database; community discussions or GitHub issues about Langchain verbose attribute and version mismatches; recommended versions of Langchain and related packages for stable PDF chunking and vector storage workflows." - }, - { - "augmented": true, - "question": "I'm working with AzureOpenAI and langchain, constantly getting hit by PermissionError. This mostly could be due to the proxy, but can someone please check the code --\nfrom langchain.llms import OpenAI, AzureOpenAI\nfrom langchain.prompts import PromptTemplate\nfrom langchain.chains import LLMChain\n\nllm = AzureOpenAI(openai_api_type=\"\", openai_api_base=\"\", deployment_name=\"\", model_name=\"\", openai_api_key=\"\", openai_api_version=\"\")\n\ntemplate = \"\"\"\"\nTranslate the following text from {source_lang} to {dest_lang}: {source_text}\n\"\"\"\n\nprompt_name = PromptTemplate(input_variables=[\"source_lang\", \"dest_lang\", \"source_text\"], template=template)\nchain = LLMChain(llm=llm, prompt=prompt_name)\n\nchain.predict(source_lang=\"English\", dest_lang=\"Spanish\", source_text=\"How are you?\")\n\nchain(inputs={\"source_lang\": \"English\", \"dest_lang\": \"Spanish\", \"source_text\": \"How are you\"})\n\nI also tried the additional openai_proxy parameter without much luck.\n", - "expanded_query": "How to properly configure AzureOpenAI with langchain to avoid PermissionError, including correct usage of parameters like openai_api_type, openai_api_base, deployment_name, model_name, openai_api_key, openai_api_version, and openai_proxy? Common causes and fixes for PermissionError when using AzureOpenAI with langchain behind a proxy? Examples of correct code setup for AzureOpenAI in langchain to prevent permission issues? How to troubleshoot and resolve PermissionError related to proxy settings in AzureOpenAI and langchain integration? Best practices for setting environment variables or parameters for AzureOpenAI API access in langchain? Documentation or community solutions addressing PermissionError in AzureOpenAI usage with langchain?" - }, - { - "question": "After installing pip install langchain-experimental I have tried:\nfrom langchain_experimental.sql_database import SQLDatabase\n\nBut it does not work. The code is as follows:\n# 1. Load db with langchain\nfrom langchain.sql_database import SQLDatabase\ndb = SQLDatabase.from_uri(\"sqlite:\/\/\/\/python\/chatopenai\/ecommerce.db\")\n\n# 2. Import APIs\nimport a_env_vars\nimport os\nos.environ[\"OPENAI_API_KEY\"] = a_env_vars.OPENAI_API_KEY\n\n# 3. Create LLM\nfrom langchain.chat_models import ChatOpenAI\nllm = ChatOpenAI(temperature=0,model_name='gpt-3.5-turbo')\n\n# 4. Create chain\nfrom langchain import SQLDatabaseChain\ncadena = SQLDatabaseChain(llm = llm, database = db, verbose=False)\n\nAnd the error is:\nImportError: cannot import name 'SQLDatabaseChain' from 'langchain' (C:\\Users\\jcarr\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\__init__.py) Traceback: File \"C:\\Users\\jcarr\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\streamlit\\runtime\\scriptrunner\\script_runner.py\", line 534, in _run_script\n exec(code, module.__dict__) File \"C:\\python\\chatOpenAI\\c_front_end.py\", line 3, in \n import b_backend File \"C:\\python\\chatOpenAI\\b_backend.py\", line 15, in \n from langchain import SQLDatabaseChain\n\nThis is after doing the same with \"langchain.sql_database\".\n", - "dataset_ids": [ - "langchain\/cookbook\/sql_db_qa.mdx_0_5328", - "langchain\/libs\/langchain\/langchain\/__init__.py_1391_10280", - "langchain\/cookbook\/sql_db_qa.mdx_27935_33044", - "langchain\/docs\/docs\/tutorials\/sql_qa.ipynb_0_7460", - "langchain\/docs\/docs\/integrations\/providers\/motherduck.mdx_0_1355", - "langchain\/docs\/docs\/integrations\/providers\/rebuff.ipynb_0_6692", - "langchain\/cookbook\/databricks_sql_db.ipynb_0_6800" - ], - "nugget_data": [ - { - "nugget_id": "77569490_nugget_0", - "text": "'SQLDatabaseChain' should be imported from 'langchain_experimental.sql'.", - "relevant_corpus_ids": [ - "langchain\/cookbook\/sql_db_qa.mdx_0_5328", - "langchain\/libs\/langchain\/langchain\/__init__.py_1391_10280" - ] - }, - { - "nugget_id": "77569490_nugget_1", - "text": "Use 'SQLDatabase.from_uri' to create a database instance.", - "relevant_corpus_ids": [ - "langchain\/cookbook\/sql_db_qa.mdx_0_5328", - "langchain\/cookbook\/sql_db_qa.mdx_27935_33044", - "langchain\/docs\/docs\/tutorials\/sql_qa.ipynb_0_7460", - "langchain\/docs\/docs\/integrations\/providers\/motherduck.mdx_0_1355", - "langchain\/docs\/docs\/integrations\/providers\/rebuff.ipynb_0_6692" - ] - }, - { - "nugget_id": "77569490_nugget_2", - "text": "Use 'SQLDatabaseChain.from_llm' with an LLM instance and the database instance to create the chain.", - "relevant_corpus_ids": [ - "langchain\/cookbook\/sql_db_qa.mdx_0_5328", - "langchain\/cookbook\/sql_db_qa.mdx_27935_33044", - "langchain\/docs\/docs\/integrations\/providers\/motherduck.mdx_0_1355", - "langchain\/docs\/docs\/integrations\/providers\/rebuff.ipynb_0_6692", - "langchain\/cookbook\/databricks_sql_db.ipynb_0_6800" - ] - } - ] - }, - { - "question": "I am extracting text from pdf documents and load it to Azure Cognitive Search for a RAG approach. Unfortunately this does not work. I am receiving the error message\nHttpResponseError: () The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\nCode: \nMessage: The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\n\nWhat i want to do is\n\nExtract text from pdf via pymupdf - works\nUpload it to Azure Vector search as embeddings with vectors and metdata `filename``\nQuery this through ChatGPT model\n\nBeside the error i want to add to this document object the metadata information filename but also dont know how to extend this ...\nMy code:\n!pip install cohere tiktoken\n!pip install openai==0.28.1\n!pip install pymupdf\n!pip install azure-storage-blob azure-identity\n!pip install azure-search-documents --pre --upgrade\n!pip install langchain\n\nimport fitz\nimport time\nimport uuid\nimport os\nimport openai\n\nfrom PIL import Image\nfrom io import BytesIO\nfrom IPython.display import display\n\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient\n\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\n\nfrom langchain.chat_models import AzureChatOpenAI\nfrom langchain.vectorstores import AzureSearch\nfrom langchain.docstore.document import Document\nfrom langchain.document_loaders import DirectoryLoader\nfrom langchain.document_loaders import TextLoader\nfrom langchain.text_splitter import TokenTextSplitter\nfrom langchain.chains import ConversationalRetrievalChain\nfrom langchain.prompts import PromptTemplate\n\nfrom google.colab import drive\n\nOPENAI_API_BASE = \"https:\/\/xxx.openai.azure.com\"\nOPENAI_API_KEY = \"xxx\"\nOPENAI_API_VERSION = \"2023-05-15\"\n\nopenai.api_type = \"azure\"\nopenai.api_key = OPENAI_API_KEY\nopenai.api_base = OPENAI_API_BASE\nopenai.api_version = OPENAI_API_VERSION\n\nAZURE_COGNITIVE_SEARCH_SERVICE_NAME = \"https:\/\/xxx.search.windows.net\"\nAZURE_COGNITIVE_SEARCH_API_KEY = \"xxx\"\nAZURE_COGNITIVE_SEARCH_INDEX_NAME = \"test\"\n\nllm = AzureChatOpenAI(deployment_name=\"gpt35\", openai_api_key=OPENAI_API_KEY, openai_api_base=OPENAI_API_BASE, openai_api_version=OPENAI_API_VERSION)\nembeddings = OpenAIEmbeddings(deployment_id=\"ada002\", chunk_size=1, openai_api_key=OPENAI_API_KEY, openai_api_base=OPENAI_API_BASE, openai_api_version=OPENAI_API_VERSION)\n\nacs = AzureSearch(azure_search_endpoint=AZURE_COGNITIVE_SEARCH_SERVICE_NAME,\n azure_search_key = AZURE_COGNITIVE_SEARCH_API_KEY,\n index_name = AZURE_COGNITIVE_SEARCH_INDEX_NAME,\n embedding_function = embeddings.embed_query)\n \ndef generate_tokens(s, f):\n text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)\n splits = text_splitter.split_text(s)\n i = 0\n\n documents = []\n for split in splits:\n metadata = {}\n metadata[\"index\"] = i\n metadata[\"file_source\"] = f\n i = i+1\n\n new_doc = Document(page_content=split, metadata=metadata)\n documents.append(new_doc)\n #documents = text_splitter.create_documents(splits)\n\n print (documents)\n\n return documents\n\n\ndrive.mount('\/content\/drive')\nfolder = \"\/content\/drive\/...\/pdf\/\"\n\npage_content = ''\ndoc_content = ''\n \nfor filename in os.listdir(folder):\n file_path = os.path.join(folder, filename)\n if os.path.isfile(file_path):\n print(f\"Processing file: {file_path}\")\n\n doc = fitz.open(file_path)\n for page in doc: # iterate the document pages\n page_content += page.get_text() # get plain text encoded as UTF-8 \n d = generate_tokens(doc_content)\n\n # the following line throws the error\n # how can i add the chunks + filename to \n # Azure Cognitive Search?\n\n doc_content += page_content\n d = generate_tokens(doc_content, file_path)\n\n acs.add_documents(documents=d)\n \n print(metadatas)\n print(\"----------\")\n print(doc_content)\n count = len(doc_content.split())\n print(\"Number of tokens: \", count)\n\n\nHttpResponseError Traceback (most recent call last)\n in ()\n 31 all_texts.extend(d)\n 32 \n---> 33 acs.add_documents(documents=d)\n 34 \n 35 metadatas = [{\"Source\": f\"{i}-pl\"} for i in range(len(all_texts))]\n\n7 frames\n\/usr\/local\/lib\/python3.10\/dist-packages\/azure\/search\/documents\/_generated\/operations\/_documents_operations.py in index(self, batch, request_options, **kwargs)\n 1249 map_error(status_code=response.status_code, response=response, error_map=error_map)\n 1250 error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)\n-> 1251 raise HttpResponseError(response=response, model=error)\n 1252 \n 1253 if response.status_code == 200:\n\nHttpResponseError: () The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\nCode: \nMessage: The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\n\nThis is my index in Azure Cognitive Search index:\n\n", - "dataset_ids": [ - "langchainjs\/libs\/langchain-community\/src\/vectorstores\/azure_aisearch.ts_20062_24560", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_26732_32759", - "azure-openai\/End_to_end_Solutions\/AOAISearchDemo\/scripts\/indexing\/prepdocs.py_0_8673", - "azure-search-openai-demo\/tests\/test_searchmanager.py_0_7619", - "openai-cookbook\/examples\/azure\/archive\/chat_with_your_own_data.ipynb_0_7894", - "azure-openai\/End_to_end_Solutions\/InsightsGenerator\/insights_generator\/clients\/search_client.py_0_5069", - "llama_index\/llama-index-integrations\/readers\/llama-index-readers-azcognitive-search\/llama_index\/readers\/azcognitive_search\/base.py_0_2055", - "llama_index\/llama-index-integrations\/vector_stores\/llama-index-vector-stores-azureaisearch\/llama_index\/vector_stores\/azureaisearch\/base.py_0_9353", - "openai-cookbook\/examples\/chatgpt\/rag-quickstart\/azure\/Azure_AI_Search_with_Azure_Functions_and_GPT_Actions_in_ChatGPT.ipynb_29578_39232", - "openai-cookbook\/examples\/vector_databases\/azuresearch\/Getting_started_with_azure_ai_search_and_openai.ipynb_18981_29339", - "llama_index\/docs\/docs\/examples\/vector_stores\/AzureAISearchIndexDemo.ipynb_0_7287", - "llama_index\/llama-index-integrations\/vector_stores\/llama-index-vector-stores-azureaisearch\/llama_index\/vector_stores\/azureaisearch\/base.py_9359_19011" - ], - "nugget_data": [ - { - "nugget_id": "77465301_nugget_0", - "text": "The error is due to missing or incorrectly defined fields in Azure Cognitive Search.", - "relevant_corpus_ids": [ - "langchainjs\/libs\/langchain-community\/src\/vectorstores\/azure_aisearch.ts_20062_24560", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_26732_32759", - "azure-openai\/End_to_end_Solutions\/AOAISearchDemo\/scripts\/indexing\/prepdocs.py_0_8673", - "azure-search-openai-demo\/tests\/test_searchmanager.py_0_7619", - "openai-cookbook\/examples\/azure\/archive\/chat_with_your_own_data.ipynb_0_7894", - "azure-openai\/End_to_end_Solutions\/InsightsGenerator\/insights_generator\/clients\/search_client.py_0_5069", - "llama_index\/llama-index-integrations\/readers\/llama-index-readers-azcognitive-search\/llama_index\/readers\/azcognitive_search\/base.py_0_2055", - "llama_index\/llama-index-integrations\/vector_stores\/llama-index-vector-stores-azureaisearch\/llama_index\/vector_stores\/azureaisearch\/base.py_0_9353" - ] - }, - { - "nugget_id": "77465301_nugget_1", - "text": "Create a field named 'content_vector' in Azure Cognitive Search to hold the vectors.", - "relevant_corpus_ids": [ - "openai-cookbook\/examples\/chatgpt\/rag-quickstart\/azure\/Azure_AI_Search_with_Azure_Functions_and_GPT_Actions_in_ChatGPT.ipynb_29578_39232", - "openai-cookbook\/examples\/vector_databases\/azuresearch\/Getting_started_with_azure_ai_search_and_openai.ipynb_18981_29339", - "langchainjs\/libs\/langchain-community\/src\/vectorstores\/azure_aisearch.ts_20062_24560", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_26732_32759", - "azure-openai\/End_to_end_Solutions\/AOAISearchDemo\/scripts\/indexing\/prepdocs.py_0_8673", - "azure-search-openai-demo\/tests\/test_searchmanager.py_0_7619" - ] - }, - { - "nugget_id": "77465301_nugget_2", - "text": "The 'content_vector' field should have the type 'Collection(Edm.Single)' and dimensions set to 1536.", - "relevant_corpus_ids": [ - "openai-cookbook\/examples\/chatgpt\/rag-quickstart\/azure\/Azure_AI_Search_with_Azure_Functions_and_GPT_Actions_in_ChatGPT.ipynb_29578_39232", - "openai-cookbook\/examples\/vector_databases\/azuresearch\/Getting_started_with_azure_ai_search_and_openai.ipynb_18981_29339", - "langchainjs\/libs\/langchain-community\/src\/vectorstores\/azure_aisearch.ts_20062_24560", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_26732_32759", - "llama_index\/docs\/docs\/examples\/vector_stores\/AzureAISearchIndexDemo.ipynb_0_7287", - "azure-search-openai-demo\/tests\/test_searchmanager.py_0_7619" - ] - }, - { - "nugget_id": "77465301_nugget_3", - "text": "Ensure that the Azure Cognitive Search index is correctly configured with the necessary fields before uploading documents.\n\n```", - "relevant_corpus_ids": [ - "openai-cookbook\/examples\/chatgpt\/rag-quickstart\/azure\/Azure_AI_Search_with_Azure_Functions_and_GPT_Actions_in_ChatGPT.ipynb_29578_39232", - "llama_index\/llama-index-integrations\/vector_stores\/llama-index-vector-stores-azureaisearch\/llama_index\/vector_stores\/azureaisearch\/base.py_9359_19011", - "openai-cookbook\/examples\/vector_databases\/azuresearch\/Getting_started_with_azure_ai_search_and_openai.ipynb_18981_29339", - "langchainjs\/libs\/langchain-community\/src\/vectorstores\/azure_aisearch.ts_20062_24560", - "langchain\/docs\/docs\/integrations\/vectorstores\/azuresearch.ipynb_26732_32759", - "azure-openai\/End_to_end_Solutions\/AOAISearchDemo\/scripts\/indexing\/prepdocs.py_0_8673", - "azure-search-openai-demo\/tests\/test_searchmanager.py_0_7619", - "azure-openai\/End_to_end_Solutions\/InsightsGenerator\/insights_generator\/clients\/search_client.py_0_5069", - "llama_index\/llama-index-integrations\/vector_stores\/llama-index-vector-stores-azureaisearch\/llama_index\/vector_stores\/azureaisearch\/base.py_0_9353" - ] - } - ] - } - ], - "signature": { - "instructions": "You are a seasoned Langchain developer and troubleshooting expert specializing in AI integration and vector database setups. Expand the given user query into a comprehensive search query that targets detailed information on resolving import errors, version incompatibilities, configuration issues, and permission errors related to Langchain and its ecosystem (including langchain-experimental, langchain-core, Azure OpenAI, Qdrant, and related packages). The expanded query should be structured to retrieve practical debugging advice, best practices for correct imports and usage, version compatibility matrices, proxy and permission troubleshooting for AzureOpenAI, and code examples demonstrating proper setup and usage. Ensure the expanded query captures nuances such as recent changes in Langchain’s API, common pitfalls with environment variables and API keys, and dependency conflicts affecting Python 3.12 environments. The goal is to enable a search engine to return comprehensive, actionable resources that directly address the user’s complex multi-part troubleshooting needs in Langchain-based AI development.", - "fields": [ - { - "prefix": "Question:", - "description": "${question}" - }, - { - "prefix": "Expanded Query:", - "description": "${expanded_query}" - } - ] - }, - "lm": null - }, - "metadata": { - "dependency_versions": { - "python": "3.10", - "dspy": "2.6.27", - "cloudpickle": "3.1" - } - } -} \ No newline at end of file diff --git a/optimization_runs/mipro_optimized_query_writer.json b/optimization_runs/mipro_optimized_query_writer.json deleted file mode 100644 index d395131..0000000 --- a/optimization_runs/mipro_optimized_query_writer.json +++ /dev/null @@ -1,128 +0,0 @@ -{ - "query_writer.predict": { - "traces": [], - "train": [], - "demos": [ - { - "augmented": true, - "question": "I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\nWhen using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\nLangchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\nWhen downloading the file, i get a Document with the binary representation as content.\nWhat is the best way to download pdf content from GCS bucket into memory?\n", - "reasoning": "The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n\nTo answer this, I need to find the best approach or libraries in Node.js to:\n1. Download a PDF file from a GCS bucket into memory (not just to disk).\n2. Parse or extract text content from the PDF binary data in memory.\n\nThis likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n\nI should search for:\n- How to download a file from GCS bucket into memory in Node.js\n- How to extract text from a PDF buffer in Node.js\n- Examples or best practices combining these two steps\n- Possibly Langchain Node.js community solutions or workarounds for PDF loading from GCS", - "search_queries": [ - "download file from Google Cloud Storage bucket into memory Node.js", - "extract text from PDF buffer Node.js", - "parse PDF from buffer Node.js", - "google cloud storage download pdf as buffer nodejs", - "pdf text extraction libraries nodejs", - "langchain nodejs load pdf from buffer", - "best way to read pdf content from memory nodejs", - "google cloud storage nodejs example download file in memory" - ] - }, - { - "question": "Hello i am trying to run this following code but i am getting an error;\nfrom langchain.schema import BaseOuputParser\n\nError;\n\nImportError: cannot import name 'BaseOuputParser' from\n'langchain.schema'\n\nMy langchain version is ; '0.1.7'\n", - "dataset_ids": [ - "langchain\/libs\/langchain\/tests\/unit_tests\/schema\/test_output_parser.py_0_396", - "langchain\/libs\/core\/tests\/unit_tests\/output_parsers\/test_imports.py_0_652" - ], - "nugget_data": [ - { - "nugget_id": "78023380_nugget_0", - "text": "The ImportError is due to a typo in the import statement: 'BaseOuputParser' should be 'BaseOutputParser'.", - "relevant_corpus_ids": [ - "langchain\/libs\/langchain\/tests\/unit_tests\/schema\/test_output_parser.py_0_396", - "langchain\/libs\/core\/tests\/unit_tests\/output_parsers\/test_imports.py_0_652" - ] - }, - { - "nugget_id": "78023380_nugget_1", - "text": "Correcting the typo in the import statement should resolve the ImportError.", - "relevant_corpus_ids": [ - "langchain\/libs\/langchain\/tests\/unit_tests\/schema\/test_output_parser.py_0_396", - "langchain\/libs\/core\/tests\/unit_tests\/output_parsers\/test_imports.py_0_652" - ] - } - ] - }, - { - "question": "I query a collection in a zilliz milvus db like this:\ndocuments = vector_store.similarity_search_with_score(query)\n\nThe query is successful but in line 777 of milvus.py the value result.full_length is retrieved, which is not available:\nfor result in res[0]:\n data = {x: result.entity.get(x) for x in output_fields}\n doc = self._parse_document(data)\n pair = (doc, result.full_length)\n ret.append(pair)\n\nwhich then leads to this exception\nFile \"\/Users\/tilman\/LangchainCorsera\/venv\/lib\/python3.9\/site-packages\/langchain_community\/vectorstores\/milvus.py\", line 644, in similarity_search\n res = self.similarity_search_with_score(\n File \"\/Users\/tilman\/LangchainCorsera\/venv\/lib\/python3.9\/site-packages\/langchain_community\/vectorstores\/milvus.py\", line 717, in similarity_search_with_score\n res = self.similarity_search_with_score_by_vector(\n File \"\/Users\/tilman\/LangchainCorsera\/venv\/lib\/python3.9\/site-packages\/langchain_community\/vectorstores\/milvus.py\", line 777, in similarity_search_with_score_by_vector\n pair = (doc, result.full_length)\n File \"\/Users\/tilman\/LangchainCorsera\/venv\/lib\/python3.9\/site-packages\/pymilvus\/client\/abstract.py\", line 588, in __getattr__\n raise MilvusException(message=f\"Field {item} is not in the hit entity\")\npymilvus.exceptions.MilvusException: \n\nAny clues?\n", - "dataset_ids": [ - "langchain\/docs\/docs\/versions\/release_policy.mdx_0_6118", - "langchain\/libs\/community\/langchain_community\/vectorstores\/infinispanvs.py_9486_15232" - ], - "nugget_data": [ - { - "nugget_id": "78352556_nugget_0", - "text": "The error is due to a bug in the `langchain-community` library.", - "relevant_corpus_ids": [ - "langchain\/docs\/docs\/versions\/release_policy.mdx_0_6118", - "langchain\/libs\/community\/langchain_community\/vectorstores\/infinispanvs.py_9486_15232" - ] - }, - { - "nugget_id": "78352556_nugget_1", - "text": "The issue can be resolved by updating the `langchain-community` library to a version where the bug is fixed.", - "relevant_corpus_ids": [ - "langchain\/docs\/docs\/versions\/release_policy.mdx_0_6118" - ] - } - ] - }, - { - "question": "Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding\/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\nbelow code is able to answer, the age and location of Oliver but not what he directed,\ni believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\nAny ideas, how to achieve this?\nimport os\nimport re\nfrom langchain.vectorstores.neo4j_vector import Neo4jVector\n# from langchain.document_loaders import WikipediaLoader\nfrom langchain_openai import OpenAIEmbeddings\n# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\nfrom langchain.graphs import Neo4jGraph\nimport openai\n# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\nos.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\nurl = \"neo4j+s:\/\/xxxx.databases.neo4j.io\"\nusername = \"neo4j\"\npassword = \"mypassword\"\nexisting_graph = Neo4jVector.from_existing_graph(\n embedding=OpenAIEmbeddings(),\n url=url,\n username=username,\n password=password,\n index_name=\"person\",\n node_label=\"Person\",\n text_node_properties=[\"name\", \"age\", \"location\"],\n embedding_node_property=\"embedding\",\n)\n\nfrom langchain.chat_models import ChatOpenAI\nfrom langchain.chains import GraphCypherQAChain\nfrom langchain.graphs import Neo4jGraph\n\ngraph = Neo4jGraph(\n url=url, username=username, password=password\n)\n\nchain = GraphCypherQAChain.from_llm(\n ChatOpenAI(temperature=0), graph=graph, verbose=True\n)\n\nquery = \"Where does Oliver Stone live?\"\n#query = \"Name some films directed by Oliver Stone?\" \n\ngraph_result = chain.invoke(query)\n\nvector_results = existing_graph.similarity_search(query, k=1)\nfor i, res in enumerate(vector_results):\n print(res.page_content)\n if i != len(vector_results)-1:\n print()\nvector_result = vector_results[0].page_content\n\n# Construct prompt for OpenAI\nfinal_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\nand synthesize information from two sources: the top result from a similarity search\n(unstructured information) and relevant data from a graph database (structured information).\nGiven the user's query: {query}, provide a meaningful and efficient answer based\non the insights derived from the following data:\n\nUnstructured information: {vector_result}.\nStructured information: {graph_result} \"\"\"\n\n\nfrom openai import OpenAI\nclient = OpenAI(\n # This is the default and can be omitted\n api_key=os.environ.get(\"OPENAI_API_KEY\"),\n)\n\nchat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n\nanswer = chat_completion.choices[0].message.content.strip()\nprint(answer)\n\nAny help would be highly appreicated?\nhere is my schema:\nNode properties are the following:\nPerson {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\nRelationship properties are the following:\nACTED_IN {role: STRING}\nThe relationships are the following:\n(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n\nCypher used to create:\nCREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\nMATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n\n", - "dataset_ids": [ - "llama_index\/llama-index-integrations\/graph_stores\/llama-index-graph-stores-falkordb\/llama_index\/graph_stores\/falkordb\/falkordb_property_graph.py_10334_17883", - "llama_index\/llama-index-core\/llama_index\/core\/indices\/knowledge_graph\/base.py_9948_14469" - ], - "nugget_data": [ - { - "nugget_id": "78173243_nugget_0", - "text": "Add the `:DIRECTED` relationship to the index to include movies directed by a person in the embeddings.", - "relevant_corpus_ids": [ - "llama_index\/llama-index-integrations\/graph_stores\/llama-index-graph-stores-falkordb\/llama_index\/graph_stores\/falkordb\/falkordb_property_graph.py_10334_17883", - "llama_index\/llama-index-core\/llama_index\/core\/indices\/knowledge_graph\/base.py_9948_14469" - ] - }, - { - "nugget_id": "78173243_nugget_1", - "text": "Use a retrieval query to match nodes with the `:DIRECTED` relationship and collect movies as metadata.", - "relevant_corpus_ids": [ - "llama_index\/llama-index-integrations\/graph_stores\/llama-index-graph-stores-falkordb\/llama_index\/graph_stores\/falkordb\/falkordb_property_graph.py_10334_17883" - ] - }, - { - "nugget_id": "78173243_nugget_2", - "text": "Update the vector result to include movie titles as metadata for comprehensive query results.", - "relevant_corpus_ids": [ - "llama_index\/llama-index-integrations\/graph_stores\/llama-index-graph-stores-falkordb\/llama_index\/graph_stores\/falkordb\/falkordb_property_graph.py_10334_17883" - ] - } - ] - } - ], - "signature": { - "instructions": "You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.", - "fields": [ - { - "prefix": "Question:", - "description": "${question}" - }, - { - "prefix": "Reasoning: Let's think step by step in order to", - "description": "${reasoning}" - }, - { - "prefix": "Search Queries:", - "description": "${search_queries}" - } - ] - }, - "lm": null - }, - "metadata": { - "dependency_versions": { - "python": "3.10", - "dspy": "2.6.27", - "cloudpickle": "3.1" - } - } -} \ No newline at end of file diff --git a/optimization_runs/mipro_query_expander.ipynb b/optimization_runs/mipro_query_expander.ipynb deleted file mode 100644 index c629e74..0000000 --- a/optimization_runs/mipro_query_expander.ipynb +++ /dev/null @@ -1,606 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 5, - "id": "8273a63d", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 5\n", - "Covered nuggets: 1\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[91mNugget 2: Not covered\u001b[0m\n", - "\u001b[91mNugget 3: Not covered\u001b[0m\n", - "\u001b[91mNugget 4: Not covered\u001b[0m\n", - "\u001b[91mNugget 5: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 1/5 = 0.20\u001b[0m\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "Prediction(\n", - " final_answer='',\n", - " sources=[Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='2c9c4348-53cf-4f35-b070-b6de187aaa5b'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='94f1d91d-1b3f-43bf-a7c0-b983ff8f3d8a'), Source(object_id='ba45fed1-db4d-4399-bd52-9627fb1ddbe7'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='2c47bd45-b572-45be-a0bb-c583cdd809f9'), Source(object_id='9043a9eb-adcc-4712-a459-9b9b2280c862'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='cc38a9f1-8139-4894-b784-97588f1644b5'), Source(object_id='5e0291cb-8d88-454c-8938-fead32f20f49'), Source(object_id='1f907025-81ac-4950-a4dd-d858b9b28f29'), Source(object_id='a878bad7-1e66-427d-9672-aa96164bb41b'), Source(object_id='28a80008-7522-4851-97d3-ceb1430f3702'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='2977861c-b311-45cf-9a95-5b81e82ef376'), Source(object_id='61baac69-68b2-4780-a4d4-198c9da8319f')],\n", - " searches=['How to integrate Weaviate vector search database with LangChain framework? Step-by-step guide or tutorial on using Weaviate as a vector store in LangChain, including code examples, setup instructions, and best practices for combining Weaviate with LangChain for building AI applications.'],\n", - " aggregations=None,\n", - " usage={}\n", - ")" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import retrieve_dspy\n", - "\n", - "query_writer = retrieve_dspy.QueryExpander(\n", - " collection_name=\"FreshstackLangchain\",\n", - " target_property_name=\"docs_text\",\n", - " retrieved_k=20\n", - ")\n", - "\n", - "query_writer(\"How can I use Weaviate with LangChain?\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b59d1f90", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'path': 'query_expander_training_samples.jsonl',\n", - " 'added': 20,\n", - " 'total_in_file': 20}" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from retrieve_dspy.metrics import create_metric\n", - "from retrieve_dspy.datasets.in_memory import load_queries_in_memory\n", - "\n", - "trainset, testset = load_queries_in_memory(\n", - " dataset_name=\"freshstack-langchain\",\n", - " train_samples=20,\n", - " test_samples=20\n", - ")\n", - "\n", - "metric = create_metric(\n", - " metric_type=\"coverage\",\n", - " dataset_name=\"freshstack-langchain\"\n", - ")\n", - "\n", - "evaluator = retrieve_dspy.utils.get_evaluator(\n", - " testset=testset,\n", - " metric=metric\n", - ")\n", - "\n", - "retrieve_dspy.utils.save_training_questions(trainset, \"mipro_query_expander_training_samples.jsonl\")" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "100918e5", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Example({'question': 'After installing pip install langchain-experimental I have tried:\\nfrom langchain_experimental.sql_database import SQLDatabase\\n\\nBut it does not work. The code is as follows:\\n# 1. Load db with langchain\\nfrom langchain.sql_database import SQLDatabase\\ndb = SQLDatabase.from_uri(\"sqlite:////python/chatopenai/ecommerce.db\")\\n\\n# 2. Import APIs\\nimport a_env_vars\\nimport os\\nos.environ[\"OPENAI_API_KEY\"] = a_env_vars.OPENAI_API_KEY\\n\\n# 3. Create LLM\\nfrom langchain.chat_models import ChatOpenAI\\nllm = ChatOpenAI(temperature=0,model_name=\\'gpt-3.5-turbo\\')\\n\\n# 4. Create chain\\nfrom langchain import SQLDatabaseChain\\ncadena = SQLDatabaseChain(llm = llm, database = db, verbose=False)\\n\\nAnd the error is:\\nImportError: cannot import name \\'SQLDatabaseChain\\' from \\'langchain\\' (C:\\\\Users\\\\jcarr\\\\AppData\\\\Local\\\\Programs\\\\Python\\\\Python311\\\\Lib\\\\site-packages\\\\langchain\\\\__init__.py) Traceback: File \"C:\\\\Users\\\\jcarr\\\\AppData\\\\Local\\\\Programs\\\\Python\\\\Python311\\\\Lib\\\\site-packages\\\\streamlit\\\\runtime\\\\scriptrunner\\\\script_runner.py\", line 534, in _run_script\\n exec(code, module.__dict__) File \"C:\\\\python\\\\chatOpenAI\\\\c_front_end.py\", line 3, in \\n import b_backend File \"C:\\\\python\\\\chatOpenAI\\\\b_backend.py\", line 15, in \\n from langchain import SQLDatabaseChain\\n\\nThis is after doing the same with \"langchain.sql_database\".\\n', 'dataset_ids': ['langchain/cookbook/sql_db_qa.mdx_0_5328', 'langchain/libs/langchain/langchain/__init__.py_1391_10280', 'langchain/cookbook/sql_db_qa.mdx_27935_33044', 'langchain/docs/docs/tutorials/sql_qa.ipynb_0_7460', 'langchain/docs/docs/integrations/providers/motherduck.mdx_0_1355', 'langchain/docs/docs/integrations/providers/rebuff.ipynb_0_6692', 'langchain/cookbook/databricks_sql_db.ipynb_0_6800'], 'nugget_data': [{'nugget_id': '77569490_nugget_0', 'text': \"'SQLDatabaseChain' should be imported from 'langchain_experimental.sql'.\", 'relevant_corpus_ids': ['langchain/cookbook/sql_db_qa.mdx_0_5328', 'langchain/libs/langchain/langchain/__init__.py_1391_10280']}, {'nugget_id': '77569490_nugget_1', 'text': \"Use 'SQLDatabase.from_uri' to create a database instance.\", 'relevant_corpus_ids': ['langchain/cookbook/sql_db_qa.mdx_0_5328', 'langchain/cookbook/sql_db_qa.mdx_27935_33044', 'langchain/docs/docs/tutorials/sql_qa.ipynb_0_7460', 'langchain/docs/docs/integrations/providers/motherduck.mdx_0_1355', 'langchain/docs/docs/integrations/providers/rebuff.ipynb_0_6692']}, {'nugget_id': '77569490_nugget_2', 'text': \"Use 'SQLDatabaseChain.from_llm' with an LLM instance and the database instance to create the chain.\", 'relevant_corpus_ids': ['langchain/cookbook/sql_db_qa.mdx_0_5328', 'langchain/cookbook/sql_db_qa.mdx_27935_33044', 'langchain/docs/docs/integrations/providers/motherduck.mdx_0_1355', 'langchain/docs/docs/integrations/providers/rebuff.ipynb_0_6692', 'langchain/cookbook/databricks_sql_db.ipynb_0_6800']}]}) (input_keys={'question'})" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "trainset[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "7aedd0f1", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/20 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "66.58" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dspy_evaluator_kwargs = {\n", - " \"num_threads\": 5\n", - "}\n", - "\n", - "evaluator(query_writer, **dspy_evaluator_kwargs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2b8e7e36", - "metadata": {}, - "outputs": [], - "source": [ - "import dspy\n", - "\n", - "optimizer = dspy.MIPROv2(\n", - " metric=metric,\n", - " auto=\"heavy\",\n", - " verbose=True\n", - ")\n", - "\n", - "optimized_query_expander = optimizer.compile(\n", - " query_writer,\n", - " trainset=trainset,\n", - " requires_permission_to_run=False\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "270a926f", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "MIPRO run is finished!\n" - ] - } - ], - "source": [ - "print(\"MIPRO run is finished!\")" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "65d2026c", - "metadata": {}, - "outputs": [], - "source": [ - "optimized_query_expander.save(\"mipro_optimized_query_expander.json\")" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "a0f5937d", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/20 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "66.17" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - } - ], - "source": [ - "evaluator(optimized_query_expander, **dspy_evaluator_kwargs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/optimization_runs/mipro_query_expander_training_samples.jsonl b/optimization_runs/mipro_query_expander_training_samples.jsonl deleted file mode 100644 index e3b791f..0000000 --- a/optimization_runs/mipro_query_expander_training_samples.jsonl +++ /dev/null @@ -1,20 +0,0 @@ -{"question": "I wanted to add additional metadata to the documents being embedded and loaded into Chroma.\nI'm unable to find a way to add metadata to documents loaded using\nChroma.from_documents(documents, embeddings)\nFor example, imagine I have a text file having details of a particular disease, I wanted to add species as a metadata that is a list of all species it affects.\nAs a round-about way I loaded it in a chromadb collection by adding required metadata and persisted it\nclient = chromadb.PersistentClient(path=\"chromaDB\")\n\ncollection = client.get_or_create_collection(name=\"test\",\n embedding_function=openai_ef,\n metadata={\"hnsw:space\": \"cosine\"})\n\ncollection.add(\n documents=documents,\n ids=ids,\n metadatas=metadata\n)\n\nThis was the result,\ncollection.get(include=['embeddings','metadatas'])\n\nOutput:\n\n{'ids': ['id0',\n'id1',\n'embeddings': [[-0.014580891467630863,\n0.0003901976451743394,\n0.00793908629566431,\n-0.027648288756608963,\n-0.009689063765108585,\n0.010222840122878551,\n-0.00946609303355217,\n-0.002771923551335931,\n-0.04675614833831787,\n-0.02056729979813099,\n0.014364678412675858,\n...\n{'species': 'XYZ', 'source': 'Flu.txt'},\n{'species': 'ABC', 'source': 'Common_cold.txt'}],\n'documents': None,\n'uris': None,\n'data': None}\n\nNow I tried loading it from the directory persisted in the disk using Chroma.from_documents()\ndb = Chroma(persist_directory=\"chromaDB\", embedding_function=embeddings)\n\nBut I don't see anything loaded. db.get() results in this,\ndb.get(include=['metadatas'])\n\nOutput:\n\n{'ids': [],\n'embeddings': None,\n'metadatas': [],\n'documents': None,\n'uris': None,\n'data': None}\n\nPlease help. Need to load metadata to the files being loaded.\n"} -{"question": "I wrote a program trying to query local sqlite db, and it worked fine for text-davinci-003:\nllm = OpenAI(model_name=\"text-davinci-003\", verbose=True)\n\nHowever, after I changed it to GPT-4:\nllm = ChatOpenAI(model_name=\"gpt-4-0613\", verbose=True)\n...\ndb_chain = SQLDatabaseChain.from_llm(\n llm,\n db,\n verbose=True,\n use_query_checker=True,\n return_intermediate_steps=True,\n)\n\nwith get_openai_callback() as cb:\n # No intermediate steps\n # result = db_chain.run(query)\n\n # If intermediate steps are needed...\n result = db_chain(query)\n intermediate_steps = result[\"intermediate_steps\"]\n\n print(\"\")\n\n try:\n sql_result = intermediate_steps[3]\n print(\"SQL Query Result:\")\n print(json.dumps(ast.literal_eval(sql_result), indent=4))\n except Exception as e:\n print(f\"Error while parsing the SQL result:\\n{e}\")\n print(\"\")\n print(intermediate_steps)\n \n print(\"\")\n\n print(cb)\n\n... everything still works, except the final SQL query contained more text in addition to SQL query, i.e.:\n> Entering new SQLDatabaseChain chain...\nHave the user visited some news website? If yes, list all the urls.\nDO NOT specify timestamp unless query said so.\nDO NOT specify limit unless query said so.\nSQLQuery:The original query appears to be correct as it doesn't seem to have any of the common mistakes listed. Here is the same query:\n\nSELECT \"URL\" FROM browsinghistory WHERE \"Title\" LIKE '%news%'Traceback (most recent call last):\n File \"C:\\path\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1968, in _exec_single_context\n self.dialect.do_execute(\n File \"C:\\path\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\default.py\", line 920, in do_execute\n cursor.execute(statement, parameters)\nsqlite3.OperationalError: near \"The\": syntax error\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"D:\\path\\run.py\", line 292, in \n database_mode(llm, filepath, delimiter)\n File \"D:\\path\\run.py\", line 156, in database_mode\n llm.query_database(db_path=db_path, query=query)\n File \"D:\\path\\modules\\chatbot.py\", line 220, in query_database\n result = db_chain(query)\n ^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\base.py\", line 140, in __call__\n raise e\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\base.py\", line 134, in __call__\n self._call(inputs, run_manager=run_manager)\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\sql_database\\base.py\", line 181, in _call\n raise exc\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\chains\\sql_database\\base.py\", line 151, in _call\n result = self.database.run(checked_sql_command)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\sql_database.py\", line 334, in run\n cursor = connection.execute(text(command))\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1413, in execute\n return meth(\n ^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\sql\\elements.py\", line 483, in _execute_on_connection\n return connection._execute_clauseelement(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1637, in _execute_clauseelement\n ret = self._execute_context(\n ^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1846, in _execute_context\n return self._exec_single_context(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1987, in _exec_single_context\n self._handle_dbapi_exception(\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 2344, in _handle_dbapi_exception\n raise sqlalchemy_exception.with_traceback(exc_info[2]) from e\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\base.py\", line 1968, in _exec_single_context\n self.dialect.do_execute(\n File \"C:\\path\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sqlalchemy\\engine\\default.py\", line 920, in do_execute\n cursor.execute(statement, parameters)\nsqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near \"The\": syntax error\n[SQL: The original query appears to be correct as it doesn't seem to have any of the common mistakes listed. Here is the same query:\n\nSELECT \"URL\" FROM browsinghistory WHERE \"Title\" LIKE '%news%']\n(Background on this error at: https://sqlalche.me/e/20/e3q8)\n\nI know that I can try to tell it not to return anything but the query (might be unstable. though...), but why isn't this work for GPT-4, while it works for text-davinci-003?\n\nUpdate:\nTried with a different query, and the problem remains:\n> Entering new SQLDatabaseChain chain...\nList all websites visited by the user.\nDO NOT specify timestamp unless query said so.\nDO NOT specify limit unless query said so.\nSQLQuery:The original query seems to be correct. It is simply selecting the \"URL\" column from the \"browsinghistory\" table. There is no misuse of any functions, no data type mismatch, no joins, etc.\n\nReproducing the original query:\n\nSELECT \"URL\" FROM browsinghistory\n...\n...\n...\n\n"} -{"question": "I'm trying to create a Qdrant vectorsore and add my documents.\n\nMy embeddings are based on OpenAIEmbeddings\nthe QdrantClient is local for my case\nthe collection that I'm creating has the\nVectorParams as such: VectorParams(size=2000, distance=Distance.EUCLID)\n\nI'm getting the following error:\nValueError: could not broadcast input array from shape (1536,) into shape (2000,)\nI understand that my error is how I configure the vectorParams, but I don't undertsand how these values need to be calculated.\nhere's my complete code:\nimport os\nfrom typing import List\n\nfrom langchain.docstore.document import Document\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\nfrom langchain.vectorstores import Qdrant, VectorStore\nfrom qdrant_client import QdrantClient\nfrom qdrant_client.models import Distance, VectorParams\n\ndef load_documents(documents: List[Document]) -> VectorStore:\n \"\"\"Create a vectorstore from documents.\"\"\"\n collection_name = \"my_collection\"\n vectorstore_path = \"data/vectorstore/qdrant\"\n embeddings = OpenAIEmbeddings(\n model=\"text-embedding-ada-002\",\n openai_api_key=os.getenv(\"OPENAI_API_KEY\"),\n )\n qdrantClient = QdrantClient(path=vectorstore_path, prefer_grpc=True)\n qdrantClient.create_collection(\n collection_name=collection_name,\n vectors_config=VectorParams(size=2000, distance=Distance.EUCLID),\n )\n vectorstore = Qdrant(\n client=qdrantClient,\n collection_name=collection_name,\n embeddings=embeddings,\n )\n text_splitter = RecursiveCharacterTextSplitter(\n chunk_size=1000,\n chunk_overlap=200,\n )\n\n sub_docs = text_splitter.split_documents(documents)\n vectorstore.add_documents(sub_docs)\n\n return vectorstore\n\nAny ideas on how I should configure the vector params properly?\n"} -{"question": "I am extracting text from pdf documents and load it to Azure Cognitive Search for a RAG approach. Unfortunately this does not work. I am receiving the error message\nHttpResponseError: () The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\nCode: \nMessage: The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\n\nWhat i want to do is\n\nExtract text from pdf via pymupdf - works\nUpload it to Azure Vector search as embeddings with vectors and metdata `filename``\nQuery this through ChatGPT model\n\nBeside the error i want to add to this document object the metadata information filename but also dont know how to extend this ...\nMy code:\n!pip install cohere tiktoken\n!pip install openai==0.28.1\n!pip install pymupdf\n!pip install azure-storage-blob azure-identity\n!pip install azure-search-documents --pre --upgrade\n!pip install langchain\n\nimport fitz\nimport time\nimport uuid\nimport os\nimport openai\n\nfrom PIL import Image\nfrom io import BytesIO\nfrom IPython.display import display\n\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient\n\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\n\nfrom langchain.chat_models import AzureChatOpenAI\nfrom langchain.vectorstores import AzureSearch\nfrom langchain.docstore.document import Document\nfrom langchain.document_loaders import DirectoryLoader\nfrom langchain.document_loaders import TextLoader\nfrom langchain.text_splitter import TokenTextSplitter\nfrom langchain.chains import ConversationalRetrievalChain\nfrom langchain.prompts import PromptTemplate\n\nfrom google.colab import drive\n\nOPENAI_API_BASE = \"https://xxx.openai.azure.com\"\nOPENAI_API_KEY = \"xxx\"\nOPENAI_API_VERSION = \"2023-05-15\"\n\nopenai.api_type = \"azure\"\nopenai.api_key = OPENAI_API_KEY\nopenai.api_base = OPENAI_API_BASE\nopenai.api_version = OPENAI_API_VERSION\n\nAZURE_COGNITIVE_SEARCH_SERVICE_NAME = \"https://xxx.search.windows.net\"\nAZURE_COGNITIVE_SEARCH_API_KEY = \"xxx\"\nAZURE_COGNITIVE_SEARCH_INDEX_NAME = \"test\"\n\nllm = AzureChatOpenAI(deployment_name=\"gpt35\", openai_api_key=OPENAI_API_KEY, openai_api_base=OPENAI_API_BASE, openai_api_version=OPENAI_API_VERSION)\nembeddings = OpenAIEmbeddings(deployment_id=\"ada002\", chunk_size=1, openai_api_key=OPENAI_API_KEY, openai_api_base=OPENAI_API_BASE, openai_api_version=OPENAI_API_VERSION)\n\nacs = AzureSearch(azure_search_endpoint=AZURE_COGNITIVE_SEARCH_SERVICE_NAME,\n azure_search_key = AZURE_COGNITIVE_SEARCH_API_KEY,\n index_name = AZURE_COGNITIVE_SEARCH_INDEX_NAME,\n embedding_function = embeddings.embed_query)\n \ndef generate_tokens(s, f):\n text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)\n splits = text_splitter.split_text(s)\n i = 0\n\n documents = []\n for split in splits:\n metadata = {}\n metadata[\"index\"] = i\n metadata[\"file_source\"] = f\n i = i+1\n\n new_doc = Document(page_content=split, metadata=metadata)\n documents.append(new_doc)\n #documents = text_splitter.create_documents(splits)\n\n print (documents)\n\n return documents\n\n\ndrive.mount('/content/drive')\nfolder = \"/content/drive/.../pdf/\"\n\npage_content = ''\ndoc_content = ''\n \nfor filename in os.listdir(folder):\n file_path = os.path.join(folder, filename)\n if os.path.isfile(file_path):\n print(f\"Processing file: {file_path}\")\n\n doc = fitz.open(file_path)\n for page in doc: # iterate the document pages\n page_content += page.get_text() # get plain text encoded as UTF-8 \n d = generate_tokens(doc_content)\n\n # the following line throws the error\n # how can i add the chunks + filename to \n # Azure Cognitive Search?\n\n doc_content += page_content\n d = generate_tokens(doc_content, file_path)\n\n acs.add_documents(documents=d)\n \n print(metadatas)\n print(\"----------\")\n print(doc_content)\n count = len(doc_content.split())\n print(\"Number of tokens: \", count)\n\n\nHttpResponseError Traceback (most recent call last)\n in ()\n 31 all_texts.extend(d)\n 32 \n---> 33 acs.add_documents(documents=d)\n 34 \n 35 metadatas = [{\"Source\": f\"{i}-pl\"} for i in range(len(all_texts))]\n\n7 frames\n/usr/local/lib/python3.10/dist-packages/azure/search/documents/_generated/operations/_documents_operations.py in index(self, batch, request_options, **kwargs)\n 1249 map_error(status_code=response.status_code, response=response, error_map=error_map)\n 1250 error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)\n-> 1251 raise HttpResponseError(response=response, model=error)\n 1252 \n 1253 if response.status_code == 200:\n\nHttpResponseError: () The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\nCode: \nMessage: The request is invalid. Details: The property 'content' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\n\nThis is my index in Azure Cognitive Search index:\n\n"} -{"question": "I'm trying to pass filters to redis retriever to do hybrid search on my embeddings (vector + metadata filtering). The following doesn't work! It fails to pass the filters and filters would always be None:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n search_kwargs=\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5}\",\n filter=\"(@launch:{false} @menu_text:(%%chicken%%))\"\n )\n\nI found another example and apparently filter expression should be pass as search_kwargs, but I can't figure out what should be the correct syntax. If I do it as follow:\nretriever = redis.as_retriever(\n search_type=\"similarity_distance_threshold\",\n \"retriever_search_kwargs\":\"{'include_metadata': True,'distance_threshold': 0.8,'k': 5, 'filter': '@menu_text:(%%chicken%%) @lunch:{true}'}\",\n}\n\nit generates this search query:\nsimilarity_search_by_vector > redis_query : (@content_vector:[VECTOR_RANGE $distance_threshold $vector] @menu_text:(%%chicken%%) @lunch:{true})=>{$yield_distance_as: distance}\nand fails with the following error:\nredis.exceptions.ResponseError: Invalid attribute yield_distance_as\nAny idea how to fix it?\nSystem Info:\nlangchain 0.0.346\nlangchain-core 0.0.10\npython 3.9.18\n"} -{"question": "I'm creating an app with the help of Langchain and OpenAI.\nI'm loading my data with JSONLoader and want to store it in a vectorstore, so I can retrieve on user request to answer questions specific to my data. The Langchain docs are describing HNSWLib as a possible store for ONLY Node.js apps.\nIn my understanding is that NEXT is built up on top of Node.js so it can run SS javascript, so I should be able to use it. I should also mention that the JSONLoader also only works on NodeJS, which works perfectly, so I reckon it should be all set.\nI've created an API route in app/api/llm/route.ts following the docs of the new Route Handlers, and also installed the hnswlib-node package.\nimport { NextRequest } from 'next/server';\nimport { OpenAI } from 'langchain/llms/openai';\nimport { RetrievalQAChain } from 'langchain/chains';\nimport { JSONLoader } from 'langchain/document_loaders/fs/json';\nimport { HNSWLib } from 'langchain/vectorstores/hnswlib';\nimport { OpenAIEmbeddings } from 'langchain/embeddings/openai';\nimport path from 'path';\n\n// eslint-disable-next-line @typescript-eslint/no-unused-vars, no-unused-vars\nexport const GET = async (req: NextRequest) => {\n const apiKey = process.env.NEXT_PUBLIC_OPENAI_API_KEY;\n const model = new OpenAI({ openAIApiKey: apiKey, temperature: 0.9, modelName: 'gpt-3.5-turbo' });\n // Initialize the LLM to use to answer the question.\n const loader = new JSONLoader(path.join(process.cwd(), '/assets/surfspots.json'));\n const docs = await loader.load();\n\n // Create a vector store from the documents.\n const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings({ openAIApiKey: apiKey }));\n\n // Create a chain that uses the OpenAI LLM and HNSWLib vector store.\n const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever());\n const res = await chain.call({\n query: 'List me all of the waves I can find in Fuerteventura',\n });\n console.log({ res });\n};\n\nWhich I'm calling on the front-end inside of a client-side react component.\nWhen I'm trying to run this code, I get the following error:\nError: Please install hnswlib-node as a dependency with, e.g. `npm install -S hnswlib-node`\n at HNSWLib.imports (webpack-internal:///(sc_server)/./node_modules/langchain/dist/vectorstores/hnswlib.js:184:19)\n\nI tried reinstalling the package, removed node_modules and reinstall everything again, search the web for answers, etc.\nAnybody worked with these libraries or have any direction I could consider to debug this?\nThank you in advance!\n"} -{"question": "After installing pip install langchain-experimental I have tried:\nfrom langchain_experimental.sql_database import SQLDatabase\n\nBut it does not work. The code is as follows:\n# 1. Load db with langchain\nfrom langchain.sql_database import SQLDatabase\ndb = SQLDatabase.from_uri(\"sqlite:////python/chatopenai/ecommerce.db\")\n\n# 2. Import APIs\nimport a_env_vars\nimport os\nos.environ[\"OPENAI_API_KEY\"] = a_env_vars.OPENAI_API_KEY\n\n# 3. Create LLM\nfrom langchain.chat_models import ChatOpenAI\nllm = ChatOpenAI(temperature=0,model_name='gpt-3.5-turbo')\n\n# 4. Create chain\nfrom langchain import SQLDatabaseChain\ncadena = SQLDatabaseChain(llm = llm, database = db, verbose=False)\n\nAnd the error is:\nImportError: cannot import name 'SQLDatabaseChain' from 'langchain' (C:\\Users\\jcarr\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\langchain\\__init__.py) Traceback: File \"C:\\Users\\jcarr\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\streamlit\\runtime\\scriptrunner\\script_runner.py\", line 534, in _run_script\n exec(code, module.__dict__) File \"C:\\python\\chatOpenAI\\c_front_end.py\", line 3, in \n import b_backend File \"C:\\python\\chatOpenAI\\b_backend.py\", line 15, in \n from langchain import SQLDatabaseChain\n\nThis is after doing the same with \"langchain.sql_database\".\n"} -{"question": "from langchain.document_loaders import TextLoader\n# Create the TextLoader object using the file path\nLoader = tl('data.txt')\n\nI want to use a langchain with a string instead of a txt file, is this possible?\ndef get_response(query):\n #print(query)\n result = index.query(query)\n result = str(result) \n\n"} -{"question": "My code uses \"wikipedia\" to search for the relevant content. Below is the code\nLoad tools\ntools = load_tools(\n [\"wikipedia\"],\n llm=llm)\nagent = initialize_agent(\n tools,\n llm,\n agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n handle_parsing_errors=True,\n verbose=False\n)\nout = agent(f\"Does {var_1} cause {var_2} or the other way around?.\")\n\nInstead of \"wikipedia\", I want to use my own pdf document that is available in my local. Can anyone help me in doing this?\nI have tried using the below code\nfrom langchain.document_loaders import PyPDFium2Loader\nloader = PyPDFium2Loader(\"hunter-350-dual-channel.pdf\")\ndata = loader.load()\n\nbut i am not sure how to include this in the agent.\n"} -{"question": "I am working on a langchain based SQL chat application and wanted my agent to understand context w.r.t the user session. For e.g.\nUser - What is highest order placed in last placed?\nBot - Order id : XYZ\nUser - When was this placed?\nHere, bot should be able to deduce that 'this' refers to 'order id XYZ' from previous question. How can I incorporate this in my code?\nI am tried using ChatHistory but getting context from session history is where I am stuck.\n"} -{"question": "Kind of new to Langchain/Qdrant but I'm building a recommendation engine to recommend users based on the contents of their associated PDF files, and I need to process PDFs and store their chunks in a vector database (I'm using Qdrant) for establishing context for the RAG agent. I don't exactly understand if this error is pertaining to some sort of version requirement, since the only prior error I found had to do with Langchain versions before 0.1.x:\nFound this prior issue\nHowever that issue was closed, and downgrading to versions below 0.1.x given the current releases of langchain doesn't seem feasible given what most of my current environment has recent dependencies.\nI tried different versions of langchain and different versions all of the corresponding langchain third-party libraries. Currently, these are the important parts of my requirements file (I think):\nlangchain==0.2.1\nlangchain-community==0.2.1\nlangchain-core==0.2.1\nlangchain-experimental==0.0.59\nlangchain-openai==0.1.7\nlangchain-text-splitters==0.2.0\nlangcodes==3.4.0\nlangsmith==0.1.57\n\nopenai==1.28.1 \npython==3.12.3\n\nLooking for some sort of workaround, or a diagnosis as to what may package may be causing the problem. My current program output:\nTraceback (most recent call last):\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/main.py\", line 28, in \n main()\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/main.py\", line 17, in main\n processor = PDFResumeProcessor(openai_api_key)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/gpt_class.py\", line 16, in __init__\n self.model = ChatOpenAI(api_key=openai_api_key, temperature=0, model_name='gpt-3.5-turbo-16k-0613')\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/.venv/lib/python3.12/site-packages/pydantic/v1/main.py\", line 339, in __init__\n values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/.venv/lib/python3.12/site-packages/pydantic/v1/main.py\", line 1064, in validate_model\n value = field.get_default()\n ^^^^^^^^^^^^^^^^^^^\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/.venv/lib/python3.12/site-packages/pydantic/v1/fields.py\", line 437, in get_default\n return smart_deepcopy(self.default) if self.default_factory is None else self.default_factory()\n ^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/.venv/lib/python3.12/site-packages/langchain_core/language_models/base.py\", line 72, in _get_verbosity\n return get_verbose()\n ^^^^^^^^^^^^^\n File \"/Users/danielperlov/dperlov/JobsMatch/backend/ml_model/resume_preprocessor/.venv/lib/python3.12/site-packages/langchain_core/globals.py\", line 72, in get_verbose\n old_verbose = langchain.verbose\n ^^^^^^^^^^^^^^^^^\nAttributeError: module 'langchain' has no attribute 'verbose'\n\n"} -{"question": "I'm using langchain with Azure OpenAI and Azure Cognitive Search.\nCurrently I'm using Azure OpenAI text-embedding-ada-002 model for generating embeddings, but I would like to use a embbeding model from HugginFace if possible, because Azure OpenAI API does not allow to send documents in batches, so I need to make several calls and hit the rate limit.\nI tried using this embbeding in my code:\nembeddings = SentenceTransformerEmbeddings(\n model_name=\"all-mpnet-base-v2\",\n )\n\nInstead of:\nembeddings = OpenAIEmbeddings(\n ...\n)\n\nThe problem I'm facing, is that when I use AzureSearch's aadd_texts method I get this error:\nThe vector field 'content_vector' dimensionality must match the field definition's 'dimensions' property. Expected: '1536'. Actual: '768'. (IndexDocumentsFieldError) 98: The vector field 'content_vector' dimensionality must match the field definition's 'dimensions' property. Expected: '1536'. Actual: '768'.\n Code: IndexDocumentsFieldError\n\nI'm pretty lost. Did anyone used an open source embeddings model with Cognitive Search? How?\n"} -{"question": "I have this requirement, where i want to create a knowledge retriver which will call the API to get the closest matching information, I know that we have these integrations in langchain with multiple vector stores, but we have requirement were we have to call the API to find the closest matching document how can we create our custom retriver in langchain which will call this API to get the nearest matching informtaion\nI'm trying to build the custom retriver in langchain but still not able figure it out\n"} -{"question": "The below def load_documents function is able to load various documents such as .docx, .txt, and .pdf into langchain. I would also like to be able to load power point documents and found a script here: https://python.langchain.com/docs/integrations/document_loaders that I added to below function.\nHowever, the function is unable to read .pptx files because I am not able to pip install UnstructuredPowerPointLoader. Can somebody please suggest a way to do this or to augment below function so I can load .pptx files?\nPython function follows below:\ndef load_document(file):\n import os\n name, extension = os.path.splitext(file)\n\n if extension == '.pdf':\n from langchain.document_loaders import PyPDFLoader\n print(f'Loading {file}')\n loader = PyPDFLoader(file)\n elif extension == '.docx':\n from langchain.document_loaders import Docx2txtLoader\n print(f'Loading {file}')\n loader = Docx2txtLoader(file)\n elif extension == '.txt':\n from langchain.document_loaders import TextLoader\n print(f'Loading {file}')\n loader = TextLoader(file)\n elif extension == '.pptx':\n from langchain_community.document_loaders import UnstructuredPowerPointLoader\n print(f'Loading {file}')\n loader = UnstructuredPowerPointLoader(file)\n else:\n print('Document format is not supported!')\n return None\n\n data = loader.load()\n return data\n\nThe error I am getting is because !pip install unstructured is failing. I tried also tried !pip install -q unstructured[\"all-docs\"]==0.12.0 but was unsuccessful again. Appreciate any help!\n"} -{"question": "I am doing a microservice with a document loader, and the app can't launch at the import level, when trying to import langchain's UnstructuredMarkdownLoader\n$ flask --app main run --debug\nTraceback (most recent call last):\n File \"venv/bin/flask\", line 8, in \n sys.exit(main())\n File \"venv/lib/python3.9/site-packages/flask/cli.py\", line 1063, in main\n cli.main()\n File \"venv/lib/python3.9/site-packages/click/core.py\", line 1055, in main\n rv = self.invoke(ctx)\n File \"venv/lib/python3.9/site-packages/click/core.py\", line 1657, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"venv/lib/python3.9/site-packages/click/core.py\", line 1404, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"venv/lib/python3.9/site-packages/click/core.py\", line 760, in invoke\n return __callback(*args, **kwargs)\n File \"venv/lib/python3.9/site-packages/click/decorators.py\", line 84, in new_func\n return ctx.invoke(f, obj, *args, **kwargs)\n File \"venv/lib/python3.9/site-packages/click/core.py\", line 760, in invoke\n return __callback(*args, **kwargs)\n File \"venv/lib/python3.9/site-packages/flask/cli.py\", line 911, in run_command\n raise e from None\n File \"venv/lib/python3.9/site-packages/flask/cli.py\", line 897, in run_command\n app = info.load_app()\n File \"venv/lib/python3.9/site-packages/flask/cli.py\", line 308, in load_app\n app = locate_app(import_name, name)\n File \"venv/lib/python3.9/site-packages/flask/cli.py\", line 218, in locate_app\n __import__(module_name)\n File \"main.py\", line 5, in \n from lc_indexer import index_documents\n File \"lc_indexer.py\", line 5, in \n from langchain.document_loaders import UnstructuredMarkdownLoader\n File \"venv/lib/python3.9/site-packages/langchain/__init__.py\", line 6, in \n from langchain.agents import MRKLChain, ReActChain, SelfAskWithSearchChain\n File \"venv/lib/python3.9/site-packages/langchain/agents/__init__.py\", line 2, in \n from langchain.agents.agent import (\n File \"venv/lib/python3.9/site-packages/langchain/agents/agent.py\", line 16, in \n from langchain.agents.tools import InvalidTool\n File \"venv/lib/python3.9/site-packages/langchain/agents/tools.py\", line 8, in \n from langchain.tools.base import BaseTool, Tool, tool\n File \"venv/lib/python3.9/site-packages/langchain/tools/__init__.py\", line 42, in \n from langchain.tools.vectorstore.tool import (\n File \"venv/lib/python3.9/site-packages/langchain/tools/vectorstore/tool.py\", line 13, in \n from langchain.chains import RetrievalQA, RetrievalQAWithSourcesChain\n File \"venv/lib/python3.9/site-packages/langchain/chains/__init__.py\", line 2, in \n from langchain.chains.api.base import APIChain\n File \"venv/lib/python3.9/site-packages/langchain/chains/api/base.py\", line 13, in \n from langchain.chains.api.prompt import API_RESPONSE_PROMPT, API_URL_PROMPT\n File \"venv/lib/python3.9/site-packages/langchain/chains/api/prompt.py\", line 2, in \n from langchain.prompts.prompt import PromptTemplate\n File \"venv/lib/python3.9/site-packages/langchain/prompts/__init__.py\", line 3, in \n from langchain.prompts.chat import (\n File \"venv/lib/python3.9/site-packages/langchain/prompts/chat.py\", line 10, in \n from langchain.memory.buffer import get_buffer_string\n File \"venv/lib/python3.9/site-packages/langchain/memory/__init__.py\", line 28, in \n from langchain.memory.vectorstore import VectorStoreRetrieverMemory\n File \"venv/lib/python3.9/site-packages/langchain/memory/vectorstore.py\", line 10, in \n from langchain.vectorstores.base import VectorStoreRetriever\n File \"venv/lib/python3.9/site-packages/langchain/vectorstores/__init__.py\", line 2, in \n from langchain.vectorstores.analyticdb import AnalyticDB\n File \"venv/lib/python3.9/site-packages/langchain/vectorstores/analyticdb.py\", line 16, in \n from langchain.embeddings.base import Embeddings\n File \"venv/lib/python3.9/site-packages/langchain/embeddings/__init__.py\", line 19, in \n from langchain.embeddings.openai import OpenAIEmbeddings\n File \"venv/lib/python3.9/site-packages/langchain/embeddings/openai.py\", line 67, in \n class OpenAIEmbeddings(BaseModel, Embeddings):\n File \"pydantic/main.py\", line 197, in pydantic.main.ModelMetaclass.__new__\n File \"pydantic/fields.py\", line 506, in pydantic.fields.ModelField.infer\n File \"pydantic/fields.py\", line 436, in pydantic.fields.ModelField.__init__\n File \"pydantic/fields.py\", line 552, in pydantic.fields.ModelField.prepare\n File \"pydantic/fields.py\", line 663, in pydantic.fields.ModelField._type_analysis\n File \"pydantic/fields.py\", line 808, in pydantic.fields.ModelField._create_sub_type\n File \"pydantic/fields.py\", line 436, in pydantic.fields.ModelField.__init__\n File \"pydantic/fields.py\", line 552, in pydantic.fields.ModelField.prepare\n File \"pydantic/fields.py\", line 668, in pydantic.fields.ModelField._type_analysis\n File \"/home/my_username/.pyenv/versions/3.9.16/lib/python3.9/typing.py\", line 852, in __subclasscheck__\n return issubclass(cls, self.__origin__)\nTypeError: issubclass() arg 1 must be a class\n\nHere is the content of lc_indexer.py where the langchain imports occur\n# INDEX DOCUMENTS\nimport os\nfrom os.path import join, isfile\n\nfrom langchain.document_loaders import UnstructuredMarkdownLoader\nfrom langchain.embeddings import OpenAIEmbeddings\nfrom langchain.text_splitter import TokenTextSplitter, CharacterTextSplitter\nfrom langchain.vectorstores import Chroma\n\n\ndef index_documents(source_directories: list[str], persist_directory: str, chunk_size: int = 1000,\n chunk_overlap: int = 15):\n \"\"\"\n Indexe les documents venant des r\u00e9pertoires fournis\n\n :param source_directories: list[str]\n :param persist_directory: str\n :param chunk_size: int = 1000\n :param chunk_overlap: int = 15\n :return:\n \"\"\"\n\n only_files = []\n for directory in source_directories:\n my_path = f'{directory}'\n for f in os.listdir(my_path):\n if isfile(join(my_path, f)):\n only_files.append(f'{my_path}/{f}')\n\n embeddings = OpenAIEmbeddings()\n for file in only_files:\n index_file_to_chroma(file, persist_directory, embeddings, chunk_size, chunk_overlap)\n\n\ndef index_file_to_chroma(file: str, persist_directory: str, embeddings: OpenAIEmbeddings, chunk_size: int, chunk_overlap: int):\n \"\"\"\n Indexe un document dans Chroma\n\n :param embeddings: OpenAIEmbeddings\n :param file: str\n :param persist_directory: str\n :param chunk_size: int\n :param chunk_overlap: int\n :return:\n \"\"\"\n\n loader = UnstructuredMarkdownLoader(file_path=file, encoding='utf8')\n docs = loader.load()\n text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0)\n pages = text_splitter.split_documents(docs)\n text_splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)\n texts = text_splitter.split_documents(pages)\n db = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)\n db.persist()\n print(f'Indexed file {file} for module {persist_directory}')\n db = None\n# /INDEX DOCUMENTS\n\nThis file has been copied from a test project where no such error occurs at all when trying it but it was tested from the CLI so it may change something here.\nAlready tried copying those functions and the imports into the main.py file, but I get the same error.\nI have tried commenting the import of lc_indexer.py and the call to the index_documents function in the main.py, and it launches no problem.\nWhat is the root of the problem here? Langchain requirements have been installed\n"} -{"question": "This piece of code seems to not work. Even though this is the way that Pinecone have stated in their documentation that it should look like.\nvectorstore = Pinecone(index, embeddings.embed_query, text_field)\nThe error/warning is\nC:\\Users\\ndira\\casetext-test-server\\Lib\\site-packages\\langchain\\vectorstores\\pinecone.py:59: UserWarning: Passing in \"embedding\" as a Callable is deprecated. Please pass in an Embeddings object instead. warnings.warn(\nI don't know any other way of solving this. Kindly help thanks.\n"} -{"question": "I'm working with langchain and ChromaDb using python.\nNow, I know how to use document loaders. For instance, the below loads a bunch of documents into ChromaDb:\nfrom langchain.embeddings.openai import OpenAIEmbeddings\nembeddings = OpenAIEmbeddings()\n\nfrom langchain.vectorstores import Chroma\ndb = Chroma.from_documents(docs, embeddings, persist_directory='db')\ndb.persist()\n\nBut what if I wanted to add a single document at a time? More specifically, I want to check if a document exists before I add it. This is so I don't keep adding duplicates.\nIf a document does not exist, only then do I want to get embeddings and add it.\nHow do I do this using langchain? I think I mostly understand langchain but have no idea how to do seemingly basic tasks like this.\n"} -{"question": "I'm working with AzureOpenAI and langchain, constantly getting hit by PermissionError. This mostly could be due to the proxy, but can someone please check the code --\nfrom langchain.llms import OpenAI, AzureOpenAI\nfrom langchain.prompts import PromptTemplate\nfrom langchain.chains import LLMChain\n\nllm = AzureOpenAI(openai_api_type=\"\", openai_api_base=\"\", deployment_name=\"\", model_name=\"\", openai_api_key=\"\", openai_api_version=\"\")\n\ntemplate = \"\"\"\"\nTranslate the following text from {source_lang} to {dest_lang}: {source_text}\n\"\"\"\n\nprompt_name = PromptTemplate(input_variables=[\"source_lang\", \"dest_lang\", \"source_text\"], template=template)\nchain = LLMChain(llm=llm, prompt=prompt_name)\n\nchain.predict(source_lang=\"English\", dest_lang=\"Spanish\", source_text=\"How are you?\")\n\nchain(inputs={\"source_lang\": \"English\", \"dest_lang\": \"Spanish\", \"source_text\": \"How are you\"})\n\nI also tried the additional openai_proxy parameter without much luck.\n"} -{"question": "I want to create a local LLM using falcon 40b instruct model and combine it with lanchain so I can give it a pdf or some resource to learn from so I can query it ask it questions, learn from it and ultimately be able to derive insights from the pdf report from an Excel sheet.\nFor now, I just want to load a pdf using langchain and have the falcon-40b-instruct model as the agent.\nI want to build an llm where I can make it interact with my own data using langchain.\nHere is my attempt so far:\nfrom langchain_community.llms import HuggingFaceHub\n\nllm = HuggingFaceHub(\nrepo_id=model_name,\ntask=\"text-generation\",\nmodel_kwargs={\n\"max_new_tokens\": 512,\n\"top_k\": 30,\n\"temperature\": 0.1,\n\"repetition_penalty\": 1.03\n},\nhuggingfacehub_api_token=\"hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\"\n)\n\nI reached the following stage:\nfrom langchain_community.chat_models.huggingface import ChatHuggingFace\nllm = ChatHuggingFace(llm=llm)\n\nyet I get this error:\n\nHfHubHTTPError: 401 Client Error: Unauthorized for url\n\nI am doing do this to be able to run the following:\nqa_chain = RetrievalQA.from_chain_type(\nllm=llm,\nretriever=vector_db.as_retriever()\n)\n\nWhat am I missing and is there a way to be able to do this fully local like doing the falcon model and pass it to ChatHuggingFace?\n"} -{"question": "LangChain's BaseMessage has a function toJSON that returns a Serialized.\nOnce I have a list of BaseMessages, I can use toJSON to serialize them, but how can I later deserialize them?\nconst messages = [\n new HumanMessage(\"hello\"),\n new AIMessage(\"foo\"),\n new HumanMessage(\"bar\"),\n new AIMessage(\"baz\"),\n];\n\nconst serialized = messages.map((message) => message.toJSON());\n\nconst deserialized = ???\n\n"} \ No newline at end of file diff --git a/optimization_runs/mipro_query_writer.ipynb b/optimization_runs/mipro_query_writer.ipynb deleted file mode 100644 index 2889086..0000000 --- a/optimization_runs/mipro_query_writer.ipynb +++ /dev/null @@ -1,9936 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 3, - "id": "29bba2b0", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:453: UserWarning: Pydantic serializer warnings:\n", - " PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='[[ ## re...: None}, annotations=[]), input_type=Message])\n", - " PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n", - " return self.__pydantic_serializer__.to_python(\n" - ] - }, - { - "data": { - "text/plain": [ - "Prediction(\n", - " final_answer='',\n", - " sources=[Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='4e34d7b3-74c8-4387-bcbb-43a125ed919b'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='51945f2b-54d7-4b0b-9360-ce656af18ac6'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='1d04c3e1-6d4c-4bec-a4de-d1a559351637'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='a3c9c93b-6c68-46e8-86f6-4da4ade70d57'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='b484396b-da9e-4adb-a1d1-d34ff84b193a'), Source(object_id='9380eade-3f13-4037-87ba-29480e6afb19'), Source(object_id='28a80008-7522-4851-97d3-ceb1430f3702'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='ca6fcd08-ef78-4118-b132-757f71cfd1ca'), Source(object_id='1d04c3e1-6d4c-4bec-a4de-d1a559351637'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='653f7f19-da48-45df-b9d5-19ef173390dc'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='9043a9eb-adcc-4712-a459-9b9b2280c862'), Source(object_id='b484396b-da9e-4adb-a1d1-d34ff84b193a'), Source(object_id='2c9c4348-53cf-4f35-b070-b6de187aaa5b'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='51945f2b-54d7-4b0b-9360-ce656af18ac6'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='ca6fcd08-ef78-4118-b132-757f71cfd1ca'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='250479d5-7312-4a56-8b97-edfa2b8e54b4'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='ca6fcd08-ef78-4118-b132-757f71cfd1ca'), Source(object_id='51945f2b-54d7-4b0b-9360-ce656af18ac6'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='b484396b-da9e-4adb-a1d1-d34ff84b193a'), Source(object_id='9380eade-3f13-4037-87ba-29480e6afb19'), Source(object_id='a3c9c93b-6c68-46e8-86f6-4da4ade70d57'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='86b76a15-b28f-47f4-bddf-158a160628b7'), Source(object_id='eabdeb84-50fd-43c0-8711-a0ec0a3d34b2'), Source(object_id='acdb9703-2471-4b96-989b-b1fe0f1b8aff'), Source(object_id='4cf9a2cb-e9ec-4883-8075-05054f38f8a6'), Source(object_id='1cd71614-2aff-4961-a431-a05d5c37f25c'), Source(object_id='ca6fcd08-ef78-4118-b132-757f71cfd1ca'), Source(object_id='5163bd72-2249-4fa0-9ac4-7ba904a7f4e4'), Source(object_id='b980876d-0521-4440-abf5-ea2b64dc96ff'), Source(object_id='9608f261-9b02-4a9e-ab23-54b3b4c704f8'), Source(object_id='683a28d2-3fdf-4533-b689-a3ebfc3ab1e2'), Source(object_id='1d04c3e1-6d4c-4bec-a4de-d1a559351637')],\n", - " searches=['Weaviate integration with LangChain', 'How to use Weaviate vector store in LangChain', 'LangChain Weaviate example tutorial', 'Using Weaviate as vector database in LangChain', 'LangChain Weaviate connector documentation', 'Weaviate LangChain setup guide', 'LangChain vector store Weaviate usage', 'Weaviate LangChain code examples'],\n", - " aggregations=None,\n", - " usage={}\n", - ")" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import retrieve_dspy\n", - "\n", - "query_writer = retrieve_dspy.MultiQueryWriter(\n", - " collection_name=\"FreshstackLangchain\",\n", - " retrieved_k=10\n", - ")\n", - "\n", - "query_writer(\"How can I use Weaviate with LangChain?\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e23aff69", - "metadata": {}, - "outputs": [], - "source": [ - "from retrieve_dspy.metrics import create_metric\n", - "from retrieve_dspy.datasets.in_memory import load_queries_in_memory\n", - "\n", - "trainset, testset = load_queries_in_memory(\n", - " dataset_name=\"freshstack-langchain\",\n", - " train_samples=20,\n", - " test_samples=20\n", - ")\n", - "\n", - "metric = create_metric(\n", - " metric_type=\"coverage\",\n", - " dataset_name=\"freshstack-langchain\"\n", - ")\n", - "\n", - "evaluator = retrieve_dspy.utils.get_evaluator(\n", - " testset=testset,\n", - " metric=metric\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "1a2d6602", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/20 [00:00\n", - " obj, end = self.scan_once(s, idx)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:353: ResourceWarning: unclosed \n", - " obj, end = self.scan_once(s, idx)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[95mWrote 15 queries!\u001b[0m\n", - "\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/2 = 1.00\u001b[0m\n", - "Average Metric: 1.00 / 1 (100.0%): 5%|▌ | 1/20 [00:10<03:26, 10.86s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 4\n", - "Covered nuggets: 0\n", - "\u001b[91mNugget 1: Not covered\u001b[0m\n", - "\u001b[91mNugget 2: Not covered\u001b[0m\n", - "\u001b[91mNugget 3: Not covered\u001b[0m\n", - "\u001b[91mNugget 4: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 0/4 = 0.00\u001b[0m\n", - "Average Metric: 1.00 / 2 (50.0%): 10%|█ | 2/20 [00:11<01:29, 4.98s/it] \u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Average Metric: 2.00 / 3 (66.7%): 15%|█▌ | 3/20 [00:14<01:09, 4.10s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py:353: ResourceWarning: unclosed \n", - " obj, end = self.scan_once(s, idx)\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[96m Returning 150 Sources!\u001b[0m\n", - "\u001b[95mWrote 9 queries!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 8\n", - "Covered nuggets: 7\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[92mNugget 4: Covered\u001b[0m\n", - "\u001b[91mNugget 5: Not covered\u001b[0m\n", - "... and 3 more nuggets\n", - "\u001b[96mCoverage@100: 7/8 = 0.88\u001b[0m\n", - "Average Metric: 2.88 / 4 (71.9%): 20%|██ | 4/20 [00:18<01:02, 3.90s/it]\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[96m Returning 90 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 4\n", - "Covered nuggets: 4\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[92mNugget 4: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 4/4 = 1.00\u001b[0m\n", - "Average Metric: 3.88 / 5 (77.5%): 25%|██▌ | 5/20 [00:22<01:00, 4.05s/it]\u001b[95mWrote 8 queries!\u001b[0m\n", - "\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/2 = 1.00\u001b[0m\n", - "Average Metric: 4.88 / 6 (81.2%): 30%|███ | 6/20 [00:25<00:51, 3.64s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/2 = 1.00\u001b[0m\n", - "Average Metric: 5.88 / 7 (83.9%): 35%|███▌ | 7/20 [00:25<00:32, 2.51s/it]\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[96m Returning 80 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Average Metric: 6.88 / 8 (85.9%): 40%|████ | 8/20 [00:26<00:24, 2.06s/it]\u001b[95mWrote 8 queries!\u001b[0m\n", - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/2 = 1.00\u001b[0m\n", - "Average Metric: 7.88 / 9 (87.5%): 45%|████▌ | 9/20 [00:31<00:31, 2.87s/it]\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[96m Returning 80 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 5\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[91mNugget 3: Not covered\u001b[0m\n", - "\u001b[91mNugget 4: Not covered\u001b[0m\n", - "\u001b[91mNugget 5: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/5 = 0.40\u001b[0m\n", - "Average Metric: 8.28 / 10 (82.8%): 50%|█████ | 10/20 [00:33<00:24, 2.49s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 6\n", - "Covered nuggets: 6\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[92mNugget 4: Covered\u001b[0m\n", - "\u001b[92mNugget 5: Covered\u001b[0m\n", - "... and 1 more nuggets\n", - "\u001b[96mCoverage@100: 6/6 = 1.00\u001b[0m\n", - "Average Metric: 9.28 / 11 (84.3%): 55%|█████▌ | 11/20 [00:35<00:21, 2.39s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Average Metric: 10.28 / 12 (85.6%): 60%|██████ | 12/20 [00:36<00:17, 2.13s/it]\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[95mWrote 8 queries!\u001b[0m\n", - "\u001b[95mWrote 8 queries!\u001b[0m\n", - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[96m Returning 80 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 0\n", - "\u001b[91mNugget 1: Not covered\u001b[0m\n", - "\u001b[91mNugget 2: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 0/2 = 0.00\u001b[0m\n", - "Average Metric: 10.28 / 13 (79.0%): 65%|██████▌ | 13/20 [00:49<00:37, 5.29s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/2 = 1.00\u001b[0m\n", - "Average Metric: 11.28 / 14 (80.5%): 70%|███████ | 14/20 [00:49<00:22, 3.82s/it]\u001b[96m Returning 80 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 4\n", - "Covered nuggets: 1\n", - "\u001b[91mNugget 1: Not covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[91mNugget 3: Not covered\u001b[0m\n", - "\u001b[91mNugget 4: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 1/4 = 0.25\u001b[0m\n", - "Average Metric: 11.53 / 15 (76.8%): 75%|███████▌ | 15/20 [00:52<00:16, 3.40s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Average Metric: 12.53 / 16 (78.3%): 80%|████████ | 16/20 [00:53<00:11, 2.78s/it]\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[95mWrote 10 queries!\u001b[0m\n", - "\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 4\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[91mNugget 4: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/4 = 0.75\u001b[0m\n", - "Average Metric: 13.28 / 17 (78.1%): 85%|████████▌ | 17/20 [00:59<00:11, 3.86s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 1\n", - "\u001b[91mNugget 1: Not covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 1/2 = 0.50\u001b[0m\n", - "Average Metric: 13.78 / 18 (76.5%): 90%|█████████ | 18/20 [01:00<00:05, 2.95s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 2\n", - "Covered nuggets: 2\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 2/2 = 1.00\u001b[0m\n", - "Average Metric: 14.78 / 19 (77.8%): 95%|█████████▌| 19/20 [01:01<00:02, 2.19s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Average Metric: 15.78 / 20 (78.9%): 100%|██████████| 20/20 [01:03<00:00, 3.17s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/07/24 21:13:48 INFO dspy.evaluate.evaluate: Average Metric: 15.775 / 20 (78.9%)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "78.88" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dspy_evaluator_kwargs = {\n", - " \"num_threads\": 4\n", - "}\n", - "\n", - "evaluator(query_writer, **dspy_evaluator_kwargs)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "7b59aa2e", - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/07/24 21:13:52 INFO dspy.teleprompt.mipro_optimizer_v2: \n", - "RUNNING WITH THE FOLLOWING HEAVY AUTO RUN SETTINGS:\n", - "num_trials: 27\n", - "minibatch: False\n", - "num_fewshot_candidates: 18\n", - "num_instruct_candidates: 9\n", - "valset size: 16\n", - "\n", - "2025/07/24 21:13:52 INFO dspy.teleprompt.mipro_optimizer_v2: \n", - "==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==\n", - "2025/07/24 21:13:52 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.\n", - "\n", - "2025/07/24 21:13:52 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=18 sets of demonstrations...\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Bootstrapping set 1/18\n", - "Bootstrapping set 2/18\n", - "Bootstrapping set 3/18\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " 0%| | 0/4 [00:00 STEP 2: PROPOSE INSTRUCTION CANDIDATES <==\n", - "2025/07/24 21:21:20 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.\n", - "Error getting source code: unhashable type: 'dict'.\n", - "\n", - "Running without program aware proposer.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:453: UserWarning: Pydantic serializer warnings:\n", - " PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='[[ ## ob...: None}, annotations=[]), input_type=Message])\n", - " PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n", - " return self.__pydantic_serializer__.to_python(\n", - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:453: UserWarning: Pydantic serializer warnings:\n", - " PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='[[ ## su...: None}, annotations=[]), input_type=Message])\n", - " PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n", - " return self.__pydantic_serializer__.to_python(\n", - "2025/07/24 21:21:30 INFO dspy.teleprompt.mipro_optimizer_v2: \n", - "Proposing N=9 instructions...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "DATA SUMMARY: The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: simple\n", - "task_demos No task demos provided.\n", - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:32.270565]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "No task demos provided.\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "Keep the instruction clear and concise.\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "Given a technical programming question involving code errors or functionality issues—especially related to libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—write clear and diverse search queries that will retrieve relevant and varied information from search engines. Ensure the queries cover multiple aspects of the problem, including error messages, code context, and possible solutions, to maximize the chances of finding precise and comprehensive debugging guidance.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: Given a technical programming question involving code errors or functionality issues—especially related to libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—write clear and diverse search queries that will retrieve relevant and varied information from search engines. Ensure the queries cover multiple aspects of the problem, including error messages, code context, and possible solutions, to maximize the chances of finding precise and comprehensive debugging guidance.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: high_stakes\n", - "task_demos Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class existence, and the import path in the specified version will help fix the import error.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain version 0.1.7 BaseOutputParser', 'langchain 0.1.7 import BaseOutputParser error', 'langchain BaseOutputParser typo BaseOuputParser', 'langchain changelog 0.1.7 BaseOutputParser', 'langchain schema module classes 0.1.7']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text content from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from GCS into memory (likely as a buffer).\n", - "2. Parse or extract text from the PDF buffer in memory.\n", - "\n", - "This involves:\n", - "- Using the official Google Cloud Storage Node.js client to download the file as a buffer.\n", - "- Using a PDF parsing library in Node.js that can accept a buffer or stream and extract text, such as pdf-parse, pdfjs-dist, or pdf-lib.\n", - "- Possibly examples or best practices for this workflow.\n", - "\n", - "I will search for:\n", - "- How to download a file from GCS into memory in Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Combining these two steps.\n", - "- Any existing examples or best practices for this use case.\n", - "Search Queries: ['download pdf file from google cloud storage into memory nodejs', 'google cloud storage nodejs download file as buffer', 'extract text from pdf buffer nodejs', 'pdf-parse example nodejs extract text from buffer', 'read pdf file from buffer in nodejs', 'best way to read pdf content from buffer nodejs', 'google cloud storage nodejs download pdf and parse text', 'langchain nodejs pdf loader alternative', 'how to parse pdf content in memory nodejs']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different naming or module structure. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and whether the import path is correct. Also, I should verify if the class exists in that version or if the user needs to upgrade langchain. Therefore, I will search for the correct class name, its import path, and compatibility with langchain 0.1.7.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class version 0.1.7', 'langchain.schema BaseOutputParser availability', 'langchain 0.1.7 changelog BaseOutputParser', 'langchain BaseOutputParser correct import path', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain 0.1.7']\n", - "\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:453: UserWarning: Pydantic serializer warnings:\n", - " PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content='[[ ## pr...: None}, annotations=[]), input_type=Message])\n", - " PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n", - " return self.__pydantic_serializer__.to_python(\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:35.260994]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class existence, and the import path in the specified version will help fix the import error.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain version 0.1.7 BaseOutputParser', 'langchain 0.1.7 import BaseOutputParser error', 'langchain BaseOutputParser typo BaseOuputParser', 'langchain changelog 0.1.7 BaseOutputParser', 'langchain schema module classes 0.1.7']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text content from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from GCS into memory (likely as a buffer).\n", - "2. Parse or extract text from the PDF buffer in memory.\n", - "\n", - "This involves:\n", - "- Using the official Google Cloud Storage Node.js client to download the file as a buffer.\n", - "- Using a PDF parsing library in Node.js that can accept a buffer or stream and extract text, such as pdf-parse, pdfjs-dist, or pdf-lib.\n", - "- Possibly examples or best practices for this workflow.\n", - "\n", - "I will search for:\n", - "- How to download a file from GCS into memory in Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Combining these two steps.\n", - "- Any existing examples or best practices for this use case.\n", - "Search Queries: ['download pdf file from google cloud storage into memory nodejs', 'google cloud storage nodejs download file as buffer', 'extract text from pdf buffer nodejs', 'pdf-parse example nodejs extract text from buffer', 'read pdf file from buffer in nodejs', 'best way to read pdf content from buffer nodejs', 'google cloud storage nodejs download pdf and parse text', 'langchain nodejs pdf loader alternative', 'how to parse pdf content in memory nodejs']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different naming or module structure. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and whether the import path is correct. Also, I should verify if the class exists in that version or if the user needs to upgrade langchain. Therefore, I will search for the correct class name, its import path, and compatibility with langchain 0.1.7.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class version 0.1.7', 'langchain.schema BaseOutputParser availability', 'langchain 0.1.7 changelog BaseOutputParser', 'langchain BaseOutputParser correct import path', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain 0.1.7']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "The instruction should include a high stakes scenario in which the LM must solve the task!\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: persona\n", - "task_demos Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class existence, and the import path in the specified version will help fix the import error.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain version 0.1.7 BaseOutputParser', 'langchain 0.1.7 import BaseOutputParser error', 'langchain BaseOutputParser typo BaseOuputParser', 'langchain changelog 0.1.7 BaseOutputParser', 'langchain schema module classes 0.1.7']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text content from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from GCS into memory (likely as a buffer).\n", - "2. Parse or extract text from the PDF buffer in memory.\n", - "\n", - "This involves:\n", - "- Using the official Google Cloud Storage Node.js client to download the file as a buffer.\n", - "- Using a PDF parsing library in Node.js that can accept a buffer or stream and extract text, such as pdf-parse, pdfjs-dist, or pdf-lib.\n", - "- Possibly examples or best practices for this workflow.\n", - "\n", - "I will search for:\n", - "- How to download a file from GCS into memory in Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Combining these two steps.\n", - "- Any existing examples or best practices for this use case.\n", - "Search Queries: ['download pdf file from google cloud storage into memory nodejs', 'google cloud storage nodejs download file as buffer', 'extract text from pdf buffer nodejs', 'pdf-parse example nodejs extract text from buffer', 'read pdf file from buffer in nodejs', 'best way to read pdf content from buffer nodejs', 'google cloud storage nodejs download pdf and parse text', 'langchain nodejs pdf loader alternative', 'how to parse pdf content in memory nodejs']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different naming or module structure. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and whether the import path is correct. Also, I should verify if the class exists in that version or if the user needs to upgrade langchain. Therefore, I will search for the correct class name, its import path, and compatibility with langchain 0.1.7.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class version 0.1.7', 'langchain.schema BaseOutputParser availability', 'langchain 0.1.7 changelog BaseOutputParser', 'langchain BaseOutputParser correct import path', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain 0.1.7']\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:38.252189]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class existence, and the import path in the specified version will help fix the import error.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain version 0.1.7 BaseOutputParser', 'langchain 0.1.7 import BaseOutputParser error', 'langchain BaseOutputParser typo BaseOuputParser', 'langchain changelog 0.1.7 BaseOutputParser', 'langchain schema module classes 0.1.7']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text content from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from GCS into memory (likely as a buffer).\n", - "2. Parse or extract text from the PDF buffer in memory.\n", - "\n", - "This involves:\n", - "- Using the official Google Cloud Storage Node.js client to download the file as a buffer.\n", - "- Using a PDF parsing library in Node.js that can accept a buffer or stream and extract text, such as pdf-parse, pdfjs-dist, or pdf-lib.\n", - "- Possibly examples or best practices for this workflow.\n", - "\n", - "I will search for:\n", - "- How to download a file from GCS into memory in Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Combining these two steps.\n", - "- Any existing examples or best practices for this use case.\n", - "Search Queries: ['download pdf file from google cloud storage into memory nodejs', 'google cloud storage nodejs download file as buffer', 'extract text from pdf buffer nodejs', 'pdf-parse example nodejs extract text from buffer', 'read pdf file from buffer in nodejs', 'best way to read pdf content from buffer nodejs', 'google cloud storage nodejs download pdf and parse text', 'langchain nodejs pdf loader alternative', 'how to parse pdf content in memory nodejs']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different naming or module structure. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and whether the import path is correct. Also, I should verify if the class exists in that version or if the user needs to upgrade langchain. Therefore, I will search for the correct class name, its import path, and compatibility with langchain 0.1.7.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class version 0.1.7', 'langchain.schema BaseOutputParser availability', 'langchain 0.1.7 changelog BaseOutputParser', 'langchain BaseOutputParser correct import path', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain 0.1.7']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "Include a persona that is relevant to the task in the instruction (ie. \"You are a ...\")\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "You are an expert software engineer and technical researcher specializing in debugging and resolving issues related to programming libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a technical question involving code errors, library usage, or functionality issues, generate a diverse and comprehensive set of targeted search queries that will help gather relevant information from search engines. Your queries should explore different angles including error messages, version compatibility, alternative methods, best practices, and relevant documentation. Aim to cover both broad and specific aspects to ensure a thorough understanding that will enable accurate, context-aware troubleshooting and solution recommendations.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: You are an expert software engineer and technical researcher specializing in debugging and resolving issues related to programming libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a technical question involving code errors, library usage, or functionality issues, generate a diverse and comprehensive set of targeted search queries that will help gather relevant information from search engines. Your queries should explore different angles including error messages, version compatibility, alternative methods, best practices, and relevant documentation. Aim to cover both broad and specific aspects to ensure a thorough understanding that will enable accurate, context-aware troubleshooting and solution recommendations.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: high_stakes\n", - "task_demos Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different naming or module structure. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and whether the import path is correct. Also, I should verify if the class exists in that version or if the user needs to upgrade langchain. Therefore, I will search for the correct class name, its import path, and compatibility with langchain 0.1.7.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class version 0.1.7', 'langchain.schema BaseOutputParser availability', 'langchain 0.1.7 changelog BaseOutputParser', 'langchain BaseOutputParser correct import path', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain 0.1.7']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory in Node.js\n", - "- How to extract text from a PDF buffer in Node.js\n", - "- Examples or best practices combining these two steps\n", - "- Possibly any Langchain Node.js community solutions or workarounds for PDF loading from GCS\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer Node.js', 'google cloud storage download pdf to buffer nodejs example', 'pdf text extraction libraries Node.js', 'langchain Node.js load PDF from GCS', 'best way to read PDF content from buffer Node.js']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory using Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Examples or best practices combining these two steps.\n", - "- Possibly if there are any Langchain Node.js community solutions or recommended approaches for PDF loading from GCS.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer Node.js pdf-parse', 'Google Cloud Storage Node.js download file as buffer', 'Node.js read PDF content from memory', 'Langchain Node.js load PDF from GCS', 'best way to read PDF content from GCS bucket Node.js']\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:42.501478]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different naming or module structure. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and whether the import path is correct. Also, I should verify if the class exists in that version or if the user needs to upgrade langchain. Therefore, I will search for the correct class name, its import path, and compatibility with langchain 0.1.7.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class version 0.1.7', 'langchain.schema BaseOutputParser availability', 'langchain 0.1.7 changelog BaseOutputParser', 'langchain BaseOutputParser correct import path', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain 0.1.7']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory in Node.js\n", - "- How to extract text from a PDF buffer in Node.js\n", - "- Examples or best practices combining these two steps\n", - "- Possibly any Langchain Node.js community solutions or workarounds for PDF loading from GCS\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer Node.js', 'google cloud storage download pdf to buffer nodejs example', 'pdf text extraction libraries Node.js', 'langchain Node.js load PDF from GCS', 'best way to read PDF content from buffer Node.js']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory using Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Examples or best practices combining these two steps.\n", - "- Possibly if there are any Langchain Node.js community solutions or recommended approaches for PDF loading from GCS.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer Node.js pdf-parse', 'Google Cloud Storage Node.js download file as buffer', 'Node.js read PDF content from memory', 'Langchain Node.js load PDF from GCS', 'best way to read PDF content from GCS bucket Node.js']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "The instruction should include a high stakes scenario in which the LM must solve the task!\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "You are tasked with solving complex and high-stakes technical programming questions related to code errors and functionality issues involving advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a detailed problem description that often includes code snippets and error messages, your goal is to generate a diverse and comprehensive set of targeted search queries. These queries should enable thorough exploration and retrieval of relevant information from search engines to precisely diagnose and resolve the user's issue. When formulating these queries, consider multiple angles and possible interpretations of the problem to ensure coverage of all potential causes and solutions. Your search queries should be crafted to maximize the chances of finding the most accurate, up-to-date, and context-aware troubleshooting guidance, especially in scenarios where the problem involves debugging code integrations, handling version compatibility, or processing complex data types like PDFs in memory.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: You are tasked with solving complex and high-stakes technical programming questions related to code errors and functionality issues involving advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a detailed problem description that often includes code snippets and error messages, your goal is to generate a diverse and comprehensive set of targeted search queries. These queries should enable thorough exploration and retrieval of relevant information from search engines to precisely diagnose and resolve the user's issue. When formulating these queries, consider multiple angles and possible interpretations of the problem to ensure coverage of all potential causes and solutions. Your search queries should be crafted to maximize the chances of finding the most accurate, up-to-date, and context-aware troubleshooting guidance, especially in scenarios where the problem involves debugging code integrations, handling version compatibility, or processing complex data types like PDFs in memory.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: none\n", - "task_demos Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory using Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Examples or best practices combining these two steps.\n", - "- Possibly if there are any Langchain Node.js community solutions or recommended approaches for PDF loading from GCS.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer Node.js pdf-parse', 'Google Cloud Storage Node.js download file as buffer', 'Node.js read PDF content from memory', 'Langchain Node.js load PDF from GCS', 'best way to read PDF content from GCS bucket Node.js']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best practices or libraries for:\n", - "1. Downloading a file from GCS bucket into memory in Node.js.\n", - "2. Parsing or extracting text from a PDF file in memory in Node.js.\n", - "3. Possibly integrating this with Langchain or similar frameworks in Node.js.\n", - "\n", - "I should look for:\n", - "- How to download files from GCS bucket into memory buffers in Node.js.\n", - "- How to parse PDF files from buffers in Node.js (e.g., pdf-parse, pdfjs-dist, or other libraries).\n", - "- Any existing examples or best practices for this workflow.\n", - "- Whether Langchain Node.js has any recommended approach or community solutions for this.\n", - "\n", - "This will help provide a clear, practical approach to the user’s problem.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'read PDF file from buffer in Node.js', 'extract text from PDF buffer Node.js', 'pdf parsing libraries Node.js', 'Langchain Node.js load PDF from memory', 'best way to read PDF content from GCS bucket Node.js', 'google cloud storage download file to buffer Node.js', 'pdf-parse example Node.js buffer', 'how to extract text from PDF without saving to disk Node.js']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, usage, and version compatibility of 'BaseOutputParser' in langchain will help answer the question.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class usage', 'langchain schema BaseOutputParser version 0.1.7', 'langchain changelog BaseOutputParser', 'langchain 0.1.7 available classes', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain schema module classes', 'langchain version 0.1.7 documentation']\n", - "\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/main.py:453: UserWarning: Pydantic serializer warnings:\n", - " PydanticSerializationUnexpectedValue(Expected 9 fields but got 6: Expected `Message` - serialized value may not be as expected [input_value=Message(content=\"[[ ## pr...: None}, annotations=[]), input_type=Message])\n", - " PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])\n", - " return self.__pydantic_serializer__.to_python(\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:47.917092]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory using Node.js.\n", - "- How to extract text from a PDF buffer in Node.js.\n", - "- Examples or best practices combining these two steps.\n", - "- Possibly if there are any Langchain Node.js community solutions or recommended approaches for PDF loading from GCS.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer Node.js pdf-parse', 'Google Cloud Storage Node.js download file as buffer', 'Node.js read PDF content from memory', 'Langchain Node.js load PDF from GCS', 'best way to read PDF content from GCS bucket Node.js']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best practices or libraries for:\n", - "1. Downloading a file from GCS bucket into memory in Node.js.\n", - "2. Parsing or extracting text from a PDF file in memory in Node.js.\n", - "3. Possibly integrating this with Langchain or similar frameworks in Node.js.\n", - "\n", - "I should look for:\n", - "- How to download files from GCS bucket into memory buffers in Node.js.\n", - "- How to parse PDF files from buffers in Node.js (e.g., pdf-parse, pdfjs-dist, or other libraries).\n", - "- Any existing examples or best practices for this workflow.\n", - "- Whether Langchain Node.js has any recommended approach or community solutions for this.\n", - "\n", - "This will help provide a clear, practical approach to the user’s problem.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'read PDF file from buffer in Node.js', 'extract text from PDF buffer Node.js', 'pdf parsing libraries Node.js', 'Langchain Node.js load PDF from memory', 'best way to read PDF content from GCS bucket Node.js', 'google cloud storage download file to buffer Node.js', 'pdf-parse example Node.js buffer', 'how to extract text from PDF without saving to disk Node.js']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, usage, and version compatibility of 'BaseOutputParser' in langchain will help answer the question.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class usage', 'langchain schema BaseOutputParser version 0.1.7', 'langchain changelog BaseOutputParser', 'langchain 0.1.7 available classes', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain schema module classes', 'langchain version 0.1.7 documentation']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: description\n", - "task_demos Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best practices or libraries for:\n", - "1. Downloading a file from GCS bucket into memory in Node.js.\n", - "2. Parsing or extracting text from a PDF file in memory in Node.js.\n", - "3. Possibly integrating this with Langchain or similar frameworks in Node.js.\n", - "\n", - "I should look for:\n", - "- How to download files from GCS bucket into memory buffers in Node.js.\n", - "- How to parse PDF files from buffers in Node.js (e.g., pdf-parse, pdfjs-dist, or other libraries).\n", - "- Any existing examples or best practices for this workflow.\n", - "- Whether Langchain Node.js has any recommended approach or community solutions for this.\n", - "\n", - "This will help provide a clear, practical approach to the user’s problem.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'read PDF file from buffer in Node.js', 'extract text from PDF buffer Node.js', 'pdf parsing libraries Node.js', 'Langchain Node.js load PDF from memory', 'best way to read PDF content from GCS bucket Node.js', 'google cloud storage download file to buffer Node.js', 'pdf-parse example Node.js buffer', 'how to extract text from PDF without saving to disk Node.js']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, usage, and version compatibility of 'BaseOutputParser' in langchain will help answer the question.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class usage', 'langchain schema BaseOutputParser version 0.1.7', 'langchain changelog BaseOutputParser', 'langchain 0.1.7 available classes', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain schema module classes', 'langchain version 0.1.7 documentation']\n", - "\n", - "Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the from_existing_graph method from langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions that require information spanning multiple labels or relationships, such as \"Name some films directed by Oliver Stone?\" The user suspects that this limitation is why their current setup can answer questions about properties of a single node (like age and location of Oliver Stone) but not about relationships (like what he directed).\n", - "\n", - "To address this, the user needs a way to create embeddings that incorporate multiple node labels and possibly relationships, or to combine multiple vector stores or indexes. Alternatively, they might need to customize the embedding generation to include relationship context or create a composite text representation that includes connected nodes and relationships before embedding.\n", - "\n", - "The search queries should focus on:\n", - "- How to create vector embeddings/indexes in Neo4j that span multiple node labels or include relationships.\n", - "- Best practices or examples of using langchain's Neo4jVector or similar tools for multi-label or multi-relationship embedding.\n", - "- Techniques for embedding knowledge graphs with multiple node types and relationships for natural language querying.\n", - "- Any existing workarounds or extensions to from_existing_graph to support multiple labels or richer graph context.\n", - "- Examples or tutorials on combining graph embeddings with vector stores for complex queries.\n", - "\n", - "This will help find solutions or workarounds to enable the user to create a vector store that supports richer queries involving multiple node labels and relationships.\n", - "Search Queries: ['Neo4jVector from_existing_graph multiple node labels embedding', 'embedding knowledge graph with multiple node types and relationships', 'langchain Neo4jVector create vector store for multiple labels', 'how to embed relationships in Neo4j vector store', 'creating vector embeddings for heterogeneous graphs Neo4j', 'best practices for multi-label graph embeddings Neo4j', 'combining multiple vector stores for Neo4j knowledge graph', 'examples of natural language querying on Neo4j with vector embeddings', 'extending langchain Neo4jVector for multi-label embeddings', 'embedding graph relationships for question answering Neo4j']\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:52.296249]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket into memory using Node.js, and then read the PDF content as text. They mention that Langchain for Python has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with binary content, but they want to extract the text from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best practices or libraries for:\n", - "1. Downloading a file from GCS bucket into memory in Node.js.\n", - "2. Parsing or extracting text from a PDF file in memory in Node.js.\n", - "3. Possibly integrating this with Langchain or similar frameworks in Node.js.\n", - "\n", - "I should look for:\n", - "- How to download files from GCS bucket into memory buffers in Node.js.\n", - "- How to parse PDF files from buffers in Node.js (e.g., pdf-parse, pdfjs-dist, or other libraries).\n", - "- Any existing examples or best practices for this workflow.\n", - "- Whether Langchain Node.js has any recommended approach or community solutions for this.\n", - "\n", - "This will help provide a clear, practical approach to the user’s problem.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'read PDF file from buffer in Node.js', 'extract text from PDF buffer Node.js', 'pdf parsing libraries Node.js', 'Langchain Node.js load PDF from memory', 'best way to read PDF content from GCS bucket Node.js', 'google cloud storage download file to buffer Node.js', 'pdf-parse example Node.js buffer', 'how to extract text from PDF without saving to disk Node.js']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, usage, and version compatibility of 'BaseOutputParser' in langchain will help answer the question.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain BaseOutputParser class usage', 'langchain schema BaseOutputParser version 0.1.7', 'langchain changelog BaseOutputParser', 'langchain 0.1.7 available classes', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain schema module classes', 'langchain version 0.1.7 documentation']\n", - "\n", - "Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the from_existing_graph method from langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions that require information spanning multiple labels or relationships, such as \"Name some films directed by Oliver Stone?\" The user suspects that this limitation is why their current setup can answer questions about properties of a single node (like age and location of Oliver Stone) but not about relationships (like what he directed).\n", - "\n", - "To address this, the user needs a way to create embeddings that incorporate multiple node labels and possibly relationships, or to combine multiple vector stores or indexes. Alternatively, they might need to customize the embedding generation to include relationship context or create a composite text representation that includes connected nodes and relationships before embedding.\n", - "\n", - "The search queries should focus on:\n", - "- How to create vector embeddings/indexes in Neo4j that span multiple node labels or include relationships.\n", - "- Best practices or examples of using langchain's Neo4jVector or similar tools for multi-label or multi-relationship embedding.\n", - "- Techniques for embedding knowledge graphs with multiple node types and relationships for natural language querying.\n", - "- Any existing workarounds or extensions to from_existing_graph to support multiple labels or richer graph context.\n", - "- Examples or tutorials on combining graph embeddings with vector stores for complex queries.\n", - "\n", - "This will help find solutions or workarounds to enable the user to create a vector store that supports richer queries involving multiple node labels and relationships.\n", - "Search Queries: ['Neo4jVector from_existing_graph multiple node labels embedding', 'embedding knowledge graph with multiple node types and relationships', 'langchain Neo4jVector create vector store for multiple labels', 'how to embed relationships in Neo4j vector store', 'creating vector embeddings for heterogeneous graphs Neo4j', 'best practices for multi-label graph embeddings Neo4j', 'combining multiple vector stores for Neo4j knowledge graph', 'examples of natural language querying on Neo4j with vector embeddings', 'extending langchain Neo4jVector for multi-label embeddings', 'embedding graph relationships for question answering Neo4j']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "Make sure your instruction is very informative and descriptive.\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "Given a technical programming question that involves debugging code errors, improving functionality, or integrating libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage, generate a comprehensive set of targeted search queries. These queries should be designed to explore multiple facets of the problem, including error diagnosis, usage of specific classes or methods, version compatibility, best practices for data processing workflows, and integration strategies for vector stores, graph databases, or language models. Your search queries should aim to cover both general and specific angles of the question to ensure thorough research and effective troubleshooting. Include queries that address potential typos, library limitations, alternative approaches, and relevant examples or tutorials. Ensure the queries are clear, technical, and focused on extracting actionable insights to resolve the user’s programming challenge.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: Given a technical programming question that involves debugging code errors, improving functionality, or integrating libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage, generate a comprehensive set of targeted search queries. These queries should be designed to explore multiple facets of the problem, including error diagnosis, usage of specific classes or methods, version compatibility, best practices for data processing workflows, and integration strategies for vector stores, graph databases, or language models. Your search queries should aim to cover both general and specific angles of the question to ensure thorough research and effective troubleshooting. Include queries that address potential typos, library limitations, alternative approaches, and relevant examples or tutorials. Ensure the queries are clear, technical, and focused on extracting actionable insights to resolve the user’s programming challenge.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: creative\n", - "task_demos Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the from_existing_graph method from langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions that require information spanning multiple labels or relationships, such as \"Name some films directed by Oliver Stone?\" The user suspects that this limitation is why their current setup can answer questions about properties of a single node (like age and location of Oliver Stone) but not about relationships (like what he directed).\n", - "\n", - "To address this, the user needs a way to create embeddings that incorporate multiple node labels and possibly relationships, or to combine multiple vector stores or indexes. Alternatively, they might need to customize the embedding generation to include relationship context or create a composite text representation that includes connected nodes and relationships before embedding.\n", - "\n", - "The search queries should focus on:\n", - "- How to create vector embeddings/indexes in Neo4j that span multiple node labels or include relationships.\n", - "- Best practices or examples of using langchain's Neo4jVector or similar tools for multi-label or multi-relationship embedding.\n", - "- Techniques for embedding knowledge graphs with multiple node types and relationships for natural language querying.\n", - "- Any existing workarounds or extensions to from_existing_graph to support multiple labels or richer graph context.\n", - "- Examples or tutorials on combining graph embeddings with vector stores for complex queries.\n", - "\n", - "This will help find solutions or workarounds to enable the user to create a vector store that supports richer queries involving multiple node labels and relationships.\n", - "Search Queries: ['Neo4jVector from_existing_graph multiple node labels embedding', 'embedding knowledge graph with multiple node types and relationships', 'langchain Neo4jVector create vector store for multiple labels', 'how to embed relationships in Neo4j vector store', 'creating vector embeddings for heterogeneous graphs Neo4j', 'best practices for multi-label graph embeddings Neo4j', 'combining multiple vector stores for Neo4j knowledge graph', 'examples of natural language querying on Neo4j with vector embeddings', 'extending langchain Neo4jVector for multi-label embeddings', 'embedding graph relationships for question answering Neo4j']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with the binary content, but they want to extract the text content from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best practices or libraries in Node.js for:\n", - "1. Downloading files from GCS buckets into memory.\n", - "2. Parsing PDF files from a binary buffer or stream in memory to extract text content.\n", - "\n", - "I should look for:\n", - "- How to download a file from GCS bucket into a buffer or stream in Node.js.\n", - "- How to parse PDF content from a buffer or stream in Node.js (e.g., using pdf-parse, pdfjs-dist, or other PDF parsing libraries).\n", - "- Any existing examples or best practices combining these two steps.\n", - "- Whether Langchain for Node.js has any recommended approach or community suggestions for this use case.\n", - "\n", - "This will help provide a clear, practical approach to download and read PDF content from GCS in Node.js.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'parse PDF from buffer Node.js', 'extract text from PDF buffer Node.js', 'pdf parsing libraries Node.js', 'Langchain Node.js load PDF from GCS', 'read PDF content from binary buffer Node.js', 'google cloud storage download file to buffer example Node.js', 'best way to read PDF content in memory Node.js']\n", - "\n", - "Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the `from_existing_graph` method from Langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions involving multiple labels or relationships, such as \"What did Oliver Stone direct?\" The user wants to know how to extend or modify their approach to generate embeddings and vector indexes that cover multiple node labels and their relationships, so that natural language queries involving those relationships can be answered effectively.\n", - "\n", - "To address this, I need to find information on:\n", - "- Whether `from_existing_graph` supports multiple labels or how to work around this limitation.\n", - "- How to create embeddings for multiple node labels or combined properties in Neo4j using Langchain or other tools.\n", - "- Best practices or examples for building vector stores over heterogeneous graphs with multiple node types and relationships.\n", - "- How to incorporate relationship information (e.g., directed movies) into embeddings or vector indexes.\n", - "- Alternative approaches or custom implementations to generate embeddings for multiple labels and relationships in Neo4j.\n", - "\n", - "This will help the user understand how to extend their current setup to handle multi-label graphs and improve NLQ results.\n", - "Search Queries: ['langchain Neo4jVector from_existing_graph multiple labels support', 'create vector store embeddings for multiple node labels Neo4j', 'embedding multiple node types and relationships Neo4j Langchain', 'how to index heterogeneous graph with multiple labels in Neo4j vector store', 'best practices for vector embeddings on knowledge graphs with multiple node labels', 'Neo4j vector search embedding relationships and multiple labels', 'Langchain Neo4jVector create embeddings for multiple labels and properties', 'natural language querying knowledge graph multiple labels embeddings', 'examples of vector stores on Neo4j knowledge graph with multiple node labels', 'custom embedding generation for multi-label Neo4j graph Langchain']\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:21:55.751935]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the from_existing_graph method from langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions that require information spanning multiple labels or relationships, such as \"Name some films directed by Oliver Stone?\" The user suspects that this limitation is why their current setup can answer questions about properties of a single node (like age and location of Oliver Stone) but not about relationships (like what he directed).\n", - "\n", - "To address this, the user needs a way to create embeddings that incorporate multiple node labels and possibly relationships, or to combine multiple vector stores or indexes. Alternatively, they might need to customize the embedding generation to include relationship context or create a composite text representation that includes connected nodes and relationships before embedding.\n", - "\n", - "The search queries should focus on:\n", - "- How to create vector embeddings/indexes in Neo4j that span multiple node labels or include relationships.\n", - "- Best practices or examples of using langchain's Neo4jVector or similar tools for multi-label or multi-relationship embedding.\n", - "- Techniques for embedding knowledge graphs with multiple node types and relationships for natural language querying.\n", - "- Any existing workarounds or extensions to from_existing_graph to support multiple labels or richer graph context.\n", - "- Examples or tutorials on combining graph embeddings with vector stores for complex queries.\n", - "\n", - "This will help find solutions or workarounds to enable the user to create a vector store that supports richer queries involving multiple node labels and relationships.\n", - "Search Queries: ['Neo4jVector from_existing_graph multiple node labels embedding', 'embedding knowledge graph with multiple node types and relationships', 'langchain Neo4jVector create vector store for multiple labels', 'how to embed relationships in Neo4j vector store', 'creating vector embeddings for heterogeneous graphs Neo4j', 'best practices for multi-label graph embeddings Neo4j', 'combining multiple vector stores for Neo4j knowledge graph', 'examples of natural language querying on Neo4j with vector embeddings', 'extending langchain Neo4jVector for multi-label embeddings', 'embedding graph relationships for question answering Neo4j']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When downloading the file in Node.js, they get a Document with the binary content, but they want to extract the text content from the PDF in memory without saving it to disk.\n", - "\n", - "To answer this, I need to find the best practices or libraries in Node.js for:\n", - "1. Downloading files from GCS buckets into memory.\n", - "2. Parsing PDF files from a binary buffer or stream in memory to extract text content.\n", - "\n", - "I should look for:\n", - "- How to download a file from GCS bucket into a buffer or stream in Node.js.\n", - "- How to parse PDF content from a buffer or stream in Node.js (e.g., using pdf-parse, pdfjs-dist, or other PDF parsing libraries).\n", - "- Any existing examples or best practices combining these two steps.\n", - "- Whether Langchain for Node.js has any recommended approach or community suggestions for this use case.\n", - "\n", - "This will help provide a clear, practical approach to download and read PDF content from GCS in Node.js.\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'parse PDF from buffer Node.js', 'extract text from PDF buffer Node.js', 'pdf parsing libraries Node.js', 'Langchain Node.js load PDF from GCS', 'read PDF content from binary buffer Node.js', 'google cloud storage download file to buffer example Node.js', 'best way to read PDF content in memory Node.js']\n", - "\n", - "Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the `from_existing_graph` method from Langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions involving multiple labels or relationships, such as \"What did Oliver Stone direct?\" The user wants to know how to extend or modify their approach to generate embeddings and vector indexes that cover multiple node labels and their relationships, so that natural language queries involving those relationships can be answered effectively.\n", - "\n", - "To address this, I need to find information on:\n", - "- Whether `from_existing_graph` supports multiple labels or how to work around this limitation.\n", - "- How to create embeddings for multiple node labels or combined properties in Neo4j using Langchain or other tools.\n", - "- Best practices or examples for building vector stores over heterogeneous graphs with multiple node types and relationships.\n", - "- How to incorporate relationship information (e.g., directed movies) into embeddings or vector indexes.\n", - "- Alternative approaches or custom implementations to generate embeddings for multiple labels and relationships in Neo4j.\n", - "\n", - "This will help the user understand how to extend their current setup to handle multi-label graphs and improve NLQ results.\n", - "Search Queries: ['langchain Neo4jVector from_existing_graph multiple labels support', 'create vector store embeddings for multiple node labels Neo4j', 'embedding multiple node types and relationships Neo4j Langchain', 'how to index heterogeneous graph with multiple labels in Neo4j vector store', 'best practices for vector embeddings on knowledge graphs with multiple node labels', 'Neo4j vector search embedding relationships and multiple labels', 'Langchain Neo4jVector create embeddings for multiple labels and properties', 'natural language querying knowledge graph multiple labels embeddings', 'examples of vector stores on Neo4j knowledge graph with multiple node labels', 'custom embedding generation for multi-label Neo4j graph Langchain']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "Don't be afraid to be creative when creating the new instruction!\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "Given a detailed technical programming question involving debugging, code functionality, or integration issues—especially those related to libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of targeted search queries that will help gather relevant information from search engines. Your queries should cover multiple angles including troubleshooting specific error messages, best practices, alternative approaches, usage examples, and extensions or workarounds. Aim to capture both broad conceptual topics and precise technical details to ensure comprehensive coverage that supports an effective and context-aware solution.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: Given a detailed technical programming question involving debugging, code functionality, or integration issues—especially those related to libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of targeted search queries that will help gather relevant information from search engines. Your queries should cover multiple angles including troubleshooting specific error messages, best practices, alternative approaches, usage examples, and extensions or workarounds. Aim to capture both broad conceptual topics and precise technical details to ensure comprehensive coverage that supports an effective and context-aware solution.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: high_stakes\n", - "task_demos Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the `from_existing_graph` method from Langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions involving multiple labels or relationships, such as \"What did Oliver Stone direct?\" The user wants to know how to extend or modify their approach to generate embeddings and vector indexes that cover multiple node labels and their relationships, so that natural language queries involving those relationships can be answered effectively.\n", - "\n", - "To address this, I need to find information on:\n", - "- Whether `from_existing_graph` supports multiple labels or how to work around this limitation.\n", - "- How to create embeddings for multiple node labels or combined properties in Neo4j using Langchain or other tools.\n", - "- Best practices or examples for building vector stores over heterogeneous graphs with multiple node types and relationships.\n", - "- How to incorporate relationship information (e.g., directed movies) into embeddings or vector indexes.\n", - "- Alternative approaches or custom implementations to generate embeddings for multiple labels and relationships in Neo4j.\n", - "\n", - "This will help the user understand how to extend their current setup to handle multi-label graphs and improve NLQ results.\n", - "Search Queries: ['langchain Neo4jVector from_existing_graph multiple labels support', 'create vector store embeddings for multiple node labels Neo4j', 'embedding multiple node types and relationships Neo4j Langchain', 'how to index heterogeneous graph with multiple labels in Neo4j vector store', 'best practices for vector embeddings on knowledge graphs with multiple node labels', 'Neo4j vector search embedding relationships and multiple labels', 'Langchain Neo4jVector create embeddings for multiple labels and properties', 'natural language querying knowledge graph multiple labels embeddings', 'examples of vector stores on Neo4j knowledge graph with multiple node labels', 'custom embedding generation for multi-label Neo4j graph Langchain']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but it cannot find this name. This could be due to a typo in the class name (likely 'BaseOutputParser' instead of 'BaseOuputParser'), or the class might not exist in the specified version of the langchain library (0.1.7). To resolve this, I need to verify the correct class name and check if it exists in langchain version 0.1.7. Additionally, I should look for any changes in the API or import paths in that version. Searching for the correct usage of output parsers in langchain 0.1.7 and any related documentation or changelogs will help clarify the issue.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain 0.1.7 BaseOutputParser', 'langchain schema BaseOutputParser usage', 'langchain output parser example 0.1.7', 'langchain changelog 0.1.7 output parser', 'langchain BaseOutputParser typo BaseOuputParser', 'langchain 0.1.7 documentation output parser']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class definition, and the changelog or documentation for langchain 0.1.7 will help. Also, checking if the class exists in that version or if the user needs to upgrade langchain to a newer version.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain 0.1.7 BaseOutputParser availability', 'langchain changelog 0.1.7 to latest', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain version 0.1.7 schema module classes']\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:22:00.655181]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: Am trying to create vector stores on top of my existing KG using from_existing_graph, (followed tomaz and Saurav Joshi neo4j blog posts) - this method is allowing me to create embedding/vector index only for single label due to which am unable to get desired results while asking NLQ (I am assuming though).\n", - "below code is able to answer, the age and location of Oliver but not what he directed,\n", - "i believe this is due to from_existing_graph has only to pass single label and its corresponding properties as option for generating embeddings and vector index\n", - "Any ideas, how to achieve this?\n", - "import os\n", - "import re\n", - "from langchain.vectorstores.neo4j_vector import Neo4jVector\n", - "# from langchain.document_loaders import WikipediaLoader\n", - "from langchain_openai import OpenAIEmbeddings\n", - "# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter\n", - "from langchain.graphs import Neo4jGraph\n", - "import openai\n", - "# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"sk-xx\"\n", - "url = \"neo4j+s://xxxx.databases.neo4j.io\"\n", - "username = \"neo4j\"\n", - "password = \"mypassword\"\n", - "existing_graph = Neo4jVector.from_existing_graph(\n", - " embedding=OpenAIEmbeddings(),\n", - " url=url,\n", - " username=username,\n", - " password=password,\n", - " index_name=\"person\",\n", - " node_label=\"Person\",\n", - " text_node_properties=[\"name\", \"age\", \"location\"],\n", - " embedding_node_property=\"embedding\",\n", - ")\n", - "\n", - "from langchain.chat_models import ChatOpenAI\n", - "from langchain.chains import GraphCypherQAChain\n", - "from langchain.graphs import Neo4jGraph\n", - "\n", - "graph = Neo4jGraph(\n", - " url=url, username=username, password=password\n", - ")\n", - "\n", - "chain = GraphCypherQAChain.from_llm(\n", - " ChatOpenAI(temperature=0), graph=graph, verbose=True\n", - ")\n", - "\n", - "query = \"Where does Oliver Stone live?\"\n", - "#query = \"Name some films directed by Oliver Stone?\" \n", - "\n", - "graph_result = chain.invoke(query)\n", - "\n", - "vector_results = existing_graph.similarity_search(query, k=1)\n", - "for i, res in enumerate(vector_results):\n", - " print(res.page_content)\n", - " if i != len(vector_results)-1:\n", - " print()\n", - "vector_result = vector_results[0].page_content\n", - "\n", - "# Construct prompt for OpenAI\n", - "final_prompt = f\"\"\"You are a helpful question-answering agent. Your task is to analyze\n", - "and synthesize information from two sources: the top result from a similarity search\n", - "(unstructured information) and relevant data from a graph database (structured information).\n", - "Given the user's query: {query}, provide a meaningful and efficient answer based\n", - "on the insights derived from the following data:\n", - "\n", - "Unstructured information: {vector_result}.\n", - "Structured information: {graph_result} \"\"\"\n", - "\n", - "\n", - "from openai import OpenAI\n", - "client = OpenAI(\n", - " # This is the default and can be omitted\n", - " api_key=os.environ.get(\"OPENAI_API_KEY\"),\n", - ")\n", - "\n", - "chat_completion = client.chat.completions.create(messages=[{\"role\": \"user\",\"content\": final_prompt, }],model=\"gpt-3.5-turbo\",)\n", - "\n", - "answer = chat_completion.choices[0].message.content.strip()\n", - "print(answer)\n", - "\n", - "Any help would be highly appreicated?\n", - "here is my schema:\n", - "Node properties are the following:\n", - "Person {name: STRING, embedding: LIST, age: INTEGER, location: STRING},Actor {name: STRING, embedding: LIST},Movie {title: STRING},Director {name: STRING, embedding: LIST, age: INTEGER, location: STRING}\n", - "Relationship properties are the following:\n", - "ACTED_IN {role: STRING}\n", - "The relationships are the following:\n", - "(:Person)-[:ACTED_IN]->(:Movie),(:Person)-[:DIRECTED]->(:Movie),(:Actor)-[:ACTED_IN]->(:Movie),(:Director)-[:DIRECTED]->(:Movie)\n", - "\n", - "Cypher used to create:\n", - "CREATE (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'});\n", - "MATCH (n:Person {name: 'Oliver Stone'}) SET n.age = 30, n.location = \"New York\" RETURN n\n", - "\n", - "\n", - "Reasoning: Let's think step by step in order to The user is trying to create a vector store on top of an existing Neo4j knowledge graph using the `from_existing_graph` method from Langchain's Neo4jVector class. However, this method currently supports creating embeddings and vector indexes only for a single node label at a time, which limits the ability to answer questions involving multiple labels or relationships, such as \"What did Oliver Stone direct?\" The user wants to know how to extend or modify their approach to generate embeddings and vector indexes that cover multiple node labels and their relationships, so that natural language queries involving those relationships can be answered effectively.\n", - "\n", - "To address this, I need to find information on:\n", - "- Whether `from_existing_graph` supports multiple labels or how to work around this limitation.\n", - "- How to create embeddings for multiple node labels or combined properties in Neo4j using Langchain or other tools.\n", - "- Best practices or examples for building vector stores over heterogeneous graphs with multiple node types and relationships.\n", - "- How to incorporate relationship information (e.g., directed movies) into embeddings or vector indexes.\n", - "- Alternative approaches or custom implementations to generate embeddings for multiple labels and relationships in Neo4j.\n", - "\n", - "This will help the user understand how to extend their current setup to handle multi-label graphs and improve NLQ results.\n", - "Search Queries: ['langchain Neo4jVector from_existing_graph multiple labels support', 'create vector store embeddings for multiple node labels Neo4j', 'embedding multiple node types and relationships Neo4j Langchain', 'how to index heterogeneous graph with multiple labels in Neo4j vector store', 'best practices for vector embeddings on knowledge graphs with multiple node labels', 'Neo4j vector search embedding relationships and multiple labels', 'Langchain Neo4jVector create embeddings for multiple labels and properties', 'natural language querying knowledge graph multiple labels embeddings', 'examples of vector stores on Neo4j knowledge graph with multiple node labels', 'custom embedding generation for multi-label Neo4j graph Langchain']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but it cannot find this name. This could be due to a typo in the class name (likely 'BaseOutputParser' instead of 'BaseOuputParser'), or the class might not exist in the specified version of the langchain library (0.1.7). To resolve this, I need to verify the correct class name and check if it exists in langchain version 0.1.7. Additionally, I should look for any changes in the API or import paths in that version. Searching for the correct usage of output parsers in langchain 0.1.7 and any related documentation or changelogs will help clarify the issue.\n", - "Search Queries: ['langchain BaseOutputParser import error', 'langchain 0.1.7 BaseOutputParser', 'langchain schema BaseOutputParser usage', 'langchain output parser example 0.1.7', 'langchain changelog 0.1.7 output parser', 'langchain BaseOutputParser typo BaseOuputParser', 'langchain 0.1.7 documentation output parser']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class definition, and the changelog or documentation for langchain 0.1.7 will help. Also, checking if the class exists in that version or if the user needs to upgrade langchain to a newer version.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain 0.1.7 BaseOutputParser availability', 'langchain changelog 0.1.7 to latest', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain version 0.1.7 schema module classes']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "The instruction should include a high stakes scenario in which the LM must solve the task!\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "Using a randomly generated configuration for our grounded proposer.\n", - "Selected tip: simple\n", - "task_demos Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class definition, and the changelog or documentation for langchain 0.1.7 will help. Also, checking if the class exists in that version or if the user needs to upgrade langchain to a newer version.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain 0.1.7 BaseOutputParser availability', 'langchain changelog 0.1.7 to latest', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain version 0.1.7 schema module classes']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory in Node.js\n", - "- How to extract text from a PDF buffer in Node.js\n", - "- Examples or best practices combining these two steps\n", - "- Possibly if there are any Langchain Node.js community solutions or recommended approaches for PDF loading from GCS\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer in Node.js', 'google cloud storage download pdf as buffer nodejs', 'pdf text extraction libraries Node.js', 'langchain nodejs load pdf from gcs', 'best way to read pdf content from buffer nodejs']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class definition, and the changelog or documentation for langchain 0.1.7 will help. Also, checking if the class exists in that version or if the user needs to upgrade langchain to a newer version.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import error', 'langchain 0.1.7 BaseOutputParser availability', 'langchain changelog 0.1.7 BaseOutputParser', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain version compatibility BaseOutputParser']\n", - "\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 2: You are an expert software engineer and technical researcher specializing in debugging and resolving issues related to programming libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a technical question involving code errors, library usage, or functionality issues, generate a diverse and comprehensive set of targeted search queries that will help gather relevant information from search engines. Your queries should explore different angles including error messages, version compatibility, alternative methods, best practices, and relevant documentation. Aim to cover both broad and specific aspects to ensure a thorough understanding that will enable accurate, context-aware troubleshooting and solution recommendations.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 3: You are tasked with solving complex and high-stakes technical programming questions related to code errors and functionality issues involving advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a detailed problem description that often includes code snippets and error messages, your goal is to generate a diverse and comprehensive set of targeted search queries. These queries should enable thorough exploration and retrieval of relevant information from search engines to precisely diagnose and resolve the user's issue. When formulating these queries, consider multiple angles and possible interpretations of the problem to ensure coverage of all potential causes and solutions. Your search queries should be crafted to maximize the chances of finding the most accurate, up-to-date, and context-aware troubleshooting guidance, especially in scenarios where the problem involves debugging code integrations, handling version compatibility, or processing complex data types like PDFs in memory.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 4: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 5: Given a technical programming question that involves debugging code errors, improving functionality, or integrating libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage, generate a comprehensive set of targeted search queries. These queries should be designed to explore multiple facets of the problem, including error diagnosis, usage of specific classes or methods, version compatibility, best practices for data processing workflows, and integration strategies for vector stores, graph databases, or language models. Your search queries should aim to cover both general and specific angles of the question to ensure thorough research and effective troubleshooting. Include queries that address potential typos, library limitations, alternative approaches, and relevant examples or tutorials. Ensure the queries are clear, technical, and focused on extracting actionable insights to resolve the user’s programming challenge.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 6: Given a detailed technical programming question involving debugging, code functionality, or integration issues—especially those related to libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of targeted search queries that will help gather relevant information from search engines. Your queries should cover multiple angles including troubleshooting specific error messages, best practices, alternative approaches, usage examples, and extensions or workarounds. Aim to capture both broad conceptual topics and precise technical details to ensure comprehensive coverage that supports an effective and context-aware solution.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 7: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: 8: Given a technical programming question involving code errors or functionality issues related to specific libraries (such as Langchain, Neo4j, Milvus, or Google Cloud Storage), generate a set of diverse and targeted search queries that would help gather relevant information from search engines. These queries should cover possible causes, error messages, version compatibility, usage examples, and alternative solutions to comprehensively address the problem described in the question.\n", - "\n", - "2025/07/24 21:22:02 INFO dspy.teleprompt.mipro_optimizer_v2: \n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "\n", - "\n", - "\n", - "\u001b[34m[2025-07-24T21:22:02.990787]\u001b[0m\n", - "\n", - "\u001b[31mSystem message:\u001b[0m\n", - "\n", - "Your input fields are:\n", - "1. `dataset_description` (str): A description of the dataset that we are using.\n", - "2. `task_demos` (str): Example inputs/outputs of our module.\n", - "3. `basic_instruction` (str): Basic instruction.\n", - "4. `tip` (str): A suggestion for how to go about generating the new instruction.\n", - "Your output fields are:\n", - "1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.\n", - "All interactions will be structured in the following way, with the appropriate values filled in.\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "{dataset_description}\n", - "\n", - "[[ ## task_demos ## ]]\n", - "{task_demos}\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "{basic_instruction}\n", - "\n", - "[[ ## tip ## ]]\n", - "{tip}\n", - "\n", - "[[ ## proposed_instruction ## ]]\n", - "{proposed_instruction}\n", - "\n", - "[[ ## completed ## ]]\n", - "In adhering to this structure, your objective is: \n", - " Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.\n", - "\n", - "\n", - "\u001b[31mUser message:\u001b[0m\n", - "\n", - "[[ ## dataset_description ## ]]\n", - "The dataset is centered on technical programming questions addressing code errors and functionality issues, especially involving libraries like Langchain, Neo4j, Milvus, and Google Cloud Storage. It features detailed problem descriptions paired with concise solution nuggets, aimed at debugging and enhancing code related to data processing, vector stores, graph databases, and language model integrations. The examples often include code snippets and error messages to provide context for precise, context-aware troubleshooting guidance.\n", - "\n", - "[[ ## task_demos ## ]]\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class definition, and the changelog or documentation for langchain 0.1.7 will help. Also, checking if the class exists in that version or if the user needs to upgrade langchain to a newer version.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import', 'langchain 0.1.7 BaseOutputParser availability', 'langchain changelog 0.1.7 to latest', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain version 0.1.7 schema module classes']\n", - "\n", - "Question: I Am trying to download a PDF file from a GCS storage bucket and read the content into memory.\n", - "When using Langchain with python, i can just use the GCSDirectoryLoader to read all the files in a bucket and the pdf text.\n", - "Langchain for NodeJs doesnt have GCSDirectoryLoader or a webloader for PDF files.\n", - "When downloading the file, i get a Document with the binary representation as content.\n", - "What is the best way to download pdf content from GCS bucket into memory?\n", - "\n", - "Reasoning: Let's think step by step in order to The user wants to download a PDF file from a Google Cloud Storage (GCS) bucket and read its content into memory using Node.js. They mention that Langchain's Python version has a GCSDirectoryLoader that can read PDFs directly, but the Node.js version lacks this feature. When they download the file, they get a Document with binary content, but they want to extract the text content from the PDF in memory.\n", - "\n", - "To answer this, I need to find the best approach or libraries in Node.js to:\n", - "1. Download a PDF file from a GCS bucket into memory (not just to disk).\n", - "2. Parse or extract text content from the PDF binary data in memory.\n", - "\n", - "This likely involves using the official Google Cloud Storage Node.js client to download the file as a buffer, then using a PDF parsing library (like pdf-parse, pdfjs-dist, or similar) to extract text from the buffer.\n", - "\n", - "I should search for:\n", - "- How to download a file from GCS bucket into memory in Node.js\n", - "- How to extract text from a PDF buffer in Node.js\n", - "- Examples or best practices combining these two steps\n", - "- Possibly if there are any Langchain Node.js community solutions or recommended approaches for PDF loading from GCS\n", - "Search Queries: ['download file from Google Cloud Storage bucket into memory Node.js', 'extract text from PDF buffer Node.js', 'parse PDF from buffer in Node.js', 'google cloud storage download pdf as buffer nodejs', 'pdf text extraction libraries Node.js', 'langchain nodejs load pdf from gcs', 'best way to read pdf content from buffer nodejs']\n", - "\n", - "Question: Hello i am trying to run this following code but i am getting an error;\n", - "from langchain.schema import BaseOuputParser\n", - "\n", - "Error;\n", - "\n", - "ImportError: cannot import name 'BaseOuputParser' from\n", - "'langchain.schema'\n", - "\n", - "My langchain version is ; '0.1.7'\n", - "\n", - "Reasoning: Let's think step by step in order to The error message indicates that the code is trying to import 'BaseOuputParser' from 'langchain.schema', but the import fails. The likely cause is a typo in the class name: 'BaseOuputParser' should probably be 'BaseOutputParser' (missing a 't' in 'Output'). Additionally, the user is using langchain version 0.1.7, which is an early version and may not have this class or may have different module structures. To resolve this, I need to check the correct class name and its availability in langchain version 0.1.7, and also verify the correct import path. Searching for the correct spelling, the class definition, and the changelog or documentation for langchain 0.1.7 will help. Also, checking if the class exists in that version or if the user needs to upgrade langchain to a newer version.\n", - "Search Queries: ['langchain BaseOutputParser class', 'langchain.schema BaseOutputParser import error', 'langchain 0.1.7 BaseOutputParser availability', 'langchain changelog 0.1.7 BaseOutputParser', 'langchain BaseOutputParser typo BaseOuputParser', 'how to import BaseOutputParser in langchain', 'langchain version compatibility BaseOutputParser']\n", - "\n", - "\n", - "\n", - "[[ ## basic_instruction ## ]]\n", - "Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "\n", - "[[ ## tip ## ]]\n", - "Keep the instruction clear and concise.\n", - "\n", - "Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.\n", - "\n", - "\n", - "\u001b[31mResponse:\u001b[0m\n", - "\n", - "\u001b[32m[[ ## proposed_instruction ## ]]\n", - "Given a technical programming question involving code errors or functionality issues related to specific libraries (such as Langchain, Neo4j, Milvus, or Google Cloud Storage), generate a set of diverse and targeted search queries that would help gather relevant information from search engines. These queries should cover possible causes, error messages, version compatibility, usage examples, and alternative solutions to comprehensively address the problem described in the question.\n", - "\n", - "[[ ## completed ## ]]\u001b[0m\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "PROPOSED INSTRUCTION: Given a technical programming question involving code errors or functionality issues related to specific libraries (such as Langchain, Neo4j, Milvus, or Google Cloud Storage), generate a set of diverse and targeted search queries that would help gather relevant information from search engines. These queries should cover possible causes, error messages, version compatibility, usage examples, and alternative solutions to comprehensively address the problem described in the question.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/07/24 21:22:03 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==\n", - "2025/07/24 21:22:03 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.\n", - "\n", - "2025/07/24 21:22:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 1 / 27 - Full Evaluation of Default Program ==\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/optuna/samplers/_tpe/sampler.py:295: ExperimentalWarning: ``multivariate`` option is an experimental feature. The interface can change in the future.\n", - " warnings.warn(\n", - "2025/07/24 21:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 27 =====\n", - "2025/07/24 21:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mBest full score so far!\u001b[0m Score: 79.06\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.06 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 17'].\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.06\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 27 =====\n", - "2025/07/24 21:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question that involves debugging code errors, improving functionality, or integrating libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage, generate a comprehensive set of targeted search queries. These queries should be designed to explore multiple facets of the problem, including error diagnosis, usage of specific classes or methods, version compatibility, best practices for data processing workflows, and integration strategies for vector stores, graph databases, or language models. Your search queries should aim to cover both general and specific angles of the question to ensure thorough research and effective troubleshooting. Include queries that address potential typos, library limitations, alternative approaches, and relevant examples or tutorials. Ensure the queries are clear, technical, and focused on extracting actionable insights to resolve the user’s programming challenge.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:23:19 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:23:19 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.06\n", - "2025/07/24 21:23:19 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:23:19 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 27 =====\n", - "2025/07/24 21:23:19 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues related to specific libraries (such as Langchain, Neo4j, Milvus, or Google Cloud Storage), generate a set of diverse and targeted search queries that would help gather relevant information from search engines. These queries should cover possible causes, error messages, version compatibility, usage examples, and alternative solutions to comprehensively address the problem described in the question.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mBest full score so far!\u001b[0m Score: 79.46\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.46 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 1'].\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46]\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 27 =====\n", - "2025/07/24 21:23:46 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert software engineer and technical researcher specializing in debugging and resolving issues related to programming libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a technical question involving code errors, library usage, or functionality issues, generate a diverse and comprehensive set of targeted search queries that will help gather relevant information from search engines. Your queries should explore different angles including error messages, version compatibility, alternative methods, best practices, and relevant documentation. Aim to cover both broad and specific aspects to ensure a thorough understanding that will enable accurate, context-aware troubleshooting and solution recommendations.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:08 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.38 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 12'].\n", - "2025/07/24 21:24:08 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38]\n", - "2025/07/24 21:24:08 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "2025/07/24 21:24:08 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:24:08 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 27 =====\n", - "2025/07/24 21:24:08 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question that involves debugging code errors, improving functionality, or integrating libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage, generate a comprehensive set of targeted search queries. These queries should be designed to explore multiple facets of the problem, including error diagnosis, usage of specific classes or methods, version compatibility, best practices for data processing workflows, and integration strategies for vector stores, graph databases, or language models. Your search queries should aim to cover both general and specific angles of the question to ensure thorough research and effective troubleshooting. Include queries that address potential typos, library limitations, alternative approaches, and relevant examples or tutorials. Ensure the queries are clear, technical, and focused on extracting actionable insights to resolve the user’s programming challenge.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:32 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:32 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:32 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 27 =====\n", - "2025/07/24 21:24:32 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.71 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 16'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:57 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:24:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "2025/07/24 21:24:57 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:24:57 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 27 =====\n", - "2025/07/24 21:24:57 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:25:31 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 73.39 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 13'].\n", - "2025/07/24 21:25:31 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39]\n", - "2025/07/24 21:25:31 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "2025/07/24 21:25:31 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:25:31 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 27 =====\n", - "2025/07/24 21:25:31 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Write search queries to gather information from a search engine that will help answer the question.\n", - "Consider both exploration and result diversity to capture multiple interpretations and facets of a query.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:25:56 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:25:56 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "2025/07/24 21:25:56 INFO dspy.teleprompt.mipro_optimizer_v2: ========================\n", - "\n", - "\n", - "2025/07/24 21:25:56 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 27 =====\n", - "2025/07/24 21:25:56 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues related to specific libraries (such as Langchain, Neo4j, Milvus, or Google Cloud Storage), generate a set of diverse and targeted search queries that would help gather relevant information from search engines. These queries should cover possible causes, error messages, version compatibility, usage examples, and alternative solutions to comprehensively address the problem described in the question.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:26:15 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:26:15 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "2025/07/24 21:26:15 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:26:15 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 27 =====\n", - "2025/07/24 21:26:15 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:26:44 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:26:44 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:26:44 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:26:44 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 12 / 27 =====\n", - "2025/07/24 21:26:44 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:08 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:08 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:08 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:08 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 27 =====\n", - "2025/07/24 21:27:08 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues related to specific libraries (such as Langchain, Neo4j, Milvus, or Google Cloud Storage), generate a set of diverse and targeted search queries that would help gather relevant information from search engines. These queries should cover possible causes, error messages, version compatibility, usage examples, and alternative solutions to comprehensively address the problem described in the question.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:36 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:36 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:36 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:27:36 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 14 / 27 =====\n", - "2025/07/24 21:27:36 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a detailed technical programming question involving debugging, code functionality, or integration issues—especially those related to libraries such as Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of targeted search queries that will help gather relevant information from search engines. Your queries should cover multiple angles including troubleshooting specific error messages, best practices, alternative approaches, usage examples, and extensions or workarounds. Aim to capture both broad conceptual topics and precise technical details to ensure comprehensive coverage that supports an effective and context-aware solution.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:56 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.37 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 14'].\n", - "2025/07/24 21:27:56 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:56 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 79.46\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:27:56 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:27:56 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 15 / 27 =====\n", - "2025/07/24 21:27:56 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mBest full score so far!\u001b[0m Score: 81.96\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.96 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 8'].\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96]\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 16 / 27 =====\n", - "2025/07/24 21:28:17 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:28:40 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:28:40 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:28:40 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:28:40 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 17 / 27 =====\n", - "2025/07/24 21:28:40 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.03 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 4'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03]\n", - "2025/07/24 21:29:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:29:02 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 18 / 27 =====\n", - "2025/07/24 21:29:02 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:25 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:25 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:25 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:29:25 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 27 =====\n", - "2025/07/24 21:29:25 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.3 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 7'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:29:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3]\n", - "2025/07/24 21:29:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:29:46 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:29:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 20 / 27 =====\n", - "2025/07/24 21:29:46 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are tasked with solving complex and high-stakes technical programming questions related to code errors and functionality issues involving advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a detailed problem description that often includes code snippets and error messages, your goal is to generate a diverse and comprehensive set of targeted search queries. These queries should enable thorough exploration and retrieval of relevant information from search engines to precisely diagnose and resolve the user's issue. When formulating these queries, consider multiple angles and possible interpretations of the problem to ensure coverage of all potential causes and solutions. Your search queries should be crafted to maximize the chances of finding the most accurate, up-to-date, and context-aware troubleshooting guidance, especially in scenarios where the problem involves debugging code integrations, handling version compatibility, or processing complex data types like PDFs in memory.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:30:09 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36]\n", - "2025/07/24 21:30:09 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:30:09 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:30:09 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 21 / 27 =====\n", - "2025/07/24 21:30:09 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:30:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.64 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 9'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:30:32 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64]\n", - "2025/07/24 21:30:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:30:32 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:30:32 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 22 / 27 =====\n", - "2025/07/24 21:30:32 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:30:55 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64, 76.68]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:30:55 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "2025/07/24 21:30:55 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:30:55 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 23 / 27 =====\n", - "2025/07/24 21:30:55 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: Given a technical programming question involving code errors or functionality issues—often with libraries like Langchain, Neo4j, Milvus, or Google Cloud Storage—generate a diverse set of well-crafted search queries that will help gather relevant information to resolve the problem. Your queries should cover multiple facets of the issue, including error messages, version compatibility, usage examples, best practices, and alternative approaches. Aim to explore both direct solutions and related contextual knowledge to ensure comprehensive coverage. Include queries about specific functions, classes, libraries, integration methods, and known limitations or updates. Tailor the queries to the programming language and environment specified in the question, and focus on extracting actionable insights for debugging or enhancing the code.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00 151 from InstructorEmbedding import INSTRUCTOR\\n 153 self.client = INSTRUCTOR(\\n 154 self.model_name, cache_folder=self.cache_folder, **self.model_kwargs\\n 155 )\\n\\nFile /opt/conda/lib/python3.11/site-packages/InstructorEmbedding/__init__.py:1\\n----> 1 from .instructor import *\\n\\nFile /opt/conda/lib/python3.11/site-packages/InstructorEmbedding/instructor.py:9\\n 8 from torch import Tensor, device\\n----> 9 from sentence_transformers import SentenceTransformer\\n 10 from sentence_transformers.models import Transformer\\n\\nModuleNotFoundError: No module named \\'sentence_transformers\\'\\n\\nThe above exception was the direct cause of the following exception:\\n\\nImportError Traceback (most recent call last)\\nCell In[2], line 10\\n 4 DEVICE = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\\n 6 #loader = PyPDFDirectoryLoader(\"aircraft_pdfs\")\\n 7 #docs = loader.load()\\n 8 #print(len(docs)) # length of all pages together\\n---> 10 embedding = HuggingFaceInstructEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\", model_kwargs={\"device\": DEVICE})\\n\\nFile /opt/conda/lib/python3.11/site-packages/langchain_community/embeddings/huggingface.py:157, in HuggingFaceInstructEmbeddings.__init__(self, **kwargs)\\n 153 self.client = INSTRUCTOR(\\n 154 self.model_name, cache_folder=self.cache_folder, **self.model_kwargs\\n 155 )\\n 156 except ImportError as e:\\n--> 157 raise ImportError(\"Dependencies for InstructorEmbedding not found.\") from e\\n\\nImportError: Dependencies for InstructorEmbedding not found.\\n\\nhere is the output of pip freeze\\ntransformers==4.37.2\\ntorch==2.2.0\\nlangchain==0.1.6\\nInstructorEmbedding==1.0.1\\n...\\n\\n', 'dataset_ids': ['llama_index/docs/docs/examples/cookbooks/oreilly_course_cookbooks/Module-8/Advanced_RAG_with_LlamaParse.ipynb_44656_49785', 'langchain/cookbook/sql_db_qa.mdx_42315_44989', 'llama_index/docs/docs/examples/cookbooks/oreilly_course_cookbooks/Module-8/Advanced_RAG_with_LlamaParse.ipynb_53669_59467', 'llama_index/docs/docs/examples/embeddings/huggingface.ipynb_0_7156', 'langchain/libs/community/langchain_community/embeddings/huggingface.py_0_9358', 'langchain/libs/partners/huggingface/langchain_huggingface/embeddings/huggingface.py_0_3995', 'langchain/docs/docs/integrations/text_embedding/huggingfacehub.ipynb_0_5289', 'langchain/docs/docs/integrations/vectorstores/vdms.ipynb_0_7081', 'langchain/libs/community/langchain_community/embeddings/huggingface.py_9361_17132', 'llama_index/llama-index-integrations/embeddings/llama-index-embeddings-instructor/pyproject.toml_0_1545'], 'nugget_data': [{'nugget_id': '77990896_nugget_0', 'text': \"The ImportError is due to a missing 'sentence-transformers' module.\", 'relevant_corpus_ids': ['llama_index/docs/docs/examples/cookbooks/oreilly_course_cookbooks/Module-8/Advanced_RAG_with_LlamaParse.ipynb_44656_49785', 'langchain/cookbook/sql_db_qa.mdx_42315_44989', 'llama_index/docs/docs/examples/cookbooks/oreilly_course_cookbooks/Module-8/Advanced_RAG_with_LlamaParse.ipynb_53669_59467', 'llama_index/docs/docs/examples/embeddings/huggingface.ipynb_0_7156', 'langchain/libs/community/langchain_community/embeddings/huggingface.py_0_9358', 'langchain/libs/partners/huggingface/langchain_huggingface/embeddings/huggingface.py_0_3995', 'langchain/docs/docs/integrations/text_embedding/huggingfacehub.ipynb_0_5289', 'langchain/docs/docs/integrations/vectorstores/vdms.ipynb_0_7081', 'langchain/libs/community/langchain_community/embeddings/huggingface.py_9361_17132']}, {'nugget_id': '77990896_nugget_1', 'text': \"Installing 'sentence-transformers' version 2.2.2 is necessary to resolve the ImportError and avoid compatibility issues.\", 'relevant_corpus_ids': ['langchain/cookbook/sql_db_qa.mdx_42315_44989', 'llama_index/docs/docs/examples/cookbooks/oreilly_course_cookbooks/Module-8/Advanced_RAG_with_LlamaParse.ipynb_53669_59467', 'llama_index/docs/docs/examples/embeddings/huggingface.ipynb_0_7156', 'llama_index/llama-index-integrations/embeddings/llama-index-embeddings-instructor/pyproject.toml_0_1545']}]}) (input_keys={'question'}): \n", - "Weaviate v1.31.5 makes use of a high-speed gRPC API as well as a REST API.\n", - "Unfortunately, the gRPC health check against Weaviate could not be completed.\n", - "\n", - "This error could be due to one of several reasons:\n", - "- The gRPC traffic at the specified port is blocked by a firewall.\n", - "- gRPC is not enabled or incorrectly configured on the server or the client.\n", - " - Please check that the server address and port (grpc-lmmeto5urgkzhn7vf7bnxa.c0.us-east1.gcp.weaviate.cloud:443) are correct.\n", - "- your connection is unstable or has a high latency. In this case you can:\n", - " - increase init-timeout in `weaviate.connect_to_local(additional_config=wvc.init.AdditionalConfig(timeout=wvc.init.Timeout(init=X)))`\n", - " - disable startup checks by connecting using `skip_init_checks=True`\n", - ". Set `provide_traceback=True` for traceback.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Average Metric: 5.64 / 8 (70.5%): 81%|████████▏ | 13/16 [00:18<00:02, 1.01it/s]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 3\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[92mNugget 3: Covered\u001b[0m\n", - "\u001b[96mCoverage@100: 3/3 = 1.00\u001b[0m\n", - "Average Metric: 6.64 / 9 (73.8%): 88%|████████▊ | 14/16 [00:23<00:04, 2.01s/it]\u001b[96m Returning 100 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 6\n", - "Covered nuggets: 5\n", - "\u001b[92mNugget 1: Covered\u001b[0m\n", - "\u001b[92mNugget 2: Covered\u001b[0m\n", - "\u001b[91mNugget 3: Not covered\u001b[0m\n", - "\u001b[92mNugget 4: Covered\u001b[0m\n", - "\u001b[92mNugget 5: Covered\u001b[0m\n", - "... and 1 more nuggets\n", - "\u001b[96mCoverage@100: 5/6 = 0.83\u001b[0m\n", - "Average Metric: 7.48 / 10 (74.8%): 94%|█████████▍| 15/16 [00:25<00:02, 2.22s/it]\u001b[96m Returning 150 Sources!\u001b[0m\n", - "\u001b[96mCoverage@100 evaluation:\u001b[0m\n", - "Total nuggets: 3\n", - "Covered nuggets: 0\n", - "\u001b[91mNugget 1: Not covered\u001b[0m\n", - "\u001b[91mNugget 2: Not covered\u001b[0m\n", - "\u001b[91mNugget 3: Not covered\u001b[0m\n", - "\u001b[96mCoverage@100: 0/3 = 0.00\u001b[0m\n", - "Average Metric: 7.48 / 11 (68.0%): 100%|██████████| 16/16 [00:26<00:00, 1.63s/it]" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2025/07/24 21:31:21 INFO dspy.evaluate.evaluate: Average Metric: 7.476190476190475 / 16 (46.7%)\n", - "2025/07/24 21:31:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 46.73 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 8'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:31:21 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64, 76.68, 46.73]\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:31:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 81.96\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:31:21 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:31:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 24 / 27 =====\n", - "2025/07/24 21:31:21 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mBest full score so far!\u001b[0m Score: 83.53\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.53 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 4'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64, 76.68, 46.73, 83.53]\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 83.53\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 25 / 27 =====\n", - "2025/07/24 21:31:42 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:04 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 76.86 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 4'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:04 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64, 76.68, 46.73, 83.53, 76.86]\n", - "2025/07/24 21:32:04 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 83.53\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:04 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:32:04 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 26 / 27 =====\n", - "2025/07/24 21:32:04 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert developer tasked with solving high-stakes, complex programming issues involving code errors and functionality problems in advanced libraries such as Langchain, Neo4j, Milvus, and Google Cloud Storage. Given a technical question describing a specific error or coding challenge, your job is to generate a diverse and comprehensive set of targeted search queries. These queries should explore multiple angles of the problem, including potential typos, version compatibility, library-specific usage, alternative methods, and best practices, to gather precise and relevant information from search engines. Your queries must be designed to uncover detailed troubleshooting steps, code examples, and documentation that will enable you to provide an accurate and effective solution to the user’s problem.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:25 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:25 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 27 / 27 =====\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:25 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: \u001b[92mBest full score so far!\u001b[0m Score: 86.86\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 86.86 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 11'].\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64, 76.68, 46.73, 83.53, 76.86, 77.28, 86.86]\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 86.86\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 28 / 27 =====\n", - "2025/07/24 21:32:52 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...\n", - "\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Predictor 0\n", - "i: You are an expert technical researcher tasked with resolving complex programming issues involving Langchain, Neo4j, Milvus, and Google Cloud Storage integrations. Given a detailed programming question that includes code snippets, error messages, and context about vector stores, embeddings, graph databases, or library usage, generate a diverse and comprehensive set of precise search queries. These queries should be designed to thoroughly explore the problem space, uncover relevant documentation, examples, best practices, and potential workarounds. Your goal is to enable a high-stakes debugging or enhancement scenario where the user must quickly find actionable solutions to fix errors, extend functionality, or optimize code. Ensure the queries cover multiple angles including version compatibility, API usage, multi-label graph embeddings, relationship indexing, and common pitfalls to maximize the breadth and depth of search results.\n", - "p: Search Queries:\n", - "\n", - "\n", - " 0%| | 0/16 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:33:49 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.03 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 11'].\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "2025/07/24 21:33:49 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [75.79, 79.06, 75.3, 79.46, 77.38, 75.71, 75.71, 73.39, 63.39, 79.05, 74.26, 79.36, 75.3, 75.37, 81.96, 68.45, 81.03, 66.0, 75.3, 79.36, 79.64, 76.68, 46.73, 83.53, 76.86, 77.28, 86.86, 81.03]\n", - "2025/07/24 21:33:49 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 86.86\n", - "2025/07/24 21:33:49 INFO dspy.teleprompt.mipro_optimizer_v2: =========================\n", - "\n", - "\n", - "2025/07/24 21:33:49 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 86.86!\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "import dspy\n", - "\n", - "optimizer = dspy.MIPROv2(\n", - " metric=metric,\n", - " auto=\"heavy\",\n", - " verbose=True\n", - ")\n", - "\n", - "optimized_query_writer = optimizer.compile(\n", - " query_writer,\n", - " trainset=trainset,\n", - " requires_permission_to_run=False\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "be4528b0", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "MIPRO run is finished!\n" - ] - } - ], - "source": [ - "print(\"MIPRO run is finished!\")" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "ffa1b115", - "metadata": {}, - "outputs": [], - "source": [ - "optimized_query_writer.save(\"mipro_optimizer_query_writer.json\")" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "f5e81ba0", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0%| | 0/20 [00:00\n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n", - "sys:1: ResourceWarning: Unclosed socket \n", - "ResourceWarning: Enable tracemalloc to get the object allocation traceback\n" - ] - }, - { - "data": { - "text/plain": [ - "81.88" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "evaluator(optimized_query_writer, **dspy_evaluator_kwargs)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/pyproject.toml b/pyproject.toml index 6ffcea8..becb729 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -10,6 +10,7 @@ dependencies = [ "dspy==3.*", "hdbscan>=0.8.40", "ipykernel>=6.30.1", + "ir-datasets>=0.5.11", "matplotlib>=3.10.5", "ruff>=0.12.7", "voyageai>=0.3.4", diff --git a/retrieve_dspy/__init__.py b/retrieve_dspy/__init__.py index 855e90e..ef9c099 100644 --- a/retrieve_dspy/__init__.py +++ b/retrieve_dspy/__init__.py @@ -1,27 +1,36 @@ from .retrievers import ( MultiQueryWriter, MultiQueryWriterWithHint, - QueryWriterWithListwiseReranker, MultiQueryWriterWithReranker, - VanillaRAG, + HybridSearch, + HyDE_QueryExpander, + LameR_QueryExpander, + ThinkQE_QueryExpander, + RAGFusion, CrossEncoderReranker, ListwiseReranker, - LayeredReranker, + LayeredBestMatchReranker, + LayeredListwiseReranker, FilteredQueryWriter, SummarizedListwiseReranker, - LoopingQueryWriter, QueryExpander, DecomposeAndExpand, QueryExpanderWithHint, DecomposeAndExpandWithHints, QueryExpanderWithReranker, BestMatchReranker, - QueryDocumentSummarizer + SlidingWindowListwiseReranker, + TopDownPartitioningReranker, + QueryDocumentSummarizer, + SimplifiedBaleenWithCrossEncoder, + QUIPLER, ) from . import utils from . import metrics from . import datasets +from . import clients +from . import benchmark_run __version__ = "0.1.0" @@ -29,14 +38,17 @@ "MultiQueryWriter", "MultiQueryWriterWithHint", "MultiQueryWriterWithReranker", - "QueryWriterWithListwiseReranker", - "VanillaRAG", + "HybridSearch", + "HyDE_QueryExpander", + "LameR_QueryExpander", + "ThinkQE_QueryExpander", + "RAGFusion", "FilteredQueryWriter", "SummarizedListwiseReranker", "CrossEncoderReranker", "ListwiseReranker", - "LayeredReranker", - "LoopingQueryWriter", + "LayeredBestMatchReranker", + "LayeredListwiseReranker", "QueryExpander", "DecomposeAndExpand", "QueryExpanderWithHint", @@ -44,7 +56,13 @@ "QueryExpanderWithReranker", "BestMatchReranker", "QueryDocumentSummarizer", + "SimplifiedBaleenWithCrossEncoder", + "SlidingWindowListwiseReranker", + "TopDownPartitioningReranker", "utils", "metrics", "datasets", + "clients", + "benchmark_run", + "QUIPLER", ] \ No newline at end of file diff --git a/retrieve_dspy/retrievers/query_writer_with_hint.py b/retrieve_dspy/benchmark_run/__init__.py similarity index 100% rename from retrieve_dspy/retrievers/query_writer_with_hint.py rename to retrieve_dspy/benchmark_run/__init__.py diff --git a/retrieve_dspy/benchmark_run/eval-config.yml b/retrieve_dspy/benchmark_run/eval-config.yml new file mode 100644 index 0000000..3ba3d9d --- /dev/null +++ b/retrieve_dspy/benchmark_run/eval-config.yml @@ -0,0 +1,54 @@ +# Evaluation Configuration +dataset: + name: "bright/biology" # Check options in `eval_config.py` + collection_name: "BrightBiology" # Weaviate collection name + target_property_name: "content" # Property to search in + +evaluation: + num_trials: 1 + num_samples: 150 + seed: 42 + num_threads: 1 + +retriever: + type: "TopDownPartitioningReranker" # Check options in `eval_config.py` + + # Common parameters + retrieved_k: 20 + verbose: true + # verbose_signature: true + + # parameters for particular retrievers + layered_best_match_reranker: + return_property_name: "content" + + layered_listwise_reranker: + return_property_name: "content" + + simplified_baleen_with_cross_encoder: + max_hops: 2 + + sliding_window_listwise_reranker: + window_size: 5 + stride: 3 + + top_down_partitioning_reranker: + window_size: 5 + budget: 20 + ranking_depth: 100 + use_thinking: true + +# Metrics to evaluate +metrics: + - name: "success@1" + type: "success" + k: 1 + - name: "recall@5" + type: "recall" + k: 5 + - name: "recall@20" + type: "recall" + k: 20 + - name: "nDCG@10" + type: "nDCG" + k: 10 \ No newline at end of file diff --git a/retrieve_dspy/benchmark_run/eval_config.py b/retrieve_dspy/benchmark_run/eval_config.py new file mode 100644 index 0000000..3b61286 --- /dev/null +++ b/retrieve_dspy/benchmark_run/eval_config.py @@ -0,0 +1,25 @@ +supported_datasets = ( + "beir/fiqa/test", + "beir/nq", + "beir/scifact/test", + "enron", + "lotte/lifestyle/test/forum", + "lotte/lifestyle/test/search", + "lotte/recreation/test/forum", + "lotte/recreation/test/search", + "wixqa", + "bright/biology", + "bright/earth_science", + "bright/economics", + "bright/psychology", + "bright/robotics", +) + +supported_retriever_types = ( + "HybridSearch", + "HyDE", + "LameR", + "ThinkQE", + "SlidingWindowListwiseReranker", + "TopDownPartitioningReranker", +) \ No newline at end of file diff --git a/retrieve_dspy/benchmark_run/eval_utils.py b/retrieve_dspy/benchmark_run/eval_utils.py new file mode 100644 index 0000000..5c21235 --- /dev/null +++ b/retrieve_dspy/benchmark_run/eval_utils.py @@ -0,0 +1,119 @@ +import numpy as np +import dspy +from dspy import Example, Prediction +from typing import List, Tuple, Dict, Callable +import yaml +from retrieve_dspy.clients import get_weaviate_client, get_voyage_client +from retrieve_dspy.datasets.in_memory import in_memory_dataset_loader +from retrieve_dspy.metrics import create_metric + + +def load_config(config_path="./retrieve_dspy/benchmark_run/eval-config.yml"): + """Load configuration from YAML file.""" + with open(config_path, 'r') as f: + return yaml.safe_load(f) + + +def setup_clients(): + """Initialize all required clients.""" + weaviate_client = get_weaviate_client() + voyage_client = get_voyage_client() + + # Only initialize async clients if needed (they're not used by most retrievers) + weaviate_async_client = None + voyage_async_client = None + + return weaviate_client, weaviate_async_client, voyage_client, voyage_async_client + + +def create_metrics_dict(metrics_config): + """Create metrics dictionary from configuration.""" + metrics = {} + + for metric_config in metrics_config: + metric_name = metric_config["name"] + metrics[metric_name] = create_metric( + metric_type=metric_config["type"], + k=metric_config["k"], + verbose=False # Keep individual metrics quiet + ) + + return metrics + + +def load_dataset(dataset_config): + """Load the specified dataset.""" + dataset_name = dataset_config["name"] + + print(f"\033[95mLoading dataset: {dataset_name}\033[0m") + + _, queries = in_memory_dataset_loader(dataset_name=dataset_name) + return queries + + +def print_trial_results(trial, num_trials, primary_score, offline_scores): + """Print results for a single trial.""" + print(f"\nTrial {trial + 1}/{num_trials} Results:") + print(f"Primary score: {primary_score:.3f}") + + for metric_name, score in offline_scores.items(): + print(f"\033[96m{metric_name}\033[0m: \033[92m{score:.3f}\033[0m") + + +def print_final_results(scores, offline_scores_across_trials, metrics): + """Print final aggregated results across all trials.""" + print("\n" + "="*60) + print("PRIMARY METRIC RESULTS ACROSS TRIALS:") + print("="*60) + + scores = np.array(scores) + print(f"Individual scores: {[f'{score:.3f}' for score in scores]}") + print(f"Min score: {scores.min():.3f}") + print(f"Max score: {scores.max():.3f}") + print(f"\033[92mMean score: {scores.mean():.3f}\033[0m") + print(f"Std dev: {scores.std():.3f}") + + print("\n" + "="*60) + print("ALL METRICS RESULTS ACROSS TRIALS:") + print("="*60) + + for metric_name in metrics.keys(): + metric_scores = np.array(offline_scores_across_trials[metric_name]) + print(f"\n\033[96m{metric_name}:\033[0m") + print(f" Individual scores: {[f'{score:.3f}' for score in metric_scores]}") + print(f" Min score: {metric_scores.min():.3f}") + print(f" Max score: {metric_scores.max():.3f}") + print(f" \033[92mMean score: {metric_scores.mean():.3f}\033[0m") + print(f" Std dev: {metric_scores.std():.3f}") + +def get_evaluator( + testset: list[Example], + metric: callable +): + evaluator = dspy.Evaluate( + devset=testset, + metric=metric, + num_threads=1, + display_progress=True, + max_errors=1, + provide_traceback=True + ) + + return evaluator + +def offline_recall_evaluator( + results: List[Tuple[Example, Prediction, float]], + metrics: Dict[str, Callable], +) -> Dict[str, float]: + metric_scores = {name: [] for name in metrics.keys()} + + for example, prediction, original_score in results: + for metric_name, metric_func in metrics.items(): + score = metric_func(example, prediction) + metric_scores[metric_name].append(score) + + avg_scores = {} + for metric_name, scores in metric_scores.items(): + avg_scores[metric_name] = np.mean(scores) if scores else 0.0 + + return avg_scores \ No newline at end of file diff --git a/scripts/populate-db.py b/retrieve_dspy/benchmark_run/populate_db.py similarity index 100% rename from scripts/populate-db.py rename to retrieve_dspy/benchmark_run/populate_db.py diff --git a/retrieve_dspy/benchmark_run/retriever_builder.py b/retrieve_dspy/benchmark_run/retriever_builder.py new file mode 100644 index 0000000..e18e78a --- /dev/null +++ b/retrieve_dspy/benchmark_run/retriever_builder.py @@ -0,0 +1,202 @@ +""" +Builder patterns for different retriever types. +""" +import retrieve_dspy +from retrieve_dspy.clients import get_weaviate_client, get_and_connect_weaviate_async_client, get_voyage_client +from retrieve_dspy.benchmark_run.eval_config import supported_retriever_types + +"""Factory method for building different types of retrievers.""" + +def build_retriever(retriever_config, use_async, dataset_config, lm_config=None): + """ + Build a retriever based on the configuration. + + Args: + retriever_config: Dictionary containing retriever configuration + dataset_config: Dictionary containing dataset configuration + lm_config: Optional language model configuration + + Returns: + Configured retriever instance + """ + retriever_type = retriever_config["type"] + if retriever_type not in supported_retriever_types: + raise ValueError(f"Unsupported retriever type: {retriever_type}") + + # Common parameters + common_params = { + "weaviate_client": get_and_connect_weaviate_async_client() if use_async else get_weaviate_client(), + "collection_name": dataset_config["collection_name"], + "target_property_name": dataset_config["target_property_name"], + "verbose": retriever_config.get("verbose", True), + "retrieved_k": retriever_config.get("retrieved_k"), + } + + # Add verbose_signature if specified + if "verbose_signature" in retriever_config: + common_params["verbose_signature"] = retriever_config["verbose_signature"] + + # Add return_property_name if different from target + if "return_property_name" in dataset_config: + common_params["return_property_name"] = dataset_config["return_property_name"] + + elif retriever_type == "HyDE": + return _build_hyde(common_params, retriever_config) + + elif retriever_type == "LameR": + return _build_lame_r(common_params, retriever_config) + + elif retriever_type == "ThinkQE": + return _build_think_qe(common_params, retriever_config) + + elif retriever_type == "SlidingWindowListwiseReranker": + return _build_sliding_window_listwise_reranker(common_params, retriever_config) + + elif retriever_type == "TopDownPartitioningReranker": + return _build_top_down_partitioning_reranker(common_params, retriever_config) + + elif retriever_type == "RAGFusion": + return _build_rag_fusion(common_params, retriever_config) + + elif retriever_type == "CrossEncoderReranker": + voyage_client = get_voyage_client() + return _build_cross_encoder_reranker(common_params, retriever_config, voyage_client) + + elif retriever_type == "LayeredBestMatchReranker": + voyage_client = get_voyage_client() + return _build_layered_best_match_reranker(common_params, retriever_config, voyage_client) + + elif retriever_type == "LayeredListwiseReranker": + voyage_client = get_voyage_client() + return _build_layered_listwise_reranker(common_params, retriever_config, voyage_client) + + elif retriever_type == "SimplifiedBaleenWithCrossEncoder": + voyage_client = get_voyage_client() + return _build_simplified_baleen_with_cross_encoder(common_params, retriever_config, voyage_client) + + elif retriever_type == "QUIPLER": + voyage_client = get_voyage_client() + return _build_quipler(common_params, retriever_config, voyage_client) + + elif retriever_type == "HybridSearch": + return _build_hybrid_search(common_params, retriever_config) + + else: + raise ValueError(f"Unknown retriever type: {retriever_type}") + +def _build_hyde(common_params, config): + return retrieve_dspy.HyDE_QueryExpander(**common_params) + +def _build_lame_r(common_params, config): + return retrieve_dspy.LameR_QueryExpander(**common_params) + +def _build_think_qe(common_params, config): + return retrieve_dspy.ThinkQE_QueryExpander(**common_params) + +def _build_sliding_window_listwise_reranker(common_params, config): + params = { + **common_params, + "retrieved_k": config.get("retrieved_k", 50), + "window_size": config.get("window_size", 5), + "stride": config.get("stride", 3), + } + return retrieve_dspy.SlidingWindowListwiseReranker(**params) + +def _build_top_down_partitioning_reranker(common_params, config): + return retrieve_dspy.TopDownPartitioningReranker(**common_params) + +def _build_rag_fusion(common_params, config): + params = { + **common_params, + "retrieved_k": config.get("retrieved_k", 50), + "reranked_k": config.get("reranked_k", 20), + } + return retrieve_dspy.RAGFusion(**params) + +def _build_cross_encoder_reranker(common_params, config, voyage_client): + params = { + **common_params, + "reranker_clients": [voyage_client], + "retrieved_k": config.get("retrieved_k", 50), + "reranked_k": config.get("reranked_k", 20), + "reranker_provider": config.get("reranker_provider", "voyage"), + } + return retrieve_dspy.CrossEncoderReranker(**params) + +def _build_layered_best_match_reranker(common_params, config, voyage_client): + params = { + **common_params, + "reranker_clients": [voyage_client], + "retrieved_k": config.get("retrieved_k", 50), + "reranked_N": config.get("reranked_N", 20), + "reranked_M": config.get("reranked_M", 5), + "reranker_provider": config.get("reranker_provider", "voyage"), + } + if "return_property_name" in config: + params["return_property_name"] = config["return_property_name"] + else: + # Default to target_property_name if not specified + params["return_property_name"] = common_params["target_property_name"] + # Add verbose_signature support + if "verbose_signature" in config: + params["verbose_signature"] = config["verbose_signature"] + return retrieve_dspy.LayeredBestMatchReranker(**params) + +def _build_layered_listwise_reranker(common_params, config, voyage_client): + params = { + **common_params, + "reranker_clients": [voyage_client], + "retrieved_k": config.get("retrieved_k", 50), + "reranked_N": config.get("reranked_N", 20), + "reranked_M": config.get("reranked_M", 5), + "reranker_provider": config.get("reranker_provider", "voyage"), + } + if "return_property_name" in config: + params["return_property_name"] = config["return_property_name"] + else: + # Default to target_property_name if not specified + params["return_property_name"] = common_params["target_property_name"] + if "verbose_signature" in config: + params["verbose_signature"] = config["verbose_signature"] + return retrieve_dspy.LayeredListwiseReranker(**params) + +def _build_simplified_baleen_with_cross_encoder(common_params, config, voyage_client): + params = { + **common_params, + "reranker_clients": [voyage_client], + "retrieved_k": config.get("retrieved_k", 10), + "reranked_N": config.get("reranked_N", 20), + "reranker_provider": config.get("reranker_provider", "voyage"), + "voyage_model": config.get("voyage_model", "rerank-2.5"), + "max_hops": config.get("simplified_baleen", {}).get("max_hops", 2), + } + return retrieve_dspy.SimplifiedBaleenWithCrossEncoder(**params) + +def _build_quipler(common_params, config, voyage_client): + params = { + **common_params, + "reranker_clients": [voyage_client], + "retrieved_k": config.get("retrieved_k", 50), + "reranked_k": config.get("reranked_k", 20), + } + + # QUIPLER supports verbose_signature + if "verbose_signature" in config: + params["verbose_signature"] = config["verbose_signature"] + + return retrieve_dspy.QUIPLER(**params) + +def _build_hybrid_search(common_params, config): + params = { + **common_params, + "retrieved_k": config.get("retrieved_k", 100), + } + print(f"Building HybridSearch with params: {params}") + + try: + retriever = retrieve_dspy.HybridSearch(**params) + print(f"Successfully created HybridSearch: {type(retriever)}") + return retriever + except Exception as e: + print(f"Error creating HybridSearch: {e}") + raise \ No newline at end of file diff --git a/retrieve_dspy/benchmark_run/run_eval.py b/retrieve_dspy/benchmark_run/run_eval.py new file mode 100644 index 0000000..20dd31a --- /dev/null +++ b/retrieve_dspy/benchmark_run/run_eval.py @@ -0,0 +1,107 @@ + +from retrieve_dspy.metrics import create_metric +from retrieve_dspy.datasets.in_memory import prepare_random_subset + +from retriever_builder import build_retriever +from retrieve_dspy.benchmark_run.eval_utils import ( + load_config, + create_metrics_dict, + load_dataset, + print_trial_results, + print_final_results, + get_evaluator, + offline_recall_evaluator +) + +def main(): + # Load configuration + config = load_config() + + # Build retriever from config + print(f"Building retriever with config: {config['retriever']}") + rag_pipeline = build_retriever( + retriever_config=config["retriever"], + use_async=config.get("use_async", False), + dataset_config=config["dataset"], + lm_config=config.get("language_models") + ) + + print(f"Successfully created {type(rag_pipeline).__name__} pipeline") + + # Load dataset + queries = load_dataset(config["dataset"]) + + # Create metrics + metrics = create_metrics_dict(config["metrics"]) + + # Primary metric (first one in the list, or recall@1 by default) + primary_metric = create_metric( + metric_type="recall", + k=1, + verbose=True + ) + + # Initialize tracking variables + eval_config = config["evaluation"] + num_trials = eval_config["num_trials"] + scores = [] + offline_scores_across_trials = {metric_name: [] for metric_name in metrics.keys()} + used_qs = None # TODO: Leave this when introducing fine-tuned retrievers + + print(f"Running evaluation with {config['retriever']['type']} retriever") + print(f"Dataset: {config['dataset']['name']} (Collection: {config['dataset']['collection_name']})") + print(f"Trials: {num_trials}, Samples per trial: {eval_config['num_samples']}") + print(f"Retriever config: retrieved_k={config['retriever'].get('retrieved_k', 'N/A')}") + + # Run evaluation trials + for trial in range(num_trials): + print(f"\nRunning trial {trial + 1}/{num_trials}") + + # Prepare test set + testset = prepare_random_subset( + queries=queries, + num_samples=eval_config["num_samples"], + seed=eval_config["seed"], + samples_used_in_training=used_qs, + ) + + # Create evaluator + evaluator = get_evaluator( + testset=testset, + metric=primary_metric, + ) + + # Run evaluation + dspy_evaluator_kwargs = { + "num_threads": eval_config["num_threads"] + } + + evaluator_result = evaluator(rag_pipeline, **dspy_evaluator_kwargs) + primary_score = evaluator_result.score + scores.append(primary_score) + all_results = evaluator_result.results + + # Calculate offline metrics + print("Calculating offline metrics...") + offline_scores = offline_recall_evaluator( + results=all_results, + metrics=metrics + ) + + # Store results + for key, value in offline_scores.items(): + offline_scores_across_trials[key].append(value) + + # Print trial results + print_trial_results(trial, num_trials, primary_score, offline_scores) + + # Optional sleep to avoid rate limits (uncomment if needed) + # print("Sleeping to avoid rate limits...") + # time.sleep(60) + + # Print final results + print_final_results(scores, offline_scores_across_trials, metrics) + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/retrieve_dspy/clients.py b/retrieve_dspy/clients.py new file mode 100644 index 0000000..f33213a --- /dev/null +++ b/retrieve_dspy/clients.py @@ -0,0 +1,32 @@ +import os +import weaviate +import cohere +import voyageai + +from retrieve_dspy.models import RerankerClient + +def get_weaviate_client() -> weaviate.WeaviateClient: + return weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + +async def get_and_connect_weaviate_async_client() -> weaviate.WeaviateAsyncClient: + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + await weaviate_async_client.connect() + return weaviate_async_client + +def get_cohere_client() -> RerankerClient: + return RerankerClient(name="cohere", client=cohere.ClientV2(os.getenv("COHERE_API_KEY"))) + +def get_cohere_async_client() -> RerankerClient: + return RerankerClient(name="cohere", client=cohere.AsyncClientV2(os.getenv("COHERE_API_KEY"))) + +def get_voyage_client() -> RerankerClient: + return RerankerClient(name="voyage", client=voyageai.Client(os.getenv("VOYAGE_API_KEY"))) + +def get_voyage_async_client() -> RerankerClient: + return RerankerClient(name="voyage", client=voyageai.AsyncClient(os.getenv("VOYAGE_API_KEY"))) diff --git a/retrieve_dspy/database/weaviate_database.py b/retrieve_dspy/database/weaviate_database.py index fa8e8e5..f726e30 100644 --- a/retrieve_dspy/database/weaviate_database.py +++ b/retrieve_dspy/database/weaviate_database.py @@ -1,277 +1,101 @@ import asyncio import os -from typing import Literal, Optional +from typing import Optional import weaviate from weaviate.classes.query import Filter, Metrics, MetadataQuery -from weaviate.classes.init import AdditionalConfig, Timeout -from weaviate.outputs.query import QueryReturn -from retrieve_dspy.models import Source, SourceWithContentAndVector, SearchResult +from retrieve_dspy.models import ObjectFromDB -RETURN_FORMATS = ["string", "dict", "rerank", "vectors"] - -# Extend to add `return_properties` def weaviate_search_tool( query: str, collection_name: str, target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, return_property_name: Optional[str] = None, retrieved_k: Optional[int] = 5, - return_score: bool = False, return_vector: bool = False, tag_filter_value: Optional[str] = None, - return_format: Literal["string", "dict", "rerank", "vectors"] = "string" -): - weaviate_client = weaviate.connect_to_weaviate_cloud( - cluster_url=os.getenv("WEAVIATE_URL"), - auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")) - ) - +) -> list[ObjectFromDB]: + if weaviate_client is None: + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) collection = weaviate_client.collections.get(collection_name) - ''' - return_metadata = None - if return_score: - return_metadata = MetadataQuery(score=return_score) - ''' - - if return_property_name is None: - return_property_name = target_property_name ''' + TODO: Add Support for Tag Filtering and Target Vectors with something like `**kwargs` if tag_filter_value: filter = Filter.by_property("tags").contains_any([tag_filter_value]) ''' - - ''' - search_results = collection.query.hybrid( - query=query, - limit=retrieved_k, - return_metadata=return_metadata, - return_properties=return_properties, - include_vector=return_vector - ) - ''' search_results = collection.query.hybrid( query=query, + return_metadata=MetadataQuery(score=True), limit=retrieved_k ) - weaviate_client.close() - - # Build `Source` list of object IDs - object_ids: list[Source] = [] + objects: list[ObjectFromDB] = [] if search_results.objects: - for obj in search_results.objects: - # Instead of UUID, use dataset_id directly - dataset_id = obj.properties.get('dataset_id') - if dataset_id: - object_ids.append(Source(object_id=str(dataset_id))) - - if return_format == "vectors": - sources_with_content_and_vector: list[SourceWithContentAndVector] = [] - for obj in search_results.objects: - sources_with_content_and_vector.append(SourceWithContentAndVector( - object_id=str(obj.uuid), - content=obj.properties[target_property_name], - vector=obj.vector["default"] # update with named vectors + for rank, obj in enumerate(search_results.objects, start=1): + object_id = str(obj.properties.get('dataset_id') or obj.uuid) + content_value = None + if obj.properties and target_property_name in obj.properties: + content_value = obj.properties[target_property_name] + objects.append(ObjectFromDB( + object_id=object_id, + content=str(content_value) if content_value is not None else "", + relevance_rank=rank, + relevance_score=obj.metadata.score, + vector=(obj.vector.get("default") if return_vector and getattr(obj, 'vector', None) else None) )) - return sources_with_content_and_vector, object_ids - - if return_format == "rerank": - search_results_for_rerank: list[SearchResult] = [] - for i, obj in enumerate(search_results.objects): - content = obj.properties[return_property_name] - dataset_id = obj.properties["dataset_id"] - search_results_for_rerank.append(SearchResult( - id=i + 1, - dataset_id=dataset_id, - content=content - )) - - return search_results_for_rerank, object_ids - - elif return_format == "dict": - return _dictify_search_results(search_results, view_properties=[return_property_name]), object_ids - else: - return _stringify_search_results(search_results, view_properties=[return_property_name]), object_ids + return objects async def async_weaviate_search_tool( query: str, collection_name: str, target_property_name: str, + weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None, return_property_name: Optional[str] = None, retrieved_k: Optional[int] = 10, return_score: bool = False, return_vector: bool = False, tag_filter_value: Optional[str] = None, - return_format: Literal["string", "dict", "rerank", "vectors"] = "string" -): - """Async version of search tool with hybrid scores.""" - async_client = weaviate.use_async_with_weaviate_cloud( - cluster_url=os.getenv("WEAVIATE_URL"), - auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), - additional_config=AdditionalConfig( - timeout=Timeout(init=30, query=60, insert=120) # Values in seconds - ) - ) - - await async_client.connect() - - try: - collection = async_client.collections.get(collection_name) - - return_metadata = None - if return_score: - return_metadata = MetadataQuery(score=return_score) - - if return_property_name is None: - return_property_name = target_property_name - return_properties = [return_property_name] - - ''' - if tag_filter_value: - filter = Filter.by_property("tags").contains_any([tag_filter_value]) - ''' - kwargs = dict( - query=query, - limit=retrieved_k, - return_metadata=return_metadata, - return_properties=return_properties, - include_vector=return_vector, - target_vector=target_property_name +) -> list[ObjectFromDB]: + if weaviate_async_client is None: + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), ) - - search_results = await collection.query.hybrid(**kwargs) - - object_ids = [] - if search_results.objects: - for obj in search_results.objects: - object_ids.append(Source( - object_id=str(obj.uuid) - )) - - if return_format == "vectors": - sources_with_content_and_vector: list[SourceWithContentAndVector] = [] - for obj in search_results.objects: - sources_with_content_and_vector.append(SourceWithContentAndVector( - object_id=str(obj.uuid), - content=obj.properties[target_property_name], - vector=obj.vector["default"] # update with named vectors - )) - return sources_with_content_and_vector, object_ids - - if return_format == "rerank": - search_results_for_rerank = [] - for i, obj in enumerate(search_results.objects): - content = "" - if obj.properties and return_property_name in obj.properties: - content = obj.properties[return_property_name] - - # score = obj.metadata.score - - search_results_for_rerank.append(SearchResult( - id=i + 1, - initial_rank=i + 1, - # initial_score=float(score), - content=content - )) - - return search_results_for_rerank, object_ids - - elif return_format == "dict": - return _dictify_search_results(search_results, view_properties=[return_property_name]), object_ids - else: - return _stringify_search_results(search_results, view_properties=[return_property_name]), object_ids - - finally: - await async_client.close() - -def _stringify_search_results(search_results: QueryReturn, view_properties=None) -> str: - """ - Convert Weaviate search results to a readable string format. - - Args: - search_results: The QueryReturn object from Weaviate - view_properties: List of property names to include (None means include nothing) - Can include metadata fields prefixed with underscore - - Returns: - A formatted string representation of the search results - """ - result_str = f"Found {len(search_results.objects)} results:\n\n" - - for i, obj in enumerate(search_results.objects): - result_str += f"Result {i+1}:\n" - - if view_properties: - if obj.properties: - properties_to_show = {k: v for k, v in obj.properties.items() if k in view_properties} - - if properties_to_show: - result_str += "Properties:\n" - for key, value in properties_to_show.items(): - result_str += f" {key}: {value}\n" - - if obj.metadata: - metadata_fields = [] - for attr in dir(obj.metadata): - if attr in view_properties: - value = getattr(obj.metadata, attr) - if value is not None: - metadata_fields.append((attr, value)) - - if metadata_fields: - result_str += "Metadata:\n" - for attr, value in metadata_fields: - result_str += f" {attr}: {value}\n" - - result_str += "\n" - - return result_str - -def _dictify_search_results(search_results: QueryReturn, view_properties=None) -> dict[int, str]: - """ - Convert Weaviate search results to a dictionary with integer keys (1-based). - - Args: - search_results: The QueryReturn object from Weaviate - view_properties: List of property names to include - - Returns: - A dictionary mapping numeric IDs to formatted search result strings - """ - result_dict = {} + await weaviate_async_client.connect() + collection = weaviate_async_client.collections.get(collection_name) + ''' + TODO: Add Support for Tag Filtering and Target Vectors with something like `**kwargs` + if tag_filter_value: + filter = Filter.by_property("tags").contains_any([tag_filter_value]) + ''' - for i, obj in enumerate(search_results.objects): - result_id = i + 1 # 1-based indexing - result_str = f"Result {result_id}:\n" - - if view_properties: - if obj.properties: - properties_to_show = {k: v for k, v in obj.properties.items() if k in view_properties} - - if properties_to_show: - result_str += "Properties:\n" - for key, value in properties_to_show.items(): - result_str += f" {key}: {value}\n" - - if obj.metadata: - metadata_fields = [] - for attr in dir(obj.metadata): - if attr in view_properties: - value = getattr(obj.metadata, attr) - if value is not None: - metadata_fields.append((attr, value)) - - if metadata_fields: - result_str += "Metadata:\n" - for attr, value in metadata_fields: - result_str += f" {attr}: {value}\n" - - result_dict[result_id] = result_str + search_results = await collection.query.hybrid( + query=query, + return_metadata=MetadataQuery(score=True), + limit=retrieved_k + ) - return result_dict + objects: list[ObjectFromDB] = [] + if search_results.objects: + for rank, obj in enumerate(search_results.objects, start=1): + object_id = str(obj.properties.get('dataset_id') or obj.uuid) + content_value = None + if obj.properties and target_property_name in obj.properties: + content_value = obj.properties[target_property_name] + objects.append(ObjectFromDB( + object_id=object_id, + content=str(content_value) if content_value is not None else "", + relevance_rank=rank, + relevance_score=obj.metadata.score, + vector=(obj.vector.get("default") if return_vector and getattr(obj, 'vector', None) else None) + )) + return objects def get_tag_values(collection_name: str) -> list[str]: weaviate_client = weaviate.connect_to_weaviate_cloud( @@ -310,25 +134,31 @@ def get_tag_values(collection_name: str) -> list[str]: async def main(): print("Testing sync search tool...") + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")) + ) sync_results = weaviate_search_tool( + weaviate_client=weaviate_client, query="How do I use Weaviate with Langchain?", collection_name="FreshstackLangchain", target_property_name="docs_text", retrieved_k=10, - return_score=True, return_vector=True, - return_format="vectors" ) print(sync_results) print("Testing async search tool...") + async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) async_results = await async_weaviate_search_tool( + weaviate_async_client=async_client, query="How do I use Weaviate with Langchain?", collection_name="FreshstackLangchain", target_property_name="docs_text", retrieved_k=10, - return_score=True, return_vector=True, - return_format="vectors" ) print(async_results) diff --git a/retrieve_dspy/datasets/in_memory.py b/retrieve_dspy/datasets/in_memory.py index b7924e7..52adbfb 100644 --- a/retrieve_dspy/datasets/in_memory.py +++ b/retrieve_dspy/datasets/in_memory.py @@ -4,67 +4,19 @@ from datasets import load_dataset from dspy import Example - -def create_dspy_examples_from_dataset( - queries: List[Dict], - max_train: int, - max_test: int, - *, - training_samples: Optional[Set[str]] = None, # exact strings to EXCLUDE from test - seed: Optional[int] = None, -) -> Tuple[List[Example], List[Example]]: - """ - Convert dataset queries to DSPy Examples, sample train/test. - Test set will EXCLUDE any question present in `training_samples`. - - Args: - queries: List of query dicts with at least "question" - max_train: Max number of train examples (0/None means all) - max_test: Max number of test examples (0/None means all) - training_samples: Set of question strings previously used for training - (these will be EXCLUDED from test sampling) - seed: Optional RNG seed for reproducibility - - Returns: - (train_examples, test_examples) - """ - if seed is not None: - random.seed(seed) - - examples = [] - for query in queries: - q = query["question"] - ex = Example().with_inputs("question") - ex["question"] = q - - if "dataset_ids" in query: - ex.dataset_ids = query["dataset_ids"] - - if "nugget_data" in query: - ex.nugget_data = query["nugget_data"] - - examples.append(ex) - - # Train sample (no special filtering beyond size) - train_examples = random.sample(examples, min(max_train, len(examples))) if max_train else examples.copy() - - # Candidates for test = everything not selected for train in THIS run - remaining_examples = [ex for ex in examples if ex not in train_examples] - - # Further exclude anything in the external `training_samples` ledger from the TEST pool - if training_samples: - remaining_examples = [ex for ex in remaining_examples if ex["question"] not in training_samples] - - # Test sample from filtered remaining - test_examples = random.sample(remaining_examples, min(max_test, len(remaining_examples))) if max_test else remaining_examples - - return train_examples, test_examples +import ir_datasets def in_memory_dataset_loader(dataset_name: str): if dataset_name == "enron": return _in_memory_dataset_loader_enron() elif dataset_name == "wixqa": return _in_memory_dataset_loader_wixqa() + elif dataset_name.startswith("beir/"): + return _in_memory_dataset_loader_beir(dataset_name) + elif dataset_name.startswith("bright/"): + return _in_memory_dataset_loader_bright(dataset_name) + elif dataset_name.startswith("lotte/"): + return _in_memory_dataset_loader_lotte(dataset_name) elif dataset_name == "freshstack-angular": return _in_memory_dataset_loader_freshstack(subset="angular") elif dataset_name == "freshstack-godot": @@ -77,6 +29,72 @@ def in_memory_dataset_loader(dataset_name: str): return _in_memory_dataset_loader_freshstack(subset="yolo") else: return None + +def _in_memory_dataset_loader_beir(dataset_name: str): + dataset = ir_datasets.load(f"{dataset_name}") + print(f"Loading BEIR dataset: {dataset_name}") + docs, questions = [], [] + for doc in dataset.docs_iter(): + docs.append({ + "title": getattr(doc, "title", ""), + "content": getattr(doc, "text", ""), + "doc_id": getattr(doc, "doc_id", None) + }) + qrels = {} + for qrel in dataset.qrels_iter(): + query_id = qrel.query_id + if query_id not in qrels: + qrels[query_id] = [] + qrels[query_id].append(qrel.doc_id) + for question in dataset.queries_iter(): + questions.append({ + "query_id": question.query_id, + "question": question.text, + "dataset_ids": qrels[question.query_id] + }) + return docs, questions + +def _in_memory_dataset_loader_bright(dataset_name: str): + all_docs = load_dataset("xlangai/BRIGHT", "documents") + split = dataset_name.split("/")[1] + print(f"Loading BRIGHT dataset: {dataset_name}") + docs, questions = [], [] + for doc in all_docs[split]: + docs.append({ + "content": doc["content"], + "dataset_id": doc["id"] + }) + all_questions = load_dataset("xlangai/BRIGHT", "examples") + for question in all_questions[split]: + questions.append({ + "query_id": question["id"], + "question": question["query"], + "dataset_ids": question["gold_ids"] + }) + return docs, questions + +def _in_memory_dataset_loader_lotte(dataset_name: str): + dataset = ir_datasets.load(f"{dataset_name}") + print(f"Loading LOTTE dataset: {dataset_name}") + docs, questions = [], [] + for doc in dataset.docs_iter(): + docs.append({ + "text": getattr(doc, "text", ""), + "doc_id": getattr(doc, "doc_id", None) + }) + qrels = {} + for qrel in dataset.qrels_iter(): + query_id = qrel.query_id + if query_id not in qrels: + qrels[query_id] = [] + qrels[query_id].append(qrel.doc_id) + for question in dataset.queries_iter(): + questions.append({ + "query_id": question.query_id, + "question": question.text, + "dataset_ids": qrels[question.query_id] + }) + return docs, questions def _in_memory_dataset_loader_enron(): emails = _load_dataset_from_hf_hub("weaviate/enron-qa-emails-dasovich-j") @@ -163,34 +181,34 @@ def split_dataset(dataset, train_ratio=0.8, shuffle=True): return train_data, test_data -def load_queries_in_memory( - dataset_name: str, - train_samples: int, - test_samples: int, - *, - training_samples: Optional[Set[str]] = None, # pass your exact-question set here - seed: Optional[int] = None, -): - _, queries = in_memory_dataset_loader(dataset_name) - - ''' - # WHAT!! - # THE FIRST K SAMPLES ON ENRONQA SEEM TO BE EASIER THAN RANDOM SAMPLES!! - first_k_samples = queries[:test_samples] - formatted_samples = [] - for sample in first_k_samples: - formatted_sample = Example().with_inputs("question") - formatted_sample["question"] = sample["question"] - formatted_sample["dataset_ids"] = sample["dataset_ids"] - formatted_samples.append(formatted_sample) - ''' - - trainset, testset = create_dspy_examples_from_dataset( - queries=queries, - max_train=train_samples, - max_test=test_samples, - training_samples=training_samples, - seed=seed, - ) +def prepare_random_subset( + queries: List[Dict], + num_samples: int, + samples_used_in_training: Optional[Set[str]] = None, + seed: Optional[int] = 42, +) -> Tuple[List[Example], List[Example]]: + random.seed(seed) + + examples = [] + for query in queries: + q = query["question"] + ex = Example().with_inputs("question") + ex["question"] = q + + if "dataset_ids" in query: + ex.dataset_ids = query["dataset_ids"] + + if "nugget_data" in query: + ex.nugget_data = query["nugget_data"] + + examples.append(ex) + + # Filter any samples_used_in_training from the examples + if samples_used_in_training: + examples = [ex for ex in examples if ex["question"] not in samples_used_in_training] + + # Sample the desired number of examples + random.shuffle(examples) + examples = examples[:num_samples] - return trainset, testset \ No newline at end of file + return examples \ No newline at end of file diff --git a/retrieve_dspy/datasets/populate_db.py b/retrieve_dspy/datasets/populate_db.py index a5eae3f..d071c8d 100644 --- a/retrieve_dspy/datasets/populate_db.py +++ b/retrieve_dspy/datasets/populate_db.py @@ -47,6 +47,77 @@ def database_loader( end_time = time.time() upload_time = end_time - start_time print(f"Inserted {i + 1} emails into Weaviate... (Time elapsed: {upload_time:.2f} seconds)") + + if dataset_name.startswith("beir/"): + beir_subset = dataset_name.split("beir/")[1] + formatted_beir_name = beir_subset.replace("-", "_").replace("/", "_").lower() + collection_name = f"Beir{formatted_beir_name.capitalize()}" + + if weaviate_client.collections.exists(collection_name): + weaviate_client.collections.delete(collection_name) + + weaviate_client.collections.create( + name=collection_name, + vectorizer_config=wvcc.Configure.Vectorizer.text2vec_weaviate(), + properties=[ + wvcc.Property(name="title", data_type=wvcc.DataType.TEXT), + wvcc.Property(name="content", data_type=wvcc.DataType.TEXT), + wvcc.Property(name="dataset_id", data_type=wvcc.DataType.TEXT, index_searchable=False), + ], + ) + + start_time = time.time() + with weaviate_client.batch.fixed_size(batch_size=100, concurrent_requests=4) as batch: + for i, doc in enumerate(objects): + batch.add_object( + collection=collection_name, + properties={ + "title": doc["title"], + "content": doc["content"], + "dataset_id": str(doc["doc_id"]) + } + ) + if i % 1000 == 999: + print(f"Inserted {i + 1} documents into Weaviate... (Time elapsed: {time.time()-start_time:.2f} seconds)") + + end_time = time.time() + upload_time = end_time - start_time + print(f"Inserted {i + 1} documents into Weaviate... (Time elapsed: {upload_time:.2f} seconds)") + + if dataset_name.startswith("lotte/"): + lotte_subset = dataset_name.split("/")[1] + collection_name = f"Lotte{lotte_subset.capitalize()}" + print(f"Creating collection: {collection_name}") + + if weaviate_client.collections.exists(collection_name): + weaviate_client.collections.delete(collection_name) + + weaviate_client.collections.create( + name=collection_name, + vectorizer_config=wvcc.Configure.Vectorizer.text2vec_weaviate(), + properties=[ + wvcc.Property(name="content", data_type=wvcc.DataType.TEXT), + wvcc.Property(name="dataset_id", data_type=wvcc.DataType.TEXT, index_searchable=False), + ], + ) + + start_time = time.time() + with weaviate_client.batch.fixed_size(batch_size=100, concurrent_requests=4) as batch: + for i, doc in enumerate(objects): + batch.add_object( + collection=collection_name, + properties={ + "content": doc["text"], + "dataset_id": str(doc["doc_id"]) + } + ) + if i % 1000 == 999: + print(f"Inserted {i + 1} documents into Weaviate... (Time elapsed: {time.time()-start_time:.2f} seconds)") + + end_time = time.time() + upload_time = end_time - start_time + print(f"Inserted {i + 1} documents into Weaviate... (Time elapsed: {upload_time:.2f} seconds)") + if dataset_name == "wixqa": if weaviate_client.collections.exists("WixKB"): weaviate_client.collections.delete("WixKB") diff --git a/retrieve_dspy/metrics.py b/retrieve_dspy/metrics.py index cbf419b..77ee897 100644 --- a/retrieve_dspy/metrics.py +++ b/retrieve_dspy/metrics.py @@ -1,30 +1,67 @@ +import time from typing import Callable from dspy import Example, Prediction +import numpy as np + +from retrieve_dspy.models import ObjectFromDB + +def calculate_success_at_k( + target_ids: list[str], + retrieved_objects: list[ObjectFromDB], + k: int, + verbose: bool = False +) -> int: + """Calculate Success@k (Hit Rate@k). + + Args: + target_ids: List of target document IDs (ground truth). + retrieved_objects: List of retrieved document objects. + k: The number of top results to consider. + + Returns: + int: 1 if at least one target_id is found in the top-k retrieved_ids, + otherwise 0. + """ + target_id_set = {str(id) for id in target_ids} + retrieved_ids = [obj.object_id for obj in retrieved_objects] if retrieved_objects else [] + + retrieved_ids_at_k = retrieved_ids[:k] + + if verbose: + print(f"\033[96mTarget IDs: {target_id_set}\033[0m") + print(f"\033[92mRetrieved IDs @{k}: {retrieved_ids_at_k}\033[0m") + + # Success is binary: 1 if any overlap, 0 otherwise + success = int(any(rid in target_id_set for rid in retrieved_ids_at_k)) + + if verbose: + print(f"\033[96mSuccess@{k}: {success}\033[0m") + + return success def calculate_recall_at_k( target_ids: list[str], - retrieved_ids: list[str], + retrieved_objects: list[ObjectFromDB], k: int, - verbose: bool = True + verbose: bool = True, + sleep_time: int = None ): """Calculate traditional recall@k for retrieved documents. Args: target_ids: List of target document IDs (ground truth). - retrieved_ids: List of retrieved document IDs. + retrieved_objects: List of retrieved document objects. k: The number of top results to consider for recall calculation. Returns: float: Recall@k score (0.0 to 1.0) - proportion of relevant docs found in the top k retrieved results. - """ - if not isinstance(target_ids, list): - target_ids = [target_ids] - + """ # Use sets for efficient lookup target_id_set = {str(id) for id in target_ids} - retrieved_ids = [str(id) for id in retrieved_ids] if retrieved_ids else [] + + retrieved_ids = [obj.object_id for obj in retrieved_objects] if retrieved_objects else [] # Consider only the top k retrieved IDs retrieved_ids_at_k = retrieved_ids[:k] @@ -41,20 +78,75 @@ def calculate_recall_at_k( else: if verbose: print(f"\033[91mRetrieved IDs @{k}: {retrieved_ids_at_k}\033[0m") - - recall = found_count / len(target_id_set) if target_id_set else 0 + + # print(f"Length of gold ids: {len(target_id_set)}") + + if len(retrieved_ids_at_k) == 1: + recall = found_count + else: + recall = found_count / len(target_id_set) if verbose: print(f"\033[96mRecall@{k}: {found_count}/{len(target_id_set)} = {recall:.2f}\033[0m") + # hack for rate limiting + if sleep_time: + print(f"Sleeping to avoid rate limits for {sleep_time} seconds...") + time.sleep(sleep_time) + return recall -def calculate_coverage(retrieved_ids: list[str], nugget_data: list[dict], k: int = 1000): +def calculate_nDCG_at_k( + target_ids: list[str], + retrieved_objects: list[ObjectFromDB], + k: int, + verbose: bool = False +) -> float: + """Calculate nDCG@k for retrieved documents with binary relevance. + + Args: + target_ids: List of relevant document IDs + retrieved_ids: List of retrieved document IDs in ranked order + k: Number of top documents to consider + verbose: Whether to print debug information + + Returns: + nDCG@k score (0 to 1) + """ + # convert target_ids to strings + target_id_set = {str(id) for id in target_ids} + retrieved_ids = [str(obj.object_id) for obj in retrieved_objects[:k]] if retrieved_objects else [] + + # Calculate DCG@k - sum of (relevance / log2(position + 1)) + dcg = 0.0 + for i, doc_id in enumerate(retrieved_ids): + if doc_id in target_id_set: + # Position starts at 1, so we use i+2 for the denominator + dcg += 1.0 / np.log2(i + 2) if i > 0 else 1.0 + + # Calculate IDCG@k - best possible DCG if we had perfect ranking + idcg = 0.0 + num_relevant = min(len(target_id_set), k) + for i in range(num_relevant): + idcg += 1.0 / np.log2(i + 2) if i > 0 else 1.0 + + # Calculate nDCG + ndcg = dcg / idcg if idcg > 0 else 0.0 + + if verbose: + print(f"\033[96mTarget IDs: {target_id_set}\033[0m") + print(f"\033[92mRetrieved IDs @{k}: {retrieved_ids}\033[0m") + print(f"\033[93mDCG@{k}: {dcg:.4f}, IDCG@{k}: {idcg:.4f}\033[0m") + print(f"\033[96mnDCG@{k}: {ndcg:.4f}\033[0m") + + return ndcg + +def calculate_coverage(retrieved_objects: list[ObjectFromDB], nugget_data: list[dict], k: int = 1000): """Calculate Coverage@k metric from FreshStack. Measures the proportion of nuggets covered by the top-k retrieved documents. Args: - retrieved_ids: List of retrieved document IDs in ranked order + retrieved_objects: List of retrieved document objects in ranked order nugget_data: List of nugget information, each with 'relevant_corpus_ids' field k: Number of top documents to consider (default: 20) @@ -65,14 +157,14 @@ def calculate_coverage(retrieved_ids: list[str], nugget_data: list[dict], k: int return 0.0 # Convert to strings for consistent comparison - retrieved_ids = [str(id) for id in retrieved_ids[:k]] if retrieved_ids else [] + retrieved_ids = [obj.object_id for obj in retrieved_objects[:k]] if retrieved_objects else [] covered_nuggets = set() nugget_coverage_details = [] for i, nugget in enumerate(nugget_data): nugget_id = nugget.get('id', f'nugget_{i}') - nugget_relevant_ids = [str(id) for id in nugget.get('relevant_corpus_ids', [])] + nugget_relevant_ids = [obj.object_id for obj in nugget.get('relevant_corpus_objects', [])] # Check if any relevant doc for this nugget is in top-k retrieved covered = any(doc_id in retrieved_ids for doc_id in nugget_relevant_ids) @@ -97,7 +189,22 @@ def calculate_coverage(retrieved_ids: list[str], nugget_data: list[dict], k: int return coverage_score -def create_recall_metric(k: int, verbose: bool = True) -> Callable: +def create_success_at_k_metric(k: int, verbose: bool = True) -> Callable: + """ + Create a success@k metric function that wraps the existing calculate_success_at_k function. + """ + + def success_at_k_metric(example: Example, prediction, trace=None) -> float: + return calculate_success_at_k( + target_ids=example.dataset_ids, + retrieved_objects=prediction.sources, + k=k, + verbose=verbose + ) + + return success_at_k_metric + +def create_recall_metric(k: int, verbose: bool = True, sleep_time: int = None) -> Callable: """ Create a recall metric function that wraps the existing calculate_recall function. @@ -112,7 +219,7 @@ def create_recall_metric(k: int, verbose: bool = True) -> Callable: def recall_metric(example: Example, prediction, trace=None) -> float: try: # Extract sources from prediction - retrieved_ids = prediction.sources + retrieved_objects = prediction.sources # Get target IDs from example target_ids = example.dataset_ids @@ -120,9 +227,10 @@ def recall_metric(example: Example, prediction, trace=None) -> float: # Use the existing calculate_recall function recall_score = calculate_recall_at_k( target_ids=target_ids, - retrieved_ids=retrieved_ids, + retrieved_objects=retrieved_objects, k=k, - verbose=verbose + verbose=verbose, + sleep_time=sleep_time ) return recall_score @@ -133,6 +241,32 @@ def recall_metric(example: Example, prediction, trace=None) -> float: return recall_metric +def create_nDCG_metric(k: int, verbose: bool = True) -> Callable: + """ + Create a nDCG metric function that wraps the existing calculate_nDCG function. + """ + + def nDCG_metric(example: Example, prediction, trace=None) -> float: + try: + retrieved_objects = prediction.sources + + target_ids = example.dataset_ids + + nDCG_score = calculate_nDCG_at_k( + target_ids=target_ids, + retrieved_objects=retrieved_objects, + k=k, + verbose=verbose + ) + + return nDCG_score + + except Exception as e: + print(f"Error calculating nDCG: {e}") + return 0.0 + + return nDCG_metric + def create_coverage_metric(k: int = 1000) -> Callable: """ Create a coverage metric function that wraps the existing calculate_coverage function. @@ -146,12 +280,12 @@ def create_coverage_metric(k: int = 1000) -> Callable: def coverage_metric(example: Example, prediction, trace=None) -> float: try: - retrieved_ids = prediction.sources + retrieved_objects = prediction.sources nugget_data = example.nugget_data if hasattr(example, 'nugget_data') else [] coverage_score = calculate_coverage( - retrieved_ids=retrieved_ids, + retrieved_objects=retrieved_objects, nugget_data=nugget_data, k=k ) @@ -172,14 +306,18 @@ def create_metric( Factory function for creating metric functions. Args: - metric_type: Type of metric ("recall", "coverage") + metric_type: Type of metric ("recall", "coverage", "nDCG") **kwargs: Additional arguments for metric configuration Returns: Configured metric function """ - if metric_type == "recall": + if metric_type == "success": + return create_success_at_k_metric(**kwargs) + elif metric_type == "recall": return create_recall_metric(**kwargs) + elif metric_type == "nDCG": + return create_nDCG_metric(**kwargs) elif metric_type == "coverage": return create_coverage_metric(**kwargs) else: @@ -200,17 +338,17 @@ def coverage_metric_with_feedback( pred_trace=None ) -> Prediction: try: - retrieved_ids = prediction.sources + retrieved_objects = prediction.sources nugget_data = example.nugget_data if hasattr(example, 'nugget_data') else [] - retrieved_ids_str = [str(id) for id in retrieved_ids[:k]] if retrieved_ids else [] + retrieved_ids_str = [obj.object_id for obj in retrieved_objects[:k]] if retrieved_objects else [] covered = [] uncovered = [] for nugget in nugget_data: - nugget_relevant_ids = [str(id) for id in nugget.get('relevant_corpus_ids', [])] + nugget_relevant_ids = [obj.object_id for obj in nugget.get('relevant_corpus_objects', [])] nugget_text = nugget.get('text', '') if any(doc_id in retrieved_ids_str for doc_id in nugget_relevant_ids): diff --git a/retrieve_dspy/models.py b/retrieve_dspy/models.py index d002283..4e8816b 100644 --- a/retrieve_dspy/models.py +++ b/retrieve_dspy/models.py @@ -1,18 +1,15 @@ from pydantic import BaseModel -from typing import Optional, Dict, Any, List +from typing import Optional, Dict, Any, List, Literal + import dspy -class Source(BaseModel): +class ObjectFromDB(BaseModel): object_id: str - -class SourceWithContentAndVector(Source): - content: str - vector: list[float] - -class SearchResult(BaseModel): - id: int content: str - dataset_id: Optional[str] + relevance_rank: Optional[int] = None + relevance_score: Optional[float] = None + vector: Optional[list[float]] = None + source_query: Optional[str] = None class SearchQueryWithFilter(BaseModel): search_query: str @@ -24,7 +21,7 @@ class Cluster(BaseModel): vectors: list[list[float]] class DSPyAgentRAGResponse(dspy.Prediction): - def __init__(self, final_answer: str = "", sources: List[Source] = None, + def __init__(self, final_answer: str = "", sources: List[ObjectFromDB] = None, searches: Optional[List[str]] = None, aggregations: Optional[List] = None, usage: Optional[Dict[str, Any]] = None, **kwargs): super().__init__(**kwargs) @@ -32,5 +29,21 @@ def __init__(self, final_answer: str = "", sources: List[Source] = None, self.final_answer = final_answer self.sources = sources or [] self.searches = searches - self.aggregations = aggregations - self.usage = usage or {} \ No newline at end of file + self.usage = usage or {} + +class RerankerClient(BaseModel): + name: Literal["cohere", "voyage"] + client: Any + +class RerankItem(BaseModel): + index: int + relevance_score: float + +class ListwiseRankedDocument(BaseModel): + content: Any + original_position: int + current_position: Optional[int] = None + +class MultiLMConfig(BaseModel): + signature_name: str + lm: Any \ No newline at end of file diff --git a/retrieve_dspy/retrievers/__init__.py b/retrieve_dspy/retrievers/__init__.py index dafe5f9..18c9d2d 100644 --- a/retrieve_dspy/retrievers/__init__.py +++ b/retrieve_dspy/retrievers/__init__.py @@ -1,24 +1,37 @@ -from .vanilla_rag import VanillaRAG -from .multi_query_writer import MultiQueryWriter -from .query_expander import QueryExpander -from .query_expander_with_hint import QueryExpanderWithHint -from .query_expander_with_reranker import QueryExpanderWithReranker -from .cross_encoder_reranker import CrossEncoderReranker -from .best_match_reranker import BestMatchReranker -from .listwise_reranker import ListwiseReranker -from .summarized_listwise_reranker import SummarizedListwiseReranker -from .query_writer_and_listwise_reranker import QueryWriterWithListwiseReranker -from .multi_query_writer_with_hint import MultiQueryWriterWithHint -from .multi_query_writer_with_reranker import MultiQueryWriterWithReranker -from .filtered_query_writer import FilteredQueryWriter -from .looping_query_writer import LoopingQueryWriter -from .layered_reranker import LayeredReranker -from .decompose_and_expand import DecomposeAndExpand -from .decompose_and_expand_with_hints import DecomposeAndExpandWithHints -from .query_document_summarizer import QueryDocumentSummarizer +from .hybrid_search import HybridSearch +from .query_writers.multi_query_writer import MultiQueryWriter +from .query_writers.query_expander import QueryExpander +from .query_writers.query_expander_with_hint import QueryExpanderWithHint +from .query_writers.query_expander_with_reranker import QueryExpanderWithReranker +from .query_writers.rag_fusion import RAGFusion +from .rerankers.cross_encoder_reranker import CrossEncoderReranker +from .atomics.best_match_reranker import BestMatchReranker +from .rerankers.listwise_reranker import ListwiseReranker +from .rerankers.summarized_listwise_reranker import SummarizedListwiseReranker +from .query_writers.multi_query_writer_with_hint import MultiQueryWriterWithHint +from .query_writers.multi_query_writer_with_reranker import MultiQueryWriterWithReranker +from .query_writers.filtered_query_writer import FilteredQueryWriter +from .multi_hop.simplified_baleen_with_cross_encoder import SimplifiedBaleenWithCrossEncoder +from .rerankers.layered_best_match_reranker import LayeredBestMatchReranker +from .rerankers.layered_listwise_reranker import LayeredListwiseReranker +from .query_writers.decompose_and_expand import DecomposeAndExpand +from .query_writers.decompose_and_expand_with_hints import DecomposeAndExpandWithHints +from .atomics.query_document_summarizer import QueryDocumentSummarizer +from .compositions.quipler import QUIPLER +from .query_writers.HyDE import HyDE_QueryExpander +from .query_writers.LameR import LameR_QueryExpander +from .query_writers.ThinkQE import ThinkQE_QueryExpander +from .rerankers.sliding_window_listwise_reranker import SlidingWindowListwiseReranker +from .rerankers.top_down_partitioning_reranker import TopDownPartitioningReranker __all__ = [ - "VanillaRAG", + "HybridSearch", + "HyDE_QueryExpander", + "LameR_QueryExpander", + "ThinkQE_QueryExpander", + "SlidingWindowListwiseReranker", + "TopDownPartitioningReranker", + "RAGFusion", "CrossEncoderReranker", "ListwiseReranker", "BestMatchReranker", @@ -26,14 +39,15 @@ "MultiQueryWriterWithHint", "MultiQueryWriterWithReranker", "SummarizedListwiseReranker", - "QueryWriterWithListwiseReranker", "FilteredQueryWriter", - "LoopingQueryWriter", "QueryExpander", - "LayeredReranker", + "LayeredBestMatchReranker", + "LayeredListwiseReranker", "DecomposeAndExpand", "QueryExpanderWithHint", "DecomposeAndExpandWithHints", "QueryExpanderWithReranker", - "QueryDocumentSummarizer" + "QueryDocumentSummarizer", + "SimplifiedBaleenWithCrossEncoder", + "QUIPLER" ] diff --git a/retrieve_dspy/retrievers/best_match_reranker.py b/retrieve_dspy/retrievers/atomics/best_match_reranker.py similarity index 88% rename from retrieve_dspy/retrievers/best_match_reranker.py rename to retrieve_dspy/retrievers/atomics/best_match_reranker.py index a7d0f3a..0f16b13 100644 --- a/retrieve_dspy/retrievers/best_match_reranker.py +++ b/retrieve_dspy/retrievers/atomics/best_match_reranker.py @@ -4,7 +4,7 @@ import dspy -from retrieve_dspy.models import DSPyAgentRAGResponse, SearchResult +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import BestMatchRanker class BestMatchReranker(dspy.Module): @@ -19,7 +19,7 @@ def __init__( self.verbose = verbose self.reranker = dspy.ChainOfThought(BestMatchRanker) # update to send rationale through to metric - def forward(self, question: str, candidates: list[SearchResult]) -> DSPyAgentRAGResponse: + def forward(self, question: str, candidates: list[ObjectFromDB]) -> DSPyAgentRAGResponse: # Perform reranking rerank_pred = self.reranker( query=question, @@ -73,9 +73,9 @@ async def main(): ) test_q = "What number did David Ortiz wear when he played for the Boston Red Sox?" candidates = [ - SearchResult(id=1, content="David Ortiz wore the number 34 when he played for the Boston Red Sox."), - SearchResult(id=2, content="Derek Jeter wore the number 2 for the New York Yankees throughout his career."), - SearchResult(id=3, content="The Boston Red Sox retired David Ortiz's number 34 in 2017, making him the 11th player to receive this honor."), + ObjectFromDB(id=1, content="David Ortiz wore the number 34 when he played for the Boston Red Sox."), + ObjectFromDB(id=2, content="Derek Jeter wore the number 2 for the New York Yankees throughout his career."), + ObjectFromDB(id=3, content="The Boston Red Sox retired David Ortiz's number 34 in 2017, making him the 11th player to receive this honor."), ] response = test_pipeline.forward(test_q, candidates) print(response) diff --git a/retrieve_dspy/retrievers/query_document_summarizer.py b/retrieve_dspy/retrievers/atomics/query_document_summarizer.py similarity index 100% rename from retrieve_dspy/retrievers/query_document_summarizer.py rename to retrieve_dspy/retrievers/atomics/query_document_summarizer.py diff --git a/retrieve_dspy/retrievers/base_rag.py b/retrieve_dspy/retrievers/base_rag.py index 5a304aa..3083876 100644 --- a/retrieve_dspy/retrievers/base_rag.py +++ b/retrieve_dspy/retrievers/base_rag.py @@ -4,24 +4,31 @@ import dspy -from retrieve_dspy.models import DSPyAgentRAGResponse +from retrieve_dspy.models import DSPyAgentRAGResponse, MultiLMConfig class BaseRAG(dspy.Module): def __init__( - self, + self, collection_name: str, target_property_name: Optional[str] = "content", verbose: Optional[bool] = True, search_only: Optional[bool] = True, retrieved_k: Optional[int] = 5, + verbose_signature: Optional[bool] = True, + multi_lm_configs: Optional[list[MultiLMConfig]] = None, ) -> None: self.collection_name = collection_name self.target_property_name = target_property_name self.verbose = verbose self.search_only = search_only self.retrieved_k = retrieved_k + self.verbose_signature = verbose_signature + self.multi_lm_configs = multi_lm_configs + if self.multi_lm_configs: + self._multi_lm_configs_to_dict() + else: + self.multi_lm_configs_dict = None - # TODO: Interface ablating `lms` here lm = dspy.LM( "openai/gpt-4.1-mini", cache=False, @@ -44,6 +51,9 @@ def _merge_usage(*usages: dict[str, dict[str, int]]) -> dict[str, dict[str, int] bucket["completion_tokens"] += stats.get("completion_tokens", 0) return merged + def _multi_lm_configs_to_dict(self): + self.multi_lm_configs_dict = {config.signature_name: config.lm for config in self.multi_lm_configs} + @abc.abstractmethod def forward(self, question: str) -> DSPyAgentRAGResponse: ... diff --git a/retrieve_dspy/retrievers/common/call_ce_ranker.py b/retrieve_dspy/retrievers/common/call_ce_ranker.py new file mode 100644 index 0000000..15dd3c6 --- /dev/null +++ b/retrieve_dspy/retrievers/common/call_ce_ranker.py @@ -0,0 +1,310 @@ +from __future__ import annotations + +import asyncio +import inspect +from typing import Any, Callable, Dict, List, Literal, Optional + +from retrieve_dspy.models import RerankerClient # Pydantic: name: Literal["cohere","voyage"], client: Any +from retrieve_dspy.models import ObjectFromDB, RerankItem + +Provider = Literal["cohere", "voyage", "hybrid"] + +# (query, documents, top_k) -> List[RerankItem] +SyncReranker = Callable[[str, List[str], int], List[RerankItem]] +# (query, documents, top_k) -> awaitable List[RerankItem] +AsyncReranker = Callable[[str, List[str], int], "asyncio.Future[List[RerankItem]]"] + +def make_cohere_reranker(client: Any, model: str = "rerank-v3.5") -> SyncReranker: + def _fn(query: str, documents: List[str], top_k: int) -> List[RerankItem]: + rerank_call = client.rerank( + model=model, + query=query, + documents=list(documents), + top_n=min(top_k, len(documents)), + ) + + # If the result is a coroutine, we can't handle it in sync context + if inspect.iscoroutine(rerank_call): + raise RuntimeError("Cannot use async client in sync reranker. Use async_ce_rank instead.") + + return [RerankItem(index=r.index, relevance_score=float(r.relevance_score)) for r in rerank_call.results] + return _fn + +def make_voyage_reranker(client: Any, model: str = "rerank-2.5") -> SyncReranker: + def _fn(query: str, documents: List[str], top_k: int) -> List[RerankItem]: + rerank_call = client.rerank( + query=query, + documents=list(documents), + model=model, + top_k=min(top_k, len(documents)), + ) + + # If the result is a coroutine, we can't handle it in sync context + if inspect.iscoroutine(rerank_call): + raise RuntimeError("Cannot use async client in sync reranker. Use async_ce_rank instead.") + + return [RerankItem(index=r.index, relevance_score=float(r.relevance_score)) for r in rerank_call.results] + return _fn + +def make_async_cohere_reranker(client: Any, model: str = "rerank-v3.5") -> AsyncReranker: + async def _fn(query: str, documents: List[str], top_k: int) -> List[RerankItem]: + res = await client.rerank( + model=model, + query=query, + documents=list(documents), + top_n=min(top_k, len(documents)), + ) + return [RerankItem(index=r.index, relevance_score=float(r.relevance_score)) for r in res.results] + return _fn + +def make_async_voyage_reranker(client: Any, model: str = "rerank-2.5") -> AsyncReranker: + async def _fn(query: str, documents: List[str], top_k: int) -> List[RerankItem]: + res = await client.rerank( + query=query, + documents=list(documents), + model=model, + top_k=min(top_k, len(documents)), + ) + return [RerankItem(index=r.index, relevance_score=float(r.relevance_score)) for r in res.results] + return _fn + +def fuse_rrf( + rankings: Dict[str, List[RerankItem]], + top_k: int, + *, + rrf_k: int = 60, + weights: Optional[Dict[str, float]] = None, + verbose: bool = False, +) -> List[RerankItem]: + weights = weights or {} + if not rankings.get("cohere") and rankings.get("voyage"): + return rankings["voyage"][:top_k] + if not rankings.get("voyage") and rankings.get("cohere"): + return rankings["cohere"][:top_k] + if not rankings.get("cohere") and not rankings.get("voyage"): + raise RuntimeError("Both rerankers returned no results") + + scores: Dict[int, float] = {} + for name, items in rankings.items(): + if not items: + continue + w = float(weights.get(name, 0.5)) + for rank, it in enumerate(items): + contrib = w * (1.0 / (rrf_k + rank + 1)) + scores[it.index] = scores.get(it.index, 0.0) + contrib + if verbose and rank < 3: + print(f"{name} rank {rank+1}: doc {it.index}, +{contrib:.4f}") + + fused_pairs = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k] + return [RerankItem(index=i, relevance_score=s) for i, s in fused_pairs] + +def _single_provider( + provider: Literal["cohere", "voyage"], + *, + rerankers: Dict[str, SyncReranker], + query: str, + documents: List[str], + top_k: int, +) -> List[RerankItem]: + fn = rerankers.get(provider) + if not fn: + raise ValueError(f"Missing reranker for provider '{provider}'") + return fn(query, documents, top_k) + + +def rerank( + provider: Provider, + query: str, + documents: List[str], + top_k: int, + *, + rerankers: Dict[str, SyncReranker], + rrf_k: int = 60, + hybrid_weights: Optional[Dict[str, float]] = None, + verbose: bool = False, +) -> List[RerankItem]: + if provider in ("cohere", "voyage"): + return _single_provider(provider, rerankers=rerankers, query=query, documents=documents, top_k=top_k) + + if provider == "hybrid": + results: Dict[str, List[RerankItem]] = {} + for p in ("cohere", "voyage"): + fn = rerankers.get(p) + if not fn: + results[p] = [] + continue + try: + results[p] = fn(query, documents, top_k) + except Exception as e: + if verbose: + print(f"{p} rerank error: {e}") + results[p] = [] + return fuse_rrf(results, top_k, rrf_k=rrf_k, weights=hybrid_weights, verbose=verbose) + + raise ValueError(f"Unsupported provider: {provider}") + + +async def async_rerank( + provider: Provider, + query: str, + documents: List[str], + top_k: int, + *, + async_rerankers: Optional[Dict[str, AsyncReranker]] = None, + rerankers: Optional[Dict[str, SyncReranker]] = None, + rrf_k: int = 60, + hybrid_weights: Optional[Dict[str, float]] = None, + verbose: bool = False, +) -> List[RerankItem]: + async_rerankers = async_rerankers or {} + + async def _run(p: str) -> List[RerankItem]: + # First try async rerankers + if p in async_rerankers: + return await async_rerankers[p](query, documents, top_k) + # Fallback to sync rerankers only if no async version available + if rerankers and p in rerankers: + return await asyncio.to_thread(rerankers[p], query, documents, top_k) + if verbose: + print(f"No reranker registered for {p}") + return [] + + if provider in ("cohere", "voyage"): + return await _run(provider) + + if provider == "hybrid": + co_task = asyncio.create_task(_run("cohere")) + vo_task = asyncio.create_task(_run("voyage")) + co_items, vo_items = await asyncio.gather(co_task, vo_task) + return fuse_rrf({"cohere": co_items, "voyage": vo_items}, top_k, rrf_k=rrf_k, weights=hybrid_weights, verbose=verbose) + + raise ValueError(f"Unsupported provider: {provider}") + +def _adapters_from_clients( + clients: Optional[List[RerankerClient]], + *, + cohere_model: str, + voyage_model: str, +) -> Dict[str, SyncReranker]: + adapters: Dict[str, SyncReranker] = {} + if not clients: + return adapters + for rc in clients: + if rc.name == "cohere": + adapters["cohere"] = make_cohere_reranker(rc.client, cohere_model) if not callable(rc.client) else rc.client # type: ignore[assignment] + elif rc.name == "voyage": + adapters["voyage"] = make_voyage_reranker(rc.client, voyage_model) if not callable(rc.client) else rc.client # type: ignore[assignment] + else: + raise ValueError("RerankerClient.name must be 'cohere' or 'voyage'") + return adapters + +def _async_adapters_from_clients( + clients: Optional[List[RerankerClient]], + *, + cohere_model: str, + voyage_model: str, +) -> Dict[str, AsyncReranker]: + adapters: Dict[str, AsyncReranker] = {} + if not clients: + return adapters + for rc in clients: + if rc.name == "cohere": + # Check if client.rerank is async by examining if it's a coroutine function + if hasattr(rc.client, 'rerank') and inspect.iscoroutinefunction(rc.client.rerank): + adapters["cohere"] = make_async_cohere_reranker(rc.client, cohere_model) + # If it's already a callable async reranker, use it directly + elif callable(rc.client) and inspect.iscoroutinefunction(rc.client): + adapters["cohere"] = rc.client # type: ignore[assignment] + elif rc.name == "voyage": + if hasattr(rc.client, 'rerank') and inspect.iscoroutinefunction(rc.client.rerank): + adapters["voyage"] = make_async_voyage_reranker(rc.client, voyage_model) + elif callable(rc.client) and inspect.iscoroutinefunction(rc.client): + adapters["voyage"] = rc.client # type: ignore[assignment] + else: + raise ValueError("RerankerClient.name must be 'cohere' or 'voyage'") + return adapters + + +def _pick_provider(requested: Optional[Provider], available: Dict[str, Any]) -> Provider: + have_co = "cohere" in available + have_vo = "voyage" in available + if requested is None: + if have_co and have_vo: + return "hybrid" + if have_co: + return "cohere" + if have_vo: + return "voyage" + raise ValueError("No rerankers provided.") + if requested == "hybrid": + if have_co and have_vo: + return "hybrid" + if have_co: + return "cohere" + if have_vo: + return "voyage" + raise ValueError("Hybrid requested but no rerankers provided.") + if requested not in ("cohere", "voyage"): + raise ValueError(f"Unsupported provider: {requested}") + if requested not in available: + raise ValueError(f"Provider '{requested}' requested but not provided.") + return requested + + +def ce_rank( + query: str, + documents: List[str], + top_k: int, + *, + clients: Optional[List[RerankerClient]] = None, + provider: Optional[Provider] = None, + cohere_model: str = "rerank-v3.5", + voyage_model: str = "rerank-2.5", + rrf_k: int = 60, + hybrid_weights: Optional[Dict[str, float]] = None, + verbose: bool = False, +) -> List[RerankItem]: + adapters = _adapters_from_clients(clients, cohere_model=cohere_model, voyage_model=voyage_model) + eff = _pick_provider(provider, adapters) + return rerank(eff, query, documents, top_k, rerankers=adapters, rrf_k=rrf_k, hybrid_weights=hybrid_weights, verbose=verbose) + + +async def async_ce_rank( + query: str, + documents: List[str], + top_k: int, + *, + clients: Optional[List[RerankerClient]] = None, + provider: Optional[Provider] = None, + cohere_model: str = "rerank-v3.5", + voyage_model: str = "rerank-2.5", + rrf_k: int = 60, + hybrid_weights: Optional[Dict[str, float]] = None, + verbose: bool = False, +) -> List[RerankItem]: + # Create both sync and async adapters + sync_adapters = _adapters_from_clients(clients, cohere_model=cohere_model, voyage_model=voyage_model) + async_adapters = _async_adapters_from_clients(clients, cohere_model=cohere_model, voyage_model=voyage_model) + + # Combine available adapters for provider selection + all_adapters = {**sync_adapters, **async_adapters} + eff = _pick_provider(provider, all_adapters) + + return await async_rerank( + eff, + query, + documents, + top_k, + async_rerankers=async_adapters, + rerankers=sync_adapters, + rrf_k=rrf_k, + hybrid_weights=hybrid_weights, + verbose=verbose + ) + +def reorder(items: List[RerankItem], sources: List[ObjectFromDB]) -> List[ObjectFromDB]: + out: List[ObjectFromDB] = [] + for i, it in enumerate(items): + if 0 <= it.index < len(sources): + out.append(sources[it.index]) + return out \ No newline at end of file diff --git a/retrieve_dspy/retrievers/common/deduplicate.py b/retrieve_dspy/retrievers/common/deduplicate.py new file mode 100644 index 0000000..eea457b --- /dev/null +++ b/retrieve_dspy/retrievers/common/deduplicate.py @@ -0,0 +1,11 @@ +from retrieve_dspy.models import ObjectFromDB + +def deduplicate_and_join(original_list: list[ObjectFromDB], incoming_list: list[ObjectFromDB]) -> list[ObjectFromDB]: + seen_ids = set() + for obj in original_list: + seen_ids.add(obj.object_id) + for obj in incoming_list: + if obj.object_id not in seen_ids: + seen_ids.add(obj.object_id) + original_list.append(obj) + return original_list \ No newline at end of file diff --git a/retrieve_dspy/retrievers/common/rrf.py b/retrieve_dspy/retrievers/common/rrf.py new file mode 100644 index 0000000..e908f99 --- /dev/null +++ b/retrieve_dspy/retrievers/common/rrf.py @@ -0,0 +1,54 @@ +from typing import List, Dict, Optional +from collections import defaultdict +from retrieve_dspy.models import ObjectFromDB + +def reciprocal_rank_fusion( + result_sets: List[List[ObjectFromDB]], + k: int = 60, # Standard RRF constant + top_k: Optional[int] = None +) -> List[ObjectFromDB]: + """ + Combine multiple ranked lists using Reciprocal Rank Fusion. + + Args: + result_sets: List of lists, each containing ObjectFromDB results from different queries + k: RRF constant (typically 60) + top_k: Number of top results to return + """ + # Track RRF scores and document details + rrf_scores: Dict[str, float] = defaultdict(float) + doc_map: Dict[str, ObjectFromDB] = {} + + for result_set in result_sets: + for rank, obj in enumerate(result_set, start=1): + doc_id = obj.object_id + + # Calculate RRF score: 1/(rank + k) + rrf_scores[doc_id] += 1.0 / (rank + k) + + # Store document if not seen before (keeps first occurrence) + if doc_id not in doc_map: + doc_map[doc_id] = obj + + # Sort by RRF score and create final ranking + sorted_docs = sorted( + rrf_scores.items(), + key=lambda x: x[1], + reverse=True + ) + + # Create output list with updated ranks and scores + results = [] + for new_rank, (doc_id, rrf_score) in enumerate(sorted_docs[:top_k], start=1): + obj = doc_map[doc_id] + # Create new object with updated rank and score + results.append(ObjectFromDB( + object_id=obj.object_id, + content=obj.content, + relevance_rank=new_rank, + relevance_score=rrf_score, + vector=obj.vector, + source_query=obj.source_query + )) + + return results \ No newline at end of file diff --git a/retrieve_dspy/retrievers/compositions/looping_quipler.py b/retrieve_dspy/retrievers/compositions/looping_quipler.py new file mode 100644 index 0000000..e69de29 diff --git a/retrieve_dspy/retrievers/compositions/quipler.py b/retrieve_dspy/retrievers/compositions/quipler.py new file mode 100644 index 0000000..f8d4ac3 --- /dev/null +++ b/retrieve_dspy/retrievers/compositions/quipler.py @@ -0,0 +1,359 @@ +from __future__ import annotations + +import asyncio +import concurrent.futures +from typing import Optional, List +import weaviate + +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.signatures import VerboseWriteSearchQueries, WriteSearchQueries +from retrieve_dspy.retrievers.common.rrf import reciprocal_rank_fusion +from retrieve_dspy.retrievers import CrossEncoderReranker +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB, RerankerClient + +import dspy + + +class QUIPLER(BaseRAG): + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + return_property_name: Optional[str] = None, + verbose: bool = False, + verbose_signature: bool = True, + search_only: bool = True, + retrieved_k: int = 50, + reranked_k: int = 20, + rrf_k: int = 60, + **cross_encoder_kwargs + ): + super().__init__( + weaviate_client=weaviate_client, + collection_name=collection_name, + target_property_name=target_property_name, + verbose=verbose, + verbose_signature=verbose_signature, + search_only=search_only, + retrieved_k=retrieved_k, + ) + + # Query generation + if self.verbose_signature: + self.query_writer = dspy.ChainOfThought(VerboseWriteSearchQueries) + else: + self.query_writer = dspy.Predict(WriteSearchQueries) + + self.rrf_k = rrf_k + self.reranked_k = reranked_k + + # Cross-encoder reranker + self.searcher = CrossEncoderReranker( + collection_name=collection_name, + target_property_name=target_property_name, + weaviate_client=weaviate_client, + reranker_clients=reranker_clients, + return_property_name=return_property_name, + verbose=verbose, + search_only=search_only, + retrieved_k=retrieved_k, + reranked_k=reranked_k, + **cross_encoder_kwargs + ) + + def forward( + self, + question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + """Synchronous version with sequential execution.""" + if self.verbose: + print(f"\033[95m[QUIPLER] Starting sync query expansion for: '{question}'\033[0m") + + # Generate multiple search queries + query_result = self.query_writer(question=question) + queries = query_result.search_queries + + if self.verbose: + print(f"\033[95m[QUIPLER] Generated {len(queries)} queries for sequential execution:\033[0m") + for i, q in enumerate(queries, 1): + print(f" {i}. {q}") + + # Run searches sequentially + return self._run_sequential(queries, question, weaviate_client, reranker_clients) + + def _run_parallel_sync( + self, + queries: List[str], + original_question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + """Run searches in parallel from sync context.""" + + async def parallel_search_async(): + # We'll need async clients for true parallelism + weaviate_async_client = None + + # Try to set up async weaviate client + if weaviate_client: + try: + from retrieve_dspy.clients import get_and_connect_weaviate_async_client + weaviate_async_client = await get_and_connect_weaviate_async_client() + except Exception as e: + if self.verbose: + print(f"\033[93m[QUIPLER] Could not create async weaviate client: {e}\033[0m") + # Fall back to sync execution + raise e + + # Search with all queries in parallel + async def search_single_query(query: str, index: int) -> tuple[int, DSPyAgentRAGResponse]: + if self.verbose: + print(f"\033[96m[QUIPLER] Starting parallel search {index+1}/{len(queries)}\033[0m") + + result = await self.searcher.aforward( + question=query, + weaviate_async_client=weaviate_async_client, + reranker_clients=reranker_clients + ) + + if self.verbose: + print(f"\033[96m[QUIPLER] Completed parallel search {index+1}/{len(queries)}\033[0m") + + return index, result + + # Execute all searches in parallel + search_tasks = [ + search_single_query(query, i) + for i, query in enumerate(queries) + ] + + search_results = await asyncio.gather(*search_tasks, return_exceptions=True) + + # Clean up async client + if weaviate_async_client: + await weaviate_async_client.close() + + return search_results + + # Handle event loop scenarios + try: + # Check if we're already in an async context + # current_loop = asyncio.get_running_loop() + + # If we're in an async context, we need to run in a thread + with concurrent.futures.ThreadPoolExecutor() as executor: + future = executor.submit(lambda: asyncio.run(parallel_search_async())) + search_results = future.result() + + except RuntimeError: + # No event loop running, we can use asyncio.run() directly + search_results = asyncio.run(parallel_search_async()) + + return self._process_search_results(search_results, queries, original_question, weaviate_client, reranker_clients) + + def _run_sequential( + self, + queries: List[str], + original_question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + """Fallback to sequential execution.""" + if self.verbose: + print(f"\033[95m[QUIPLER] Running {len(queries)} searches sequentially\033[0m") + + all_results: List[List[ObjectFromDB]] = [] + all_searches: List[str] = [] + + for i, query in enumerate(queries): + if self.verbose: + print(f"\033[96m[QUIPLER] Searching with query {i+1}/{len(queries)}\033[0m") + + try: + result = self.searcher.forward( + question=query, + weaviate_client=weaviate_client, + reranker_clients=reranker_clients + ) + all_results.append(result.sources) + all_searches.extend(result.searches) + except Exception as e: + if self.verbose: + print(f"\033[91m[QUIPLER] Search {i+1} failed: {e}\033[0m") + continue + + if not all_results: + if self.verbose: + print("\033[91m[QUIPLER] All searches failed, using single search fallback\033[0m") + return self.searcher.forward( + question=original_question, + weaviate_client=weaviate_client, + reranker_clients=reranker_clients + ) + + # Aggregate results using RRF + fused_sources = reciprocal_rank_fusion( + result_sets=all_results, + k=self.rrf_k, + top_k=self.reranked_k + ) + + if self.verbose: + print(f"\033[95m[QUIPLER] Final result: {len(fused_sources)} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=fused_sources, + searches=all_searches, + aggregations=None, + usage={}, + ) + + def _process_search_results( + self, + search_results: List, + queries: List[str], + original_question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + """Process parallel search results and fuse them.""" + all_results: List[List[ObjectFromDB]] = [] + all_searches: List[str] = [] + successful_searches = 0 + + for result in search_results: + if isinstance(result, Exception): + if self.verbose: + print(f"\033[91m[QUIPLER] Search failed: {result}\033[0m") + continue + + index, response = result + all_results.append(response.sources) + all_searches.extend(response.searches) + successful_searches += 1 + + if successful_searches == 0: + if self.verbose: + print("\033[91m[QUIPLER] All parallel searches failed, using single search fallback\033[0m") + return self.searcher.forward( + question=original_question, + weaviate_client=weaviate_client, + reranker_clients=reranker_clients + ) + + # Aggregate results using RRF + if self.verbose: + print(f"\033[95m[QUIPLER] Fusing results from {successful_searches}/{len(queries)} successful searches\033[0m") + + fused_sources = reciprocal_rank_fusion( + result_sets=all_results, + k=self.rrf_k, + top_k=self.reranked_k + ) + + if self.verbose: + print(f"\033[95m[QUIPLER] Final parallel result: {len(fused_sources)} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=fused_sources, + searches=all_searches, + aggregations=None, + usage={}, + ) + + async def aforward( + self, + question: str, + weaviate_async_client: Optional[weaviate.AsyncWeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + """Asynchronous version - generates queries and searches in parallel.""" + if self.verbose: + print(f"\033[95m[QUIPLER] Starting async query expansion for: '{question}'\033[0m") + + # Generate multiple search queries + query_result = self.query_writer(question=question) + queries = query_result.search_queries + + if self.verbose: + print(f"\033[95m[QUIPLER] Generated {len(queries)} queries for parallel search:\033[0m") + for i, q in enumerate(queries, 1): + print(f" {i}. {q}") + + # Search with all queries in parallel + async def search_single_query(query: str, index: int) -> tuple[int, DSPyAgentRAGResponse]: + if self.verbose: + print(f"\033[96m[QUIPLER] Starting parallel search {index+1}/{len(queries)}\033[0m") + + result = await self.searcher.aforward( + question=query, + weaviate_async_client=weaviate_async_client, + reranker_clients=reranker_clients + ) + + if self.verbose: + print(f"\033[96m[QUIPLER] Completed parallel search {index+1}/{len(queries)}\033[0m") + + return index, result + + # Execute all searches in parallel + search_tasks = [ + search_single_query(query, i) + for i, query in enumerate(queries) + ] + + search_results = await asyncio.gather(*search_tasks, return_exceptions=True) + + # Process results and handle any exceptions + all_results: List[List[ObjectFromDB]] = [] + all_searches: List[str] = [] + successful_searches = 0 + + for result in search_results: + if isinstance(result, Exception): + if self.verbose: + print(f"\033[91m[QUIPLER] Search failed: {result}\033[0m") + continue + + index, response = result + all_results.append(response.sources) + all_searches.extend(response.searches) + successful_searches += 1 + + if successful_searches == 0: + if self.verbose: + print("\033[91m[QUIPLER] All searches failed, falling back to original question\033[0m") + # Fallback to single search with original question + fallback_result = await self.searcher.aforward( + question=question, + weaviate_async_client=weaviate_async_client, + reranker_clients=reranker_clients + ) + return fallback_result + + # Aggregate results using RRF + if self.verbose: + print(f"\033[95m[QUIPLER] Fusing results from {successful_searches}/{len(queries)} successful searches\033[0m") + + fused_sources = reciprocal_rank_fusion( + result_sets=all_results, + k=self.rrf_k, + top_k=self.reranked_k + ) + + if self.verbose: + print(f"\033[95m[QUIPLER] Final async result: {len(fused_sources)} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=fused_sources, + searches=all_searches, + aggregations=None, + usage={}, + ) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/compositions/react_with_retriever.py b/retrieve_dspy/retrievers/compositions/react_with_retriever.py new file mode 100644 index 0000000..e69de29 diff --git a/retrieve_dspy/retrievers/cross_encoder_reranker.py b/retrieve_dspy/retrievers/cross_encoder_reranker.py deleted file mode 100644 index a6d6d1d..0000000 --- a/retrieve_dspy/retrievers/cross_encoder_reranker.py +++ /dev/null @@ -1,601 +0,0 @@ -import asyncio -import os -from typing import Optional, List, Literal, Union, Any, Dict, Tuple -from collections import defaultdict - -import cohere -import voyageai -import dspy - -from retrieve_dspy.database.weaviate_database import ( - weaviate_search_tool -) - -from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, SearchResult -from retrieve_dspy.signatures import QuerySummarizer - -class CrossEncoderReranker(BaseRAG): - def __init__( - self, - collection_name: str, - target_property_name: str, - return_property_name: Optional[str] = None, - verbose: Optional[bool] = False, - search_only: Optional[bool] = True, - retrieved_k: Optional[int] = 50, - reranked_k: Optional[int] = 20, - reranker_provider: Literal["cohere", "voyage", "hybrid"] = "cohere", - cohere_model: Optional[str] = "rerank-v3.5", - voyage_model: Optional[str] = "rerank-2.5", - cohere_api_key: Optional[str] = None, - voyage_api_key: Optional[str] = None, - summarize_query: Optional[bool] = False, - rrf_k: Optional[int] = 60, - hybrid_weights: Optional[Dict[str, float]] = None - ): - """ - Initialize the Cross Encoder Reranker. - - Args: - collection_name: Weaviate collection name - target_property_name: Property to search in Weaviate - verbose: Whether to print debug information - search_only: Whether to only search without generating answers - retrieved_k: Number of documents to retrieve initially - reranked_k: Number of documents to keep after reranking - reranker_provider: Which reranker to use ("cohere", "voyage", or "hybrid") - cohere_model: Cohere reranking model to use - voyage_model: Voyage reranking model to use - cohere_api_key: Cohere API key (defaults to COHERE_API_KEY env var) - voyage_api_key: Voyage API key (defaults to VOYAGE_API_KEY env var) - summarize_query: Whether to summarize the query before reranking - rrf_k: K parameter for Reciprocal Rank Fusion (default 60) - hybrid_weights: Optional weights for each reranker in hybrid mode - (e.g., {"cohere": 0.6, "voyage": 0.4}) - """ - super().__init__( - collection_name=collection_name, - target_property_name=target_property_name, - verbose=verbose, - search_only=search_only, - retrieved_k=retrieved_k, - ) - self.return_property_name = return_property_name - self.reranked_k = reranked_k - self.reranker_provider = reranker_provider - self.cohere_model = cohere_model - self.voyage_model = voyage_model - self.summarize_query = summarize_query - self.query_summarizer = dspy.Predict(QuerySummarizer) - self.rrf_k = rrf_k - self.hybrid_weights = hybrid_weights or {"cohere": 0.5, "voyage": 0.5} - - # Initialize clients based on provider - if reranker_provider in ["cohere", "hybrid"]: - api_key = cohere_api_key or os.getenv("COHERE_API_KEY") - if not api_key: - raise ValueError("COHERE_API_KEY must be provided or set as environment variable") - self.co = cohere.ClientV2(api_key) - - if reranker_provider in ["voyage", "hybrid"]: - api_key = voyage_api_key or os.getenv("VOYAGE_API_KEY") - if not api_key: - raise ValueError("VOYAGE_API_KEY must be provided or set as environment variable") - self.vo = voyageai.Client(api_key=api_key) - - if reranker_provider not in ["cohere", "voyage", "hybrid"]: - raise ValueError(f"Unsupported reranker provider: {reranker_provider}") - - def _rerank_with_cohere( - self, - query: str, - documents: List[str] - ) -> List[Any]: - """ - Rerank documents using Cohere's Cross Encoder. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Reranked results from Cohere - """ - try: - response = self.co.rerank( - model=self.cohere_model, - query=query, - documents=documents, - top_n=min(self.reranked_k, len(documents)) - ) - return response.results - except Exception as e: - if self.verbose: - print(f"\033[91mError during Cohere reranking: {e}\033[0m") - raise - - def _rerank_with_voyage( - self, - query: str, - documents: List[str] - ) -> List[Any]: - """ - Rerank documents using Voyage's Cross Encoder. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Reranked results from Voyage - """ - try: - response = self.vo.rerank( - query=query, - documents=documents, - model=self.voyage_model, - top_k=min(self.reranked_k, len(documents)) - ) - return response.results - except Exception as e: - if self.verbose: - print(f"\033[91mError during Voyage reranking: {e}\033[0m") - raise - - def _reciprocal_rank_fusion( - self, - rankings: Dict[str, List[Tuple[int, float]]], - k: int = 60 - ) -> List[Tuple[int, float]]: - """ - Combine multiple rankings using Reciprocal Rank Fusion. - - Args: - rankings: Dictionary mapping ranker name to list of (doc_index, score) tuples - k: RRF constant (default 60) - - Returns: - Combined ranking as list of (doc_index, fused_score) tuples - """ - rrf_scores = defaultdict(float) - - for ranker_name, ranked_docs in rankings.items(): - weight = self.hybrid_weights.get(ranker_name, 0.5) - - for rank, (doc_idx, original_score) in enumerate(ranked_docs): - # RRF formula: 1 / (k + rank) - # rank is 0-based, so we add 1 to get the actual position - rrf_score = weight * (1.0 / (k + rank + 1)) - rrf_scores[doc_idx] += rrf_score - - if self.verbose and rank < 3: - print(f" {ranker_name} - Rank {rank+1}: Doc {doc_idx}, " - f"Original score: {original_score:.4f}, " - f"RRF contribution: {rrf_score:.4f}") - - # Sort by RRF score in descending order - fused_ranking = sorted( - rrf_scores.items(), - key=lambda x: x[1], - reverse=True - ) - - if self.verbose: - print(f"\n\033[93mRRF Fusion Results (k={k}):\033[0m") - for i, (doc_idx, score) in enumerate(fused_ranking[:5]): - print(f" Final Rank {i+1}: Doc {doc_idx}, RRF Score: {score:.4f}") - - return fused_ranking[:self.reranked_k] - - def _rerank_hybrid( - self, - query: str, - documents: List[str] - ) -> List[Tuple[int, float]]: - """ - Rerank using both Cohere and Voyage, then fuse with RRF. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Fused ranking as list of (doc_index, fused_score) tuples - """ - if self.verbose: - print("\n\033[95mHybrid Reranking Mode - Using both Cohere and Voyage\033[0m") - print(f"Weights: Cohere={self.hybrid_weights['cohere']}, " - f"Voyage={self.hybrid_weights['voyage']}") - - rankings = {} - - # Get Cohere rankings - try: - cohere_results = self._rerank_with_cohere(query, documents) - # Store as (doc_index, relevance_score) tuples - rankings["cohere"] = [ - (result.index, result.relevance_score) - for result in cohere_results - ] - if self.verbose: - print(f"\n\033[96mCohere returned {len(cohere_results)} results\033[0m") - except Exception as e: - if self.verbose: - print(f"\033[91mCohere reranking failed: {e}\033[0m") - rankings["cohere"] = [] - - # Get Voyage rankings - try: - voyage_results = self._rerank_with_voyage(query, documents) - rankings["voyage"] = [ - (result.index, result.relevance_score) - for result in voyage_results - ] - if self.verbose: - print(f"\033[96mVoyage returned {len(voyage_results)} results\033[0m") - except Exception as e: - if self.verbose: - print(f"\033[91mVoyage reranking failed: {e}\033[0m") - rankings["voyage"] = [] - - # If one ranker fails, use the other's results - if not rankings["cohere"] and rankings["voyage"]: - if self.verbose: - print("\033[93mUsing only Voyage results (Cohere failed)\033[0m") - return rankings["voyage"] - elif not rankings["voyage"] and rankings["cohere"]: - if self.verbose: - print("\033[93mUsing only Cohere results (Voyage failed)\033[0m") - return rankings["cohere"] - elif not rankings["cohere"] and not rankings["voyage"]: - raise RuntimeError("Both rerankers failed") - - # Fuse rankings using RRF - return self._reciprocal_rank_fusion(rankings, self.rrf_k) - - async def _async_rerank_hybrid( - self, - query: str, - documents: List[str] - ) -> List[Tuple[int, float]]: - """ - Asynchronously rerank using both providers and fuse with RRF. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Fused ranking as list of (doc_index, fused_score) tuples - """ - if self.verbose: - print("\n\033[95mAsync Hybrid Reranking - Using both Cohere and Voyage\033[0m") - - loop = asyncio.get_event_loop() - - # Run both rerankers concurrently - # tasks = [] - rankings = {} - - # Cohere task - async def get_cohere_rankings(): - try: - results = await loop.run_in_executor( - None, self._rerank_with_cohere, query, documents - ) - return "cohere", [(r.index, r.relevance_score) for r in results] - except Exception as e: - if self.verbose: - print(f"\033[91mCohere async reranking failed: {e}\033[0m") - return "cohere", [] - - # Voyage task - async def get_voyage_rankings(): - try: - results = await loop.run_in_executor( - None, self._rerank_with_voyage, query, documents - ) - return "voyage", [(r.index, r.relevance_score) for r in results] - except Exception as e: - if self.verbose: - print(f"\033[91mVoyage async reranking failed: {e}\033[0m") - return "voyage", [] - - # Execute both tasks concurrently - results = await asyncio.gather( - get_cohere_rankings(), - get_voyage_rankings() - ) - - for ranker_name, ranked_docs in results: - rankings[ranker_name] = ranked_docs - if self.verbose and ranked_docs: - print(f"\033[96m{ranker_name.capitalize()} returned {len(ranked_docs)} results\033[0m") - - # Handle failures - if not rankings["cohere"] and rankings["voyage"]: - return rankings["voyage"] - elif not rankings["voyage"] and rankings["cohere"]: - return rankings["cohere"] - elif not rankings["cohere"] and not rankings["voyage"]: - raise RuntimeError("Both rerankers failed") - - # Fuse rankings - return self._reciprocal_rank_fusion(rankings, self.rrf_k) - - def _rerank_documents( - self, - query: str, - documents: List[str] - ) -> Union[List[Any], List[Tuple[int, float]]]: - """ - Rerank documents using the configured provider. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Reranked results from the provider - """ - if self.reranker_provider == "cohere": - return self._rerank_with_cohere(query, documents) - elif self.reranker_provider == "voyage": - return self._rerank_with_voyage(query, documents) - elif self.reranker_provider == "hybrid": - return self._rerank_hybrid(query, documents) - else: - raise ValueError(f"Unsupported reranker provider: {self.reranker_provider}") - - async def _async_rerank_documents( - self, - query: str, - documents: List[str] - ) -> Union[List[Any], List[Tuple[int, float]]]: - """ - Asynchronously rerank documents using the configured provider. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Reranked results from the provider - """ - if self.reranker_provider == "hybrid": - return await self._async_rerank_hybrid(query, documents) - else: - # For single rerankers, use the existing logic - loop = asyncio.get_event_loop() - return await loop.run_in_executor( - None, - self._rerank_documents, - query, - documents - ) - - def forward(self, question: str) -> DSPyAgentRAGResponse: - """ - Execute the retrieval and reranking pipeline. - - Args: - question: User query - - Returns: - DSPyAgentRAGResponse with reranked sources as SearchResult objects - """ - # Get initial search results - search_results, sources = weaviate_search_tool( - query=question, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - return_property_name=self.return_property_name, - retrieved_k=self.retrieved_k, - return_format="rerank" - ) - - if self.verbose: - print(f"\033[96mInitial retrieval: {len(search_results)} documents\033[0m") - print(f"Query: '{question}'") - print(f"Using {self.reranker_provider} for reranking") - - # Extract document content directly from search results - documents = [] - for result in search_results: - # SearchResult objects have a 'content' attribute - doc_text = result.content if hasattr(result, 'content') else str(result) - documents.append(doc_text) - - if self.verbose: - print(f"\n\033[93mPreparing {len(documents)} documents for reranking...\033[0m") - for i, doc in enumerate(documents[:3]): # Show first 3 - preview = doc[:100] + "..." if len(doc) > 100 else doc - print(f" Doc {i+1} preview: {preview}") - - if self.summarize_query: - question_pred = self.query_summarizer(question=question) - question = question_pred.summary - if self.verbose: - print(f"\033[96mSummarized query: {question}\033[0m") - - # Rerank with configured provider - reranked_results = self._rerank_documents(question, documents) - - if self.verbose: - provider_name = self.reranker_provider.capitalize() - print(f"\n\033[93m{provider_name} reranking complete.\033[0m") - - # Reorder SearchResult objects based on reranking results - reranked_search_results = [] - - # Handle different result formats - if self.reranker_provider == "hybrid": - # Hybrid mode returns list of (doc_index, fused_score) tuples - for i, (doc_idx, score) in enumerate(reranked_results): - if 0 <= doc_idx < len(search_results): - # Keep the original SearchResult but update ranking info - original_result = search_results[doc_idx] - reranked_search_results.append(SearchResult( - id=original_result.id, - dataset_id=original_result.dataset_id, - content=original_result.content - )) - - if self.verbose and i < 5: - print(f"Rank {i + 1}: " - f"Document {doc_idx + 1} " - f"(RRF score: {score:.4f})") - else: - # Single reranker mode - for i, result in enumerate(reranked_results): - if 0 <= result.index < len(search_results): - # Keep the original SearchResult but update ranking info - original_result = search_results[result.index] - reranked_search_results.append(SearchResult( - id=original_result.id, - dataset_id=original_result.dataset_id, - content=original_result.content - )) - - if self.verbose and i < 5: - print(f"Rank {i + 1}: " - f"Document {result.index + 1} " - f"(relevance: {result.relevance_score:.4f})") - - if self.verbose: - print(f"\n\033[96mReranked: Returning {len(reranked_search_results)} documents\033[0m") - - # Additional diagnostics for low scores (single reranker mode) - if (self.reranker_provider != "hybrid" and - reranked_results and - reranked_results[0].relevance_score < 0.1): - print(f"\033[91mWarning: Low relevance scores detected! " - f"Top score: {reranked_results[0].relevance_score:.4f}\033[0m") - print("This might indicate:") - print("- Documents don't contain relevant content for the query") - print("- The collection might not have documents about this topic") - - # Return response with SearchResult objects as sources - return DSPyAgentRAGResponse( - final_answer="", - sources=reranked_search_results, - searches=[question], - aggregations=None, - usage={}, - ) - - async def aforward(self, question: str) -> DSPyAgentRAGResponse: - """ - Asynchronously execute the retrieval and reranking pipeline. - - Args: - question: User query - - Returns: - DSPyAgentRAGResponse with reranked sources - """ - pass - - -async def main(): - """Test the Cross Encoder Reranker with all providers including hybrid""" - import os - - # Test with Cohere - if os.getenv("COHERE_API_KEY"): - print("\n=== Testing with Cohere Reranker ===") - cohere_reranker = CrossEncoderReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=20, - reranked_k=10, - reranker_provider="cohere", - verbose=True - ) - - test_query = "Where will Governor Gray Davis host a party for the delegates, according to the article “Davis faces dire political consequences if power woes linger?" - - # Test synchronous execution - print("\n--- Synchronous Reranking (Cohere) ---") - response = cohere_reranker.forward(test_query) - print(f"Returned {len(response.sources)} reranked documents") - - # Test asynchronous execution - print("\n--- Asynchronous Reranking (Cohere) ---") - async_response = await cohere_reranker.aforward(test_query) - print(f"Returned {len(async_response.sources)} reranked documents") - - # Test with Voyage - if os.getenv("VOYAGE_API_KEY"): - print("\n=== Testing with Voyage Reranker ===") - voyage_reranker = CrossEncoderReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=20, - reranked_k=10, - reranker_provider="voyage", - voyage_model="rerank-2.5", - verbose=True - ) - - # Test synchronous execution - print("\n--- Synchronous Reranking (Voyage) ---") - response = voyage_reranker.forward(test_query) - print(f"Returned {len(response.sources)} reranked documents") - - # Test asynchronous execution - print("\n--- Asynchronous Reranking (Voyage) ---") - async_response = await voyage_reranker.aforward(test_query) - print(f"Returned {len(async_response.sources)} reranked documents") - - # Test with Hybrid mode - if os.getenv("COHERE_API_KEY") and os.getenv("VOYAGE_API_KEY"): - print("\n=== Testing with Hybrid Reranker (RRF) ===") - - # Test with equal weights - hybrid_reranker = CrossEncoderReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=20, - reranked_k=10, - reranker_provider="hybrid", - verbose=True, - rrf_k=60, - hybrid_weights={"cohere": 0.5, "voyage": 0.5} - ) - - # Test synchronous execution - print("\n--- Synchronous Hybrid Reranking (Equal Weights) ---") - response = hybrid_reranker.forward(test_query) - print(f"Returned {len(response.sources)} reranked documents") - - # Test asynchronous execution - print("\n--- Asynchronous Hybrid Reranking (Equal Weights) ---") - async_response = await hybrid_reranker.aforward(test_query) - print(f"Returned {len(async_response.sources)} reranked documents") - - # Test with weighted preference for Cohere - print("\n=== Testing Hybrid with Cohere Preference (0.7/0.3) ===") - weighted_reranker = CrossEncoderReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=20, - reranked_k=10, - reranker_provider="hybrid", - verbose=True, - rrf_k=60, - hybrid_weights={"cohere": 0.7, "voyage": 0.3} - ) - - print("\n--- Weighted Hybrid Reranking ---") - response = weighted_reranker.forward(test_query) - print(f"Returned {len(response.sources)} reranked documents") - - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/hybrid_search.py b/retrieve_dspy/retrievers/hybrid_search.py new file mode 100644 index 0000000..a55cd08 --- /dev/null +++ b/retrieve_dspy/retrievers/hybrid_search.py @@ -0,0 +1,101 @@ +import asyncio +from typing import Optional + +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool, + async_weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse + +class HybridSearch(BaseRAG): + def __init__( + self, + collection_name: str, + weaviate_client: Optional[weaviate.WeaviateClient | weaviate.WeaviateAsyncClient] = None, + target_property_name: Optional[str] = "content", + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 20, + ): + super().__init__(collection_name, target_property_name, search_only=search_only, verbose=verbose, retrieved_k=retrieved_k) + self.weaviate_client = weaviate_client + + def forward(self, question: str, weaviate_client: Optional[weaviate.WeaviateClient] = None) -> DSPyAgentRAGResponse: + if weaviate_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateClient): + weaviate_client = self.weaviate_client + + sources = weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=weaviate_client, + ) + + if self.verbose: + print(f"\033[96m Returning {len(sources)} Sources!\033[0m") + + if not self.search_only: + print("") + + return DSPyAgentRAGResponse( + final_answer="", + sources=sources, + searches=[question], + usage={}, + ) + + async def aforward(self, question: str, weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None) -> DSPyAgentRAGResponse: + if weaviate_async_client is None: + if isinstance(self.weaviate_async_client, weaviate.WeaviateAsyncClient): + weaviate_async_client = self.weaviate_async_client + + sources = await async_weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_async_client=weaviate_async_client, + ) + + if self.verbose: + print(f"\033[96m Returning {len(sources)} Sources!\033[0m") + + if not self.search_only: + print("") + + return DSPyAgentRAGResponse( + final_answer="", + sources=sources, + searches=[question], + usage={}, + ) + +async def main(): + import os + test_pipeline = HybridSearch( + collection_name="EnronEmails", + target_property_name="email_body", + retrieved_k=5 + ) + test_q = "What are the implications of SBX12?" + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + response = test_pipeline.forward(test_q, weaviate_client=weaviate_client) + print(response) + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + await weaviate_async_client.connect() + async_response = await test_pipeline.aforward(test_q, weaviate_async_client=weaviate_async_client) + print(async_response) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/layered_reranker.py b/retrieve_dspy/retrievers/layered_reranker.py deleted file mode 100644 index 3ee158b..0000000 --- a/retrieve_dspy/retrievers/layered_reranker.py +++ /dev/null @@ -1,218 +0,0 @@ -import asyncio -import os -import re -from typing import Optional, Any - -import dspy -import voyageai - -from retrieve_dspy.tools.weaviate_database import ( - weaviate_search_tool -) -from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, SearchResult -from retrieve_dspy.signatures import RelevanceRanker, IdentifyMostRelevantPassage - - -class LayeredReranker(BaseRAG): - def __init__( - self, - collection_name: str, - target_property_name: str, - return_property_name: str, - verbose: bool = False, - search_only: bool = True, - retrieved_k: int = 100, - reranked_N: int = 50, - reranked_M: int = 20, - voyage_model: str = "rerank-2.5", - voyage_api_key: Optional[str] = None - ): - super().__init__( - collection_name=collection_name, - target_property_name=target_property_name, - verbose=verbose, - search_only=search_only, - retrieved_k=retrieved_k - ) - self.return_property_name = return_property_name - self.reranked_N = reranked_N - self.reranked_M = reranked_M - self.voyage_model = voyage_model - - # Initialize Cohere client - api_key = voyage_api_key or os.getenv("VOYAGE_API_KEY") - if not api_key: - raise ValueError("VOYAGE_API_KEY must be provided or set as environment variable") - - # Need Async Client for async case here - self.vo = voyageai.Client(api_key) - - # Initialize Listwise Reranker - if self.reranked_M == 1: - self.listwise_reranker = dspy.Predict(IdentifyMostRelevantPassage) - else: - self.listwise_reranker = dspy.Predict(RelevanceRanker) - - def _rerank_with_voyage( - self, - query: str, - documents: list[str] - ) -> list[Any]: - """ - Rerank documents using Voyage's Cross Encoder. - - Args: - query: User query - documents: List of document texts to rerank - - Returns: - Reranked results from Voyage - """ - try: - response = self.vo.rerank( - query=query, - documents=documents, - model=self.voyage_model, - top_k=min(self.reranked_N, len(documents)) - ) - return response.results - except Exception as e: - if self.verbose: - print(f"\033[91mError during Voyage reranking: {e}\033[0m") - raise - - async def _async_rerank_with_voyage( - self, - query: str, - documents: list[str], - ): - pass - - def forward(self, question: str) -> DSPyAgentRAGResponse: - # first search with the original query - search_results, sources = weaviate_search_tool( - query=question, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - return_property_name=self.return_property_name, - retrieved_k=self.retrieved_k, - return_format="rerank" - ) - - if self.verbose: - print(f"\033[96mInitial retrieval: {len(search_results)} documents\033[0m") - - # Extract document content for reranking - documents = [] - for result in search_results: - doc_text = result.content if hasattr(result, 'content') else str(result) - documents.append(doc_text) - - # then apply the cross encoder reranker to truncate the results to N - reranked_results = self._rerank_with_voyage(question, documents) - - # Reorder sources based on Cohere's reranking - cross_encoder_sources = [] - for result in reranked_results: - if 0 <= result.index < len(sources): - cross_encoder_sources.append(sources[result.index]) - - if self.verbose: - print(f"\033[93mCross encoder reranking: {len(cross_encoder_sources)} documents\033[0m") - - if len(cross_encoder_sources) > self.reranked_M: - cross_encoder_search_results = [] - for i, source in enumerate(cross_encoder_sources): - if hasattr(source, 'content'): - content = source.content - elif hasattr(source, 'text'): - content = source.text - else: - content = str(source) - - search_result = SearchResult( - id=i, - initial_rank=i, - content=content - ) - cross_encoder_search_results.append(search_result) - - listwise_reranked_result = self.listwise_reranker( - query=question, - search_results=cross_encoder_search_results, - top_k=self.reranked_M - ) - - if self.reranked_M == 1: - final_sources = [cross_encoder_sources[listwise_reranked_result.most_relevant_passage]] - if self.verbose: - print(f"\033[92mListwise reranking: Returning {len(final_sources)} documents\033[0m") - return DSPyAgentRAGResponse( - final_answer="", - sources=final_sources, - searches=[question], - aggregations=None, - usage={}, - ) - - - ranked_indices = [] - if hasattr(listwise_reranked_result, 'reranked_ids'): - ranked_indices = listwise_reranked_result.reranked_ids - elif hasattr(listwise_reranked_result, 'prediction'): - prediction = listwise_reranked_result.prediction - if isinstance(prediction, str): - indices = re.findall(r'\d+', prediction) - ranked_indices = [int(i) for i in indices if int(i) < len(cross_encoder_sources)] - - - final_sources = [] - for idx in ranked_indices[:self.reranked_M]: - if 0 <= idx < len(cross_encoder_sources): - final_sources.append(cross_encoder_sources[idx]) - - if len(final_sources) < self.reranked_M: - for i, source in enumerate(cross_encoder_sources): - if i not in ranked_indices and len(final_sources) < self.reranked_M: - final_sources.append(source) - - if self.verbose: - print(f"\033[92mListwise reranking: Returning {len(final_sources)} documents\033[0m") - - else: - final_sources = cross_encoder_sources[:self.reranked_M] - - - return DSPyAgentRAGResponse( - final_answer="", - sources=final_sources, - searches=[question], - aggregations=None, - usage={}, - ) - - async def aforward(self, question: str) -> DSPyAgentRAGResponse: - pass - -async def main(): - rag_pipeline = LayeredReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=50, - reranked_N=20, - reranked_M=5, - voyage_model="rerank-2.5", - verbose=True - ) - print("Testing sync forward") - test_query = "Where will Governor Gray Davis host a party for the delegates, according to the article “Davis faces dire political consequences if power woes linger?" - response = rag_pipeline.forward(test_query) - print(response) - #print("Testing async forward") - #response = await rag_pipeline.aforward("What is the best way to learn Angular?") - #print(response) - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/looping_query_writer.py b/retrieve_dspy/retrievers/looping_query_writer.py deleted file mode 100644 index 23c7e5b..0000000 --- a/retrieve_dspy/retrievers/looping_query_writer.py +++ /dev/null @@ -1,207 +0,0 @@ -from typing import Optional -import asyncio -import os - -import dspy - -from retrieve_dspy.tools.weaviate_database import ( - weaviate_search_tool, - async_weaviate_search_tool -) - -from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse -from retrieve_dspy.signatures import WriteFollowUpQueries - -class LoopingQueryWriter(BaseRAG): - def __init__( - self, - collection_name: str, - target_property_name: Optional[str] = "content", - max_loops: Optional[int] = 1, - verbose: Optional[bool] = False, - search_only: Optional[bool] = True, - retrieved_k: Optional[int] = 20 - ): - super().__init__(collection_name, target_property_name, search_only=search_only, verbose=verbose, retrieved_k=retrieved_k) - self.max_loops = max_loops - self.looping_query_writer = dspy.Predict(WriteFollowUpQueries) - - def forward(self, question: str) -> DSPyAgentRAGResponse: - all_contexts = [] - all_sources = [] - all_searches = [question] - usage_buckets = [] - - # Initial search - contexts, sources = weaviate_search_tool( - query=question, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - ) - - all_contexts.extend(contexts) - all_sources.extend(sources) - - if self.verbose: - print(f"\033[96m Initial search returned {len(sources)} Sources!\033[0m") - - loop_count = 0 - while loop_count < self.max_loops: - contexts_str = "\n".join(all_contexts) - - follow_up_result = self.looping_query_writer( - question=question, - contexts=contexts_str, - ) - - usage_buckets.append(follow_up_result.get_lm_usage() or {}) - - if follow_up_result.follow_up_queries_needed and follow_up_result.follow_up_queries: - if self.verbose: - print(f"\033[94m Loop {loop_count + 1}: Generated {len(follow_up_result.follow_up_queries)} follow-up queries\033[0m") - - for follow_up_query in follow_up_result.follow_up_queries: - new_contexts, new_sources = weaviate_search_tool( - query=follow_up_query, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - ) - - all_contexts.extend(new_contexts) - all_sources.extend(new_sources) - all_searches.append(follow_up_query) - - if self.verbose: - print(f"\033[92m Follow-up query '{follow_up_query}' returned {len(new_sources)} sources\033[0m") - else: - if self.verbose: - print(f"\033[93m No follow-up queries needed, stopping at loop {loop_count + 1}\033[0m") - break - - loop_count += 1 - - # Remove duplicates while preserving order - unique_sources = [] - seen_ids = set() - for source in all_sources: - if hasattr(source, 'id') and source.id not in seen_ids: - unique_sources.append(source) - seen_ids.add(source.id) - elif not hasattr(source, 'id'): - unique_sources.append(source) - - if self.verbose: - print(f"\033[96m Total unique sources after {loop_count + 1} iterations: {len(unique_sources)}\033[0m") - - return DSPyAgentRAGResponse( - final_answer="", - sources=unique_sources, - searches=all_searches, - aggregations=None, - usage=self._merge_usage(*usage_buckets), - ) - - async def aforward(self, question: str) -> DSPyAgentRAGResponse: - all_contexts = [] - all_sources = [] - all_searches = [question] - usage_buckets = [] - - # Initial search - contexts, sources = await async_weaviate_search_tool( - query=question, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - ) - - all_contexts.extend(contexts) - all_sources.extend(sources) - - if self.verbose: - print(f"\033[96m Initial search returned {len(sources)} Sources!\033[0m") - - loop_count = 0 - while loop_count < self.max_loops: - contexts_str = "\n".join(all_contexts) - - follow_up_result = await self.looping_query_writer.acall( - question=question, - contexts=contexts_str, - ) - - usage_buckets.append(follow_up_result.get_lm_usage() or {}) - - if follow_up_result.follow_up_queries_needed and follow_up_result.follow_up_queries: - if self.verbose: - print(f"\033[94m Loop {loop_count + 1}: Generated {len(follow_up_result.follow_up_queries)} follow-up queries\033[0m") - - for follow_up_query in follow_up_result.follow_up_queries: - new_contexts, new_sources = await async_weaviate_search_tool( - query=follow_up_query, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - ) - - all_contexts.extend(new_contexts) - all_sources.extend(new_sources) - all_searches.append(follow_up_query) - - if self.verbose: - print(f"\033[92m Follow-up query '{follow_up_query}' returned {len(new_sources)} sources\033[0m") - else: - if self.verbose: - print(f"\033[93m No follow-up queries needed, stopping at loop {loop_count + 1}\033[0m") - break - - loop_count += 1 - - # Remove duplicates while preserving order - unique_sources = [] - seen_ids = set() - for source in all_sources: - if hasattr(source, 'id') and source.id not in seen_ids: - unique_sources.append(source) - seen_ids.add(source.id) - elif not hasattr(source, 'id'): - unique_sources.append(source) - - if self.verbose: - print(f"\033[96m Total unique sources after {loop_count + 1} iterations: {len(unique_sources)}\033[0m") - - return DSPyAgentRAGResponse( - final_answer="", - sources=unique_sources, - searches=all_searches, - aggregations=None, - usage=self._merge_usage(*usage_buckets), - ) - -async def main(): - openai_api_key = os.getenv("OPENAI_API_KEY") - if not openai_api_key: - raise ValueError("OPENAI_API_KEY environment variable is required") - - lm = dspy.LM("openai/gpt-4.1-mini", api_key=openai_api_key) - dspy.configure(lm=lm, track_usage=True) - print(f"DSPy configured with: {lm}") - - test_pipeline = LoopingQueryWriter( - collection_name="FreshstackLangchain", - target_property_name="docs_text", - retrieved_k=5, - max_loops=2, - verbose=True - ) - test_q = "How do I integrate Weaviate and Langchain?" - response = test_pipeline.forward(test_q) - print(response) - async_response = await test_pipeline.aforward(test_q) - print(async_response) - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/multi_hop/baleen.py b/retrieve_dspy/retrievers/multi_hop/baleen.py new file mode 100644 index 0000000..15fcd75 --- /dev/null +++ b/retrieve_dspy/retrievers/multi_hop/baleen.py @@ -0,0 +1,149 @@ +# ============================== +# WIP! +# ============================== + +import asyncio +import os +from typing import Optional, Dict, Literal + +import dspy +import voyageai +import weaviate + +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.database.weaviate_database import weaviate_search_tool +from retrieve_dspy.signatures import ( + WriteFollowUpQuery, + VerboseSummarizeSearchResults, + SummarizeSearchResults, +) +from retrieve_dspy.models import ObjectFromDB, RerankerClient, DSPyAgentRAGResponse + +from retrieve_dspy.retrievers.common.deduplicate import deduplicate_and_join +from retrieve_dspy.retrievers.common.call_ce_ranker import ( + RerankItem, + ce_rank, + reorder, +) + +RerankProvider = Literal["cohere", "voyage", "hybrid"] + +class SimplifiedBaleen(BaseRAG): + def __init__( + self, + weaviate_client: weaviate.WeaviateClient, + reranker_clients: list[RerankerClient], + collection_name: str, + target_property_name: str, + verbose: bool = False, + verbose_signature: bool = True, + search_only: bool = True, + retrieved_k: int = 5, + reranked_N: int = 20, + max_hops: int = 2, + reranker_provider: Optional[RerankProvider] = None, + cohere_model: Optional[str] = "rerank-v3.5", + voyage_model: Optional[str] = "rerank-2.5", + rrf_k: Optional[int] = 60, + hybrid_weights: Optional[Dict[str, float]] = None, + ): + super().__init__( + weaviate_client=weaviate_client, + collection_name=collection_name, + target_property_name=target_property_name, + verbose=verbose, + verbose_signature=verbose_signature, + search_only=search_only, + retrieved_k=retrieved_k, + ) + + self.reranker_clients = reranker_clients + self.max_hops = max_hops + self.reranked_N = reranked_N + self.reranker_provider = reranker_provider + self.cohere_model = cohere_model + self.voyage_model = voyage_model + self.rrf_k = rrf_k + self.hybrid_weights = hybrid_weights + self.verbose = verbose + if self.verbose_signature: + self.query_writer = dspy.ChainOfThought(WriteFollowUpQuery) + self.search_results_summarizer = dspy.ChainOfThought(VerboseSummarizeSearchResults) + else: + self.query_writer = dspy.Predict(WriteFollowUpQuery) + self.search_results_summarizer = dspy.Predict(SummarizeSearchResults) + + def forward(self, question: str) -> list[ObjectFromDB]: + results: list[ObjectFromDB] = [] + # init by searching with the original query + # summary_of_results_found_so_far = "" + + for hop in range(self.max_hops): + query_writer_pred = self.query_writer(question=question, results_found_so_far=results) + if query_writer_pred.follow_up_query_needed: + passages = weaviate_search_tool( + query=query_writer_pred.follow_up_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=self.weaviate_client + ) + results = deduplicate_and_join(results, passages) + ''' + summary_of_results_found_so_far = self.search_results_summarizer( + question=question, + search_results=results + ).summary_of_search_results + ''' + if self.verbose: + print(f"\033[92mHop {hop + 1}:\nQuery '{query_writer_pred.follow_up_query}'\nreturned {len(passages)} sources\033[0m") + + # Add Cross Encoder Ranking to the end (or RRF) + + documents = [s.content for s in results] + reranked_results: list[RerankItem] = ce_rank( + query=question, + documents=documents, + top_k=self.reranked_N, + clients=self.reranker_clients, + provider=self.reranker_provider, + cohere_model=self.cohere_model, + voyage_model=self.voyage_model, + rrf_k=self.rrf_k, + verbose=self.verbose, + ) + results: list[ObjectFromDB] = reorder(reranked_results, results) + if self.verbose: + print(f"\033[93mReranked: Returning {len(results)} documents\033[0m") + return DSPyAgentRAGResponse( + final_answer="", + sources=results, + searches=[question], + aggregations=None, + usage={}, + ) + +async def main(): + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")) + ) + voyage_client = voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY")) + retriever = SimplifiedBaleen( + weaviate_client=weaviate_client, + collection_name="EnronEmails", + target_property_name="email_body", + retrieved_k=5, + max_hops=2, + verbose=True, + verbose_signature=True, + reranker_clients=[RerankerClient(name="voyage", client=voyage_client)], + reranked_N=20, + reranker_provider="voyage", + voyage_model="rerank-2.5", + ) + results = retriever.forward(question="What are the implications of SBX12?") + print(results) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/multi_hop/simplified_baleen_with_cross_encoder.py b/retrieve_dspy/retrievers/multi_hop/simplified_baleen_with_cross_encoder.py new file mode 100644 index 0000000..33e2659 --- /dev/null +++ b/retrieve_dspy/retrievers/multi_hop/simplified_baleen_with_cross_encoder.py @@ -0,0 +1,142 @@ +import asyncio +import os +from typing import Optional, Dict, Literal + +import dspy +import voyageai +import weaviate + +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.database.weaviate_database import weaviate_search_tool +from retrieve_dspy.signatures import WriteFollowUpQuery, VerboseWriteFollowUpQuery +from retrieve_dspy.models import ObjectFromDB, RerankerClient, DSPyAgentRAGResponse + +from retrieve_dspy.retrievers.common.deduplicate import deduplicate_and_join +from retrieve_dspy.retrievers.common.call_ce_ranker import ( + RerankItem, + ce_rank, + reorder, +) + +RerankProvider = Literal["cohere", "voyage", "hybrid"] + +class SimplifiedBaleenWithCrossEncoder(BaseRAG): + def __init__( + self, + weaviate_client: weaviate.WeaviateClient, + reranker_clients: list[RerankerClient], + collection_name: str, + target_property_name: str, + verbose: bool = False, + verbose_signature: bool = True, + search_only: bool = True, + retrieved_k: int = 5, + reranked_N: int = 20, + max_hops: int = 2, + reranker_provider: Optional[RerankProvider] = None, + cohere_model: Optional[str] = "rerank-v3.5", + voyage_model: Optional[str] = "rerank-2.5", + rrf_k: Optional[int] = 60, + hybrid_weights: Optional[Dict[str, float]] = None, + ): + super().__init__( + weaviate_client=weaviate_client, + collection_name=collection_name, + target_property_name=target_property_name, + verbose=verbose, + verbose_signature=verbose_signature, + search_only=search_only, + retrieved_k=retrieved_k, + ) + + self.reranker_clients = reranker_clients + self.max_hops = max_hops + self.reranked_N = reranked_N + self.reranker_provider = reranker_provider + self.cohere_model = cohere_model + self.voyage_model = voyage_model + self.rrf_k = rrf_k + self.hybrid_weights = hybrid_weights + self.verbose = verbose + if self.verbose_signature: + self.query_writer = dspy.ChainOfThought(VerboseWriteFollowUpQuery) + else: + self.query_writer = dspy.Predict(WriteFollowUpQuery) + + def forward(self, question: str) -> list[ObjectFromDB]: + results: list[ObjectFromDB] = [] + + for hop in range(self.max_hops): + query_writer_pred = self.query_writer(question=question, results_found_so_far=results) + if query_writer_pred.follow_up_query_needed: + passages = weaviate_search_tool( + query=query_writer_pred.follow_up_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=self.weaviate_client + ) + results = deduplicate_and_join(results, passages) + if self.verbose: + print(f"\033[92mHop {hop + 1}:\nQuery '{query_writer_pred.follow_up_query}'\nreturned {len(passages)} sources\033[0m") + + # Add Cross Encoder Ranking to the end (or RRF) + + documents = [s.content for s in results] + + if len(documents) == 0: # actually fix later + print("For some reason, no documents were found. Returning empty results.") + return DSPyAgentRAGResponse( + final_answer="", + sources=[], + searches=[question], + aggregations=None, + usage={}, + ) + + reranked_results: list[RerankItem] = ce_rank( + query=question, + documents=documents, + top_k=self.reranked_N, + clients=self.reranker_clients, + provider=self.reranker_provider, + cohere_model=self.cohere_model, + voyage_model=self.voyage_model, + rrf_k=self.rrf_k, + verbose=self.verbose, + ) + results: list[ObjectFromDB] = reorder(reranked_results, results) + if self.verbose: + print(f"\033[93mReranked: Returning {len(results)} documents\033[0m") + return DSPyAgentRAGResponse( + final_answer="", + sources=results, + searches=[question], + aggregations=None, + usage={}, + ) + +async def main(): + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")) + ) + voyage_client = voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY")) + retriever = SimplifiedBaleenWithCrossEncoder( + weaviate_client=weaviate_client, + collection_name="EnronEmails", + target_property_name="email_body", + retrieved_k=5, + max_hops=2, + verbose=True, + verbose_signature=True, + reranker_clients=[RerankerClient(name="voyage", client=voyage_client)], + reranked_N=20, + reranker_provider="voyage", + voyage_model="rerank-2.5", + ) + results = retriever.forward(question="What are the implications of SBX12?") + print(results) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/query_writer_and_listwise_reranker.py b/retrieve_dspy/retrievers/query_writer_and_listwise_reranker.py deleted file mode 100644 index 66959bb..0000000 --- a/retrieve_dspy/retrievers/query_writer_and_listwise_reranker.py +++ /dev/null @@ -1,171 +0,0 @@ -import asyncio -from typing import Optional - -import dspy - -from retrieve_dspy.tools.weaviate_database import ( - weaviate_search_tool, - async_weaviate_search_tool -) - -from retrieve_dspy.retrievers.base_rag import BaseRAG - -from retrieve_dspy.models import DSPyAgentRAGResponse, Source -from retrieve_dspy.signatures import WriteSearchQueries, DiversityRanker - -class QueryWriterWithListwiseReranker(BaseRAG): - def __init__( - self, - collection_name: str, - target_property_name: str, - retrieved_k: Optional[int] = 10, - reranked_k: Optional[int] = 20, - search_with_queries_concatenated: Optional[bool] = False - ): - super().__init__( - collection_name=collection_name, - target_property_name=target_property_name, - retrieved_k=retrieved_k - ) - self.reranked_k = reranked_k - self.query_writer = dspy.Predict(WriteSearchQueries) - self.reranker = dspy.Predict(DiversityRanker) - - def forward(self, question: str) -> DSPyAgentRAGResponse: - qw_pred = self.query_writer(question=question) - queries: list[str] = qw_pred.search_queries - print(f"\033[95mWrote {len(queries)} queries!\033[0m") - - usage_buckets = [qw_pred.get_lm_usage() or {}] - - all_search_results = [] - all_sources: list[Source] = [] - for q in queries: - search_results, sources = weaviate_search_tool( - query=q, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - return_format="rerank" - ) - all_search_results.extend(search_results) - all_sources.extend(sources) - - print(f"\033[96mCollected {len(all_sources)} candidates from {len(queries)} queries\033[0m") - print(f"Number of search results -- {len(all_search_results)}") - - print(f"Testing if reranked_k is set -- {self.reranked_k}") - - rerank_pred = self.reranker( - query=question, - search_results=all_search_results, - top_k=self.reranked_k - ) - - # Reorder sources based on reranking - reranked_sources = [] - reranked_results = [] - for rank_id in rerank_pred.reranked_ids: - # Find the source corresponding to this rank_id - source_index = rank_id - 1 - if 0 <= source_index < len(all_sources): - reranked_sources.append(all_sources[source_index]) - reranked_results.append(all_search_results[source_index]) - - print(f"\033[96mReranked: Returning {len(reranked_sources)} Sources!\033[0m") - print("\nTop 5 reranked results:") - for i, result in enumerate(reranked_results[:5], 1): - print(f"New Rank {i} (was {result.initial_rank}).") - - usage_buckets.append(rerank_pred.get_lm_usage() or {}) - - return DSPyAgentRAGResponse( - final_answer="", - sources=reranked_sources, - searches=queries, - aggregations=None, - usage=self._merge_usage(*usage_buckets), - ) - - async def aforward(self, question: str) -> DSPyAgentRAGResponse: - qw_pred = await self.query_writer.acall(question=question) - queries: list[str] = qw_pred.search_queries - print(f"\033[95mWrote {len(queries)} queries!\033[0m") - usage_buckets = [qw_pred.get_lm_usage() or {}] - - tasks = [ - async_weaviate_search_tool( - query=q, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - return_format="rerank" - ) - for q in queries - ] - results = await asyncio.gather(*tasks) - all_search_results = [] - all_sources: list[Source] = [] - for search_results, sources in results: - all_search_results.extend(search_results) - all_sources.extend(sources) - - print(f"\033[96mCollected {len(all_sources)} candidates from {len(queries)} queries\033[0m") - - rerank_pred = await self.reranker.acall( - query=question, - search_results=all_search_results, - top_k=self.reranked_k - ) - - # Reorder sources based on reranking - reranked_sources = [] - reranked_results = [] - for rank_id in rerank_pred.reranked_ids: - # Find the source corresponding to this rank_id - source_index = rank_id - 1 - if 0 <= source_index < len(all_sources): - reranked_sources.append(all_sources[source_index]) - reranked_results.append(all_search_results[source_index]) - - print(f"\033[96mReranked: Returning {len(reranked_sources)} Sources!\033[0m") - print("\nTop 5 reranked results:") - for i, result in enumerate(reranked_results[:5], 1): - print(f"New Rank {i} (was {result.initial_rank}).") - - usage_buckets.append(rerank_pred.get_lm_usage() or {}) - - return DSPyAgentRAGResponse( - final_answer="", - sources=reranked_sources, - searches=queries, - aggregations=None, - usage=self._merge_usage(*usage_buckets), - ) - -async def main(): - import os - import dspy - - openai_api_key = os.getenv("OPENAI_API_KEY") - if not openai_api_key: - raise ValueError("OPENAI_API_KEY environment variable is required") - - lm = dspy.LM("openai/gpt-4.1-mini", api_key=openai_api_key) - dspy.configure(lm=lm, track_usage=True) - print(f"DSPy configured with: {lm}") - - test_pipeline = QueryWriterWithListwiseReranker( - collection_name="FreshstackLangchain", - target_property_name="docs_text", - retrieved_k=5 - ) - test_q = "How do I integrate Weaviate and Langchain?" - response = test_pipeline.forward(test_q) - print(response) - async_response = await test_pipeline.aforward(test_q) - print(async_response) - -if __name__ == "__main__": - import asyncio - asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/query_writers/LameR.py b/retrieve_dspy/retrievers/query_writers/LameR.py new file mode 100644 index 0000000..9704228 --- /dev/null +++ b/retrieve_dspy/retrievers/query_writers/LameR.py @@ -0,0 +1,142 @@ +import asyncio +import os +from typing import Optional + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool, + async_weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse +from retrieve_dspy.signatures import LameR, VerboseLameR + +class LameR_QueryExpander(BaseRAG): + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient | weaviate.WeaviateAsyncClient] = None, + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 20 + ): + super().__init__( + collection_name=collection_name, + target_property_name=target_property_name, + search_only=search_only, + verbose=verbose, + retrieved_k=retrieved_k + ) + self.weaviate_client = weaviate_client + if self.verbose: + self.LameR = dspy.Predict(VerboseLameR) + else: + self.LameR = dspy.Predict(LameR) + + def forward( + self, + question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None + ) -> DSPyAgentRAGResponse: + if weaviate_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateClient): + weaviate_client = self.weaviate_client + + initial_search_results = weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=3, + weaviate_client=weaviate_client, + ) + LameR_query = self.LameR( + question=question, + possible_answering_passages=initial_search_results + ).correct_answering_passage + + + if self.verbose: + print(f"\033[95mLameR query:\n{LameR_query}\033[0m") + + sources = weaviate_search_tool( + query=LameR_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=weaviate_client, + ) + return DSPyAgentRAGResponse( + final_answer="", + sources=sources, + searches=[LameR_query], + aggregations=None, + usage={}, + ) + + async def aforward( + self, + question: str, + weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None + ) -> DSPyAgentRAGResponse: + if weaviate_async_client is None: + if isinstance(self.weaviate_async_client, weaviate.WeaviateAsyncClient): + weaviate_async_client = self.weaviate_async_client + + initial_search_results = await async_weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=3, + weaviate_async_client=weaviate_async_client, + ) + LameR_response = await self.LameR.acall( + question=question, + possible_answering_passages=initial_search_results + ) + LameR_query = LameR_response.correct_answering_passage + + if self.verbose: + print(f"\033[95mLameR query:\n{LameR_query}\033[0m") + + sources = await async_weaviate_search_tool( + query=LameR_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_async_client=weaviate_async_client, + ) + return DSPyAgentRAGResponse( + final_answer="", + sources=sources, + searches=[LameR_query], + aggregations=None, + usage={}, + ) + +async def main(): + test_pipeline = LameR_QueryExpander( + collection_name="BrightBiology", + target_property_name="content", + verbose=True, + retrieved_k=5 + ) + test_q = "How many cells are in the human body?" + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + await weaviate_async_client.connect() + test_sync_response = test_pipeline.forward(test_q, weaviate_client=weaviate_client) + print(test_sync_response) + test_async_response = await test_pipeline.aforward(test_q, weaviate_async_client=weaviate_async_client) + print(test_async_response) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/query_writers/ThinkQE.py b/retrieve_dspy/retrievers/query_writers/ThinkQE.py new file mode 100644 index 0000000..d91d651 --- /dev/null +++ b/retrieve_dspy/retrievers/query_writers/ThinkQE.py @@ -0,0 +1,303 @@ +import asyncio +import os +from typing import Optional + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool, + async_weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse +from retrieve_dspy.signatures import ThinkQE, VerboseThinkQE + +class ThinkQE_QueryExpander(BaseRAG): + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient | weaviate.WeaviateAsyncClient] = None, + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 5, + num_rounds: Optional[int] = 3, + num_samples: Optional[int] = 2, + repetition_lambda: Optional[float] = 3.0, + ): + super().__init__( + collection_name=collection_name, + target_property_name=target_property_name, + search_only=search_only, + verbose=verbose, + retrieved_k=retrieved_k + ) + self.num_rounds = num_rounds + self.num_samples = num_samples + self.repetition_lambda = repetition_lambda + self.weaviate_client = weaviate_client + if self.verbose: + self.ThinkQE = dspy.ChainOfThought(VerboseThinkQE) + else: + self.ThinkQE = dspy.ChainOfThought(ThinkQE) + + # Blacklist for redundancy filtering + self.blacklist = set() + self.previous_top_k = set() + + def _calculate_query_repetitions(self, original_query: str, expansions: list[str]) -> int: + """Calculate how many times to repeat the original query based on expansion length.""" + expansion_length = sum(len(exp.split()) for exp in expansions) + query_length = len(original_query.split()) + + if query_length == 0: + return 1 + + n = int(expansion_length / (query_length * self.repetition_lambda)) + return max(1, n) + + def _filter_documents(self, search_results: list, previous_results: set) -> list: + """Filter out documents that appear in blacklist or previous top-K.""" + filtered = [] + for doc in search_results: + doc_id = getattr(doc, 'id', str(doc)) + if doc_id not in self.blacklist and doc_id not in previous_results: + filtered.append(doc) + return filtered + + def _update_blacklist(self, all_results: list, filtered_results: list): + """Add filtered-out documents to blacklist.""" + filtered_ids = {getattr(doc, 'id', str(doc)) for doc in filtered_results} + for doc in all_results: + doc_id = getattr(doc, 'id', str(doc)) + if doc_id not in filtered_ids: + self.blacklist.add(doc_id) + + def forward( + self, + question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None + ) -> DSPyAgentRAGResponse: + if weaviate_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateClient): + weaviate_client = self.weaviate_client + + # Reset state for new query + self.blacklist = set() + self.previous_top_k = set() + + # Initial retrieval + initial_results = weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=10, + weaviate_client=weaviate_client, + ) + + current_query = question + all_expansions = [] + all_searches = [question] + + # Iterative expansion rounds + for round_idx in range(self.num_rounds): + # Get documents for this round (initial or from previous query) + if round_idx == 0: + round_results = initial_results + else: + all_round_results = weaviate_search_tool( + query=current_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=10, # Retrieve more for filtering + weaviate_client=weaviate_client, + ) + + # Apply redundancy filtering + round_results = self._filter_documents( + all_round_results, + self.previous_top_k + )[:self.retrieved_k] + + # Update blacklist and previous top-K + self._update_blacklist(all_round_results, round_results) + self.previous_top_k = {getattr(doc, 'id', str(doc)) for doc in round_results} + + # Generate multiple expansion samples for diversity + round_expansions = [] + for _ in range(self.num_samples): + expansion_response = self.ThinkQE( + question=question, + possible_answering_passages=round_results + ) + expansion = expansion_response.correct_answering_passage + round_expansions.append(expansion) + + if self.verbose: + print(f"\033[95mRound {round_idx + 1} Expansion:\n{expansion}\033[0m\n") + + # Accumulate expansions + all_expansions.extend(round_expansions) + + # Update query by concatenating expansions + current_query = question + " " + " ".join(all_expansions) + all_searches.append(current_query) + + # Add query repetition to reinforce original intent + num_repetitions = self._calculate_query_repetitions(question, all_expansions) + final_query = " ".join([question] * num_repetitions) + " " + " ".join(all_expansions) + + if self.verbose: + print(f"\033[94mFinal query with {num_repetitions} repetitions\033[0m") + + # Final retrieval with accumulated query + final_sources = weaviate_search_tool( + query=final_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=weaviate_client, + ) + + return DSPyAgentRAGResponse( + final_answer="", + sources=final_sources, + searches=all_searches, + aggregations=None, + usage={}, + ) + + async def aforward( + self, + question: str, + weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None + ) -> DSPyAgentRAGResponse: + if weaviate_async_client is None: + if isinstance(self.weaviate_async_client, weaviate.WeaviateAsyncClient): + weaviate_async_client = self.weaviate_async_client + + # Reset state for new query + self.blacklist = set() + self.previous_top_k = set() + + # Initial retrieval + initial_results = await async_weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_async_client=weaviate_async_client, + ) + + current_query = question + all_expansions = [] + all_searches = [question] + + # Iterative expansion rounds + for round_idx in range(self.num_rounds): + # Get documents for this round + if round_idx == 0: + round_results = initial_results + else: + all_round_results = await async_weaviate_search_tool( + query=current_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k * 3, + weaviate_async_client=weaviate_async_client, + ) + + # Apply redundancy filtering + round_results = self._filter_documents( + all_round_results, + self.previous_top_k + )[:self.retrieved_k] + + # Update blacklist and previous top-K + self._update_blacklist(all_round_results, round_results) + self.previous_top_k = {getattr(doc, 'id', str(doc)) for doc in round_results} + + # Generate multiple expansion samples for diversity + round_expansions = [] + for _ in range(self.num_samples): + expansion_response = await self.ThinkQE.acall( + question=question, + possible_answering_passages=round_results, + ) + expansion = expansion_response.correct_answering_passage + round_expansions.append(expansion) + + if self.verbose: + print(f"\033[95mRound {round_idx + 1} Expansion:\n{expansion}\033[0m\n") + + # Accumulate expansions + all_expansions.extend(round_expansions) + + # Update query + current_query = question + " " + " ".join(all_expansions) + all_searches.append(current_query) + + # Add query repetition + num_repetitions = self._calculate_query_repetitions(question, all_expansions) + final_query = " ".join([question] * num_repetitions) + " " + " ".join(all_expansions) + + if self.verbose: + print(f"\033[94mFinal query with {num_repetitions} repetitions\033[0m") + + # Final retrieval + final_sources = await async_weaviate_search_tool( + query=final_query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_async_client=weaviate_async_client, + ) + + return DSPyAgentRAGResponse( + final_answer="", + sources=final_sources, + searches=all_searches, + aggregations=None, + usage={}, + ) + + +async def main(): + test_pipeline = ThinkQE_QueryExpander( + collection_name="BrightBiology", + target_property_name="content", + verbose=True, + retrieved_k=5, + num_rounds=3, + num_samples=2, + ) + + test_q = "How many cells are in the human body?" + + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + + await weaviate_async_client.connect() + + print("=== Testing Sync Forward ===") + test_sync_response = test_pipeline.forward(test_q, weaviate_client=weaviate_client) + print(test_sync_response) + + print("\n=== Testing Async Forward ===") + test_async_response = await test_pipeline.aforward(test_q, weaviate_async_client=weaviate_async_client) + print(test_async_response) + + weaviate_client.close() + await weaviate_async_client.close() + + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/decompose_and_expand.py b/retrieve_dspy/retrievers/query_writers/decompose_and_expand.py similarity index 98% rename from retrieve_dspy/retrievers/decompose_and_expand.py rename to retrieve_dspy/retrievers/query_writers/decompose_and_expand.py index fc18801..2b5d3a1 100644 --- a/retrieve_dspy/retrievers/decompose_and_expand.py +++ b/retrieve_dspy/retrievers/query_writers/decompose_and_expand.py @@ -2,7 +2,7 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) diff --git a/retrieve_dspy/retrievers/decompose_and_expand_with_hints.py b/retrieve_dspy/retrievers/query_writers/decompose_and_expand_with_hints.py similarity index 99% rename from retrieve_dspy/retrievers/decompose_and_expand_with_hints.py rename to retrieve_dspy/retrievers/query_writers/decompose_and_expand_with_hints.py index 4e0b12b..7527b39 100644 --- a/retrieve_dspy/retrievers/decompose_and_expand_with_hints.py +++ b/retrieve_dspy/retrievers/query_writers/decompose_and_expand_with_hints.py @@ -2,7 +2,7 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) diff --git a/retrieve_dspy/retrievers/filtered_query_writer.py b/retrieve_dspy/retrievers/query_writers/filtered_query_writer.py similarity index 95% rename from retrieve_dspy/retrievers/filtered_query_writer.py rename to retrieve_dspy/retrievers/query_writers/filtered_query_writer.py index 55b6324..6dd2399 100644 --- a/retrieve_dspy/retrievers/filtered_query_writer.py +++ b/retrieve_dspy/retrievers/query_writers/filtered_query_writer.py @@ -3,7 +3,7 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( get_tag_values, weaviate_search_tool, async_weaviate_search_tool @@ -11,7 +11,7 @@ from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, Source +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import SearchQueryWithFilter, WriteSearchQueriesWithFilters class FilteredQueryWriter(BaseRAG): @@ -37,7 +37,7 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [fqw_pred.get_lm_usage() or {}] - sources: list[Source] = [] + sources: list[ObjectFromDB] = [] for q in queries: _, src = weaviate_search_tool( query=q.search_query, @@ -88,7 +88,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: search_results = await asyncio.gather(*search_tasks) - sources: list[Source] = [] + sources: list[ObjectFromDB] = [] for _, src in search_results: sources.extend(src) diff --git a/retrieve_dspy/retrievers/query_writers/hyde.py b/retrieve_dspy/retrievers/query_writers/hyde.py new file mode 100644 index 0000000..10e391b --- /dev/null +++ b/retrieve_dspy/retrievers/query_writers/hyde.py @@ -0,0 +1,116 @@ +import asyncio +import os +from typing import Optional + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool, + async_weaviate_search_tool +) + +from retrieve_dspy.retrievers.base_rag import BaseRAG + +from retrieve_dspy.models import DSPyAgentRAGResponse +from retrieve_dspy.signatures import HyDE, VerboseHyDE + +class HyDE_QueryExpander(BaseRAG): + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient | weaviate.WeaviateAsyncClient] = None, + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 20 + ): + super().__init__( + collection_name=collection_name, + target_property_name=target_property_name, + search_only=search_only, + verbose=verbose, + retrieved_k=retrieved_k + ) + self.weaviate_client = weaviate_client + if self.verbose: + self.hyde = dspy.Predict(VerboseHyDE) + else: + self.hyde = dspy.Predict(HyDE) + + def forward(self, question: str, weaviate_client: Optional[weaviate.WeaviateClient] = None) -> DSPyAgentRAGResponse: + if weaviate_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateClient): + weaviate_client = self.weaviate_client + + hypothetical_passage = self.hyde(question=question).passage + + if self.verbose: + print(f"\033[95mHypothetical passage:\n{hypothetical_passage}\033[0m") + + sources = weaviate_search_tool( + query=hypothetical_passage, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + weaviate_client=weaviate_client, + retrieved_k=self.retrieved_k, + ) + + return DSPyAgentRAGResponse( + final_answer="", + sources=sources, + searches=[hypothetical_passage], + aggregations=None, + usage={}, + ) + + async def aforward(self, question: str, weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None) -> DSPyAgentRAGResponse: + if weaviate_async_client is None: + if isinstance(self.weaviate_async_client, weaviate.WeaviateAsyncClient): + weaviate_async_client = self.weaviate_async_client + + hypothetical_passage_pred = await self.hyde.acall(question=question) + hypothetical_passage = hypothetical_passage_pred.passage + + if self.verbose: + print(f"\033[95mHypothetical passage:\n{hypothetical_passage}\033[0m") + + sources = await async_weaviate_search_tool( + query=hypothetical_passage, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + weaviate_async_client=weaviate_async_client, + retrieved_k=self.retrieved_k, + ) + + return DSPyAgentRAGResponse( + final_answer="", + sources=sources, + searches=[hypothetical_passage], + aggregations=None, + usage={}, + ) + +async def main(): + test_pipeline = HyDE_QueryExpander( + collection_name="BrightBiology", + target_property_name="content", + retrieved_k=5 + ) + test_q = "How many cells are in the human body?" + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + await weaviate_async_client.connect() + test_sync_response = test_pipeline.forward(test_q, weaviate_client=weaviate_client) + print(test_sync_response) + test_async_response = await test_pipeline.aforward(test_q, weaviate_async_client=weaviate_async_client) + print(test_async_response) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/multi_query_writer.py b/retrieve_dspy/retrievers/query_writers/multi_query_writer.py similarity index 93% rename from retrieve_dspy/retrievers/multi_query_writer.py rename to retrieve_dspy/retrievers/query_writers/multi_query_writer.py index f3d70f5..e3d5aab 100644 --- a/retrieve_dspy/retrievers/multi_query_writer.py +++ b/retrieve_dspy/retrievers/query_writers/multi_query_writer.py @@ -4,14 +4,14 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, Source +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import WriteSearchQueries class MultiQueryWriter(BaseRAG): @@ -49,11 +49,11 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [qw_pred.get_lm_usage() or {}] - sources: list[Source] = [] + sources: list[ObjectFromDB] = [] if self.search_with_queries_concatenated: concatenated_query = " ".join(queries) - _, src = weaviate_search_tool( + src = weaviate_search_tool( query=concatenated_query, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -63,7 +63,7 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: else: for q in queries: - _, src = weaviate_search_tool( + src = weaviate_search_tool( query=q, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -99,11 +99,11 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [qw_pred.get_lm_usage() or {}] - sources: list[Source] = [] + sources: list[ObjectFromDB] = [] if self.search_with_queries_concatenated: concatenated_query = " ".join(queries) - _, src = await async_weaviate_search_tool( + src = await async_weaviate_search_tool( query=concatenated_query, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -125,7 +125,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: search_results = await asyncio.gather(*search_tasks) - for _, src in search_results: + for src in search_results: sources.extend(src) if self.verbose: diff --git a/retrieve_dspy/retrievers/multi_query_writer_with_cluster_ranking.py b/retrieve_dspy/retrievers/query_writers/multi_query_writer_with_cluster_ranking.py similarity index 97% rename from retrieve_dspy/retrievers/multi_query_writer_with_cluster_ranking.py rename to retrieve_dspy/retrievers/query_writers/multi_query_writer_with_cluster_ranking.py index 9ba3a27..906eb49 100644 --- a/retrieve_dspy/retrievers/multi_query_writer_with_cluster_ranking.py +++ b/retrieve_dspy/retrievers/query_writers/multi_query_writer_with_cluster_ranking.py @@ -11,7 +11,7 @@ from sklearn.manifold import TSNE import matplotlib.pyplot as plt -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) @@ -183,13 +183,12 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: else: deduplicated_sources: dict[str, SourceWithContentAndVector] = {} for q in queries: - retrieved_docs, _ = weaviate_search_tool( + retrieved_docs = weaviate_search_tool( query=q, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, return_vector=True, - return_format="vectors" ) for doc in retrieved_docs: deduplicated_sources[doc.object_id] = doc @@ -237,14 +236,13 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, return_vector=True, - return_format="vectors" ) for q in queries ] search_results_tuples = await asyncio.gather(*search_tasks) deduplicated_sources: dict[str, SourceWithContentAndVector] = {} - for retrieved_docs, _ in search_results_tuples: + for retrieved_docs in search_results_tuples: for doc in retrieved_docs: deduplicated_sources[doc.object_id] = doc sources_with_vectors = list(deduplicated_sources.values()) diff --git a/retrieve_dspy/retrievers/multi_query_writer_with_hint.py b/retrieve_dspy/retrievers/query_writers/multi_query_writer_with_hint.py similarity index 96% rename from retrieve_dspy/retrievers/multi_query_writer_with_hint.py rename to retrieve_dspy/retrievers/query_writers/multi_query_writer_with_hint.py index c9b7d2b..a4a064d 100644 --- a/retrieve_dspy/retrievers/multi_query_writer_with_hint.py +++ b/retrieve_dspy/retrievers/query_writers/multi_query_writer_with_hint.py @@ -4,14 +4,14 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, Source +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import DecomposeQueryWithHint class MultiQueryWriterWithHint(BaseRAG): @@ -55,7 +55,7 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [qw_pred.get_lm_usage() or {}] - sources: list[Source] = [] + sources: list[ObjectFromDB] = [] if self.search_with_queries_concatenated: concatenated_query = " ".join(queries) @@ -111,7 +111,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [qw_pred.get_lm_usage() or {}] - sources: list[Source] = [] + sources: list[ObjectFromDB] = [] if self.search_with_queries_concatenated: concatenated_query = " ".join(queries) diff --git a/retrieve_dspy/retrievers/multi_query_writer_with_reranker.py b/retrieve_dspy/retrievers/query_writers/multi_query_writer_with_reranker.py similarity index 89% rename from retrieve_dspy/retrievers/multi_query_writer_with_reranker.py rename to retrieve_dspy/retrievers/query_writers/multi_query_writer_with_reranker.py index 45bd423..78161c3 100644 --- a/retrieve_dspy/retrievers/multi_query_writer_with_reranker.py +++ b/retrieve_dspy/retrievers/query_writers/multi_query_writer_with_reranker.py @@ -6,13 +6,13 @@ import dspy from cohere import RerankResponseResultsItem -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, Source +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import WriteSearchQueries @@ -66,7 +66,7 @@ def __init__( self.co = cohere.ClientV2(api_key) - def _deduplicate_sources(self, sources: List[Source]) -> list[Source]: + def _deduplicate_sources(self, sources: List[ObjectFromDB]) -> list[ObjectFromDB]: """ Remove duplicate sources based on object_id and return unique content. @@ -74,7 +74,7 @@ def _deduplicate_sources(self, sources: List[Source]) -> list[Source]: Tuple of (unique sources, corresponding document texts) """ seen_ids: Set[str] = set() - unique_sources: List[Source] = [] + unique_sources: List[ObjectFromDB] = [] for source in sources: if source.object_id not in seen_ids: @@ -135,22 +135,21 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [qw_pred.get_lm_usage() or {}] if self.two_stage_reranking and not self.search_with_queries_concatenated: - all_sources: list[Source] = [] + all_sources: list[ObjectFromDB] = [] all_documents: list[str] = [] for i, query in enumerate(queries, 1): - search_results, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=query, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) if self.verbose: print(f"\n\033[96mQuery {i} retrieved {len(sources)} documents\033[0m") - query_documents = [result.content for result in search_results] + query_documents = [s.content for s in sources] if len(query_documents) > 0: reranked_for_query = self._rerank_with_cohere(query, query_documents) @@ -187,31 +186,29 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: else: # Original single-stage approach - all_sources: list[Source] = [] + all_sources: list[ObjectFromDB] = [] all_search_results = [] if self.search_with_queries_concatenated: concatenated_query = " ".join(queries) - search_results, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=concatenated_query, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) all_sources.extend(sources) - all_search_results.extend(search_results) + all_search_results.extend([s.content for s in sources]) else: for q in queries: - search_results, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=q, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) all_sources.extend(sources) - all_search_results.extend(search_results) + all_search_results.extend([s.content for s in sources]) if self.verbose: print(f"\033[96mRetrieved {len(all_sources)} total documents " @@ -222,13 +219,11 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: # Extract content seen_ids = set() - unique_search_results = [] - for result, source in zip(all_search_results, all_sources): + unique_documents = [] + for source in all_sources: if source.object_id not in seen_ids: seen_ids.add(source.object_id) - unique_search_results.append(result) - - unique_documents = [result.content for result in unique_search_results] + unique_documents.append(source.content) if self.verbose: print(f"\n\033[93mReranking {len(unique_documents)} unique documents...\033[0m") @@ -278,7 +273,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: usage_buckets = [qw_pred.get_lm_usage() or {}] if self.two_stage_reranking and not self.search_with_queries_concatenated: - all_sources: list[Source] = [] + all_sources: list[ObjectFromDB] = [] all_documents: list[str] = [] # Execute all searches concurrently @@ -288,18 +283,17 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) for q in queries ] search_results_list = await asyncio.gather(*search_tasks) - for i, (query, (search_results, sources)) in enumerate(zip(queries, search_results_list), 1): + for i, (query, sources) in enumerate(zip(queries, search_results_list), 1): if self.verbose: print(f"\n\033[96mQuery {i} retrieved {len(sources)} documents\033[0m") - query_documents = [result.content for result in search_results] + query_documents = [s.content for s in sources] if len(query_documents) > 0: reranked_for_query = await self._async_rerank_with_cohere(query, query_documents) @@ -336,20 +330,19 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: else: # Original single-stage approach - all_sources: list[Source] = [] + all_sources: list[ObjectFromDB] = [] all_search_results = [] if self.search_with_queries_concatenated: concatenated_query = " ".join(queries) - search_results, sources = await async_weaviate_search_tool( + sources = await async_weaviate_search_tool( query=concatenated_query, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) all_sources.extend(sources) - all_search_results.extend(search_results) + all_search_results.extend([s.content for s in sources]) else: search_tasks = [ async_weaviate_search_tool( @@ -357,31 +350,28 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) for q in queries ] results = await asyncio.gather(*search_tasks) - for search_results, sources in results: + for sources in results: all_sources.extend(sources) - all_search_results.extend(search_results) + all_search_results.extend([s.content for s in sources]) if self.verbose: print(f"\033[96mRetrieved {len(all_sources)} total documents " f"({len(set(s.object_id for s in all_sources))} unique)\033[0m") - unique_sources, _ = self._deduplicate_sources(all_sources) + unique_sources = self._deduplicate_sources(all_sources) seen_ids = set() - unique_search_results = [] - for result, source in zip(all_search_results, all_sources): + unique_documents = [] + for source in all_sources: if source.object_id not in seen_ids: seen_ids.add(source.object_id) - unique_search_results.append(result) - - unique_documents = [result.content for result in unique_search_results] + unique_documents.append(source.content) if self.verbose: print(f"\n\033[93mReranking {len(unique_documents)} unique documents...\033[0m") diff --git a/retrieve_dspy/retrievers/query_expander.py b/retrieve_dspy/retrievers/query_writers/query_expander.py similarity index 94% rename from retrieve_dspy/retrievers/query_expander.py rename to retrieve_dspy/retrievers/query_writers/query_expander.py index 7ea7670..a8c7a3a 100644 --- a/retrieve_dspy/retrievers/query_expander.py +++ b/retrieve_dspy/retrievers/query_writers/query_expander.py @@ -3,10 +3,10 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool -) +) from retrieve_dspy.retrievers.base_rag import BaseRAG from retrieve_dspy.models import DSPyAgentRAGResponse from retrieve_dspy.signatures import ExpandQuery @@ -29,7 +29,7 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: if self.verbose: print(f"\033[95mExpanded query from:\n{question}\nto:\n{expanded_query}\033[0m") - contexts, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=expanded_query, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -54,7 +54,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: if self.verbose: print(f"\033[95mExpanded query from:\n{question}\nto:\n{expanded_query}\033[0m") - contexts, sources = await async_weaviate_search_tool( + sources = await async_weaviate_search_tool( query=expanded_query, collection_name=self.collection_name, target_property_name=self.target_property_name, diff --git a/retrieve_dspy/retrievers/query_expander_with_hint.py b/retrieve_dspy/retrievers/query_writers/query_expander_with_hint.py similarity index 91% rename from retrieve_dspy/retrievers/query_expander_with_hint.py rename to retrieve_dspy/retrievers/query_writers/query_expander_with_hint.py index 7c4d510..8a58015 100644 --- a/retrieve_dspy/retrievers/query_expander_with_hint.py +++ b/retrieve_dspy/retrievers/query_writers/query_expander_with_hint.py @@ -3,7 +3,7 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) @@ -24,7 +24,7 @@ def __init__( self.expand_query = dspy.Predict(ExpandQueryWithHint) def forward(self, question: str) -> DSPyAgentRAGResponse: - initial_search_results, _ = weaviate_search_tool( + initial_search_results = weaviate_search_tool( query=question, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -35,7 +35,7 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: if self.verbose: print(f"\033[95mExpanded query from:\n{question}\nto:\n{expanded_query}\033[0m") - contexts, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=expanded_query, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -54,7 +54,7 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: ) async def aforward(self, question: str) -> DSPyAgentRAGResponse: - initial_search_results, _ = await async_weaviate_search_tool( + initial_search_results = await async_weaviate_search_tool( query=question, collection_name=self.collection_name, target_property_name=self.target_property_name, @@ -66,7 +66,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: if self.verbose: print(f"\033[95mExpanded query from:\n{question}\nto:\n{expanded_query}\033[0m") - contexts, sources = await async_weaviate_search_tool( + sources = await async_weaviate_search_tool( query=expanded_query, collection_name=self.collection_name, target_property_name=self.target_property_name, diff --git a/retrieve_dspy/retrievers/query_expander_with_reranker.py b/retrieve_dspy/retrievers/query_writers/query_expander_with_reranker.py similarity index 90% rename from retrieve_dspy/retrievers/query_expander_with_reranker.py rename to retrieve_dspy/retrievers/query_writers/query_expander_with_reranker.py index 8763318..4234d1a 100644 --- a/retrieve_dspy/retrievers/query_expander_with_reranker.py +++ b/retrieve_dspy/retrievers/query_writers/query_expander_with_reranker.py @@ -6,11 +6,11 @@ import cohere from cohere import RerankResponseResultsItem -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) -from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.retrievers.base_rag import BaseRAG from retrieve_dspy.models import DSPyAgentRAGResponse from retrieve_dspy.signatures import ExpandQuery @@ -82,21 +82,17 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: if self.verbose: print(f"\033[95mExpanded query from:\n'{question}'\nto:\n'{expanded_query}'\033[0m") - search_results, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=expanded_query, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) if self.verbose: - print(f"\033[96mInitial retrieval: {len(search_results)} documents\033[0m") + print(f"\033[96mInitial retrieval: {len(sources)} documents\033[0m") - documents = [] - for result in search_results: - doc_text = result.content if hasattr(result, 'content') else str(result) - documents.append(doc_text) + documents = [s.content for s in sources] if self.verbose: print(f"\n\033[93mPreparing {len(documents)} documents for reranking...\033[0m") @@ -144,21 +140,17 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: if self.verbose: print(f"\033[95mExpanded query from:\n'{question}'\nto:\n'{expanded_query}'\033[0m") - search_results, sources = await async_weaviate_search_tool( + sources = await async_weaviate_search_tool( query=expanded_query, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) if self.verbose: print(f"\033[96mInitial retrieval: {len(sources)} documents\033[0m") - documents = [] - for result in search_results: - doc_text = result.content if hasattr(result, 'content') else str(result) - documents.append(doc_text) + documents = [s.content for s in sources] reranked_results = await self._async_rerank_with_cohere(question, documents) diff --git a/retrieve_dspy/retrievers/query_writers/rag_fusion.py b/retrieve_dspy/retrievers/query_writers/rag_fusion.py new file mode 100644 index 0000000..876fa22 --- /dev/null +++ b/retrieve_dspy/retrievers/query_writers/rag_fusion.py @@ -0,0 +1,129 @@ +from typing import Optional, List + +import dspy +import weaviate + +from retrieve_dspy.retrievers.common.rrf import reciprocal_rank_fusion +from retrieve_dspy.database.weaviate_database import weaviate_search_tool +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB +from retrieve_dspy.signatures import WriteSearchQueries, VerboseWriteSearchQueries + +class RAGFusion(BaseRAG): + def __init__( + self, + weaviate_client: weaviate.WeaviateClient, + collection_name: str, + target_property_name: str, + retrieved_k: int = 20, + reranked_k: int = 200, + rrf_k: int = 60, # RRF constant + verbose: Optional[bool] = False, + verbose_signature: Optional[bool] = True + ): + super().__init__(weaviate_client, collection_name, target_property_name, verbose) + self.retrieved_k = retrieved_k + self.reranked_k = reranked_k + self.rrf_k = rrf_k + + if verbose_signature: + self.decompose_query = dspy.Predict(VerboseWriteSearchQueries) + else: + self.decompose_query = dspy.Predict(WriteSearchQueries) + + def forward(self, question: str, weaviate_client: Optional[weaviate.WeaviateClient] = None) -> DSPyAgentRAGResponse: + # Generate query variations + if weaviate_client is None: + weaviate_client = self.weaviate_client + + search_queries_response = self.decompose_query(question=question) + search_queries = search_queries_response.search_queries + + # Add original query if not already included + if question not in search_queries: + search_queries = [question] + search_queries + + if self.verbose: + print(f"Search queries: {search_queries}") + + # Retrieve results for each query + result_sets: List[List[ObjectFromDB]] = [] + for query in search_queries: + results = weaviate_search_tool( + weaviate_client=weaviate_client, + query=query, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + ) + + # Tag results with source query for debugging + for obj in results: + obj.source_query = query + + result_sets.append(results) + + # Apply RRF to combine results + fused_results = reciprocal_rank_fusion( + result_sets=result_sets, + k=self.rrf_k, + top_k=self.reranked_k + ) + + if self.verbose: + print(f"Fused {len(fused_results)} unique documents from {sum(len(rs) for rs in result_sets)} total") + + # Use your existing answer generation logic here + # For now, returning structured response + return DSPyAgentRAGResponse( + final_answer="", # You'll need to add answer generation + sources=fused_results, + searches=search_queries, + aggregations=None, + usage={}, + ) + + +if __name__ == "__main__": + import os + + # Initialize Weaviate client + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")) + ) + + # Initialize RAG Fusion + rag_fusion = RAGFusion( + collection_name="EnronEmails", + target_property_name="email_body", + verbose=True, + verbose_signature=True, + retrieved_k=20, + reranked_k=20, + rrf_k=60 + ) + + # Test query + test_question = "What are the implications of SBX12?" + + print(f"Testing RAG Fusion with question: {test_question}") + + try: + response = rag_fusion.forward(test_question) + + print(f"\nGenerated {len(response.searches)} search queries:") + for i, query in enumerate(response.searches, 1): + print(f" {i}. {query}") + + print(f"\nFound {len(response.sources)} sources after fusion:") + for i, source in enumerate(response.sources[:3], 1): # Show first 3 + print(f" {i}. (Score: {source.relevance_score:.4f}) {source.content[:100]}...") + if source.source_query: + print(f" Source query: {source.source_query}") + + except Exception as e: + print(f"Error during testing: {e}") + + finally: + weaviate_client.close() \ No newline at end of file diff --git a/retrieve_dspy/retrievers/rerankers/cross_encoder_reranker.py b/retrieve_dspy/retrievers/rerankers/cross_encoder_reranker.py new file mode 100644 index 0000000..5406e63 --- /dev/null +++ b/retrieve_dspy/retrievers/rerankers/cross_encoder_reranker.py @@ -0,0 +1,245 @@ +from __future__ import annotations + +from typing import Optional, List, Dict + +import weaviate + +from retrieve_dspy.database.weaviate_database import weaviate_search_tool, async_weaviate_search_tool +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB, RerankerClient +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.retrievers.common.call_ce_ranker import ( + RerankItem, + ce_rank, + async_ce_rank, + reorder, +) + + +class CrossEncoderReranker(BaseRAG): + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + return_property_name: Optional[str] = None, + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 50, + reranked_k: Optional[int] = 20, + model_name_overrides: Optional[Dict[str, str]] = None, + rrf_k: Optional[int] = 60, # Used for Mixture of Cross Encoders + hybrid_weights: Optional[Dict[str, float]] = None, + ): + """ + Initialize CrossEncoderReranker. + + Args: + model_name_overrides: Optional dict mapping provider names to model names. + Example: {"cohere": "rerank-v4.0", "voyage": "rerank-3.0"} + If not provided, defaults from call_ce_ranker will be used. + """ + super().__init__( + weaviate_client=weaviate_client, + collection_name=collection_name, + target_property_name=target_property_name, + verbose=verbose, + search_only=search_only, + retrieved_k=retrieved_k, + ) + self.return_property_name = return_property_name + self.reranker_clients = reranker_clients + self.reranked_k = int(reranked_k or 20) + self.model_name_overrides = model_name_overrides or {} + self.rrf_k = int(rrf_k or 60) + self.hybrid_weights = hybrid_weights + self.verbose = bool(verbose) + + def forward( + self, + question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + if weaviate_client is None: + weaviate_client = self.weaviate_client + + if reranker_clients is None: + reranker_clients = self.reranker_clients + + sources = weaviate_search_tool( + weaviate_client=weaviate_client, + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + return_property_name=self.return_property_name, + retrieved_k=self.retrieved_k, + ) + + if self.verbose: + print(f"\033[96mInitial retrieval: {len(sources)} documents\033[0m") + print(f"Query: '{question}'") + + docs: List[str] = [s.content for s in sources] + + if not reranker_clients: + if self.verbose: + print("\033[93mNo reranker_clients provided; returning retrieved order\033[0m") + return DSPyAgentRAGResponse( + final_answer="", + sources=sources[: self.reranked_k], + searches=[question], + aggregations=None, + usage={}, + ) + + items: List[RerankItem] = ce_rank( + query=question, + documents=docs, + top_k=self.reranked_k, + clients=reranker_clients, + model_name_overrides=self.model_name_overrides, + rrf_k=self.rrf_k, + hybrid_weights=self.hybrid_weights, + verbose=self.verbose, + ) + + reranked: List[ObjectFromDB] = reorder(items, sources) + if self.verbose: + print(f"\n\033[96mReranked: Returning {len(reranked)} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked, + searches=[question], + aggregations=None, + usage={}, + ) + + async def aforward( + self, + question: str, + weaviate_async_client: Optional[weaviate.AsyncWeaviateClient] = None, + reranker_clients: Optional[List[RerankerClient]] = None, + ) -> DSPyAgentRAGResponse: + sources = await async_weaviate_search_tool( + weaviate_async_client=weaviate_async_client, + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + return_property_name=self.return_property_name, + retrieved_k=self.retrieved_k, + ) + + if self.verbose: + print(f"\033[96mInitial retrieval: {len(sources)} documents\033[0m") + print(f"Query: '{question}' (async)") + + docs: List[str] = [s.content for s in sources] + + if not reranker_clients: + if self.verbose: + print("\033[93mNo reranker_clients provided; returning retrieved order (async)\033[0m") + return DSPyAgentRAGResponse( + final_answer="", + sources=sources[: self.reranked_k], + searches=[question], + aggregations=None, + usage={}, + ) + + items = await async_ce_rank( + query=question, + documents=docs, + top_k=self.reranked_k, + clients=reranker_clients, + model_name_overrides=self.model_name_overrides, + rrf_k=self.rrf_k, + hybrid_weights=self.hybrid_weights, + verbose=self.verbose, + ) + + reranked: List[ObjectFromDB] = reorder(items, sources) + if self.verbose: + print(f"\n\033[96mReranked: Returning {len(reranked)} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked, + searches=[question], + aggregations=None, + usage={}, + ) + + +async def main(): + import os + import cohere + import weaviate + + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + cohere_client = cohere.ClientV2(api_key=os.getenv("COHERE_API_KEY")) + # voyage_client = voyageai.Client(api_key=os.getenv("VOYAGE_API_KEY")) + + # Example with default models + cross_encoder_reranker = CrossEncoderReranker( + collection_name="EnronEmails", + target_property_name="email_body", + verbose=True, + search_only=True, + retrieved_k=50, + reranked_k=20, + ) + + # Example with custom model overrides + ''' + cross_encoder_with_overrides = CrossEncoderReranker( + collection_name="EnronEmails", + target_property_name="email_body", + verbose=True, + search_only=True, + retrieved_k=50, + reranked_k=20, + model_name_overrides={ + "cohere": "rerank-v3.5", + "voyage": "rerank-2.5" + } + ) + ''' + + # Test forward() method + print("Testing forward() method:") + response = cross_encoder_reranker.forward( + question="What are the implications of SBX12?", + weaviate_client=weaviate_client, + reranker_clients=[RerankerClient(name="cohere", client=cohere_client)], + ) + print(f"\033[92mSync successfully returned: {len(response.sources)} documents\033[0m") + + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + await weaviate_async_client.connect() + cohere_async_client = cohere.AsyncClientV2(api_key=os.getenv("COHERE_API_KEY")) + # voyage_async_client = voyageai.AsyncClient(api_key=os.getenv("VOYAGE_API_KEY")) + + # Test aforward() method + print("\nTesting aforward() method:") + async_response = await cross_encoder_reranker.aforward( + question="What are the implications of SBX12?", + weaviate_async_client=weaviate_async_client, + reranker_clients=[ + RerankerClient(name="cohere", client=cohere_async_client), + # RerankerClient(name="voyage", client=voyage_async_client) + ], + ) + print(f"\033[92mAsync successfully returned: {len(async_response.sources)} documents\033[0m") + + +if __name__ == "__main__": + import asyncio + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/rerankers/layered_best_match_reranker.py b/retrieve_dspy/retrievers/rerankers/layered_best_match_reranker.py new file mode 100644 index 0000000..de92ed1 --- /dev/null +++ b/retrieve_dspy/retrievers/rerankers/layered_best_match_reranker.py @@ -0,0 +1,213 @@ +import asyncio +from typing import Optional, List, Literal + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB, RerankerClient, MultiLMConfig +from retrieve_dspy.signatures import ( + VerboseBestMatchRanker, + BestMatchRanker, + VerboseSummarizeSearchRelevance, + SummarizeSearchRelevance, +) +from retrieve_dspy.retrievers.common.call_ce_ranker import ( + RerankItem, + ce_rank, + reorder, +) + +RerankProvider = Literal["voyage", "hybrid"] + +class LayeredBestMatchReranker(BaseRAG): + def __init__( + self, + weaviate_client: weaviate.WeaviateClient, + reranker_clients: List[RerankerClient], + collection_name: str, + target_property_name: str, + return_property_name: str, + verbose: bool = False, + verbose_signature: bool = True, + search_only: bool = True, + retrieved_k: int = 50, + reranked_N: int = 20, + reranked_M: int = 5, + reranker_provider: Optional[RerankProvider] = None, + cohere_model: Optional[str] = "rerank-v3.5", + voyage_model: str = "rerank-2.5", + multi_lm_configs: Optional[List[MultiLMConfig]] = None, + ): + super().__init__( + weaviate_client=weaviate_client, + collection_name=collection_name, + target_property_name=target_property_name, + verbose=verbose, + search_only=search_only, + retrieved_k=retrieved_k, + verbose_signature=verbose_signature, + multi_lm_configs=multi_lm_configs, + ) + self.return_property_name = return_property_name + self.reranker_clients = reranker_clients + self.reranked_N = reranked_N + self.reranked_M = reranked_M + self.voyage_model = voyage_model + self.reranker_provider = reranker_provider + self.cohere_model = cohere_model + # Initialize Listwise Reranker + if self.verbose_signature: + self.listwise_reranker = dspy.ChainOfThought(VerboseBestMatchRanker) + else: + self.listwise_reranker = dspy.Predict(BestMatchRanker) + + if self.verbose_signature: + self.summarizer = dspy.ChainOfThought(VerboseSummarizeSearchRelevance) + else: + self.summarizer = dspy.Predict(SummarizeSearchRelevance) + + def forward(self, question: str) -> DSPyAgentRAGResponse: + # first search with the original query + sources = weaviate_search_tool( + weaviate_client=self.weaviate_client, + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + return_property_name=self.return_property_name, + retrieved_k=self.retrieved_k, + ) + + if self.verbose: + print(f"\033[96mInitial retrieval: {len(sources)} documents\033[0m") + + # Extract document content for reranking + documents = [s.content for s in sources] + + # then apply the cross encoder reranker to truncate the results to N + reranked_results: List[RerankItem] = ce_rank( + query=question, + documents=documents, + top_k=self.reranked_N, + clients=self.reranker_clients, + provider=self.reranker_provider, + cohere_model=self.cohere_model, + voyage_model=self.voyage_model, + verbose=self.verbose, + ) + + # Reorder sources based on Cohere's reranking + reranked_results: list[ObjectFromDB] = reorder(reranked_results, sources) + + if self.verbose: + print(f"\033[93mCross encoder reranking: {len(reranked_results)} documents\033[0m") + + objects_with_summarized_content: List[ObjectFromDB] = [] + + for result in reranked_results[:self.reranked_M]: + if self.multi_lm_configs: + with dspy.context(lm=self.multi_lm_configs_dict["summarizer"]): + summary = self.summarizer( + query=question, + passage=result.content, + ).relevance_summary + else: + summary = self.summarizer( + query=question, + passage=result.content, + ).relevance_summary + objects_with_summarized_content.append(ObjectFromDB( + object_id=result.object_id, + relevance_rank=result.relevance_rank, + content=summary + )) + + if self.verbose: + print("\033[93mSummarized objects...\033[0m") + print("Here is a sample:") + print(f"{objects_with_summarized_content[0].content[:100]}...") + print(f"{objects_with_summarized_content[0].object_id}") + + valid_object_ids = [obj.object_id for obj in objects_with_summarized_content] + + if self.multi_lm_configs: + with dspy.context(lm=self.multi_lm_configs_dict["listwise_reranker"]): + listwise_reranked_pred = self.listwise_reranker( + query=question, + search_results=objects_with_summarized_content, + top_k=self.reranked_M, + valid_object_ids=valid_object_ids + ) + else: + listwise_reranked_pred = self.listwise_reranker( + query=question, + search_results=objects_with_summarized_content, + top_k=self.reranked_M, + valid_object_ids=valid_object_ids + ) + + listwise_reranked_result = listwise_reranked_pred.best_match_id + listwise_reranked_result = str(listwise_reranked_result).strip().strip('"').strip("'") # parsing + if self.verbose: + print(f"\033[96mListwise reranked result: {listwise_reranked_result}\033[0m") + if self.verbose_signature: + rationale = listwise_reranked_pred.reasoning + print(f"\033[96mListwise reranked result rationale: {rationale}\033[0m") + + + chosen = None + for idx, obj in enumerate(reranked_results): + print(f"\033[38;5;208mChecking object {obj.object_id} against {listwise_reranked_result}\033[0m") + if str(obj.object_id) == str(listwise_reranked_result): + chosen = reranked_results.pop(idx) + reranked_results.insert(0, chosen) + break + + if self.verbose: + print(f"\033[92mListwise reranking: Returning {self.reranked_M} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked_results[:self.reranked_N], + searches=[question], + aggregations=None, + usage={}, + ) + + async def aforward(self, question: str) -> DSPyAgentRAGResponse: + pass + +async def main(): + from retrieve_dspy.clients import get_weaviate_client, get_voyage_client + from retrieve_dspy.utils import get_lm + + gpt5 = get_lm("openai/gpt-5", max_tokens=32000) + + rag_pipeline = LayeredBestMatchReranker( + weaviate_client=get_weaviate_client(), + reranker_clients=[get_voyage_client()], + collection_name="EnronEmails", + target_property_name="email_body", + return_property_name="email_body", + retrieved_k=50, + reranked_N=20, + reranked_M=5, + voyage_model="rerank-2.5", + reranker_provider="voyage", + verbose=True, + verbose_signature=True, + multi_lm_configs=[MultiLMConfig(signature_name="listwise_reranker", lm=gpt5)] + ) + print("Testing sync with BestMatch strategy forward") + test_query = "Where will Governor Gray Davis host a party for the delegates, according to the article “Davis faces dire political consequences if power woes linger?" + response = rag_pipeline.forward(test_query) + print(response) + #print("Testing async forward") + #response = await rag_pipeline.aforward("What is the best way to learn Angular?") + #print(response) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/rerankers/layered_listwise_reranker.py b/retrieve_dspy/retrievers/rerankers/layered_listwise_reranker.py new file mode 100644 index 0000000..73f237e --- /dev/null +++ b/retrieve_dspy/retrievers/rerankers/layered_listwise_reranker.py @@ -0,0 +1,249 @@ +import asyncio +from typing import Optional, List, Literal + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB, RerankerClient, MultiLMConfig +from retrieve_dspy.signatures import ( + VerboseRelevanceRanker, + RelevanceRanker, + VerboseSummarizeSearchRelevance, + SummarizeSearchRelevance, +) +from retrieve_dspy.retrievers.common.call_ce_ranker import ( + RerankItem, + ce_rank, + reorder, +) + +RerankProvider = Literal["voyage", "hybrid"] + +class LayeredListwiseReranker(BaseRAG): + def __init__( + self, + weaviate_client: weaviate.WeaviateClient, + reranker_clients: List[RerankerClient], + collection_name: str, + target_property_name: str, + return_property_name: str, + verbose: bool = False, + verbose_signature: bool = True, + search_only: bool = True, + retrieved_k: int = 50, + reranked_N: int = 20, + reranked_M: int = 5, + reranker_provider: Optional[RerankProvider] = None, + cohere_model: Optional[str] = "rerank-v3.5", + voyage_model: str = "rerank-2.5", + multi_lm_configs: Optional[List[MultiLMConfig]] = None, + ): + super().__init__( + weaviate_client=weaviate_client, + collection_name=collection_name, + target_property_name=target_property_name, + verbose=verbose, + search_only=search_only, + retrieved_k=retrieved_k, + verbose_signature=verbose_signature, + multi_lm_configs=multi_lm_configs, + ) + self.return_property_name = return_property_name + self.reranker_clients = reranker_clients + self.reranked_N = reranked_N + self.reranked_M = reranked_M + self.voyage_model = voyage_model + self.reranker_provider = reranker_provider + self.cohere_model = cohere_model + # Initialize Listwise Reranker + if self.verbose_signature: + self.listwise_reranker = dspy.ChainOfThought(VerboseRelevanceRanker) + else: + self.listwise_reranker = dspy.Predict(RelevanceRanker) + + if self.verbose_signature: + self.summarizer = dspy.ChainOfThought(VerboseSummarizeSearchRelevance) + else: + self.summarizer = dspy.Predict(SummarizeSearchRelevance) + + def forward(self, question: str) -> DSPyAgentRAGResponse: + # first search with the original query + sources = weaviate_search_tool( + weaviate_client=self.weaviate_client, + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + return_property_name=self.return_property_name, + retrieved_k=self.retrieved_k, + ) + + if self.verbose: + print(f"\033[96mInitial retrieval: {len(sources)} documents\033[0m") + + # Debug the initial search results + print("DEBUG: First 10 source object_ids from initial search:") + for i, source in enumerate(sources[:10]): + print(f" Source {i}: {source.object_id}") + + print("DEBUG: Unique object_ids in sources:") + unique_ids = set(s.object_id for s in sources) + print(f" Total sources: {len(sources)}, Unique IDs: {len(unique_ids)}") + print(f" Unique IDs: {list(unique_ids)[:10]}") + + # Extract document content for reranking + documents = [s.content for s in sources] + + # then apply the cross encoder reranker to truncate the results to N + reranked_results: List[RerankItem] = ce_rank( + query=question, + documents=documents, + top_k=self.reranked_N, + clients=self.reranker_clients, + provider=self.reranker_provider, + cohere_model=self.cohere_model, + voyage_model=self.voyage_model, + verbose=self.verbose, + ) + + print("DEBUG: ce_rank results:") + for i, item in enumerate(reranked_results[:5]): + print(f" Item {i}: index={item.index}, score={getattr(item, 'score', 'N/A')}") + + print("DEBUG: Original sources object_ids:") + for i, source in enumerate(sources[:10]): + print(f" Source {i}: {source.object_id}") + + # Reorder sources based on Cohere's reranking + reordered_results: list[ObjectFromDB] = reorder(reranked_results, sources) + + if self.verbose: + print(f"\033[93mCross encoder reranking: {len(reordered_results)} documents\033[0m") + print("DEBUG: First 5 reranked object_ids:", [r.object_id for r in reordered_results[:5]]) + print("DEBUG: First 5 reranked relevance_ranks:", [r.relevance_rank for r in reordered_results[:5]]) + + objects_with_summarized_content: List[ObjectFromDB] = [] + + for result in reordered_results[:self.reranked_M]: + if self.multi_lm_configs: + with dspy.context(lm=self.multi_lm_configs_dict["summarizer"]): + summary = self.summarizer( + query=question, + passage=result.content, + ).relevance_summary + else: + summary = self.summarizer( + query=question, + passage=result.content, + ).relevance_summary + objects_with_summarized_content.append(ObjectFromDB( + object_id=result.object_id, + relevance_rank=result.relevance_rank, + content=summary + )) + + if self.verbose: + print("\033[93mSummarized objects...\033[0m") + print("Here is a sample:") + print(f"{objects_with_summarized_content[0].content[:100]}...") + print(f"{objects_with_summarized_content[0].object_id}") + + valid_object_ids = [obj.object_id for obj in objects_with_summarized_content] + print("DEBUG: objects_with_summarized_content object_ids:", [obj.object_id for obj in objects_with_summarized_content]) + print("DEBUG: valid_object_ids:", valid_object_ids) + + if self.multi_lm_configs: + with dspy.context(lm=self.multi_lm_configs_dict["listwise_reranker"]): + listwise_reranked_pred = self.listwise_reranker( + query=question, + search_results=objects_with_summarized_content, + top_k=self.reranked_M, + valid_object_ids=valid_object_ids + ) + else: + listwise_reranked_pred = self.listwise_reranker( + query=question, + search_results=objects_with_summarized_content, + top_k=self.reranked_M, + valid_object_ids=valid_object_ids + ) + + listwise_reranked_result = listwise_reranked_pred.reranked_ids + # listwise_reranked_result is now a list of IDs in ranked order + if self.verbose: + print(f"\033[96mListwise reranked result: {listwise_reranked_result}\033[0m") + if self.verbose_signature: + rationale = listwise_reranked_pred.reasoning + print(f"\033[96mListwise reranked result rationale: {rationale}\033[0m") + + + # Reorder reranked_results based on the listwise ranking + # Create a mapping from object_id to the original object + id_to_obj = {obj.object_id: obj for obj in reordered_results} + + # Reorder according to listwise_reranked_result + reordered_results = [] + for ranked_id in listwise_reranked_result: + if ranked_id in id_to_obj: + reordered_results.append(id_to_obj[ranked_id]) + if self.verbose: + print(f"\033[38;5;208mAdding object {ranked_id} to reordered results\033[0m") + + # Add any remaining objects that weren't in the listwise ranking + remaining_ids = set(id_to_obj.keys()) - set(listwise_reranked_result) + for remaining_id in remaining_ids: + reordered_results.append(id_to_obj[remaining_id]) + if self.verbose: + print(f"\033[38;5;208mAdding remaining object {remaining_id} to reordered results\033[0m") + + reranked_results = reordered_results + + if self.verbose: + print(f"\033[92mListwise reranking: Returning {self.reranked_M} documents\033[0m") + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked_results[:self.reranked_N], + searches=[question], + aggregations=None, + usage={}, + ) + + async def aforward(self, question: str) -> DSPyAgentRAGResponse: + pass + +async def main(): + from retrieve_dspy.clients import get_weaviate_client, get_voyage_client + from retrieve_dspy.utils import get_lm + + gpt5 = get_lm("openai/gpt-5", max_tokens=32000) + gpt4_1_mini = get_lm("openai/gpt-4.1-mini", max_tokens=32000) + + rag_pipeline = LayeredListwiseReranker( + weaviate_client=get_weaviate_client(), + reranker_clients=[get_voyage_client()], + collection_name="BeirNq", + target_property_name="content", + return_property_name="content", + retrieved_k=50, + reranked_N=20, + reranked_M=5, + voyage_model="rerank-2.5", + reranker_provider="voyage", + verbose=True, + verbose_signature=True, + multi_lm_configs=[MultiLMConfig(signature_name="listwise_reranker", lm=gpt5), MultiLMConfig(signature_name="summarizer", lm=gpt4_1_mini)] + ) + print("Testing sync with Listwise strategy forward") + test_query = "How many types of MIDI messages are there?" + response = rag_pipeline.forward(test_query) + print(response) + #print("Testing async forward") + #response = await rag_pipeline.aforward("What is the best way to learn Angular?") + #print(response) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/listwise_reranker.py b/retrieve_dspy/retrievers/rerankers/listwise_reranker.py similarity index 88% rename from retrieve_dspy/retrievers/listwise_reranker.py rename to retrieve_dspy/retrievers/rerankers/listwise_reranker.py index 1ea5275..6433f85 100644 --- a/retrieve_dspy/retrievers/listwise_reranker.py +++ b/retrieve_dspy/retrievers/rerankers/listwise_reranker.py @@ -3,14 +3,14 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import RelevanceRanker, DiversityRanker class ListwiseReranker(BaseRAG): @@ -41,17 +41,21 @@ def __init__( self.reranker = dspy.Predict(RelevanceRanker) def forward(self, question: str) -> DSPyAgentRAGResponse: - # Get search results with scores for reranking - search_results, sources = weaviate_search_tool( + # Get search results + sources = weaviate_search_tool( query=question, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, return_property_name=self.return_property_name, - return_format="rerank" ) # Perform reranking + # Build SearchResult-like structures for the reranker + search_results = [] + for i, s in enumerate(sources, 1): + search_results.append(ObjectFromDB(id=i, initial_rank=i, content=s.content)) + rerank_pred = self.reranker( query=question, search_results=search_results, @@ -62,8 +66,6 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: reranked_sources = [] reranked_results = [] for rank_id in rerank_pred.reranked_ids: - # Find the source corresponding to this rank_id - # rank_id is 1-based, sources list is 0-based source_index = rank_id - 1 if 0 <= source_index < len(sources): reranked_sources.append(sources[source_index]) @@ -87,19 +89,22 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: ) async def aforward(self, question: str) -> DSPyAgentRAGResponse: - # Get search results with scores for reranking - search_results, sources = await async_weaviate_search_tool( + # Get search results + sources = await async_weaviate_search_tool( query=question, collection_name=self.collection_name, target_property_name=self.target_property_name, retrieved_k=self.retrieved_k, return_property_name=self.return_property_name, - return_format="rerank" ) if self.verbose: print(f"\033[96mInitial results: {len(sources)} Sources!\033[0m") + search_results = [] + for i, s in enumerate(sources, 1): + search_results.append(ObjectFromDB(id=i, initial_rank=i, content=s.content)) + rerank_pred = await self.reranker.acall( query=question, search_results=search_results, diff --git a/retrieve_dspy/retrievers/rerankers/sliding_window_listwise_reranker.py b/retrieve_dspy/retrievers/rerankers/sliding_window_listwise_reranker.py new file mode 100644 index 0000000..e51f733 --- /dev/null +++ b/retrieve_dspy/retrievers/rerankers/sliding_window_listwise_reranker.py @@ -0,0 +1,366 @@ +import asyncio +import os +from typing import Optional, List, Any + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool, + async_weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse, ListwiseRankedDocument +from retrieve_dspy.signatures import ListwiseRanking, VerboseListwiseRanking + + +class SlidingWindowListwiseReranker(BaseRAG): + """ + Listwise reranker using a sliding window approach. + + Processes documents in overlapping windows from bottom to top, + progressively sorting and bubbling the most relevant documents to the top. + + Example with window_size=5, stride=3, total_docs=15: + Window 1: docs [10-14] → sort within window + Window 2: docs [7-11] → sort, best bubble up + Window 3: docs [4-8] → sort, best bubble up + Window 4: docs [1-5] → sort, best bubble up + Window 5: docs [0-4] → sort, best bubble up + Final result: fully sorted list + """ + + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient | weaviate.WeaviateAsyncClient] = None, + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 50, + window_size: Optional[int] = 10, + stride: Optional[int] = 5, + use_thinking: Optional[bool] = False, + ): + super().__init__( + collection_name=collection_name, + target_property_name=target_property_name, + search_only=search_only, + verbose=verbose, + retrieved_k=retrieved_k + ) + self.weaviate_client = weaviate_client + self.window_size = window_size + self.stride = stride + if use_thinking: + if self.verbose: + self.ranker = dspy.ChainOfThought(VerboseListwiseRanking) + else: + self.ranker = dspy.ChainOfThought(ListwiseRanking) + else: + if self.verbose: + self.ranker = dspy.Predict(VerboseListwiseRanking) + else: + self.ranker = dspy.Predict(ListwiseRanking) + + def _extract_document_text(self, doc: Any) -> str: + """Extract text content from a document object.""" + if hasattr(doc, self.target_property_name): + content = getattr(doc, self.target_property_name) + elif isinstance(doc, dict) and self.target_property_name in doc: + content = doc[self.target_property_name] + else: + content = str(doc) + + # Truncate long documents for efficiency + if isinstance(content, str) and len(content) > 1000: + content = content[:1000] + "..." + + return content + + def _rerank_window( + self, + query: str, + window_docs: List[ListwiseRankedDocument] + ) -> List[ListwiseRankedDocument]: + """Rerank a single window of documents.""" + if len(window_docs) <= 1: + return window_docs + + # Extract text content for ranking + doc_texts = [self._extract_document_text(doc.content) for doc in window_docs] + + # Get ranking from LLM + ranking_response = self.ranker( + query=query, + documents=doc_texts, + ) + + # Get ranked indices (trust LLM output is a list of ints) + ranked_indices = ranking_response.ranked_indices + + # Validate indices + valid_indices = [idx for idx in ranked_indices + if 0 <= idx < len(window_docs)] + + # Handle missing indices + missing = set(range(len(window_docs))) - set(valid_indices) + valid_indices.extend(sorted(missing)) + + # Reorder documents + reranked = [window_docs[idx] for idx in valid_indices[:len(window_docs)]] + + if self.verbose: + # Show original indices instead of window-relative indices + original_indices = [window_docs[idx].original_position for idx in valid_indices[:len(window_docs)]] + print(f"\033[96mWindow ranking: {original_indices}\033[0m") + + return reranked + + async def _arerank_window( + self, + query: str, + window_docs: List[ListwiseRankedDocument] + ) -> List[ListwiseRankedDocument]: + """Async version of rerank_window.""" + if len(window_docs) <= 1: + return window_docs + + doc_texts = [self._extract_document_text(doc.content) for doc in window_docs] + + ranking_response = await self.ranker.acall( + query=query, + documents=doc_texts, + ) + + # Get ranked indices (trust LLM output is a list of ints) + ranked_indices = ranking_response.ranked_indices + + valid_indices = [idx for idx in ranked_indices + if 0 <= idx < len(window_docs)] + missing = set(range(len(window_docs))) - set(valid_indices) + valid_indices.extend(sorted(missing)) + + reranked = [window_docs[idx] for idx in valid_indices[:len(window_docs)]] + + if self.verbose: + # Show original indices instead of window-relative indices + original_indices = [window_docs[idx].original_position for idx in valid_indices[:len(window_docs)]] + print(f"\033[96mWindow ranking: {original_indices}\033[0m") + + return reranked + + def _sliding_window_rerank( + self, + query: str, + documents: List[Any] + ) -> List[Any]: + """ + Perform sliding window reranking from bottom to top. + + Process windows from the end of the list backwards, allowing + highly relevant documents to "bubble up" to the top. + """ + if len(documents) == 0: + return documents + + # Wrap documents with ListwiseRankedDocument + ranked_docs = [ + ListwiseRankedDocument( + content=doc, + original_position=i, + current_position=i + ) + for i, doc in enumerate(documents) + ] + + # Calculate window positions (from bottom to top) + num_docs = len(ranked_docs) + window_starts = list(range(num_docs - self.window_size, -1, -self.stride)) + + # Ensure we cover the beginning + if window_starts[-1] != 0: + window_starts.append(0) + + window_starts = sorted(set(window_starts), reverse=True) + + if self.verbose: + print(f"\033[94mReranking {num_docs} documents with {len(window_starts)} windows\033[0m") + print(f"\033[94mWindow starts: {window_starts}\033[0m") + + # Process each window + for window_idx, start in enumerate(window_starts): + end = min(start + self.window_size, num_docs) + + if self.verbose: + print(f"\n\033[95m=== Window {window_idx + 1}/{len(window_starts)}: docs [{start}:{end}] ===\033[0m") + + # Extract window + window = ranked_docs[start:end] + + # Rerank window + reranked_window = self._rerank_window(query, window) + + # Update the main list with reranked window + ranked_docs[start:end] = reranked_window + + # Extract final ordered documents + return [doc.content for doc in ranked_docs] + + async def _asliding_window_rerank( + self, + query: str, + documents: List[Any] + ) -> List[Any]: + """Async version of sliding window rerank.""" + if len(documents) == 0: + return documents + + ranked_docs = [ + ListwiseRankedDocument( + content=doc, + original_position=i, + current_position=i + ) + for i, doc in enumerate(documents) + ] + + num_docs = len(ranked_docs) + window_starts = list(range(num_docs - self.window_size, -1, -self.stride)) + + if window_starts[-1] != 0: + window_starts.append(0) + + window_starts = sorted(set(window_starts), reverse=True) + + if self.verbose: + print(f"\033[94mReranking {num_docs} documents with {len(window_starts)} windows\033[0m") + print(f"\033[94mWindow starts: {window_starts}\033[0m") + + for window_idx, start in enumerate(window_starts): + end = min(start + self.window_size, num_docs) + + if self.verbose: + print(f"\n\033[95m=== Window {window_idx + 1}/{len(window_starts)}: docs [{start}:{end}] ===\033[0m") + + window = ranked_docs[start:end] + reranked_window = await self._arerank_window(query, window) + ranked_docs[start:end] = reranked_window + + return [doc.content for doc in ranked_docs] + + def forward( + self, + question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None + ) -> DSPyAgentRAGResponse: + if weaviate_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateClient): + weaviate_client = self.weaviate_client + + # Initial retrieval + initial_results = weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=weaviate_client, + ) + + if self.verbose: + print(f"\n\033[92mInitial retrieval: {len(initial_results)} documents\033[0m") + + # Rerank using sliding windows + reranked_results = self._sliding_window_rerank(question, initial_results) + + if self.verbose: + print("\n\033[92mReranking complete!\033[0m\n") + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked_results, + searches=[question], + aggregations=None, + usage={}, + ) + + async def aforward( + self, + question: str, + weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None + ) -> DSPyAgentRAGResponse: + if weaviate_async_client is None: + if isinstance(self.weaviate_async_client, weaviate.WeaviateAsyncClient): + weaviate_async_client = self.weaviate_async_client + + initial_results = await async_weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_async_client=weaviate_async_client, + ) + + if self.verbose: + print(f"\n\033[92mInitial retrieval: {len(initial_results)} documents\033[0m") + + reranked_results = await self._asliding_window_rerank(question, initial_results) + + if self.verbose: + print("\n\033[92mReranking complete!\033[0m\n") + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked_results, + searches=[question], + aggregations=None, + usage={}, + ) + + +async def main(): + # Example with smaller numbers for demonstration + test_pipeline = SlidingWindowListwiseReranker( + collection_name="BrightBiology", + target_property_name="content", + verbose=True, + retrieved_k=20, # Get top 20 documents + window_size=5, # Process 5 docs at a time + stride=3, # Move 3 positions each time + use_thinking=True, + ) + + test_q = "How many cells are in the human body?" + + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + + await weaviate_async_client.connect() + + print("=== Testing Sync Reranking ===") + test_sync_response = test_pipeline.forward(test_q, weaviate_client=weaviate_client) + print("\nTop 3 reranked results:") + for i, doc in enumerate(test_sync_response.sources[:3]): + print(f"{i+1}. {str(doc)[:100]}...") + + print(f"Returned {len(test_sync_response.sources)} documents.") + + print("\n\n=== Testing Async Reranking ===") + test_async_response = await test_pipeline.aforward(test_q, weaviate_async_client=weaviate_async_client) + print("\nTop 3 reranked results:") + for i, doc in enumerate(test_async_response.sources[:3]): + print(f"{i+1}. {str(doc)[:100]}...") + + weaviate_client.close() + await weaviate_async_client.close() + + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/summarized_listwise_reranker.py b/retrieve_dspy/retrievers/rerankers/summarized_listwise_reranker.py similarity index 91% rename from retrieve_dspy/retrievers/summarized_listwise_reranker.py rename to retrieve_dspy/retrievers/rerankers/summarized_listwise_reranker.py index da5c405..6a4b139 100644 --- a/retrieve_dspy/retrievers/summarized_listwise_reranker.py +++ b/retrieve_dspy/retrievers/rerankers/summarized_listwise_reranker.py @@ -3,13 +3,13 @@ import dspy -from retrieve_dspy.tools.weaviate_database import ( +from retrieve_dspy.database.weaviate_database import ( weaviate_search_tool, async_weaviate_search_tool ) from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse, SearchResult +from retrieve_dspy.models import DSPyAgentRAGResponse, ObjectFromDB from retrieve_dspy.signatures import SummarizeSearchRelevance, RelevanceRanker def aggregate_usage(total_usage: dict, new_usage: dict) -> dict: @@ -54,13 +54,12 @@ def __init__( def forward(self, question: str) -> DSPyAgentRAGResponse: # Get search results - search_results, sources = weaviate_search_tool( + sources = weaviate_search_tool( query=question, collection_name=self.collection_name, target_property_name=self.target_property_name, return_property_name=self.return_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) if self.verbose: @@ -71,14 +70,14 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: summaries = [] total_usage = {} - for i, (result, source) in enumerate(zip(search_results, sources)): + for i, source in enumerate(sources): summary_pred = self.summarizer( query=question, - passage=result.content, + passage=source.content, ) summaries.append({ - "passage_id": result.id, + "passage_id": i + 1, "initial_rank": i, "relevance_summary": summary_pred.relevance_summary, }) @@ -93,8 +92,8 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: for summary in summaries: print(f"\033[96m{summary['relevance_summary']}\033[0m\n") - # Convert search results to list of SearchResult objects - search_results_list = [SearchResult( + # Convert summaries to list of SearchResult objects for reranker + search_results_list = [ObjectFromDB( id=result["passage_id"], initial_rank=result["initial_rank"], content=result["relevance_summary"] @@ -135,13 +134,12 @@ def forward(self, question: str) -> DSPyAgentRAGResponse: async def aforward(self, question: str) -> DSPyAgentRAGResponse: # Get search results - search_results, sources = await async_weaviate_search_tool( + sources = await async_weaviate_search_tool( query=question, collection_name=self.collection_name, target_property_name=self.target_property_name, return_property_name=self.return_property_name, retrieved_k=self.retrieved_k, - return_format="rerank" ) if self.verbose: @@ -150,14 +148,14 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: # Summarize relevance for each result in parallel summary_tasks = [] passage_ids = [] # Track passage IDs separately - for i, (result, source) in enumerate(zip(search_results, sources)): + for i, source in enumerate(sources): task = self.summarizer.acall( query=question, - passage=result.content, + passage=source.content, initial_rank=i ) summary_tasks.append(task) - passage_ids.append(result.id) # Store the ID + passage_ids.append(i + 1) # Store the ID # Wait for all summaries to complete summary_preds = await asyncio.gather(*summary_tasks) @@ -183,7 +181,7 @@ async def aforward(self, question: str) -> DSPyAgentRAGResponse: # Perform reranking based on summaries # Convert search results to list of SearchResult objects - search_results_list = [SearchResult( + search_results_list = [ObjectFromDB( id=result["passage_id"], initial_rank=result["initial_rank"], content=result["relevance_summary"] diff --git a/retrieve_dspy/retrievers/rerankers/top_down_partitioning_reranker.py b/retrieve_dspy/retrievers/rerankers/top_down_partitioning_reranker.py new file mode 100644 index 0000000..d4d2ea6 --- /dev/null +++ b/retrieve_dspy/retrievers/rerankers/top_down_partitioning_reranker.py @@ -0,0 +1,1019 @@ +import asyncio +import os +from typing import Optional, List, Any, Tuple + +import dspy +import weaviate + +from retrieve_dspy.database.weaviate_database import ( + weaviate_search_tool, + async_weaviate_search_tool +) +from retrieve_dspy.retrievers.base_rag import BaseRAG +from retrieve_dspy.models import DSPyAgentRAGResponse, ListwiseRankedDocument +from retrieve_dspy.signatures import ListwiseRanking, VerboseListwiseRanking + + +class TopDownPartitioningReranker(BaseRAG): + """ + Listwise reranker using top-down partitioning with pivot-based selection. + Parry et al. 2024: https://arxiv.org/pdf/2405.14589 + + This approach addresses the inefficiencies of sliding window reranking by: + 1. Processing documents top-down instead of bottom-up + 2. Using a pivot element for parallel comparison + 3. Reducing redundant re-scoring of top documents + + Algorithm: + 1. Rank the top-w documents and select a pivot at position k (typically w/2) + 2. Compare pivot against remaining documents in parallel batches + 3. Documents ranked above the pivot become candidates for top-k + 4. Recursively refine candidate pool until budget is met or no more candidates + 5. Final ranking of the candidate pool produces the top-k results + + Key advantages over sliding window: + - ~33% fewer inference calls + - Inherently parallelizable (most inferences can run concurrently) + - Reduces repeated re-scoring of highly ranked documents + - Better aligned with list-wise ranker biases (prefers well-ordered lists) + """ + + def __init__( + self, + collection_name: str, + target_property_name: str, + weaviate_client: Optional[weaviate.WeaviateClient | weaviate.WeaviateAsyncClient] = None, + verbose: Optional[bool] = False, + search_only: Optional[bool] = True, + retrieved_k: Optional[int] = 50, + target_k: Optional[int] = 10, + window_size: Optional[int] = 10, + budget: Optional[int] = None, + use_thinking: Optional[bool] = True, + ranking_depth: Optional[int] = 100, + ): + """ + Initialize the Top-Down Partitioning Reranker. + + Args: + collection_name: Weaviate collection name + target_property_name: Property to retrieve from documents + weaviate_client: Weaviate client instance + verbose: Enable detailed logging + search_only: If True, only return reranked documents without answer generation + retrieved_k: Number of documents to retrieve in initial search + target_k: Target number of top documents for final ranking (the k in "top-k") + window_size: Number of documents to rank at once + budget: Maximum number of candidates to collect before stopping (default: window_size) + use_thinking: Use Chain of Thought for ranking + ranking_depth: Maximum depth to rank documents to + """ + super().__init__( + collection_name=collection_name, + target_property_name=target_property_name, + search_only=search_only, + verbose=verbose, + retrieved_k=retrieved_k + ) + self.weaviate_client = weaviate_client + self.target_k = target_k + self.window_size = window_size + self.budget = budget if budget is not None else window_size + self.ranking_depth = ranking_depth + self.pivot_position = min(target_k, window_size // 2) # k = w/2 as per paper + + if use_thinking: + if self.verbose: + self.ranker = dspy.ChainOfThought(VerboseListwiseRanking) + else: + self.ranker = dspy.ChainOfThought(ListwiseRanking) + else: + if self.verbose: + self.ranker = dspy.Predict(VerboseListwiseRanking) + else: + self.ranker = dspy.Predict(ListwiseRanking) + + # Track statistics for efficiency analysis + self.inference_count = 0 + self.parallel_inference_count = 0 + + def _extract_document_text(self, doc: Any) -> str: + """Extract text content from a document object.""" + if hasattr(doc, self.target_property_name): + content = getattr(doc, self.target_property_name) + elif isinstance(doc, dict) and self.target_property_name in doc: + content = doc[self.target_property_name] + else: + content = str(doc) + + # Truncate long documents for efficiency + if isinstance(content, str) and len(content) > 1000: + content = content[:1000] + "..." + + return content + + def _rank_window( + self, + query: str, + window_docs: List[ListwiseRankedDocument], + window_label: str = "" + ) -> List[ListwiseRankedDocument]: + """Rank a single window of documents.""" + if len(window_docs) <= 1: + return window_docs + + self.inference_count += 1 + + # Extract text content for ranking + doc_texts = [self._extract_document_text(doc.content) for doc in window_docs] + + # Get ranking from LLM + ranking_response = self.ranker( + query=query, + documents=doc_texts, + ) + + # Get ranked indices + ranked_indices = ranking_response.ranked_indices + + # Validate indices + valid_indices = [idx for idx in ranked_indices + if 0 <= idx < len(window_docs)] + + # Handle missing indices + missing = set(range(len(window_docs))) - set(valid_indices) + valid_indices.extend(sorted(missing)) + + # Reorder documents + reranked = [window_docs[idx] for idx in valid_indices[:len(window_docs)]] + + if self.verbose: + original_indices = [window_docs[idx].original_position for idx in valid_indices[:len(window_docs)]] + print(f"\033[96m{window_label}Ranking: {original_indices}\033[0m") + + return reranked + + async def _arank_window( + self, + query: str, + window_docs: List[ListwiseRankedDocument], + window_label: str = "" + ) -> List[ListwiseRankedDocument]: + """Async version of rank_window.""" + if len(window_docs) <= 1: + return window_docs + + self.inference_count += 1 + + doc_texts = [self._extract_document_text(doc.content) for doc in window_docs] + + ranking_response = await self.ranker.acall( + query=query, + documents=doc_texts, + ) + + ranked_indices = ranking_response.ranked_indices + valid_indices = [idx for idx in ranked_indices + if 0 <= idx < len(window_docs)] + missing = set(range(len(window_docs))) - set(valid_indices) + valid_indices.extend(sorted(missing)) + + reranked = [window_docs[idx] for idx in valid_indices[:len(window_docs)]] + + if self.verbose: + original_indices = [window_docs[idx].original_position for idx in valid_indices[:len(window_docs)]] + print(f"\033[96m{window_label}Ranking: {original_indices}\033[0m") + + return reranked + + def _compare_with_pivot( + self, + query: str, + pivot: ListwiseRankedDocument, + batch_docs: List[ListwiseRankedDocument], + batch_idx: int + ) -> Tuple[List[ListwiseRankedDocument], List[ListwiseRankedDocument]]: + """ + Compare a batch of documents against the pivot. + Returns (documents_above_pivot, documents_below_pivot). + """ + if len(batch_docs) == 0: + return [], [] + + # Create window with pivot at the start (as per paper's suggestion) + window = [pivot] + batch_docs + + if self.verbose: + print(f"\n\033[95m--- Batch {batch_idx}: Comparing {len(batch_docs)} docs vs pivot (orig pos {pivot.original_position}) ---\033[0m") + + # Rank the window + ranked_window = self._rank_window( + query, + window, + window_label=f"Batch {batch_idx} " + ) + + # Find pivot position in ranked window + pivot_rank = next(i for i, doc in enumerate(ranked_window) if doc is pivot) + + # Split based on pivot position + above_pivot = ranked_window[:pivot_rank] + below_pivot = ranked_window[pivot_rank + 1:] + + if self.verbose: + print(f"\033[96mPivot ranked at position {pivot_rank}, {len(above_pivot)} docs above, {len(below_pivot)} docs below\033[0m") + + return above_pivot, below_pivot + + async def _acompare_with_pivot( + self, + query: str, + pivot: ListwiseRankedDocument, + batch_docs: List[ListwiseRankedDocument], + batch_idx: int + ) -> Tuple[List[ListwiseRankedDocument], List[ListwiseRankedDocument]]: + """Async version of compare_with_pivot.""" + if len(batch_docs) == 0: + return [], [] + + window = [pivot] + batch_docs + + if self.verbose: + print(f"\n\033[95m--- Batch {batch_idx}: Comparing {len(batch_docs)} docs vs pivot (orig pos {pivot.original_position}) ---\033[0m") + + ranked_window = await self._arank_window( + query, + window, + window_label=f"Batch {batch_idx} " + ) + + pivot_rank = next(i for i, doc in enumerate(ranked_window) if doc is pivot) + + above_pivot = ranked_window[:pivot_rank] + below_pivot = ranked_window[pivot_rank + 1:] + + if self.verbose: + print(f"\033[96mPivot ranked at position {pivot_rank}, {len(above_pivot)} docs above, {len(below_pivot)} docs below\033[0m") + + return above_pivot, below_pivot + + def _partition_iteration( + self, + query: str, + candidates: List[ListwiseRankedDocument], + remaining: List[ListwiseRankedDocument], + iteration: int + ) -> Tuple[List[ListwiseRankedDocument], List[ListwiseRankedDocument], ListwiseRankedDocument]: + """ + Single iteration of the partitioning algorithm. + Returns (new_candidates, backfill, pivot). + + The pivot is returned separately to ensure it's not lost and can be + included in the final ranking as per Algorithm 1: A_i ∪ p ∪ B + + CRITICAL: Documents below the pivot in the initial window must be + compared against the pivot in subsequent batches, not immediately + added to backfill! + """ + if self.verbose: + print(f"\n\033[94m{'='*60}\033[0m") + print(f"\033[94mIteration {iteration}: {len(candidates)} candidates, {len(remaining)} remaining\033[0m") + print(f"\033[94m{'='*60}\033[0m") + + # Step 1: Rank the top window to find pivot + top_window = candidates[:self.window_size] + + if self.verbose: + print(f"\n\033[95m=== Ranking top {len(top_window)} candidates to find pivot ===\033[0m") + + ranked_top = self._rank_window(query, top_window, window_label="Initial ") + + # Step 2: Select pivot at position k (or last element if window too small) + pivot_idx = min(self.pivot_position, len(ranked_top) - 1) + pivot = ranked_top[pivot_idx] + + # Documents above pivot are guaranteed candidates for top-k + new_candidates = ranked_top[:pivot_idx] + + # Documents below pivot need to be compared against the pivot! + # They do NOT go directly to backfill - that was the bug! + docs_below_pivot = ranked_top[pivot_idx + 1:] + + if self.verbose: + print(f"\n\033[93mPivot selected: document at original position {pivot.original_position}\033[0m") + print(f"\033[93m{len(new_candidates)} docs above pivot (added to candidates)\033[0m") + print(f"\033[93m{len(docs_below_pivot)} docs below pivot (need pivot comparison)\033[0m") + + # Step 3: Combine ALL documents that need to be compared with pivot: + # - Documents below the pivot in the initial window + # - Remaining unprocessed candidates from the candidate pool + # - Any leftover documents from previous iteration + remaining_candidates = candidates[self.window_size:] + all_remaining = docs_below_pivot + remaining_candidates + remaining + + if self.verbose: + print(f"\033[93mTotal documents to compare with pivot: {len(all_remaining)}\033[0m") + + # Initialize empty backfill - only populated after pivot comparisons + backfill = [] + + if len(all_remaining) == 0: + return new_candidates, backfill, pivot + + # Create batches of size (window_size - 1) to account for pivot + batch_size = self.window_size - 1 + batches = [all_remaining[i:i + batch_size] for i in range(0, len(all_remaining), batch_size)] + + if self.verbose: + print(f"\n\033[94mProcessing {len(batches)} batches (window_size - 1 = {batch_size} docs per batch)\033[0m") + + self.parallel_inference_count += len(batches) + + # Process each batch (in production, these could run in parallel) + for batch_idx, batch in enumerate(batches, 1): + above, below = self._compare_with_pivot(query, pivot, batch, batch_idx) + new_candidates.extend(above) + backfill.extend(below) + + # Early stopping if we've reached budget + if len(new_candidates) >= self.budget: + if self.verbose: + print(f"\n\033[93m⚠️ Budget reached ({self.budget}), stopping early\033[0m") + # Add remaining unprocessed docs to backfill + remaining_batches = batches[batch_idx:] + for remaining_batch in remaining_batches: + backfill.extend(remaining_batch) + break + + return new_candidates, backfill, pivot + + async def _apartition_iteration( + self, + query: str, + candidates: List[ListwiseRankedDocument], + remaining: List[ListwiseRankedDocument], + iteration: int + ) -> Tuple[List[ListwiseRankedDocument], List[ListwiseRankedDocument], ListwiseRankedDocument]: + """Async version of partition_iteration with true parallelization.""" + if self.verbose: + print(f"\n\033[94m{'='*60}\033[0m") + print(f"\033[94mIteration {iteration}: {len(candidates)} candidates, {len(remaining)} remaining\033[0m") + print(f"\033[94m{'='*60}\033[0m") + + # Step 1: Rank the top window to find pivot + top_window = candidates[:self.window_size] + + if self.verbose: + print(f"\n\033[95m=== Ranking top {len(top_window)} candidates to find pivot ===\033[0m") + + ranked_top = await self._arank_window(query, top_window, window_label="Initial ") + + # Step 2: Select pivot + pivot_idx = min(self.pivot_position, len(ranked_top) - 1) + pivot = ranked_top[pivot_idx] + + new_candidates = ranked_top[:pivot_idx] + docs_below_pivot = ranked_top[pivot_idx + 1:] + + if self.verbose: + print(f"\n\033[93mPivot selected: document at original position {pivot.original_position}\033[0m") + print(f"\033[93m{len(new_candidates)} docs above pivot (added to candidates)\033[0m") + print(f"\033[93m{len(docs_below_pivot)} docs below pivot (need pivot comparison)\033[0m") + + # Step 3: Combine all documents needing pivot comparison + remaining_candidates = candidates[self.window_size:] + all_remaining = docs_below_pivot + remaining_candidates + remaining + + if self.verbose: + print(f"\033[93mTotal documents to compare with pivot: {len(all_remaining)}\033[0m") + + backfill = [] + + if len(all_remaining) == 0: + return new_candidates, backfill, pivot + + batch_size = self.window_size - 1 + batches = [all_remaining[i:i + batch_size] for i in range(0, len(all_remaining), batch_size)] + + if self.verbose: + print(f"\n\033[94m⚡ Processing {len(batches)} batches in PARALLEL ⚡\033[0m") + + self.parallel_inference_count += len(batches) + + # Process all batches concurrently + batch_tasks = [ + self._acompare_with_pivot(query, pivot, batch, batch_idx) + for batch_idx, batch in enumerate(batches, 1) + ] + + batch_results = await asyncio.gather(*batch_tasks) + + # Collect results and check budget + for batch_idx, (above, below) in enumerate(batch_results, 1): + new_candidates.extend(above) + backfill.extend(below) + + if len(new_candidates) >= self.budget: + if self.verbose: + print(f"\n\033[93m⚠️ Budget reached ({self.budget})\033[0m") + # Add remaining unprocessed docs to backfill + for remaining_idx in range(batch_idx, len(batches)): + backfill.extend(batches[remaining_idx]) + break + + return new_candidates, backfill, pivot + + def _top_down_partition_recursive( + self, + query: str, + documents: List[Any], + depth: int = 0 + ) -> List[Any]: + """ + Recursive version of top-down partitioning for refining large candidate pools. + This implements the "pivot(A_i)" call from Algorithm 1, Line 14. + + Args: + query: Search query + documents: Documents to partition + depth: Recursion depth (for logging) + + Returns: + Ranked list of documents + """ + if len(documents) <= self.window_size: + # Base case: small enough to rank directly + if self.verbose: + print(f"\n\033[96m{' '*depth}↳ Recursive depth {depth}: {len(documents)} docs, ranking directly\033[0m") + + ranked_docs = [ + ListwiseRankedDocument(content=doc, original_position=i, current_position=i) + for i, doc in enumerate(documents) + ] + ranked = self._rank_window(query, ranked_docs, window_label=f"Recursive-{depth} ") + return [doc.content for doc in ranked] + + if self.verbose: + print(f"\n\033[96m{' '*depth}↳ Recursive depth {depth}: Partitioning {len(documents)} docs\033[0m") + + # Wrap documents + ranked_docs = [ + ListwiseRankedDocument(content=doc, original_position=i, current_position=i) + for i, doc in enumerate(documents) + ] + + candidates = ranked_docs + backfill = [] + all_pivots = [] + + # Single iteration of partitioning + iteration = 1 + while len(candidates) > self.window_size: + new_candidates, new_backfill, pivot = self._partition_iteration( + query, candidates, [], iteration + ) + + all_pivots.append(pivot) + backfill.extend(new_backfill) + candidates = new_candidates + + if len(candidates) <= self.window_size: + break + + iteration += 1 + if iteration > 5: # Limit recursion iterations + break + + # Final ranking of candidates + if len(candidates) > 1: + candidates = self._rank_window( + query, + candidates, + window_label=f"Recursive-{depth}-Final " + ) + + # Combine and return + final = candidates + all_pivots + backfill + return [doc.content for doc in final] + + async def _atop_down_partition_recursive( + self, + query: str, + documents: List[Any], + depth: int = 0 + ) -> List[Any]: + """Async recursive partitioning.""" + if len(documents) <= self.window_size: + if self.verbose: + print(f"\n\033[96m{' '*depth}↳ Async recursive depth {depth}: {len(documents)} docs, ranking directly\033[0m") + + ranked_docs = [ + ListwiseRankedDocument(content=doc, original_position=i, current_position=i) + for i, doc in enumerate(documents) + ] + ranked = await self._arank_window(query, ranked_docs, window_label=f"AsyncRecursive-{depth} ") + return [doc.content for doc in ranked] + + if self.verbose: + print(f"\n\033[96m{' '*depth}↳ Async recursive depth {depth}: Partitioning {len(documents)} docs\033[0m") + + ranked_docs = [ + ListwiseRankedDocument(content=doc, original_position=i, current_position=i) + for i, doc in enumerate(documents) + ] + + candidates = ranked_docs + backfill = [] + all_pivots = [] + + iteration = 1 + while len(candidates) > self.window_size: + new_candidates, new_backfill, pivot = await self._apartition_iteration( + query, candidates, [], iteration + ) + + all_pivots.append(pivot) + backfill.extend(new_backfill) + candidates = new_candidates + + if len(candidates) <= self.window_size: + break + + iteration += 1 + if iteration > 5: + break + + if len(candidates) > 1: + candidates = await self._arank_window( + query, + candidates, + window_label=f"AsyncRecursive-{depth}-Final " + ) + + final = candidates + all_pivots + backfill + return [doc.content for doc in final] + + def _top_down_partition( + self, + query: str, + documents: List[Any] + ) -> List[Any]: + """ + Perform top-down partitioning reranking. + + Algorithm (from paper): + 1. Process top-w documents and select pivot at position k + 2. Compare pivot against remaining documents in parallel batches + 3. Collect documents ranked above pivot as candidates + 4. Recursively refine if candidates > target_k + 5. Final ranking of candidate pool + + Paper's termination condition (Algorithm 1, Line 14): + return (|A_i| = k - 1) ? A_i ∪ p ∪ B : pivot(A_i) ∪ p ∪ B + + This means: + - If we have exactly k-1 candidates: done! + - If we have more: recursively partition the candidates + """ + if len(documents) == 0: + return documents + + # Reset statistics + self.inference_count = 0 + self.parallel_inference_count = 0 + + # Wrap documents + ranked_docs = [ + ListwiseRankedDocument( + content=doc, + original_position=i, + current_position=i + ) + for i, doc in enumerate(documents) + ] + + # Limit to ranking depth + docs_to_rank = ranked_docs[:self.ranking_depth] + backfill = ranked_docs[self.ranking_depth:] + + if self.verbose: + print(f"\n\033[92m{'='*60}\033[0m") + print("\033[92mStarting Top-Down Partitioning Reranking\033[0m") + print(f"\033[92mTotal documents: {len(documents)}, Ranking depth: {min(self.ranking_depth, len(documents))}\033[0m") + print(f"\033[92mWindow size: {self.window_size}, Budget: {self.budget}, Pivot position: {self.pivot_position}\033[0m") + print(f"\033[92mTarget k: {self.target_k}\033[0m") + print(f"\033[92m{'='*60}\033[0m") + + candidates = docs_to_rank + all_pivots = [] # Track all pivots (they should be included in final ranking) + iteration = 1 + + # Main partitioning loop + # Continue while we have enough candidates to partition + while len(candidates) > self.window_size: + if self.verbose: + print(f"\n\033[94m>>> Loop iteration {iteration}: {len(candidates)} candidates\033[0m") + + # Single partition iteration + new_candidates, new_backfill, pivot = self._partition_iteration( + query, candidates, [], iteration + ) + + all_pivots.append(pivot) + backfill.extend(new_backfill) + candidates = new_candidates + + # Check termination conditions + if len(candidates) <= self.window_size: + if self.verbose: + print(f"\n\033[93m✓ Termination condition met: {len(candidates)} candidates <= window_size ({self.window_size})\033[0m") + break + + # Budget exceeded - stop collecting more candidates + if len(candidates) >= self.budget: + if self.verbose: + print(f"\n\033[93m✓ Budget limit reached: {len(candidates)} candidates >= budget ({self.budget})\033[0m") + break + + iteration += 1 + + # Safety check to prevent infinite loops + if iteration > 10: + if self.verbose: + print("\n\033[91m⚠ Maximum iterations reached, stopping\033[0m") + break + + # Now we have a candidate pool. Apply paper's decision rule: + # If |candidates| ≈ target_k: Done! Return candidates ∪ pivots ∪ backfill + # If |candidates| > target_k: Recursively partition candidates (the "pivot(A_i)" call) + + if len(candidates) > self.target_k and len(candidates) > self.window_size: + # Too many candidates - recursively refine + if self.verbose: + print(f"\n\033[95m{'='*60}\033[0m") + print(f"\033[95m🔄 Recursive refinement needed: {len(candidates)} candidates > target {self.target_k}\033[0m") + print(f"\033[95m{'='*60}\033[0m") + + # Recursively partition the candidate set + # Note: This is the "pivot(A_i)" call from Algorithm 1, Line 14 + candidate_contents = [doc.content for doc in candidates] + refined_docs = self._top_down_partition_recursive(query, candidate_contents, depth=1) + + # Rewrap refined documents + candidates = [ + ListwiseRankedDocument( + content=doc, + original_position=-1, + current_position=i + ) + for i, doc in enumerate(refined_docs[:self.target_k]) + ] + + # Anything not in top-k goes to backfill + if len(refined_docs) > self.target_k: + backfill_docs = [ + ListwiseRankedDocument( + content=doc, + original_position=-1, + current_position=i + ) + for i, doc in enumerate(refined_docs[self.target_k:], start=self.target_k) + ] + backfill.extend(backfill_docs) + + elif len(candidates) > 1: + # Small enough candidate set - just do final ranking + if self.verbose: + print(f"\n\033[95m{'='*60}\033[0m") + print(f"\033[95m🎯 Final ranking of {len(candidates)} candidates (small enough)\033[0m") + print(f"\033[95m{'='*60}\033[0m") + + candidates = self._rank_window(query, candidates, window_label="Final ") + + # Combine: candidates (top-k) + pivots + backfill + # Paper: A_i ∪ p ∪ B + final_ranking = candidates + all_pivots + backfill + + if self.verbose: + print(f"\n\033[92m{'='*60}\033[0m") + print("\033[92m✅ Reranking complete!\033[0m") + print(f"\033[92mFinal ranking: {len(candidates)} top candidates + {len(all_pivots)} pivots + {len(backfill)} backfill\033[0m") + print(f"\033[92mTotal inferences: {self.inference_count}\033[0m") + print(f"\033[92mParallelizable inferences: {self.parallel_inference_count}\033[0m") + print(f"\033[92mEstimated speedup: {self.parallel_inference_count / max(self.inference_count, 1):.2f}x with parallelization\033[0m") + print(f"\033[92m{'='*60}\033[0m\n") + + return [doc.content for doc in final_ranking] + + async def _atop_down_partition( + self, + query: str, + documents: List[Any] + ) -> List[Any]: + """Async version of top-down partitioning with true parallelization.""" + if len(documents) == 0: + return documents + + self.inference_count = 0 + self.parallel_inference_count = 0 + + ranked_docs = [ + ListwiseRankedDocument( + content=doc, + original_position=i, + current_position=i + ) + for i, doc in enumerate(documents) + ] + + docs_to_rank = ranked_docs[:self.ranking_depth] + backfill = ranked_docs[self.ranking_depth:] + + if self.verbose: + print(f"\n\033[92m{'='*60}\033[0m") + print("\033[92m⚡ Starting Top-Down Partitioning Reranking (ASYNC) ⚡\033[0m") + print(f"\033[92mTotal documents: {len(documents)}, Ranking depth: {min(self.ranking_depth, len(documents))}\033[0m") + print(f"\033[92mWindow size: {self.window_size}, Budget: {self.budget}, Pivot position: {self.pivot_position}\033[0m") + print(f"\033[92mTarget k: {self.target_k}\033[0m") + print(f"\033[92m{'='*60}\033[0m") + + candidates = docs_to_rank + all_pivots = [] + iteration = 1 + + while len(candidates) > self.window_size: + if self.verbose: + print(f"\n\033[94m>>> Loop iteration {iteration}: {len(candidates)} candidates\033[0m") + + new_candidates, new_backfill, pivot = await self._apartition_iteration( + query, candidates, [], iteration + ) + + all_pivots.append(pivot) + backfill.extend(new_backfill) + candidates = new_candidates + + if len(candidates) <= self.window_size: + if self.verbose: + print(f"\n\033[93m✓ Termination: {len(candidates)} candidates <= window_size\033[0m") + break + + if len(candidates) >= self.budget: + if self.verbose: + print(f"\n\033[93m✓ Budget reached: {len(candidates)} candidates >= budget\033[0m") + break + + iteration += 1 + if iteration > 10: + if self.verbose: + print("\n\033[91m⚠ Maximum iterations reached\033[0m") + break + + # Recursive refinement or final ranking + if len(candidates) > self.target_k and len(candidates) > self.window_size: + if self.verbose: + print(f"\n\033[95m{'='*60}\033[0m") + print(f"\033[95m🔄 Recursive refinement: {len(candidates)} > {self.target_k}\033[0m") + print(f"\033[95m{'='*60}\033[0m") + + candidate_contents = [doc.content for doc in candidates] + refined_docs = await self._atop_down_partition_recursive(query, candidate_contents, depth=1) + + candidates = [ + ListwiseRankedDocument(content=doc, original_position=-1, current_position=i) + for i, doc in enumerate(refined_docs[:self.target_k]) + ] + + if len(refined_docs) > self.target_k: + backfill_docs = [ + ListwiseRankedDocument(content=doc, original_position=-1, current_position=i) + for i, doc in enumerate(refined_docs[self.target_k:], start=self.target_k) + ] + backfill.extend(backfill_docs) + + elif len(candidates) > 1: + if self.verbose: + print(f"\n\033[95m{'='*60}\033[0m") + print(f"\033[95m🎯 Final ranking of {len(candidates)} candidates\033[0m") + print(f"\033[95m{'='*60}\033[0m") + candidates = await self._arank_window(query, candidates, window_label="Final ") + + final_ranking = candidates + all_pivots + backfill + + if self.verbose: + print(f"\n\033[92m{'='*60}\033[0m") + print("\033[92m✅ Async reranking complete!\033[0m") + print(f"\033[92mFinal: {len(candidates)} candidates + {len(all_pivots)} pivots + {len(backfill)} backfill\033[0m") + print(f"\033[92mTotal inferences: {self.inference_count}\033[0m") + print(f"\033[92mParallelizable: {self.parallel_inference_count}\033[0m") + print(f"\033[92mTheoretical speedup: {self.parallel_inference_count / max(self.inference_count, 1):.2f}x\033[0m") + print(f"\033[92m{'='*60}\033[0m\n") + + return [doc.content for doc in final_ranking] + + def forward( + self, + question: str, + weaviate_client: Optional[weaviate.WeaviateClient] = None + ) -> DSPyAgentRAGResponse: + """ + Synchronous forward pass: retrieve and rerank documents. + + Args: + question: User query + weaviate_client: Weaviate client for search + + Returns: + DSPyAgentRAGResponse with reranked documents + """ + if weaviate_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateClient): + weaviate_client = self.weaviate_client + + # Initial retrieval + initial_results = weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_client=weaviate_client, + ) + + if self.verbose: + print(f"\n\033[92m🔍 Initial retrieval: {len(initial_results)} documents\033[0m") + + # Rerank using top-down partitioning + reranked_results = self._top_down_partition(question, initial_results) + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked_results, + searches=[question], + aggregations=None, + usage={}, + ) + + async def aforward( + self, + question: str, + weaviate_async_client: Optional[weaviate.WeaviateAsyncClient] = None + ) -> DSPyAgentRAGResponse: + """ + Async forward pass: retrieve and rerank documents with parallelization. + + Args: + question: User query + weaviate_async_client: Async Weaviate client for search + + Returns: + DSPyAgentRAGResponse with reranked documents + """ + if weaviate_async_client is None: + if isinstance(self.weaviate_client, weaviate.WeaviateAsyncClient): + weaviate_async_client = self.weaviate_client + + initial_results = await async_weaviate_search_tool( + query=question, + collection_name=self.collection_name, + target_property_name=self.target_property_name, + retrieved_k=self.retrieved_k, + weaviate_async_client=weaviate_async_client, + ) + + if self.verbose: + print(f"\n\033[92m🔍 Initial retrieval: {len(initial_results)} documents\033[0m") + + reranked_results = await self._atop_down_partition(question, initial_results) + + return DSPyAgentRAGResponse( + final_answer="", + sources=reranked_results, + searches=[question], + aggregations=None, + usage={}, + ) + + +async def main(): + """ + Example demonstrating the efficiency gains of top-down partitioning. + + This example shows: + 1. How to configure the reranker with different parameters + 2. Comparison between sync and async implementations + 3. Efficiency metrics (inference count, parallelization opportunities) + """ + + # Test different configurations to see efficiency trade-offs + print("\033[93m" + "="*80) + print("TOP-DOWN PARTITIONING RERANKER - DEMONSTRATION") + print("="*80 + "\033[0m\n") + + configs = [ + { + "name": "Small Window (Fast)", + "window_size": 5, + "target_k": 10, + "budget": 15, + "ranking_depth": 50, + }, + { + "name": "Medium Window (Balanced)", + "window_size": 10, + "target_k": 10, + "budget": 30, + "ranking_depth": 100, + }, + ] + + test_q = "How many cells are in the human body?" + + # Setup Weaviate clients + weaviate_client = weaviate.connect_to_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + + weaviate_async_client = weaviate.use_async_with_weaviate_cloud( + cluster_url=os.getenv("WEAVIATE_URL"), + auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")), + ) + + await weaviate_async_client.connect() + + for config in configs: + print(f"\n\033[93m{'='*80}") + print(f"Testing Configuration: {config['name']}") + print(f"{'='*80}\033[0m") + print(f"Parameters: {config}") + print() + + test_pipeline = TopDownPartitioningReranker( + collection_name="BrightBiology", + target_property_name="content", + verbose=True, + retrieved_k=20, + use_thinking=True, + **{k: v for k, v in config.items() if k != 'name'} + ) + + print("\n\033[96m" + "="*80) + print("SYNC VERSION (Sequential)") + print("="*80 + "\033[0m") + + test_sync_response = test_pipeline.forward(test_q, weaviate_client=weaviate_client) + + print("\n\033[92mTop 5 reranked results:\033[0m") + for i, doc in enumerate(test_sync_response.sources[:5]): + doc_str = str(doc)[:150] if len(str(doc)) > 150 else str(doc) + print(f"\033[96m{i+1}.\033[0m {doc_str}...") + + print("\n\033[93m📊 Efficiency Metrics:\033[0m") + print(f" • Total inferences: {test_pipeline.inference_count}") + print(f" • Parallelizable inferences: {test_pipeline.parallel_inference_count}") + print(f" • Potential speedup: {test_pipeline.parallel_inference_count / max(test_pipeline.inference_count, 1):.2f}x") + + print("\n\n\033[96m" + "="*80) + print("ASYNC VERSION (With True Parallelization)") + print("="*80 + "\033[0m") + + test_async_response = await test_pipeline.aforward(test_q, weaviate_async_client=weaviate_async_client) + + print("\n\033[92mTop 5 reranked results:\033[0m") + for i, doc in enumerate(test_async_response.sources[:5]): + doc_str = str(doc)[:150] if len(str(doc)) > 150 else str(doc) + print(f"\033[96m{i+1}.\033[0m {doc_str}...") + + print("\n\033[93m📊 Efficiency Metrics:\033[0m") + print(f" • Total inferences: {test_pipeline.inference_count}") + print(f" • Parallelizable inferences: {test_pipeline.parallel_inference_count}") + print(f" • Actual speedup with async: ~{test_pipeline.parallel_inference_count / max(test_pipeline.inference_count, 1):.2f}x") + + print("\n" + "="*80 + "\n") + + # Comparison with theoretical sliding window + print("\n\033[93m" + "="*80) + print("EFFICIENCY COMPARISON") + print("="*80 + "\033[0m\n") + + print("For ranking to depth 100 with window size 10:") + print() + print("Sliding Window (stride=5):") + print(" • Inferences needed: (100 / 5) - 1 = \033[91m19 inferences\033[0m") + print(" • Parallelizable: \033[91m0 (sequential dependency)\033[0m") + print(" • Issues: Redundant re-scoring, bottom-up bias") + print() + print("Top-Down Partitioning:") + print(" • Inferences needed: ~\033[92m12-13 inferences\033[0m (33% reduction)") + print(" • Parallelizable: \033[92m~8-10 batches\033[0m") + print(" • Benefits: No redundant scoring, top-down bias, parallelizable") + print() + print("\033[92m✅ Result: Same quality, fewer inferences, better parallelism!\033[0m") + print() + + weaviate_client.close() + await weaviate_async_client.close() + + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/retrievers/vanilla_rag.py b/retrieve_dspy/retrievers/vanilla_rag.py deleted file mode 100644 index f2ab287..0000000 --- a/retrieve_dspy/retrievers/vanilla_rag.py +++ /dev/null @@ -1,93 +0,0 @@ -import asyncio -from typing import Optional - -import dspy - -from retrieve_dspy.tools.weaviate_database import ( - weaviate_search_tool, - async_weaviate_search_tool -) -from retrieve_dspy.retrievers.base_rag import BaseRAG -from retrieve_dspy.models import DSPyAgentRAGResponse -from retrieve_dspy.signatures import QuerySummarizer - -class VanillaRAG(BaseRAG): - def __init__( - self, - collection_name: str, - target_property_name: Optional[str] = "content", - verbose: Optional[bool] = False, - search_only: Optional[bool] = True, - retrieved_k: Optional[int] = 20, - summarize_query: Optional[bool] = False - ): - super().__init__(collection_name, target_property_name, search_only=search_only, verbose=verbose, retrieved_k=retrieved_k) - self.summarize_query = summarize_query - self.query_summarizer = dspy.Predict(QuerySummarizer) - - def forward(self, question: str) -> DSPyAgentRAGResponse: - if self.summarize_query: - question_pred = self.query_summarizer(question=question) - question = question_pred.summary - - contexts, sources = weaviate_search_tool( - query=question, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - ) - - if self.verbose: - print(f"\033[96m Returning {len(sources)} Sources!\033[0m") - - if not self.search_only: - print("") - - return DSPyAgentRAGResponse( - final_answer="", - sources=sources, - searches=[question], - aggregations=None, - usage={}, - ) - - async def aforward(self, question: str) -> DSPyAgentRAGResponse: - if self.summarize_query: - question_pred = self.query_summarizer(question=question) - question = question_pred.summary - - contexts, sources = await async_weaviate_search_tool( - query=question, - collection_name=self.collection_name, - target_property_name=self.target_property_name, - retrieved_k=self.retrieved_k, - ) - - if self.verbose: - print(f"\033[96m Returning {len(sources)} Sources!\033[0m") - - if not self.search_only: - print("") - - return DSPyAgentRAGResponse( - final_answer="", - sources=sources, - searches=[question], - aggregations=None, - usage={}, - ) - -async def main(): - test_pipeline = VanillaRAG( - collection_name="FreshstackLangchain", - target_property_name="docs_text", - retrieved_k=5 - ) - test_q = "How do I integrate Weaviate and Langchain?" - response = test_pipeline.forward(test_q) - print(response) - async_response = await test_pipeline.aforward(test_q) - print(async_response) - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/retrieve_dspy/signatures.py b/retrieve_dspy/signatures.py index 153c4af..41c945c 100644 --- a/retrieve_dspy/signatures.py +++ b/retrieve_dspy/signatures.py @@ -1,10 +1,54 @@ import dspy -from retrieve_dspy.models import SearchResult, SearchQueryWithFilter +from retrieve_dspy.models import ObjectFromDB, SearchQueryWithFilter -# Rerankers +# ============ Rerankers ============ -class RelevanceRanker(dspy.Signature): +# Pointwise Rerankers + +class AssessRelevance(dspy.Signature): + """Assess whether or not the candidate document is relevant to the query.""" + + query: str = dspy.InputField(desc="The user's question or information need") + candidate_document: str = dspy.InputField(desc="The candidate document to assess for relevance") + is_relevant: bool = dspy.OutputField(desc="Whether or not the candidate document is relevant to the query") + +# Listwise Rerankers + +class ListwiseRanking(dspy.Signature): + """Given a query and a list of documents, rank them by relevance in descending order (most relevant first).""" + + query: str = dspy.InputField(desc="The search query") + documents: list[str] = dspy.InputField(desc="List of document contents to rank") + ranked_indices: list[int] = dspy.OutputField(desc="List of document indices in order of relevance (0-indexed)") + + +class VerboseListwiseRanking(dspy.Signature): + """ + Given a query and a list of documents, carefully rank them by relevance. + + Instructions: + - Read the query carefully to understand the information need + - Examine each document thoroughly + - Consider multiple relevance factors: + * Direct answer to the query + * Topical relevance + * Information completeness + * Credibility and specificity + - Compare documents against each other, not in isolation + - Think about which documents best satisfy the user's intent + - Return the indices in descending order of relevance (most relevant first) + - The output should be a Python list of integers, e.g., [3, 0, 2, 1, 4] + + Your ranking should reflect a holistic assessment of relevance. + """ + + query: str = dspy.InputField(desc="The search query") + documents: list[str] = dspy.InputField(desc="List of document contents to rank") + ranked_indices: list[int] = dspy.OutputField(desc="List of document indices in descending order of relevance (0-indexed)") + + +class VerboseRelevanceRanker(dspy.Signature): """Rerank passages based on their relevance to the query using listwise comparison. Your task is to analyze ALL passages simultaneously and produce a single ranked list @@ -25,17 +69,26 @@ class RelevanceRanker(dspy.Signature): query: str = dspy.InputField( desc="The user's question or information need" ) - search_results: list[SearchResult] = dspy.InputField( + search_results: list[ObjectFromDB] = dspy.InputField( desc="List of passages to rerank. Each contains: id, text, initial_rank, and hybrid_score" ) top_k: int = dspy.InputField( desc="Exact number of passage IDs to return (strict requirement)" ) - reranked_ids: list[int] = dspy.OutputField( - desc="List of exactly `top_k` passage IDs ordered by relevance (most relevant first). Must match IDs from search_results." + reranked_ids: list[str] = dspy.OutputField( + desc="List of exactly `top_k` object_id strings ordered by relevance (most relevant first). Must use EXACT strings from valid_object_ids (e.g., 'doc2023976', not '2023976')." ) -class BestMatchRanker(dspy.Signature): +class RelevanceRanker(dspy.Signature): + """Rerank passages based on their relevance to the query.""" + + query: str = dspy.InputField(desc="The user's question or information need") + search_results: list[ObjectFromDB] = dspy.InputField(desc="List of passages to rerank.") + top_k: int = dspy.InputField(desc="Exact number of passage IDs to return.") + reranked_ids: list[int] = dspy.OutputField(desc="List of top_k passage IDs ordered by relevance.") + + +class VerboseBestMatchRanker(dspy.Signature): """Identify the single most relevant passage to the query. Your task is to analyze ALL passages simultaneously and identify the one passage @@ -48,35 +101,45 @@ class BestMatchRanker(dspy.Signature): - Factual accuracy and completeness - Information quality and clarity 3. Compare passages against each other (not just individually) - 4. Return the ID of the single most relevant passage + 4. Return the `object_id` of the single most relevant passage - CRITICAL: You must return exactly 1 passage ID - the best match. + CRITICAL: You must return exactly 1 passage `object_id` - the best match. """ query: str = dspy.InputField( desc="The user's question or information need" ) - search_results: list[SearchResult] = dspy.InputField( + search_results: list[ObjectFromDB] = dspy.InputField( desc="List of passages to analyze. Each contains: id, text, initial_rank, and hybrid_score" ) - best_match_id: int = dspy.OutputField( - desc="The ID of the single most relevant passage. Must match an ID from search_results." + valid_object_ids: list[int] = dspy.InputField( + desc="A reminder of the valid `object_id`s from search_results." + ) + best_match_object_id: str = dspy.OutputField( + desc="The `object_id` of the single most relevant passage. Must match an `object_id` from search_results." ) +class BestMatchRanker(dspy.Signature): + """Identify the single most relevant passage to the query.""" + + query: str = dspy.InputField(desc="The user's question or information need") + search_results: list[ObjectFromDB] = dspy.InputField(desc="List of passages to analyze.") + best_match_id: int = dspy.OutputField(desc="The ID of the single most relevant passage.") + class IdentifyMostRelevantPassage(dspy.Signature): """Identify the passage that contains the answer to the query from a list of passages.""" query: str = dspy.InputField( desc="The user's question or information need" ) - search_results: list[SearchResult] = dspy.InputField( + search_results: list[ObjectFromDB] = dspy.InputField( desc="List of passages to analyze. Each contains: id, text, and inital_rank" ) most_relevant_passage: int = dspy.OutputField( desc="The id of the passage that contains the answer to the query" ) -class DiversityRanker(dspy.Signature): +class VerboseDiversityRanker(dspy.Signature): """Select a diverse set of relevant passages that cover different aspects of the query. Your task is to analyze ALL passages simultaneously and select a subset that: @@ -99,7 +162,7 @@ class DiversityRanker(dspy.Signature): query: str = dspy.InputField( desc="The user's question or information need" ) - search_results: list[SearchResult] = dspy.InputField( + search_results: list[ObjectFromDB] = dspy.InputField( desc="List of passages to analyze. Each contains: id, text, initial_rank, and hybrid_score" ) top_k: int = dspy.InputField( @@ -109,7 +172,119 @@ class DiversityRanker(dspy.Signature): desc="List of exactly `top_k` passage IDs representing diverse relevant topics. Must match IDs from search_results." ) -# Query Writers +class DiversityRanker(dspy.Signature): + """Select a diverse set of relevant passages that cover different aspects of the query.""" + + query: str = dspy.InputField(desc="The user's question or information need") + search_results: list[ObjectFromDB] = dspy.InputField(desc="List of passages to analyze.") + top_k: int = dspy.InputField(desc="Exact number of passage IDs to return.") + reranked_ids: list[int] = dspy.OutputField(desc="List of top_k passage IDs representing diverse relevant topics.") + +# ============ Query Writers ============ + +class HyDE(dspy.Signature): + """Please write a passage to answer the question.""" + + question: str = dspy.InputField(desc="The user's question or information need") + passage: str = dspy.OutputField(desc="A passage to answer the question.") + +class VerboseHyDE(dspy.Signature): + """ + Write a comprehensive, informative passage that fully answers the user's question. + + Instructions: + - Carefully read and understand the user's question or information need. + - Write a passage that directly addresses the question, providing a clear, detailed, and self-contained answer. + - Include all relevant background, context, and explanations necessary for a reader unfamiliar with the topic. + - Cover important aspects, nuances, and potential follow-up points related to the question. + - Use precise, accurate, and well-organized language. + - The passage should be suitable as a high-quality answer in a knowledge base or search engine result. + + Your goal is to produce a passage that is as helpful and complete as possible for someone seeking an answer to the question. + """ + + question: str = dspy.InputField(desc="The user's question or information need") + passage: str = dspy.OutputField(desc="A comprehensive passage that fully answers the question.") + +class LameR(dspy.Signature): + """Given a question and its possible answering passages, please write a correct answering passage.""" + + question: str = dspy.InputField(desc="The user's question or information need") + possible_answering_passages: list[ObjectFromDB] = dspy.InputField(desc="The possible answering passages to the question") + correct_answering_passage: str = dspy.OutputField(desc="The correct answering passage to the question") + +class VerboseLameR(dspy.Signature): + """ + Given a user question and a set of possible answering passages, write the best possible passage to answer the question. + + Instructions: + - Carefully analyze the user's question or information need. + - Read and compare ALL the possible answering passages provided. + - Identify which passage(s) contain key information needed to answer the question correctly and completely. + - If useful information is found across multiple passages, synthesize and combine relevant details to produce a comprehensive, accurate answer. + - Write a well-structured, context-rich, self-contained passage that fully addresses the question, using only the provided information. + - Include important background, explanatory details, and any clarifications necessary for a reader unfamiliar with the topic. + - Avoid copying text verbatim; instead, rewrite and integrate content for clarity and quality. + - Do NOT include any information not found in the provided passages. + - The output should be suitable as a top-quality answer in a knowledge base or search engine result. + + Your task is to return a single coherent passage that is the best possible answer to the question, based only on the given passages. + """ + + question: str = dspy.InputField(desc="The user's question or information need") + possible_answering_passages: list[ObjectFromDB] = dspy.InputField(desc="The possible answering passages to the question") + correct_answering_passage: str = dspy.OutputField(desc="The best, synthesized passage answering the question based only on the provided passages.") + +class ThinkQE(dspy.Signature): + """Given a question and its possible answering passages (most of these passages are wrong), please write a correct answering passage. Use your own knowledge, not just the example passages!""" + + question: str = dspy.InputField(desc="The original user's question") + possible_answering_passages: list = dspy.InputField(desc="Top-K retrieved documents from the corpus (most may be incorrect)") + correct_answering_passage: str = dspy.OutputField(desc="A correct answering passage generated through thinking") + +class VerboseThinkQE(dspy.Signature): + """ + Given a user question and a set of possible answering passages (most of which may be wrong), + generate a correct answering passage through deep reasoning and exploration. + + Instructions: + - Carefully analyze the user's question to identify all possible interpretations, facets, and ambiguities. + - Review the provided passages, noting that most may contain incorrect or incomplete information. + - Think deeply about the query space: consider alternative formulations, related concepts, and different semantic angles. + - Explore multiple hypotheses about what the user might be seeking. + - Use your own knowledge to generate a comprehensive, correct answering passage. + - The passage should capture diverse facets and interpretations of the information need. + - Introduce exploratory terms and concepts that go beyond the initial query scope. + - DO NOT simply paraphrase or copy from the provided passages. + - Focus on breadth and exploration rather than narrow, overconfident answers. + + Your task is to produce an expanded, exploratory passage that will help retrieve + a more diverse and comprehensive set of relevant documents. + """ + + question: str = dspy.InputField(desc="The original user's question") + possible_answering_passages: list = dspy.InputField(desc="Top-K retrieved documents (most may be incorrect)") + correct_answering_passage: str = dspy.OutputField(desc="A comprehensive, exploratory answering passage generated through thinking") + +class VerboseExpandQuery(dspy.Signature): + """Expand a query to gather more comprehensive information from a search engine. + + Your task is to rewrite the user's question into an expanded query that is optimized for a search engine. The goal is to retrieve documents that will help answer the original question. + + Instructions: + 1. Analyze the user's question to understand the core intent. + 2. Identify key concepts, entities, and technical terms. + 3. Add context, clarify ambiguities, and include synonyms or related terms. + 4. Formulate a query that is specific and detailed enough to narrow down results, but broad enough to capture relevant variations. + 5. The expanded query should be a single, coherent question or search phrase. + + Example: + - Question: "dspy signature error" + - Expanded Query: "How to fix dspy.Signature field validation error for InputField and OutputField in DSPy framework" + """ + + question: str = dspy.InputField(desc="The original user question.") + expanded_query: str = dspy.OutputField(desc="A detailed, expanded query for a search engine.") class ExpandQuery(dspy.Signature): """Expand a query to gather information from a search engine that will help answer the question.""" @@ -117,7 +292,7 @@ class ExpandQuery(dspy.Signature): question: str = dspy.InputField() expanded_query: str = dspy.OutputField() -class ExpandQueryWithHint(dspy.Signature): +class VerboseExpandQueryWithHint(dspy.Signature): """Expand a query to gather information from a search engine that will help answer the question. Use the initial search results as hints to guide your query expansion. Analyze what information @@ -136,7 +311,14 @@ class ExpandQueryWithHint(dspy.Signature): initial_search_results: str = dspy.InputField() expanded_query: str = dspy.OutputField() -class WriteSearchQueries(dspy.Signature): +class ExpandQueryWithHint(dspy.Signature): + """Expand a query using initial search results as a hint to improve information retrieval.""" + + question: str = dspy.InputField() + initial_search_results: str = dspy.InputField() + expanded_query: str = dspy.OutputField() + +class VerboseWriteSearchQueries(dspy.Signature): """Write search queries to gather information from a search engine that will help answer the question. Consider both exploration and result diversity to capture multiple interpretations and facets of a query. @@ -145,7 +327,13 @@ class WriteSearchQueries(dspy.Signature): question: str = dspy.InputField() search_queries: list[str] = dspy.OutputField() -class DecomposeQueryWithHint(dspy.Signature): +class WriteSearchQueries(dspy.Signature): + """Write search queries to gather information from a search engine that will help answer the question.""" + + question: str = dspy.InputField() + search_queries: list[str] = dspy.OutputField() + +class VerboseDecomposeQueryWithHint(dspy.Signature): """Your task is to decompose a complex technical problem into atomic sub-queries that collectively cover all essential aspects needed to answer the question. You are given the initial search results from the user's original query. Analyze what information is missing or insufficiently covered, then generate sub-queries that will: @@ -170,6 +358,32 @@ class DecomposeQueryWithHint(dspy.Signature): initial_search_results: str = dspy.InputField(desc="Initial retrieval results to identify coverage gaps") sub_queries: list[str] = dspy.OutputField(desc="List of 3-8 atomic sub-queries that maximize nugget coverage") +class DecomposeQueryWithHint(dspy.Signature): + """Decompose a complex query into atomic sub-queries based on initial search results.""" + + user_question: str = dspy.InputField(desc="The original technical question or problem statement") + initial_search_results: str = dspy.InputField(desc="Initial retrieval results to identify coverage gaps") + sub_queries: list[str] = dspy.OutputField(desc="List of atomic sub-queries") + +class VerboseWriteSearchQueriesWithFilters(dspy.Signature): + """Write search queries with optional filters to gather targeted information from a search engine that will help answer the question. + + Your task is to generate a list of search queries based on the user's question. For each query, you can optionally apply filters to narrow down the search space. + + Instructions: + 1. Analyze the question to identify key topics and entities. + 2. Formulate several distinct search queries that explore different facets of the question. + 3. For each query, examine the available filters and decide if any can be applied to improve the results. + 4. Apply filters only when they directly correspond to the information in the query. For example, if the question is about "react components", a filter `library: "react"` would be appropriate. + 5. Construct the output as a list of `SearchQueryWithFilter` objects, where each object contains the `query` string and an optional `filter` dictionary. + + CRITICAL: The filter keys must be from the `filters_available` list. Do not invent new filter keys. + """ + + question: str = dspy.InputField(desc="The user's question.") + filters_available: str = dspy.InputField(desc="A string showing available filter keys, e.g., 'library, source, author'.") + search_queries_with_filters: list[SearchQueryWithFilter] = dspy.OutputField(desc="A list of search queries, each with an optional filter applied.") + class WriteSearchQueriesWithFilters(dspy.Signature): """Write search queries with optional filters to gather information from a search engine that will help answer the question.""" @@ -177,7 +391,7 @@ class WriteSearchQueriesWithFilters(dspy.Signature): filters_available: str = dspy.InputField() search_queries_with_filters: list[SearchQueryWithFilter] = dspy.OutputField() -class WriteFollowUpQueries(dspy.Signature): +class VerboseWriteFollowUpQueries(dspy.Signature): """Given a user question and contexts retrieved so far from search, assess if additional search queries are needed to fully answer the question. You are part of a retrieval system that has already performed an initial search and retrieved some contexts. Your job is to: @@ -192,7 +406,61 @@ class WriteFollowUpQueries(dspy.Signature): follow_up_queries_needed: bool = dspy.OutputField() follow_up_queries: list[str] = dspy.OutputField() -# Summarizers +class WriteFollowUpQueries(dspy.Signature): + """Assess if more information is needed to answer a question and generate follow-up search queries if necessary.""" + + question: str = dspy.InputField() + contexts: str = dspy.InputField() + follow_up_queries_needed: bool = dspy.OutputField() + follow_up_queries: list[str] = dspy.OutputField() + +class VerboseWriteFollowUpQuery(dspy.Signature): + """Given a user question and contexts retrieved so far from search, assess if an additional search query is needed to fully answer the question. + + You are part of a retrieval system that has already performed an initial search and retrieved some contexts. Your job is to: + 1. Analyze whether the current contexts provide sufficient information to answer the user's question + 2. If not, determine what specific information is still missing + 3. Generate a single targeted search query that would retrieve the most critical missing information from a search engine + + The follow-up query should be optimized for search engines and designed to fill the most important gap in the current knowledge base.""" + + question: str = dspy.InputField() + results_found_so_far: list[ObjectFromDB] = dspy.InputField() + follow_up_query_needed: bool = dspy.OutputField() + follow_up_query: str = dspy.OutputField() + +class WriteFollowUpQuery(dspy.Signature): + """Assess if more information is needed to answer a question and generate a follow-up search query if necessary.""" + + question: str = dspy.InputField() + results_found_so_far: list[ObjectFromDB] = dspy.InputField() + follow_up_query_needed: bool = dspy.OutputField() + follow_up_query: str = dspy.OutputField() + +# ============ Summarizers ============ + +class VerboseFilterIrrelevantSearchResults(dspy.Signature): + """Filter out search results that are not relevant to answering the question. + + Your task is to act as a strict relevance filter. For each search result, you must decide if it contains information that is directly useful for answering the user's question. + + Instructions: + 1. Carefully read the user's question to understand the specific information needed. + 2. For each search result, evaluate its content against the question. + 3. A result is RELEVANT if it: + - Directly answers a part of the question. + - Provides essential context or background information. + - Discusses the key entities or concepts mentioned in the question. + 4. A result is IRRELEVANT if it: + - Is on a completely different topic. + - Only mentions keywords without providing substantive information. + - Is an ad, a forum index, or other non-informative content. + 5. Return a list containing ONLY the IDs of the relevant search results. + """ + + question: str = dspy.InputField(desc="The user's question.") + search_results: dict[int, str] = dspy.InputField(desc="The search results keyed by their id.") + filtered_results: list[int] = dspy.OutputField(desc="A list of the IDs of the relevant results only.") class FilterIrrelevantSearchResults(dspy.Signature): """Filter out search results that are not relevant to answering the question.""" @@ -201,14 +469,34 @@ class FilterIrrelevantSearchResults(dspy.Signature): search_results: dict[int, str] = dspy.InputField(desc="The search results keyed by their id.") filtered_results: list[int] = dspy.OutputField(desc="The ids of relevant results.") +class VerboseSummarizeSearchResults(dspy.Signature): + """Summarize search results to extract and synthesize the most important information related to the question. + + Your task is to read all the provided search results and create a single, coherent summary that directly answers the user's question. + + Instructions: + 1. Thoroughly understand the user's question. + 2. Read through all the search results to identify key pieces of information, facts, and explanations. + 3. Synthesize the information from multiple sources. Do not just summarize each document individually. + 4. Construct a comprehensive answer that flows logically. + 5. CRITICAL: Wherever you use information from a search result, you MUST cite its ID using the format `[id]`. For example: "The sky is blue because of Rayleigh scattering [1]." A single sentence can have multiple citations, like `[2, 3]`. + 6. The final summary should be a self-contained answer to the question, based only on the provided search results. + """ + + question: str = dspy.InputField(desc="The user's question.") + search_results: list[ObjectFromDB] = dspy.InputField(desc="A dictionary of search results, with ID as key and text as value.") + summary: str = dspy.OutputField(desc="A comprehensive summary of the search results with citations to the result IDs, e.g., '...information [1, 2].'") + +''' class SummarizeSearchResults(dspy.Signature): """Summarize search results to extract the most important information related to the question.""" question: str = dspy.InputField() - search_results: dict[int, str] = dspy.InputField() + search_results: list[ObjectFromDB] = dspy.InputField() summary: str = dspy.OutputField() # add citations to the ids in the summary +''' -class SummarizeSearchRelevance(dspy.Signature): +class VerboseSummarizeSearchRelevance(dspy.Signature): """Analyze and summarize how a search result addresses the given query. Evaluate the passage's relevance by considering: @@ -229,8 +517,43 @@ class SummarizeSearchRelevance(dspy.Signature): desc="A 2-3 sentence summary of how this passage relates to the query and its relevance" ) +class SummarizeSearchRelevance(dspy.Signature): + """Analyze and summarize how a search result addresses the given query.""" + + query: str = dspy.InputField() + passage: str = dspy.InputField() + relevance_summary: str = dspy.OutputField( + desc="A summary of how this passage relates to the query and its relevance" + ) + +class VerboseQuerySummarizer(dspy.Signature): + """Summarize a technical question into one or two sentences, capturing the core problem and context. + + Your task is to distill a potentially long and detailed technical question into a very short summary. This summary should retain the essential information needed to understand the user's goal. + + Instructions: + 1. Identify the main technology or library involved (e.g., Python, React, Chromadb). + 2. Pinpoint the specific function, component, or concept that is causing the issue. + 3. Determine the user's goal or what they are trying to achieve. + 4. Combine these elements into a concise, one- or two-sentence summary. + + Example: + - Question: "I'm trying to use the from_documents function in chromadb with my own list of documents, but I keep getting a 'NoneType' object has no attribute 'embed_documents' error. I'm using sentence-transformers for embeddings. How do I fix this?" + - Summary: "The user is encountering a 'NoneType' error when using chromadb's `from_documents` function with sentence-transformers embeddings and needs to resolve it." + """ + + question: str = dspy.InputField(desc="The user's technical question.") + summary: str = dspy.OutputField(desc="A one- or two-sentence summary of the question.") + class QuerySummarizer(dspy.Signature): """Summarize a technical question into one or two sentences.""" question: str = dspy.InputField() + summary: str = dspy.OutputField() + +class SummarizeSearchResults(dspy.Signature): + """You are an iterative searching agent. You are given a list of search queries and a summary of their results. You need to summarize the results of the search queries.""" + + search_queries: list[str] = dspy.InputField() + search_results: list[str] = dspy.InputField() summary: str = dspy.OutputField() \ No newline at end of file diff --git a/retrieve_dspy/utils.py b/retrieve_dspy/utils.py index 0f52fbc..b326caa 100644 --- a/retrieve_dspy/utils.py +++ b/retrieve_dspy/utils.py @@ -1,43 +1,10 @@ import json import os -from typing import Iterable, Set, List, Tuple, Callable, Dict +from typing import Iterable, Set, List -import numpy as np import dspy -from dspy import Example, Prediction - -def get_evaluator( - testset: list[Example], - metric: callable -): - evaluator = dspy.Evaluate( - devset=testset, - metric=metric, - num_threads=1, - display_progress=True, - max_errors=1, - provide_traceback=True - ) - - return evaluator - -def offline_recall_evaluator( - results: List[Tuple[Example, Prediction, float]], - metrics: Dict[str, Callable], -) -> Dict[str, float]: - metric_scores = {name: [] for name in metrics.keys()} - - for example, prediction, original_score in results: - for metric_name, metric_func in metrics.items(): - score = metric_func(example, prediction) - metric_scores[metric_name].append(score) - - avg_scores = {} - for metric_name, scores in metric_scores.items(): - avg_scores[metric_name] = np.mean(scores) if scores else 0.0 - - return avg_scores +from dspy import Example # Used for saving training samples and ensuring we are not testing with training samples @@ -84,4 +51,14 @@ def load_training_questions(path: str) -> Set[str]: # Fallback for legacy plain-text lines questions.add(line) - return questions \ No newline at end of file + return questions + +available_lms = ( + "openai/gpt-4.1-mini", + "openai/gpt-5" +) + +def get_lm(requested_lm: str, max_tokens: int = 32000) -> dspy.LM: + if requested_lm not in available_lms: + raise ValueError(f"Requested LM {requested_lm} not available. Available LMs: {available_lms}") + return dspy.LM(requested_lm, cache=False, temperature=1.0, api_key=os.getenv("OPENAI_API_KEY"), max_tokens=max_tokens) \ No newline at end of file diff --git a/scripts/run-eval.py b/scripts/run-eval.py deleted file mode 100644 index be2b1da..0000000 --- a/scripts/run-eval.py +++ /dev/null @@ -1,160 +0,0 @@ -import numpy as np - -import retrieve_dspy -from retrieve_dspy.metrics import create_metric -from retrieve_dspy.datasets.in_memory import load_queries_in_memory - -''' -rag_pipeline = retrieve_dspy.CrossEncoderReranker( - collection_name="EnronEmails", - target_property_name="email_body", - retrieved_k=50, - reranked_k=20, - reranker_provider="voyage", - verbose=True -) - -rag_pipeline = retrieve_dspy.ListwiseReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_summary", - retrieved_k=5, - reranked_k=5, - verbose=True -) - -rag_pipeline = retrieve_dspy.VanillaRAG( - collection_name="EnronEmails", - target_property_name="email_body_vector", - retrieved_k=5, - verbose=True -) - -rag_pipeline = retrieve_dspy.SummarizedListwiseReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=5, - reranked_k=5, - verbose=True -) - -rag_pipeline = retrieve_dspy.CrossEncoderReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - reranker_provider="hybrid", - retrieved_k=50, - reranked_k=20, - verbose=True -) -''' - -rag_pipeline = retrieve_dspy.LayeredReranker( - collection_name="EnronEmails", - target_property_name="email_body_vector", - return_property_name="email_body", - retrieved_k=50, - reranked_N=5, - reranked_M=1, - verbose=True -) - - -#print(rag_pipeline.__class__.__name__) - -#rag_pipeline.load("./optimization_runs/2_gepa_optimized_query_expander.json") -#used_qs = retrieve_dspy.utils.load_training_questions("./optimization_runs/2_gepa_query_expander_training_samples.jsonl") -used_qs = None - -#print(f"\033[92m{rag_pipeline.expand_query.signature}\033[0m") - -NUM_TRIALS = 3 -scores = [] - -metric = create_metric( - metric_type="recall", - dataset_name="enron", - k=1 -) - -recall_metrics = { - 'recall@1': create_metric( - metric_type="recall", - dataset_name="enron", - k=1, - verbose=False - ), - 'recall@5': create_metric( - metric_type="recall", - dataset_name="enron", - k=5, - verbose=False - ), - 'recall@20': create_metric( - metric_type="recall", - dataset_name="enron", - k=20, - verbose=False - ) -} - -offline_scores_across_trials = {metric_name: [] for metric_name in recall_metrics.keys()} - -for trial in range(NUM_TRIALS): - print(f"\nRunning trial {trial + 1}/{NUM_TRIALS}") - - trainset, testset = load_queries_in_memory( - dataset_name="enron", - train_samples=20, - test_samples=20, - training_samples=used_qs, - seed=trial - ) - - evaluator = retrieve_dspy.utils.get_evaluator( - testset=testset, - metric=metric, - ) - - dspy_evaluator_kwargs = { - "num_threads": 1 - } - - evaluator_result = evaluator(rag_pipeline, **dspy_evaluator_kwargs) - score = evaluator_result.score - scores.append(score) - all_results = evaluator_result.results - print("Running eval for all metrics...") - offline_scores = retrieve_dspy.utils.offline_recall_evaluator( - results=all_results, - metrics=recall_metrics - ) - - for key, value in offline_scores.items(): - print(f"\033[96m{key}\033[0m: \033[92m{value:.3f}\033[0m") - offline_scores_across_trials[key].append(value) - - -print("\n" + "="*60) -print("ORIGINAL METRIC RESULTS ACROSS TRIALS:") -print("="*60) -scores = np.array(scores) -print(f"Individual scores: {[f'{score:.3f}' for score in scores]}") -print(f"Min score: {scores.min():.3f}") -print(f"Max score: {scores.max():.3f}") -print(f"\033[92mMean score: {scores.mean():.3f}\033[0m") -print(f"Std dev: {scores.std():.3f}") - -print("\n" + "="*60) -print("OFFLINE METRICS RESULTS ACROSS TRIALS:") -print("="*60) - -for metric_name in recall_metrics.keys(): - metric_scores = np.array(offline_scores_across_trials[metric_name]) - print(f"\n\033[96m{metric_name}:\033[0m") - print(f" Individual scores: {[f'{score:.3f}' for score in metric_scores]}") - print(f" Min score: {metric_scores.min():.3f}") - print(f" Max score: {metric_scores.max():.3f}") - print(f" \033[92mMean score: {metric_scores.mean():.3f}\033[0m") - print(f" Std dev: {metric_scores.std():.3f}") \ No newline at end of file diff --git a/visuals/cover.png b/visuals/cover.png index 008726c..f2a8b54 100644 Binary files a/visuals/cover.png and b/visuals/cover.png differ