HyDE Retriever - ihower's Notes

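https://arxiv.org/abs/2212.10496
https://github.com/texttron/hyde

## A closer look at the original implementation

It uses the OpenAI API's `n` parameter to generate 8 hypothetical answers. In https://github.com/texttron/hyde/blob/main/src/hyde/generator.py, the `n` parameter is used to produce multiple fake answers.

The original query and each of the `hypothesis_documents` are then encoded separately, and all of these embeddings are averaged into a single `hyde_vector` (https://github.com/texttron/hyde/blob/main/src/hyde/hyde.py L21~L25).

That `hyde_vector` is then used to search the vector database.

The prompt is: `Please write a passage to answer the question. Question: {} Passage:`

## LangChain implementation

https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb

The default appears to be n = 1, but you can set `best_of` when creating the `OpenAI` LLM.

It does not include the query embedding in the average, which differs from the original!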
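To make the difference between the original repo and LangChain concrete, here is a minimal pure-Python sketch of the combination step only. It assumes the query and hypothetical-document embeddings have already been computed by some encoder (the toy 2-D vectors are made up). `mean_vector`, `hyde_vector` and `search` are hypothetical helpers, not functions from either codebase, and the brute-force inner-product search merely stands in for a real vector database.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hyde_vector(query_emb: Vector, hypo_embs: Sequence[Vector],
                include_query: bool = True) -> Vector:
    """Combine embeddings HyDE-style (hypothetical helper).

    include_query=True:  average the query with the hypothetical docs,
                         as the original texttron/hyde code does.
    include_query=False: average only the hypothetical docs,
                         as noted above for LangChain.
    """
    vectors = ([query_emb] if include_query else []) + list(hypo_embs)
    return mean_vector(vectors)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query_vec: Vector, corpus: Sequence[Tuple[str, Vector]],
           k: int = 1) -> List[str]:
    """Brute-force inner-product search; a stand-in for a vector DB (assumption)."""
    ranked = sorted(corpus, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    query = [1.0, 0.0]                    # toy query embedding
    hypos = [[0.0, 1.0], [0.0, 1.0]]      # toy embeddings of 2 fake answers
    corpus = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 0.55])]

    print(hyde_vector(query, hypos))                       # [0.3333333333333333, 0.6666666666666666]
    print(hyde_vector(query, hypos, include_query=False))  # [0.0, 1.0]
    print(search(hyde_vector(query, hypos), corpus))                       # ['c']
    print(search(hyde_vector(query, hypos, include_query=False), corpus))  # ['b']
```

With these toy vectors, including the query pulls the combined vector back toward the query's direction, so the top hit changes from `b` to `c`. That is the practical effect of the two implementations averaging different sets of embeddings.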