Course page: https://www.deeplearning.ai/short-courses/large-language-models-semantic-search/

* Spending a whole lesson on traditional keyword search feels like a bit of a waste of time.
* Embeddings come from the Cohere embedding API, not OpenAI.
* The vector store is Weaviate (it is also used for the keyword search demo).
* The Rerank API is a Cohere-specific feature for improving search results; OpenAI has nothing comparable.

## Introduction

* In the LLM era, search can:
    * be done in a question-answering (QA) style
    * return more semantic results

![[Pasted image 20230817130101.png]]
![[Pasted image 20230817130108.png]]
![[Pasted image 20230817130121.png]]
![[Pasted image 20230817130157.png]]
![[Pasted image 20230817130210.png]]

## Keyword Search

![[Pasted image 20230817130549.png]]
![[Pasted image 20230817211318.png]]
![[Pasted image 20230817211245.png]]
![[Pasted image 20230817211413.png]]
![[Pasted image 20230817211510.png]]
![[Pasted image 20230817211538.png]]
![[Pasted image 20230817211559.png]]
![[Pasted image 20230817211729.png]]
![[Pasted image 20230817211841.png]]

LLMs can help in three places: Retrieval, Reranking, and Generation.

## Embeddings

![[Pasted image 20230818205119.png]]
![[Pasted image 20230818205205.png]]
![[Pasted image 20230818205330.png]]
![[Pasted image 20230818205403.png]]

Not only single words: whole sentences can be embedded too.

![[Pasted image 20230818205456.png]]
![[Pasted image 20230818205543.png]]

`umap_plot` projects the high-dimensional embeddings down to two dimensions for plotting.

![[Pasted image 20230818205728.png]]

## Dense Retrieval

### Part 1: Vector Database for Semantic Search

![[Pasted image 20230818210620.png]]
![[Pasted image 20230818210544.png]]
![[Pasted image 20230818210636.png]]
![[Pasted image 20230818210702.png]]

Various comparisons with keyword search:

![[Pasted image 20230818210809.png]]
![[Pasted image 20230818210855.png]]
![[Pasted image 20230818210916.png]]
![[Pasted image 20230818210922.png]]
![[Pasted image 20230818210929.png]]

### Part 2: Building Semantic Search from Scratch

![[Pasted image 20230818211235.png]]

Switch to splitting by paragraphs:

![[Pasted image 20230818211335.png]]

Switch back to sentences, but add the title to each chunk:

![[Pasted image 20230818211345.png]]
![[Pasted image 20230818211620.png]]
![[Pasted image 20230818211711.png]]
![[Pasted image 20230818211748.png]]
![[Pasted image 20230818211903.png]]

Finally: Text Relevance.
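A minimal sketch of the from-scratch dense retrieval described in Part 2: split the text into sentences, prepend the title to each chunk, embed every chunk, and return the chunks whose vectors are closest to the query vector by cosine similarity. The names and data are illustrative assumptions, not the course code: `make_toy_embedder` is a hypothetical stand-in for a real embedding model (the course uses the Cohere embedding API) and only counts shared words, so it captures word overlap rather than meaning; the example text is made up; and nearest neighbours come from a brute-force linear scan instead of a vector database such as Weaviate.

```python
import math
import re


def chunk_sentences(title: str, text: str) -> list[str]:
    """Split text into sentences and prepend the title to each chunk."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [f"{title}: {s}" for s in sentences]


def tokenize(text: str) -> list[str]:
    """Lower-cased alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_toy_embedder(corpus: list[str]):
    # HYPOTHETICAL stand-in for a real embedding model: a word-count vector
    # over the corpus vocabulary. It captures shared words, not meaning.
    vocab = sorted({w for doc in corpus for w in tokenize(doc)})
    position = {w: i for i, w in enumerate(vocab)}

    def embed(text: str) -> list[float]:
        vec = [0.0] * len(vocab)
        for w in tokenize(text):
            if w in position:  # words outside the vocabulary are ignored
                vec[position[w]] += 1.0
        return vec

    return embed


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_search(query: str, chunks: list[str], embed, k: int = 2) -> list[tuple[float, str]]:
    # Brute-force nearest neighbours: embed each chunk, score it by cosine
    # similarity to the query vector, and return the k best (score, chunk) pairs.
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up example text. Without the title prefix, the "1889" sentence
# would not mention the Eiffel Tower at all.
text = ("The Eiffel Tower is in Paris. It was completed in 1889. "
        "Paris is the capital of France.")
chunks = chunk_sentences("Eiffel Tower", text)
embed = make_toy_embedder(chunks)
for score, chunk in dense_search("When was the Eiffel Tower completed?", chunks, embed):
    print(f"{score:.2f}  {chunk}")
```

Swapping `make_toy_embedder` for a real embedding model is what turns this word-overlap demo into semantic search; the `dense_search` scan itself stays the same until the collection grows large enough to need an index.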
The Reranker is the topic of the next section, ReRank.

## ReRank

![[Pasted image 20230818213638.png]]
![[Pasted image 20230818213841.png]]
![[Pasted image 20230818214013.png]]
![[Pasted image 20230818214045.png]]
![[Pasted image 20230818213738.png]]
![[Pasted image 20230818214256.png]]
![[Pasted image 20230818214332.png]]
![[Pasted image 20230818214345.png]]
![[Pasted image 20230818214536.png]]

## Generating Answers

![[Pasted image 20230818214854.png]]
![[Pasted image 20230818214920.png]]
![[Pasted image 20230818214927.png]]
![[Pasted image 20230818215022.png]]
![[Pasted image 20230818215049.png]]
![[Pasted image 20230818215057.png]]
![[Pasted image 20230818215327.png]]
![[Pasted image 20230818215415.png]]
![[Pasted image 20230818215442.png]]
![[Pasted image 20230818215449.png]]
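The ReRank and Generating Answers sections fit together as one pipeline: take the candidates a first-stage search returns, reorder them by relevance to the query, and hand the top few to an LLM as context. A minimal sketch of those two steps, assuming a hypothetical `relevance` scorer (the fraction of query words found in a document) in place of a real rerank model such as Cohere's Rerank, which scores each query-document pair with a trained model. `build_prompt` only assembles the prompt text; its wording is illustrative, and the actual LLM call that generates the answer is left out.

```python
import re


def word_set(text: str) -> set[str]:
    """Lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, doc: str) -> float:
    # HYPOTHETICAL stand-in for a rerank model: the fraction of query words
    # that also appear in the document. A real reranker scores each
    # (query, document) pair with a trained model instead.
    q = word_set(query)
    return len(q & word_set(doc)) / len(q) if q else 0.0


def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[tuple[float, str]]:
    """Score every candidate against the query and keep the top_n, best first."""
    scored = [(relevance(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def build_prompt(query: str, passages: list[str]) -> str:
    """Put numbered passages in the prompt and ask for an answer grounded in them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the passages below.\n"
        f"Passages:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )


# Made-up candidates, standing in for the output of a first-stage search.
candidates = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "The tower is made of wrought iron.",
]
query = "When was the Eiffel Tower completed?"
top = rerank(query, candidates, top_n=2)
prompt = build_prompt(query, [doc for _, doc in top])
print(prompt)  # this string would be sent to an LLM to generate the answer
```

Numbering the passages in the prompt is a design choice of this sketch: it makes it easy to ask the model to cite which passage supports its answer.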