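Course page: https://www.deeplearning.ai/short-courses/large-language-models-semantic-search/
* Spending a whole lesson on traditional keyword search feels like a bit of a waste of time
* Embeddings come from the Cohere embedding API, not OpenAI
* The vector store is Weaviate (the keyword search demo uses it too)
* The Rerank API is a Cohere-specific feature for improving search results; OpenAI has no equivalent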
## Introduction
* In the LLM era, search can:
    * take the form of question answering (QA)
    * return results that are more semantic
![[Pasted image 20230817130101.png]]
![[Pasted image 20230817130108.png]]
![[Pasted image 20230817130121.png]]
![[Pasted image 20230817130157.png]]
![[Pasted image 20230817130210.png]]
## Keyword Search
![[Pasted image 20230817130549.png]]
![[Pasted image 20230817211318.png]]
![[Pasted image 20230817211245.png]]
![[Pasted image 20230817211413.png]]
![[Pasted image 20230817211510.png]]
![[Pasted image 20230817211538.png]]
![[Pasted image 20230817211559.png]]
![[Pasted image 20230817211729.png]]
![[Pasted image 20230817211841.png]]
LLMs can help in three places: Retrieval, Reranking, Generation
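
A minimal sketch of how the three stages (Retrieval, Reranking, Generation) could be chained. Everything here is an assumption for illustration: the function names (`search_pipeline`, `toy_retrieve`, `toy_rerank`, `toy_generate`), the three-sentence corpus, and the word-overlap scoring are stand-ins, not the course's code. A real system would presumably call an embedding model, a rerank model, and a generative model at the marked points.

```python
from typing import Callable, List


def search_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run the three stages in order: retrieval, reranking, generation."""
    candidates = retrieve(query)        # stage 1: cast a wide net
    ranked = rerank(query, candidates)  # stage 2: reorder by relevance
    return generate(query, ranked)      # stage 3: answer from the top results


# Hypothetical toy corpus so the sketch runs without any API key.
DOCS = [
    "Paris is the capital of France.",
    "The capital of Canada is Ottawa.",
    "Toronto is the largest city in Canada.",
]


def _words(text: str) -> set:
    """Lowercase the text, drop simple punctuation, return the set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())


def toy_retrieve(query: str) -> List[str]:
    # Stand-in for keyword or dense retrieval: keep docs sharing any word.
    q = _words(query)
    return [d for d in DOCS if q & _words(d)]


def toy_rerank(query: str, docs: List[str]) -> List[str]:
    # Stand-in for a learned relevance model: sort by word overlap.
    q = _words(query)
    return sorted(docs, key=lambda d: len(q & _words(d)), reverse=True)


def toy_generate(query: str, docs: List[str]) -> str:
    # Stand-in for an LLM call: only assemble the grounded prompt.
    context = docs[0] if docs else ""
    return f"Q: {query}\nContext: {context}"
```

Passing the stages in as callables keeps each one swappable, which matches the idea that an LLM can be dropped into any of the three places independently.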
## Embeddings
![[Pasted image 20230818205119.png]]
![[Pasted image 20230818205205.png]]
![[Pasted image 20230818205330.png]]
![[Pasted image 20230818205403.png]]
This works for whole sentences too, not just single words
![[Pasted image 20230818205456.png]]
![[Pasted image 20230818205543.png]]
`umap_plot` reduces the high-dimensional embeddings to two dimensions for plotting
![[Pasted image 20230818205728.png]]
## Dense Retrieval
### Part 1: Vector Database for Semantic Search
![[Pasted image 20230818210620.png]]
![[Pasted image 20230818210544.png]]
![[Pasted image 20230818210636.png]]
![[Pasted image 20230818210702.png]]
Various comparisons against keyword search
![[Pasted image 20230818210809.png]]
![[Pasted image 20230818210855.png]]
![[Pasted image 20230818210916.png]]
![[Pasted image 20230818210922.png]]
![[Pasted image 20230818210929.png]]
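
To make the contrast with keyword search concrete, here is a rough sketch of dense retrieval as a brute-force nearest-neighbor lookup by cosine similarity. The 3-dimensional vectors are hand-made placeholders, not real embeddings, and `cosine_similarity`, `dense_search`, and `TOY_INDEX` are hypothetical names. A vector database such as Weaviate would typically replace the linear scan with an approximate index.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dense_search(
    query_vec: List[float],
    index: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the top_k."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-made vectors standing in for embeddings (assumption, not model output).
TOY_INDEX = {
    "The capital of Canada is Ottawa.": [0.9, 0.1, 0.0],
    "Ottawa is where Canada's parliament sits.": [0.8, 0.3, 0.1],
    "Bananas are rich in potassium.": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of "Which city governs Canada?".
query_vec = [1.0, 0.2, 0.0]
results = dense_search(query_vec, TOY_INDEX, top_k=2)
```

The match depends only on how close the vectors are, so a text can rank highly even when it shares few or no words with the query, as long as the embedding model places it nearby.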
### Part 2: Building Semantic Search from Scratch
![[Pasted image 20230818211235.png]]
Switch to splitting by paragraph
![[Pasted image 20230818211335.png]]
Back to splitting by sentence, but with the title added to each chunk
![[Pasted image 20230818211345.png]]
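
The three chunking options (by sentence, by paragraph, by sentence with the title prepended) can be sketched as below. This is a simplified illustration: the regex-based sentence splitter is naive, and the helper names and sample text are assumptions rather than the course's code.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines, so each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentences_with_title(title: str, text: str) -> List[str]:
    """Sentence chunks, each prefixed with the title to restore lost context."""
    return [f"{title} {sentence}" for sentence in split_sentences(text)]


# Hypothetical sample text for illustration.
title = "Interstellar (film)"
text = (
    "Interstellar is a 2014 film. It was directed by Christopher Nolan.\n\n"
    "The film stars Matthew McConaughey."
)

sentence_chunks = split_sentences(text)
paragraph_chunks = split_paragraphs(text)
titled_chunks = sentences_with_title(title, text)
```

A bare sentence chunk such as "It was directed by Christopher Nolan." no longer says what "It" refers to; prepending the title is a cheap way to put that context back into the text before it is embedded.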
![[Pasted image 20230818211620.png]]
![[Pasted image 20230818211711.png]]
![[Pasted image 20230818211748.png]]
![[Pasted image 20230818211903.png]]
The Text Relevance Reranker at the end is the topic of the next section, ReRank
## ReRank
![[Pasted image 20230818213638.png]]
![[Pasted image 20230818213841.png]]
![[Pasted image 20230818214013.png]]
![[Pasted image 20230818214045.png]]
![[Pasted image 20230818213738.png]]
![[Pasted image 20230818214256.png]]
![[Pasted image 20230818214332.png]]
![[Pasted image 20230818214345.png]]
![[Pasted image 20230818214536.png]]
## Generating Answers
![[Pasted image 20230818214854.png]]
![[Pasted image 20230818214920.png]]
![[Pasted image 20230818214927.png]]
![[Pasted image 20230818215022.png]]
![[Pasted image 20230818215049.png]]
![[Pasted image 20230818215057.png]]
![[Pasted image 20230818215327.png]]
![[Pasted image 20230818215415.png]]
![[Pasted image 20230818215442.png]]
![[Pasted image 20230818215449.png]]