> You're welcome to subscribe to my [AI Engineer newsletter](https://aihao.eo.page/6tcs9) and browse my [[Generative AI Engineer 知識庫]]
* Recording: https://www.youtube.com/watch?v=ahnGLM-RC1Y
* A reader's summary: https://www.breezedeus.com/article/make-llm-greater
* LlamaIndex's follow-up response: https://twitter.com/yi_ding/status/1721728060876300461
* LangChain's follow-up response: https://blog.langchain.dev/applying-openai-rag/
* Optimizing LLMs is hard
* There is no universal one-stop solution; it depends on the problem and how you choose to solve it
* Extracting signal from the noise is not easy
* Performance can be abstract and difficult to measure
* When to use what optimization
* Today's talk is about maximizing performance. You should leave here with:
* A mental model of what the options are
* An appreciation of when to use one over the other
* The confidence to continue on the journey yourself
![[Pasted image 20231116221405.png]]
* The diagram above is problematic. RAG and fine-tuning solve different problems: some cases need RAG, some need fine-tuning, and some need both. It is not a linear progression.
* We think it looks more like the diagram below
![[Pasted image 20231116222040.png]]
* Start with prompt engineering, and build an evaluation
* If it is a context problem, do RAG
* If it is a model-behavior problem, i.e. the model needs to follow instructions more consistently, do fine-tuning
* Or you may need both
![[Pasted image 20231116224555.png]]
* In short: try something -> evaluate -> try something else (a minimal sketch of this loop follows below)
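A minimal sketch of that loop, assuming the OpenAI Python client; the eval set, candidate system prompts, and exact-match grading are all illustrative stand-ins for a real evaluation suite:

```python
# A toy try -> evaluate -> iterate harness. The eval set and candidate
# system prompts are illustrative; real grading is rarely exact-match.
from openai import OpenAI

client = OpenAI()

def answer(question: str, system_prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

def accuracy(system_prompt: str, eval_set: list[dict]) -> float:
    hits = sum(
        answer(ex["question"], system_prompt).strip() == ex["expected"]
        for ex in eval_set
    )
    return hits / len(eval_set)

eval_set = [{"question": "2 + 2 = ?", "expected": "4"}]  # toy labeled data

# "Try something else" = swap in a new candidate and re-run the same eval.
for prompt in ["Answer concisely.", "Reply with only the final number."]:
    print(f"{prompt!r}: {accuracy(prompt, eval_set):.0%}")
```

Real evals usually need fuzzier grading (e.g. model-graded scoring), but the shape of the loop stays the same.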
## 1. Prompt engineering
![[Pasted image 20231116225933.png]]
![[Pasted image 20231116230015.png]]
![[Pasted image 20231116230702.png]]
![[Pasted image 20231116230719.png]]
![[Pasted image 20231116230940.png]]
## 2. Retrieval-augmented generation
![[Pasted image 20231116235023.png]]
* RAG: Giving the model access to domain-specific context
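A bare-bones sketch of that idea, assuming direct use of the OpenAI embeddings and chat APIs; the documents and question are made up, and a real system would use a vector database rather than in-memory numpy:

```python
# A bare-bones RAG pipeline: embed documents, retrieve by cosine similarity,
# stuff the top hits into the prompt. Docs and question are made up.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = [
    "Q3 revenue grew 12% year over year.",              # hypothetical corpus
    "The refund policy allows returns within 30 days.",
]
doc_vecs = embed(docs)

def retrieve(text: str, k: int = 1) -> list[str]:
    q = embed([text])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

question = "How did revenue change in Q3?"
context = "\n".join(retrieve(question))
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(resp.choices[0].message.content)
```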
![[Pasted image 20231117000304.png]]
![[Pasted image 20231117000604.png]]
![[Pasted image 20231117000736.png]]
* It takes a lot of iteration, testing, and learning to get RAG right
* HyDE: generate fake (hypothetical) answers and use them for the similarity search (see the sketches after this list)
* Fine-tuning embeddings: helped accuracy, but was too expensive and slow, so it was dropped (a non-functional reason)
* Experiment with different chunk sizes and with embedding different bits of the content
* It took about 20 iterations to boost accuracy to 65%
* Re-ranking: a cross-encoder or rules-based approaches
* Classification step: have the model first classify which domain the question belongs to, then add different extra metadata to the prompt to help find the most relevant content
* Look at which types of questions are answered incorrectly
* Introduce tools: some questions are structured-data problems, so query the database with SQL
* Query expansion: generate multiple queries, run them in parallel, then merge the results
* No fine-tuning was used here, because this was purely a context problem
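A hedged sketch of three of the techniques above (HyDE, query expansion, and cross-encoder re-ranking). It assumes the `retrieve` helper from the earlier RAG sketch is in scope, and the cross-encoder model name is just one common choice, not what the talk used:

```python
# Sketches of HyDE, query expansion, and cross-encoder re-ranking.
# Assumes the embed/retrieve helpers from the RAG sketch above are in scope.
from openai import OpenAI
from sentence_transformers import CrossEncoder  # pip install sentence-transformers

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# HyDE: search with a hypothetical answer, which often lands closer in
# embedding space to the target passage than the raw question does.
def hyde_retrieve(question: str, k: int = 3) -> list[str]:
    fake_answer = llm(f"Write a short, plausible answer to: {question}")
    return retrieve(fake_answer, k)

# Query expansion: rephrase the question several ways, retrieve for each
# variant (in parallel in a real system), and merge with de-duplication.
def expanded_retrieve(question: str, k: int = 3) -> list[str]:
    variants = llm(f"Rewrite this question 3 different ways, one per line:\n{question}")
    merged: dict[str, None] = {}  # insertion-ordered de-dup
    for q in [question, *variants.splitlines()]:
        for doc in retrieve(q, k):
            merged.setdefault(doc)
    return list(merged)

# Re-ranking: score (question, doc) pairs with a cross-encoder, keep the best.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, docs: list[str], k: int = 3) -> list[str]:
    scores = reranker.predict([(question, d) for d in docs])
    ranked = sorted(zip(scores, docs), key=lambda pair: -pair[0])
    return [doc for _, doc in ranked[:k]]
```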
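And a sketch of the classification step plus SQL tools: classify the question first, then route structured-data questions to a database and let everything else fall through to retrieval. The schema, database path, and prompts are all hypothetical:

```python
# Classification step + tools: classify the question first, then route
# structured-data questions to SQL. Schema/paths/prompts are hypothetical.
import sqlite3
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()

def route_and_answer(question: str) -> str:
    kind = ask(
        "Reply with exactly one word, 'structured' or 'narrative'. "
        f"Does this question need database lookups or document search?\n{question}"
    )
    if kind.lower().startswith("structured"):
        # Structured-data questions: have the model write SQL, then execute it.
        sql = ask(
            "Write one SQLite SELECT statement (no prose) for this question.\n"
            "Schema: revenue(quarter TEXT, amount REAL)\n"
            f"Question: {question}"
        )
        rows = sqlite3.connect("finance.db").execute(sql).fetchall()
        return ask(f"Question: {question}\nSQL result: {rows}\nAnswer briefly.")
    # Narrative questions fall through to RAG; the predicted class could also
    # be attached as metadata to narrow the retrieval (not shown here).
    return ask(question)
```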
![[Pasted image 20231117001750.png]]
![[Pasted image 20231117001801.png]]
* Human annotators flagged an answer as a hallucination, but it actually wasn't: the model answered strictly from the context RAG supplied
* RAG had retrieved a financial report whose title was "Optimal Song"
* In a RAG system it is not only the LLM that can go wrong; retrieval can go wrong too
* Recommends the open-source tool Ragas for evaluation: https://github.com/explodinggradients/ragas
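A minimal Ragas usage sketch (interface as of roughly ragas 0.1; check the repo for the current API). The sample row is made up; the faithfulness metric is the one that catches the "looks like a hallucination but actually matches the retrieved context" case above:

```python
# Scoring a RAG answer with Ragas (interface as of ragas 0.1.x).
from datasets import Dataset           # pip install datasets ragas
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

rows = {
    "question": ["How did revenue change in Q3?"],
    "answer": ["Revenue grew 12% year over year."],
    "contexts": [["Q3 revenue grew 12% year over year."]],
}
# faithfulness: is the answer grounded in the retrieved contexts?
# answer_relevancy: does the answer actually address the question?
result = evaluate(Dataset.from_dict(rows), metrics=[faithfulness, answer_relevancy])
print(result)
```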
![[Pasted image 20231117002336.png]]
## 3. Fine-tuning
(the talk switches to another speaker)
* Fine-tuning: Continuing the training process on a smaller, domain-specific dataset to optimize a model for a specific task
* Benefits
* Improve model performance on a specific task
* Often a more effective way of improving model performance than prompt-engineering or FSL (few-shot learning)
* Prompt engineering is limited by context size, but fine-tuning is not: you can show the model far more examples to learn from
* Improve model efficiency
* Reduce the number of tokens needed to get a model to perform well on your task
* No need for complex prompt techniques or in-context examples, which saves token cost and latency
* Distill the expertise of a large model into a smaller one (see the fine-tuning sketch after this list)
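As referenced above, a minimal sketch of starting a fine-tuning job through the OpenAI API, assuming a prepared chat-format JSONL file (the file path and its contents are placeholders):

```python
# Kicking off an OpenAI fine-tuning job from a prepared JSONL file.
from openai import OpenAI

client = OpenAI()

# Each line of train.jsonl is one training conversation, e.g.:
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # poll later with client.fine_tuning.jobs.retrieve(job.id)
```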
![[Pasted image 20231117010822.png]]
* This mistake could be fixed by changing the prompt to add a new rule, or by adding few-shot examples
![[Pasted image 20231117010957.png]]
![[Pasted image 20231117011244.png]]
* Fine-tuning is actually not good at adding brand-new knowledge; for that you should use RAG.
![[Pasted image 20231117011607.png]]
![[Pasted image 20231117012058.png]]
* After fine-tuning, GPT-3.5 even outperformed GPT-4
* Why did this use case work?
* No new knowledge was needed: everything required to solve the problem already existed in the base model
* The output had to follow a very specific structure
* High-quality training data was used
* There was a baseline to evaluate against
![[Pasted image 20231117012520.png]]
* The goal was to fine-tune for a blog-writing tone and style
* But the training data was in a Slack writing style, so the result came out wrong
![[Pasted image 20231117012821.png]]
![[Pasted image 20231117012828.png]]
![[Pasted image 20231117012941.png]]
![[Pasted image 20231117013024.png]]
![[Pasted image 20231117013202.png]]
![[Pasted image 20231117013301.png]]
## 4. Application of theory
![[Pasted image 20231117013326.png]]
![[Pasted image 20231117013350.png]]
![[Pasted image 20231117013848.png]]
![[Pasted image 20231117013856.png]]
![[Pasted image 20231117014005.png]]