> You're welcome to subscribe to my [AI Engineer newsletter](https://aihao.eo.page/6tcs9) and browse my [[Generative AI Engineer 知識庫]]
* Recording: https://www.youtube.com/watch?v=ahnGLM-RC1Y
* A reader's summary: https://www.breezedeus.com/article/make-llm-greater
* LlamaIndex's follow-up response: https://twitter.com/yi_ding/status/1721728060876300461
* LangChain's follow-up response: https://blog.langchain.dev/applying-openai-rag/
* Optimizing LLMs is hard
* There is no universal one-stop solution; it depends on the problem and how you choose to solve it
* Extracting signal from the noise is not easy
* Performance can be abstract and difficult to measure
* When to use what optimization
* Today's talk is about maximizing performance. You should leave here with:
* A mental model of what the options are
* An appreciation of when to use one over the other
* The confidence to continue on the journey yourself
![[Pasted image 20231116221405.png]]
* The diagram above is problematic. RAG and fine-tuning solve different problems: some cases need RAG, some need fine-tuning, and some need both. It is not a linear progression.
* We think it looks more like the diagram below
![[Pasted image 20231116222040.png]]
* Start with prompt engineering, and build an evaluation
* If it is a context problem, do RAG
* If it is a model-behavior problem, i.e. the model needs to follow instructions more consistently, do fine-tuning
* Or you may need both
![[Pasted image 20231116224555.png]]
* In short: try something -> evaluate -> try something else (a minimal sketch of this loop follows below)
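A minimal sketch of that loop, assuming the OpenAI Python client; the eval set, candidate system prompts, and exact-match grading are all illustrative stand-ins for a real evaluation suite:

```python
# A toy try -> evaluate -> iterate harness. The eval set and candidate
# system prompts are illustrative; real grading is rarely exact-match.
from openai import OpenAI

client = OpenAI()

def answer(question: str, system_prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

def accuracy(system_prompt: str, eval_set: list[dict]) -> float:
    hits = sum(
        answer(ex["question"], system_prompt).strip() == ex["expected"]
        for ex in eval_set
    )
    return hits / len(eval_set)

eval_set = [{"question": "2 + 2 = ?", "expected": "4"}]  # toy labeled data

# "Try something else" = swap in a new candidate and re-run the same eval.
for prompt in ["Answer concisely.", "Reply with only the final number."]:
    print(f"{prompt!r}: {accuracy(prompt, eval_set):.0%}")
```

Real evals usually need fuzzier grading (e.g. model-graded scoring), but the shape of the loop stays the same.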
## 1. Prompt engineering
![[Pasted image 20231116225933.png]]
![[Pasted image 20231116230015.png]]
![[Pasted image 20231116230702.png]]
![[Pasted image 20231116230719.png]]
![[Pasted image 20231116230940.png]]
## 2. Retrieval-augmented generation
![[Pasted image 20231116235023.png]]
* RAG: Giving the model access to domain-specific context
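A bare-bones sketch of that idea, assuming direct use of the OpenAI embeddings and chat APIs; the documents and question are made up, and a real system would use a vector database rather than in-memory numpy:

```python
# A bare-bones RAG pipeline: embed documents, retrieve by cosine similarity,
# stuff the top hits into the prompt. Docs and question are made up.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = [
    "Q3 revenue grew 12% year over year.",              # hypothetical corpus
    "The refund policy allows returns within 30 days.",
]
doc_vecs = embed(docs)

def retrieve(text: str, k: int = 1) -> list[str]:
    q = embed([text])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

question = "How did revenue change in Q3?"
context = "\n".join(retrieve(question))
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(resp.choices[0].message.content)
```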
![[Pasted image 20231117000304.png]]
![[Pasted image 20231117000604.png]]
![[Pasted image 20231117000736.png]]
* It takes a lot of iteration, testing, and learning to get RAG right
* HyDE: generate fake (hypothetical) answers and use them for the similarity search (see the sketches after this list)
* Fine-tuning embeddings: helped accuracy, but was too expensive and slow, so it was dropped (a non-functional reason)
* Experiment with different chunk sizes and with embedding different bits of the content
* It took about 20 iterations to boost accuracy to 65%
* Re-ranking: a cross-encoder or rules-based approaches
* Classification step: have the model first classify which domain the question belongs to, then add different extra metadata to the prompt to help find the most relevant content
* Look at which types of questions are answered incorrectly
* Introduce tools: some questions are structured-data problems, so query the database with SQL
* Query expansion: generate multiple queries, run them in parallel, then merge the results
* No fine-tuning was used here, because this was purely a context problem
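A hedged sketch of three of the techniques above (HyDE, query expansion, and cross-encoder re-ranking). It assumes the `retrieve` helper from the earlier RAG sketch is in scope, and the cross-encoder model name is just one common choice, not what the talk used:

```python
# Sketches of HyDE, query expansion, and cross-encoder re-ranking.
# Assumes the embed/retrieve helpers from the RAG sketch above are in scope.
from openai import OpenAI
from sentence_transformers import CrossEncoder  # pip install sentence-transformers

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# HyDE: search with a hypothetical answer, which often lands closer in
# embedding space to the target passage than the raw question does.
def hyde_retrieve(question: str, k: int = 3) -> list[str]:
    fake_answer = llm(f"Write a short, plausible answer to: {question}")
    return retrieve(fake_answer, k)

# Query expansion: rephrase the question several ways, retrieve for each
# variant (in parallel in a real system), and merge with de-duplication.
def expanded_retrieve(question: str, k: int = 3) -> list[str]:
    variants = llm(f"Rewrite this question 3 different ways, one per line:\n{question}")
    merged: dict[str, None] = {}  # insertion-ordered de-dup
    for q in [question, *variants.splitlines()]:
        for doc in retrieve(q, k):
            merged.setdefault(doc)
    return list(merged)

# Re-ranking: score (question, doc) pairs with a cross-encoder, keep the best.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question: str, docs: list[str], k: int = 3) -> list[str]:
    scores = reranker.predict([(question, d) for d in docs])
    ranked = sorted(zip(scores, docs), key=lambda pair: -pair[0])
    return [doc for _, doc in ranked[:k]]
```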
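And a sketch of the classification step plus SQL tools: classify the question first, then route structured-data questions to a database and let everything else fall through to retrieval. The schema, database path, and prompts are all hypothetical:

```python
# Classification step + tools: classify the question first, then route
# structured-data questions to SQL. Schema/paths/prompts are hypothetical.
import sqlite3
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()

def route_and_answer(question: str) -> str:
    kind = ask(
        "Reply with exactly one word, 'structured' or 'narrative'. "
        f"Does this question need database lookups or document search?\n{question}"
    )
    if kind.lower().startswith("structured"):
        # Structured-data questions: have the model write SQL, then execute it.
        sql = ask(
            "Write one SQLite SELECT statement (no prose) for this question.\n"
            "Schema: revenue(quarter TEXT, amount REAL)\n"
            f"Question: {question}"
        )
        rows = sqlite3.connect("finance.db").execute(sql).fetchall()
        return ask(f"Question: {question}\nSQL result: {rows}\nAnswer briefly.")
    # Narrative questions fall through to RAG; the predicted class could also
    # be attached as metadata to narrow the retrieval (not shown here).
    return ask(question)
```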
![[Pasted image 20231117001750.png]]
![[Pasted image 20231117001801.png]]
* Human annotators flagged an answer as a hallucination, but it actually wasn't: the model answered strictly from the context RAG supplied
* RAG had retrieved a financial report whose title was "Optimal Song"
* In a RAG system it is not only the LLM that can go wrong; retrieval can go wrong too
* Recommends the open-source tool Ragas for evaluation: https://github.com/explodinggradients/ragas
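A minimal Ragas usage sketch (interface as of roughly ragas 0.1; check the repo for the current API). The sample row is made up; the faithfulness metric is the one that catches the "looks like a hallucination but actually matches the retrieved context" case above:

```python
# Scoring a RAG answer with Ragas (interface as of ragas 0.1.x).
from datasets import Dataset           # pip install datasets ragas
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

rows = {
    "question": ["How did revenue change in Q3?"],
    "answer": ["Revenue grew 12% year over year."],
    "contexts": [["Q3 revenue grew 12% year over year."]],
}
# faithfulness: is the answer grounded in the retrieved contexts?
# answer_relevancy: does the answer actually address the question?
result = evaluate(Dataset.from_dict(rows), metrics=[faithfulness, answer_relevancy])
print(result)
```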
![[Pasted image 20231117002336.png]]
## 3. Fine-tuning
(the talk switches to another speaker)
* Fine-tuning: Continuing the training process on a smaller, domain-specific dataset to optimize a model for a specific task
* Benefits
* Improve model performance on a specific task
* Often a more effective way of improving model performance than prompt-engineering or FSL (few-shot learning)
* Prompt engineering is limited by context size, but fine-tuning is not: you can show the model far more examples to learn from
* Improve model efficiency
* Reduce the number of tokens needed to get a model to perform well on your task
* No need for complex prompt techniques or in-context examples, which saves token cost and latency
* Distill the expertise of a large model into a smaller one (see the fine-tuning sketch after this list)
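As referenced above, a minimal sketch of starting a fine-tuning job through the OpenAI API, assuming a prepared chat-format JSONL file (the file path and its contents are placeholders):

```python
# Kicking off an OpenAI fine-tuning job from a prepared JSONL file.
from openai import OpenAI

client = OpenAI()

# Each line of train.jsonl is one training conversation, e.g.:
# {"messages": [{"role": "system", "content": "..."},
#               {"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # poll later with client.fine_tuning.jobs.retrieve(job.id)
```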
![[Pasted image 20231117010822.png]]
* This mistake could be fixed by changing the prompt to add a new rule, or by adding few-shot examples
![[Pasted image 20231117010957.png]]
![[Pasted image 20231117011244.png]]
* Fine-tuning is actually not good at adding brand-new knowledge; for that you should use RAG.
![[Pasted image 20231117011607.png]]
![[Pasted image 20231117012058.png]]
* After fine-tuning, GPT-3.5 even outperformed GPT-4
* Why did this use case work?
* No new knowledge was needed: everything required to solve the problem already existed in the base model
* The output had to follow a very specific structure
* High-quality training data was used
* There was a baseline to evaluate against
![[Pasted image 20231117012520.png]]
* The goal was to fine-tune for a blog-writing tone and style
* But the training data was in a Slack writing style, so the result came out wrong
![[Pasted image 20231117012821.png]]
![[Pasted image 20231117012828.png]]
![[Pasted image 20231117012941.png]]
![[Pasted image 20231117013024.png]]
![[Pasted image 20231117013202.png]]
![[Pasted image 20231117013301.png]]
## 4. Application of theory
![[Pasted image 20231117013326.png]]
![[Pasted image 20231117013350.png]]
![[Pasted image 20231117013848.png]]
![[Pasted image 20231117013856.png]]
![[Pasted image 20231117014005.png]]