Beyond the Basics of RAG w/ Ben Clavié

Source: https://parlance-labs.com/education/rag/ben.html

Please note: this page was generated automatically and may contain errors; please check against the original video. It was produced by taking a screenshot of the video every 5 seconds and removing duplicate frames, using the whisper model for speech-to-text subtitles, gpt-4o for the Chinese translation, and Claude for the summary.

  1. Basic concepts of RAG (Retrieval-Augmented Generation)
    1. RAG is not a complete end-to-end system, but a method that combines retrieval with generation
    2. Main components:
      1. A retrieval pipeline
      2. A generative model (a large language model)
      3. A way of linking the two (e.g., prompt formatting)
  2. A simple RAG implementation (the compact MVP)
    1. Vector search with a bi-encoder model
    2. Steps:
      1. Load the model
      2. Fetch and encode the data
      3. Store the vectors
      4. Encode the query
      5. Run a cosine-similarity search
    3. No vector database required; for small document sets (e.g., 500 documents) NumPy is enough
  3. More advanced RAG techniques
    1. Reranking with a cross-encoder
      1. More accurate, but more computationally expensive
      2. Typically used to refine the bi-encoder's results
    2. Adding TF-IDF/BM25 keyword search
      1. Compensates for vector search's weakness on specific keywords
      2. Especially useful for specialized domains or acronym-heavy text
    3. Filtering on metadata
      1. Improves retrieval accuracy, especially for structured information such as dates or departments
      2. Extract metadata with an entity-detection model (e.g., GLiNER)
  4. The impact of long context windows on RAG
    1. Allows longer documents to be retrieved
    2. Reduces the need for very high-precision retrieval; more documents can be retrieved
    3. Gives more flexibility to trade retrieval speed against accuracy
  5. A more advanced retrieval model: ColBERT
    1. Uses multi-vector representations rather than a single vector
    2. Every token gets its own embedding
    3. Generalizes better out of domain
  6. Fine-tuning embedding models
    1. The sentence-transformers library is recommended
    2. Fine-tuning process:
      1. Prepare queries paired with relevant/irrelevant documents
      2. Train with a triplet loss
      3. Consider hard negatives to improve model performance
  7. Practical advice
    1. In production, combine ElasticSearch with a reranker
    2. Retrieve candidate documents with BM25, then rerank with a deep-learning model
    3. When picking an embedding model, huge models are unnecessary; 100M-1B-parameter models are usually enough
  8. Strategies for handling long documents
    1. Use summaries: pre-summarize documents and retrieve over the summaries
    2. Consider latency tolerance; balance processing time against result quality
    3. Choose the approach that fits the specific use case

frame_0 · 0:00

Hamel Husain: Ben Clavié is, you know, one of the cracked researchers who work at Answer.AI.


frame_1 · 0:05

Hamel Husain: You've heard from several researchers from Answer.AI already in this conference. Ben has a background in information retrieval,


frame_2 · 0:10

Hamel Husain: amongst other things.
Hamel Husain: And he has an open-source package called RAGatouille,
Hamel Husain: which you should check out.
Hamel Husain: He also comes from a deep background in information retrieval
Hamel Husain: and brings that to RAG, and he's also one of the clearest thinkers on the topic.
Hamel Husain: But yeah, I'll hand it over to you, Ben.
Hamel Husain: Kind of give more color to your background, anything that I missed.
Hamel Husain: And yeah, we can just jump into it.
Ben: Okay, let's go. So I think that's pretty much the key aspect of my background; you pretty much read this slide out.


frame_10 · 0:50

Ben: So yeah, I do R&D at Answer.AI with, like, Jeremy, and Johno, who you've seen in this course, and a lot of other awesome people.
Ben: We're a distributed R&D lab. So we do research, and we try to be as open source as possible, because we want people to use what we build.
Ben: Prior to joining, I
Ben: did a lot of NLP, and kind of stumbled upon information retrieval, because it's very, very useful and everybody wants information retrieval. A lot of it is more like clarifying what information retrieval is, which I hope today will help with.
Ben: And yeah, my claim to fame, or claim to moderate fame at least, is the RAGatouille library, which makes it much easier to use a family of models called ColBERT,
Ben: which we will very briefly mention today, but won't have time to go into in detail.
Ben: But hopefully, if you want to know more about that, do feel free to ping me on this call. I'm generally either very responsive, or you need to ping me again.
Ben: I also maintain the rerankers library, which we will discuss in one of the later slides.
Ben: And yeah, if you want to follow me and hear more about what I do, it's pretty much all on Twitter. I'm not on LinkedIn at all; everything goes through Twitter: a lot of memes and shitposts, but some very informative stuff once in a while.
Ben: So yeah, let's get started with what we're going to talk about today.


frame_27 · 2:15

Ben: So it's only half an hour, so we're not going to cover a lot. I'm going to talk about why I think the core retrieval basics should exist in your pipelines, because RAG is a very nebulous term, and that will be the first slide, and Hamel will be very happy about that slide, I think.
Ben: But
Ben: RAG's not a silver bullet, RAG's not a new thing from December 2022, and RAG is not an end-to-end system; we'll cover that. I think it's very important to ground it a bit when we talk about RAG, because it means a lot of different things to different people.
Ben: Then we will cover what we call the compact MVP, which is what most people do when they are starting out with RAG. It's actually an example from Jeremy,
Ben: and it's like the simplest possible implementation of RAG, as in just using vector search.
Ben: And then the other topics are basically things that I think you should have in your RAG pipeline as part of your MVP. There's a lot of scary concepts, because they're all big words, like bi-encoder, cross-encoder, TF-IDF slash BM25, filtering. That sounds like a lot, but I'm going to try and show that they are very simple concepts, and you can have
Ben: pretty much the same MVP by adding just 10 lines of code, but using basically state-of-the-art retrieval components in every bit.
Ben: And the bonus, which I don't think we'll have time to cover, is talking about ColBERT, because I like talking about ColBERT, so I might do it at the end if we have some time. But I might not.
Ben: And yeah, that's it for the agenda.
Ben: And then I also think it's important to have the counter-agenda, which is what we won't be talking about today, because those things are just as important for RAG, but they're not what I would put in the very basics, and here we're very much about the basics. So one of them is


frame_45 · 3:45

Ben: how to monitor and improve RAG systems, because RAGs are systems, and they're living systems, and very much things you should monitor and continuously improve on. I think Jason covered that quite well in his talk yesterday or last week. Yeah, last week.
Ben: So I would invite you to watch that, and watch Jason and Dan's upcoming course, if it does materialize.
Ben: Evaluations: they're also extremely important, but we won't talk about them at all today. But I know that Jo talks about them at length in his talk.
Ben: Benchmarks and paper references: I'll make a lot of claims that you will just have to trust me on, because I don't want to have too many references, or too many academic-looking tables, and I'd like to keep it quite lively.
Ben: I won't give you a rundown of all the best performing models and why you should use them. I won't talk about training data augmentation, etc., and I won't talk about all the other cool approaches, like SPLADE or ColBERT, in detail, because they go beyond the basics.
Ben: But those are all very important topics, so if you're interested, do look them up; there's a lot of good resources out there.
Ben: Do feel free to ask me. And with that,
Ben: let's get started with the rant,
Ben: which is my favorite part. So


frame_60 · 5:00

Ben: this is a thing that Hamel has been doing on Twitter recently as part of his flame-posting campaign, I'll say, which is basically:
Ben: there's so much in AI, so much, especially in the LLM world, that uses words that are a lot scarier than they need to be, and RAG is probably that, because to me, when I hear retrieval-augmented generation, or RAG,
Ben: it sounds like it's an end-to-end system, a very definite set of components, a thing that works on its own.
Ben: And it's not. It's literally just doing retrieval to put stuff into your prompt context, like before your prompt, after your prompt; you want to get some context, so you're doing retrieval.
Ben: But that means it's not an end-to-end system, despite what Jason would have you believe on his Twitter; he didn't create it, but he does make a lot of money from it. And
Ben: it's basically just the act of stitching together retrieval, so the R part of RAG, and generation, so the G part,
Ben: with the A there to ground the generation: you want your generation to be grounded in some context. So you're doing retrieval over whatever documents you have and passing them to your LLM.
Ben: But there's no magic going on. It's very much a pipeline that takes the output of model A and gives it to model B.
Ben: The generation part is what's handled by the large language models, and good RAG is actually
Ben: 3 different components: it's a good retrieval pipeline, it's a good generative model, and it's a good way of linking them up, which can be formatting your prompt or whatever.
Ben: And it's very important to think about that when you're saying "my RAG doesn't work."
Ben: You need to be more specific. "My RAG doesn't work" is the same as saying "my car doesn't work." It's like, yeah, but something specific is broken; you need to figure out what it is. Is it the retrieval part? Is the LLM struggling to make use of the context? There's a lot of failure cases there.
Ben: And with that being said, let's look at what the compact MVP is. So that is basically what you will see, I think,


frame_81 · 6:45

Ben: if you've read any Medium blog post about the advent of RAG in early 2023. That's the pipeline that everyone used, and that's also because the easiest pipeline to bring to production is very simple.
Ben: You have a query, you have an embedding model, you have documents; the documents get embedded and pooled into a single vector,
Ben: then you do a cosine-similarity search between the vectors for your query and for the documents, and that gets you your results.
Ben: And this is a bit of a teaser for an upcoming slide: this is called the bi-encoder approach. So keep the term in mind, and I'll define it, because that's one of those things that sounds like a scary term
Ben: but is actually very, very simple when you break it down.
Ben: But first,
Ben: let's look at what this actually means in code, this whole pipeline. So the first thing you want to do is load your model.


frame_89 · 7:25

Ben: Then you get your data,
Ben: you encode it, you store your vectors.
Ben: Then you get your query, you encode it, and then, here, we use NumPy: you do a cosine-similarity search, a dot product between normalized vectors,
Ben: to get the most similar documents, and
Ben: the documents whose embeddings are similar to your query embedding are what you would consider your relevant documents.
Ben: And that's pretty much it. That's
Ben: modified from something that Jeremy did to showcase how simple RAG actually is in his hackers' guide to LLMs. But that's what you want to do to retrieve context in the simplest possible way.
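A minimal sketch of those steps (load the model, encode the data, store the vectors, encode the query, cosine-similarity search), assuming sentence-transformers and a toy in-memory corpus; the model name and documents are illustrative:

```python
# A minimal sketch of the compact MVP: bi-encoder embeddings + NumPy, no vector DB.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 1. load the bi-encoder

documents = [
    "Cruise division financial report for Q4 2022.",
    "Engineering department onboarding guide.",
    "Cruise division strategy memo from 1998.",
]

# 2-3. encode the data and store the vectors (normalized, so cosine == dot product)
doc_embeddings = model.encode(documents, normalize_embeddings=True)

# 4. encode the query
query_embedding = model.encode(
    "cruise division financial report Q4 2022", normalize_embeddings=True
)

# 5. cosine-similarity search with plain NumPy: fast enough at this scale
scores = doc_embeddings @ query_embedding
for idx in np.argsort(scores)[::-1][:2]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```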
Ben: And you will have noticed that there's no vector DB in this. This is all NumPy, because when you use vector DBs,


frame_98 · 8:10

Ben: the huge point of using a vector DB is to allow you to efficiently search through a lot of documents, because what a vector DB generally does (not all of them, but most of them)
Ben: is wrap stuff like HNSW or IVF-PQ, which are indexing types, and what that allows you to do is find and retrieve relevant documents without having to compute cosine similarity against every single document. It does an approximate search instead of an exact search.
Ben: This is not something that you need if you're embedding like 500 documents; your CPU can do that in milliseconds. You don't actually need a vector DB if you're trying to go for the simplest possible stage, but
Ben: if you wanted one, it would go right here on the graph, right after you embed your documents: you would put them in the vector DB.
Ben: And the second thing I think to discuss about
Ben: this tiny graph is:


frame_109 · 9:05

Ben: why am I calling embeddings bi-encoders? That step that I call a bi-encoder, you will have seen a lot of times, but you will generally see it called embeddings, or an embedding model,
Ben: and bi-encoder is the term that the IR literature uses to refer to that,
Ben: and it's simply because you encode things separately, like you do 2 encoding stages. So it's bi-encoding.
Ben: And that's used to create single-vector representations, where you pre-compute all your document representations. So when you're using bi-encoders,
Ben: you encode your documents
Ben: whenever you want: when you're creating your database, when you're adding documents, those get encoded at a time that's completely separate from inference,
Ben: and then, only at inference, in the second column of this diagram, will you embed your query to compare it to your pre-computed document representations.
Ben: So that's really, really computationally efficient, because at inference you're only ever encoding one thing, which is the query, and everything else has been done before.
Ben: And so that is part of why it's done that quickly.
Ben: And I did want to take a slide break, because I can see there are questions, but they're not showing up on my screen. So, quick MVP: done.
Dan Becker: Yeah, let me look through some of the questions.
Dan Becker: I'm gonna give you a few of them, and you can decide whether you want to take them now or later.
Dan Becker: So we got one: it's a 7,000-question-and-answer data set.
Dan Becker: Can I optimize RAG to accurately retrieve and quote exact answers,
Dan Becker: even if the queries are slightly different from the original data?
Dan Becker: I think there's actually 2 parts to that. So one is,
Dan Becker: quoting the exact answer to something: that is not the information retrieval part,
Dan Becker: but rather just what you tell the LLM to do.
Dan Becker: But
Dan Becker: the information retrieval part is probably, well...
Dan Becker: you see. Oh,
Dan Becker: you go ahead.
Ben: I will actually cover how to better deal with out-of-context things in an upcoming slide.
Ben: It's...
Dan Becker: It's covered?
Dan Becker: Yeah, why don't you keep going, then; we'll note these questions.
Ben: Yeah, I'll just go back and say.
Ben: Yeah, perfect. Yeah.
Ben: Okay, so the next thing is: while that's very computationally efficient,
Ben: there is an obvious trade-off here, and that is, your documents are entirely unaware of your query, and your queries are entirely unaware of your documents,
Ben: which means that you're very, very subject to how the model was trained. Basically, if your queries look a bit different from your training data, or if
Ben: there's very, very specific information that will be in certain documents and not others, you can miss things. Sometimes you want to know how the query is phrased, you want to know what the query is looking for,
Ben: when you're encoding your documents, so that the representation can lean more towards the information that you're interested in,
Ben: and that's done with what we call reranking. So reranking is another one of the scary stages that you'll see in your pipeline, and the most common way to do reranking is using something that we call a cross-encoder.


frame_144 · 12:00

Ben: And cross-encoder is another one of the scary words, like bi-encoder, that you feel should be a very advanced concept, but it's actually very simple. This graph here represents the whole difference between them.
Ben: The bi-encoder is basically this 2-column system that we described, where documents get encoded in their corner, queries get encoded in their own corner, and they only meet very, very late: you only do cosine similarity between the vectors,
Ben: but the documents never see the query. That's our bi-encoder.
Ben: The cross-encoder is different. The cross-encoder is a model that will take your document and your query together, so you're going to give it
Ben: your query and your document, or sets of documents depending on the type of model. But to keep it simple, we do it one by one, so you always give it a query-document pair,
Ben: and you put that through this cross-encoder model, which is effectively a classifier with a single label, and the probability of the label being positive is what your model considers as how similar the documents are, or how relevant they are.
Ben: This is extremely powerful, because it means that the model knows everything about what you're looking for when it's encoding the document, and can give you a very accurate score, or at least a more accurate score.
Ben: The problem is that you can see how that wouldn't scale, because it's not very computationally realistic to
Ben: compute this query-document score for every single document every time you want to retrieve something. Say you've got Wikipedia embedded:
Ben: you've got, I don't know, like 10 million paragraphs. You're not gonna compute 10 million scores
Ben: through a model with, like, 300 million parameters for every single query, are you?
Ben: You would eventually get something returned, and it would be a very, very relevant document, but it would also take 15 minutes, which is probably not what you want in production.
Ben: So you probably also have heard, or you might have heard, if you're already into retrieval,


frame_164 · 13:40

Ben: or not at all if you're not into retrieval, of other reranking approaches, like RankGPT or RankLLM. Using LLMs to rerank documents has been a big thing lately; people really into retrieval will know of those.
Ben: Those are not cross-encoders, but that's not really relevant to us, because the core idea is the same, and it's basically what we always do with reranking in the pipeline:
Ben: you use a powerful model that is computationally expensive to score only a subset of your documents, and that's why it's re-ranking and not ranking, because
Ben: this can only work if you give it, I don't know, 10, 50, not more than that, documents. So you always have a first-stage retrieval, which here is our vector search,
Ben: and then the reranker does the ranking for you. So it creates another list.
Ben: There's a lot of ways to try those models out. Some of them are API-based, so it's just an API call to Cohere;
Ben: some of them you run on your own machine if you want to try them out. And here is basically the self-promotion of the month: I do maintain a library just called rerankers, with the QR code here,
Ben: which is basically a unified API, so you can test any reranking method in your pipeline and swap them out freely,
Ben: and that's what your pipeline looks like now. It's the same, with just that one extra step at the end, where you rerank things before getting your results.
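A minimal sketch of what that extra step might look like with the rerankers library; the cross-encoder checkpoint and the candidate documents are illustrative, and the constructor options may differ between versions, so check the rerankers README:

```python
# Sketch: reranking a first-stage shortlist with the rerankers library
# (pip install rerankers).
from rerankers import Reranker

# A local cross-encoder; API-based options (e.g. Cohere) take an api_key instead.
ranker = Reranker("cross-encoder/ms-marco-MiniLM-L-6-v2", model_type="cross-encoder")

query = "cruise division financial report for Q4 2022"
shortlist = [
    "Cruise division financial report, Q4 2022.",
    "Cruise division financial report, Q4 1998.",
    "Engineering department onboarding guide.",
]

results = ranker.rank(query=query, docs=shortlist)
for hit in results.top_k(2):  # keep only the best-scored candidates
    print(hit.score, hit.document.text)
```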


frame_179 · 14:55

Ben: So we've added reranking. But there's something else still missing here, and it's something that actually partially addresses the first question, which is that


frame_181 · 15:05

Ben: semantic search via embeddings is powerful, and I'm not saying don't use vectors. Vectors are cool, models are cool, deep learning is cool,
Ben: but it's very, very hard if you think about it, because you're asking your model to take, I don't know, 512 tokens, even more if you're doing long context, and you're like, okay, put all of this into this one vector. You're just using a single vector: you've got, I don't know, 384, 1024 floats at most, and that must represent all the information in this document.
Ben: That's naturally lossy; there's no way you're going to keep all of the information there,
Ben: and what you do when you're training an embedding model is that you're teaching the embedding to represent the information that is useful in that training setup.
Ben: So the model doesn't learn to represent all of the document's information, because that's pretty much impossible, since embeddings are essentially a form of compression.
Ben: What the model actually learns is to represent the information that is useful to the training queries. So your training data is very, very important here: it's representing the documents in the way that helps the queries, as they're phrased in your training data, retrieve a given document.
Ben: So when you use that on your own data, it's likely that you're going to be missing some information, especially when you go slightly out of distribution.
Ben: There's another thing, which is: humans love to use keywords, especially if you're going into the legal domain, the biomedical domain, anything specific.
Ben: We have a lot of acronyms that might not even be in the training data, but we use a lot of acronyms, and we use a lot of very advanced medical words. People love jargon, people love technical words, because those are very, very useful,
Ben: and that's why you should, and I know it sounds like I'm talking from the 70s, because it's actually a method from the 70s, but you should always have keyword search in your pipeline. You should always have full-text search on top of anything that you do with vectors.


frame_203 · 16:55

Ben: And keyword search, which you can call full-text search, or TF-IDF/BM25, is powered by what we call TF-IDF,
Ben: which is a very basic NLP concept that
Ben: stands for term frequency-inverse document frequency, and it assigns every single word in a document, or group of words, because sometimes we do them 2 by 2,
Ben: or even 3 by 3, a weight based on how rare they are. So a word that appears everywhere, like "the" or "a", gets a very, very small weight, and a word that's highly specific to certain documents gets a very high weight.
Ben: And the main method that uses TF-IDF for retrieval is called BM25,
Ben: which stands for Best Matching 25. It was invented in the seventies; it's been updated since then, but those have basically just been iterations on it,
Ben: and you'll often hear IR researchers say that the reason the field has not taken off like NLP or computer vision have is because the baseline is just too good: we're still competing with BM25, and it's been 50 years now.
Ben: My God, it's been 50 years. Yeah, so BM25 has existed for basically my entire lifetime, since before my birth, and it's still used in production pipelines today. That's how good it is.
Ben: And the good thing is, it's just word counts with a weighting
Ben: formula, so the compute time is virtually unnoticeable. You can add it to your pipeline and you will absolutely never feel it.
Ben: And I know I said I wouldn't add anything from papers, but I feel like, because I'm making a very strong claim that this method from the 70s is strong, I should add a table, and that table is from the BEIR paper.
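A quick sketch of BM25 in practice, using the rank_bm25 package (one common Python implementation; the corpus and the naive whitespace tokenization are illustrative):

```python
# Sketch: BM25 keyword search with rank_bm25 (pip install rank-bm25).
# Whitespace tokenization is a simplification; real pipelines usually
# lowercase, strip punctuation, and sometimes stem.
from rank_bm25 import BM25Okapi

corpus = [
    "Cruise division financial report for Q4 2022",
    "Engineering department onboarding guide",
    "Cruise division strategy memo from 1998",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# one TF-IDF-weighted score per document; rare terms like "q4" dominate
scores = bm25.get_scores("q4 2022 financial report".lower().split())
print(scores)
```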


frame_221 · 18:25

Ben: BEIR is the retrieval part of MTEB, which is probably the main embeddings benchmark.
Ben: And they compared it to
Ben: a lot of models that were very popular for retrieval, like DPR, and very strong vector retrievers.
Ben: And basically you can see that, unless you go into very overtrained embeddings like E5 or BGE,
Ben: BM25 is competitive with virtually all deep-learning-based approaches, at least at the time of the paper, which was only just 3 years ago.
Ben: We now
Ben: have embeddings that are better, but we don't have any embeddings that are better to the point where they aren't made better still by being used in conjunction with BM25.
Ben: So,
Ben: knowing that, this is how you want your pipeline to look. You'll notice that there's now a whole new pathway where both the query and the documents, on top of being encoded by the embedder, are also encoded by TF-IDF


frame_231 · 19:15

Ben: to get full-text search, and that will help you retrieve keywords, etc. Humans use keywords in queries all the time; it's something you should support.
Ben: At the end, you combine the scores.
Ben: You can do that in a lot of ways; I won't go into too much detail. But what a lot of people do is give a weight of 0.7 to the cosine-similarity score and 0.3 to the full-text score.
Ben: But I'm pretty sure we could do a whole talk just on the different methods of combining those.
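A minimal sketch of that weighted combination, using the 0.7/0.3 split from the talk; the min-max normalization is an assumption, added because cosine similarities and BM25 scores live on very different scales:

```python
# Sketch: weighted-sum hybrid of vector and keyword scores.
import numpy as np

def minmax(scores: np.ndarray) -> np.ndarray:
    rng = scores.max() - scores.min()
    return (scores - scores.min()) / rng if rng else np.zeros_like(scores)

def hybrid(cosine: np.ndarray, bm25: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    # alpha weights the semantic score; (1 - alpha) weights the keyword score
    return alpha * minmax(cosine) + (1 - alpha) * minmax(bm25)

combined = hybrid(np.array([0.82, 0.31, 0.77]), np.array([6.4, 0.2, 5.9]))
print(combined.argsort()[::-1])  # document indices, best first
```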
Ben: Okay, I do have 5 more minutes. So the last thing that you want to add to a simple pipeline, the thing that I think really completes your MVP-plus,


frame_238 · 19:50

Ben: is
Ben: using metadata, and using metadata filtering, because academic benchmarks don't, because in academic benchmarks documents exist mostly in a vacuum; they don't exist in the real world,
Ben: they're not tied to a specific company, etc. When you're using RAG in production,
Ben: it's very, very rare that someone comes to you and says, these documents came to me in a dream. They came from somewhere: they've been generated by a department, they've been generated for a reason. They might be old Excel sheets or whatever, but they have business sense, or they have contextual sense.
Ben: And the metadata is actually sometimes a lot more informative than the document content, especially in RAG contexts. So take the query here, which is:
Ben: can you get me the cruise division financial report for Q4 2022?
Ben: There's a lot of ways in which this can go wrong if you're
Ben: just looking at it from the semantic aspect, or even using keywords. So
Ben: when you see a query like this, the model must capture "financial report": the model must figure out that you want a financial report, but also "cruise division", "Q4", and "2022", and embedding models are bad at numbers,
Ben: so you might get a financial report that is maybe for another division, or maybe for the cruise division in 1998. It's very hard to just hope that your vector will capture all of this.
Ben: But there's another failure case which will happen, especially with weaker LLMs. If you just have, like,
Ben: a top-k of 5, and you retrieve the top 5 documents for your query,
Ben: even if your model is very good, if you just let it retrieve the top 5 documents no matter what, you will end up with financial reports, at least 5 of them, and most likely only one of them is for Q4 2022.
Ben: So at that point, you're just passing all 5 to the model and being like, good luck, use the right one, which might confuse it, especially because tables can be hell, etc.
Ben: And
Ben: so I'm not saying that your vector search will fail, but statistically it will: in most cases it will fail, and if it doesn't fail for this query, it will fail for a similar one.


frame_262 · 21:50

Ben: But that's actually very, very easy to mitigate. You just have to think outside of the vector and use more traditional methods.
Ben: And you can use entity detection models. One that's very good for this is GLiNER, which is a very recent model that does basically zero-shot entity detection. So you give it arbitrary entity types, like document type, time period, and department.
Ben: And this is a live run of GLiNER; you can try the demo
Ben: linked at the bottom. But here we just extract the financial report, the time period, and the department,
Ben: and when you generate your database for RAG, all you need to do is basically specify the time period. So when you get an Excel sheet, you will just parse the name of it, or parse the date in it, and give it the metadata: 2022, Q4.
Ben: Okay, I mixed that up there. And then you just need to ensure that this is stored alongside your document, and at query time you can always pre-filter your document set to only query things that make sense.
Ben: So you will only query documents for the relevant time period. So you ensure that even if you give your model the wrong thing, it will at least be in the right timeframe, so it can maybe try and make sense of it.
Ben: And with this final component, this is what your pipeline looks like. You can see the new component here, which is metadata filtering, which doesn't apply to queries; queries go right through it.


frame_277 · 23:05

Ben: But documents get filtered by it, and we won't perform search on documents that don't meet the metadata that we want.
Ben: And okay, I do agree that this looks a lot scarier than the friendly one at the start, which was just your query, an embedder, a cosine-similarity search, and the results.
Ben: But
Ben: it is actually not very scary. This is your full pipeline; it implements everything we've just talked about, and it's about 25 lines of code if you remove the comments.


frame_282 · 23:30

Ben: It does look a bit more unfriendly, because there's a lot more moving parts; I think there's a lot more steps. But
Ben: if you want, we can just break it down a bit further. And so we use LanceDB for this.
Ben: And this is not necessarily an endorsement of LanceDB as a vector DB, although I do like LanceDB, because it makes all of these components, which are very important, very, very easy to use.


frame_286 · 23:50

Ben: But I tried not to take sides in the vector DB wars, because I've used Weaviate, I've used Chroma, I've used LanceDB, I've used Pinecone. They all have their place. But I think LanceDB, if you're trying to build an MVP, is the one you should always use for MVPs right now, because it has those components built in.
Ben: And here you can see just how easy it actually is. So we still load the bi-encoder, just in a slightly different way, same as earlier,
Ben: with the final document metadata. Yeah, it's just a string category here; it could be a timestamp, it could be just about anything.
Ben: Then we encode documents just like we did previously. And here we've not created an index, so this is still an exact search, not an approximate search.
Ben: Then we create a full-text search index, which is generating those TF-IDF weights I mentioned before, that we give to every single term in the documents.
Ben: Then we load the reranker. Here we're using the Cohere one, because it's simple to use an API.
Ben: And at the very end, you've just got your query and your search, where we restrict it to category equals "film". So we will only ever search in the documents that are about a film, not about an author, not about a director.
Ben: We get the top 10 results, and we just have a quick reranking step,
Ben: and that's pretty much it. We've gone from the pipeline at the start, which only had the bi-encoder component, to a pipeline that now has the bi-encoder, metadata filtering,
Ben: full-text search, and a reranker at the end. So we've got basically the
Ben: 4 most important components of retrieval in a single pipeline, and it really doesn't take much more space in your code.
Ben: And
Ben: yeah, that is pretty much the end of this talk. There's a lot more to cover in RAG; this is definitely not


frame_306 · 25:30

Ben: the full picture. But this is the most important part: this is what you need to know about how to make a good pipeline very quickly.
Ben: All the other improvements are very, very valuable, but they have diminishing returns relative to effort. This takes virtually no effort to put in place.
Ben: It's definitely worth learning about sparse methods and multi-vector methods, because they are very well adapted to a lot of situations. ColBERT, for instance, is very strong out of domain; SPLADE is very strong in domain.
Ben: You should watch Jason's talk about RAG systems and Jo's upcoming talk about retrieval evaluations, because those are
Ben: basically the trifecta of the most important things. And yeah, any questions now?
Dan Becker: Hamel and I were
Dan Becker: just messaging, saying
Dan Becker: we love this talk; everything is
Dan Becker: presented so clearly.
Dan Becker: So,
Dan Becker: we've also got
Dan Becker: quite a few questions. I'm happy to...
Hamel Husain: My favorite talk so far.
Hamel Husain: So you know, that's big praise. But yeah.
Ben: Thank you.
Dan Becker: So go ahead.
Hamel Husain: Okay, questions. Were you gonna pick? Did you have one that you were looking at already, Dan?
Hamel Husain: I can try it.
Dan Becker: Yeah, we've got one that I quite like: does the way that you fine-tune your bi-encoder model affect how you should approach fine-tuning your cross-encoder, and vice versa?
Ben: Yes. I don't think I can give a really comprehensive answer; it will really depend on your domain. You would generally want them to be complementary. So if you're in a situation where you've got the compute and the data to fine-tune both,
Ben: you always want the
Ben: bi-encoder to be a bit more loose: you want it to retrieve potential candidates, and then you want to trust your reranker, your cross-encoder, to actually do the filtering.
Ben: So if you're going to use both and have full control over both, you might want to
Ben: train them in a way that basically makes sure your top-k candidates are a bit more representative, and trust the reranker.
Dan Becker: Yeah, let me ask... this wasn't an audience question, but a
Dan Becker: related question.
Dan Becker: You showed us
Dan Becker: that when you choose documents to feed into the reranker, that's sort of a weighted average
Dan Becker: of what you get from the
Dan Becker: TF-IDF, or BM25, with what you get from the simple vector search.
Dan Becker: What do you think of as the advantages and disadvantages of that, over saying, we're gonna take the top X from one of the
Dan Becker: rankers and the top X from the other? That way, if you think
Dan Becker: one of these is especially bad for some questions, you have a way of short-circuiting
Dan Becker: its influence on what gets sent to
Dan Becker: the reranker.
Ben: Yeah, I think that also makes complete sense. And that's another cop-out answer, and I'll use a lot of those, but it also depends a lot on your data. A lot of the time you want to look at what your actual context is and how it's actually being used, because in some situations that actually works better, especially if you work with
Ben: biomedical data,
Ben: because there are so many specific documents that quite often the embedding won't be that amazing on some questions, so you just want to take the top 5 from both and let the reranker sort it out. The reranker is quite robust.
Ben: So it's a perfectly valid approach to combine them that way. Yeah.
Dan Becker: You wanna pick a question, Hamel?
Hamel Husain: Yeah, I've been looking through them.
Hamel Husain: Okay, Jeremy is asking, can we get a link to the code example?
Ben: Yep.
Hamel Husain: We'll put your slides in Maven. Can I share your slides in Discord as well, Ben?
Ben: Oh, yes, please. Yeah.
Hamel Husain: I'll go ahead and share the slides in Discord.
Ben: And I'll share the GitHub gist for the code examples as well.
Dan Becker: And I'll embed a link to the slides in


frame_356 · 29:40

Dan Becker: Maven, for people who watch this talk at some point deep in the future, and who might lose track of it in Discord.
Dan Becker: There's a question somewhere in here; I'll find it in a moment. But
Dan Becker: we've got this question. Jason got it too, and the speaker just before you, Paige Bailey, said
Dan Becker: that RAG, you know, in the world of
Dan Becker: 1-million-token context lengths, is not going to be as important. What's your take on the relative importance of RAG in the future?
Ben: So I'm still very hopeful about RAG in the future, and I think I see it as
Ben: some sort of... so your LLM, to me, is like your CPU,
Ben: and your context window would be your RAM. And even if you've got 32 gigs of RAM, nobody's ever said, yeah, throw away your hard drive, you don't need that. In a lot of contexts, you'll still want to have some sort of storage where you can retrieve the relevant documents from. Using a long context window is never gonna be a silver bullet, just like RAG is never a silver bullet.
Ben: But I'm actually really happy, because it just means I can retrieve much longer documents and build
Ben: more efficient RAG systems, because to me it's a bit of a trade-off where, if you've got longer context,
Ben: it just means you've got a lot more freedom in how quick your retrieval system can be, because if you need to use the top 10 or the top 15 documents, that's fine, you can fit them in, whereas when you're limited to the top 3 documents, you need your retrieval system to be really good, which might mean really slow.
Ben: So, yeah.
Dan Becker: We had a question from Widium:
Dan Becker: what are your thoughts on different chunking strategies?
Ben: I probably don't think about chunking as much as I should. I am very hopeful for
Ben: the avenue of using LLMs to pre-chunk. I don't think those work very well right now; in my tests, I've never been impressed.
Ben: But
Ben: I also do tend to use ColBERT more than bi-encoders, and ColBERT is a lot more resistant to chunking, so it's something that I don't care about as much.
Ben: But generally, my go-to is always to chunk based on around 300 tokens per chunk, and to try to do it in a way where you never cut off a sentence in the middle, and to always keep an overlap, like the last 50 tokens of the previous chunk and the first 50 tokens of the next chunk,
Ben: because information overlap is very useful to give context. Please don't be afraid to duplicate information in your chunks.
Hamel Husain: I have a question about the bi-encoder. Do you ever
Hamel Husain: try to fine-tune that,
Hamel Husain: using some kind of labeled data,
Hamel Husain: to get that
Hamel Husain: to be really good? Or do you usually use it off the shelf
Hamel Husain: and then use a re-ranker? And, you know, how do you usually go about it, or how do you make the trade-off?
Ben: So, again, context-dependent. But if you have data, you should always fine-tune, whether it's the bi-encoder or the cross-encoder. I think with ColBERT, because it's multi-vector rather than single-vector, you can get away with not fine-tuning for a bit longer.
Ben: But if you have data, it's all about, basically, the resources you have. So in this talk we're doing an MVP; it's something you can put together in an afternoon.
Ben: If your company says you have $500,
Ben: spend 480 of that on OpenAI to generate synthetic questions and fine-tune your encoders on those. That will always get you better results.
Ben: Always fine-tune if you can.
Ben: And so, yes, there were a couple of questions about fitting ColBERT in, and I'm using presenter's executive decision to answer those. So, ColBERT in this pipeline: some people use it as a reranker,
Ben: but that's
Ben: not optimal. That's very much for when you don't want to have to change your existing pipeline. If you were to design the pipeline from scratch and wanted to use ColBERT, you would have it instead of the bi-encoder,


frame_406 · 33:50

Ben: and it would perform basically the same role as the bi-encoder, which is first-stage retrieval.
Ben: And if you wanted to use ColBERT, especially if you don't have the budget to fine-tune and need the reranking step, sometimes it can actually be better to use ColBERT as the reranker, still,
Ben: because the multi-vector approach can be better at capturing keywords, etc. But that's very context-dependent.
Ben: So, ideally, you would have it in place of the bi-encoder, as shown.
Dan Becker: A lot of people here probably aren't too familiar with ColBERT. Can you
Dan Becker: give the
Dan Becker: quick summary of it?
Ben: Yeah, sorry, I got carried away. So ColBERT is an approach which is effectively a bi-encoder, but instead of cramming everything in a document into a single vector,
Ben: you represent each document as a bag of embeddings. So, like,
Ben: if you've got 100 tokens, instead of having one big 1024-float vector, you will have a lot of small 128-float vectors, one for each token.
Ben: And then you will score that at the end. You will do the same for the query: so if your query is 32 tokens, you will have 32 query token vectors,
Ben: and for each query token, you will compare it to every token in the document and keep the highest score,
Ben: and then you will sum up those highest scores, and that will be the score for that given document. That's called MaxSim.
Ben: And the reason that's so powerful is not because it does very well on the data it's been trained on; actually, you can beat it there with a normal bi-encoder. But it does very well at extrapolating out of domain, because
Ben: you just give the model so much more room to represent each token in its context. So it's much easier when you're in a non-familiar setting: you've not compressed as much information.
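A tiny sketch of that MaxSim scoring in NumPy; the shapes mirror the numbers in the explanation (32 query tokens, 100 document tokens, 128 dimensions per token), with random vectors standing in for real token embeddings:

```python
# Sketch: ColBERT-style MaxSim between per-token query and document embeddings.
import numpy as np

rng = np.random.default_rng(0)
query_tokens = rng.standard_normal((32, 128))  # one small vector per query token
doc_tokens = rng.standard_normal((100, 128))   # one small vector per doc token

similarities = query_tokens @ doc_tokens.T     # (32, 100) token-to-token scores
maxsim = similarities.max(axis=1).sum()        # best doc token per query token, summed
print(maxsim)                                  # the document's relevance score
```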
Ben: And I do have more self-promotion: I have a pretty cool ColBERT thing coming out later this week, which compresses the ColBERT space by reducing the tokens it actually needs to store by about 50 to 60%
Ben: without losing any performance. So that's a bit of a teaser; look forward to the blog post if you're interested.
Dan Becker: And to find the blog post, you suggest people follow you on Twitter, or...?
Ben: Yeah, definitely on Twitter,
Ben: since it's pretty much the only place where you can reliably reach me.
Hamel Husain: Someone's asking, what are some good tools to fine-tune embeddings
Hamel Husain: for retrieval? Would you recommend RAGatouille, or anything else? What's your...
Ben: I'd recommend sentence-transformers, especially with the 3.0 release recently; it's now much, much friendlier to use. And basically there's no need to reinvent the wheel; they've got all the basics implemented very well there. So, sentence-transformers.
Dan Becker: Question from Divya:
Dan Becker: can you give any pointers on how one fine-tunes their embedding model?
Ben: Sorry, can you repeat that? I got distracted.
Dan Becker: Yeah. The question is, can you give any pointers, or describe the flow, for when you fine-tune your embedding model?
Ben: Okay, so that's probably a bit more involved than this talk. But essentially, when you fine-tune your embedding model, what you'll want is queries: you need to have queries, and you need your documents. And you're gonna tell the model,
Ben: for this given query, these documents are relevant, and for this given query, these documents are not relevant. Sometimes you use a triplet loss, and the triplet loss is what you use when you have one positive document and one negative document,
Ben: and you'll kind of be teaching the model: this is useful, this is not useful.
Ben: And
Ben: I'm not gonna go down
Ben: too far, because this rabbit hole can take you quite deep. But sometimes, when you have triplets, you also want to use what we call hard negatives,
Ben: which is, you want to actually use retrieval to generate your negative examples, because you want them to be quite close to the positive example, but not quite the right thing,


frame_454 (video time: 2270.00s)

Ben: because that's where you teach the model the most about what's actually useful to answer your query.
Ben: So the workflow is probably, as always: look at your data, and figure out what kind of queries your users would actually be asking. If you don't have user queries yet,
Ben: because you're not in production, write some queries yourself and give those to an LLM to generate more queries. You can have a pretty solid pipeline like that.
Hamel Husain: Someone's asking in the Discord, and I get this question all the time: please share your thoughts on graph RAG.
Ben: I have never actually done graph RAG. I see it mentioned all the time, but it's not something that has come up for me at all, so I don't have strong opinions right now.
Ben: I think it's cool, but that's pretty much the full extent of my knowledge.
Hamel Husain: Someone's asking: okay,
Hamel Husain: when you have long context windows,
Hamel Husain: does that allow you to do something different with RAG, like
Hamel Husain: retrieve longer documents, or any other different kinds of strategies
Hamel Husain: than you were able to before? Or
Hamel Husain: does it change anything about
Hamel Husain: how you go about this?
Ben: Yeah, it's a bit like what I mentioned before. To me it changes two main things. One is, I can use longer documents, which means I can use longer chunks, or I can stitch chunks together. Because sometimes,
Ben: if your retrieval model isn't very good at retrieving long documents, which is often the case, you might just retrieve a chunk from a document but give the model the full document:
Ben: retrieve on the chunk, pass the full context, and you just hope the model is able to read it. And if you've got a good long-context model, it can. So it changes how you decide to feed the information into the model.
Ben: And the other aspect is, like I said, it changes the retrieval overhead. Because if you need to be very good, like I was saying, if you need only the top 3 documents to be relevant, you're going to spend a lot of time and money on your pipeline.
Ben: If you're like, oh, as long as my recall at 10 or my recall at 15 is good, that's fine, then you can afford a much lighter pipeline and spend a lot less time and resources on retrieval.
Ben: There are a lot of diminishing returns in retrieval. Getting a good recall at 10 (recall at 10 is how likely you are to retrieve the relevant document in the first 10 results)
Ben: is generally very easy, and recall at 200 is very, very easy.
Ben: Then recall at 5 starts getting harder, and recall at 3 and recall at 1 are the really tough ones, because a lot of the training data is noisy, so it's even hard to know what a good recall at 1 is.
Ben: Longer context makes that matter much less, and that's why it's great.
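For reference, recall at k is simple to compute once you have ranked results and gold labels. A small illustrative sketch, with hypothetical query and document ids:

```python
# Sketch: computing recall@k over an evaluation set.
# `ranked` maps each query to its retrieved doc ids in rank order;
# `relevant` maps each query to the ids of its gold documents.
def recall_at_k(ranked: dict, relevant: dict, k: int) -> float:
    hits = sum(
        1 for q, docs in ranked.items()
        if any(d in relevant[q] for d in docs[:k])
    )
    return hits / len(ranked)

ranked = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"]}
relevant = {"q1": {"d1"}, "q2": {"d5"}}
print(recall_at_k(ranked, relevant, k=3))  # 0.5: q1 hits in the top 3, q2 misses
```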
Hamel Husain: Someone's asking, and I don't even know what this means: what's your view on HyDE versus ReAct versus step-back?
Ben: I've only used ReAct out of those. Those are
Ben: agentic systems with function calling; ReAct is about giving the LLM the ability to call tools.
Ben: I don't have strong thoughts on those in the context of retrieval, so I can't really answer the question.
Ben: Yeah, I think
Ben: I would occasionally use ReAct so the model is able to trigger search itself. But I think that's still an open area of research. And I think Griffin from Answer.AI is also in the
Ben: chat, and he's very interested in that. It's basically: how do you get a model to tell you that it doesn't know? Because sometimes you don't need retrieval, the model already knows; and sometimes you do need retrieval.
Ben: But that's still a very open question: how do you decide when to search?
Ben: No strong thoughts there yet.
Dan Becker: You may or may not have a good answer for this one. Is there an end-to-end open-source project that someone could look at as a way to
Dan Becker: see or evaluate the difference in result quality between the basic bi-encoder compact MVP and
Dan Becker: the final compact MVP++ that you showed?


frame_506 (video time: 2530.00s)

Ben: No, actually, that's a very good point. I don't think there is one that systematically goes through every step,
Ben: and that's probably something that I would like to build at some point, or find one. Because,
Ben: like most things in retrieval, everything is kind of convention: it's always been done a certain way, you've seen it piecemeal in a lot of projects, and you just know that that's how it is.
Ben: But unless you dig deep into the papers, or do it yourself, it's quite rare to find very good resources showing that.
Dan Becker: Related question: do you have a tutorial that you typically point people to
Dan Becker: on fine-tuning their encoder?
Ben: That would be the sentence-transformers documentation. But it's not the friendliest tutorial, so that's a half answer.
Ben: That's why we punch you to, but still a bit like up to gain, to sadly.
Hamel Husain: Wade is asking if you have go-to embedding models.
Ben: My go-to these days when I'm demoing something is the Cohere one, because it's nice to be able to work with an API. It works really well, and it's cheap.
Ben: But
Ben: other than that, if I'm using something in my own pipeline, I would just use ColBERT,
Ben: multi-vectors. But it really depends on the use case, because you will often find that some things work well for you and some things don't.
Ben: I do have strong opinions on what not to use. If you go to the MTEB leaderboard, which is the main embedding leaderboard right now, you'll see
Ben: a lot of LLMs used as
Ben: encoders, and I would advise against that, because the latency isn't worth it; you don't need 7 billion parameters to encode stuff.
Ben: And at least some of the early ones don't actually generalize that well. I remember a really interesting table from Cohere where E5-Mistral was only worth about an E5-large, despite being 7 times as big.
Ben: So probably just stick to the small ones, between about 100 million and at most a billion parameters. That would be my only advice about that.
Ben: Try all the good ones, like GTE, BGE, E5.
Hamel Husain: Chris Levy is asking
Hamel Husain: this question about Elasticsearch, which I also get quite a lot.
Hamel Husain: So, yes: does anyone here have experience building a RAG application with just keyword BM25 as a retriever? At work, his team makes use of Elasticsearch,
Hamel Husain: and he said it's all over the tech stack; people are already using Elasticsearch.
Hamel Husain: Basically, he's asking: is there a way to keep using Elasticsearch with RAG that you know about, or that you've encountered? Or do you mainly use
Hamel Husain: vector databases like LanceDB and things like that? Have you seen people using Elasticsearch and
Hamel Husain: trying to bootstrap off of that?
Ben: Yeah, I've used Elasticsearch a bit, and it's perfectly possible. You do obviously lose the semantic search aspect, although I think Elasticsearch now has a vector DB offering, so you could add vectors to it.
Ben: You could always just do BM25 and then plug in a reranker at the end. If you read papers on cross-encoders,
Ben: that's often the way they evaluate them: just do BM25 to retrieve 50 to 100 documents, and then rerank them using the cross-encoder.
Ben: If you can afford to just set up your reranking pipeline, or call the Cohere API, that's a really good way to go about it, because you don't need to embed your whole document set to sample how good it would be with deep learning.
Ben: Because there are domains where you do not need deep learning; BM25 is still good enough in some places.


frame_556 (video time: 2780.00s)

Ben: And you know, I think it's become very apparent that BM25 has never told anyone they should eat three rocks a day,
Ben: but embeddings have.
Hamel Husain: Dmitry is asking: is it worthwhile to weigh the BM25 similarity score during the reranking step as well?
Ben: Probably not. You generally just want to use BM25 to retrieve candidates, so you don't need to pass those scores on to your cross-encoder.
Dan Becker: There's a question that I'm going to
Dan Becker: change slightly. Someone asks about retrieving
Dan Becker: from many documents rather than finding the best one, and maybe the tweak there is: if you have a theory
Dan Becker: that
Dan Becker: information within any single document is so correlated that you actually want to try and get some diversity,
Dan Becker: are you familiar with, or have you used, approaches where you
Dan Becker: specifically try,
Dan Becker: in some loss function somewhere, to encourage that diversity and encourage pulling from many documents rather than from one?
Ben: I have not done that myself. I know that there are different loss methods to optimize for diversity versus accuracy,
Ben: but I don't think I'd be able to give you a clear answer without sounding really confident about something I don't actually know well.
Dan Becker: Have you used hierarchical RAG? Any thoughts on it?
Ben: I have not, and I don't think it's very needed for current pipelines; I think there are a lot of other steps you can improve first.
Dan Becker: Since I think we have several Answer.AI people here, I don't know if this is a question or a request: I'm eager to learn if Answer.AI will come up with any books on LLM applications in the future.
Ben: I don't think so, but never say never.
Ben: Jamie, if you want to chime in.
Ben: Yeah, I can't make any promises, because my boss is watching.
Dan Becker: Do you see anything else to pull in?
Dan Becker: Ben, did you say that you can't see the questions?
Ben: Yeah, they're all blank for me. I saw one earlier, but they're not really showing for me now; I think it's a glitch.
Dan Becker: Not sure what's happened there,
Dan Becker: and I think people also cannot upvote these, so a couple of quirks today.
Dan Becker: Do you see any others here, Hamel, that you
Dan Becker: think we should pull in?
Hamel Husain: No, not necessarily. I think
Hamel Husain: probably going through the Discord is
Hamel Husain: pretty good now.
Dan Becker: Yep.
Hamel Husain: Tons of activity there as well.
Hamel Husain: I mean, there's an infinite number of questions, so we can't keep going forever.
Hamel Husain: Okay, Laurie is asking: what's the best strategy
Hamel Husain: when chunks or documents don't fit into the context window?
Hamel Husain: Do you do RAG in a MapReduce style? Summarize aggressively?
Hamel Husain: What are the techniques you've seen work most effectively?
Ben: So that's, I think, a very broad question, because it's like:
Ben: why do they not fit? Is it because every document is extremely long, or because you need a lot of different documents, etc.? So


frame_606 (video time: 3030.00s)

Ben: And another important aspect is: what's the latency tolerance? Because quite a lot of the time, you can make RAG infinitely better, but users won't sit
Ben: waiting 20 seconds for an answer. So you need to figure out: how much time do I have?
Ben: One approach you can often see,
Ben: and that I've actually done in production, is: retrieve over the full documents, but have another database that maps every document to a summary.
Ben: So you'd have done your LLM summarization at a previous step; you retrieve the relevant chunks, and then you pass the relevant summaries to the context window.
Ben: But that kind of depends on your actual setting.
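A minimal sketch of that document-to-summary mapping, assuming the summaries were generated offline by an LLM summarization pass. All ids and strings here are hypothetical:

```python
# Sketch: retrieve chunks, but hand the LLM precomputed document summaries.
chunk_to_doc = {"chunk_17": "doc_a", "chunk_42": "doc_b"}
doc_summaries = {  # produced ahead of time by an LLM summarization step
    "doc_a": "Summary of the quarterly report: revenue grew 12 percent.",
    "doc_b": "Summary of the onboarding guide: accounts, roles, first steps.",
}

def context_for(retrieved_chunk_ids: list[str]) -> str:
    # Map each retrieved chunk back to its parent document, deduplicate,
    # and pass the stored summaries (not the full documents) to the prompt.
    docs = {chunk_to_doc[c] for c in retrieved_chunk_ids}
    return "\n\n".join(doc_summaries[d] for d in sorted(docs))

print(context_for(["chunk_17", "chunk_42"]))
```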
Ben: I have another call at
Ben: 10, which is in 5 minutes for me. So if you've got one final question,
Ben: this is it.
Hamel Husain: Really enjoyed this presentation.
Ben: Thank you.
Dan Becker: Yeah, this was really great.
Dan Becker: It was just
Dan Becker: super clear and


frame_622 (video time: 3110.00s)

Dan Becker: well presented. So thanks so much.
Ben: Thank you, chat.


frame_623 (video time: 3115.00s)

Hamel Husain: Thank you.
Ben: Bye.
Dan Becker: Thanks. Everyone.


frame_624 (video time: 3120.00s)