Llamafile: bringing AI to the masses with fast CPU inference

Stephen Hood & Justine Tunney, Mozilla

Youtube: https://www.youtube.com/watch?v=5zE2sMka620

Note: this page was generated automatically by a program and may contain errors; please check against the original video. It was produced by taking a screenshot of the video every 5 seconds and removing duplicate frames, transcribing the audio with a whisper model, translating with gpt-3.5-turbo, and summarizing with Claude.

  1. Llamafile is an open source project from Mozilla that aims to democratize access to AI. It packages AI model weights into a single executable file that runs on many operating systems and kinds of hardware without installation.
  2. The project focuses on CPU inference speed, so AI models can run efficiently on ordinary CPUs without depending on expensive GPUs.
  3. Llamafile builds on the llama.cpp project and, depending on the CPU and model, delivers speedups of 30% to 500%.
  4. It runs entirely locally with no network connection, ensuring privacy and security.
  5. Mozilla is involved to keep AI technology open and transparent and to keep big tech companies from monopolizing AI development.
  6. Justine Tunney covered technical details, including how cross-platform compatibility is achieved and how matrix multiplication was made more efficient.
  7. The project has attracted many contributors, such as Iwan, the inventor of k-quants, who have further improved the efficiency of quantized formats.
  8. Mozilla launched the Builders program and an accelerator that provide funding and support for open source AI projects.
  9. Mozilla encourages developers to work on open source projects in local AI and edge computing, and offers non-dilutive funding.

Hey, everybody, how y'all doing? Nice, not bad for 9:40. All right. Hey, I'm Stephen Hood. And I'm just...

Oh, sorry, go ahead. >> And I'm Justine Tunney.

Yeah. So we are here to talk to you about Llamafile today and what we've been doing on this project. So I'll get us started.

I'm going to tell you what Llamafile is and how it works. I'm going to spend a little time talking about why we're building it, why Mozilla specifically is involved. And then I'm going to hand it over for the fun part to Justine. Justine is going to talk about the actual work that she and the open source community have been doing on this project: lots of insights and tricks and hacks that have made CPU inference go faster than ever before. So that will be fun.

And when we're done, we want you to share the feeling that we have, which is kind of a sense of excitement and empowerment from the knowledge that there are lots of really interesting, juicy, impactful problems still left to be solved in AI. A lot of them. And the key thing is, it's not just the big folks who can solve these problems. It's individuals and small groups working together in open source. So anyone in this room, or anyone listening to this talk, can potentially make a big impact in this space.

So what's Llamafile?

Llamafile is an open source project from Mozilla that has the goal of democratizing access to AI. We do that in a few different ways. The first is probably, if you've heard of Llamafile, the reason you heard of it. It's the original magic trick of the project that Justine figured out, which is how to turn weights into programs. So a Llamafile is a single executable file that runs without any installation on pretty much every operating system, every CPU architecture, and every GPU architecture.

And that's all. Thank you very much. [ Laughter ] >> That was easy.

Yeah, so by the way, this isn't one file for Windows and a different one for Linux and Mac. It's actually a single file. You can download a Llamafile, run it on any computer in the world, and it will just work, and it will use the hardware you have, whether that be fancy GPUs or your CPU. So we'll talk a little more later about how Justine made that work.

But we're here to talk about another topic, too. Most of the talk is actually about this, which is CPU inference speed. Now, you might ask, why do we need to worry about CPU inference speed? We've got these fancy GPUs, right? Well, no disrespect, almighty Jensen, first of his name, master of market cap. [ Laughter ] >> Don't strike me down.

But I would posit that it is not a universally good thing that we in this room are so dependent on GPUs. They are expensive. They are difficult to source. And let's face it, they consume a lot of electricity, which we might want to think about. But bigger picture, we have an entire planet of CPUs out there, literally all over the world. Great hardware. Often affordable hardware. And we are at risk of just kind of throwing that all away in this new era of AI. And we don't need to do that.

So who here knows llama.cpp? This is an easy question. Yeah, right? We all know and love this project. We build on top of it with Llamafile, and we contribute our performance enhancements back to it; many have been merged in. That project proved that CPUs could do inference perfectly well. And so we have been basically trying to take that performance to the next level.

And as a result of Justine's and the community's work, depending on what CPU you are using, what model you are running, what weights, you will see between 30 and 500% speed increases with Llamafile. Which kind of still blows my mind. And by the way, I don't think we're anywhere near done.

So these things also run locally, by the way. This one is totally on your machine. There's no network access. You can take a pair of scissors and cut the Ethernet cord and it will still work. Which is what I asked DALL·E 3 to draw. Okay, I don't think it understood the assignment, but that's all right. But seriously, we're not calling cloud LLMs. There's no monitoring or analytics. No bits leave your machine. It's totally private and local.

And everything you need comes in the box. So whether you want to just play with a model that you just found on Hugging Face, or you want to start building locally running LLM applications on your machine, you've got everything you need in the box. And they're readily available. Hugging Face now supports Llamafile as a file type, so you can search and filter by Llamafile. You can also just search Mozilla on Hugging Face; you'll find we have a bunch of Llamafiles that we've already published. And with a single command, you can create your own. So really this project is collapsing all the complexity of the open source AI stack down into a single action and a single file.

So why are we involved? Why is Mozilla involved in this? You might be saying, don't you folks make browsers? In fact, we do. We make a damn fine browser. You should try it out if you haven't lately. But we exist also for a bigger purpose, which is to fight for the web.

So I'm going to ask you a question here. Who here remembers using the original Netscape Navigator? Don't be shy. They can see how old you are. They can only see how old I am. A lot of hands, right? So you are my people. You remember the '90s. MTV. Terrible haircuts. Milli Vanilli. I don't know. Whatever. My point is, you remember the early days of the web. And you remember how close we came to one company and one product kind of controlling the whole thing.

And we kind of see that maybe happening again today with AI. No matter what we may think of these companies, the reality is there are some very influential big tech companies that are in a position to maybe control the future of machine intelligence. And that's itself not a great thing. It's not great for equity. It's not great especially for users' sense of privacy and safety and agency and control.

And we've had an answer to this for many years. It's called open source. And the answer is right in the name, right? Open source. Transparency is the solution here. And it's important for us to have viable open source alternatives in AI. And that's why Mozilla is getting involved. That's why we made Llamafile, with more projects to follow. And I know many of you in this room are already working on open source AI. We want to help support what you're doing.

So with that, I'm going to hand it over to Justine, who's going to tell you the actually cool part, which is all the things that she and the community have been doing on this project. Justine.

[ Applause ] >> Thank you, Stephen. So I'm Justine Tunney. I'm the lead developer on Llamafile. And as Stephen mentioned, I'm going to talk about some of the cool work we've been doing in the community to help you run the fastest local LLM experience possible. And in order to do this, we started by first getting it to run on the systems at all.

And with Cosmopolitan, what it enables us to do is take your weights in a single file and run it on six OSes. And there's a really cool hack that makes that possible, which is we basically take a Unix Sixth Edition shell script, put it in the MS-DOS stub of a Portable Executable, and that enables it to run on Mac, Windows, the BSDs, Linux, et cetera. Really cool stuff.

And once we conquered the portability issue with CPUs, I had the opportunity to work with Mozilla on bringing this to AI. And with AI, GPUs are indispensable. As much as we focus on CPUs, we care very much about GPUs, too. But GPUs have always had the problem of distributability. Many people have needed to ship cuBLAS binaries with their project, 500 megs in size. Can we really call our software open source if it spends the majority of its time in a proprietary blob? So I never felt comfortable with that.

And one of the ways we're solving that is by distributing a library called tinyBLAS that enables you to ship your LLMs to platforms like Windows without depending on SDKs. It'll run with only the driver installed.

But more importantly, performance. Now, LLMs spend the majority of their time doing matrix multiplication. Probably the most important algorithm in the world, and it has a really simple definition. We've been making it go faster for prompt processing. And the way we did it is with a very simple trick we figured out, and this is something all programmers can adopt in their code: it entails unrolling the outer loop.

So let's talk about what not to do first, and that would be unrolling the inner one. We've all seen -funroll-loops, Gentoo. It's a bad idea; computers can generally do that on their own. If you unroll the outer loops, then your matrix multiplication algorithm can sort of unfold like a flower and focus on pure flops like a BLAS kernel. And that's really all there is to it to getting the majority of the benefits of BLAS, to make prompt processing go really fast.
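As a rough illustration of the idea, here is a minimal Python sketch (hypothetical, not llamafile's actual kernels, which are vectorized C++): unrolling the outer loops means computing a small tile of the output matrix per pass, so each loaded element of A and B feeds several accumulators instead of one.

```python
def matmul_naive(A, B):
    """Textbook matrix multiplication: C[i][j] = sum_p A[i][p] * B[p][j]."""
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def matmul_unrolled(A, B):
    """Unroll the outer (i, j) loops 2x2: each pass computes a 2x2 tile of C,
    so every A[i][p] and B[p][j] load is reused by two accumulators.
    Assumes even m and n, for brevity."""
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(0, m, 2):
        for j in range(0, n, 2):
            c00 = c01 = c10 = c11 = 0.0   # accumulators stay in registers
            for p in range(k):
                a0, a1 = A[i][p], A[i + 1][p]
                b0, b1 = B[p][j], B[p][j + 1]
                c00 += a0 * b0; c01 += a0 * b1
                c10 += a1 * b0; c11 += a1 * b1
            C[i][j], C[i][j + 1] = c00, c01
            C[i + 1][j], C[i + 1][j + 1] = c10, c11
    return C
```

In real C or C++ kernels the tile is bigger (a block of vector registers rather than four scalars), but the payoff is the same: the ratio of arithmetic to memory loads grows with the tile size, which is what lets a CPU approach its peak flops.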

So what's the impact of this really simple solution? It generalizes to a wide variety of hardware. We've seen everything from a scrappy hobbyist Raspberry Pi to much bigger computers going significantly faster. You need algorithms like this to exploit the latest capabilities of hardware. Token generation rates I wouldn't believe. If you use a gaming computer, with Intel or AMD, you're going to see better performance with Llamafile on those two. Really exciting stuff; particularly with Alder Lake, we were able to get a 4x improvement.

But ThreadRipper most of all: for the first time, AVX-512 is available to consumers, and we've been able to help you prepare for that future. So if you have a ThreadRipper, you're going to see better performance than ever, almost like a GPU.

Now, prompt eval speed. What makes it important? It's really cool to be able to generate text and use a chat bot, but the way I want you to think about Llamafile is that it's more of a word crunching machine that can help you understand our world. And I love to use it personally for tasks like summarization. I love that it can help me read a blog post.

And we've used other performance tricks too. With NVIDIA, part of what makes them so successful is not just great hardware; they built a great framework too. And their framework helps developers think about programming in a different way that helps them be successful.

Who here thinks that software on CPUs just gets slower each year? Can I see some hands? Well, one of the things that's great about NVIDIA is they showed us a better alternative for getting performance. And when I learned how to program in CUDA, I found one of the most important functions was syncthreads. This is something you can implement for CPU in ten lines of code. And if you use the lockstep programming model, you can use your CPU as though it were a GPU.
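A minimal sketch of the idea in Python (an illustration, not llamafile's actual C implementation): a CUDA-style `syncthreads()` is just a reusable barrier, and with one, a pool of CPU threads can run the same kernel function in lockstep the way a GPU runs a thread block. Note that Python threads won't actually run the arithmetic in parallel because of the GIL; this only shows the synchronization pattern.

```python
import threading

def run_block(kernel, num_threads, *args):
    """Launch num_threads CPU threads running `kernel` in lockstep,
    passing each one a thread id and a shared syncthreads() function,
    the way a GPU launches a thread block."""
    barrier = threading.Barrier(num_threads)  # reusable: resets after each wait
    threads = [
        threading.Thread(target=kernel, args=(tid, barrier.wait) + args)
        for tid in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def sum_kernel(tid, syncthreads, data, partial, out):
    """Parallel reduction: each thread sums its slice of `data`, then after
    the barrier, thread 0 combines the partial sums."""
    chunk = len(data) // len(partial)   # assumes len(partial) == num_threads
    partial[tid] = sum(data[tid * chunk:(tid + 1) * chunk])
    syncthreads()                       # wait until every partial sum is written
    if tid == 0:
        out[0] = sum(partial)
```

In C the same barrier is roughly ten lines with a mutex, a condition variable, and a generation counter, which is the kind of primitive the lockstep model is built on.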

You can get really good performance. Now, this is going to be a demo showing the impact of this work, before and after, for summarization. Here we're going to be processing an essay by Dijkstra. Really cool, worth reading. But I want you to watch the speed as it processes it. Here we see it going. And on the right, we have the new version. It's like, bam, bam, bam, bam. A huge night and day difference. The new version is already summarizing it, and the old version is nowhere close. So that is the kind of new performance you can expect. And it's the kind of performance that's actually possible, which I wouldn't have imagined beforehand. It's really great.

[ Applause ] Thank you.

CPUs can do so much. And people in the community have loved this work. We've managed to attract some really amazing contributors, like Iwan, the inventor of k-quants. Very popular; I'm sure many of you have used them. He got them going 2x, 4x faster, too, on both x86 and ARM. So if you use quantized formats, those are going to be better than ever with Llamafile now, too.

And it's worth mentioning that we've seen really interesting things come out of this. Once we put this out into the world, people have come back, given us feedback, and reported their own experiences. We found out that someone was running Mixtral 8x22B on a $350 CPU.

And to me, that's just wonderful. Because performance matters, but it's not really the thing we care about. What we care about is intelligence. And to have the intelligence, you need to run bigger models. And RAM is cheap with CPUs. For the price of a graphics card, I put 512 gigs in my workstation. And that means I can run all the frontier models coming out. I just have to wait a little longer, but I get a much more intelligent answer. And the fact that that went from impossible to possible for most consumers is, you know, a story I want you all to tell. Individuals are making a big difference. And you can be a part of that, too.

And I'm going to hand it back to Stephen, who can explain what Mozilla can do to support you getting involved in that effort.

[ Applause ] >> Thanks a lot for all your efforts.

So yeah, that's a key message of this talk: anyone in this audience, you don't have to work for these big, giant, largest-in-the-history-of-humanity companies to have a big impact. There's lots of headroom here. There's lots of unsolved, interesting problems in this space. And we want to get involved in helping.

So we recently launched a program called Mozilla Builders. This is a program by which we either sponsor or, in some cases, co-develop impactful open source AI projects. Llamafile is actually the first in this program. I'm happy to announce today the second, which is sqlite-vec. This is from a developer named Alex Garcia. Alex is adding vector search capability to SQLite. So for some folks in this audience, that will have some obvious implications that are kind of cool. [ Applause ]

But just imagine. Remember that little modest Raspberry Pi 5?

So I can imagine now a local, open LLM running privately on that machine with no network connection, connected to your personal private data, which you can use with confidence that it's safe to do RAG and other interesting applications. That's the kind of stuff we're talking about here.

We also just launched our own accelerator. It's called the Mozilla Builders Accelerator. We are offering $100,000 US in non-dilutive funding for open source projects that advance the promise and potential of local AI: AI applications running at the edge, on user devices. These are some of the bullet points of areas we're particularly interested in, but it's not an exclusive list. And you don't necessarily have to be building a company to apply for this accelerator.

So if you want to learn more about the accelerator, this QR code will take you there. Take a picture of that. Or just go to future.mozilla.org/builders. Justine and I and a lot of Mozillians are here this week. If you have something you're working on, something you think we should know about, or you want to collaborate with us, please find us, reach out, or reach out to me via e-mail. So thanks again.

Thanks to Justine and the community and all their work on Llamafile. >> Thank you, Stephen.