https://www.youtube.com/watch?v=4pYzYmSdSH4
## Background and Opening

**Harrison**: Andrew has played a big part in the LangChain story. I met him a little over two years ago at a conference, when we started talking about LangChain, and he graciously invited us to do a LangChain course with DeepLearning.AI — I think it must have been the second or third course they ever did. I know a lot of people here probably watched that course, or got started with LangChain because of it.

**Andrew**: Harrison is really kind. Harrison and his team have taught six short courses on DeepLearning.AI so far, and by metrics like Net Promoter Score, Harrison's courses are among our most highly rated. I think the LangGraph course has the clearest explanation I've seen of a whole set of agentic concepts. They've definitely helped make our own courses and explanations better.
## The Core Take on Defining "Agent"

**Harrison**: One of your takes that I cite a lot is talking about the "agenticness" of an application rather than arguing over whether something is an agent. Now that we're here at an agent conference, maybe we should rename it an agentic conference.

**Andrew**: A bit over a year ago, I remember Harrison and I both spoke at a conference. At the time, we were both trying to convince people that agents were something worth paying attention to. That was before — maybe midsummer last year — a bunch of marketers got hold of the word "agentic" and started sticking that sticker on everything until it lost meaning.

Back to your question: about a year and a half ago, I saw a lot of people arguing, "Is this an agent? Is this not an agent? Is it truly autonomous, or not really an agent?" I felt it was fine to have that argument, but that we would succeed better as a community if we just said there are degrees to which a system is agentic.

So whether you want to build an agentic system with a little autonomy or a lot of autonomy, that's all fine — there's no need to spend time arguing about whether it's truly an agent. Let's call all of these agentic systems with different degrees of autonomy. That hopefully reduces the time people waste arguing about whether something is an agent; let's just call them all agentic and get on with it. And I think that actually worked out.
## Where Agent Applications Are Today

**Harrison**: On that spectrum from a little autonomy to a lot of autonomy, where do you see people building these days?

**Andrew**: My team routinely uses LangGraph for our hardest problems, with complex flows and so on. But I'm also seeing tons of business opportunities that are, frankly, fairly linear workflows, or linear with only occasional side branches.

A lot of businesses have opportunities like this: right now, a person looks at a form on a website, does a web search, checks a database to see whether there's a compliance issue or whether this is someone we shouldn't sell certain things to; or takes something, copy-pastes it, maybe does another web search, and pastes it into a different form. In business processes, there are actually a lot of fairly linear workflows — or linear with very small loops and occasional branches, where a branch usually signals a failure that causes the item to be rejected from the workflow.

So I see a lot of opportunities, but one challenge businesses face is that it's still pretty difficult to look at something being done in your business and figure out how to turn it into an agentic workflow. At what granularity should you break the work down into micro-tasks? And after you build an initial prototype, if it doesn't work well enough, which of those steps should you work on to improve performance?

I think that whole bag of skills — how to look at a bunch of things people are doing, break them into sequential steps, figure out where the small number of branches are, put evals in place, all of that — is still far too rare.

And of course there are much more complex agentic workflows — you've heard a lot today about systems with very complex loops, and those are very valuable too. But in terms of the sheer number of opportunities, and a lot of the value, I see far more of these simpler workflows still waiting to be built.
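The "mostly linear, with an occasional rejection branch" shape described above can be sketched in a few lines. Everything here is a hypothetical stand-in: in a real system, each step function would be an LLM or tool call, and the names are made up for illustration.

```python
# A minimal sketch of a mostly-linear agentic workflow: sequential steps
# with one occasional branch, where the branch connotes a compliance
# failure that rejects the item. All step functions are stand-ins.

def extract_form_fields(raw_form: str) -> dict:
    # Stand-in for an LLM call that parses a free-text form.
    return {"name": raw_form.strip(), "country": "US"}

def web_search(name: str) -> list[str]:
    # Stand-in for a real web-search tool call.
    return [f"profile page for {name}"]

def compliance_check(fields: dict, evidence: list[str]) -> bool:
    # Stand-in for a database / restricted-party lookup.
    blocked = {"blocked corp"}
    return fields["name"].lower() not in blocked

def fill_target_form(fields: dict, evidence: list[str]) -> dict:
    # Stand-in for pasting results into the destination form.
    return {"status": "submitted", **fields}

def process(raw_form: str) -> dict:
    fields = extract_form_fields(raw_form)
    evidence = web_search(fields["name"])
    if not compliance_check(fields, evidence):   # the occasional branch
        return {"status": "rejected", **fields}  # failure path
    return fill_target_form(fields, evidence)

print(process("Acme Inc")["status"])      # submitted
print(process("Blocked Corp")["status"])  # rejected
```

The value of writing it this way is that each step is a separate function, which is exactly what makes per-step tracing and evals possible later.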
## Skills Agent Builders Need

**Harrison**: You've been building DeepLearning.AI courses, many of them aimed at helping people build agents. What skills do you think agent builders across the whole spectrum should master and start learning?

**Andrew**: That's a good question — I wish I knew a good answer. I've been thinking about this a lot recently.

A lot of the challenge is that if you have a business process workflow, you often have people in compliance, legal, HR, and so on doing those steps. How do you put the plumbing in place — whether through a LangGraph-type integration, or we'll see whether MCP holds up too — to ingest the data, and then how do you prompt, process, and execute the multiple steps to build that end-to-end system?

One thing I see a lot is the importance of putting the right evals framework in place — not just to understand the performance of the overall system, but to trace the individual steps. Then you can hone in on which step is broken, which prompt is broken, and work on improving that.

I find that a lot of teams wait longer than they should, relying only on human evals: every time you change something, you sit there and stare at a bunch of outputs yourself, right? I see most teams being slower than ideal to put systematic evals in place.

And I find that having the right instincts for what to do next in a project is still really difficult. Less experienced teams — teams still learning these skills — will often go down blind alleys: you spend a few months trying to improve one component, where a more experienced team would say, "You know what, I don't think this can ever be made to work. Let's just find a different way around this problem."

I wish I knew more efficient ways to transfer this almost tactile knowledge. Often you're there looking at the outputs, looking at the traces, looking at the LangSmith output, and you have to decide within minutes or hours what to do next — and that's still very difficult.
## The Lego-Brick Metaphor for the Tool Ecosystem

**Harrison**: Is that tactile knowledge mostly about LLMs and their limitations, or more about product framing and the skill of breaking a job down?

**Andrew**: All of the above, I think. Over the last couple of years, AI tool companies have created an amazing set of AI tools. That includes tools like LangGraph, but also ideas: how do you think about RAG? How do you think about building chatbots? The many different approaches to memory. How do you build evals? How do you build guardrails?

I feel like there's this wide, sprawling array of really exciting tools. One picture I often have in my head: if all you have are purple Lego bricks, you can't build that much interesting stuff. And I think of these tools as Lego bricks.

The more tools you have, the more it's as if you don't just have purple Lego bricks, but red, black, yellow, and green ones too. As you get more differently colored and shaped Lego bricks, you can very quickly assemble them into really cool things. So many of these tools — like the ones I just rattled off — are different types of Lego bricks.

When you're trying to build something, sometimes you need that squiggly, weirdly shaped Lego brick, and someone who knows it can plug it in and get the job done. But if you've never built a certain type of eval, you could end up spending three extra months on something where someone who's done it before would say, "Oh, we should just build the eval this way — use an LLM as a judge — and get through that process much faster."

One of the tricky things about AI is that it's not just one tool. When I'm coding, I use a whole bunch of different things. I'm not a master of all of them myself, but I've learned enough tools to assemble them quickly.

So yes, I think practice with different tools also helps with much faster decision-making.
## The Fast-Changing Technology Landscape

**Andrew**: This also keeps changing. For example, because LLMs now have longer and longer context windows, a lot of the best practices for RAG from a year and a half ago are much less relevant today.

I remember Harrison was really early to a lot of these things — the early LangChain RAG frameworks, recursive summarization, and so on. As LLM context windows got longer, now we just dump a lot more into the context. It's not that RAG has gone away, but the hyperparameter tuning has gotten way easier: there's a huge range of hyperparameters that all work just fine.

So as LLMs keep progressing, the instincts we formed years ago may or may not still be relevant today.
## Underrated Areas

**Harrison**: You've mentioned a lot of things I want to talk about. So: what are some Lego bricks that are perhaps underrated right now — things you'd recommend people look into that nobody is talking about yet? Evals, for example — we had three people talk about evals, and I think that's top of mind for everyone. But what are things most people haven't thought of or heard of yet that you'd recommend they study?

**Andrew**: Good question. I don't know — maybe this just shows the limits of my own perspective.

Even though people talk about evals, for some reason people don't actually do them.

**Harrison**: Why do you think they don't?

**Andrew**: I think it's because people often — I saw a post about "eval writer's block" — think of writing evals as this huge thing you have to get right.

I think of evals as something I'll throw together really quickly, in twenty minutes. It's not that good, but it starts to complement my human-eyeball evals.

What often happens is that I'll build a system, and there's one problem where I keep getting regressions. I thought I made it work, then it breaks; I make it work again, then it breaks again, and it gets annoying. So I write a very simple eval — maybe five input examples and a very simple LLM judge — just to check for that one regression: did this one thing break?

I'm not swapping out human evals for automated evals. I still look at the outputs myself, but when I change something, I run this eval just to take that one burden off, so I don't have to think about it. And then what happens — a bit like the way we write English — is that once you have a slightly helpful but clearly very broken, imperfect eval, you start thinking: "You know what, I can improve my eval to make it better, and better again."

It's just like building applications in general: we build something very quick and dirty that doesn't work, then incrementally make it better. A lot of the evals I build start out really awful and barely help. Then when you look at what one does, you go, "You know what, this eval is broken. I can fix it," and you incrementally make it better. So that's one thing.
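The "quick and dirty regression eval" described above — a handful of examples plus a simple judge — can be sketched as follows. The judge here is a trivial keyword check standing in for a real LLM-as-judge call, and all names are hypothetical:

```python
# Sketch of a 20-minute regression eval: a few input examples and a
# simple judge, re-run after every change to catch one known regression.

EXAMPLES = [
    {"input": "What is your refund policy?", "must_mention": "refund"},
    {"input": "How do I reset my password?", "must_mention": "password"},
    {"input": "Where is my order?", "must_mention": "order"},
]

def my_system(user_input: str) -> str:
    # Stand-in for the real agentic system under test.
    return f"Here is some information about: {user_input.lower()}"

def judge(output: str, must_mention: str) -> bool:
    # Trivial stand-in for an LLM-as-judge call that would grade
    # whether the response actually addresses the topic.
    return must_mention in output.lower()

def run_eval() -> float:
    passed = sum(
        judge(my_system(ex["input"]), ex["must_mention"]) for ex in EXAMPLES
    )
    return passed / len(EXAMPLES)

print(f"pass rate: {run_eval():.0%}")  # pass rate: 100%
```

The point is not that this eval is good — it clearly isn't — but that it exists, runs in seconds after every change, and can be incrementally improved, exactly as the conversation describes.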
## The Potential of Voice Applications

**Andrew**: One thing I'll mention that people talk about a fair amount but that I think is still deeply underrated is the voice stack. Voice applications are something I'm actually very excited about. A lot of my friends are very excited about voice applications, and I see a bunch of large enterprises really excited about them — very large use cases coming down the pipe.

For some reason, while there are some developers in this community doing voice, the amount of attention on voice-stack applications doesn't match that. There is some — it's not that it's been ignored — but it feels much smaller than the importance I see at large enterprises, and than the applications that are coming.

And it's not all the real-time voice APIs, and not all speech-to-speech, native audio-in, audio-out models. I find those models very hard to control; we use more of an agentic voice-stack workflow, which I find much more controllable.

Boy — at AI Fund I'm working with a ton of teams on voice-stack things, some of which will hopefully be announced in the near future. I've seen a lot of very exciting stuff.
### Technical Challenges and Solutions in Voice

**Harrison**: If people here want to get into voice, and they're familiar with building agents with LLMs, how similar is it? Are a lot of the ideas transferable, or is there something new? What would they have to learn?

**Andrew**: It turns out there are a lot of applications where voice matters. From an application perspective, it creates certain interactions that are much easier.

A text input prompt is actually kind of intimidating. For a lot of applications, if we go to the user and say, "Tell me what you think — here's a big text box, write a bunch of text for me," that's actually very intimidating for users.

Part of the problem is that with text, people can use the backspace key, so they're slower to respond. With voice, time rolls forward — you just keep talking. You can change your mind; you can actually say, "Oh, I changed my mind, forget that earlier thing," and our models are actually pretty good at dealing with that.

So in a lot of applications, I find the friction to getting users to engage is much lower: we just say, "Tell me what you think," and they respond by voice.

The biggest difference with voice, in terms of engineering requirements, is latency. If someone says something, you really want to respond within about a second. Under 500 milliseconds is great, but ideally it's under one second.

Many of our agentic workflows run for many seconds. So when DeepLearning.AI worked with RealAvatar to build an avatar of me — it's on a web page; you can talk to my avatar if you want — our initial version had about five to nine seconds of latency, which is just a bad user experience. You say something, nine seconds of silence, then my avatar responds.

So we built what we call a "pre-response." Just as, if you ask me a question, I might go, "Hmm, that's interesting. Let me think about that" — we prompted an LLM to basically do that to hide the latency, and it actually seems to work.

And there are lots of other little tricks. It turns out that if you're building a voice customer-service chatbot, playing the background noise of a customer contact center instead of dead silence makes people much more accepting of the latency.

So I find there are a lot of these things that differ from a purely text-based LLM. But in applications where the voice modality lets users feel comfortable and just start talking, it really reduces the friction of getting information from them. When we speak, we don't feel the need to deliver perfection the way we do when we write.

So it's somehow easier for people to just start blurting out their ideas, change their minds, and go back and forth — and that lets us get the information we need to help the user move forward.
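The pre-response trick described above — speak a short filler immediately while the slow agentic pipeline computes the real answer — can be sketched like this. `speak()` and `slow_agent()` are hypothetical stand-ins for the TTS and agent calls in a real voice stack:

```python
# Sketch of the "pre-response" latency trick: emit a filler phrase right
# away while the multi-second agentic workflow runs in the background.

import time
from concurrent.futures import ThreadPoolExecutor

def slow_agent(question: str) -> str:
    time.sleep(0.2)  # stands in for a workflow that runs for many seconds
    return f"Here is my considered answer to: {question}"

def speak(text: str) -> None:
    # Stand-in for a text-to-speech call to the user.
    print(f"[avatar] {text}")

def answer_with_pre_response(question: str) -> str:
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_agent, question)  # start the real work
        # The filler plays immediately, hiding the pipeline's latency.
        speak("Hmm, that's interesting. Let me think about that.")
        result = future.result()                    # join once ready
    speak(result)
    return result

answer_with_pre_response("Why is latency so important in voice?")
```

The same structure would also accommodate the background-noise trick: play ambient audio between the filler and the final answer instead of silence.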
## AI-Assisted Coding and the Democratization of Programming

**Andrew**: Another thing — maybe not so much underrated as something more businesses should do. I think many of you have seen that developers who use AI assistance in their coding are so much faster than developers who don't.

It's been interesting to see how many companies — CIOs and CTOs — still have policies that don't let engineers use AI-assisted coding. Maybe sometimes for good reasons, but I think we have to get past that, because frankly my teams and I just hate ever having to code again without AI assistance.

So some businesses certainly need to get through that. What I do think is underrated is the idea that everyone should learn to code.

One fun fact about AI Fund: everyone at AI Fund — including the person who runs our front desk, my CFO, and our general counsel — actually knows how to code. It's not that I want them to be software engineers; they're not. But in their respective job functions, by learning a little about how to code, many of them are better able to tell a computer what they want it to do.

That's actually driving meaningful productivity improvements across all of these job functions that are not software engineering. So that's been exciting as well.
## AI Coding Tools

**Harrison**: Speaking of AI coding — what tools do you use personally?

**Andrew**: Well, we're working on some things we haven't announced yet.

**Harrison**: Oh, exciting.

**Andrew**: Yes. So maybe I'll just say I use Cursor, Windsurf, and a few other things.

**Harrison**: All right, we'll come back to that later.
## Vibe Coding

**Harrison**: Another thing that's top of mind for people right now is vibe coding and all of that. You touched earlier on how people are using these AI coding assistants. But how do you think about vibe coding? Is it a different skill than before? What purpose does it serve in the world?

**Andrew**: I think many of us now code while barely looking at the code, and I think that's a fantastic thing to be doing. I think it's unfortunate that it got called vibe coding, because the name misleads a lot of people into thinking you just go with the vibes — accept this, reject that. Frankly, when I've been coding for a day with vibe coding, or whatever you call coding with an AI assistant, I'm exhausted by the end of the day.

It's a deeply intellectual exercise. So I think the name is unfortunate, but the phenomenon is real, it's taking off, and that's great.

Over the last year, some people have been advising others not to learn to code, on the basis that AI will automate coding. I think we'll look back on that as some of the worst career advice given in decades.

As coding got easier, more people started to code. That's what happened when we went from punch cards to keyboards and terminals, right? I actually found some very old articles: when programming went from assembly language to, literally, COBOL, there were people arguing back then, "We have COBOL now, it's so easy, we don't need programmers anymore."

Obviously, as it got easier, more people learned to code. So with AI coding assistants, more people should code. It turns out that one of the most important skills of the future, for developers and non-developers alike, is the ability to tell a computer exactly what you want so that it will do it for you.

And I think understanding, at some level — which all of you do, I know — how a computer works lets you prompt or instruct the computer much more precisely. That's why I still try to advise everyone to learn one programming language — learn Python or something.

Then — maybe some of you know this — I'm personally stronger at Python development than at, say, JavaScript. But with AI-assisted coding, I now write far more JavaScript and other kinds of code than I used to.

Even so, when debugging JavaScript that something else wrote for me — code I didn't write with my own fingers — really understanding what the error conditions are and what they mean has been very important for debugging that code.

**Harrison**: If you don't like the name "vibe coding," do you have a better name in mind?

**Andrew**: Oh, that's a good question. I should think about that.

**Harrison**: We'll come back to that question later too. It's a good one.
## The Evolution of MCP

**Harrison**: One new thing out there that you mentioned briefly is MCP. How are you seeing it transform how people build apps, what types of apps they build, or what's generally happening in the ecosystem?

**Andrew**: I think it's really exciting. Just this morning we released a short course on MCP with Anthropic. I'd actually seen a lot of material about MCP on the web that I found quite confusing, so when we worked with Anthropic, we said: let's create a really good short course on MCP that explains it clearly.

I think MCP is fantastic. It filled a very clear market gap, and the fact that OpenAI adopted it also speaks to its importance. I think the MCP standard will continue to evolve. Many of you know what MCP is: it makes it much easier — primarily for agents, but frankly for other types of software too — to plug into different types of data.

When I use LLMs myself, or when I build applications, frankly, a lot of us spend so much time on the plumbing. For those of you from large enterprises: AI models, especially reasoning models, are pretty darn intelligent — they can do a lot when given the right context. But I find that my team and I spend a lot of time on the plumbing of data integrations, getting the context to the LLM so it can do something that's often quite sensible once it has good input context.

So MCP, I think, is a fantastic way to standardize the interface to a lot of tools and API calls, as well as data sources. Right now it feels a bit like the Wild West: a lot of the MCP servers you find on the internet don't work.

And the authentication systems are a bit funky, even for very large companies' MCP servers — it's unclear whether the authentication tokens fully work, when they expire, and so on.

I think the MCP protocol itself is also early. Right now, MCP returns one long list of the available resources. Eventually, I think we'll need some more hierarchical discovery. Imagine you want to build something — I don't even know whether there will be an MCP interface to LangGraph, but LangGraph has so many API calls that you just can't have one long list of everything under the sun for an agent to sort through.

So I think we'll need some sort of hierarchical discovery. MCP is a really fantastic first step, and I definitely encourage you to learn about it. If you find a good MCP server implementation that helps with some of your data integrations, it may well make your life easier, and I think it will be important.

The idea is that when you have N models or N agents and M data sources, it shouldn't take N-times-M effort to do all the integrations; it should be N plus M. MCP is a fantastic first step toward that kind of data integration, though it will need to evolve.
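The hierarchical-discovery idea mentioned above can be sketched generically, without any MCP SDK: instead of handing the agent one flat list of every tool, expose categories first, then only the tools within the chosen category. The registry contents here are made up for illustration:

```python
# Sketch of hierarchical tool discovery: two small discovery rounds
# instead of one long flat list of everything under the sun.

TOOL_REGISTRY = {
    "graph": ["create_graph", "add_node", "add_edge"],
    "threads": ["create_thread", "get_thread_state"],
    "deployments": ["list_deployments", "get_deployment_logs"],
}

def list_categories() -> list[str]:
    # First round: a short list the model can easily reason over.
    return sorted(TOOL_REGISTRY)

def list_tools(category: str) -> list[str]:
    # Second round: only the tools within the chosen category.
    return TOOL_REGISTRY.get(category, [])

print(list_categories())      # ['deployments', 'graph', 'threads']
print(list_tools("threads"))  # ['create_thread', 'get_thread_state']
```

Each prompt to the agent stays small: it first picks a category, then sees only that category's tools, rather than sorting through every API call at once.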
## The State of Multi-Agent Systems

**Harrison**: Another type of protocol that's seen less buzz than MCP is the agent-to-agent stuff. I remember when we were at a conference a year or so ago, I think you were talking about multi-agent systems, which this would enable. How do you see the multi-agent or agent-to-agent stuff evolving?

**Andrew**: I think agent-to-agent is still very early. Most of us — including me — struggle even to make our own code work. So making my agent work with someone else's agent feels like a two-miracle requirement.

What I see is that when a single team builds a multi-agent system, it often works: they built all the agents, the agents work with each other, and the team understands the protocols. But right now — at least at this moment in time, and maybe I'm off — I'm seeing very few examples of one team's agent, or collection of agents, successfully engaging with a totally different team's agents.

I think we're a little early for that. I'm sure we'll get there, but I personally haven't seen real, huge success stories of it yet. I'm not sure if you have.

**Harrison**: No, I agree. I think it's super early. If MCP is early, I think the agent-to-agent stuff is even earlier.
## Advice for Founders

**Harrison**: For people in the audience who may be thinking about starting a company, what advice would you give them?

**Andrew**: AI Fund is a venture studio: we build companies, and we specialize in investing only in companies we co-found.

Looking back at AI Fund's lessons learned, I'd say the number-one predictor of a startup's success is speed. I know we're in Silicon Valley, but I still see many people who have never seen the speed at which a skilled team can execute. If you've never seen it before — and I know many of you have — it's just so much faster than any slower business knows how to move.

The second predictor, also very important, is technical knowledge. If you look at the skills needed to build a startup, there are things like how to market, how to sell, how to price — all important, but that knowledge has been around longer, so it's more widely diffused.

The really scarce knowledge is how the technology actually works, because the technology is evolving so quickly. I have deep respect for go-to-market people — pricing is hard, marketing is hard, sales is hard — but the knowledge that is least widely spread, and scarcest, is a real understanding of how the technology works.

So at AI Fund, we really like working with deeply technical people who have good instincts — do this, don't do that — which can simply make you twice as fast.

A lot of the business knowledge is very important too, but it's usually easier to figure out.

**Harrison**: That's great startup advice. We're about to wrap up — we're going to take a break now, but before we do, please join me in a big round of applause for Andrew. Thank you for being here.

**Andrew**: Thank you.
This conversation offered a deep look at the current AI agent landscape — from practical technical challenges to future directions and startup advice — spanning everything from technical detail to business strategy.
# 原始逐字稿
```
I'm really excited for this next section.
So we'll be doing a fireside chat with Andrew Ng.
Andrew probably doesn't need any introduction to most folks here.
I'm guessing a lot of people have taken some of his classes on Coursera or deep learning. But Andrew has also been a big part of the lane chain story. So I met Andrew a little over two years ago at a conference when we started talking about
LingChain and he graciously invited us to do a course on Langchain with deep learning. I think it must have been the second or third one that they ever did. And I know a lot of people here probably watched that course or got started on Langchain
because of that course. So Andrew has been a huge part of the Langchane journey and I'm super excited to welcome him
on stage for a fireside chat.
So let's welcome Andrew in.
Thanks for being here.
But by the way, Harrison is really kind. I think Harrison and this team has taught six short causes so far on DeepLanta AI.
And our metrics by Net Promotter Score and so on are that Harrison's causes are among our most highly rated.
So, Susie, go take all of Harrison's causes. I think the Vincent Langrov one had the clearest explanation I have seen myself of a bunch of agented concepts.
They've definitely helped make our courses and explanations better.
So thank you guys for that as well.
You've obviously touched and thought about so many things in this industry, but one of your
takes that I cite a lot and probably people have heard me talk about is your take on kind of
like talking about the agenticness of an application as opposed to whether something's an agent.
And so, you know, as we're here now at an agent conference, maybe we should rename it to an agent conference. But would you mind kind of clarifying that?
And I think it was like almost a year and a half, two years ago that you said that.
And so I'm curious if things have changed in your mind since then. So I remember, I guess, Harrison and I both spoke at a conference like a year, over a year ago.
And at that time, I think both of us were trying to convince other people that agents are a thing,
and we should pay attention to it.
And that was before maybe, I think it was midsummer last year, the bunch of marketers got a hold of the agentic term and started sticking that sticker. everywhere until lost meaning. But to her question, I think about a year and a half ago,
I saw that a lot of people arguing. Is this an agent? This is not a different, you know, arguments?
Is it truly autonomous not an agent? And I felt that it was fine to have the argument, but that we
would succeed better as a community. We just say that there are degrees to which something is
agentic. So, and then if we just say, if you want to build an agentic system with a little bit
of autonomy or a lot of autonomy is all fine, no need to spend time arguing. Is this truly an
agent. Let's just call all of these things agentic systems with different degrees of autonomy.
And I think that actually hopefully reduce the amount of time people wasted, spend arguing of something
as an agent. And let's just call them all agentic and then get on with it. So I think that actually
worked out. Where on that spectrum of kind of like a little autonomy to a lot of autonomy do you see
people building for these days? Yeah. So my team routinely uses Landgraf for our hardest problems,
with complex flows and so on. I'm also seeing tons of business opportunities that,
frankly, are fairly linear workflows or linear with just occasional side branches. So a lot of businesses,
there are opportunities where right now we have people looking at a form on a website, doing
web search, checking some of the database to see if it's a compliance issue or if there are, you know,
someone we shouldn't sell certain stuff to. And it's kind of a, or take something, copy paste it, maybe do another web search, paste it in a different form. So in business process,
there are actually a lot of fairly linear workflows or linear with very small loose and occasional branches usually connoting a failure because of reject this workflow.
So I see a lot of opportunities, but one challenge I see businesses have is it's still pretty difficult to look at some stuff that's being done in your business and figure out how to turn into an agentic workflow.
So what is the granularity with which you should break down this thing into micro tasks?
And then, you know, after you build your building, you know, initial prototype, if it doesn't work well enough, which of these steps do you work on
to improve the performance? So I think that whole bag of skills on how to look at a bunch
of stuff that people are doing, break into sequential steps, where are the small number of branches,
how do you put in place e-vowls, you know, all that, that skill set is still far too rare, I think.
And then of course, there are much more complex agentic workflows that I think you heard a bunch
about with very complex loops that's very valuable as well. But I see much more
in terms of number of opportunities, still in the amount of value, there's a lot of simpler
workflows that I think are still being built out.
Let's talk about some of those skills. Like, so you've been doing deep learning. I think a lot of courses are in pursuit of helping people kind of like build agents.
And so what are some of the skills that you think agent builders all across the spectrum should kind
of like master and get started with?
Boy, it's a good question. I wish I knew good answer that.
I've been thinking a lot about this actually recently. I think, A lot of the challenges, if you have a business process workflow, you often have people in compliance, legal, HR, whatever, doing these steps.
How do you put in place the plumbing, either through a land graph type integration, or we'll see if MCP holds or some of that too, to ingest the data,
and then how do you prompts or process and do the multiple steps in order to build this end-to-end system?
And then one thing I see a lot is putting in place the right thing.
Evales framework to not only understand the performance of the overall system, but
to trace the individual steps. You can hone in on what's the one step that is broken, what's the one prompt that is broken to
work on. I find that a lot of teams probably wait longer than they should, just using human e-vowels, where every time you change something, you then sit there and look at a bunch of output receivers,
right? I see most teams probably slower to put in place e-vowels, systematic e-vowels than is ideal.
I find that having the right instincts for what to do next in the project is still really difficult.
The school teams, the teams are still learning these skills will often, you know, go down blind alleys, right? Where you spend like a few months trying to improve one component, the more experienced team
will say, you know what, I don't think this can ever be made to work.
So just don't just find a different way around this problem. I wish I knew, I knew more efficient ways to get this kind of almost tactile knowledge. Often you're there, you know, look at the output, look at trace, look
at the Lansmith output, and you just got to make a decision, right, in minutes or hours on what to
do next, and that's still very difficult.
And is this kind of like tactile knowledge mostly around LLMs and their limitations,
or more around like just the product framing of things and that skill of taking a job and breaking
it down? That's something that we're still getting accustomed to.
I think it's all of the above, actually. So I feel like over the last couple years, AI tool companies have created an amazing set of AI tools.
And this includes tools like, you know, Land Graph, but also how do you, ideas like, how do you think about Rack?
How do you think about building chatbots? Many, many different ways of approaching memory. I don't know, what else. How do you build e-vows? How do you build guardrails?
But I feel like there's this, you know, wide-shralling array of really exciting tools.
One picture I often have in my head is if all you have are, you know, purple Lego brakes, right? You can't build that much interesting stuff. But, and I think of these tools as being akin to Lego brakes, right?
And the more tools you have is as if you don't just have purple Lego brakes, but a red one and a black one and a yellow one and a green one.
And as you get more different colored and shaped Lego brakes,
you can very quickly assemble them into really cool things. And so I think a lot of these tools, like the ones that's rattling off as different types of Lego brakes,
And when you're trying to build something, you know,
sometimes you need that squiggly, weird shape, leg, or break, and some people know it, and can plug it in and just get the job done.
But if you've never built e-vows of a certain type,
then you know, then you could actually end up spending, whatever, three extra months doing something
that someone else that's done that before. Could say, oh, well, we should just build e-vowls this way. Use an, oh, I'm as a judge, and just go through that process
to get it done much faster. So one of the unfortunate things about AI
It's not just one tool.
And when I'm coding, I just use a whole bunch of different stuff, right? And I'm not a master of enough stuff myself,
but I've learned enough tools to assemble them quickly. So yeah, and I think having that practice
with different tools also helps with much faster decision-making. And one of the things is it also changes. So for example, because OLMs have been having a longer and longer context memory, a lot of the best practices
for RAG from a year and a half, half a go or whatever, much less relevant today.
Right. And I remember Harrison was really early to a lot of these things.
So I played the early land chain, rag frameworks, recursive summarization and all that. As OM context memories got longer, now we just dump a lot more stuff into our own context. It's not that Rack has gone away, but the hyperparameter tuning
has gone way easier. There's a huge range of hypergramses that work, you know,
like just fine. So as OMS keep progressing, the instincts we hold, you know,
years ago may or may not be relevant anymore today.
You mentioned a lot of things that I want to talk about. So, okay, what are some of the Lego bricks that are maybe underrated right now that you
would recommend that people aren't talking about?
Like, eVALs, you know, we had three people talk about eVALs, and I think that's top of people's
mind. But what are some things that most people maybe haven't thought of or haven't heard of yet that you
would recommend them looking into? Good question. I don't know. Yeah. Maybe I'm just showing it.
So even though people talk about evel For some reason, people don't do it.
Why don't you think they do it? And I think it's because people often have, I saw a post on this on E-Vow's Writers' Block. People think of writing E-Vals as this huge thing that you have to do right.
I think of E-VALs is something I'm going to fill together really quickly, you know, in 20 minutes,
and it's not that good, but it starts to complement my human eyeball e-vowel.
And so what often happens is, I'll build a system and this one problem where I keep on getting regression.
regression. I thought I made it work, then it breaks. I made it work, then it breaks,
it's all done it. It just getting annoying. Then a code of a very simple E-vow, maybe with,
you know, five input examples and some very simple E-MIS judge, to just check for this one regression,
right? Did this one thing break? And then I'm not swapping out human evils for automated e-vowls.
I'm still looking at the output myself, but when I change something, I'll run this E-VALs to just, you know,
take this burden something so I don't have to think about it. And then what happens is just
Just like the way we write English maybe, once you have some slightly helpful but clearly very broken,
imperfect e-vow, then you start to go, you know what? I can improve my e-vowl to make it better,
and I can improve it to make it better. So just as when we build a lot of applications, we built some very quick and dirty thing that doesn't work, and it would increment make it better.
For a lot of the way I built e-vowels, I built really awful e-vowls that barely helps.
And then when you look at what it does, you go, you know what?
this e-va is broken. I could fix it. And you incrementally make it better. So that's one thing.
I'll mention one thing that people have talked a lot about, but I think it's so underrated,
is the voice stack. It's one of the things that I'm actually very excited about voice applications. A lot of my friends are very excited about voice applications. I see a bunch of large enterprises,
really excited about voice applications, very large and for us, very large use cases.
For some reason, while there are some developers in this community doing voice, the amount of different
for attention on voice stack applications. There is some, right? It's not if you've ignored it, but that's one thing that feels much smaller than the large enterprise
importance I see as well as applications coming down the pipe. And not all of this is the
real-time voice API. It's not all speech-to-speech native audio in audio models. I find those
models are very hard to control, but we use more of an agenetic voice stack workflow, which is which find much more controllable.
Boy, AIF I'm working with a ton of teams on voice stack stuff, some of which hopefully will be announced in the near future. I've seen a lot of very exciting things. And then other things I think underrated, one of the one that maybe is not underrated,
but more business should do it. I think many of you have seen that developers that use AI assistance in our coding
is so much faster than developers that don't. I've been, it's been interesting to see that how many companies, CIOs and CTOs, still have, you know, policies that don't let engineers
use AI-assisted coding. I think maybe sometimes for good reasons, but I think we have to get
past that because frankly, I don't know, my teams and I just hate to ever have to code again without
AI assistance. So, but I think some businesses certainly need to get through that. I think underrated is the idea that I think everyone should learn to code. One fun fact about AI fund. Everyone in AI fund, including, you know, the person that runs our front desk receptionist
and my CFO and my, and the general counsel, everyone in AI fund actually knows how to code. And it's not that I want them to be software engineers, they're not. But in their respective job functions, many of them, by learning a little bit about how to code,
are better able to tell a computer what they wanted to do.
And so it's actually driving meaningful productivity improvements across all of these job functions that are not software engineering. So that's been exciting as well. Talking about kind of like AI coding, what tools are you using for that personally?
So we're working on some things that we've not yet announced.
Oh, exciting. Yeah. So maybe I do use cursor, Winserve, and some other things.
All right, we'll come back to that later. Talking about voice. If people here want to get into voice and they're familiar with building kind of like agents with LLMs, how similar is it?
Are there a lot of ideas that are transferable or what's new? What will they have to learn? Yeah, so it turns out there are a lot of applications where I think voice is important. It creates certain interactions that are much more, it turns out that, it turns out from an application perspective.
an input text prompt is kind of intimidating, right?
For a lot of applications, well, we can go to the user and say, tell me what you think?
Here's a block of text prompt, write a bunch of text for me. That's actually very intimidating for users.
And one of the problems with that is people can use backspace, and so, you know, people are just slower to respond via text. Whereas for voice, you know, time rolls forward, you just have to keep talking, you could change
your mind, you could actually say, oh, I changed my mind, forget that early thing, and our models that's actually pretty good at dealing with it.
But I find that the lot of applications where the user friction to just getting them to use it
is lower. We just say, you know, tell me what you think. And then they respond in voice.
So in terms of voice, the one biggest difference is in terms of engine requirements is latency, because if you can, if someone says something, you kind of really want to respond in, you know,
I don't know, sub one second, right? Less than 500 milliseconds is great, but really ideally sub one second.
And we have a lot of agentic workflows that were run for many seconds.
So when DeepM.WI worked with Real Avatar to build an avatar of me, this is on a web page,
you can talk to an avatar of me if you want.
Our initial version had kind of five to nine seconds of latency, and it's just a bad user experience.
You say something, you know, nine seconds of silence, then my avatar response.
But so we want to building things like, we call the pre-response.
So just as, you know, if you ask me a question, I might go, huh. that's interesting. Let me think about that. So we prompted an hour to basically do that to hide the latency, and it actually seems to work.
Great. And there are all these other little tricks as well. Turns out if you're building a voice, customer service chatbot, it turns out that if you play
background noise of a customer contact center, instead of dead silence, people are much more accepting of that, you know, latency. So I find that there are a lot of these things that are different than a pure
text-based LOM, but in applications where a voice-based morality, lets the user be
comfortable and just start talking.
I think it sometimes really reduces the user friction to, you know, getting some information
of them and a safe, I think when we talk, we don't feel like we need to deliver perfection
as much as when we write. So it's somehow easier for people to just start blurting out their ideas and change your mind and
go back and forth, and that lets us get the information from them that we need to help the user
to move forward. Huh, that's interesting.
Yeah. One of the new things that's out there, and you mentioned briefly, is MCP.
How are you seeing that transform how people are building apps, what types of apps they're building
or what's generally happening in the ecosystem?
Yeah, I think it's really exciting. Just this morning we released with Anthropic short calls on MCP. I actually saw a lot of stuff, you know, on the interweb on MCP that
I thought was quite confusing.
So when we got to go to Anthropy, we said, you know, let's create a really good short quotes on MCP that explains it clearly.
I think MCP is fantastic. I think it was a very clear market gap and, you know, that Open AI adopted it.
Also, I think speaks to the importance of this. I think the MCP standard will continue to evolve, right? So for example, so I think many of you know what MCP is, right, make it much easier for agents
primarily, but frankly I think other types of software. to plug into different types of data.
When I'm using OMS myself or when I'm building applications, frankly, for a lot of us, we spend so much time on the plumbing, right? So I think for those of you from large enterprises as well,
the AI, especially, you know, reasoning models are like pretty darn intelligent.
They could do a lot of stuff when given the right context. But so I find that I spend, my team spend a lot of time working on the plumbing
on the data integrations to get the context to the OM to make it, you know,
do something that often is pretty sensible when it has a very input context.
So MCP, I think, is a fantastic way to try to standardize the interface,
to a lot of tools or API calls as well as data sources. It feels like, it feels a little bit like Wild West. You know, a lot of MCP service, you find the internet do not work, right?
And then the authentication systems are kind of, you know, even for the very large companies, you know, with MCP service is a little bit funky, it's not clear if the authentication token totally work.
and when expires, a lot of that going on.
I think the MCP protocol itself is also early. Right now, MCP gives a long list of the resources available. You know, eventually, I think we'll need some more hierarchical discovery.
Imagine you want to build something, I don't know,
even, I don't know if there would be an MCP interface to Langdraft, but Langgraph has so many API calls,
you just can't have like a long list of everything under the sun for agent to sort out.
So I think some sort of hierarchical discovery, So I think MCP is a really fantastic first step.
Definitely encourage you to learn about it. It won't make your life easier, probably, if you find a good MCP server implementation to help some of the data integrations. And I think it'll be important. This idea of when you have, you know, N models or N agents and M data sources, it should not be an N-times M effort to do all the integrations, should be N plus M.
And I think MCP is a fantastic first step.
step, it will need to evolve. It's a fantastic first step toward that type of data integration.
Another type of protocol that's seen less buzz than MCP is some of the agent-to-agent stuff. And I remember when we were at a conference a year or so ago, I think you were talking about multi-agent
systems, which this would kind of enable.
So how do you see some of the multi-agent or agent-to-agent stuff evolving?
**Andrew**:Yeah. I think agent-to-agent is still so early. Most of us, including me, struggle to even make our own code work, so making my agent work with someone else's agent feels like a two-miracle requirement. What I do see is that when one team builds a multi-agent system, that often works: you build a bunch of agents, they communicate with each other, and you understand the protocols. But right now, at least at this moment in time — and maybe I'm off — the number of examples I'm seeing where one team's agent or collection of agents successfully engages a totally different team's agent or collection of agents is small. I think we're a little bit early to that. I'm sure we'll get there, but I'm not personally seeing huge success stories of that yet. I'm not sure if you are.

**Harrison**:No, I agree. I think it's super early — if MCP is early, the agent-to-agent stuff is even earlier. Another thing that's top of people's minds right now is vibe coding, and you touched on it a little earlier with how people are using these AI coding assistants. How do you think about vibe coding? Is it a different skill than before? What kind of purpose does it serve in the world?
**Andrew**:Yeah, so I think many of us now code while barely looking at the code, right? I think it's a fantastic thing to be doing. It's unfortunate that it's called vibe coding, because the name misleads a lot of people into thinking you just go with the vibes — accept this, reject that. Frankly, when I've been coding for a day with vibe coding, or whatever, with AI coding assistants, I'm exhausted by the end of the day. This is a deeply intellectual exercise. So I think the name is unfortunate, but the phenomenon is real, it's been taking off, and it's great.
Over the last year, a few people have been advising others not to learn to code on the basis that AI will automate coding. I think we'll look back at that as some of the worst career advice ever given. Over the last many decades, as coding became easier, more people started to code. That's what happened when we went from punch cards to keyboards and terminals, and when programming went from assembly language to COBOL — I actually found some very old articles with people arguing back then: we have COBOL now, it's so easy, we don't need programmers anymore. Obviously, when coding became easier, more people learned to code.

So with AI coding assistants, a lot more people should code. It turns out one of the most important skills of the future, for developers and non-developers alike, is the ability to tell a computer exactly what you want so that it will do it for you. And understanding at some level how a computer works — which all of you do, I know — lets you prompt or instruct the computer much more precisely, which is why I still advise everyone to learn one programming language, Python or something.
And then — maybe some of you know this — I personally am a much stronger Python developer than, say, JavaScript developer. But with AI-assisted coding, I now write a lot more JavaScript and TypeScript code than I ever used to. Even when debugging JavaScript code that something else wrote for me, that I didn't write with my own fingers, really understanding what the error cases are and what they mean has been really important for me to debug my JavaScript code.

**Harrison**:If you don't like the name vibe coding, do you have a better name in mind?

**Andrew**:Oh, that's a good question. I should think about that. We'll get back to you on that.

**Harrison**:One of the things you announced recently is a new fund for AI Fund, so congrats on that.
**Andrew**:Oh, thank you.

**Harrison**:For people in the audience who are maybe thinking of starting a startup or looking into that, what advice would you have for them?

**Andrew**:So, AI Fund is a venture studio: we build companies, and we exclusively invest in companies that we ourselves co-found. Looking back on AI Fund's lessons learned, I would say the number one predictor of a startup's success is speed. I know we're in Silicon Valley, but I see a lot of people that have never yet seen the speed with which a skilled team can execute — and if you've never seen it before (I know many of you have), it's just so much faster than anything slower businesses know how to do. And I think the number two predictor, also very important, is technical knowledge.
It turns out, if we look at the skills needed to build a startup, there are things like how to market, how to sell, how to price — all of that is important, but that knowledge has been around for a while, so it's a little bit more widespread. The knowledge that's really rare is how the technology actually works, because technology has been evolving so quickly. I have deep respect for the go-to-market people — pricing is hard, marketing is hard, it really is hard — but that knowledge is more diffuse, and the rarest resource is someone who really understands how the technology works. So at AI Fund, we really like working with deeply technical people that have good instincts — do this, don't do that — because that lets you go twice as fast. A lot of the business knowledge is very important too, but it's usually easier to figure out.
**Harrison**:All right, that's great advice for starting something. We're going to wrap this up and go to a break now, but before we do, please join me in giving Andrew a big hand.

**Andrew**:Thank you.