> Visual-interface control agents: giving an agent the ability to operate a computer
* Anthropic's [[Computer Use]]
* https://www.anthropic.com/news/3-5-models-and-computer-use
* https://docs.anthropic.com/en/docs/build-with-claude/computer-use
* https://www.facebook.com/ihower/posts/10161528069893971
* paper: Large Language Model-Brained GUI Agents: A Survey https://arxiv.org/abs/2411.18279v6 (2024/11)
* https://twitter.com/leeoxiang/status/1777685999143026953 (2024/4/9)
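* 1. Windows: UFO: A UI-Focused Agent for Windows OS Interaction.
* 2. iOS: Apple Ferret-UI, a multimodal large language model (MLLM) optimized for understanding mobile user-interface (UI) screens. Ferret-UI has referring, grounding, and reasoning capabilities, which let it understand and interact with UI screens more effectively. https://arxiv.org/abs/2410.18967
* 3. Android: ScreenAI: its core is a new text representation of screenshots that identifies the type and position of each UI element.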
* awesome papers: https://github.com/francedot/acu
* agent
* https://x.com/LangChainAI/status/1881023825933942886 (2025/1/20)
* https://github.com/Upsonic/Upsonic
* UI-TARS https://github.com/bytedance/UI-TARS
* https://x.com/TsingYoga/status/1881570775263859047
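The agents collected above (Computer Use, UFO, UI-TARS and others) all run some variant of a perceive, decide, act loop: take a screenshot, ask a model for the next action, execute it, and repeat until the model reports that the task is done. Below is a minimal sketch of that loop, assuming a hypothetical model callable and a `FakeScreen` stand-in instead of a real model API and real OS automation. The action names (`left_click`, `type`, `done`) are illustrative only. They are loosely inspired by, but not identical to, Anthropic's computer-use tool schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str      # hypothetical action names: "left_click", "type", "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: it records actions instead of driving the OS."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture image bytes; this returns a text summary instead.
        return f"screen after {len(self.log)} actions"

    def execute(self, action: Action) -> None:
        self.log.append(action)


def run_agent(model: Callable[[str, List[Action]], Action],
              screen: FakeScreen, max_steps: int = 10) -> List[Action]:
    """Perceive -> decide -> act until the model says "done" or the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = screen.screenshot()      # 1. perceive
        action = model(observation, history)   # 2. decide
        if action.kind == "done":              # model signals task completion
            break
        screen.execute(action)                 # 3. act
        history.append(action)
    return history


def scripted_model(observation: str, history: List[Action]) -> Action:
    """Hypothetical model that replays a fixed plan; a real agent would call an LLM here."""
    plan = [Action("left_click", 120, 48), Action("type", text="hello"), Action("done")]
    return plan[len(history)]


screen = FakeScreen()
steps = run_agent(scripted_model, screen)
print([a.kind for a in steps])
```

The `max_steps` budget matters in practice. A real model can loop indefinitely on a screen it misreads, so production agents typically cap the number of steps or add a human check.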
## Browser tools
* https://browser-use.com/
* https://github.com/steel-dev/steel-browser
* https://www.browserbase.com/
* https://github.com/web-infra-dev/midscene
* https://x.com/yadong_xie/status/1871189552192430152 (2024/12/23)
* Roundup: https://x.com/johnrushx/status/1883872256121774401 (2025/1/27)
## Open Computer Use
* https://github.com/e2b-dev/open-computer-use
## AutoGLM
* https://xiao9905.github.io/AutoGLM/
## Microsoft OmniParser
* https://github.com/microsoft/OmniParser
* https://www.jiqizhixin.com/articles/2024-10-26-4
* https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/
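The idea behind "turning any LLM into a computer-use agent" is to parse a screenshot into a list of interactable elements with captions and bounding boxes. A text-only LLM can then pick an element by ID, and the agent clicks the centre of that element's box. The sketch below covers only the post-parsing step. The `UIElement` schema and the `to_prompt` and `click_point` helpers are hypothetical, and OmniParser's actual output format may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UIElement:
    # Hypothetical schema; OmniParser's real output format may differ.
    kind: str                        # e.g. "button", "text_field", "icon"
    caption: str                     # functional description of the element
    box: tuple[int, int, int, int]   # (x1, y1, x2, y2) in screen pixels


def to_prompt(elements: list[UIElement]) -> str:
    """Render parsed elements as a numbered list the LLM can reference by ID."""
    return "\n".join(f"[{i}] {e.kind}: {e.caption}" for i, e in enumerate(elements))


def click_point(elements: list[UIElement], element_id: int) -> tuple[int, int]:
    """Map the element ID chosen by the LLM back to the pixel at the centre of its box."""
    x1, y1, x2, y2 = elements[element_id].box
    return ((x1 + x2) // 2, (y1 + y2) // 2)


elements = [
    UIElement("text_field", "search box", (100, 20, 500, 60)),
    UIElement("button", "submit search", (510, 20, 590, 60)),
]
print(to_prompt(elements))
print(click_point(elements, 1))  # centre of the "submit search" button
```

Separating parsing from decision-making is the design choice here. The LLM never outputs raw pixel coordinates, so it needs no visual grounding ability of its own.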
## GPT-4V
* https://aiemploye.com/
* https://osu-nlp-group.github.io/SeeAct/
* https://github.com/OthersideAI/self-operating-computer (operates the computer)
## MultiOn
* https://www.multion.ai/
* More: https://www.kadoa.com/blog/ai-agents-hype-vs-reality
## Visual agent control
* web agents: https://twitter.com/omarsar0/status/1742923330544706035 (2024/1/4)
* https://twitter.com/omarsar0/status/1753889394111479852 (2024/2/4)
* https://baai-agents.github.io/Cradle/
* WebVoyager
* Skyvern (open source)
* https://github.com/Skyvern-AI/skyvern
* https://twitter.com/tuturetom/status/1787296091475780054 (2024/5/6)
* https://github.com/X-PLUG/MobileAgent
* https://www.airtop.ai/
## Evaluation
* OSWorld
* https://twitter.com/dotey/status/1778605434229731667
* https://os-world.github.io/
* Mind2Web
* https://osu-nlp-group.github.io/Mind2Web/
* paper: The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use (2024/11)
* https://arxiv.org/abs/2411.10323
* https://x.com/omarsar0/status/1858526493661446553
* https://gui-agent.github.io/grounding-leaderboard/
* https://x.com/ChiYeung_Law/status/1875179243401019825 (2025/1/3)
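GUI grounding benchmarks, such as those behind the grounding leaderboard above, commonly count a prediction as correct when the model's predicted click point falls inside the target element's bounding box. The sketch below computes that accuracy under this point-in-box assumption. The data layout and function names are hypothetical, and each benchmark's official scorer may use different rules or coordinate conventions.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed pixel coordinates
Point = Tuple[float, float]               # predicted (x, y) click


def point_in_box(p: Point, box: Box) -> bool:
    """True if the click lands inside (or on the edge of) the target box."""
    x, y = p
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(preds: Sequence[Point], targets: Sequence[Box]) -> float:
    """Fraction of predictions whose click point hits the ground-truth element."""
    if len(preds) != len(targets):
        raise ValueError("need exactly one prediction per target")
    if not targets:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(preds, targets))
    return hits / len(targets)


preds = [(15, 15), (200, 40), (55, 300)]
targets = [(10, 10, 20, 20), (0, 0, 100, 50), (50, 290, 60, 310)]
print(grounding_accuracy(preds, targets))  # the second click misses (x=200 > 100)
```

Execution-based benchmarks such as OSWorld work differently. They check the final state of the environment after the whole task, not individual clicks.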