LLMOps tools that support evaluation
## LangWatch
https://langwatch.ai/
The server side is open source.
Emphasizes DSPy support.
## Langfuse
https://langfuse.com/
The server side is open source.
## Opik
https://www.comet.com/site/products/opik/
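- Open source
- Clean, attractive UI
- Covers all the expected features, though each is fairly basic; all of them appear to be accessible through an API
- Datasets appear to have no editing feature (unconfirmed)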
## LangSmith
https://docs.smith.langchain.com/old/cookbook/testing-examples/rag_eval
## TruLens
https://www.trulens.org/
## Ragas
https://github.com/explodinggradients/ragas
notebook: https://colab.research.google.com/github/explodinggradients/ragas/blob/main/docs/quickstart.ipynb
Blog posts:
https://blog.langchain.dev/evaluating-rag-pipelines-with-ragas-langsmith/
https://cobusgreyling.medium.com/combining-ragas-rag-assessment-tool-with-langsmith-e46078001f95
Florian's introduction: https://ai.plainenglish.io/advanced-rag-03-using-ragas-llamaindex-for-rag-evaluation-84756b82dca7
## Deepeval
https://github.com/confident-ai/deepeval
## continuous-eval
https://github.com/relari-ai/
https://www.relari.ai/
## braintrust
Recommended by Jason Liu.
https://www.braintrust.dev/
## UpTrain
https://www.llamaindex.ai/blog/supercharge-your-llamaindex-rag-pipeline-with-uptrain-evaluations (2024/3/19)
https://uptrain.ai/
Ships with many built-in metrics and also provides a UI.
A YC-backed company.
## Parea.ai
https://www.parea.ai/
## Azure's evaluation features
* https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-approach-gen-ai
* https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in
* A range of metrics, including ones for RAG
* Includes prompts that can be used as a reference
## Arize Phoenix
- https://github.com/Arize-ai/phoenix
- https://app.phoenix.arize.com/
## Comet
## Weights & Biases
## Others
- https://github.com/lmnr-ai/lmnr
- https://github.com/helicone/helicone
- https://github.com/Scale3-Labs/langtrace
- https://www.confident-ai.com/
## Traditional evaluation tools
* https://github.com/cvangysel/pytrec_eval
* https://ir-measur.es/en/latest/
* https://x.com/jobergum/status/1794996654958854219
* https://pyterrier.readthedocs.io/en/latest/installation.html
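
The tools in this section score a ranked result list against relevance judgments (qrels). As a rough illustration of what such measures compute, here is a minimal pure-Python sketch of precision@k, reciprocal rank, and nDCG@k with linear gain. It does not use the API of any library listed above; the function names, the `qrels` dict, and the `run` list are all hypothetical, and real tools handle ties, missing documents, and averaging over many queries in ways this sketch ignores.

```python
import math

def precision_at_k(ranked_ids, qrels, k):
    """Fraction of the top-k results with a relevance grade above 0."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if qrels.get(doc_id, 0) > 0) / k

def reciprocal_rank(ranked_ids, qrels):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, qrels, k):
    """nDCG@k with linear gain and a log2(rank + 1) discount."""
    def dcg(gains):
        # enumerate starts at 0, so rank = i + 1 and the discount is log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query: doc id -> graded relevance.
qrels = {"d1": 2, "d2": 0, "d3": 1}
# Hypothetical system output, best first; "d4" is unjudged and counts as 0.
run = ["d2", "d1", "d4", "d3"]

print(precision_at_k(run, qrels, 2))
print(reciprocal_rank(run, qrels))
print(round(ndcg_at_k(run, qrels, 4), 4))
```

For a RAG pipeline, the same functions can be applied to the retriever's output for each query and then averaged across queries; the dedicated libraries above are the better choice once more measures or TREC-format files are involved.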