* Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
  * https://arxiv.org/abs/2404.12272
  * https://blog.langchain.dev/aligning-llm-as-a-judge-with-human-preferences/
    * Aligns the LLM judge by feeding human-corrected labels back into the evaluation prompt as few-shot examples (see the sketch after this list)
    * https://x.com/eugeneyan/status/1817664535186317341 - the last two episodes of [[LangSmith Evaluations 影片系列]]
  * https://eugeneyan.com/writing/aligneval/
    * A semi-automated evaluation app
    * https://x.com/eugeneyan/status/1851654159692616020 (2024/10/30)
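
The core idea in the LangChain post is to fold human corrections back into the judge prompt as few-shot examples. Below is a minimal sketch of that loop, assuming a simple in-memory store of human-labelled examples and a generic chat-completions message format; the schema and function names (`HumanLabel`, `build_judge_messages`) are illustrative, not from the post or from LangSmith.

```python
from dataclasses import dataclass


@dataclass
class HumanLabel:
    """One human-reviewed judgement used as a few-shot example (illustrative schema)."""
    question: str
    answer: str
    grade: str      # e.g. "PASS" / "FAIL" as decided by the human reviewer
    critique: str   # the human's short explanation for the grade


JUDGE_INSTRUCTIONS = (
    "You are grading an assistant's answer to a user question. "
    "Reply with GRADE: PASS or GRADE: FAIL, followed by a one-sentence critique. "
    "Follow the standards shown in the earlier examples."
)


def build_judge_messages(labels: list[HumanLabel], question: str, answer: str) -> list[dict]:
    """Assemble a chat message list: judge instructions, then each human-corrected
    example as a user/assistant pair, then the new case to be graded."""
    messages = [{"role": "system", "content": JUDGE_INSTRUCTIONS}]
    for ex in labels:
        messages.append({"role": "user",
                         "content": f"Question: {ex.question}\nAnswer: {ex.answer}"})
        messages.append({"role": "assistant",
                         "content": f"GRADE: {ex.grade}\nCritique: {ex.critique}"})
    messages.append({"role": "user",
                     "content": f"Question: {question}\nAnswer: {answer}"})
    return messages


if __name__ == "__main__":
    labels = [
        HumanLabel(
            question="What is the capital of Australia?",
            answer="Sydney is the capital of Australia.",
            grade="FAIL",
            critique="Factually wrong; the capital is Canberra.",
        ),
    ]
    msgs = build_judge_messages(labels, "Who wrote 'Dune'?", "Frank Herbert wrote 'Dune'.")
    for m in msgs:
        print(m["role"].upper(), m["content"], sep="\n", end="\n\n")
    # Send `msgs` to any chat-completions endpoint to get the aligned judgement;
    # the choice of model/provider is outside the scope of this sketch.
```

Each time a human reviewer overrides the judge's grade, that case would be appended to `labels`, so the judge's standards drift toward the reviewers' over successive rounds.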