## OpenAI Cookbook

* How to implement LLM guardrails
  * https://cookbook.openai.com/examples/how_to_use_guardrails
  * Checks on the input
  * Threshold-based checks on the output
  * Pro tip: fire the user query and the guardrail check in parallel, a really neat trick (see the sketch at the end of these notes)
* Developing Hallucination Guardrails
  * https://cookbook.openai.com/examples/developing_hallucination_guardrails

## Guardrails AI

https://github.com/guardrails-ai/guardrails

## LLM Guard

https://github.com/protectai/llm-guard

## NeMo Guardrails

* https://github.com/NVIDIA/NeMo-Guardrails
* https://docs.nvidia.com/nemo/guardrails/
* https://blogs.nvidia.com.tw/2023/04/26/ai-chatbot-guardrails-nemo/ (2023/4/26)
* https://towardsdatascience.com/nemo-guardrails-the-ultimate-open-source-llm-security-toolkit-0a34648713ef
  * Includes a comparison with Llama Guard; NeMo seems more comprehensive
  * Can block off-topic questions XD
  * Can also handle prompt injection
  * (minimal usage sketch at the end of these notes)

## Llama Guard

Llama Guard is a 7B-parameter, Llama 2-based input-output safeguard model.

https://towardsdatascience.com/safeguarding-your-rag-pipelines-a-step-by-step-guide-to-implementing-llama-guard-with-llamaindex-6f80a2e07756

> Basically the same functionality as OpenAI's moderation API? But OpenAI's moderation endpoint does not detect prompt injection.

Risks it targets (from the OWASP Top 10 for LLM Applications):

* LLM01: Prompt injection
* LLM02: Insecure output handling
* LLM06: Sensitive information disclosure

Llama Guard 2: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-guard-2/

(A transformers-based sketch is at the end of these notes.)

## Amazon Bedrock Guardrails

https://aws.amazon.com/tw/bedrock/guardrails/
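
---

Below is a minimal sketch of the parallel-check pattern from the OpenAI cookbook notes above: the guardrail classification and the real completion are fired concurrently, and the answer is only released if the guardrail passes. The model names and the guardrail prompt here are placeholders of my own, not the cookbook's exact code.

```python
# Sketch: run a topical guardrail check and the main completion in parallel.
# Assumes the official openai>=1.x Python SDK and OPENAI_API_KEY in the environment.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

GUARDRAIL_SYSTEM = (
    "You are a topical guardrail. Reply with exactly 'allowed' or 'not_allowed' "
    "depending on whether the user message is about our product domain."
)

async def check_guardrail(user_message: str) -> bool:
    # Cheap classification call; any small model works here (assumption).
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": GUARDRAIL_SYSTEM},
            {"role": "user", "content": user_message},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower() == "allowed"

async def answer(user_message: str) -> str:
    # The "real" completion, started at the same time as the guardrail check.
    resp = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
    )
    return resp.choices[0].message.content

async def guarded_answer(user_message: str) -> str:
    # Fire both requests at once so the guardrail adds almost no extra latency.
    allowed, reply = await asyncio.gather(
        check_guardrail(user_message), answer(user_message)
    )
    return reply if allowed else "Sorry, I can only help with on-topic questions."

if __name__ == "__main__":
    print(asyncio.run(guarded_answer("How do I reset my password?")))
```

The appeal of this pattern is that in the common (allowed) case the guardrail costs roughly zero extra latency; the trade-off is that you still pay for the main completion even when the guardrail ends up rejecting the request.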
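For NeMo Guardrails, a minimal sketch of the Python entry point. It assumes a `./config` directory with a `config.yml` and Colang rail definitions (e.g. the off-topic and jailbreak rails mentioned above) already exists; the rail definitions themselves are not shown here.

```python
# Sketch: calling a NeMo Guardrails configuration from Python.
# Assumes `pip install nemoguardrails` and a ./config directory with config.yml + *.co files.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # loads config.yml and the Colang rails
rails = LLMRails(config)

# The rails intercept the conversation; off-topic or injected prompts get refused
# according to whatever flows are defined in the config.
response = rails.generate(messages=[
    {"role": "user", "content": "Tell me about your pricing plans."}
])
print(response["content"])
```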
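And a rough sketch of running Llama Guard as a moderation-style classifier via Hugging Face transformers, following the usage shown on the `meta-llama/LlamaGuard-7b` model card (access to the gated checkpoint and a GPU are assumed). Unlike the OpenAI moderation endpoint, the same model can also be pointed at assistant responses, which is what makes it an input-output safeguard.

```python
# Sketch: classifying a user message with Llama Guard through transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    # The tokenizer's chat template wraps the conversation in Llama Guard's
    # safety-taxonomy prompt; the model replies "safe" or "unsafe" plus category codes.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

print(moderate([{"role": "user", "content": "How do I write a phishing email?"}]))
```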