{"id":11933,"date":"2024-04-11T13:53:39","date_gmt":"2024-04-11T05:53:39","guid":{"rendered":"https:\/\/ihower.tw\/blog\/?p=11933"},"modified":"2025-07-04T07:22:43","modified_gmt":"2025-07-03T23:22:43","slug":"llm-tokenizer","status":"publish","type":"post","link":"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer","title":{"rendered":"Benchmarking LLM Tokenizers with Traditional Chinese"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Want to learn systematically how to build LLM, RAG, and Agents applications? You are welcome to sign up for my course\u00a0<a href=\"https:\/\/aihao.tw\/llm\">Large Language Model (LLM) Application Development Workshop<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>Updated (2024\/5\/14): added GPT-4o (o200k_base). This time OpenAI <a href=\"https:\/\/openai.com\/index\/hello-gpt-4o\/\">switched to a new Tokenizer<\/a>, and the improvement is very large.<\/p>\n\n\n\n<p>Updated (2024\/4\/21): added Llama 3. This time Meta <a href=\"https:\/\/ai.meta.com\/blog\/meta-llama-3\/\">switched to a new Tokenizer<\/a>, and the improvement is very large.<\/p>\n\n\n\n<p>The compute and inference cost of a large language model (LLM) is measured in Tokens: the input has to be converted into a sequence of Tokens for computation, and the output is converted back.<\/p>\n\n\n\n<p>However, each vendor uses a somewhat different Tokenizer, so the same text breaks down into a different number of tokens. As a result, many comparisons between models, such as inference cost and Context window 
length limits, are not very accurate in practice, especially for non-English languages, where the differences between vendors are very large.<\/p>\n\n\n\n<p>For the same text, using fewer Tokens means faster inference and lower cost (compute resources); after all, pricing is also based on the number of tokens.<\/p>\n\n\n\n<p>To see exactly how large the difference is, here are my test results, based on about 80,000 characters of Traditional Chinese (a government report and management lecture notes). I also tested English text of about the same length, roughly 80,000 characters (two blog articles).<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Traditional Chinese<\/h2>\n\n\n\n<p>First, the Traditional Chinese results that everyone cares about, using OpenAI GPT-3.5 (cl100k_base) as the baseline:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"968\" height=\"514\" data-attachment-id=\"12112\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-5\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image.png\" data-orig-size=\"968,514\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-300x159.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image.png\" alt=\"\" class=\"wp-image-12112\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image.png 968w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-300x159.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-768x408.png 768w\" sizes=\"auto, (max-width: 968px) 100vw, 968px\" \/><\/a><\/figure>\n\n\n\n<p>I did not expect the gap to reach as much as 2x. Every tokenizer except Llama 2 uses fewer Tokens than GPT-3.5 &amp; GPT-4; the GPT-3.5 and GPT-4 Tokenizer is really unfriendly to Traditional Chinese! Only the latest GPT-4o improves on this substantially! 
<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The best performer is MediaTek Breeze, which saves about 50% of the tokens \ud83d\udc4d\ud83d\udc4d\ud83d\udc4d as expected of a model optimized for Traditional Chinese<\/li>\n\n\n\n<li>Cohere Command R+ saves about 47% \ud83d\udc4d\ud83d\udc4d\ud83d\udc4d<\/li>\n\n\n\n<li>Google Gemini saves about 46% \ud83d\udc4d\ud83d\udc4d\ud83d\udc4d<\/li>\n\n\n\n<li>Llama 3 saves about 37% compared with gpt-3.5 \ud83d\udc4d\ud83d\udc4d<\/li>\n\n\n\n<li>The new gpt-4o (o200k_base) saves about 36% compared with gpt-3.5 \ud83d\udc4d\ud83d\udc4d<\/li>\n\n\n\n<li>Claude 3 saves about 22% \ud83d\udc4d\ud83d\udc4d<\/li>\n\n\n\n<li>Mistral saves about 12% \ud83d\udc4d<\/li>\n\n\n\n<li>Llama 2 is worse than gpt-3.5, using about 20% more \ud83d\ude31<\/li>\n<\/ul>\n\n\n\n<p>To put it another way:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>1K MediaTek Breeze Tokens is roughly equal to 2K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K Command R+ Tokens is roughly equal to 1.9K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K Gemini Tokens is roughly equal to 1.8K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K Llama 3 Tokens is roughly equal to 1.6K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K GPT-4o Tokens is roughly equal to 1.6K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K Claude 3 Tokens is roughly equal to 1.3K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K Mistral Tokens is roughly equal to 1.1K GPT-3.5 Tokens<\/li>\n\n\n\n<li>1K Llama 2 Tokens is roughly equal to 837 GPT-3.5 Tokens (worse than OpenAI)<\/li>\n<\/ul>\n\n\n\n<p>In short, whenever you look at each model's Token cost and Context Window limit from now on, you need to do this conversion in your head for each one.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Updated: the chart below uses gpt-4o as the baseline<\/h3>\n\n\n\n<p>56% 
means gpt-4 uses 56% more tokens than gpt-4o, while Breeze at -22% uses 22% fewer tokens.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"582\" data-attachment-id=\"12154\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-6\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image.png\" data-orig-size=\"1850,1052\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-300x171.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-1024x582.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-1024x582.png\" alt=\"\" class=\"wp-image-12154\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-1024x582.png 1024w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-300x171.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-768x437.png 768w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-1536x873.png 1536w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image-1568x892.png 1568w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/07\/image.png 1850w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<h3 
class=\"wp-block-heading\">English<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"443\" data-attachment-id=\"12114\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-1-2\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1.png\" data-orig-size=\"1216,526\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-1\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1-300x130.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1-1024x443.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1-1024x443.png\" alt=\"\" class=\"wp-image-12114\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1-1024x443.png 1024w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1-300x130.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1-768x332.png 768w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-1.png 1216w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>The results differ a lot from Traditional Chinese: here OpenAI and Llama 3 tie for first place with the fewest tokens, and every other vendor uses somewhat more tokens than OpenAI. Llama 2 
comes in last.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Experiment method<\/h3>\n\n\n\n<p>Below are the texts and data used<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Chinese text 1: <a href=\"https:\/\/moda.gov.tw\/information-service\/govinfo\/business-report\/1177\">Ministry of Digital Affairs (moda): Business Overview Report to the Transportation Committee, 1st Session of the 11th Legislative Yuan (PDF)<\/a><\/li>\n\n\n\n<li>Chinese text 2: <a href=\"https:\/\/ba.nccu.edu.tw\/zh_tw\/download\/DownloadOthers?page_no=1&amp;\">\u53f8\u5f92\u9054\u8ce2's Strategic Management lecture notes, first five chapters<\/a><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"306\" data-attachment-id=\"12116\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-3-3\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3.png\" data-orig-size=\"975,306\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-3\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3-300x94.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3.png\" alt=\"\" 
class=\"wp-image-12116\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3.png 975w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3-300x94.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-3-768x241.png 768w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"292\" data-attachment-id=\"12368\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-7\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5.png\" data-orig-size=\"1096,312\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5-300x85.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5-1024x292.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5-1024x292.png\" alt=\"\" class=\"wp-image-12368\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5-1024x292.png 1024w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5-300x85.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5-768x219.png 768w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-5.png 1096w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><\/a><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>English text 1: <a href=\"https:\/\/hyperstellar.substack.com\/p\/let-me-finish-your-sentences\">We&#8217;re all Stochastic Parrots<\/a><\/li>\n\n\n\n<li>English text 2: <a href=\"https:\/\/towardsdatascience.com\/how-i-won-singapores-gpt-4-prompt-engineering-competition-34c195a93d41\">How I Won Singapore\u2019s GPT-4 Prompt Engineering Competition<\/a><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1012\" height=\"277\" data-attachment-id=\"12115\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-2-3\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2.png\" data-orig-size=\"1012,277\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-2\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2-300x82.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2.png\" alt=\"\" class=\"wp-image-12115\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2.png 1012w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2-300x82.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/05\/image-2-768x210.png 768w\" sizes=\"auto, (max-width: 
1012px) 100vw, 1012px\" \/><\/a><\/figure>\n\n\n\n<p>How the counts were obtained<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OpenAI counts were computed with <a href=\"https:\/\/platform.openai.com\/tokenizer\">platform.openai.com\/tokenizer<\/a><\/li>\n\n\n\n<li>Gemini, Mistral, Llama, and Cohere counts were computed with <a href=\"https:\/\/huggingface.co\/spaces\/Xenova\/the-tokenizer-playground\">huggingface.co\/spaces\/Xenova\/the-tokenizer-playground<\/a>; I verified that they match the Gemini console and the Cohere API.<\/li>\n\n\n\n<li>For Claude 3 I could not find a tool to count tokens myself, so I had to actually call the Claude API and read the tokens usage returned in the response<\/li>\n\n\n\n<li>Breeze 7B was installed and run on my local machine: <a href=\"https:\/\/huggingface.co\/MediaTek-Research\/Breeze-7B-Instruct-v1_0\">huggingface.co\/MediaTek-Research\/Breeze-7B-Instruct-v1_0<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Non-English evaluations by others<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>This post was inspired by <a href=\"https:\/\/www.linkedin.com\/posts\/peter-gostev_llm-providers-charge-you-per-token-but-their-activity-7177810257417523200-c4dp\/\">this one<\/a>, but it lumps all Non-English Text together, and it is unclear how the numbers were calculated<\/li>\n\n\n\n<li>Cohere Command R+ also published an official <a href=\"https:\/\/txt.cohere.com\/command-r-plus-microsoft-azure\/\">comparison<\/a>: for Chinese (Simplified Chinese, presumably?) 
the difference from OpenAI is 1.54x<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"517\" data-attachment-id=\"11936\" data-permalink=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\/image-2-2\" data-orig-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2.png\" data-orig-size=\"1574,794\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-2\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-300x151.png\" data-large-file=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-1024x517.png\" src=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-1024x517.png\" alt=\"\" class=\"wp-image-11936\" srcset=\"https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-1024x517.png 1024w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-300x151.png 300w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-768x387.png 768w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-1536x775.png 1536w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2-1568x791.png 1568w, https:\/\/ihower.tw\/blog\/wp-content\/uploads\/2024\/04\/image-2.png 1574w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Background reading on Tokenizers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Recommended: <a 
href=\"https:\/\/www.facebook.com\/permalink.php?story_fbid=3130697160397454&amp;id=100003716013282\">YC's explanation: why having a good Chinese vocabulary matters (for Chinese language models)<\/a><\/li>\n\n\n\n<li>Recommended: Andrej Karpathy's <a href=\"https:\/\/www.youtube.com\/watch?v=zduSFxRajkE\">tutorial video Let&#8217;s build the GPT Tokenizer<\/a>, which explains how Tokenizers and <a href=\"https:\/\/zh.wikipedia.org\/wiki\/%E5%AD%97%E8%8A%82%E5%AF%B9%E7%BC%96%E7%A0%81\">Byte Pair Encoding<\/a> work in great detail.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Want to learn systematically how to build LLM, RAG, and Agents applications? You are welcome to sign up for my course\u00a0Large Language Model (LLM) Application &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/ihower.tw\/blog\/11933-llm-tokenizer\" class=\"more-link\">Read more<span class=\"screen-reader-text\">\u3008Benchmarking LLM Tokenizers with Traditional Chinese\u3009<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[80],"tags":[],"class_list":["post-11933","post","type-post","status-publish","format-standard","hentry","category-llm","entry"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1q6tG-36t","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts\/11933","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/comments?post=11933"}],"version-history":[{"count":62,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts\/11933\/revisions"}],"predecessor-version":[{"id":12703,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/posts\/11933\/revisions\/12703"}],"wp:attachment":[{"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/media?parent=11933"}],"wp:term":[{"taxon
omy":"category","embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/categories?post=11933"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ihower.tw\/blog\/wp-json\/wp\/v2\/tags?post=11933"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}