Hands-On Test of InternLM3-8B, the Latest Open-Source Model from the InternLM (书生) Series

Hi everyone, I'm 刘聪NLP (Liu Cong).

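We're only half a month into 2025 and the open-source community is still buzzing. In the past two days alone, four teams have released new models: Qwen open-sourced the process reward model Qwen2.5-Math-PRM, ModelBest (面壁) open-sourced the omni-modal model MiniCPM-o 2.6, MiniMax open-sourced MiniMax-01, a 456B linear-attention model, and the InternLM (书生) team open-sourced InternLM3-8B.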

Things are looking up, everything is looking up!!!

Today I'm bringing you a first-hand test of InternLM3-8B to see how good it really is!

HF: https://huggingface.co/internlm/internlm3-8b-instruct

Key features of the model:

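Trained on only 4T tokens, yet, according to the official release, it outperforms similarly sized models such as Llama3.1-8B and Qwen2.5-7B.
It supports both a regular chat mode and a deep-thinking mode; you switch between them by using a different system prompt.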
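Since the two modes differ only in the system prompt, switching between them comes down to swapping the first message of the conversation. A minimal sketch, assuming the same message format as the transformers example later in this post; the two prompt strings are hypothetical placeholders, so copy the actual chat and deep-thinking prompts from the model card before use:

```python
from typing import Dict, List

# Hypothetical placeholders: replace with the actual system prompts
# published on the InternLM3 model card.
CHAT_SYSTEM_PROMPT = "<standard chat system prompt from the model card>"
THINKING_SYSTEM_PROMPT = "<deep-thinking system prompt from the model card>"


def build_messages(user_text: str, thinking: bool = False) -> List[Dict[str, str]]:
    """Build a chat message list; only the system prompt differs between modes."""
    system_prompt = THINKING_SYSTEM_PROMPT if thinking else CHAT_SYSTEM_PROMPT
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


if __name__ == "__main__":
    for thinking in (False, True):
        messages = build_messages("Which is larger, 9.9 or 9.11?", thinking=thinking)
        print("deep-thinking" if thinking else "chat", "->", messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(...)` in the same way as `messages` in the transformers example.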

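Now for the evaluation. I couldn't be bothered to deploy the model myself, so I used the official online demo. As before, the tests are my usual lineup: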

Reverse all the letters in the sentence "I love InternLM3"

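Note: This looks like a tokenizer issue. The reversal was wrong, and it still failed after I rephrased the question several different ways.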
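For reference, the expected answer is trivial to compute at the character level. The prompt is ambiguous, so this plain-Python sketch (not tied to any model) shows both readings: reversing the whole string, and reversing the letters within each word. The difficulty for the model is that it sees subword tokens rather than individual letters, which fits the tokenizer explanation above.

```python
def reverse_whole(text: str) -> str:
    """Reverse the entire string character by character."""
    return text[::-1]


def reverse_each_word(text: str) -> str:
    """Reverse the letters inside each word while keeping the word order."""
    return " ".join(word[::-1] for word in text.split(" "))


if __name__ == "__main__":
    sentence = "I love InternLM3"
    print(reverse_whole(sentence))      # 3MLnretnI evol I
    print(reverse_each_word(sentence))  # I evol 3MLnretnI
```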

Which is larger, 9.9 or 9.11?

Note: Correct.
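For reference, one common explanation for why models miss this question is that "9.11" gets read like a version number or a date, where 11 comes after 9. This small Python sketch contrasts the numeric reading with that version-style reading (the function names are illustrative):

```python
from decimal import Decimal


def larger_as_number(a: str, b: str) -> str:
    """Compare two decimal strings numerically: 9.9 equals 9.90, which beats 9.11."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Compare dot-separated parts as integers, like software versions: 11 > 9."""

    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))

    return a if key(a) > key(b) else b


if __name__ == "__main__":
    print(larger_as_number("9.9", "9.11"))   # 9.9  (the correct numeric answer)
    print(larger_as_version("9.9", "9.11"))  # 9.11 (the version-number misreading)
```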

Everyone in prison is a criminal, so why don't the police go to the prison to catch bad guys?

Note: Correct.

What is a "raw oyster" (生蚝) called once it has been cooked?

Note: It answered "cooked oyster" (熟蚝), but it also explained that this is just a 生蚝 that has been cooked. Barely acceptable!

If you dilute water with water, do you get concentrated water or diluted water?

Note: Correct.

Xiaohong has 2 brothers and 3 sisters. How many sisters do Xiaohong's brothers have?

Note: Correct (for this question I assumed Xiaohong is a girl).

Xiaohong (a girl) has 2 brothers and 3 sisters. How many sisters do Xiaohong's brothers have?

Note: Correct.
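The logic behind the expected answer, as a small Python sketch: from a brother's point of view, his sisters are all the girls in the family, so Xiaohong counts as one of them only if she is a girl. The function name and parameters are illustrative, and the number of brothers does not affect the result.

```python
def count_sisters_of_a_brother(n_brothers: int, n_sisters: int, xiaohong_is_female: bool = True) -> int:
    """Count how many sisters one of Xiaohong's brothers has."""
    # Xiaohong's siblings as seen from her point of view ("M" = male, "F" = female).
    family = ["M"] * n_brothers + ["F"] * n_sisters
    # Add Xiaohong herself, since she is a sibling of each brother.
    family.append("F" if xiaohong_is_female else "M")
    # A brother's sisters are all the girls in the family; he is male, so he is
    # never counted, and the number of brothers does not change the result.
    return family.count("F")


if __name__ == "__main__":
    print(count_sisters_of_a_brother(2, 3))                            # 4
    print(count_sisters_of_a_brother(2, 3, xiaohong_is_female=False))  # 3
```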

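One day in the future, while making a superconducting maglev material in the lab, a student named Li unexpectedly finds the lab mice flying in the air. Analysis shows that the mice accidentally ate the maglev material. The next day, Li finds the lab snakes flying too; analysis shows that the snakes ate the mice. On the third day, Li finds the lab eagles flying as well. What do you think is the reason?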

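Note: A pretty good answer, but it only refuted the maglev explanation. It would have been perfect if it had also pointed out that eagles can fly anyway.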

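One day, a girl scored only 38 points on a math exam. Terrified of her father's punishment, she secretly changed the score to 88. When her father saw the paper, he flew into a rage, slapped her hard, and roared: "Why is this 8 half green and half red? Do you think I'm an idiot?" The girl cried, feeling wronged, and said nothing. After a while, the father suddenly broke down. Why did the father break down after a while?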

Note: It didn't hit the point I was looking for; see this link for details.

Next, I tested its math ability in deep-thinking mode.

2024 Gaokao (National College Entrance Exam), National Paper A, Mathematics (Liberal Arts)

The result is correct.

2024 Gaokao, National Paper A, Mathematics (Science)

The result is correct, including the equation of curve C.

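I tested quite a few math problems, and almost all of them were answered correctly. The reasoning was complete and even showed the model correcting itself along the way. For an 8B model to reach this level is genuinely impressive.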

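Finally, if you want to deploy InternLM3-8B locally, you can run inference directly with transformers, and LMDeploy, Ollama and vLLM support it as well. A transformers example: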

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_dir = "internlm/internlm3-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Load the weights in bfloat16 (half the memory of float32) and move them to the GPU.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
model = model.eval()

# Default (non-thinking) chat system prompt.
system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Please tell me five scenic spots in Shanghai"},
]
# Render the conversation with the model's chat template and open an assistant turn.
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

# do_sample=True is passed explicitly so temperature/top_k/top_p are actually used;
# with do_sample=False these sampling parameters are ignored.
generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, do_sample=True, temperature=1, repetition_penalty=1.005, top_k=40, top_p=0.8)

# generate() returns prompt + completion; keep only the newly generated tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
]
prompt = tokenizer.batch_decode(tokenized_chat)[0]
print(prompt)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
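The `prompt` printed above is the conversation as rendered by the model's chat template. As a rough illustration of what that rendering does, here is a hypothetical re-implementation that assumes a ChatML-style layout with `<|im_start|>`/`<|im_end|>` markers, as in InternLM2's chat template. The real InternLM3 template may differ in details such as a leading BOS token, so compare against the printed `prompt` rather than relying on this sketch.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical ChatML-style rendering; NOT the official InternLM3 template."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You are an AI assistant whose name is InternLM."},
        {"role": "user", "content": "Please tell me five scenic spots in Shanghai"},
    ]
    print(render_chatml(demo))
```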

This system prompt is kind of fun: the English text is sprinkled with the Chinese names, haha!!
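Besides transformers, LMDeploy, Ollama and vLLM were mentioned as supported. All three can expose an OpenAI-compatible chat completions endpoint, so once a server is running, a client needs only the Python standard library. A minimal sketch, assuming such a server is already up; `BASE_URL` and `MODEL` are placeholders to adjust (vLLM listens on port 8000 by default, and the model name must match what your server actually serves):

```python
import json
import urllib.request

# Placeholders (assumptions): adjust to your own server. vLLM listens on port 8000
# by default; the model name must match the one your server serves.
BASE_URL = "http://localhost:8000/v1"
MODEL = "internlm/internlm3-8b-instruct"


def build_payload(user_text, system_prompt=None, max_tokens=1024):
    """Build an OpenAI-style chat completions request body."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    # Sampling values mirror the transformers example above.
    return {
        "model": MODEL,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.8,
    }


def chat(user_text, system_prompt=None, timeout=120):
    """POST the request to /chat/completions and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(user_text, system_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Usage, with a server running: `print(chat("Please tell me five scenic spots in Shanghai"))`. Keeping the payload builder separate from the HTTP call makes it easy to inspect or reuse the request body with another client.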

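PS: If you've read this far and found it useful, please give it a like, tap "Wow", and follow. Star the official account (⭐️) so you never miss a post! Your support is my biggest motivation to keep going!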

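You're welcome to follow the WeChat official account 「NLP工作站」 and join the discussion group. Let's make friends, learn together, and grow together!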