It's here! Kimi open-sources the Moonlight-16B-A3B MoE model!

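In short: Moonshot AI has open-sourced an MoE model with 15.29B total parameters and 2.24B activated parameters. Trained with the Muon optimizer on 5.7T tokens, it achieves strong results.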

GitHub: https://github.com/MoonshotAI/Moonlight

HF: https://huggingface.co/moonshotai/Moonlight-16B-A3B

Paper: https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf

The results are shown below:

[Figure: Moonlight-16B-A3B evaluation results]

Scaling-law experiments comparing Muon and Adam show that Muon is about 2x as sample-efficient as Adam.

[Figure: scaling-law comparison of Muon and Adam]

The Muon optimizer works as follows:

[Figure: the Muon optimizer update rule]
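To make the update concrete, here is a minimal, dependency-free sketch of one Muon step on a 2-D weight matrix. It is an illustration under stated assumptions, not Moonshot's implementation: the Newton-Schulz coefficients are taken from the publicly available Muon reference code, the `0.2 * sqrt(max(A, B))` update scale and the decoupled weight decay follow the Moonlight report as I understand it, and plain (non-Nesterov) momentum is used. The helper names (`matmul`, `transpose`, `newton_schulz`, `muon_step`) and the default values of `lr`, `mu`, and `weight_decay` are hypothetical choices made for this example. A real implementation would operate on torch tensors.

```python
import math

# Quintic Newton-Schulz coefficients from the public Muon reference code.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as nested lists."""
    return [list(col) for col in zip(*a)]


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g by pushing its singular values toward 1."""
    a, b, c = NS_COEFFS
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else g  # iterate on the wide orientation
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]  # Frobenius-normalize first
    for _ in range(steps):
        xxt = matmul(x, transpose(x))
        xxt2 = matmul(xxt, xxt)
        # poly = b * (X X^T) + c * (X X^T)^2
        poly = [[b * p + c * q for p, q in zip(r1, r2)] for r1, r2 in zip(xxt, xxt2)]
        px = matmul(poly, x)
        # X <- a * X + poly @ X
        x = [[a * v + w for v, w in zip(r1, r2)] for r1, r2 in zip(x, px)]
    return transpose(x) if tall else x


def muon_step(weight, grad, momentum, lr=1e-3, mu=0.95, weight_decay=0.1):
    """One Muon update for a 2-D weight; returns (new_weight, new_momentum).

    The hyperparameter defaults are illustrative assumptions only.
    """
    # Momentum accumulation: M <- mu * M + G
    momentum = [[mu * m + g for m, g in zip(mr, gr)] for mr, gr in zip(momentum, grad)]
    # Orthogonalize the momentum instead of using it directly.
    update = newton_schulz(momentum)
    # Shape-dependent scale so the update magnitude is comparable across matrices.
    scale = 0.2 * math.sqrt(max(len(weight), len(weight[0])))
    new_weight = [
        [w - lr * (scale * u + weight_decay * w) for w, u in zip(wr, ur)]
        for wr, ur in zip(weight, update)
    ]
    return new_weight, momentum
```

Note that five iterations do not drive the singular values exactly to 1. For a diagonal gradient such as `[[3, 0], [0, 1]]` they land in a loose band around 1 (roughly 0.7 to 1.2), which is the intended behavior of this iteration: it trades exactness for speed.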

In addition, Moonlight-16B-A3B uses the same model architecture as DeepSeek-V3.

Quick start with Hugging Face:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "moonshotai/Moonlight-16B-A3B-Instruct"

    # trust_remote_code=True allows transformers to run the model code shipped in the repo.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        device_map="auto",
        trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    messages = [
        {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."},
        {"role": "user", "content": "Is 123 a prime?"}
    ]

    # Render the chat template, append the assistant prompt, and move the ids to the model's device.
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    generated_ids = model.generate(inputs=input_ids, max_new_tokens=500)

    # The decoded text contains the prompt followed by the generated reply.
    response = tokenizer.batch_decode(generated_ids)[0]
    print(response)
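The quick-start code decodes the whole returned sequence, so the printed text includes the prompt and any special tokens. If only the reply is wanted, one option is to slice off the prompt before decoding. The helper below is a small sketch of that idea; `decode_new_tokens` is a hypothetical name, not part of transformers, and it assumes a decoder-only model whose `generate` output starts with the prompt ids.

```python
def decode_new_tokens(tokenizer, input_ids, generated_ids):
    """Decode only the tokens generated after the prompt (first sequence in the batch)."""
    # len(...) and slicing work for both a 2-D tensor and a list of lists.
    prompt_len = len(input_ids[0])
    new_tokens = generated_ids[0][prompt_len:]
    # skip_special_tokens drops markers such as end-of-sequence tokens.
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the variables from the quick start, this would be called as `print(decode_new_tokens(tokenizer, input_ids, generated_ids))`.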
