Large Models: The Deep Learning Journey and Future Trends | Community Essay Contest

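Preface

Since ChatGPT took off last year and set off the "battle of a thousand models" in China, the hype around large models has reached fever pitch. Yet the value and significance claimed for large models contrast sharply with how they perform in actual use. In day-to-day work in particular, the most I get out of a large model is bland filler that pads the word count, and it rarely reaches the core of the job. So over the National Day holiday I tried using domestic (Chinese) large models to help me write an article, to test whether they can improve personal productivity in a deeper way when used for "knowledge production", the task large models are supposed to be good at.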


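Training methods

The model-acceleration field already has a number of influential open-source tools. Internationally, the best known are Microsoft's DeepSpeed and NVIDIA's Megatron-LM; in China, the best known include OneFlow and ColossalAI. These tools reportedly can cut the cost of training a GPT-3-scale model by more than 90%.

Looking ahead, how to automatically pick the best combination of optimization strategies for a given set of hardware resources, out of the many strategies available, deserves further exploration. In addition, existing work usually designs optimization strategies for deep neural networks in general; how to optimize specifically for the characteristics of large Transformer models remains to be studied.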
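To give a sense of why such tools matter, here is a back-of-the-envelope sketch of per-GPU memory for model states when training a 175-billion-parameter (GPT-3-scale) model with mixed-precision Adam. It assumes the accounting used in the ZeRO work behind DeepSpeed: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states, with activations and temporary buffers ignored. The function name and the GPU count of 64 are illustrative choices, not figures from this article.

```python
def model_state_gb(num_params, num_gpus, zero_stage=0):
    """Approximate per-GPU memory (GB) for weights, gradients and Adam states."""
    weights = 2 * num_params   # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 weights copy, momentum and variance
    if zero_stage >= 1:        # stage 1 partitions the optimizer states
        optim /= num_gpus
    if zero_stage >= 2:        # stage 2 also partitions the gradients
        grads /= num_gpus
    if zero_stage >= 3:        # stage 3 also partitions the weights
        weights /= num_gpus
    return (weights + grads + optim) / 1e9


GPT3_PARAMS = 175e9  # GPT-3 scale
for stage in range(4):
    gb = model_state_gb(GPT3_PARAMS, num_gpus=64, zero_stage=stage)
    print(f"ZeRO stage {stage}: {gb:,.1f} GB per GPU")
```

Under these assumptions, unpartitioned model states come to about 2,800 GB per GPU, far more than any single device holds, while full partitioning across 64 GPUs brings that to roughly 44 GB.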


Project walkthrough

Below I share a named entity recognition (NER) application built on a pretrained model.

1. Install the required libraries:

pip install torch transformers

2. Import the required libraries:

import torch
from transformers import BertTokenizer, BertForTokenClassification

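This imports PyTorch and the Hugging Face Transformers library. Next, load the pretrained BERT model and tokenizer.

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
# Note: this base checkpoint has no fine-tuned NER head, so the
# token-classification layer starts out randomly initialized.
model = BertForTokenClassification.from_pretrained(model_name)

The variable model_name is set to "bert-base-uncased", a pretrained BERT checkpoint. BertTokenizer.from_pretrained() loads the matching tokenizer, and BertForTokenClassification.from_pretrained() loads the BERT model with a token-classification head on top. Because this checkpoint was not fine-tuned for NER, that head has random weights and generic label names (LABEL_0, LABEL_1). Meaningful B-/I- predictions therefore require fine-tuning the model first or substituting a checkpoint that has already been fine-tuned for NER.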

3. Run NER on the input text:

def ner_inference(text):
    # Encode the text as token IDs, adding the special [CLS] and [SEP] tokens
    input_ids = tokenizer.encode(text, add_special_tokens=True)
    input_tensors = torch.tensor([input_ids])

    # Run inference on the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    input_tensors = input_tensors.to(device)
    model.to(device)

    with torch.no_grad():
        outputs = model(input_tensors)
        # Highest-scoring class index per token; [0] selects the single sequence
        predictions = torch.argmax(outputs.logits, dim=2)[0].tolist()

    # Decode the predictions: class indices map to label names through the
    # model config, not through the tokenizer vocabulary
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    labels = [model.config.id2label[pred] for pred in predictions]

    # Extract entity labels and the corresponding text
    entities = []
    current_entity = None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_entity:
                entities.append(current_entity)
            current_entity = {"text": token.replace("##", ""), "label": label[2:]}
        elif label.startswith("I-"):
            if current_entity:
                # "##" marks a word-piece continuation; other tokens start a new word
                sep = "" if token.startswith("##") else " "
                current_entity["text"] += sep + token.replace("##", "")
        else:
            if current_entity:
                entities.append(current_entity)
                current_entity = None

    if current_entity:
        entities.append(current_entity)

    return entities

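We define a function, ner_inference, that performs named entity recognition (NER). It takes a piece of text as input and returns a list of all the entities found.

First, tokenizer.encode() encodes the input text into a sequence of token IDs and adds the special tokens ([CLS] and [SEP]). The encoded sequence is then converted to a PyTorch tensor and moved to the GPU for inference when one is available.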

    input_ids = tokenizer.encode(text, add_special_tokens=True)
    input_tensors = torch.tensor([input_ids])

    # Run inference on the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    input_tensors = input_tensors.to(device)
    model.to(device)
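The "##" prefix that the entity-extraction step strips comes from WordPiece tokenization: a word missing from the vocabulary is split into pieces, and every piece after the first is marked with "##". The sketch below is a toy greedy longest-match-first splitter over a tiny hypothetical vocabulary. The names wordpiece, toy_tokenize and TOY_VOCAB are made up for illustration, and the real BertTokenizer also handles punctuation, casing rules and the mapping to IDs.

```python
TOY_VOCAB = {"play", "##ing", "##ed", "token", "##ization"}  # hypothetical vocabulary


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # try a shorter candidate
        if piece is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, and wrap the pieces in [CLS] ... [SEP]."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens + ["[SEP]"]


print(toy_tokenize("Playing tokenization"))
# ['[CLS]', 'play', '##ing', 'token', '##ization', '[SEP]']
```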

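We run the BERT model on the input and use torch.argmax() to get the predicted class index for each token. tokenizer.convert_ids_to_tokens() turns the token ID sequence back into token strings, and model.config.id2label maps each predicted class index to its label string. The tokenizer cannot do this second mapping: tokenizer.decode() would look the class index up in the vocabulary and return an unrelated word piece.

    with torch.no_grad():
        outputs = model(input_tensors)
        # Highest-scoring class index per token; [0] selects the single sequence
        predictions = torch.argmax(outputs.logits, dim=2)[0].tolist()

    # Decode the predictions
    tokens = tokenizer.convert_ids_to_tokens(input_ids)
    labels = [model.config.id2label[pred] for pred in predictions]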
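Predicted class indices refer to the model's label set, not to the tokenizer vocabulary. The sketch below mimics the argmax-and-lookup step in plain Python, so it runs without downloading a model. The ID2LABEL mapping and the scores are made up for illustration; with a checkpoint fine-tuned for NER, the mapping would come from model.config.id2label.

```python
# Hypothetical label set; a real one comes from model.config.id2label
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}


def logits_to_labels(logits, id2label=ID2LABEL):
    """Pick the highest-scoring class for each token and map it to a label name."""
    labels = []
    for scores in logits:
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax over classes
        labels.append(id2label[best])
    return labels


# One row of made-up scores per token: [CLS], "john", "lives", "in", "paris", [SEP]
fake_logits = [
    [4.0, 0.1, 0.2, 0.1, 0.0],
    [0.3, 5.1, 0.4, 0.2, 0.1],
    [3.9, 0.2, 0.1, 0.3, 0.2],
    [4.2, 0.1, 0.1, 0.4, 0.1],
    [0.2, 0.3, 0.1, 4.8, 0.5],
    [4.1, 0.0, 0.1, 0.2, 0.1],
]
print(logits_to_labels(fake_logits))
# ['O', 'B-PER', 'O', 'O', 'B-LOC', 'O']
```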

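Finally, we iterate over the token sequence and the predicted label sequence in parallel, build entity objects holding the entity text and its label, and add them to a list. When the current token is not predicted as part of an entity, any open entity is closed and current_entity is reset to None. An entity that is still open at the end of the sequence is appended to the list as well.

    entities = []
    current_entity = None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_entity:
                entities.append(current_entity)
            current_entity = {"text": token.replace("##", ""), "label": label[2:]}
        elif label.startswith("I-"):
            if current_entity:
                # "##" marks a word-piece continuation; other tokens start a new word
                sep = "" if token.startswith("##") else " "
                current_entity["text"] += sep + token.replace("##", "")
        else:
            if current_entity:
                entities.append(current_entity)
                current_entity = None

    if current_entity:
        entities.append(current_entity)

    return entities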
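To check the grouping logic without loading a model, the loop can be factored into a standalone helper and fed hand-written tokens and tags. This is a minimal sketch: group_entities is a hypothetical name, the sample tokens and labels are written by hand rather than taken from model output, and the helper makes one design choice of its own, which is that an I- tag only extends an open entity of the same type.

```python
def group_entities(tokens, labels):
    """Merge BIO-tagged word pieces into a list of entity dicts."""
    entities, current = [], None
    for token, label in zip(tokens, labels):
        is_piece = token.startswith("##")
        text = token[2:] if is_piece else token
        if label.startswith("B-"):
            if current:
                entities.append(current)
            current = {"text": text, "label": label[2:]}
        elif label.startswith("I-") and current and current["label"] == label[2:]:
            # "##" pieces attach directly; new words are joined with a space
            current["text"] += text if is_piece else " " + text
        else:
            # "O", or an I- tag that does not continue the open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities


# Hand-written sample input, not real model output
tokens = ["[CLS]", "hu", "##gging", "face", "is", "in", "new", "york", "[SEP]"]
labels = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "B-LOC", "I-LOC", "O"]
print(group_entities(tokens, labels))
# [{'text': 'hugging face', 'label': 'ORG'}, {'text': 'new york', 'label': 'LOC'}]
```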

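Summary

The biggest shortcoming is still content quality: what large models generate tends to be hollow, with no claims backed by supporting evidence. The answers also lack factual grounding and the associations a human writer would make, which raises credibility problems; the models can even fabricate things outright. Getting better results also takes skill with chain-of-thought prompting and with combining the outputs of several large models.

That said, conversing with a large model did help me spot gaps and borrow ideas, and I think this is more valuable than having it write the article directly. In addition, iFlytek's document Q&A and ERNIE Bot's document-reading plugin (览卷文档) make it faster to get to know an industry or a body of knowledge.

Finally, under the same workflow and prompts, would ChatGPT or GPT-4 do better, and could the domestic large models hold up against them? Without access to those tools, I have to leave that question open.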