The BGE series has been updated yet again (multimodal BGE, reranker v2)


This release updates two model series: bge-reranker-v2 and Visualized-BGE. The reranker series scores the relevance between two input texts, while the Visualized-BGE series scores multimodal relevance across text and images.

Reranker series update

        
          
https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker  

      

The reranker series adds three new models:

[image: the three new models — bge-reranker-v2-m3, bge-reranker-v2-gemma, bge-reranker-v2-minicpm-layerwise]

The official recommendations are:

  • For multilingual tasks, use BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-gemma
  • For Chinese or English, use BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-minicpm-layerwise
  • For efficiency, use BAAI/bge-reranker-v2-m3 and the low layers of BAAI/bge-reranker-v2-minicpm-layerwise
  • For the best results, use BAAI/bge-reranker-v2-minicpm-layerwise and BAAI/bge-reranker-v2-gemma

Results on an internal test set, for reference only: retrieve the top 10, then rerank (the two 2B models were not evaluated).

| Model | R@1 | R@3 | R@5 | R@10 |
| --- | --- | --- | --- | --- |
| bce-reranker | 0.7735 | 0.9029 | 0.9219 | 0.9337 |
| bge-reranker-v1 | 0.7253 | 0.8627 | 0.9077 | 0.9337 |
| bge-reranker-v2-m3 | 0.7585 | 0.8998 | 0.9219 | 0.9337 |
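The pipeline behind these numbers — retrieve the top 10, rerank, then measure Recall@k — can be sketched in a few lines. The scorer below is a toy stand-in (a real run would call a reranker's `compute_score` on each query/passage pair); all helper names here are illustrative, not from the library.

```python
def rerank(query, candidates, score_fn):
    """Sort retrieved candidates by reranker score, highest first."""
    return sorted(candidates, key=lambda doc_id: score_fn(query, doc_id), reverse=True)

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries whose relevant doc appears in the top-k after reranking."""
    hits = sum(1 for ranked, rel in zip(ranked_ids, relevant_ids) if rel in ranked[:k])
    return hits / len(ranked_ids)

# Toy stand-in scores; in practice: reranker.compute_score([query, passage])
toy_scores = {("q1", "d1"): 0.9, ("q1", "d2"): 0.2, ("q2", "d3"): 0.1, ("q2", "d4"): 0.8}
score_fn = lambda q, d: toy_scores[(q, d)]

ranked = [rerank("q1", ["d2", "d1"], score_fn), rerank("q2", ["d3", "d4"], score_fn)]
print(recall_at_k(ranked, ["d1", "d3"], 1))  # d1 tops q1's list, d3 does not top q2's -> 0.5
```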

The BAAI/bge-reranker-v2-minicpm-layerwise model is the most distinctive: you can pick the intermediate output of any of layers 8–40 to produce the relevance score:


        
          
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True)  # use_fp16=True speeds up computation with a slight performance degradation
# reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_bf16=True)  # use_bf16=True also speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])  # 'cutoff_layers' selects which layers are used to compute the score
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28])
print(scores)
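One practical note: `compute_score` returns raw logits, so the values are unbounded and not directly comparable to probabilities. The library's `compute_score` also accepts a `normalize` option that maps them through a sigmoid into (0, 1) (check the repo for your installed version); the transform itself is just:

```python
import math

def normalize_score(logit: float) -> float:
    """Map a raw reranker logit to (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-logit))

print(normalize_score(0.0))   # 0.5
print(normalize_score(4.0))   # close to 1: strongly relevant
print(normalize_score(-4.0))  # close to 0: irrelevant
```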

      

Official evaluation results:

[images: official evaluation results]

Visualized-BGE series update

        
          
https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/visual  

      

Visualized-BGE is mainly intended for hybrid-modality retrieval tasks, including but not limited to:

  • Multimodal knowledge retrieval (query: text; candidates: image-text pairs, text, or images)
  • Composed image retrieval (query: image-text pair; candidates: images)
  • Knowledge retrieval with a multimodal query (query: image-text pair; candidates: text)

In plain terms: both the query and the candidate pool can be text, an image, or a combination of the two.

There are two models in total: one English, one multilingual.

[image: the two Visualized-BGE models]

Usage is simple as well: just treat it as an encoder.


        
          
import torch
from FlagEmbedding.visual.modeling import Visualized_BGE

# model_weight is the path to the downloaded Visualized_base_en_v1.5.pth checkpoint
model = Visualized_BGE(model_name_bge="BAAI/bge-base-en-v1.5", model_weight="Visualized_base_en_v1.5.pth")

with torch.no_grad():
    query_emb = model.encode(text="Are there sidewalks on both sides of the Mid-Hudson Bridge?")
    candi_emb_1 = model.encode(text="The Mid-Hudson Bridge, spanning the Hudson River between Poughkeepsie and Highland.", image="./imgs/wiki_candi_1.jpg")
    candi_emb_2 = model.encode(text="Golden_Gate_Bridge", image="./imgs/wiki_candi_2.jpg")
    candi_emb_3 = model.encode(text="The Mid-Hudson Bridge was designated as a New York State Historic Civil Engineering Landmark by the American Society of Civil Engineers in 1983. The bridge was renamed the \"Franklin Delano Roosevelt Mid-Hudson Bridge\" in 1994.")

sim_1 = query_emb @ candi_emb_1.T
sim_2 = query_emb @ candi_emb_2.T
sim_3 = query_emb @ candi_emb_3.T
print(sim_1, sim_2, sim_3)  # tensor([[0.6932]]) tensor([[0.4441]]) tensor([[0.6415]])
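The `@` products above yield cosine similarities assuming the encoder's embeddings are L2-normalized, in which case a dot product equals cosine similarity. A minimal pure-Python sketch of ranking candidates this way (the 2-d vectors here are toy values, not real embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

query = [0.6, 0.8]
candidates = {"candi_1": [0.7, 0.7], "candi_2": [1.0, 0.0], "candi_3": [0.5, 0.9]}

# Best-first ranking of candidate names by similarity to the query
ranking = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
print(ranking)  # candi_3 and candi_1 are near-ties, well ahead of candi_2
```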

      

Evaluation results:

  • zero-shot: [image: zero-shot evaluation results]
  • fine-tuned: [image: fine-tuned evaluation results]
