Journal Figures: A Clean Bar-Chart Visualization of Model Evaluation Metrics and Percentage Improvements


✨ Welcome to Python机器学习AI (Python Machine Learning AI) ✨

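This section: a clean bar-chart presentation of model evaluation metrics (R²) annotated with percentage improvements. The data are simulated and carry no real-world meaning; the code and figures reflect the author's own understanding of machine learning and are for reference only. The complete data and code will be uploaded to the discussion group later, where members can download them. Readers who need them can follow the access instructions at the end of the post.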

✨ Paper information ✨


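The paper uses bar charts to compare machine learning models (RF, GBM, SVM, CNN, GCN) on predicting four different outputs (PI, Er, Igeo, CER). Performance is measured by the coefficient of determination R²; the closer it is to 1, the better the model fits the data. Each panel shows the R² of every model together with the percentage improvement of the GCN model over the conventional models (RF, GBM, SVM, CNN).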
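The percentages in panel (a) can be checked with simple arithmetic. The sketch below assumes each improvement is computed relative to the lower value, i.e. (new − base) / base × 100. The paper text does not say what the dashed reference value 0.592 is, but it happens to equal the mean of the RF, GBM and SVM scores; treating it that way is an assumption. `pct_gain` is a hypothetical helper name.

```python
# R² values read from panel (a) of the paper's figure
r2 = {"RF": 0.604, "GBM": 0.614, "SVM": 0.558, "CNN": 0.632, "GCN": 0.730}


def pct_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent (hypothetical helper)."""
    return (new - base) / base * 100


# Assumption: the dashed reference line is the mean of RF, GBM and SVM
y_ref = (r2["RF"] + r2["GBM"] + r2["SVM"]) / 3

print(f"reference line: {y_ref:.3f}")                        # reference line: 0.592
print(f"CNN vs ref: {pct_gain(r2['CNN'], y_ref):.1f}%")      # CNN vs ref: 6.8%
print(f"GCN vs CNN: {pct_gain(r2['GCN'], r2['CNN']):.1f}%")  # GCN vs CNN: 15.5%
print(f"GCN vs ref: {pct_gain(r2['GCN'], y_ref):.1f}%")      # GCN vs ref: 23.3%
```

All three published percentages are recovered to one decimal place under this reading.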

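Using the same percentage-improvement annotation as in these bar charts, several models can likewise be compared on simulated data for regression and classification tasks. Annotating the best model's percentage improvement over the other models shows each model's absolute performance on a task and, at the same time, highlights the best model's advantage in predictive accuracy and generalization. This gives a clear, easy-to-read way to present comparisons across tasks and models in later studies.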
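The annotation idea reduces to one calculation: find the best model and express its lead over every other model as a relative gain. A minimal sketch, with `gains_of_best` as a hypothetical helper name. It works for any higher-is-better metric (R² for regression, accuracy or F1 for classification), but a relative gain is only meaningful when the baseline score is positive.

```python
def gains_of_best(scores: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Return the best model and its percentage gain over each other model (hypothetical helper)."""
    best = max(scores, key=scores.get)  # name with the highest score
    best_score = scores[best]
    gains = {
        name: (best_score - s) / s * 100  # relative gain over model `name`
        for name, s in scores.items()
        if name != best
    }
    return best, gains


best, gains = gains_of_best({"A": 0.5, "B": 0.25, "C": 1.0})
print(best)   # C
print(gains)  # {'A': 100.0, 'B': 300.0}
```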

✨ Code implementation ✨

  
import matplotlib.pyplot as plt
import numpy as np


# X-axis categories (models)
categories = ['RF', 'GBM', 'SVM', 'CNN', 'GCN']
# R² value of each model
values = [0.604, 0.614, 0.558, 0.632, 0.730]
# Bar colors chosen to match the original figure
colors = ['#4A7FB9', '#C57683', '#98B155', '#7AB699', '#4A7FB9']

# Create a figure and a single set of axes
fig, ax = plt.subplots(figsize=(8, 6.5))

# Draw the bar chart with a custom bar width
bar_width = 0.5  # bar width
bars = ax.bar(categories, values, color=colors, edgecolor='black', zorder=2, width=bar_width, align='center')

# Y-axis range and ticks
ax.set_ylim(0.50, 0.77)
ax.set_yticks(np.arange(0.50, 0.75, 0.05))

# Panel label placed under the x-axis, as in the paper
ax.set_xlabel('(a) PI', fontsize=14, labelpad=10)

# Tick label font size
ax.tick_params(axis='both', which='major', labelsize=14)

# Hide the top and right spines for a cleaner look
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Write the exact value above each bar
for bar in bars:
    height = bar.get_height()
    # Offset the label by a small amount (0.003) directly in data coordinates instead of using xytext
    ax.text(bar.get_x() + bar.get_width() / 2, height + 0.003, f'{height:.3f}',
            ha='center', va='bottom', fontsize=12, zorder=3)

# Key y values used by the annotations
y_ref = 0.592  # dashed reference line (equals the mean of RF, GBM and SVM)
y_cnn = 0.632
y_gcn = 0.730

# Horizontal dashed lines
# Line 1 at y = 0.592
ax.plot([-0.5, 4.45], [y_ref, y_ref], color='black', linestyle='--', linewidth=1, zorder=1)
ax.text(4.45, y_ref, f' {y_ref}', va='center', ha='left', fontsize=12)

# Line 2 at y = 0.632 (top of the CNN bar)
ax.plot([2.7, 3.7], [y_cnn, y_cnn], color='black', linestyle='--', linewidth=1, zorder=1)

# Double-headed arrows with percentage labels
# Arrow 1: 6.8% (CNN vs. reference line)
x_pos1 = 3.35  # right of the CNN bar
ax.annotate('', xy=(x_pos1, y_ref), xytext=(x_pos1, y_cnn),
            arrowprops=dict(arrowstyle='<->', color='black', shrinkA=0, shrinkB=0))
ax.text(x_pos1 + 0.05, (y_ref + y_cnn) / 2, '6.8%',
        rotation=90, va='center', ha='left', fontsize=12)

# Arrow 2: 15.5% (GCN vs. CNN, label in red)
x_pos2 = 3.65  # left of the GCN bar
ax.annotate('', xy=(x_pos2, y_cnn), xytext=(x_pos2, y_gcn),
            arrowprops=dict(arrowstyle='<->', color='black', shrinkA=0, shrinkB=0))
ax.text(x_pos2 - 0.05, (y_cnn + y_gcn) / 2, '15.5%',
        rotation=90, va='center', ha='right', fontsize=12, color='red')

# Arrow 3: 23.3% (GCN vs. reference line)
x_pos3 = 4.45  # right of the GCN bar
ax.annotate('', xy=(x_pos3, y_ref), xytext=(x_pos3, y_gcn),
            arrowprops=dict(arrowstyle='<->', color='black', shrinkA=0, shrinkB=0))
ax.text(x_pos3 + 0.05, (y_ref + y_gcn) / 2, '23.3%',
        rotation=90, va='center', ha='left', fontsize=12)

# Tighten the layout so that no element is clipped
plt.tight_layout()
plt.savefig("org.pdf", format='pdf', bbox_inches='tight', dpi=1200)
plt.show()


The code fully reproduces panel (a) of the paper's figure, showing each model's R² and the percentage improvements between them.
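The three arrow blocks in the reproduction code repeat the same `annotate` + `text` pattern. One way to factor it out is a small helper; `draw_gap` and its parameters are hypothetical names introduced here, and the 0.05 label offset copies the one used in the reproduction code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def draw_gap(ax, x, y_low, y_high, label, ha="left", color="black", offset=0.05, fontsize=12):
    """Hypothetical helper: a vertical <-> arrow from y_low to y_high at x, plus a rotated label.

    ha="left" puts the label to the right of the arrow and ha="right" puts it to the left,
    mirroring the x_pos + 0.05 / x_pos - 0.05 pattern of the reproduction code.
    """
    ax.annotate("", xy=(x, y_low), xytext=(x, y_high),
                arrowprops=dict(arrowstyle="<->", color="black", shrinkA=0, shrinkB=0))
    dx = offset if ha == "left" else -offset
    ax.text(x + dx, (y_low + y_high) / 2, label, rotation=90,
            va="center", ha=ha, fontsize=fontsize, color=color)


# Usage: the three arrows of the reproduced panel
fig, ax = plt.subplots(figsize=(8, 6.5))
ax.bar(["RF", "GBM", "SVM", "CNN", "GCN"], [0.604, 0.614, 0.558, 0.632, 0.730], width=0.5)
ax.set_ylim(0.50, 0.77)
draw_gap(ax, 3.35, 0.592, 0.632, "6.8%")
draw_gap(ax, 3.65, 0.632, 0.730, "15.5%", ha="right", color="red")
draw_gap(ax, 4.45, 0.592, 0.730, "23.3%")
```

Each arrow then becomes a single call, which makes it harder for an arrow and its label to drift apart when positions are edited.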

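Building on this reproduction of panel (a), several common regression models are then trained on a simulated dataset and compared by their test-set R². In the same way as the paper's figure, the bar chart is annotated with the percentage improvements between the baseline (the mean R²) and the best models, which makes the differences in predictive performance on the simulated data easier to see.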
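The file 回归.xlsx is not included with the article (the data are distributed through the discussion group). To run the pipeline without it, a synthetic stand-in with a target column named SR can be generated as below. The sample size, the feature names X1 to X8 and the noise level are arbitrary assumptions, so the R² values will not match the ones reported later.

```python
import pandas as pd
from sklearn.datasets import make_regression

# Hypothetical stand-in for 回归.xlsx: 8 numeric features plus a target column named "SR"
X_arr, y_arr = make_regression(n_samples=500, n_features=8, n_informative=5,
                               noise=15.0, random_state=42)
df = pd.DataFrame(X_arr, columns=[f"X{i}" for i in range(1, 9)])
df["SR"] = y_arr
print(df.shape)  # (500, 9)
```

These lines can replace the `pd.read_excel(path)` call; the rest of the pipeline only relies on `df` having an `SR` column.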

  
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor  # DT
from sklearn.ensemble import RandomForestRegressor  # RF
from sklearn.ensemble import GradientBoostingRegressor  # GBM
from sklearn.svm import SVR  # SVM
from xgboost import XGBRegressor  # XGB

# Suppress all warnings
warnings.filterwarnings("ignore")

# Load the (simulated) regression dataset
path = r"回归.xlsx"
df = pd.read_excel(path)

# Separate the features from the target variable SR
X = df.drop(['SR'], axis=1)
y = df['SR']

# Split into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the scaler
scaler = StandardScaler()
# Compute the standardization parameters (mean and standard deviation) on the training set only
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same transformation to the test set
X_test_scaled = scaler.transform(X_test)

# Random seed for the models
random_seed = 1314

# Train every model on the standardized training data
# DT
dt_model = DecisionTreeRegressor(random_state=random_seed)
dt_model.fit(X_train_scaled, y_train)

# RF
rf_model = RandomForestRegressor(random_state=random_seed)
rf_model.fit(X_train_scaled, y_train)

# XGB
xgb_model = XGBRegressor(random_state=random_seed)
xgb_model.fit(X_train_scaled, y_train)

# GBM
gbm_model = GradientBoostingRegressor(random_state=random_seed)
gbm_model.fit(X_train_scaled, y_train)

# SVM
svm_model = SVR(kernel='rbf')
svm_model.fit(X_train_scaled, y_train)

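The features are standardized before the models are built mainly because SVM (here SVR with an RBF kernel) is sensitive to feature scale: rerunning without standardization shows a large change in SVM performance. The tree-based models (DT, RF, GBM, XGB) split on per-feature thresholds, so a per-feature linear rescaling leaves them essentially unaffected.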
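A small experiment makes the scaling argument concrete. The synthetic data are an assumption made purely for illustration: one irrelevant feature is given a scale 1000 times larger than the others. With SVR's default `gamma="scale"`, the distances inside the RBF kernel of an unscaled SVR are then dominated by that feature, while a decision tree makes the same splits with or without standardization (up to floating-point edge cases).

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
X[:, 2] *= 1000  # assumption: an irrelevant feature on a much larger scale

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_tr)  # statistics from the training split only
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# SVR with an RBF kernel, with and without standardization
svr_raw = r2_score(y_te, SVR().fit(X_tr, y_tr).predict(X_te))
svr_scaled = r2_score(y_te, SVR().fit(X_tr_s, y_tr).predict(X_te_s))

# Same tree settings on raw and standardized features
params = dict(max_depth=5, random_state=0)
tree_raw = DecisionTreeRegressor(**params).fit(X_tr, y_tr).predict(X_te)
tree_scaled = DecisionTreeRegressor(**params).fit(X_tr_s, y_tr).predict(X_te_s)

print(f"SVR R²: raw={svr_raw:.3f}  scaled={svr_scaled:.3f}")
print("tree predictions identical:", np.allclose(tree_raw, tree_scaled))
```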

  
from sklearn.metrics import r2_score

# Predictions of each model on the standardized test set
dt_y_pred = dt_model.predict(X_test_scaled)
rf_y_pred = rf_model.predict(X_test_scaled)
xgb_y_pred = xgb_model.predict(X_test_scaled)
gbm_y_pred = gbm_model.predict(X_test_scaled)
svm_y_pred = svm_model.predict(X_test_scaled)

# Coefficient of determination (R²) of each model on the test set
dt_r2 = r2_score(y_test, dt_y_pred)
rf_r2 = r2_score(y_test, rf_y_pred)
xgb_r2 = r2_score(y_test, xgb_y_pred)
gbm_r2 = r2_score(y_test, gbm_y_pred)
svm_r2 = r2_score(y_test, svm_y_pred)


# Collect the R² values in a list, in the same order as the model names
r2_list = [dt_r2, rf_r2, xgb_r2, gbm_r2, svm_r2]
models = ['DT', 'RF', 'XGB', 'GBM', 'SVM']

# Mean R² across all models
average_r2 = sum(r2_list) / len(r2_list)

# Percentage difference of each model relative to the mean R²
increase_percentage = [(model, (r2 - average_r2) / average_r2 * 100) for model, r2 in zip(models, r2_list)]

# Sort in descending order to find the best and second-best models
sorted_models_r2 = sorted(zip(models, r2_list), key=lambda x: x[1], reverse=True)
highest_model, highest_r2 = sorted_models_r2[0]
second_highest_model, second_highest_r2 = sorted_models_r2[1]

# Improvement of the best and of the second-best model over the mean R²
highest_model_increase = (highest_r2 - average_r2) / average_r2 * 100
second_highest_model_increase = (second_highest_r2 - average_r2) / average_r2 * 100

# Improvement of the best model over the second-best model
highest_vs_second_highest_increase = (highest_r2 - second_highest_r2) / second_highest_r2 * 100

# Print the results
print(f"Mean R²: {average_r2:.4f}")
print(f"{highest_model} improves on the mean R² by {highest_model_increase:.2f}%")
print(f"{second_highest_model} improves on the mean R² by {second_highest_model_increase:.2f}%")
print(f"{highest_model} improves on {second_highest_model} by {highest_vs_second_highest_increase:.2f}%")

# Per-model percentages (shown as the cell output in Jupyter)
increase_percentage

This computes the test-set R² of the five regression models (DT, RF, XGB, GBM, SVM), expresses each model's R² as a percentage above or below the mean R², and reports the best and second-best models together with their gains over the mean and over each other.

  
Mean R²: 0.8379
RF improves on the mean R² by 7.87%
GBM improves on the mean R² by 6.62%
RF improves on GBM by 1.17%
[('DT', 2.2245796089721157), ('RF', 7.865454310431476), ('XGB', 5.906449983351724), ('GBM', 6.616021593410731), ('SVM', -22.612505496166037)]
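Written out, the quantities computed in this step are as follows, with M models, R²_m the test R² of model m, and R²_(1), R²_(2) the best and second-best scores (this notation is introduced here and does not appear in the original code):

```latex
\overline{R^2} = \frac{1}{M}\sum_{m=1}^{M} R^2_m, \qquad
\Delta_m = \frac{R^2_m - \overline{R^2}}{\overline{R^2}} \times 100\%, \qquad
\Delta_{(1),(2)} = \frac{R^2_{(1)} - R^2_{(2)}}{R^2_{(2)}} \times 100\%
```

A negative Δ_m, such as the SVM entry in the output, only means that the model scores below the mean; it is not an improvement.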

  
# Pair each model with its R² and sort in ascending order of R²
sorted_models_r2 = sorted(zip(models, r2_list), key=lambda x: x[1])

# Print the sorted results
print("Models sorted by test R²:")
for model, r2 in sorted_models_r2:
    print(f"{model}: R² = {r2:.4f}")

# Sorted R² values (rounded to 3 decimals) and the matching model names
sorted_r2_list = [round(r2, 3) for _, r2 in sorted_models_r2]
sorted_models_list = [model for model, _ in sorted_models_r2]
sorted_r2_list

This pairs each model name with its R², sorts the pairs from the lowest to the highest R², prints the ranking, and keeps the sorted model names and R² values for plotting.

  
Models sorted by test R²:
SVM: R² = 0.6484
DT: R² = 0.8565
XGB: R² = 0.8874
GBM: R² = 0.8933
RF: R² = 0.9038
[0.648, 0.857, 0.887, 0.893, 0.904]
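The same ascending ranking can also be obtained with a pandas Series, which keeps names and values together. This is an alternative sketch, not the article's method; the R² values below are the rounded numbers printed above and are used only for illustration. `kind="mergesort"` is chosen because it is a stable sort, like Python's `sorted`.

```python
import pandas as pd

models = ["DT", "RF", "XGB", "GBM", "SVM"]
r2_list = [0.8565, 0.9038, 0.8874, 0.8933, 0.6484]  # rounded test R² values printed above

# Ascending order, ties keep their original order (stable sort)
r2_series = pd.Series(r2_list, index=models).sort_values(kind="mergesort")
sorted_models_list = r2_series.index.tolist()
sorted_r2_list = r2_series.round(3).tolist()
print(sorted_models_list)  # ['SVM', 'DT', 'XGB', 'GBM', 'RF']
print(sorted_r2_list)
```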

  
# X-axis categories: models sorted by R²
categories = sorted_models_list
# R² value of each model
values = sorted_r2_list
# Bar colors chosen to match the original figure
colors = ['#4A7FB9', '#C57683', '#98B155', '#7AB699', '#4A7FB9']

# --- Figure setup ---
# Create a figure and a single set of axes
fig, ax = plt.subplots(figsize=(8, 6.5))

# Draw the bar chart with a custom bar width
bar_width = 0.5  # bar width
bars = ax.bar(categories, values, color=colors, edgecolor='black', zorder=2, width=bar_width, align='center')

# Y-axis range and ticks
ax.set_ylim(0.60, 0.99)
ax.set_yticks(np.arange(0.60, 0.99, 0.05))

# Panel label under the x-axis
ax.set_xlabel('Test R²', fontsize=14, labelpad=10)

# Tick label font size
ax.tick_params(axis='both', which='major', labelsize=14)

# Hide the top and right spines for a cleaner look
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# --- Data labels ---
# Write the exact value above each bar
for bar in bars:
    height = bar.get_height()
    # Offset the label by a small amount (0.003) directly in data coordinates instead of using xytext
    ax.text(bar.get_x() + bar.get_width() / 2, height + 0.003, f'{height:.3f}',
            ha='center', va='bottom', fontsize=12, zorder=3)

# Key y values, taken from the results above instead of being typed in by hand
y_mean = round(average_r2, 3)  # mean R² of all models
y_no2 = sorted_r2_list[-2]     # second-best model
no1 = sorted_r2_list[-1]       # best model

# Horizontal dashed lines
# Line 1 at the mean R²
ax.plot([-0.5, 4.45], [y_mean, y_mean], color='black', linestyle='--', linewidth=1, zorder=1)
ax.text(4.45, y_mean, f' {y_mean}', va='center', ha='left', fontsize=12)

# Line 2 at the top of the second-best bar
ax.plot([2.7, 3.7], [y_no2, y_no2], color='black', linestyle='--', linewidth=1, zorder=1)

# Double-headed arrows with percentage labels
# (x positions assume five bars in ascending order, so the two best models sit at x = 3 and x = 4)
# Arrow 1: second-best model vs. mean
x_pos1 = 3.35  # right of the second-best bar
ax.annotate('', xy=(x_pos1, y_mean), xytext=(x_pos1, y_no2),
            arrowprops=dict(arrowstyle='<->', color='black', shrinkA=0, shrinkB=0))
ax.text(x_pos1 + 0.05, (y_mean + y_no2) / 2, f'{second_highest_model_increase:.2f}%',
        rotation=90, va='center', ha='left', fontsize=12)

# Arrow 2: best vs. second-best model (label in red)
x_pos2 = 3.65  # left of the best bar
ax.annotate('', xy=(x_pos2, y_no2), xytext=(x_pos2, no1),
            arrowprops=dict(arrowstyle='<->', color='black', shrinkA=0, shrinkB=0))
ax.text(x_pos2 - 0.05, (y_no2 + no1) / 2, f'{highest_vs_second_highest_increase:.2f}%',
        rotation=90, va='center', ha='right', fontsize=12, color='red')

# Arrow 3: best model vs. mean
x_pos3 = 4.45  # right of the best bar
ax.annotate('', xy=(x_pos3, y_mean), xytext=(x_pos3, no1),
            arrowprops=dict(arrowstyle='<->', color='black', shrinkA=0, shrinkB=0))
ax.text(x_pos3 + 0.05, (y_mean + no1) / 2, f'{highest_model_increase:.2f}%',
        rotation=90, va='center', ha='left', fontsize=12)

# Tighten the layout so that no element is clipped
plt.tight_layout()
# Save and display the final figure
plt.savefig("TEST.pdf", format='pdf', bbox_inches='tight', dpi=1200)
plt.show()
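The plotting code above hard-codes the x positions (2.7, 3.35, 3.65, 4.45), which only fit five bars. Below is a minimal sketch of a reusable function, `plot_r2_with_gains` (a hypothetical name), that derives those positions from the bar indices and computes the three percentages from the scores passed in. It keeps the offsets of the original code but drops the custom color list; the reference line is assumed to be the mean R², as in the article's simulated example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np


def plot_r2_with_gains(scores, ylim=None, fname=None):
    """Sketch: ascending bar chart of test R² with a mean line and three gap arrows.

    `scores` maps model name -> R². Returns (fig, ax, gains) where gains holds the
    three annotated percentages.
    """
    if len(scores) < 2:
        raise ValueError("need at least two models")
    items = sorted(scores.items(), key=lambda kv: kv[1])  # ascending, best model last
    names = [name for name, _ in items]
    vals = np.array([v for _, v in items], dtype=float)
    mean, no2, no1 = vals.mean(), vals[-2], vals[-1]
    i2, i1 = len(vals) - 2, len(vals) - 1  # bar indices of the second-best and best models
    gains = {
        "no2_vs_mean": (no2 - mean) / mean * 100,
        "no1_vs_no2": (no1 - no2) / no2 * 100,
        "no1_vs_mean": (no1 - mean) / mean * 100,
    }

    fig, ax = plt.subplots(figsize=(8, 6.5))
    bars = ax.bar(names, vals, width=0.5, edgecolor="black", zorder=2)
    for bar in bars:  # value label above each bar
        h = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2, h + 0.003, f"{h:.3f}",
                ha="center", va="bottom", fontsize=12, zorder=3)

    # Dashed lines at the mean and at the top of the second-best bar
    ax.plot([-0.5, i1 + 0.45], [mean, mean], color="black", linestyle="--", linewidth=1, zorder=1)
    ax.text(i1 + 0.45, mean, f" {mean:.3f}", va="center", ha="left", fontsize=12)
    ax.plot([i2 - 0.3, i2 + 0.7], [no2, no2], color="black", linestyle="--", linewidth=1, zorder=1)

    # (x, bottom, top, percentage, label alignment, label color), same offsets as the original code
    arrows = [
        (i2 + 0.35, mean, no2, gains["no2_vs_mean"], "left", "black"),
        (i1 - 0.35, no2, no1, gains["no1_vs_no2"], "right", "red"),
        (i1 + 0.45, mean, no1, gains["no1_vs_mean"], "left", "black"),
    ]
    for x, lo, hi, pct, ha, color in arrows:
        ax.annotate("", xy=(x, lo), xytext=(x, hi),
                    arrowprops=dict(arrowstyle="<->", color="black", shrinkA=0, shrinkB=0))
        dx = 0.05 if ha == "left" else -0.05
        ax.text(x + dx, (lo + hi) / 2, f"{pct:.2f}%", rotation=90,
                va="center", ha=ha, fontsize=12, color=color)

    if ylim is not None:
        ax.set_ylim(*ylim)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    if fname is not None:
        fig.savefig(fname, bbox_inches="tight")
    return fig, ax, gains
```

With the variables from the earlier steps, `plot_r2_with_gains(dict(zip(models, r2_list)), ylim=(0.60, 0.99), fname="TEST.pdf")` should give essentially the same panel. Because the bars and mean line here use unrounded R² values, the last digit of a label can differ slightly from the version above.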