Technical Design: Feeding Front-End Performance Metrics into Prometheus

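Goal: collect, report, and store Web Vitals data with a minimal metric set, and present it in Prometheus + Grafana as a "what you see is what you get" dashboard, without sprawling into redundant metrics.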


1. Overall Architecture

text
Browser
  └─ web-vitals SDK
       └─ navigator.sendBeacon / fetch
            └─ [POST /metrics/vitals]
                 └─ Node.js / Go reporting service
                      ├─ aggregation → prom-client (Histogram / Counter / Gauge)
                      └─ /metrics (Prometheus scrape endpoint)
                           └─ Prometheus → Grafana Dashboard

2. Front-End Collection (web-vitals SDK)

2.1 Collect Only the Core Web Vitals

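Metric      Meaning                                          Suggested type
LCP         Largest Contentful Paint                         Histogram (quantiles)
FID / INP   First Input Delay / Interaction to Next Paint    Histogram
CLS         Cumulative Layout Shift                          Histogram
FCP         First Contentful Paint                           Histogram
TTFB        Time to First Byte                               Histogram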

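Not collected: custom tracking events, per-resource loading details, long-task lists, and similar data. Leaving them out avoids a metric explosion.

2.2 Collection Code Example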

typescript
// vitals-reporter.ts
// Note: onFID is deprecated in web-vitals v4 (INP replaced FID as a Core Web Vital
// in March 2024); drop the onFID import and call on releases that no longer export it.
import { onLCP, onFID, onCLS, onFCP, onTTFB, onINP } from 'web-vitals';
import type { Metric } from 'web-vitals';

interface VitalPayload {
  name: string;   // 'LCP' | 'FID' | 'CLS' | 'FCP' | 'TTFB' | 'INP'
  value: number;  // raw value (ms, or a unitless score for CLS)
  rating: string; // 'good' | 'needs-improvement' | 'poor'
  page: string;   // location.pathname (no query string)
}

const ENDPOINT = '/metrics/vitals';

function report(payload: VitalPayload) {
  const body = JSON.stringify(payload);
  // Prefer sendBeacon: the request survives page unload
  if (navigator.sendBeacon) {
    navigator.sendBeacon(ENDPOINT, new Blob([body], { type: 'application/json' }));
  } else {
    // Fallback: keepalive lets the request outlive the page
    fetch(ENDPOINT, { method: 'POST', body, keepalive: true,
      headers: { 'Content-Type': 'application/json' } });
  }
}

// Keep only the fields the back end needs; never forward the full metric object
function buildPayload(metric: Metric): VitalPayload {
  return {
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    page: location.pathname,
  };
}

onLCP(m => report(buildPayload(m)));
onFID(m => report(buildPayload(m)));
onINP(m => report(buildPayload(m)));
onCLS(m => report(buildPayload(m)));
onFCP(m => report(buildPayload(m)));
onTTFB(m => report(buildPayload(m)));

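Key principles

  • page carries only the pathname, not the full URL, so high-cardinality label values cannot blow up Prometheus.
  • Intermediate values are not reported: the SDK's reportAllChanges option stays off (the default), so each callback fires only once its metric value is ready to report. CLS and INP can still fire again if the value changes after the page is backgrounded and restored, so a small share of page loads may report more than once.
  • No high-cardinality dimensions such as user ID or session ID are attached.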

3. Back-End Processing (Node.js + prom-client)

3.1 Metric Definitions (2 Histograms + 1 Counter, Covering 6 Web Vitals)

typescript
// metrics.ts
import client from 'prom-client';

const register = new client.Registry();
client.collectDefaultMetrics({ register }); // optional: default process metrics such as CPU and memory

// --- Histograms: used for the p50 / p75 / p90 / p95 / p99 quantiles ---
// Bucket design: include the Good / NI / Poor threshold boundaries
// (INP 200/500, TTFB 800/1800, FCP 1800/3000, LCP 2500/4000; CLS 0.1/0.25)
const TIMING_BUCKETS = [100, 200, 300, 500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 4000, 5000, 8000, 10000];
const CLS_BUCKETS    = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 1.0];

// Timing metrics share one histogram and are told apart by the `metric` label
export const vitalsHistogram = new client.Histogram({
  name: 'web_vitals_duration_ms',
  help: 'Web Vitals timing metrics in ms (LCP/FID/INP/FCP/TTFB)',
  labelNames: ['metric', 'page', 'rating'] as const,
  buckets: TIMING_BUCKETS,
  registers: [register],
});

// CLS is a unitless score, so it gets its own histogram and bucket layout
export const clsHistogram = new client.Histogram({
  name: 'web_vitals_cls_score',
  help: 'Cumulative Layout Shift score',
  labelNames: ['page', 'rating'] as const,
  buckets: CLS_BUCKETS,
  registers: [register],
});

// Counter: number of reports per metric and rating (Good / NI / Poor)
export const vitalsRatingCounter = new client.Counter({
  name: 'web_vitals_rating_total',
  help: 'Count of Web Vitals reports by metric and rating',
  labelNames: ['metric', 'page', 'rating'] as const,
  registers: [register],
});

export { register };

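Why Histogram rather than Summary? With a Histogram, quantiles are computed on the Prometheus server (histogram_quantile) from bucket counts, and bucket counts can be summed across instances. A Summary computes its quantiles inside each client process, and those precomputed quantiles cannot be merged across instances.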
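To make the "mergeable" point concrete, here is a minimal sketch of what sum(...) by (le) followed by histogram_quantile does conceptually. It is a simplified illustration, not Prometheus's actual implementation: the names Bucket, mergeBuckets, and histogramQuantile are made up for this example, the sample counts are invented, and it assumes positive bucket bounds.

```typescript
// One cumulative histogram bucket as exposed on /metrics:
// `count` is the number of observations with value <= `le`.
interface Bucket {
  le: number; // upper bound; Infinity stands for the +Inf bucket
  count: number;
}

// Merge histograms from several instances by summing counts per upper bound.
// Conceptually this is what `sum(...) by (le)` does.
function mergeBuckets(...instances: Bucket[][]): Bucket[] {
  const totals = new Map<number, number>();
  for (const buckets of instances) {
    for (const { le, count } of buckets) {
      totals.set(le, (totals.get(le) ?? 0) + count);
    }
  }
  return [...totals.entries()]
    .map(([le, count]) => ({ le, count }))
    .sort((a, b) => a.le - b.le);
}

// Estimate the q-quantile by linear interpolation inside the bucket
// that contains the target rank. `buckets` must be sorted by `le`.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1]?.count ?? 0;
  if (total === 0 || q < 0 || q > 1) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank falls into the +Inf bucket: fall back to the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[i - 1]?.le ?? NaN;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  if (inBucket === 0) return buckets[i].le;
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Invented LCP bucket counts from two service instances.
const instanceA: Bucket[] = [
  { le: 1000, count: 10 }, { le: 2500, count: 30 },
  { le: 4000, count: 38 }, { le: Infinity, count: 40 },
];
const instanceB: Bucket[] = [
  { le: 1000, count: 20 }, { le: 2500, count: 50 },
  { le: 4000, count: 58 }, { le: Infinity, count: 60 },
];

const merged = mergeBuckets(instanceA, instanceB);
console.log(histogramQuantile(0.75, merged)); // about 2350 ms
```

The merged p75 comes from the combined counts (30, 80, 96, 100), which is why bucket boundaries placed at the rating thresholds matter: interpolation is only as precise as the bucket the rank lands in.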

3.2 Reporting Endpoint

typescript
// server.ts
import express from 'express';
import { vitalsHistogram, clsHistogram, vitalsRatingCounter, register } from './metrics';

const app = express();
app.use(express.json({ limit: '10kb' }));

// Allowlist pattern: a leading slash followed by up to 100 safe path characters
const PAGE_ALLOWLIST = /^\/[a-zA-Z0-9\-_\/]{0,100}$/;

app.post('/metrics/vitals', (req, res) => {
  const { name, value, rating, page } = req.body ?? {};

  // Basic validation
  if (!['LCP','FID','INP','CLS','FCP','TTFB'].includes(name)) return res.sendStatus(400);
  if (!['good','needs-improvement','poor'].includes(rating)) return res.sendStatus(400);
  if (typeof value !== 'number' || value < 0 || value > 60000) return res.sendStatus(400);

  // Sanitize page: anything that is not a plain path (for example, one carrying
  // a query string or fragment) falls back to '/unknown'
  const safePage = typeof page === 'string' && PAGE_ALLOWLIST.test(page) ? page : '/unknown';

  if (name === 'CLS') {
    clsHistogram.observe({ page: safePage, rating }, value);
  } else {
    vitalsHistogram.observe({ metric: name, page: safePage, rating }, value);
  }
  vitalsRatingCounter.inc({ metric: name, page: safePage, rating });

  res.sendStatus(204);
});

// Prometheus scrape endpoint
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);

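3.3 High-Cardinality Protection

Protection                  Description
page allowlist regex        Stops random paths from generating a huge number of label values
page path normalization     /product/123 → /product/:id (optional; use a route mapping table)
No user/session exposure    Never used as a label
Rate limiting               Per-IP rate limiting, so flooding the endpoint cannot pollute the metrics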
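The path normalization row can be sketched as a small pure function that runs before the value is used as a label. This is only one possible shape: ROUTE_TEMPLATES and its two patterns are hypothetical stand-ins for the application's real route table, and normalizePage is a name invented for this example.

```typescript
// Hypothetical route table: each entry maps a concrete-path pattern to a template.
// Replace these with the application's real routes.
const ROUTE_TEMPLATES: Array<[RegExp, string]> = [
  [/^\/product\/\d+$/, '/product/:id'],
  [/^\/user\/[0-9a-f]{8,}\/orders$/, '/user/:uid/orders'],
];

// Same idea as the allowlist regex: a leading slash plus up to 100 safe characters.
const PAGE_PATTERN = /^\/[a-zA-Z0-9\-_\/]{0,100}$/;

function normalizePage(raw: unknown): string {
  if (typeof raw !== 'string') return '/unknown';
  // Drop the query string and fragment in case a client sent them anyway.
  const path = raw.split(/[?#]/, 1)[0];
  if (!PAGE_PATTERN.test(path)) return '/unknown';
  // Collapse concrete paths onto their route template.
  for (const [pattern, template] of ROUTE_TEMPLATES) {
    if (pattern.test(path)) return template;
  }
  return path;
}

console.log(normalizePage('/product/123'));     // "/product/:id"
console.log(normalizePage('/cart?coupon=abc')); // "/cart"
console.log(normalizePage('/a b'));             // "/unknown"
```

A stricter variant would return a single catch-all value for any path that matches no template, which bounds the number of page values by the size of the route table instead of relying on the regex alone.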

4. Prometheus Configuration

yaml
# prometheus.yml
scrape_configs:
  - job_name: 'web-vitals-backend'
    static_configs:
      - targets: ['your-backend:3000']
    scrape_interval: 15s

5. Grafana Display: What You See Is What You Get

5.1 Recommended Panels (8 in Total)

Panel 1: Core Metric Quantile Overview (Stat or Gauge)

promql
# LCP p75 (the quantile Google recommends for assessment)
histogram_quantile(0.75,
  sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le)
)

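Create one Stat panel each for LCP / INP / FCP / TTFB, with these threshold colors:

  • Green (Good): LCP ≤ 2500 ms, INP ≤ 200 ms, FCP ≤ 1800 ms, TTFB ≤ 800 ms
  • Yellow (NI): LCP ≤ 4000 ms, INP ≤ 500 ms, FCP ≤ 3000 ms, TTFB ≤ 1800 ms
  • Red (Poor): anything above the NI upper bounds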
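The color mapping uses the same boundaries that produce the rating label. A minimal sketch of that classification follows, assuming the published Web Vitals thresholds; THRESHOLDS and ratingOf are names invented for this example, and in practice the web-vitals SDK already supplies metric.rating on the client.

```typescript
type MetricName = 'LCP' | 'INP' | 'CLS' | 'FCP' | 'TTFB';
type Rating = 'good' | 'needs-improvement' | 'poor';

// [upper bound of "good", upper bound of "needs-improvement"].
// Values are in milliseconds, except CLS, which is a unitless score.
const THRESHOLDS: Record<MetricName, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

// Classify a value the way the dashboard colors it:
// at or below the first bound is good, at or below the second is NI, else poor.
function ratingOf(metric: MetricName, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (value <= good) return 'good';
  if (value <= needsImprovement) return 'needs-improvement';
  return 'poor';
}

console.log(ratingOf('LCP', 2500)); // "good"
console.log(ratingOf('LCP', 2501)); // "needs-improvement"
console.log(ratingOf('CLS', 0.3));  // "poor"
```

Keeping the Grafana thresholds identical to these pairs means a green Stat panel and a high Good rate always describe the same boundary.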

Panel 2: LCP Quantile Trend (Time Series)

promql
# p50 / p75 / p95
histogram_quantile(0.50, sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le))
histogram_quantile(0.75, sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le))
histogram_quantile(0.95, sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le))

Panel 3: INP Quantile Trend (Time Series)

promql
histogram_quantile(0.75, sum(rate(web_vitals_duration_ms_bucket{metric="INP"}[5m])) by (le))

Panel 4: CLS p75 Trend (Time Series)

promql
histogram_quantile(0.75, sum(rate(web_vitals_cls_score_bucket[5m])) by (le))

Panel 5: Good Rate per Metric (Bar Gauge or Pie)

promql
# LCP Good rate
sum(rate(web_vitals_rating_total{metric="LCP", rating="good"}[1h]))
/
sum(rate(web_vitals_rating_total{metric="LCP"}[1h]))

Create one query each for LCP / INP / CLS. The result shows directly what share of reports meets the "good" bar.
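As a worked example of what the ratio measures, the sketch below computes a Good rate from two snapshots of the rating counter taken at the start and end of a window. It is a simplification with invented numbers: goodRate is a name made up here, and unlike rate() it does not compensate for counter resets.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';
type RatingCounts = Record<Rating, number>;

// Good rate over a window = increase of the "good" counter
// divided by the increase of all three rating counters.
function goodRate(start: RatingCounts, end: RatingCounts): number {
  const delta = (r: Rating): number => Math.max(0, end[r] - start[r]);
  const total = delta('good') + delta('needs-improvement') + delta('poor');
  // No reports in the window: the ratio is undefined (0 / 0).
  return total === 0 ? NaN : delta('good') / total;
}

// Invented counter values for metric="LCP" at the window boundaries.
const windowStart: RatingCounts = { good: 100, 'needs-improvement': 30, poor: 10 };
const windowEnd: RatingCounts = { good: 175, 'needs-improvement': 45, poor: 20 };

console.log(goodRate(windowStart, windowEnd)); // 0.75 (75 of 100 new reports were good)
```

A window with no traffic yields an undefined ratio rather than 0, so a "rate below X" comparison does not match during quiet periods.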

Panel 6: LCP p75 Grouped by Page (Bar Chart)

promql
histogram_quantile(0.75,
  sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[30m])) by (le, page)
)

Quickly pinpoints which page is the performance bottleneck.

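Panel 7: Report Volume / Error Rate (Time Series)

promql
# Reports per second by metric, averaged over a 1-minute window (multiply by 60 for a per-minute figure)
sum(rate(web_vitals_rating_total[1m])) by (metric)

Monitors whether data collection itself is working. This query covers report volume only; charting an error rate would need a counter for rejected requests, which is not among the metrics defined in this design.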

Panel 8: TTFB p75 Trend (Time Series)

promql
histogram_quantile(0.75, sum(rate(web_vitals_duration_ms_bucket{metric="TTFB"}[5m])) by (le))

Reflects server response speed and correlates with back-end performance.


5.2 Example Alerting Rules

yaml
# alerts.yml
groups:
  - name: web-vitals
    rules:
      - alert: LCP_P75_Too_High
        expr: |
          histogram_quantile(0.75,
            sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[10m])) by (le)
          ) > 4000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LCP p75 is above 4s; user experience is poor"

      - alert: INP_P75_Too_High
        expr: |
          histogram_quantile(0.75,
            sum(rate(web_vitals_duration_ms_bucket{metric="INP"}[10m])) by (le)
          ) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "INP p75 is above 500ms; page interaction is sluggish"

      - alert: Good_Rate_LCP_Drop
        expr: |
          sum(rate(web_vitals_rating_total{metric="LCP",rating="good"}[30m]))
          / sum(rate(web_vitals_rating_total{metric="LCP"}[30m])) < 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "LCP Good rate is below 50%; many users have a poor experience"

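6. Metric Inventory

Metric name               Type        Labels                  Purpose
web_vitals_duration_ms    Histogram   metric, page, rating    Computes LCP/FID/INP/FCP/TTFB quantiles
web_vitals_cls_score      Histogram   page, rating            Computes CLS quantiles
web_vitals_rating_total   Counter     metric, page, rating    Computes the Good/NI/Poor distribution

Three metrics, combined with their label dimensions, cover every panel above with no redundancy.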
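Three metric names can still expand into many time series, because every label combination of a histogram exposes one series per bucket plus _sum and _count. The sketch below gives a rough upper bound under stated assumptions: histogramSeries and estimateSeries are names invented here, the bucket counts are inputs (13 and 9 are only sample values), and real deployments see fewer series because only observed label combinations are exported.

```typescript
interface HistogramShape {
  bucketCount: number;       // explicit buckets, excluding +Inf
  labelCombinations: number; // distinct label-value tuples
}

// Per label combination a histogram exposes bucketCount + 1 bucket series
// (the extra one is +Inf), plus one _sum and one _count series.
function histogramSeries({ bucketCount, labelCombinations }: HistogramShape): number {
  return (bucketCount + 3) * labelCombinations;
}

// Upper bound on series for the three metrics in this design.
function estimateSeries(pages: number, timingBuckets: number, clsBuckets: number): number {
  const ratings = 3;       // good / needs-improvement / poor
  const timingMetrics = 5; // LCP, FID, INP, FCP, TTFB
  const duration = histogramSeries({
    bucketCount: timingBuckets,
    labelCombinations: timingMetrics * pages * ratings,
  });
  const cls = histogramSeries({
    bucketCount: clsBuckets,
    labelCombinations: pages * ratings,
  });
  // The counter has one series per (metric, page, rating), CLS included.
  const counter = (timingMetrics + 1) * pages * ratings;
  return duration + cls + counter;
}

console.log(estimateSeries(50, 13, 9)); // 14700
```

The estimate grows linearly with the number of distinct page values, which is why the page label is the one to keep bounded.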


7. Dependency Versions for Reference

Component               Version
web-vitals              ^4.x
prom-client (Node.js)   ^15.x
Prometheus              ^2.45
Grafana                 ^10.x

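8. Implementation Steps

  1. Front end: run npm install web-vitals and import vitals-reporter.ts at the application entry point
  2. Back end: deploy the reporting service, exposing /metrics/vitals (POST) and /metrics (GET)
  3. Prometheus: add the scrape job with a 15s scrape interval
  4. Grafana: import the 8 panels above and configure the threshold color mapping
  5. Alerting: configure Alertmanager to receive the Web Vitals alerts and route them to DingTalk / Slack

This design follows the Google Web Vitals assessment criteria (2024) and uses p75 as the primary quantile for health evaluation.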
