Monitoring Spring Boot, MySQL, and Redis with Prometheus + Grafana, with Alerts Sent via DingTalk

This article uses Prometheus and Grafana to build a monitoring system that covers Linux hosts, Spring Boot applications, and databases (MySQL and Redis).

Installing the Grafana dashboard

Grafana is a visualization dashboard with polished charts and layouts, a full-featured metrics dashboard and graph editor, and support for data sources such as Graphite, Zabbix, InfluxDB, and Prometheus.

Download: https://grafana.com/grafana/download

This article covers the Linux version.

On CentOS, install it with:

    wget https://dl.grafana.com/oss/release/grafana-6.3.3-1.x86_64.rpm
    sudo yum localinstall grafana-6.3.3-1.x86_64.rpm

Configuration

After installation, the configuration file is located at /etc/grafana/grafana.ini.

The HTTP port configured in that file is 3000.

Starting Grafana

    /etc/init.d/grafana-server start

Logging in to Grafana

Open http://<server-IP>:3000. The default username and password are admin/admin. The first login prompts for a password change, which is recommended.

Installing Prometheus

[Figure: structure of the Prometheus time-series database]

Download

https://prometheus.io/download/

The download page also offers many companion packages, such as Alertmanager and exporters like mysqld_exporter, haproxy_exporter, and memcached_exporter.

Standard installation and startup

Install:

    # Download
    wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz

    # Extract
    tar -zxvf prometheus-2.12.0.linux-amd64.tar.gz

Start

Change into the extracted directory, then run:

    # Production startup
    nohup ./prometheus --config.file=prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=60d &

    # --web.enable-lifecycle lets the configuration be hot-reloaded over HTTP without restarting Prometheus:
    #     curl -X POST http://ip:9090/-/reload
    # --storage.tsdb.retention.time controls how long data is kept (the default is 15 days).
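The reload call described above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the function names and the `http://localhost:9090` address are illustrative assumptions, and the request only succeeds if Prometheus was started with `--web.enable-lifecycle`.

```python
import urllib.request


def build_reload_request(base_url):
    """Build the POST request that asks Prometheus to re-read its configuration."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body is enough; the endpoint only requires the POST method.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url, timeout=5.0):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


# Example (needs a running Prometheus; the address is hypothetical):
#   reload_prometheus("http://localhost:9090")
```

Building the request separately from sending it makes it easy to inspect the URL and method without a live server.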

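Docker installation (assumes Docker is already installed)

Create the directory and the Prometheus configuration file

    mkdir /prometheus
    vim /prometheus/prometheus.yml

Pull the Prometheus image

    docker pull prom/prometheus

Start Prometheus

    docker run -d -p 9090:9090 --name prometheus -v /prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Parameters:

-d starts the Prometheus container in detached mode, so it runs in the background. It can then only be stopped with `docker stop`, not with Ctrl+C.
-p maps ports: requests to port 9090 on the host reach port 9090 in the Prometheus container.
--name sets the container name.
-v maps a file on the host to a file inside the container.
--config.file tells Prometheus inside the container which configuration file to use. The prom/prometheus image already defaults to /etc/prometheus/prometheus.yml, so the flag is not passed in the command above.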

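Writing the Prometheus configuration file

Formatting rules

1. Keys are case-sensitive.
2. Indentation expresses hierarchy.
3. Tabs are not allowed for indentation; use spaces only.
4. The number of spaces does not matter, as long as elements at the same level are left-aligned.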
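Rule 3 is the one most easily broken by an editor that inserts tabs silently. As a small illustration, the sketch below scans a configuration text and reports lines whose indentation contains a tab. The function name is hypothetical, and this is only a pre-check, not a YAML validator.

```python
def find_tab_indented_lines(text):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad
```

Running it over prometheus.yml before a reload catches the tab problem early, although a real YAML parser is still needed to confirm that the file is valid.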

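Sample prometheus.yml

The sample is explained in one pass later, once all the components have been introduced.

Deploying exporters on the machines to be monitored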

Installing Alertmanager

Build from source:

    git clone https://github.com/prometheus/alertmanager.git
    cd alertmanager
    make build

Start:

    ./alertmanager --config.file=alertmanager.yml  # alertmanager.yml is the default configuration file

Install from the official release:

    wget https://github.com/prometheus/alertmanager/releases/download/v0.18.0/alertmanager-0.18.0.linux-amd64.tar.gz

    tar -zxvf alertmanager-0.18.0.linux-amd64.tar.gz

Start:

Change into the extracted directory, then run:

    nohup ./alertmanager --config.file=alertmanager.yml &

Alertmanager listens on ports 9093 (web UI and API) and 9094 (cluster communication).

Once started, the web UI is available at http://192.168.1.163:9093.

The alertmanager.yml configuration file

    # Global settings
    global:
      resolve_timeout: 5m  # time after which an alert that is no longer reported is treated as resolved; default 5m
      smtp_smarthost: 'smtp.sina.com:25'  # SMTP server (host:port)
      smtp_from: '******@sina.com'  # sender address
      smtp_auth_username: '******@sina.com'  # SMTP account name
      smtp_auth_password: '******'  # mailbox password or authorization code
      wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'  # WeChat Work API address

    # Template files
    templates:
      - 'template/*.tmpl'

    # Routing tree
    route:
      group_by: ['alertname']  # labels used to group alerts
      group_wait: 10s  # how long to wait before sending the first notification for a new group
      group_interval: 10s  # how long to wait before notifying about new alerts added to an existing group
      repeat_interval: 1m  # how often a notification is repeated; for email, do not set this too low, or the SMTP server may reject the flood of messages
      receiver: 'email'  # receiver for alerts; must match a name under receivers below

    # Receivers
    receivers:
      - name: 'email'  # receiver name
        email_configs:  # email settings
          - to: '******@163.com'  # address that receives the alerts
            html: '{{ template "test.html" . }}'  # template for the email body
            headers: { Subject: "[WARN] Alert email" }  # email subject
        webhook_configs:  # webhook settings
          - url: 'http://127.0.0.1:5001'
            send_resolved: true
        wechat_configs:  # WeChat Work settings
          - send_resolved: true
            to_party: '1'  # id of the receiving group
            agent_id: '1000002'  # WeChat Work -> custom app -> AgentId
            corp_id: '******'  # company info (My Company -> CorpId, at the bottom of the page)
            api_secret: '******'  # WeChat Work -> custom app -> Secret
            message: '{{ template "test_wechat.html" . }}'  # template for the message body

    # An inhibition rule mutes alerts that match one set of matchers (target) while
    # alerts that match another set (source) exist. Both alerts must have the same
    # values for the labels listed under equal.
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
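To make the inhibition rule concrete, here is a minimal Python sketch of the matching logic it describes: a `warning` alert is muted when a `critical` alert with the same values for the `equal` labels exists. This models the idea only, not Alertmanager's implementation. The function names are hypothetical, and treating a label that is missing on both sides as equal is an assumption of the sketch.

```python
def is_inhibited(target, alerts, equal):
    """True if a critical alert shares all `equal` label values with this warning alert."""
    if target["labels"].get("severity") != "warning":
        return False
    for source in alerts:
        if source is target or source["labels"].get("severity") != "critical":
            continue
        # A label missing on both sides is treated as an empty value, hence equal.
        if all(source["labels"].get(k, "") == target["labels"].get(k, "") for k in equal):
            return True
    return False


def notifiable(alerts, equal=("alertname", "dev", "instance")):
    """Return the alerts that would still produce a notification after inhibition."""
    return [a for a in alerts if not is_inhibited(a, alerts, equal)]
```

With one critical and one warning `InstanceDown` alert for the same instance, only the critical one remains; a warning for a different instance is unaffected.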

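For email, repeat_interval must not be set too low, or the SMTP server may reject messages because they are sent too frequently.
WeChat Work registration: https://work.weixin.qq.com
The configuration above uses three notification methods: email, webhook, and WeChat. The notification configuration types that Alertmanager currently provides also include:

    email_config
    hipchat_config
    pagerduty_config
    pushover_config
    slack_config
    opsgenie_config
    victorops_config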

Configuring the .tmpl templates

test.tmpl

    {{ define "test.html" }}
    <table border="1">
      <tr>
        <td>Alert</td>
        <td>Instance</td>
        <td>Alert threshold</td>
        <td>Start time</td>
      </tr>
      {{ range $i, $alert := .Alerts }}
      <tr>
        <td>{{ index $alert.Labels "alertname" }}</td>
        <td>{{ index $alert.Labels "instance" }}</td>
        <td>{{ index $alert.Annotations "value" }}</td>
        <td>{{ $alert.StartsAt }}</td>
      </tr>
      {{ end }}
    </table>
    {{ end }}

In the template above, Labels refers to the labels carried by the alert in Prometheus, and Annotations refers to the annotations defined in the alerting rule.

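test_wechat.tmpl

    {{ define "test_wechat.html" }}
    {{ range $i, $alert := .Alerts.Firing }}
    [Alert]: {{ index $alert.Labels "alertname" }}
    [Instance]: {{ index $alert.Labels "instance" }}
    [Alert threshold]: {{ index $alert.Annotations "value" }}
    [Start time]: {{ $alert.StartsAt }}
    {{ end }}
    {{ end }}

The template name in define must match the name referenced in alertmanager.yml (test_wechat.html). The range here differs slightly from the email template: it iterates only over alerts that are still firing (.Alerts.Firing). Without this restriction, alerts that are already resolved would also be sent to WeChat Work.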
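The effect of iterating only over firing alerts can be illustrated outside the template engine. The sketch below applies the same filter to a dictionary shaped like the JSON that Alertmanager posts to webhook receivers (an `alerts` list whose entries carry `status`, `labels`, `annotations`, and `startsAt`). The function name and the English field captions are illustrative assumptions.

```python
def render_firing(payload):
    """Render one text block per firing alert, skipping resolved ones."""
    blocks = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts are left out, like .Alerts.Firing in the template
        blocks.append(
            "[Alert]: {}\n[Instance]: {}\n[Alert threshold]: {}\n[Start time]: {}".format(
                alert["labels"].get("alertname", ""),
                alert["labels"].get("instance", ""),
                alert["annotations"].get("value", ""),
                alert.get("startsAt", ""),
            )
        )
    return "\n".join(blocks)
```

Dropping the `status` check would reproduce the behaviour of the email template, which lists resolved alerts as well.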

Defining alerting rules in Prometheus

Sample alertmanager_rules.yml (placed in the same directory as Prometheus)

    groups:
      - name: test-rules
        rules:
          - alert: InstanceDown  # alert name
            expr: up == 0  # alert condition, written as a PromQL expression
            for: 2m  # how long the condition must hold before the alert is sent
            labels:  # labels attached to the alert
              team: node
            annotations:  # annotations that describe the alert in detail
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down"
              value: "{{$value}}"

The three states in an alert's life cycle

inactive: the alert is neither firing nor pending
pending: the condition is met, but has not yet held for the duration set in `for`
firing: the condition has held for at least the duration set in `for`
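The three states can be expressed as a small decision function. This is a sketch of the rule as described above, not Prometheus code; the function name and the convention of passing `None` when the expression is currently false are assumptions made for the example.

```python
def alert_state(condition_true_for, for_duration):
    """Classify an alert from how long its expression has been true, in seconds.

    condition_true_for is None when the expression is currently false.
    """
    if condition_true_for is None:
        return "inactive"
    if condition_true_for < for_duration:
        return "pending"
    return "firing"
```

With `for: 2m`, an instance that has been down for 30 seconds is pending, and it moves to firing once the two minutes have elapsed.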

Sending messages via DingTalk

Project: https://github.com/timonwong/prometheus-webhook-dingtalk. It can also be installed with Docker:

    You can deploy this tool using the Docker image from following registry:

    DockerHub: https://hub.docker.com/r/timonwong/prometheus-webhook-dingtalk/
    Quay.io: https://quay.io/repository/timonwong/prometheus-webhook-dingtalk

Build from source:

    yum install git
    git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
    cd prometheus-webhook-dingtalk
    make

The template that prometheus-webhook-dingtalk uses for DingTalk alert messages is src/github.com/timonwong/prometheus-webhook-dingtalk/template/default.tmpl; it can be modified as needed.

Start prometheus-webhook-dingtalk:

    nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=xxx" > dingding.log 2>&1 &

The service listens on port 8060. To avoid passing the robot URL on every start, add it to /etc/systemd/system/prometheus-webhook-dingtalk.service.

For how to add the robot URL, see https://www.jianshu.com/p/a3c62eb71ae3. Several robots can also be configured:

    prometheus-webhook-dingtalk \
      --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx" \
      --ding.profile="webhook2=https://oapi.dingtalk.com/robot/send?access_token=yyyyyyyyyyy"

This defines two webhooks, webhook1 and webhook2, which send alerts to different DingTalk groups. See https://theo.im/blog/2017/10/16/release-prometheus-alertmanager-webhook-for-dingtalk/
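Each profile name becomes part of the URL that Alertmanager posts to. The sketch below parses `--ding.profile` arguments of the form shown above and derives the receiver URL for a profile, following the `/dingtalk/<profile>/send` pattern and port 8060 used in this article. The function names are hypothetical helpers and not part of prometheus-webhook-dingtalk.

```python
def parse_profiles(args):
    """Map profile name -> DingTalk robot URL from '--ding.profile=name=url' arguments."""
    profiles = {}
    prefix = "--ding.profile="
    for arg in args:
        if arg.startswith(prefix):
            # Split at the first '=' only; the robot URL itself contains '='.
            name, _, url = arg[len(prefix):].partition("=")
            profiles[name] = url
    return profiles


def receiver_url(profile, host="localhost", port=8060):
    """URL that Alertmanager's webhook_configs should point to for a profile."""
    return "http://{}:{}/dingtalk/{}/send".format(host, port, profile)
```

For the two profiles above, `receiver_url("webhook1")` and `receiver_url("webhook2")` give the two addresses to use in separate Alertmanager receivers.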

The webhook then has to be added to alertmanager.yml:

    global:
      resolve_timeout: 5m
    route:
      receiver: webhook
      group_wait: 3s
      group_interval: 5s
      repeat_interval: 5m
      group_by: [alertname]
      routes:
        - receiver: webhook
          group_wait: 10s
          match:
            team: node
    receivers:
      - name: webhook
        webhook_configs:
          - url: http://localhost:8060/dingtalk/ops_dingding/send
            send_resolved: true
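One way to check this routing end to end is to push a synthetic alert into Alertmanager and watch for the DingTalk message. The sketch below builds such a request, assuming the Alertmanager in use exposes the v2 API at `/api/v2/alerts`. The function name, the alert name `TestAlert`, and the address are illustrative assumptions.

```python
import json
import urllib.request


def build_test_alert_request(alertmanager_url, labels, summary):
    """Build a POST to /api/v2/alerts that carries a single synthetic alert."""
    body = json.dumps([{"labels": labels, "annotations": {"summary": summary}}])
    return urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it (needs a running Alertmanager; the address is hypothetical):
#   req = build_test_alert_request("http://localhost:9093",
#                                  {"alertname": "TestAlert", "team": "node"},
#                                  "manual routing test")
#   urllib.request.urlopen(req, timeout=5)
```

The `team: node` label is what makes the alert match the child route in the configuration above.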

Installing node_exporter to monitor Linux hosts

Download:

    # Download
    wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

    # Extract
    tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz

Install and start:

    # Start node_exporter
    cd node_exporter-0.18.1.linux-amd64
    nohup ./node_exporter &
    # The default port is 9100.
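Once node_exporter is running, http://<host>:9100/metrics returns plain text with one sample per line. To show what Prometheus scrapes, the sketch below parses the simplest form of that text format. The function name is hypothetical and the sample values used with it are made up; lines with trailing timestamps or spaces inside label values are deliberately not handled.

```python
def parse_metrics(text):
    """Parse simple 'name{labels} value' lines into a {series: float} dict.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; the rest is the series.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples
```

Feeding it a saved copy of the /metrics output is a quick way to confirm that a particular series, such as `node_load1`, is actually exported.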

Monitoring MySQL

Download mysqld_exporter, the MySQL exporter, again from the official site:

    # Download
    wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz

    # Extract
    tar -zxvf mysqld_exporter-0.12.1.linux-amd64.tar.gz

Create the monitoring account and edit the configuration file:

    -- Create the account
    mysql> create user 'mysql_monitor'@'localhost' identified by 'aA&12345';
    -- or, for access from the 192.168.1.x subnet:
    mysql> create user 'mysql_monitor_user'@'192.168.1.%' identified by 'aA&12345';
    -- Grant privileges
    mysql> GRANT REPLICATION CLIENT, PROCESS ON *.* TO 'mysql_monitor'@'localhost';
    mysql> GRANT SELECT ON performance_schema.* TO 'mysql_monitor'@'localhost';

    mysql> flush privileges;
    -- Note: the required privileges differ between versions. Check the log at startup;
    -- if privileges are missing, grant more or create a matching account.

Edit the configuration file:

    cd mysqld_exporter-0.12.1.linux-amd64

    vim .my.cnf
    # Add the following settings
    [client]
    port=3306
    user=mysql_monitor
    password=aA&12345
Start:

    nohup ./mysqld_exporter --config.my-cnf=.my.cnf &

In practice the root user was used, and nohup.out reported: "Host '127.0.0.1' is not allowed to connect to this MySQL server". The fix:


    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | infosys_login      |
    | infosys_test       |
    | mms                |
    | mysql              |
    | performance_schema |
    | sys                |
    | test               |
    | zabbix             |
    | zm_doc             |
    +--------------------+
    10 rows in set (0.00 sec)

    mysql> use mysql
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Database changed
    mysql> select host,user form mysql
        -> ;
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'mysql' at line 1
    mysql> show tables;
    +---------------------------+
    | Tables_in_mysql           |
    +---------------------------+
    | columns_priv              |
    | db                        |
    | engine_cost               |
    | event                     |
    | func                      |
    | general_log               |
    | gtid_executed             |
    | help_category             |
    | help_keyword              |
    | help_relation             |
    | help_topic                |
    | innodb_index_stats        |
    | innodb_table_stats        |
    | ndb_binlog_index          |
    | plugin                    |
    | proc                      |
    | procs_priv                |
    | proxies_priv              |
    | server_cost               |
    | servers                   |
    | slave_master_info         |
    | slave_relay_log_info      |
    | slave_worker_info         |
    | slow_log                  |
    | tables_priv               |
    | time_zone                 |
    | time_zone_leap_second     |
    | time_zone_name            |
    | time_zone_transition      |
    | time_zone_transition_type |
    | user                      |
    +---------------------------+
    31 rows in set (0.00 sec)

    mysql> select Host, User,Password from user;
    ERROR 1054 (42S22): Unknown column 'Password' in 'field list'
    mysql> select Host, User from user;
    +---------------------+--------------------+
    | Host                | User               |
    +---------------------+--------------------+
    | 192.168.1.%         | infosys_test       |
    | 192.168.1.%         | mysql_monitor_user |
    | 192.168.1.%         | root               |
    | 192.168.1.163       | test1664           |
    | 192.168.1.164       | host164            |
    | 192.168.1.164       | test123            |
    | 192.168.1.164       | test14             |
    | 192.168.1.164       | test1669           |
    | localhost           | mysql.session      |
    | localhost           | mysql.sys          |
    | localhost           | mysql_monitor      |
    | localhost           | root               |
    | ‘192.168.1.164’     | test14             |
    +---------------------+--------------------+
    13 rows in set (0.00 sec)

    mysql> grant all privileges on *.* to root@"127.0.0.1" identified by "123423$*MD7369qwezxc" with grant option;
    Query OK, 0 rows affected, 1 warning (0.00 sec)

    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)

This resolved the problem.

Monitoring Redis

redis_exporter is not listed on the official Prometheus site, but it is available on GitHub. The exporter does not have to run on the Redis host itself:

    # Download
    wget https://github.com/oliver006/redis_exporter/releases/download/v0.30.0/redis_exporter-v0.30.0.linux-amd64.tar.gz
    # Extract
    tar -zxvf redis_exporter-v0.30.0.linux-amd64.tar.gz

Start:

    # Redis without a password
    nohup ./redis_exporter -redis.addr=192.168.56.118:6379 -web.listen-address 0.0.0.0:9121 &

    # Redis with a password
    nohup ./redis_exporter -redis.addr=192.168.1.136:6379 -redis.password reRedis123 -web.listen-address 0.0.0.0:9122 &

    # -web.listen-address sets the port on which the exporter serves metrics.

Monitoring a Spring Boot application

First add the pom dependencies.

Spring Boot 1:

    <dependency>
      <groupId>io.prometheus</groupId>
      <artifactId>simpleclient_spring_boot</artifactId>
      <version>0.1.0</version>
    </dependency>

Spring Boot 2:

    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-core</artifactId>
    </dependency>
    <dependency>
      <groupId>io.micrometer</groupId>
      <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

Custom metrics have to be defined separately.

Add annotations to the main application class

Spring Boot 1:

    @EnablePrometheusEndpoint
    @EnableSpringBootMetricsCollector

Add to the configuration file

Spring Boot 1:

    # Turn off the default username/password check on the management endpoints
    management.security.enabled=false
    spring.application.name=microservice-prometheus

For Spring Boot 2, see https://segmentfault.com/a/1190000018642077

Configuring prometheus.yml

Add each monitoring target

    # Global Prometheus settings
    global:
      scrape_interval: 15s      # how often targets are scraped; default 1m
      evaluation_interval: 15s  # how often rules are evaluated; default 1m
      scrape_timeout: 15s       # timeout for a single scrape; default 10s
      external_labels:          # extra labels added to series and alerts sent to external systems (federation, remote storage, Alertmanager)
        monitor: 'codelab_monitor'

    # Alertmanager settings
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["localhost:9093"]  # address and port on which Alertmanager listens

    # Rule files: loaded at startup and on reload; the rules are then evaluated every evaluation_interval
    rule_files:
      - "alertmanager_rules.yml"
      - "prometheus_rules.yml"

    # Scrape settings
    scrape_configs:
      - job_name: 'prometheus'  # job_name is written to the labels of each time series and can be used in queries
        scrape_interval: 15s    # scrape period; the global setting applies when omitted
        static_configs:         # static targets
          - targets: ['localhost:9090']  # address Prometheus scrapes, i.e. the instance

      - job_name: 'OS'
        static_configs:
          - targets: ['localhost:9100']
            labels:
              instance: '192.168.1.163'
          - targets: ['192.168.56.116:9100']
            labels:
              instance: '192.168.56.116'

          - targets: ['192.168.56.117:9100']
            labels:
              instance: '192.168.56.117'
      # The job above monitors hosts only; each host has its own instance label

      - job_name: 'mysql'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ['192.168.56.116:9104']
            labels:
              instance: '192.168.56.116'

          - targets: ['192.168.56.117:9104']
            labels:
              instance: '192.168.56.117'

      # The job above monitors MySQL; its instance labels match those of the hosts

      - job_name: 'redis'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ['192.168.56.118:9121','192.168.56.118:9122']
            labels:
              instance: '192.168.56.118'

          - targets: ['192.168.56.118:9100']
            labels:
              instance: '192.168.56.118'
      # As shown here, a Redis host and its Redis exporters can be grouped in one job and share the same instance label

prometheus_rules.yml:

    groups:
      - name: example
        rules:
          - record: cpu_utilization_ratio  # name of the new recorded series
            expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)  # rule expression: 100 minus the idle percentage
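The arithmetic behind this kind of CPU rule is easier to see with numbers. The sketch below works it through for one core, assuming the counter being sampled is the cumulative idle CPU time in seconds. The function name and the sample figures are made up for illustration.

```python
def cpu_utilization_percent(idle_start, idle_end, elapsed_seconds):
    """CPU utilization of one core from two samples of its idle-time counter.

    idle_start and idle_end are cumulative idle CPU seconds. Their difference
    divided by the elapsed wall-clock time is the fraction of time spent idle.
    """
    idle_rate = (idle_end - idle_start) / elapsed_seconds
    return 100 - idle_rate * 100
```

If the idle counter grows from 1000 s to 1012 s over a 15 s scrape interval, the core was idle 80% of the time, so the result is 20% utilization.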

alertmanager_rules.yml:

    groups:
      - name: test-rules
        rules:
          - alert: InstanceDown  # alert name
            expr: up == 0  # alert condition, written as a PromQL expression
            for: 2m  # how long the condition must hold before the alert is sent
            labels:  # labels attached to the alert
              team: node
            annotations:  # annotations that describe the alert in detail
              summary: "{{$labels.instance}}: has been down"
              description: "{{$labels.instance}}: job {{$labels.job}} has been down"
              value: "{{$value}}"

The prometheus.yml actually used, after formatting:

    global:
      scrape_interval: 15s      # how often targets are scraped; default 1m
      evaluation_interval: 15s  # how often rules are evaluated; default 1m
      scrape_timeout: 15s       # timeout for a single scrape; default 10s
      external_labels:          # extra labels added to series and alerts sent to external systems
        monitor: 'codelab_monitor'

    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']

    rule_files:
      - "alertmanager_rules.yml"
      - "prometheus_rules.yml"

    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 15s
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'OS'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ['localhost:9100']
            labels:
              instance: '192.168.1.163'
          - targets: ['192.168.1.164:9100']
            labels:
              instance: '192.168.1.164'

      - job_name: 'mysql'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ['192.168.1.163:9104']
            labels:
              instance: '192.168.1.163'
          - targets: ['192.168.1.164:9104']
            labels:
              instance: '192.168.1.164'

      - job_name: spring-boot
        # metrics_path defaults to '/metrics'; set it explicitly if the application
        # exposes metrics elsewhere (for example '/actuator/prometheus' with Spring Boot 2 and Micrometer)
        static_configs:
          - targets: ['192.168.1.208:8080']

      - job_name: 'redis'
        static_configs:
          - targets: ['192.168.1.136:9122']
            labels:
              instance: '192.168.1.136'
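After Prometheus loads a configuration like this, its HTTP API reports the state of every target at `/api/v1/targets`. The sketch below takes the decoded JSON of that response and lists the targets that are not up. The function name is hypothetical; fetching and decoding the response, for example with `urllib.request` and `json`, is left out so the example runs without a server.

```python
def down_targets(targets_response):
    """Return (job, instance) pairs for active targets whose health is not 'up'."""
    down = []
    for target in targets_response["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            down.append((labels.get("job", ""), labels.get("instance", "")))
    return down
```

An empty list means every configured exporter answered its last scrape. A non-empty list points to the job and instance to check first.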

在http://www.bejson.com/validators/yaml\_editor/中:

picture.image

启动或热加载prometheus


      1. `/** 启动 */`
2. `nohup ./prometheus --config.file=prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=60d &`
3. 
4. `/**`
5. `-- storage.tsdb.retention.time 数据默认保存时间为15天，启动时加上此参数可以控制数据保存时间`
6. `*/`
7. 
8. `/** 热加载 */`
9. `curl -X POST http://ip:9090/-/reload`
10. 
11. `/**`
12. `热加载的前提是启动时加了--web.enable-lifecycle`
13. `*/`