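This article uses Prometheus and Grafana to build a monitoring system for hosts, Spring Boot applications and databases (MySQL and Redis).
Grafana is a visualization (dashboard) tool with polished charts and layouts, full-featured dashboards and a graph editor, and support for data sources such as Graphite, Zabbix, InfluxDB and Prometheus.
Download: https://grafana.com/grafana/download
This article covers the Linux version.
On CentOS, install it with: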
```shell
wget https://dl.grafana.com/oss/release/grafana-6.3.3-1.x86_64.rpm
sudo yum localinstall grafana-6.3.3-1.x86_64.rpm
```
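Configuration
After installation, the configuration file is located at /etc/grafana/grafana.ini.
The HTTP port is set there (http_port in the [server] section) and defaults to 3000.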
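To confirm which port Grafana will listen on without reading the whole file, here is a minimal Python sketch using only the standard library. The function name grafana_http_port is hypothetical, and it assumes the packaged grafana.ini ships its defaults commented out with `;`, so a missing or commented http_port means the default 3000 applies.

```python
import configparser


def grafana_http_port(ini_text, default=3000):
    """Return [server] http_port from grafana.ini text.

    Falls back to `default` when the option is missing or commented out with ';'.
    """
    # interpolation=None: values such as root_url may contain %(...)s placeholders.
    # strict=False: tolerate duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Prepend a dummy section in case the file has settings before its first [section].
    parser.read_string("[__top__]\n" + ini_text)
    return parser.getint("server", "http_port", fallback=default)
```

Usage: `grafana_http_port(open("/etc/grafana/grafana.ini", encoding="utf-8").read())`. This is only a sketch: configparser is not a full Grafana config parser (environment overrides such as GF_SERVER_HTTP_PORT are not considered).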
Start Grafana
```shell
/etc/init.d/grafana-server start
```
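Log in to Grafana
Open http://server-IP:3000. The default username and password are admin/admin; on first login Grafana prompts you to change the password, which is recommended.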
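Once logged in, Prometheus is added as a data source through the web UI. The same step can be scripted against Grafana's HTTP API (POST /api/datasources) with the admin account. A minimal Python sketch, assuming basic authentication is enabled (the Grafana default); the function name, the data source name "Prometheus" and the URLs are placeholders:

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Build (without sending) a Grafana HTTP API request that adds a Prometheus data source."""
    body = {
        "name": "Prometheus",      # display name; any unique name works
        "type": "prometheus",
        "url": prometheus_url,     # e.g. http://<prometheus-host>:9090
        "access": "proxy",         # the Grafana server, not the browser, queries Prometheus
        "isDefault": True,
    }
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + credentials,  # HTTP basic auth with the Grafana login
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_datasource_request("http://server-IP:3000", "http://prometheus-host:9090", "admin", "new-password"))`; Grafana replies with a JSON message describing the result.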
Prometheus time-series database architecture:
Download:
https://prometheus.io/download/
The download page also offers many add-ons, such as Alertmanager and exporters like mysqld_exporter, haproxy_exporter and memcached_exporter.
Standard installation and startup
Install:
```shell
# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz

# Extract
tar -zxvf prometheus-2.12.0.linux-amd64.tar.gz
```
Start
```shell
# Change into the extracted directory, then start Prometheus (production setup)
cd prometheus-2.12.0.linux-amd64
nohup ./prometheus --config.file=prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=60d &

# --web.enable-lifecycle: allows hot-reloading the configuration without restarting Prometheus:
#   curl -X POST http://ip:9090/-/reload
# --storage.tsdb.retention.time: data is kept for 15 days by default; this flag sets the retention period
```
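The hot reload can also be triggered from a script instead of curl. A minimal Python sketch (function names are hypothetical) that sends the same POST to /-/reload; it only works if Prometheus was started with --web.enable-lifecycle:

```python
import urllib.request


def reload_request(prometheus_url):
    """Build the equivalent of `curl -X POST http://ip:9090/-/reload`."""
    return urllib.request.Request(prometheus_url.rstrip("/") + "/-/reload", method="POST")


def reload(prometheus_url, timeout=10):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(reload_request(prometheus_url), timeout=timeout) as resp:
        return resp.status
```

For example, `reload("http://127.0.0.1:9090")` after editing prometheus.yml. If the lifecycle API is not enabled, Prometheus rejects the request and urlopen raises urllib.error.HTTPError.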
Installing with Docker (assuming Docker is already installed)
Create a directory and the Prometheus configuration file:
```shell
mkdir /prometheus
vim /prometheus/prometheus.yml
```
Pull the Prometheus image:
```shell
docker pull prom/prometheus
```
Start Prometheus (mounting the configuration file created above):
```shell
docker run -d -p 9090:9090 --name prometheus -v /prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
```
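Parameters:
- `-d` starts the container in detached mode, i.e. in the background; it can then only be stopped with `docker stop`, not with Ctrl+C.
- `-p` maps ports: port 9090 on the host is forwarded to port 9090 of the container.
- `--name` sets the container name.
- `-v` maps a host file into the container (here, the host's prometheus.yml onto /etc/prometheus/prometheus.yml).
- `--config.file` is the Prometheus flag that selects the configuration file inside the container. The command above does not need it, because the prom/prometheus image already defaults to /etc/prometheus/prometheus.yml.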
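When the container is started from a deployment script, the same flags can be kept as an argument list, which avoids shell-quoting problems. A minimal Python sketch (the function name is hypothetical; shlex.join needs Python 3.8+) that reproduces the docker run command for the /prometheus/prometheus.yml file created earlier:

```python
import shlex


def prometheus_docker_argv(host_config, name="prometheus", host_port=9090):
    """Argument list for `docker run`, suitable for subprocess.run(argv, check=True)."""
    return [
        "docker", "run",
        "-d",                          # detached: runs in the background; stop it with `docker stop`
        "-p", f"{host_port}:9090",     # host port -> container port 9090
        "--name", name,                # container name
        "-v", f"{host_config}:/etc/prometheus/prometheus.yml",  # host file -> default config path in the image
        "prom/prometheus",
    ]


if __name__ == "__main__":
    # Print the shell form; pass the list to subprocess.run(..., check=True) to actually start it.
    print(shlex.join(prometheus_docker_argv("/prometheus/prometheus.yml")))
```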
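Writing the Prometheus configuration file
Syntax rules (YAML):
1. Keys are case-sensitive.
2. Indentation expresses hierarchy.
3. Indent with spaces only; tabs are not allowed.
4. The number of spaces does not matter, as long as elements at the same level are left-aligned.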
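Rule 3 is the one that most often breaks prometheus.yml, because some editors insert tabs silently. A minimal Python check (stdlib only, no YAML parser; the function name is hypothetical) that reports lines whose indentation contains a tab:

```python
def tab_indented_lines(yaml_text):
    """Return the 1-based numbers of lines whose leading indentation contains a tab."""
    bad = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:  # tabs inside values are fine; only indentation matters
            bad.append(number)
    return bad
```

This only covers the tab rule. For a full check, `./promtool check config prometheus.yml` (promtool ships in the Prometheus release tarball) validates the whole file, including the rule files it references.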
Sample prometheus.yml
The sample is explained later, once all the components have been put together.
Installing Alertmanager
From source:
```shell
git clone https://github.com/prometheus/alertmanager.git
cd alertmanager
make build
```
Start:
```shell
# alertmanager.yml is also the default configuration file name
./alertmanager --config.file=alertmanager.yml
```
Or download a release from the official site:
```shell
wget https://github.com/prometheus/alertmanager/releases/download/v0.18.0/alertmanager-0.18.0.linux-amd64.tar.gz

tar -zxvf alertmanager-0.18.0.linux-amd64.tar.gz
```
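Start:
```shell
# Change into the extracted directory, then run
cd alertmanager-0.18.0.linux-amd64
nohup ./alertmanager --config.file=alertmanager.yml &
```
Ports: 9093 (web UI and API) and 9094 (cluster communication between Alertmanager instances).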
Configuration file alertmanager.yml:
```yaml
# Global settings
global:
  resolve_timeout: 5m                    # an alert not updated for this long is considered resolved; default 5m
  smtp_smarthost: 'smtp.sina.com:25'     # SMTP server
  smtp_from: '******@sina.com'           # sender address
  smtp_auth_username: '******@sina.com'  # SMTP login
  smtp_auth_password: '******'           # password or authorization code
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'  # WeChat Work API address

# Notification templates
templates:
  - 'template/*.tmpl'

# Routing tree
route:
  group_by: ['alertname']  # labels used to group alerts
  group_wait: 10s          # how long to wait before sending the first notification for a new group
  group_interval: 10s      # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1m      # how often to resend a still-firing alert; for email do not set this too low, or the SMTP server may reject the flood of mail
  receiver: 'email'        # default receiver; must match a name under receivers

# Receivers
receivers:
  - name: 'email'
    email_configs:                                  # email
      - to: '******@163.com'                        # recipient
        html: '{{ template "test.html" . }}'        # body template
        headers: { Subject: "[WARN] Alert email" }  # subject line
    webhook_configs:                                # webhook
      - url: 'http://127.0.0.1:5001'
        send_resolved: true
    wechat_configs:                                 # WeChat Work
      - send_resolved: true
        to_party: '1'                               # ID of the receiving department
        agent_id: '1000002'                         # WeChat Work > custom app > AgentId
        corp_id: '******'                           # My company > CorpId (at the bottom of the page)
        api_secret: '******'                        # WeChat Work > custom app > Secret
        message: '{{ template "test_wechat.html" . }}'  # message template

# Inhibition: while an alert matching source_match is firing, alerts matching target_match
# are muted, provided both alerts have the same values for the labels listed in equal.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```
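The webhook receiver above posts to http://127.0.0.1:5001, so something has to listen there. A minimal Python sketch of such a receiver (stdlib only; names are hypothetical) that prints one line per alert from Alertmanager's JSON payload, whose alerts carry status, labels, annotations and startsAt fields:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Turn an Alertmanager webhook payload into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"[{alert.get('status')}] {labels.get('alertname')} "
            f"instance={labels.get('instance')} value={annotations.get('value')} "
            f"startsAt={alert.get('startsAt')}"
        )
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize(payload):
            print(line)
        self.send_response(200)  # reply 200 so Alertmanager treats the notification as delivered
        self.end_headers()


def serve(host="127.0.0.1", port=5001):
    """Listen where the webhook_configs url points; blocks until interrupted."""
    HTTPServer((host, port), AlertHandler).serve_forever()
```

Call `serve()` to start it. The printed fields mirror the columns of the email template; a real receiver would forward them to a chat or ticket system instead of printing.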
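- Do not set repeat_interval too low when using email: sending too many messages can get them rejected by the SMTP server.
- WeChat Work registration: https://work.weixin.qq.com
- The configuration above uses three notification methods: email, webhook and WeChat Work. Besides webhook and WeChat, Alertmanager's receiver integrations (at the version used here) include:
  - `email_config`
  - `hipchat_config`
  - `pagerduty_config`
  - `pushover_config`
  - `slack_config`
  - `opsgenie_config`
  - `victorops_config`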
.tmpl template configuration
test.tmpl
```
{{ define "test.html" }}
<table border="1">
  <tr>
    <td>Alert</td>
    <td>Instance</td>
    <td>Value</td>
    <td>Start time</td>
  </tr>
  {{ range $i, $alert := .Alerts }}
  <tr>
    <td>{{ index $alert.Labels "alertname" }}</td>
    <td>{{ index $alert.Labels "instance" }}</td>
    <td>{{ index $alert.Annotations "value" }}</td>
    <td>{{ $alert.StartsAt }}</td>
  </tr>
  {{ end }}
</table>
{{ end }}
```
In the template, Labels are the labels attached to the alert in Prometheus, and Annotations are the annotations defined in the alerting rule.
test_wechat.tmpl (the name in define must match the template referenced by message in alertmanager.yml):
```
{{ define "test_wechat.html" }}
{{ range $i, $alert := .Alerts.Firing }}
[Alert]: {{ index $alert.Labels "alertname" }}
[Instance]: {{ index $alert.Labels "instance" }}
[Value]: {{ index $alert.Annotations "value" }}
[Start time]: {{ $alert.StartsAt }}
{{ end }}
{{ end }}
```
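Unlike the email template, range here iterates only over .Alerts.Firing, i.e. alerts that are still firing. Without this, alerts that have already been resolved would also be sent to WeChat Work (send_resolved is enabled).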
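A custom webhook receiver has to do the same filtering itself, because each alert in the JSON payload carries a status of "firing" or "resolved". A minimal Python sketch (the function name is hypothetical) mirroring .Alerts.Firing and .Alerts.Resolved:

```python
def split_alerts(alerts):
    """Split webhook alerts the way .Alerts.Firing / .Alerts.Resolved do in a template."""
    firing = [a for a in alerts if a.get("status") == "firing"]
    resolved = [a for a in alerts if a.get("status") == "resolved"]
    return firing, resolved
```

Sending only the first list reproduces the behaviour of the WeChat template; sending both, labelled differently, gives "alert" and "recovered" messages.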
Defining alert rules in Prometheus
Sample alertmanager_rules.yml (in the same directory as prometheus.yml):
```yaml
groups:
  - name: test-rules
    rules:
      - alert: InstanceDown      # alert name
        expr: up == 0            # condition; see the PromQL documentation for more advanced queries
        for: 2m                  # how long the condition must hold before the alert fires
        labels:                  # extra labels attached to the alert
          team: node
        annotations:             # human-readable details about the alert
          summary: "{{ $labels.instance }}: has been down"
          description: "{{ $labels.instance }}: job {{ $labels.job }} has been down"
          value: "{{ $value }}"  # must be quoted: an unquoted {{ is parsed by YAML as a mapping
```
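The three states of an alert:
- inactive: the rule's expression returns no result, so the alert is neither pending nor firing.
- pending: the condition is true, but has not yet held for the duration set by `for`.
- firing: the condition has held for at least the `for` duration; the alert is sent to Alertmanager.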
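The transitions can be summarized as a small function of "since when has the condition been true". A minimal Python sketch (names are hypothetical, and it simplifies Prometheus, which only re-checks the state at each evaluation_interval, so in practice firing starts at the first evaluation after `for` has elapsed):

```python
from datetime import datetime, timedelta


def alert_state(active_since, now, for_duration):
    """Classify an alert as 'inactive', 'pending' or 'firing'.

    active_since: when the rule's expr first returned this series, or None if it
    returns nothing now. for_duration: the rule's `for` value (timedelta(0) if absent).
    """
    if active_since is None:
        return "inactive"
    if now - active_since >= for_duration:
        return "firing"
    return "pending"


if __name__ == "__main__":
    start = datetime(2019, 9, 1, 12, 0)
    for minutes in (0, 1, 2):
        # with `for: 2m`: pending, pending, firing
        print(minutes, alert_state(start, start + timedelta(minutes=minutes), timedelta(minutes=2)))
```

A rule without `for` (duration zero) therefore goes straight from inactive to firing.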
Sending messages via DingTalk
Project: https://github.com/timonwong/prometheus-webhook-dingtalk. It can also be deployed with Docker, using the image from either registry:
- DockerHub: https://hub.docker.com/r/timonwong/prometheus-webhook-dingtalk/
- Quay.io: https://quay.io/repository/timonwong/prometheus-webhook-dingtalk
From source:
```shell
yum install git
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
```
The template prometheus-webhook-dingtalk uses for DingTalk alerts is src/github.com/timonwong/prometheus-webhook-dingtalk/template/default.tmpl; edit it as needed.
Start prometheus-webhook-dingtalk:
```shell
# Write both stdout and stderr to dingding.log
nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=xxx" > dingding.log 2>&1 &
```
It listens on port 8060. To avoid passing the robot URL on every start, you can put it in /etc/systemd/system/prometheus-webhook-dingtalk.service instead.
For how to add a robot URL, see https://www.jianshu.com/p/a3c62eb71ae3. Several robots can be configured at once:
```shell
prometheus-webhook-dingtalk \
  --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx" \
  --ding.profile="webhook2=https://oapi.dingtalk.com/robot/send?access_token=yyyyyyyyyyy"
```
This defines two webhooks, webhook1 and webhook2, which send alerts to different DingTalk groups; see https://theo.im/blog/2017/10/16/release-prometheus-alertmanager-webhook-for-dingtalk/
Then add the webhook configuration to alertmanager.yml:
```yaml
global:
  resolve_timeout: 5m
route:
  receiver: webhook
  group_wait: 3s
  group_interval: 5s
  repeat_interval: 5m
  group_by: [alertname]
  routes:
    - receiver: webhook
      group_wait: 10s
      match:
        team: node
receivers:
  - name: webhook
    webhook_configs:
      - url: http://localhost:8060/dingtalk/ops_dingding/send
        send_resolved: true
```
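The webhook URL is built from the profile name (ops_dingding in /dingtalk/ops_dingding/send), so each --ding.profile gets its own path. To test the bridge without waiting for a real alert, a hand-made Alertmanager-style payload can be posted to it. A minimal Python sketch; the function names are hypothetical, and the payload fields are an assumption modelled on Alertmanager's webhook format, so the bridge version you run may expect slightly different fields:

```python
import json
import urllib.request


def dingtalk_send_url(profile, bridge="http://localhost:8060"):
    """URL served by prometheus-webhook-dingtalk for a given --ding.profile name."""
    return f"{bridge.rstrip('/')}/dingtalk/{profile}/send"


def fake_alert_request(profile, bridge="http://localhost:8060"):
    """Build a hand-made, Alertmanager-style webhook POST for testing the bridge."""
    alert = {
        "status": "firing",
        "labels": {"alertname": "TestAlert", "team": "node"},
        "annotations": {"summary": "test message from a script"},
        "startsAt": "2019-09-01T00:00:00Z",
        "endsAt": "0001-01-01T00:00:00Z",
        "generatorURL": "",
    }
    payload = {
        "version": "4",
        "groupKey": "test",
        "status": "firing",
        "receiver": "webhook",
        "groupLabels": {"alertname": "TestAlert"},
        "commonLabels": alert["labels"],
        "commonAnnotations": alert["annotations"],
        "externalURL": "http://localhost:9093",
        "alerts": [alert],
    }
    return urllib.request.Request(
        dingtalk_send_url(profile, bridge),
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(fake_alert_request("ops_dingding"))`; if the robot is configured correctly, a message appears in the DingTalk group.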
Monitoring Linux hosts
Download:
```shell
# Download
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

# Extract
tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz
```
Install and start:
```shell
# Start node_exporter (listens on port 9100 by default)
cd node_exporter-0.18.1.linux-amd64
nohup ./node_exporter &
```
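node_exporter serves plain text at http://host:9100/metrics: `# HELP` and `# TYPE` comment lines followed by samples of the form `name{label="value",...} value`. A minimal Python sketch (the function name is hypothetical) that parses the simple cases, which is handy for checking an exporter from a script. It ignores timestamps and does not handle label values containing `}` or escaped quotes:

```python
import re

# name, optional {labels}, value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_samples(text):
    """Parse simple Prometheus text-format lines into (name, labels, value) tuples."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
        out.append((name, labels, float(value)))  # float() also accepts NaN and +Inf
    return out
```

Usage: `parse_samples(urllib.request.urlopen("http://127.0.0.1:9100/metrics").read().decode())`.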
Monitoring MySQL
Download mysqld_exporter for MySQL, again from the official site:
```shell
# Download
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.12.1/mysqld_exporter-0.12.1.linux-amd64.tar.gz

# Extract
tar -zxvf mysqld_exporter-0.12.1.linux-amd64.tar.gz
```
Monitoring account and configuration file:
```sql
-- Create the account
CREATE USER 'mysql_monitor'@'localhost' IDENTIFIED BY 'aA&12345';
-- or, to allow connections from the 192.168.1.x network:
-- CREATE USER 'mysql_monitor_user'@'192.168.1.%' IDENTIFIED BY 'aA&12345';

-- Grant privileges
GRANT REPLICATION CLIENT, PROCESS ON *.* TO 'mysql_monitor'@'localhost';
GRANT SELECT ON performance_schema.* TO 'mysql_monitor'@'localhost';

FLUSH PRIVILEGES;

-- Note: required privileges differ between versions. Check the exporter log after starting it;
-- if it reports missing privileges, grant them or create the corresponding account.
```
Edit the configuration file:
```shell
cd mysqld_exporter-0.12.1.linux-amd64

vim .my.cnf
```
Add the following to .my.cnf:
```ini
[client]
port=3306
user=mysql_monitor
password=aA&12345
```
Start:
```shell
# mysqld_exporter listens on port 9104 by default
nohup ./mysqld_exporter --config.my-cnf=.my.cnf &
```
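A quick way to tell whether the exporter can actually reach MySQL is its mysql_up gauge: it is 1 when the last connection attempt succeeded and 0 when it failed (for example with the "not allowed to connect" error described below). A minimal Python sketch (function names are hypothetical) that reads it from http://127.0.0.1:9104/metrics:

```python
import urllib.request


def fetch_metrics(url="http://127.0.0.1:9104/metrics", timeout=5):
    """Download the exporter's metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def mysql_is_up(metrics_text):
    """True if mysql_up is 1, False if it is 0, None if the metric is absent."""
    for line in metrics_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "mysql_up":
            return float(parts[1]) == 1.0
    return None
```

Usage: `mysql_is_up(fetch_metrics())`. If it returns False, check nohup.out for the connection error.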
In practice I used the root account, and nohup.out reported `Host '127.0.0.1' is not allowed to connect to this MySQL server`. The fix:
```
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| infosys_login |
| infosys_test |
| mms |
| mysql |
| performance_schema |
| sys |
| test |
| zabbix |
| zm_doc |
+--------------------+
10 rows in set (0.00 sec)

mysql> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select host,user form mysql
    -> ;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'mysql' at line 1
mysql> show tables;
+---------------------------+
| Tables_in_mysql |
+---------------------------+
| columns_priv |
| db |
| engine_cost |
| event |
| func |
| general_log |
| gtid_executed |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| innodb_index_stats |
| innodb_table_stats |
| ndb_binlog_index |
| plugin |
| proc |
| procs_priv |
| proxies_priv |
| server_cost |
| servers |
| slave_master_info |
| slave_relay_log_info |
| slave_worker_info |
| slow_log |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| user |
+---------------------------+
31 rows in set (0.00 sec)

mysql> select Host, User,Password from user;
ERROR 1054 (42S22): Unknown column 'Password' in 'field list'
mysql> select Host, User from user;
+---------------------+--------------------+
| Host | User |
+---------------------+--------------------+
| 192.168.1.% | infosys_test |
| 192.168.1.% | mysql_monitor_user |
| 192.168.1.% | root |
| 192.168.1.163 | test1664 |
| 192.168.1.164 | host164 |
| 192.168.1.164 | test123 |
| 192.168.1.164 | test14 |
| 192.168.1.164 | test1669 |
| localhost | mysql.session |
| localhost | mysql.sys |
| localhost | mysql_monitor |
| localhost | root |
| ‘192.168.1.164’ | test14 |
+---------------------+--------------------+
13 rows in set (0.00 sec)

mysql> grant all privileges on *.* to root@"127.0.0.1" identified by "123423$*MD7369qwezxc" with grant option;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
```
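Problem solved: the user table had root entries only for localhost and 192.168.1.%, not for 127.0.0.1, which is the host the exporter connected from. Note that GRANT ... IDENTIFIED BY, as used above, only works up to MySQL 5.7; on MySQL 8.0, create the user with CREATE USER first and then GRANT the privileges.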
Monitoring Redis
redis_exporter is not on the official download page, but it is available on GitHub. The exporter does not need to run on the Redis host itself:
```shell
# Download
wget https://github.com/oliver006/redis_exporter/releases/download/v0.30.0/redis_exporter-v0.30.0.linux-amd64.tar.gz
# Extract
tar -zxvf redis_exporter-v0.30.0.linux-amd64.tar.gz
```
Start:
```shell
# Redis without a password
nohup ./redis_exporter -redis.addr=192.168.56.118:6379 -web.listen-address 0.0.0.0:9121 &

# Redis with a password
nohup ./redis_exporter -redis.addr=192.168.1.136:6379 -redis.password reRedis123 -web.listen-address 0.0.0.0:9122 &

# -web.listen-address sets the port the exporter listens on (9121 by default)
```
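With one exporter process per Redis instance, each on its own port, the command lines are easy to generate from a list. A minimal Python sketch (names are hypothetical) using the addresses and ports from the commands above. It writes each flag as `-flag=value`, which Go's flag package treats the same as `-flag value`:

```python
def redis_exporter_argv(redis_addr, listen_port, password=None):
    """Command line for one redis_exporter process."""
    argv = ["./redis_exporter", f"-redis.addr={redis_addr}"]
    if password:
        argv.append(f"-redis.password={password}")
    argv.append(f"-web.listen-address=0.0.0.0:{listen_port}")
    return argv


# One exporter per Redis instance, each listening on its own port.
INSTANCES = [
    ("192.168.56.118:6379", 9121, None),
    ("192.168.1.136:6379", 9122, "reRedis123"),
]
COMMANDS = [redis_exporter_argv(addr, port, pw) for addr, port, pw in INSTANCES]
```

Each entry of COMMANDS can be started with `subprocess.Popen(argv)`. Keep in mind that a password passed on the command line is visible to other users via `ps`.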
Monitoring Spring Boot applications
First add the pom dependencies.
Spring Boot 1:
```xml
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_spring_boot</artifactId>
    <version>0.1.0</version>
</dependency>
```
Spring Boot 2:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
Define custom metrics as needed.
Annotations on the application class (Spring Boot 1):
```java
@EnablePrometheusEndpoint
@EnableSpringBootMetricsCollector
```
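Add to the configuration file (Spring Boot 1):
```properties
# Disable Actuator security so the endpoint can be scraped without credentials
management.security.enabled=false
spring.application.name=microservice-prometheus
```
For Spring Boot 2, see: https://segmentfault.com/a/1190000018642077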
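Prometheus scrapes /metrics unless a job sets metrics_path, but these two setups serve the Prometheus text format elsewhere: /prometheus for Spring Boot 1 with simpleclient_spring_boot, and /actuator/prometheus for Spring Boot 2, where the endpoint must also be exposed, typically with `management.endpoints.web.exposure.include=prometheus`. A minimal Python sketch (function names are hypothetical; default management context paths are assumed) that renders a scrape job with the matching path:

```python
def spring_boot_metrics_path(boot_major):
    """Path of the Prometheus endpoint for the two setups described above."""
    # Spring Boot 1 + simpleclient_spring_boot (@EnablePrometheusEndpoint): /prometheus
    # Spring Boot 2 + micrometer-registry-prometheus (Actuator): /actuator/prometheus
    return "/prometheus" if boot_major == 1 else "/actuator/prometheus"


def scrape_job(job_name, target, boot_major):
    """Render a prometheus.yml scrape job (indented for scrape_configs) with metrics_path set."""
    return (
        f"  - job_name: '{job_name}'\n"
        f"    metrics_path: '{spring_boot_metrics_path(boot_major)}'\n"
        "    static_configs:\n"
        f"      - targets: ['{target}']\n"
    )
```

For example, `print(scrape_job("spring-boot", "192.168.1.208:8080", 2))` prints a job that can be pasted under scrape_configs.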
Adding the scrape targets (prometheus.yml):
```yaml
# Prometheus global settings
global:
  scrape_interval: 15s      # how often to scrape targets; default 1m
  evaluation_interval: 15s  # how often to evaluate rules; default 1m
  scrape_timeout: 15s       # scrape timeout; default 10s (must not exceed scrape_interval)
  external_labels:          # labels added to series and alerts sent to external systems (federation, remote storage, Alertmanager)
    monitor: 'codelab_monitor'

# Alertmanager settings
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]  # address (IP and port) Alertmanager listens on

# Rule files: loaded at startup and on reload, then evaluated every evaluation_interval
rule_files:
  - "alertmanager_rules.yml"
  - "prometheus_rules.yml"

# Scrape settings
scrape_configs:
  - job_name: 'prometheus'          # added as the job label to every scraped series; usable in queries
    scrape_interval: 15s            # per-job scrape interval; defaults to the global setting
    static_configs:                 # static target list
      - targets: ['localhost:9090'] # endpoints to scrape, i.e. the instances

  - job_name: 'OS'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: '192.168.1.163'
      - targets: ['192.168.56.116:9100']
        labels:
          instance: '192.168.56.116'

      - targets: ['192.168.56.117:9100']
        labels:
          instance: '192.168.56.117'
  ## The job above monitors the hosts only; each host has its own instance label

  - job_name: 'mysql'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.56.116:9104']
        labels:
          instance: '192.168.56.116'

      - targets: ['192.168.56.117:9104']
        labels:
          instance: '192.168.56.117'

  ## The job above monitors MySQL; its instance labels match those of the host job
  - job_name: 'redis'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.56.118:9121','192.168.56.118:9122']
        labels:
          instance: '192.168.56.118'

      - targets: ['192.168.56.118:9100']
        labels:
          instance: '192.168.56.118'
  # As above, a Redis host and its Redis exporters can be grouped under the same instance label
```
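prometheus_rules.yml:
```yaml
groups:
  - name: example
    rules:
      - record: cpu_utilization_ratio  # name of the new recorded series
        # CPU utilization in percent: 100 minus the average idle rate across the instance's CPUs
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```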
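The arithmetic behind this recording rule: node_cpu_seconds_total{mode="idle"} is a counter of idle seconds per CPU, so its per-second rate is the idle fraction of that CPU. Averaging over the instance's CPUs and subtracting from 100 gives the busy percentage. A minimal Python sketch of that calculation from two scrapes (the function name is hypothetical; like irate() it uses only two samples, and counter resets are ignored):

```python
def cpu_utilization_percent(idle_before, idle_after, seconds):
    """CPU busy percentage for one instance from two scrapes of the idle counter.

    idle_before / idle_after map a cpu label to node_cpu_seconds_total{mode="idle"};
    seconds is the time between the two scrapes.
    """
    idle_rates = [(idle_after[cpu] - idle_before[cpu]) / seconds for cpu in idle_before]
    avg_idle = sum(idle_rates) / len(idle_rates)  # the avg by (instance) step
    return 100 - avg_idle * 100
```

For example, two CPUs that were idle 9 s and 7 s out of a 10 s interval average 80% idle, i.e. about 20% utilization.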
alertmanager_rules.yml: identical to the sample alertmanager_rules.yml shown earlier.
The complete prometheus.yml after formatting:
```yaml
global:
  scrape_interval: 15s      # how often to scrape targets; default 1m
  evaluation_interval: 15s  # how often to evaluate rules; default 1m
  scrape_timeout: 15s       # scrape timeout; default 10s
  external_labels:          # labels added to series and alerts sent to external systems
    monitor: 'codelab_monitor'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "alertmanager_rules.yml"
  - "prometheus_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'OS'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: '192.168.1.163'
      - targets: ['192.168.1.164:9100']
        labels:
          instance: '192.168.1.164'

  - job_name: 'mysql'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.1.163:9104']
        labels:
          instance: '192.168.1.163'
      - targets: ['192.168.1.164:9104']
        labels:
          instance: '192.168.1.164'

  - job_name: spring-boot
    # The default metrics_path is /metrics; set it to /prometheus (Spring Boot 1 with
    # simpleclient_spring_boot) or /actuator/prometheus (Spring Boot 2 with Micrometer).
    static_configs:
      - targets: ['192.168.1.208:8080']

  - job_name: 'redis'
    static_configs:
      - targets: ['192.168.1.136:9122']
        labels:
          instance: '192.168.1.136'
```
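Check the file with the online YAML editor at http://www.bejson.com/validators/yaml_editor/, then start Prometheus and, after later changes, hot-reload it: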
```shell
# Start
nohup ./prometheus --config.file=prometheus.yml --web.enable-lifecycle --storage.tsdb.retention.time=60d &

# --storage.tsdb.retention.time: data is kept for 15 days by default; this flag sets the retention period

# Hot reload
curl -X POST http://ip:9090/-/reload

# Hot reload only works if Prometheus was started with --web.enable-lifecycle
```
Open http://192.168.1.163:9090/targets in a browser:
After stopping mysqld_exporter:
After adding a few more hosts:
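The information on the /targets page is also available as JSON from the HTTP API at /api/v1/targets, where each active target has a health field ("up", "down" or "unknown") and a lastError message. A minimal Python sketch (the function name is hypothetical) that lists the unhealthy ones, such as the stopped mysqld_exporter:

```python
import json


def unhealthy_targets(targets_body):
    """Parse the JSON body of GET /api/v1/targets; return (job, instance, lastError) for targets not 'up'."""
    data = json.loads(targets_body)["data"]
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in data.get("activeTargets", [])
        if t.get("health") != "up"
    ]
```

Usage: `unhealthy_targets(urllib.request.urlopen("http://192.168.1.163:9090/api/v1/targets").read())`. For alerting on the same condition, the `up == 0` rule shown earlier is the better tool; this is meant for ad-hoc checks.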
Grafana dashboards can be downloaded from:
https://grafana.com/grafana/dashboards
Workflow:
For host monitoring I chose dashboard 9276:
Import:
After importing:
Alternatively, use: https://github.com/percona/grafana-dashboards/blob/master/dashboards/System_Overview.json
MySQL dashboards
I used: https://github.com/percona/grafana-dashboards/blob/master/dashboards/MySQL_Overview.json
Redis dashboards
Depending on your version, you can also use https://grafana.com/grafana/dashboards/763. Import: