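Step 1: Deploy the Prometheus Operator environment
Git repository: https://github.com/prometheus-operator/kube-prometheus.git
Pick the release that matches your Kubernetes version. For example, my cluster runs Kubernetes 1.13, so I chose release-0.1.
The deployment files are all under the manifests/ folder and can be applied in one go.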
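Step 2: Modify the Alertmanager alerting configuration
The built-in alerting setup did not meet my requirements, so I changed it to add email and webhook configuration.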
alertmanager.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '***@163.com'
  smtp_auth_username: '***@163.com'
  smtp_auth_password: '***'
  smtp_hello: '163.com'
  smtp_require_tls: false
route:
  group_by: ['job', 'severity']
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 1m
  receiver: 'webhook'
receivers:
- name: 'default'
  email_configs:
  - to: '****@qq.com'
    send_resolved: true
- name: 'webhook'
  webhook_configs:
  - url: 'http://172.16.3.63:9006/webhook/'
    send_resolved: true
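This configuration defines both an email receiver and a webhook receiver, but the route selects only the webhook receiver, so only webhook alerts are actually sent. The webhook itself is simple to implement, for example: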
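import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Slf4j
@RequestMapping("/webhook")
public class WebHookController {

    // Alertmanager sends the alert group as the request body of this endpoint.
    @RequestMapping("/")
    public String webhook(@RequestBody String body) {
        log.info("webhook alert received, body: {}", body);
        return "success";
    }
}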
All of the alert information arrives in the body; once the application has it, it can do its own follow-up processing.
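As an illustration of that follow-up processing, here is a minimal sketch. It assumes the controller is changed to accept `@RequestBody Map<String, Object>` so that Spring deserializes the JSON for you, and it relies on the documented Alertmanager webhook payload fields `alerts`, `status`, `labels` and `annotations`. The class name `AlertPayloadSummary` and the output format are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a deserialized Alertmanager webhook payload
// into one human-readable line per alert.
final class AlertPayloadSummary {

    @SuppressWarnings("unchecked")
    static List<String> summarize(Map<String, Object> payload) {
        List<String> lines = new ArrayList<>();
        Object alerts = payload.get("alerts");
        if (!(alerts instanceof List)) {
            return lines; // no "alerts" array, nothing to report
        }
        for (Object item : (List<Object>) alerts) {
            Map<String, Object> alert = (Map<String, Object>) item;
            Map<String, Object> labels =
                    (Map<String, Object>) alert.getOrDefault("labels", Map.of());
            Map<String, Object> annotations =
                    (Map<String, Object>) alert.getOrDefault("annotations", Map.of());
            // Fall back to placeholders when a rule omits a label or annotation.
            lines.add(String.format("[%s] %s severity=%s: %s",
                    alert.getOrDefault("status", "unknown"),
                    labels.getOrDefault("alertname", "?"),
                    labels.getOrDefault("severity", "none"),
                    annotations.getOrDefault("summary", "")));
        }
        return lines;
    }
}
```

Each resulting line could then be forwarded to a chat tool or stored in a database, depending on what the second-stage processing needs.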
Step 3: Deploy a PrometheusRule
Once the Prometheus Operator is deployed, a default PrometheusRule configuration already exists. You can list the rules like this:
[root@master manifests]# kubectl get prometheusRule --all-namespaces
NAMESPACE    NAME                   AGE
default      prometheus-k8s-rules   18h
fline        rule                   15h
monitoring   etcd-rules             12h
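Of these, prometheus-k8s-rules is the built-in configuration, and it defines a large number of monitoring rules.
PrometheusRule is a custom resource type that defines the alerting rules Prometheus evaluates; the resulting alerts are sent on to Alertmanager. The key part of a rule is its expression. Here is an example: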
etcd-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
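The expr field in this file is a Prometheus (PromQL) expression. Prometheus evaluates it periodically and raises the alert when the condition holds.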
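To make the arithmetic in that expression concrete, the sketch below mirrors it in plain Java. The class and method names are hypothetical; only the formula comes from the rule. One assumption is modeled explicitly: when no etcd target is down, `count(up{job="etcd"} == 0)` returns an empty result rather than 0, so the comparison matches nothing and no alert can fire.

```java
// Hypothetical model of: count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
final class EtcdQuorumRule {

    static boolean wouldFire(int totalMembers, int downMembers) {
        if (downMembers == 0) {
            // count() over an empty vector yields no sample, so nothing is compared.
            return false;
        }
        // PromQL arithmetic is floating point, so 3 / 2 is 1.5, not 1.
        double threshold = totalMembers / 2.0 - 1;
        return downMembers > threshold;
    }
}
```

For a three-member cluster the threshold is 0.5, so a single member being down satisfies the expression; for five members the threshold is 1.5, so it takes two. In both cases the alert is only sent once the condition has held for the rule's `for: 3m` duration.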
Step 4: Manage PrometheusRule from code with fabric8
Code example:
// Build the PrometheusRule manifest as a YAML string; the indentation inside
// the string literals is significant.
String rule = "apiVersion: monitoring.coreos.com/v1\n" +
        "kind: PrometheusRule\n" +
        "metadata:\n" +
        "  name: " + byId.getAlertName() + "\n" +
        "  namespace: monitoring\n" +
        "  labels:\n" +
        "    prometheus: k8s\n" +
        "    role: alert-rules\n" +
        "spec:\n" +
        "  groups:\n" +
        "  - name: " + byId.getAlertName() + "\n" +
        "    rules:\n" +
        "    - alert: Prometheus scraping errors\n" +
        "      expr: >-\n" +
        "        " + PrometheusExprUtil.getExpr(byId.getTarget(), Double.valueOf(byId.getQuota()), byId.getAppName()) + "\n" +
        "      for: 5m\n" +
        "      labels:\n" +
        "        page: monitoring\n" +
        "        team: monitoring\n" +
        "      annotations:\n" +
        "        summary: " + byId.getAlertDesc() + "\n" +
        "        description: |\n" +
        "          Check failing services";

// Describe the PrometheusRule custom resource so the client knows which API to call.
CustomResourceDefinitionContext crdContext = new CustomResourceDefinitionContext.Builder()
        .withGroup("monitoring.coreos.com")
        .withPlural("prometheusrules")
        .withScope("Namespaced")
        .withVersion("v1")
        .build();
try {
    // Create the rule in the monitoring namespace, or replace it if it already exists.
    kubernetesClient.customResource(crdContext)
            .createOrReplace("monitoring", rule);
} catch (Exception e) {
    e.printStackTrace();
    return ResultVo.renderErr(CodeEnum.ERR).withRemark("Operation failed: " + e.getMessage());
}
Essentially, the code concatenates strings into a valid YAML document and passes it straight in.
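Because the YAML is assembled by hand, a value containing a colon, a `#` or a quote (for example in an alert description) can make the document invalid. One possible way to guard against that is to emit every interpolated value as a double-quoted YAML scalar. The sketch below is a hypothetical helper, not part of fabric8 or of the original project; the names `PrometheusRuleYaml`, `quote` and `build` are assumptions, and the manifest is trimmed to the fields needed to show the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds a single-rule PrometheusRule manifest.
final class PrometheusRuleYaml {

    // Emits a YAML double-quoted scalar, escaping backslashes, quotes and newlines.
    static String quote(String value) {
        String escaped = value.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
        return "\"" + escaped + "\"";
    }

    // One list entry per YAML line keeps the indentation easy to check by eye.
    static String build(String ruleName, String alertName, String expr, String summary) {
        List<String> lines = new ArrayList<>();
        lines.add("apiVersion: monitoring.coreos.com/v1");
        lines.add("kind: PrometheusRule");
        lines.add("metadata:");
        lines.add("  name: " + quote(ruleName));
        lines.add("  namespace: monitoring");
        lines.add("  labels:");
        lines.add("    prometheus: k8s");
        lines.add("    role: alert-rules");
        lines.add("spec:");
        lines.add("  groups:");
        lines.add("  - name: " + quote(ruleName));
        lines.add("    rules:");
        lines.add("    - alert: " + quote(alertName));
        lines.add("      expr: " + quote(expr));
        lines.add("      for: 5m");
        lines.add("      annotations:");
        lines.add("        summary: " + quote(summary));
        return String.join("\n", lines) + "\n";
    }
}
```

The returned string can be passed to the client in the same way as the hand-concatenated one.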
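Summary
fabric8 can operate on custom resources, but this is clearly less convenient than its built-in support for resources such as Deployment and Service.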