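SRE Daily Topic: Higress Cloud-Native Gateway Deployment and Production Practice

Date: 2026-03-13
Topic index: 1 (13 % 12 = 1)
Difficulty: ⭐⭐⭐⭐
Applies to: deploying a cloud-native gateway in production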

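1. Higress Overview

Higress is a cloud-native gateway open-sourced by Alibaba and built on Envoy and Istio. It serves as:

Traffic gateway: entry point for north-south traffic
Microservice gateway: east-west service governance
Security gateway: WAF, authentication, rate limiting
AI gateway: unified access to large-model APIs

Key advantages

Feature	Description
High performance	Built on Envoy; 100,000+ QPS per node (depends on hardware and workload)
Hot reload	Configuration changes take effect without a restart
Multi-protocol	HTTP/HTTPS/gRPC/Dubbo
Observability	Built-in Prometheus metrics
Plugins	Extensible through WASM plugins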

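2. Production Deployment

2.1 Prerequisites

# Kubernetes version (the --short flag has been removed from recent kubectl releases)
kubectl version
# Required: v1.20+

# Helm version
helm version
# Required: v3.0+

# Node resources (minimum for production)
# CPU: 4 cores × 3 nodes
# Memory: 8Gi × 3 nodes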
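
As a rough sanity check on these minimums, the sketch below adds up the resource requests used in the values file of section 2.4 (3 gateway replicas at 2 CPU / 4Gi and 2 controller replicas at 0.5 CPU / 0.5Gi) and compares them with a 3-node cluster of 4 cores / 8Gi each. It ignores system reservations and any other workloads, so it is back-of-the-envelope arithmetic rather than a scheduler simulation; the function name is illustrative.

```python
def total_requests(workloads):
    """Sum CPU (cores) and memory (GiB) requests over (replicas, cpu, mem) tuples."""
    cpu = sum(replicas * cpu_each for replicas, cpu_each, _ in workloads)
    mem = sum(replicas * mem_each for replicas, _, mem_each in workloads)
    return cpu, mem


# (replicas, CPU cores requested, GiB requested), taken from the values file in 2.4
workloads = [
    (3, 2.0, 4.0),  # gateway
    (2, 0.5, 0.5),  # controller
]
cluster_cpu, cluster_mem = 3 * 4, 3 * 8  # 3 nodes x 4 cores / 8Gi

cpu, mem = total_requests(workloads)
print(f"requested {cpu} of {cluster_cpu} cores, {mem} of {cluster_mem} GiB")
print("fits" if cpu <= cluster_cpu and mem <= cluster_mem else "does not fit")
```

Note that the gateway limits in 2.4 (4 CPU / 8Gi) equal the full capacity of one minimum-size node, so a gateway Pod can never actually reach its limits on such a node.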

2.2 Add the Helm chart repository

helm repo add higress https://higress.io/helm-charts
helm repo update

2.3 Create the namespace

kubectl create namespace higress-system

2.4 Production values.yaml

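# higress-production-values.yaml
# Check key names against `helm show values higress/higress` for your chart version.

# ========== Global ==========
global:
  # Image registry (Alibaba Cloud mirror, for use in mainland China)
  imageRepository: registry.cn-hangzhou.aliyuncs.com/higress
  # Image pull policy
  imagePullPolicy: IfNotPresent

# ========== Gateway ==========
gateway:
  # Replica count (at least 3 in production)
  replicas: 3

  # Resource limits (critical: prevents OOM kills)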
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"
    limits:
      cpu: "4"
      memory: "8Gi"

  # Autoscaling
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80

  # Pod anti-affinity (spread replicas across nodes)
  antiAffinity:
    enabled: true
    type: "preferred"

  # Tolerations (allow scheduling onto control-plane nodes if needed;
  # newer Kubernetes releases use the key node-role.kubernetes.io/control-plane)
  tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"

  # Node selector
  nodeSelector:
    gateway-node: "true"

  # Health checks
  livenessProbe:
    httpGet:
      path: /healthz/ready
      port: 15021
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  readinessProbe:
    httpGet:
      path: /healthz/ready
      port: 15021
    initialDelaySeconds: 5
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  # Service (LoadBalancer type)
  service:
    type: LoadBalancer
    # Alibaba Cloud SLB annotations
    annotations:
      service.beta.kubernetes.io/alibaba-cloud-loadbalancer-type: "nlb"
      service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: "slb.s3.small"
      service.beta.kubernetes.io/alibaba-cloud-loadbalancer-charge-type: "paybytraffic"
    # External IP (when a fixed IP is used)
    # loadBalancerIP: "192.168.1.100"
    ports:
      - name: http2
        port: 80
        targetPort: 80
        protocol: TCP
      - name: https
        port: 443
        targetPort: 443
        protocol: TCP

  # Logging
  logging:
    level: "warning"  # production: warning; troubleshooting: debug
    format: "json"    # JSON is easier for log collectors to parse

# ========== Controller ==========
controller:
  replicas: 2

  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "1Gi"

  # Leader election
  leaderElection:
    enabled: true
    leaseDuration: 30s
    renewDeadline: 20s
    retryPeriod: 5s

# ========== Monitoring ==========
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: higress-system
    interval: 30s
    scrapeTimeout: 10s

# ========== TLS/SSL ==========
tls:
  # Automatic certificates (Let's Encrypt)
  autoCert:
    enabled: true
    email: "admin@example.com"
    server: "https://acme-v02.api.letsencrypt.org/directory"
  # Or reference a manually managed certificate Secret
  # secretName: "higress-tls"

# ========== Rate limiting ==========
rateLimit:
  enabled: true
  redis:
    # Redis address (use a dedicated Redis instance in production)
    host: "redis-master.redis.svc.cluster.local"
    port: 6379
    password: "your-redis-password"  # placeholder; do not commit real credentials
    db: 0

# ========== WAF ==========
waf:
  enabled: true
  # Custom rules
  customRules:
    - name: "block-sql-injection"
      action: "block"
      conditions:
        - field: "uri_query"
          operator: "contains"
          value: "union select"
        - field: "uri_query"
          operator: "contains"
          value: "or 1=1"

# ========== Authentication ==========
auth:
  enabled: true
  # JWT authentication
  jwt:
    enabled: true
    issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
    audiences:
      - "higress-gateway"

2.5 Install

# Install Higress (production)
helm install higress higress/higress \
  -n higress-system \
  -f higress-production-values.yaml \
  --wait \
  --timeout 10m

# Verify the deployment
kubectl get pods -n higress-system
kubectl get svc -n higress-system

# Show release details
helm status higress -n higress-system

2.6 Upgrade

# Rolling upgrade (intended to be zero-downtime)
helm upgrade higress higress/higress \
  -n higress-system \
  -f higress-production-values.yaml \
  --reuse-values \
  --wait

# Roll back to the previous revision
helm rollback higress -n higress-system

# Show release history
helm history higress -n higress-system

3. Routing Examples

3.1 Basic HTTP routing

# http-route.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app
  namespace: default
  annotations:
    kubernetes.io/ingress.class: higress
    # pathType (set per path below) is one of Exact, Prefix, ImplementationSpecific
    # rewrite-target replaces the matched path with / before forwarding
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080

3.2 Canary release

# canary-release.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-canary
  namespace: default
  annotations:
    kubernetes.io/ingress.class: higress
    # Canary by header: requests carrying X-Canary: true go to the canary backend
    higress.io/canary: "true"
    higress.io/canary-by-header: "X-Canary"
    higress.io/canary-by-header-value: "true"
    # Or split by weight (10% of traffic)
    # higress.io/canary-weight: "10"
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service-v2
            port:
              number: 80
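
The routing decision those canary annotations describe can be modelled in a few lines. This is only an illustrative model, not Higress's implementation: it assumes the header rule takes precedence over the weight rule (the precedence ingress-nginx documents for the equivalent annotations), and `pick_backend` and its parameters are made-up names.

```python
import random


def pick_backend(headers, header="X-Canary", value="true", weight=None, rng=random.random):
    """Return 'canary' or 'stable' for one request.

    headers: dict of request headers (exact-case keys here; real gateways
    compare header names case-insensitively).
    weight: optional percentage (0-100) of remaining traffic sent to the canary.
    """
    if headers.get(header) == value:
        return "canary"
    if weight is not None and rng() * 100 < weight:
        return "canary"
    return "stable"


print(pick_backend({"X-Canary": "true"}))
print(pick_backend({"User-Agent": "curl"}))
```

Passing a fixed `rng` (for example `rng=lambda: 0.05`) makes the weighted branch deterministic, which is convenient when testing the rule.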

3.3 gRPC routing

# grpc-route.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-service
  namespace: default
  annotations:
    kubernetes.io/ingress.class: higress
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  tls:
  - hosts:
    - grpc.example.com
    secretName: grpc-tls
  rules:
  - host: grpc.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grpc-backend
            port:
              number: 50051

3.4 WebSocket support

# websocket-route.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-app
  namespace: default
  annotations:
    kubernetes.io/ingress.class: higress
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  rules:
  - host: ws.example.com
    http:
      paths:
      - path: /ws
        pathType: Prefix
        backend:
          service:
            name: websocket-service
            port:
              number: 8080

4. Key Parameter Tuning

4.1 Envoy connection parameters

# Add under gateway.extraEnvoyConfig in values.yaml
gateway:
  extraEnvoyConfig: |
    # Connect timeout
    connect_timeout: 5s

    # Connection pool
    max_connections: 1024
    max_pending_requests: 1024
    max_requests: 1024
    max_retries: 3

    # HTTP/2
    http2_protocol_options:
      max_concurrent_streams: 100
      initial_stream_window_size: 65536
      initial_connection_window_size: 1048576

    # Keepalive
    keepalive:
      time: 30s
      interval: 10s
      timeout: 5s

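4.2 Timeouts (recommended production values)

Parameter	Recommended value	Description
connect_timeout	5s	Timeout for establishing a connection
request_timeout	60s	Total request timeout
idle_timeout	300s	Idle connection timeout
stream_idle_timeout	30s	Idle stream timeout
max_stream_duration	3600s	Maximum stream duration (WebSocket)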

4.3 Rate limiting

# Global rate limit
apiVersion: networking.higress.io/v1
kind: HigressRateLimit
metadata:
  name: global-rate-limit
  namespace: higress-system
spec:
  # Rate-limit scope: global, route, cluster
  domain: higress
  descriptors:
  - key: remote_address
    rate_limit:
      unit: second
      requests_per_unit: 100  # 100 requests per second per IP
  - key: header_match
    value: "api-key"
    rate_limit:
      unit: minute
      requests_per_unit: 1000  # 1000 requests per minute per API key
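
To illustrate what "100 requests per second per IP" means in practice, here is a minimal fixed-window counter. It is a sketch of the general idea only: the gateway's actual limiter may use a different algorithm, and the class name and the sample IP are hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each window of `window_s` seconds."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        # (key, window index) -> requests seen; old windows are never evicted in this sketch
        self.counts = defaultdict(int)

    def allow(self, key, now):
        bucket = (key, int(now // self.window_s))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


per_ip = FixedWindowLimiter(limit=100, window_s=1)
results = [per_ip.allow("203.0.113.7", now=12.3) for _ in range(101)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

The per-API-key rule would be the same object with `limit=1000, window_s=60`, keyed by the header value instead of the client IP.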

4.4 Circuit breaking

# Circuit breaker (HigressRoute)
apiVersion: networking.higress.io/v1
kind: HigressRoute
metadata:
  name: api-route
  namespace: default
spec:
  hosts:
  - "api.example.com"
  routes:
  - match:
      uri:
        prefix: /api
    route:
    - destination:
        host: api-service
        port: 8080
    # Circuit breaking via outlier detection
    outlierDetection:
      consecutive5xxErrors: 5      # eject after 5 consecutive 5xx errors
      interval: 30s                 # analysis interval
      baseEjectionTime: 30s         # base ejection time
      maxEjectionPercent: 50        # eject at most 50% of instances
      minHealthPercent: 30          # minimum percentage of healthy instances
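
The interaction between `consecutive5xxErrors` and `maxEjectionPercent` can be sketched as follows. This models only the threshold and the percentage cap; Envoy's real outlier detection is time-based and has further rules, and the function name and host names are hypothetical.

```python
def hosts_to_eject(consecutive_5xx, threshold=5, max_ejection_percent=50):
    """Pick hosts to eject: those at or above the threshold, worst first, capped by percent.

    consecutive_5xx maps host -> current run of consecutive 5xx responses.
    """
    offenders = sorted(
        (host for host, errors in consecutive_5xx.items() if errors >= threshold),
        key=lambda host: consecutive_5xx[host],
        reverse=True,
    )
    cap = len(consecutive_5xx) * max_ejection_percent // 100
    return offenders[:cap]


runs = {"pod-a": 6, "pod-b": 5, "pod-c": 7, "pod-d": 0}
print(hosts_to_eject(runs))
```

Three of the four hosts exceed the threshold here, but the 50% cap means only the two worst are ejected, so half of the pool keeps serving traffic.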

5. Monitoring and Alerting

5.1 Prometheus metrics

# Key metrics
# Request volume
higress_gateway_requests_total{route, status_code}

# Latency (histogram; the queries in 5.2 and 5.3 read the _bucket series)
higress_gateway_request_duration_seconds_bucket{route, le}

# Connections
higress_gateway_connections_active
higress_gateway_connections_total

# Rate limiting
higress_gateway_rate_limited_requests_total

# Circuit breaking
higress_gateway_circuit_breaker_open

# Certificates
higress_gateway_ssl_cert_expiry_timestamp_seconds

5.2 Grafana dashboard

{
  "dashboard": {
    "title": "Higress Gateway Monitoring",
    "panels": [
      {
        "title": "QPS",
        "targets": [{
          "expr": "sum(rate(higress_gateway_requests_total[1m]))"
        }]
      },
      {
        "title": "P99 latency",
        "targets": [{
          "expr": "histogram_quantile(0.99, sum by (le) (rate(higress_gateway_request_duration_seconds_bucket[5m])))"
        }]
      },
      {
        "title": "Error rate",
        "targets": [{
          "expr": "sum(rate(higress_gateway_requests_total{status_code=~\"5..\"}[5m])) / sum(rate(higress_gateway_requests_total[5m]))"
        }]
      },
      {
        "title": "Active connections",
        "targets": [{
          "expr": "higress_gateway_connections_active"
        }]
      }
    ]
  }
}
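
The P99 latency panel relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that idea for a single set of buckets; it is a simplified model rather than Prometheus's code, and the bucket values are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    The bucket with the largest bound must be +Inf, as in a Prometheus histogram.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # quantile falls in the +Inf bucket
            # Linear interpolation between the bucket's lower and upper bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count


# Made-up cumulative counts: 50 requests <= 0.1s, 90 <= 0.5s, 99 <= 1s, 100 overall
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.99, buckets))
print(histogram_quantile(0.95, buckets))
```

Because the estimate is interpolated, its precision is bounded by the bucket layout: a P99 that lands in a wide bucket can be off by most of that bucket's width.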

5.3 Alerting rules (Prometheus and Alertmanager)

# higress-alerts.yaml
groups:
- name: higress-alerts
  rules:
  # High error rate
  - alert: HigressHighErrorRate
    expr: |
      sum(rate(higress_gateway_requests_total{status_code=~"5.."}[5m]))
      / sum(rate(higress_gateway_requests_total[5m])) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Higress error rate above 5%"
      description: "Current error rate: {{ $value | humanizePercentage }}"

  # High latency
  - alert: HigressHighLatency
    expr: |
      histogram_quantile(0.99, sum by (le) (rate(higress_gateway_request_duration_seconds_bucket[5m]))) > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Higress P99 latency above 1 second"
      description: "Current P99 latency: {{ $value }}s"

  # Pod restarts
  - alert: HigressPodRestarting
    expr: |
      increase(kube_pod_container_status_restarts_total{namespace="higress-system"}[1h]) > 3
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Higress Pod restarting frequently"
      description: "Pod {{ $labels.pod }} restarted {{ $value }} times in the last hour"

  # Certificate close to expiry
  - alert: HigressCertExpiring
    expr: |
      (higress_gateway_ssl_cert_expiry_timestamp_seconds - time()) < 86400 * 7
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "SSL certificate expires within 7 days"
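
The `for:` clause in these rules means the expression must stay true across consecutive evaluations for the whole duration before the alert fires; a single dip below the threshold resets the clock. A small sketch of that behaviour for the error-rate rule (threshold 0.05, `for: 5m`), using made-up samples and an illustrative function name:

```python
def is_firing(samples, threshold=0.05, hold_s=300):
    """samples: (timestamp_s, value) pairs at rule evaluation times, oldest first.

    Returns True when the value has been above the threshold continuously
    for at least hold_s seconds up to the last sample.
    """
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # alert becomes pending
        else:
            pending_since = None  # condition cleared, start over
    return pending_since is not None and samples[-1][0] - pending_since >= hold_s


steady = [(t, 0.08) for t in range(0, 301, 60)]
dipped = [(0, 0.08), (60, 0.08), (120, 0.01), (180, 0.08), (240, 0.08), (300, 0.08)]
print(is_firing(steady), is_firing(dipped))
```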

5.4 Monitoring commands

# Show gateway Pod status
kubectl get pods -n higress-system -o wide

# Show resource usage
kubectl top pods -n higress-system

# Tail the logs
kubectl logs -n higress-system -l app=higress-gateway -f --tail=100

# Dump the Envoy configuration
kubectl exec -n higress-system $(kubectl get pod -n higress-system -l app=higress-gateway -o jsonpath='{.items[0].metadata.name}') -- pilot-agent request GET /config_dump

# Show connection statistics
kubectl exec -n higress-system $(kubectl get pod -n higress-system -l app=higress-gateway -o jsonpath='{.items[0].metadata.name}') -- pilot-agent request GET /stats | grep connection

# Measure latency (mean over 100 requests)
for i in {1..100}; do curl -s -o /dev/null -w "%{time_total}\n" https://app.example.com; done | awk '{sum+=$1} END {print "avg:", sum/NR}'

# Load test (ab)
ab -n 10000 -c 100 https://app.example.com/

# Load test (wrk)
wrk -t12 -c400 -d30s https://app.example.com/
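
The curl loop above reports only the mean, which hides tail latency. A minimal sketch of summarizing the same timings with a median and a nearest-rank P99 follows; the sample values are invented stand-ins for the loop's output, and `summarize` is an illustrative name.

```python
import math
import statistics


def summarize(samples):
    """Return (mean, p50, p99) in seconds for a list of request durations."""
    ordered = sorted(samples)

    def percentile(p):
        # Nearest-rank: the smallest sample with at least p% of samples at or below it
        index = max(0, math.ceil(len(ordered) * p / 100) - 1)
        return ordered[index]

    return statistics.fmean(ordered), percentile(50), percentile(99)


# Made-up timings standing in for the curl loop's output (seconds)
samples = [0.1] * 98 + [0.5, 2.0]
mean, p50, p99 = summarize(samples)
print(f"avg: {mean:.3f}  p50: {p50}  p99: {p99}")
```

In this sample the mean and median look healthy while the P99 is five times the median, which is the kind of gap the P99 panel and alert are meant to surface.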

6. Troubleshooting

6.1 General troubleshooting flow

1. Check Pod status
   kubectl get pods -n higress-system

2. Inspect Pod events
   kubectl describe pod <pod-name> -n higress-system

3. Read the logs
   kubectl logs <pod-name> -n higress-system

4. Check Services and Endpoints
   kubectl get svc,ep -n higress-system

5. Check Ingress resources
   kubectl get ingress -A
   kubectl describe ingress <ingress-name> -n <namespace>

6. Check route resources
   kubectl get higressroute -A

7. Verify DNS resolution
   nslookup app.example.com
   dig app.example.com

8. Test connectivity
   curl -v https://app.example.com

6.2 Typical failure scenarios

Scenario 1: 502 Bad Gateway

# Cause: the backend service is unavailable
# Steps:

# 1. Check backend Pod status
kubectl get pods -n default -l app=web-service

# 2. Check Endpoints
kubectl get endpoints web-service -n default

# 3. Look for upstream errors in the gateway logs
kubectl logs -n higress-system -l app=higress-gateway | grep "upstream"

# 4. Test the backend directly
kubectl exec -n default <backend-pod> -- curl localhost:8080/health

Scenario 2: 503 Service Unavailable

# Cause: no healthy backend instances, or circuit breaking was triggered
# Steps:

# 1. Check circuit breaker state
kubectl exec -n higress-system <gateway-pod> -- pilot-agent request GET /stats | grep circuit_breaker

# 2. Check rate limiting state
kubectl exec -n higress-system <gateway-pod> -- pilot-agent request GET /stats | grep rate_limit

# 3. Look for failed health checks
kubectl logs -n higress-system -l app=higress-gateway | grep "health_check"

Scenario 3: SSL/TLS certificate problems

# Cause: the certificate has expired or is misconfigured
# Steps:

# 1. Check the validity period of the served certificate
echo | openssl s_client -connect app.example.com:443 2>/dev/null | openssl x509 -noout -dates

# 2. Check the certificate stored in the Secret
kubectl get secret higress-tls -n higress-system -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

# 3. Verify the certificate chain
openssl s_client -connect app.example.com:443 -showcerts

# 4. Check automatic certificate status (when using Let's Encrypt)
kubectl get certificaterequest -n higress-system
kubectl describe certificaterequest <request-name> -n higress-system
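
The `openssl x509 -noout -dates` commands above print `notBefore=` and `notAfter=` lines. Turning `notAfter` into "days remaining" makes it easier to compare against a threshold. The sketch below assumes the usual `Mon DD HH:MM:SS YYYY GMT` format and an English/C locale for month names; the sample output and the function name are made up.

```python
from datetime import datetime, timezone


def days_left(dates_output, now):
    """Days until expiry, parsed from `openssl x509 -noout -dates` output."""
    for line in dates_output.splitlines():
        if line.startswith("notAfter="):
            stamp = line.split("=", 1)[1].strip()
            not_after = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
            return (not_after.replace(tzinfo=timezone.utc) - now).days
    raise ValueError("no notAfter line found")


# Invented sample output for illustration
sample = "notBefore=Jan  1 00:00:00 2026 GMT\nnotAfter=Apr  1 00:00:00 2026 GMT"
print(days_left(sample, datetime(2026, 3, 13, tzinfo=timezone.utc)))
```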

Scenario 4: Route does not match

# Cause: Ingress misconfiguration or a path that does not match
# Steps:

# 1. Inspect the Ingress
kubectl get ingress <name> -n <namespace> -o yaml

# 2. Inspect Higress route resources
kubectl get higressroute -A -o yaml

# 3. Inspect the Envoy route table (select the routes section of the config dump)
kubectl exec -n higress-system <gateway-pod> -- pilot-agent request GET /config_dump | jq '.configs[] | select(.["@type"] | endswith("RoutesConfigDump"))'

# 4. Test different paths
curl -v -H "Host: app.example.com" http://<gateway-ip>/api
curl -v -H "Host: app.example.com" http://<gateway-ip>/static

Scenario 5: Performance degradation

# Cause: insufficient resources or unsuitable configuration
# Steps:

# 1. Check resource usage
kubectl top pods -n higress-system

# 2. Check connection counts
kubectl exec -n higress-system <gateway-pod> -- pilot-agent request GET /stats | grep connection

# 3. Check request queues
kubectl exec -n higress-system <gateway-pod> -- pilot-agent request GET /stats | grep queue

# 4. Find slow requests in the logs (duration of 100 ms or more)
kubectl logs -n higress-system -l app=higress-gateway | grep -E "duration.*[1-9][0-9]{2,}ms"

# 5. Check for OOM kills
kubectl describe pod -n higress-system | grep -A5 "OOM"

6.3 Debugging tools

# Enable debug logging (kubectl set env adds the variable without overwriting the existing env list)
kubectl set env deploy/higress-gateway -n higress-system LOG_LEVEL=debug

# Capture an Envoy configuration snapshot
kubectl exec -n higress-system <gateway-pod> -- pilot-agent request GET /config_dump > envoy-config.json

# Check Envoy readiness and dump Prometheus-format stats
kubectl exec -n higress-system <gateway-pod> -- curl -s localhost:15000/ready
kubectl exec -n higress-system <gateway-pod> -- curl -s localhost:15000/stats/prometheus > metrics.prom

# Packet capture (requires an ephemeral debug container)
kubectl debug -n higress-system <gateway-pod> -it --image=nicolaka/netshoot -- tcpdump -i any port 80 or 443

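7. Best Practices

7.1 Deployment best practices

Practice	Description	Recommended setting
Multiple replicas	Avoid a single point of failure	At least 3 replicas
Cross-AZ deployment	Improve disaster tolerance	Pod anti-affinity + multiple AZs
Resource limits	Prevent resource exhaustion	Set requests/limits
PDB	Keep the gateway available during upgrades	minAvailable: 2
Health checks	Fast failure detection	5s interval, 3 failures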
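
To see what `minAvailable: 2` buys with three gateway replicas: a PodDisruptionBudget allows a voluntary eviction (a node drain, for example) only while the number of healthy Pods stays at or above `minAvailable`. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a PodDisruptionBudget with minAvailable permits right now."""
    return max(0, healthy_pods - min_available)


print(allowed_disruptions(3, 2))  # all three replicas healthy
print(allowed_disruptions(2, 2))  # one replica already down
```

With 3 healthy replicas one Pod can be evicted at a time; once a replica is already down, drains block until it recovers.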

7.2 Security best practices

# 1. Enable mTLS
apiVersion: security.higress.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: higress-system
spec:
  mtls:
    mode: STRICT
---
# 2. Configure WAF rules
apiVersion: security.higress.io/v1
kind: WafPolicy
metadata:
  name: default-waf
  namespace: higress-system
spec:
  rules:
  - name: sql-injection
    action: BLOCK
    conditions:
    - field: ARGS
      operator: CONTAINS
      value: "(?i)(union.*select|select.*from)"

  - name: xss-protection
    action: BLOCK
    conditions:
    - field: ARGS
      operator: CONTAINS
      value: "(?i)(<script|javascript:)"
---
# 3. IP allowlist
apiVersion: networking.higress.io/v1
kind: HigressGateway
metadata:
  name: internal-gateway
spec:
  accessLog:
  - filter:
      remoteIp:
        cidr: "10.0.0.0/8"

7.3 Performance best practices

# 1. Enable HTTP/2
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/http2: "true"

# 2. Enable gzip compression
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/enable-gzip: "true"
    nginx.ingress.kubernetes.io/gzip-types: "text/plain,text/css,application/json,application/javascript"
    nginx.ingress.kubernetes.io/gzip-min-length: "256"

# 3. Configure the connection pool
# In a HigressRoute
spec:
  routes:
  - route:
    - destination:
        host: backend-service
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000

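7.4 Operations best practices

# 1. Back up configuration regularly
kubectl get ingress,higressroute,virtualservice -A -o yaml > higress-config-backup-$(date +%Y%m%d).yaml

# 2. Monitor certificates (alert 30 days before expiry)
# Use cert-manager + Prometheus

# 3. Audit configuration changes
# Enable the Kubernetes audit log

# 4. Run load tests regularly
# Run a full end-to-end load test once a month

# 5. Run disaster recovery drills
# Run a failover drill once a quarter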

8. Configuration Template Reference

8.1 Complete Ingress template

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-app
  namespace: production
  annotations:
    kubernetes.io/ingress.class: higress
    # TLS
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # Rate limiting
    nginx.ingress.kubernetes.io/limit-rps: "100"
    # Timeouts
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    # Redirect
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # CORS
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://*.example.com"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization"
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 8080
      - path: /health
        pathType: Exact
        backend:
          service:
            name: frontend
            port:
              number: 80

8.2 HigressRoute template

apiVersion: networking.higress.io/v1
kind: HigressRoute
metadata:
  name: api-route
  namespace: production
spec:
  hosts:
  - "api.example.com"
  http:
  - name: "api-v1"
    match:
    - uri:
        prefix: "/api/v1"
    route:
    - destination:
        host: api-v1-service
        port:
          number: 8080
      weight: 90
    - destination:
        host: api-v2-service
        port:
          number: 8080
      weight: 10
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
      retryOn: "5xx,reset,connect-failure"
    fault:
      delay:
        percentage:
          value: 0.1
        fixedDelay: 100ms
    corsPolicy:
      allowOrigins:
      - exact: "https://app.example.com"
      allowMethods:
      - GET
      - POST
      allowHeaders:
      - Authorization
      - Content-Type
      exposeHeaders:
      - X-Request-Id
      maxAge: 24h
      allowCredentials: true
    rateLimit:
      type: Local
      qps: 100
      burst: 200
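
The `timeout`, `retries.attempts` and `perTryTimeout` values in this template interact: the overall timeout caps however many tries are configured. The sketch below works through that arithmetic under a stated assumption, namely that `attempts` counts retries in addition to the first try (as in Istio's VirtualService) and that back-off between tries is negligible. The function names are illustrative.

```python
def worst_case_seconds(timeout_s, retries, per_try_timeout_s):
    """Upper bound on the time one request can spend before the client sees a result."""
    tries = 1 + retries  # first try plus retries (assumed semantics)
    return min(timeout_s, tries * per_try_timeout_s)


def full_tries_that_fit(timeout_s, per_try_timeout_s):
    """How many tries can each use their full per-try timeout within the overall timeout."""
    return int(timeout_s // per_try_timeout_s)


# Values from the template: timeout 30s, attempts 3, perTryTimeout 10s
print(worst_case_seconds(30, 3, 10))
print(full_tries_that_fit(30, 10))
```

Under that assumption the template allows 4 tries on paper, but only 3 full-length tries fit into the 30s overall timeout; raising `timeout` or lowering `perTryTimeout` would be needed for the last retry to get a full window.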

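9. References

Official documentation: https://higress.io/docs/
GitHub: https://github.com/alibaba/higress
Helm chart: https://higress.io/helm-charts/
Best practices: https://higress.io/docs/latest/overview/what-is-higress/
Performance benchmarks: https://higress.io/docs/latest/benchmark/

10. Today's Checklist

Check gateway Pod health
Verify SSL certificate validity (> 30 days remaining)
Check the error rate (< 1%)
Check P99 latency (< 500 ms)
Review how often rate limiting was triggered
Check circuit breaker state
Back up the current configuration
Review recently changed Ingress resources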
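
The three numeric items in the checklist can be evaluated mechanically once the values have been read off the dashboard. A minimal sketch, with an illustrative function name and made-up inputs:

```python
def run_checks(cert_days_left, error_rate, p99_ms):
    """Evaluate the numeric checklist items; returns {item: passed}."""
    return {
        "certificate validity > 30 days": cert_days_left > 30,
        "error rate < 1%": error_rate < 0.01,
        "P99 latency < 500 ms": p99_ms < 500,
    }


# Made-up readings for illustration
report = run_checks(cert_days_left=19, error_rate=0.004, p99_ms=320)
for item, passed in report.items():
    print(("PASS" if passed else "FAIL"), item)
```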

Generated: 2026-03-13 10:00 CST
Next topic: 2026-03-14 - Redis production configuration and performance tuning
