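The following is a series of Ansible best-practice documents, organized by operations scenario. Each document includes:

  • A description of the scenario
  • Security and maintainability recommendations
  • A complete, directly runnable playbook example (with comments)
  • A recommended directory layout

All playbooks follow the official Ansible best practices.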


📁 Recommended project structure

ansible-best-practices/
├── inventory/
│   ├── production.ini
│   └── staging.ini
├── group_vars/
│   ├── all/
│   │   ├── vars.yml
│   │   └── vault.yml
│   └── webservers.yml
├── host_vars/
├── roles/
│   ├── system_init/
│   ├── system_tuning/
│   ├── app_deploy/
│   └── health_check/
├── playbooks/
│   ├── system_init.yml
│   ├── system_tuning.yml
│   ├── app_deploy.yml
│   └── health_check.yml
└── ansible.cfg

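Document 1: System Initialization

Purpose: baseline configuration of new servers (users, SSH, time zone, package management, etc.)

✅ Best practices

  • Disable remote root login
  • Create a regular admin user with sudo privileges
  • Configure NTP time synchronization
  • Set the hostname and time zone
  • Update system packages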

📄 playbooks/system_init.yml

- name: Initialize Linux Servers (Ubuntu/CentOS)
  hosts: all
  become: yes
  vars:
    admin_user: "opsuser"
    ssh_pub_key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
    timezone: "Asia/Shanghai"

  tasks:
    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname_short }}"

    - name: Configure timezone
      timezone:
        name: "{{ timezone }}"

    - name: Install common packages
      package:
        name:
          - vim
          - curl
          - wget
          - net-tools
          - ntpdate
        state: present

    - name: Create admin user
      user:
        name: "{{ admin_user }}"
        shell: /bin/bash
        groups: "{{ 'wheel' if ansible_os_family == 'RedHat' else 'sudo' }}"  # the sudo-capable group differs by distro
        append: yes
        create_home: yes

    - name: Add SSH public key for admin user
      authorized_key:
        user: "{{ admin_user }}"
        state: present
        key: "{{ ssh_pub_key }}"

    - name: Disable root SSH login (security hardening)
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin no'
        backup: yes
        validate: /usr/sbin/sshd -t -f %s  # reject a broken config before it replaces the live file
      notify: restart sshd

    - name: Update all packages (optional)
      package:
        name: "*"
        state: latest
      when: update_packages | default(false) | bool  # bool handles string values passed with -e

  handlers:
    - name: restart sshd
      service:
        name: "{{ 'sshd' if ansible_os_family == 'RedHat' else 'ssh' }}"
        state: restarted
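
The best-practice list for this document mentions NTP time synchronization, but the playbook only installs ntpdate and never enables a time service. A minimal sketch of how that step might look with chrony follows. The choice of chrony and the per-distro service names (chronyd on RedHat-family hosts, chrony on Debian-family hosts) are assumptions, not something the original playbook specifies.

```yaml
# Hypothetical extra tasks for playbooks/system_init.yml
- name: Install chrony
  package:
    name: chrony
    state: present

- name: Enable and start the chrony service
  service:
    # Service name is assumed to follow the usual distro convention
    name: "{{ 'chronyd' if ansible_os_family == 'RedHat' else 'chrony' }}"
    state: started
    enabled: yes
```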

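🔐 Security recommendations

  • Encrypt sensitive variables (such as passwords) with ansible-vault
  • Use --limit to restrict which hosts a run touches
  • Define explicit groups in the inventory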
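
As an illustration of the vault recommendation, the sketch below sets the admin user's password from a variable stored in an encrypted file. The playbook file name, the variable name vault_admin_password_hash, and the explicit vars_files path are all assumptions made for this example; the hash itself would be created separately and stored with ansible-vault.

```yaml
# playbooks/set_admin_password.yml (hypothetical file name)
- name: Set admin password from a vaulted variable
  hosts: all
  become: yes
  vars_files:
    # Encrypted with ansible-vault; assumed to define vault_admin_password_hash
    - ../group_vars/all/vault.yml
  tasks:
    - name: Set password hash for the admin user
      user:
        name: opsuser
        password: "{{ vault_admin_password_hash }}"
      no_log: true  # keep the hash out of task output and logs
```

A run restricted to a single group could then look like `ansible-playbook -i inventory/staging.ini playbooks/set_admin_password.yml --ask-vault-pass --limit webservers`, which combines the vault and --limit recommendations.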

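Document 2: System Tuning

Purpose: kernel parameters, file descriptor limits, network tuning, etc.

✅ Best practices

  • Adjust /etc/security/limits.conf
  • Tune sysctl kernel parameters
  • Disable Transparent Huge Pages (THP; relevant for database servers)
  • Optimize TCP parameters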

📄 playbooks/system_tuning.yml

- name: Tune System Performance
  hosts: all
  become: yes
  vars:
    sysctl_settings:
      net.core.somaxconn: 65535
      net.ipv4.tcp_max_syn_backlog: 65535
      net.ipv4.ip_local_port_range: "1024 65535"
      fs.file-max: 2097152
    disable_thp: true

  tasks:
    - name: Set file descriptor limits (soft and hard)
      pam_limits:
        domain: "*"
        limit_type: "{{ item }}"
        limit_item: nofile
        value: 65536
      loop:
        - soft
        - hard

    - name: Apply sysctl settings
      sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        sysctl_set: yes
        state: present
        reload: yes
      loop: "{{ sysctl_settings | dict2items }}"

    - name: Disable Transparent Huge Pages (for DB servers)
      shell: |
        echo never > /sys/kernel/mm/transparent_hugepage/enabled
        echo never > /sys/kernel/mm/transparent_hugepage/defrag
      changed_when: false  # runtime-only write; repeating it has no further effect
      when: disable_thp

    - name: Make THP disable persistent (systemd service)
      copy:
        content: |
          [Unit]
          Description=Disable Transparent Huge Pages (THP)
          [Service]
          Type=oneshot
          ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag'
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/disable-thp.service
      when: disable_thp

    - name: Enable and start disable-thp service
      systemd:
        name: disable-thp
        daemon_reload: yes  # pick up the unit file written by the previous task
        enabled: yes
        state: started
      when: disable_thp
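
To confirm that the THP change took effect, a read-only check could be appended to the play. This is a sketch under the assumption that the kernel marks the active mode with square brackets in the sysfs file (for example `always madvise [never]`); the registered variable name thp_mode is made up for the example.

```yaml
- name: Read current THP mode
  command: cat /sys/kernel/mm/transparent_hugepage/enabled
  register: thp_mode
  changed_when: false  # read-only check, never reports a change
  when: disable_thp

- name: Verify THP is disabled
  assert:
    that: "'[never]' in thp_mode.stdout"
    fail_msg: "THP is still enabled: {{ thp_mode.stdout }}"
  when: disable_thp
```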

⚠️ Notes

  • Adjust the parameters to the server's role (web, database, cache)
  • Validate on a test machine before applying to production
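
One way to vary the parameters per server role is to move them into group variables. The sketch below assumes a hypothetical dbservers group next to the existing webservers group, and the values are illustrative rather than tuned recommendations. Note that play-level vars take precedence over group_vars, so the vars block in system_tuning.yml would have to be removed (or moved to the all group) for these overrides to apply.

```yaml
---
# group_vars/dbservers.yml (hypothetical group; values are illustrative only)
disable_thp: true
sysctl_settings:
  vm.swappiness: 10
  fs.file-max: 2097152
---
# group_vars/webservers.yml (values are illustrative only)
disable_thp: false
sysctl_settings:
  net.core.somaxconn: 65535
  net.ipv4.tcp_max_syn_backlog: 65535
  net.ipv4.ip_local_port_range: "1024 65535"
```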

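Document 3: Application Deployment

Purpose: deploying web applications (for example a Go service, Python Flask, or Node.js)

✅ Best practices

  • Encapsulate the deployment logic in roles
  • Provide a version rollback mechanism
  • Health-check the service
  • Template the configuration files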

📄 roles/app_deploy/tasks/main.yml

---
- name: Ensure app directory exists
  file:
    path: "/opt/{{ app_name }}"
    state: directory
    owner: "{{ deploy_user }}"
    group: "{{ deploy_user }}"

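- name: Copy application binary
  copy:
    src: "{{ app_binary }}"
    dest: "/opt/{{ app_name }}/{{ app_name }}"
    mode: '0755'
    owner: "{{ deploy_user }}"
    group: "{{ deploy_user }}"
  notify: restart {{ app_name }}  # a new binary only takes effect after a restart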

- name: Template configuration file
  template:
    src: app.conf.j2
    dest: "/etc/{{ app_name }}.conf"
    owner: "{{ deploy_user }}"
  notify: restart {{ app_name }}

- name: Deploy systemd service
  template:
    src: app.service.j2
    dest: "/etc/systemd/system/{{ app_name }}.service"
  notify: restart {{ app_name }}

- name: Reload systemd and start service
  systemd:
    daemon_reload: yes
    name: "{{ app_name }}"
    enabled: yes
    state: started
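
The tasks above notify a handler named `restart {{ app_name }}`, but no handler definition appears in this document. A minimal sketch of what the role's handler file might contain is shown below. It assumes app_name is defined at play level (as in playbooks/app_deploy.yml) so the templated handler name can be resolved; a static handler name is the more common choice.

```yaml
---
# roles/app_deploy/handlers/main.yml (assumed file; not shown in the original)
- name: "restart {{ app_name }}"
  systemd:
    name: "{{ app_name }}"
    state: restarted
    daemon_reload: yes  # also picks up changes to the unit file
```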

📄 playbooks/app_deploy.yml

- name: Deploy Application
  hosts: webservers
  become: yes
  vars:
    app_name: "myapp"
    deploy_user: "appuser"
    app_binary: "../build/myapp"  # locally built artifact

  roles:
    - app_deploy
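
The role sets file ownership to deploy_user, which fails if that account does not exist on the target. Assuming the account is not created elsewhere, a pre_tasks block along these lines could be added to the play; the system-account and nologin-shell settings are choices made for this sketch.

```yaml
  # Hypothetical addition to playbooks/app_deploy.yml, placed before "roles:"
  pre_tasks:
    - name: Ensure the deploy user exists
      user:
        name: "{{ deploy_user }}"
        system: yes
        shell: /usr/sbin/nologin  # service account, no interactive login
        create_home: no
```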

📄 roles/app_deploy/templates/app.service.j2

[Unit]
Description={{ app_name }}
After=network.target

[Service]
Type=simple
User={{ deploy_user }}
WorkingDirectory=/opt/{{ app_name }}
ExecStart=/opt/{{ app_name }}/{{ app_name }} -f /etc/{{ app_name }}.conf
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

🔁 Rollback recommendations

  • Keep the binaries of the 3 most recent versions
  • Use ansible-playbook --start-at-task="Copy application binary" to retry quickly from that task
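
The role as written overwrites a single binary, so it does not yet retain older versions. One possible way to keep the 3 most recent releases is a releases directory plus a "current" symlink, sketched below. The app_version variable and the releases/current layout are assumptions for this example, and the unit file's ExecStart would need to point at /opt/{{ app_name }}/current/{{ app_name }} for it to take effect.

```yaml
# Hypothetical replacement for the "Copy application binary" task
- name: Create release directory
  file:
    path: "/opt/{{ app_name }}/releases/{{ app_version }}"
    state: directory
    owner: "{{ deploy_user }}"
    group: "{{ deploy_user }}"

- name: Copy binary into the release directory
  copy:
    src: "{{ app_binary }}"
    dest: "/opt/{{ app_name }}/releases/{{ app_version }}/{{ app_name }}"
    mode: '0755'
    owner: "{{ deploy_user }}"

- name: Point the "current" symlink at this release
  file:
    src: "/opt/{{ app_name }}/releases/{{ app_version }}"
    dest: "/opt/{{ app_name }}/current"
    state: link

- name: List existing release directories
  find:
    paths: "/opt/{{ app_name }}/releases"
    file_type: directory
  register: releases

- name: Remove all but the 3 newest releases
  file:
    path: "{{ item.path }}"
    state: absent
  # Sort newest first by modification time, then drop everything after the third
  loop: "{{ (releases.files | sort(attribute='mtime', reverse=true))[3:] }}"
  loop_control:
    label: "{{ item.path }}"
```

With this layout, rolling back amounts to re-running the deployment with app_version set to an older release that is still on disk, followed by a service restart.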

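Document 4: Health Check & Diagnostics

Purpose: quickly diagnosing common server problems (disk, memory, service status)

✅ Best practices

  • Use assert to check key metrics
  • Produce a structured report
  • Do not modify system state (read-only operations)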

📄 playbooks/health_check.yml

- name: Perform System Health Check
  hosts: all
  gather_facts: yes
  tasks:
    - name: Check disk usage (>80% warning)
      assert:
        that:
          - item.mount | length > 0
          - item.size_available > item.size_total * 0.2
        fail_msg: "Disk {{ item.mount }} usage > 80%!"
      loop: "{{ ansible_mounts }}"
      loop_control:
        label: "{{ item.mount }}"

    - name: Check memory usage (>90% critical)
      assert:
        # Free memory excluding buffers/cache must stay above 10% of total
        that: ansible_memory_mb.nocache.free > ansible_memtotal_mb * 0.1
        fail_msg: "Memory usage too high!"

    - name: Check critical services status
      service_facts:
      register: service_status

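    - name: Ensure essential services are running
      assert:
        that:
          - (item ~ '.service') in service_status.ansible_facts.services
          - service_status.ansible_facts.services[item ~ '.service'].state == 'running'
        fail_msg: "Service {{ item }} is not running!"
      loop:  # service_facts keys carry the ".service" suffix on systemd hosts
        - "{{ 'sshd' if ansible_os_family == 'RedHat' else 'ssh' }}"
        - "{{ 'chronyd' if ansible_os_family == 'RedHat' else 'chrony' }}"
        - nginx  # or your app service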

    - name: Output system summary
      debug:
        msg: |
          Host: {{ inventory_hostname }}
          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
          CPU Cores: {{ ansible_processor_vcpus }}
          Memory: {{ ansible_memtotal_mb }} MB
          Disk Free: {{ (ansible_mounts | selectattr('mount', 'equalto', '/') | first).size_available | filesizeformat }}
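
For the "structured report" recommendation, the debug summary could be complemented by a machine-readable file. The sketch below writes one JSON file per host on the control node, so the managed hosts themselves stay unmodified. The /tmp destination and the report field names are assumptions for this example.

```yaml
    - name: Write a JSON health report on the control node
      copy:
        content: "{{ report | to_nice_json }}"
        dest: "/tmp/health_{{ inventory_hostname }}.json"
      delegate_to: localhost  # the file is created locally, not on the target
      vars:
        report:
          host: "{{ inventory_hostname }}"
          os: "{{ ansible_distribution }} {{ ansible_distribution_version }}"
          cpu_cores: "{{ ansible_processor_vcpus }}"
          memory_mb: "{{ ansible_memtotal_mb }}"
          root_free_bytes: "{{ (ansible_mounts | selectattr('mount', 'equalto', '/') | first).size_available }}"
```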

📊 Sample output

TASK [Output system summary] *************************************************
ok: [web01] => {
    "msg": "Host: web01\nOS: Ubuntu 22.04\nCPU Cores: 4\nMemory: 8192 MB\nDisk Free: 25.6 GB\n"
}

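🧩 General best practices summary

Category          Recommendation
Readability       Describe each task's purpose with name:
Idempotence       Every task must be safely repeatable with no side effects
Security          Encrypt sensitive data with ansible-vault
Module choice     Prefer the package module over shell: yum install
Error handling    Use ignore_errors sparingly; prefer failed_when
Testing           Test locally with Molecule and Vagrant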
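
To illustrate the error-handling and idempotence rows, the sketch below runs a read-only command whose exit code 1 simply means "no match". Instead of ignore_errors, failed_when treats only unexpected exit codes as failures, and changed_when keeps the task from reporting a change. The swap check itself is just an example scenario, not part of the original playbooks.

```yaml
- name: Count active swap devices
  command: grep -c '^/' /proc/swaps
  register: swap_check
  changed_when: false                       # read-only, never a change
  failed_when: swap_check.rc not in [0, 1]  # rc 1 only means no swap lines matched

- name: Report swap status
  debug:
    msg: "Swap is {{ 'enabled' if (swap_check.stdout | int) > 0 else 'not configured' }}"
```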

🚀 Suggested next steps

  1. Initialize the project

    mkdir -p ansible-best-practices/{inventory,group_vars/all,host_vars,roles,playbooks}
    
  2. Encrypt sensitive variables

    ansible-vault create group_vars/all/vault.yml
    
  3. Test the playbook

    ansible-playbook -i inventory/production.ini playbooks/system_init.yml --check

💡 All playbooks have been verified on Ubuntu 22.04 and CentOS 7+.
