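This is a series of Ansible best-practice documents, organized by operations scenario. Each document contains:
- A description of the scenario
- Security and maintainability recommendations
- A complete, commented playbook example intended to run as-is
- A recommended directory layout
The playbooks aim to follow the practices recommended in the official Ansible documentation.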
📁 Recommended project structure
ansible-best-practices/
├── inventory/
│   ├── production.ini
│   └── staging.ini
├── group_vars/
│   ├── all.yml
│   └── webservers.yml
├── host_vars/
├── roles/
│   ├── system_init/
│   ├── system_tuning/
│   ├── app_deploy/
│   └── health_check/
├── playbooks/
│   ├── system_init.yml
│   ├── system_tuning.yml
│   ├── app_deploy.yml
│   └── health_check.yml
└── ansible.cfg
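As a minimal sketch of how the group_vars files in this layout might be used, the variables that the playbooks in this series read could be collected in group_vars/all.yml. The values below mirror the defaults used later in this series; moving them into this file is an assumption, not something the layout requires.

```yaml
# group_vars/all.yml (illustrative; values mirror the defaults used in this series)
admin_user: opsuser
timezone: Asia/Shanghai
update_packages: false   # set to true to upgrade every package during initialization
```

Play-level vars take precedence over group_vars, so a variable defined in both places keeps the value from the play.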
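Document 1: System Initialization
Purpose: baseline configuration for new servers (users, SSH, time zone, package management, and so on)
✅ Best practices
- Disable remote root login
- Create a regular admin user with sudo privileges
- Configure NTP time synchronization
- Set the hostname and time zone
- Update system packages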
📄 playbooks/system_init.yml
# Requires the community.general and ansible.posix collections
- name: Initialize Linux Servers (Ubuntu/CentOS)
  hosts: all
  become: true
  vars:
    admin_user: "opsuser"
    ssh_pub_key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
    timezone: "Asia/Shanghai"
    # The admin group is "wheel" on RedHat-family systems and "sudo" on Debian-family systems
    admin_group: "{{ 'wheel' if ansible_os_family == 'RedHat' else 'sudo' }}"
  tasks:
    - name: Set hostname
      ansible.builtin.hostname:
        name: "{{ inventory_hostname_short }}"

    - name: Configure timezone
      community.general.timezone:
        name: "{{ timezone }}"

    - name: Install common packages
      ansible.builtin.package:
        name:
          - vim
          - curl
          - wget
          - net-tools
          - chrony  # replaces ntpdate, which is deprecated and missing from newer releases
        state: present

    - name: Create admin user
      ansible.builtin.user:
        name: "{{ admin_user }}"
        shell: /bin/bash
        groups: "{{ admin_group }}"
        append: true
        create_home: true

    - name: Add SSH public key for admin user
      ansible.posix.authorized_key:
        user: "{{ admin_user }}"
        state: present
        key: "{{ ssh_pub_key }}"

    - name: Disable root SSH login (security hardening)
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin no'
        backup: true
        validate: '/usr/sbin/sshd -t -f %s'  # reject a config that sshd cannot parse
      notify: restart sshd

    - name: Update all packages (optional)
      ansible.builtin.package:
        name: "*"
        state: latest
      when: update_packages | default(false) | bool

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        name: "{{ 'sshd' if ansible_os_family == 'RedHat' else 'ssh' }}"
        state: restarted
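The best-practice list mentions NTP time synchronization, but the playbook itself does not start a time service. A minimal sketch of two extra tasks, assuming chrony is the chosen implementation (the health check later in this series looks for chronyd):

```yaml
# Extra tasks for playbooks/system_init.yml (sketch; assumes the chrony package is available)
- name: Install chrony
  ansible.builtin.package:
    name: chrony
    state: present

- name: Enable and start the chrony service
  ansible.builtin.service:
    # The unit is named chronyd on RedHat-family systems and chrony on Debian-family systems
    name: "{{ 'chronyd' if ansible_os_family == 'RedHat' else 'chrony' }}"
    enabled: true
    state: started
```

The distribution default configuration is left untouched here; pointing chrony at internal time servers would need a templated configuration file.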
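🔐 Security recommendations
- Encrypt sensitive variables (such as passwords) with ansible-vault
- Restrict the run scope with --limit
- Define explicit groups in the inventory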
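One common way to apply the vault recommendation is to keep a plain-text variables file that only references vault-prefixed names, and to store the real values in an encrypted file next to it. The sketch below assumes a group_vars/all/ directory; the file names and the admin_password_hash variable are hypothetical.

```yaml
# group_vars/all/vars.yml (plain text; hypothetical file and variable names)
admin_password_hash: "{{ vault_admin_password_hash }}"

# group_vars/all/vault.yml (encrypt with: ansible-vault encrypt group_vars/all/vault.yml)
# Shown decrypted here; the value is a placeholder, not a real hash.
vault_admin_password_hash: "REPLACE_WITH_PASSWORD_HASH"
```

A run that needs the encrypted values can be started with ansible-playbook --ask-vault-pass, combined with --limit to target a single group or host.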
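Document 2: System Tuning
Purpose: kernel parameters, file descriptor limits, network tuning, and so on
✅ Best practices
- Raise limits in /etc/security/limits.conf
- Adjust kernel parameters with sysctl
- Disable Transparent Huge Pages (THP) on database servers
- Tune TCP parameters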
📄 playbooks/system_tuning.yml
# Requires the community.general and ansible.posix collections
- name: Tune System Performance
  hosts: all
  become: true
  vars:
    sysctl_settings:
      net.core.somaxconn: 65535
      net.ipv4.tcp_max_syn_backlog: 65535
      net.ipv4.ip_local_port_range: "1024 65535"
      fs.file-max: 2097152
    disable_thp: true
  tasks:
    - name: Set file descriptor limits (soft and hard)
      community.general.pam_limits:
        domain: "*"
        limit_type: "-"  # "-" sets both limits; a soft limit cannot exceed the hard limit
        limit_item: nofile
        value: 65536

    - name: Apply sysctl settings
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        sysctl_set: true
        state: present
        reload: true
      loop: "{{ sysctl_settings | dict2items }}"

    - name: Disable Transparent Huge Pages immediately (for DB servers)
      # Writes only when the setting is not already "never", so reruns report no change
      ansible.builtin.shell: |
        for f in enabled defrag; do
          p=/sys/kernel/mm/transparent_hugepage/$f
          grep -q '\[never\]' "$p" || { echo never > "$p"; echo changed; }
        done
      register: thp_result
      changed_when: "'changed' in thp_result.stdout"
      when: disable_thp | bool

    - name: Make THP disable persistent (systemd service)
      ansible.builtin.copy:
        content: |
          [Unit]
          Description=Disable Transparent Huge Pages (THP)
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag'
          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/disable-thp.service
        mode: "0644"
      when: disable_thp | bool

    - name: Enable and start disable-thp service
      ansible.builtin.systemd:
        name: disable-thp
        daemon_reload: true
        enabled: true
        state: started
      when: disable_thp | bool
⚠️ Notes
- Adjust the parameters to the server role (web, database, cache)
- Validate changes on a test machine before applying them in production
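One way to vary the tuning by server role is to derive the settings from inventory group membership. The sketch below assumes a hypothetical dbservers inventory group and two hypothetical variable names, sysctl_base and sysctl_role_overrides; the vm.swappiness value is only an illustration.

```yaml
# playbooks/system_tuning.yml, vars section only (sketch with hypothetical names)
vars:
  sysctl_base:
    net.core.somaxconn: 65535
    fs.file-max: 2097152
  # Groups may define sysctl_role_overrides in their group_vars file
  sysctl_settings: "{{ sysctl_base | combine(sysctl_role_overrides | default({})) }}"
  # Only hosts in the dbservers inventory group turn THP off
  disable_thp: "{{ 'dbservers' in group_names }}"
---
# group_vars/dbservers.yml (illustrative override)
sysctl_role_overrides:
  vm.swappiness: 1
```

Keeping the role-specific values in group_vars means the play itself stays identical for every server type.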
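Document 3: Application Deployment
Purpose: deploy a web application (for example a Go service, Python Flask, or Node.js)
✅ Best practices
- Encapsulate the deployment logic in roles
- Provide a version rollback mechanism
- Health-check the service
- Template the configuration files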
📄 roles/app_deploy/tasks/main.yml
---
# The notify entries expect a handler named "restart {{ app_name }}"
# in roles/app_deploy/handlers/main.yml.
- name: Ensure app directory exists
  ansible.builtin.file:
    path: "/opt/{{ app_name }}"
    state: directory
    owner: "{{ deploy_user }}"
    group: "{{ deploy_user }}"
    mode: "0755"

- name: Copy application binary
  ansible.builtin.copy:
    src: "{{ app_binary }}"
    dest: "/opt/{{ app_name }}/{{ app_name }}"
    mode: "0755"
    owner: "{{ deploy_user }}"
  notify: "restart {{ app_name }}"  # restart so the new binary is picked up

- name: Template configuration file
  ansible.builtin.template:
    src: app.conf.j2
    dest: "/etc/{{ app_name }}.conf"
    owner: "{{ deploy_user }}"
    mode: "0640"
  notify: "restart {{ app_name }}"

- name: Deploy systemd service
  ansible.builtin.template:
    src: app.service.j2
    dest: "/etc/systemd/system/{{ app_name }}.service"
    mode: "0644"
  notify: "restart {{ app_name }}"

- name: Reload systemd and start service
  ansible.builtin.systemd:
    daemon_reload: true
    name: "{{ app_name }}"
    enabled: true
    state: started
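The tasks above notify a handler called "restart {{ app_name }}", but the role's handler file is not shown. A minimal sketch of what roles/app_deploy/handlers/main.yml could contain, assuming app_name is defined at play level so that the templated handler name can be resolved:

```yaml
# roles/app_deploy/handlers/main.yml (sketch; this file is not shown in the original role)
- name: "restart {{ app_name }}"
  ansible.builtin.systemd:
    name: "{{ app_name }}"
    state: restarted
    daemon_reload: true  # pick up a changed unit file before restarting
```

A fixed handler name such as "Restart application" would avoid the dependency on app_name being defined, at the cost of renaming the notify entries to match.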
📄 playbooks/app_deploy.yml
- name: Deploy Application
  hosts: webservers
  become: true
  vars:
    app_name: "myapp"
    deploy_user: "appuser"  # must already exist on the target hosts
    app_binary: "{{ playbook_dir }}/../build/myapp"  # locally built artifact
  roles:
    - app_deploy
📄 roles/app_deploy/templates/app.service.j2
[Unit]
Description={{ app_name }}
After=network.target
[Service]
Type=simple
User={{ deploy_user }}
WorkingDirectory=/opt/{{ app_name }}
ExecStart=/opt/{{ app_name }}/{{ app_name }} -f /etc/{{ app_name }}.conf
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
🔁 Rollback recommendations
- Keep the binaries of the 3 most recent versions
- Use ansible-playbook --start-at-task="Copy application binary" to retry quickly from that task
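The role shown earlier overwrites a single binary, so it keeps no previous versions by itself. One possible way to keep the last 3 releases is a releases/ directory plus a current symlink, sketched below. The app_version variable and the directory layout are hypothetical, and the systemd unit would have to start /opt/{{ app_name }}/current/{{ app_name }} for this to take effect.

```yaml
# Sketch for roles/app_deploy/tasks/main.yml; app_version and the releases/ layout are hypothetical
- name: Create the release directory
  ansible.builtin.file:
    path: "/opt/{{ app_name }}/releases/{{ app_version }}"
    state: directory
    owner: "{{ deploy_user }}"
    mode: "0755"

- name: Copy the binary into the release directory
  ansible.builtin.copy:
    src: "{{ app_binary }}"
    dest: "/opt/{{ app_name }}/releases/{{ app_version }}/{{ app_name }}"
    owner: "{{ deploy_user }}"
    mode: "0755"

- name: Point the current symlink at this release
  ansible.builtin.file:
    src: "/opt/{{ app_name }}/releases/{{ app_version }}"
    path: "/opt/{{ app_name }}/current"
    state: link
  notify: "restart {{ app_name }}"

- name: List existing release directories
  ansible.builtin.find:
    paths: "/opt/{{ app_name }}/releases"
    file_type: directory
  register: releases

- name: Remove everything except the 3 most recently modified releases
  ansible.builtin.file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ (releases.files | sort(attribute='mtime', reverse=true))[3:] }}"
  loop_control:
    label: "{{ item.path }}"
```

With this layout, rolling back amounts to rerunning the play with app_version set to a release that is still on disk. Pruning by modification time is a simplification; a release that was rolled back to is not protected from later cleanup.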
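Document 4: Health Check & Diagnostics
Purpose: quickly diagnose common server problems (disk, memory, service status)
✅ Best practices
- Use assert to check key metrics
- Produce a structured report
- Do not modify system state (read-only operations)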
📄 playbooks/health_check.yml
- name: Perform System Health Check
  hosts: all
  gather_facts: true
  vars:
    # Unit names differ by distribution (for example ssh and chrony on Ubuntu)
    essential_services:
      - sshd
      - chronyd
      - nginx  # or your app service
  tasks:
    - name: Check disk usage (fail above 80%)
      ansible.builtin.assert:
        that:
          - item.mount | length > 0
          - item.size_available > item.size_total * 0.2
        fail_msg: "Disk {{ item.mount }} usage > 80%!"
      # Read-only squashfs images (such as snap packages) are always full, so skip them
      loop: "{{ ansible_mounts | rejectattr('fstype', 'equalto', 'squashfs') | list }}"
      loop_control:
        label: "{{ item.mount }}"

    - name: Check memory usage (>90% critical)
      ansible.builtin.assert:
        that:
          # nocache.used excludes buffers and page cache
          - ansible_memory_mb.nocache.used < ansible_memtotal_mb * 0.9
        fail_msg: "Memory usage too high!"

    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Ensure essential services are running
      ansible.builtin.assert:
        that:
          # On systemd hosts, service_facts keys carry the .service suffix
          - ansible_facts.services[item ~ '.service'] is defined
          - ansible_facts.services[item ~ '.service'].state == 'running'
        fail_msg: "Service {{ item }} is not running!"
      loop: "{{ essential_services }}"

    - name: Output system summary
      ansible.builtin.debug:
        msg: |
          Host: {{ inventory_hostname }}
          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
          CPU Cores: {{ ansible_processor_vcpus }}
          Memory: {{ ansible_memtotal_mb }} MB
          Disk Free: {{ (ansible_mounts | selectattr('mount', 'equalto', '/') | first).size_available | filesizeformat }}
📊 Example output
TASK [Output system summary] *************************************************
ok: [web01] => {
    "msg": "Host: web01\nOS: Ubuntu 22.04\nCPU Cores: 4\nMemory: 8192 MB\nDisk Free: 25.6 GB\n"
}
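The debug output above is meant for people. For the structured report that the best-practice list calls for, one option is to write a JSON file per host on the control node, which leaves the managed hosts untouched. This is a sketch; the reports/ directory and the health_report variable are hypothetical.

```yaml
# Extra tasks for playbooks/health_check.yml (sketch; reports/ and health_report are hypothetical)
- name: Build a structured report for this host
  ansible.builtin.set_fact:
    health_report:
      host: "{{ inventory_hostname }}"
      os: "{{ ansible_distribution }} {{ ansible_distribution_version }}"
      memory_total_mb: "{{ ansible_memtotal_mb }}"
      root_free_bytes: "{{ (ansible_mounts | selectattr('mount', 'equalto', '/') | first).size_available }}"

- name: Ensure the local report directory exists
  ansible.builtin.file:
    path: "{{ playbook_dir }}/../reports"
    state: directory
    mode: "0755"
  delegate_to: localhost
  run_once: true

- name: Save the report as JSON on the control node
  ansible.builtin.copy:
    content: "{{ health_report | to_nice_json }}"
    dest: "{{ playbook_dir }}/../reports/{{ inventory_hostname }}.json"
    mode: "0644"
  delegate_to: localhost
```

Because an assert failure stops the play for that host, these tasks only produce a report for hosts that pass the earlier checks.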
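🧩 General best practices summary
| Category | Recommendation |
|---|---|
| Readability | Give every task a name: that states its purpose |
| Idempotency | Every task must be safe to rerun without side effects |
| Security | Encrypt sensitive data with ansible-vault |
| Module choice | Prefer the package module over shell: yum install |
| Error handling | Use ignore_errors sparingly; prefer failed_when |
| Testing | Test locally with Molecule or Vagrant |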
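To illustrate the error-handling row, the sketch below runs a read-only grep and uses failed_when to separate "no match" from a real error, instead of hiding every failure with ignore_errors. The check itself is only an example chosen to match the SSH hardening in Document 1.

```yaml
# Sketch: a read-only check that fails only on a genuine error
- name: Check whether root SSH login is explicitly enabled
  ansible.builtin.command: grep -Eq '^PermitRootLogin[[:space:]]+yes' /etc/ssh/sshd_config
  become: true
  register: root_login
  changed_when: false              # the command never modifies the system
  failed_when: root_login.rc > 1   # rc 1 means "no match"; rc 2 or higher is a real error

- name: Report the result
  ansible.builtin.debug:
    msg: "Root SSH login is {{ 'enabled' if root_login.rc == 0 else 'not explicitly enabled' }}"
```

With ignore_errors the task would be reported as failed on every hardened host and a missing file would go unnoticed; failed_when keeps the output clean and still surfaces the real error.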
🚀 Suggested next steps
- Initialize the project:
  mkdir -p ansible-best-practices/{inventory,group_vars/all,host_vars,roles,playbooks}
- Encrypt sensitive variables:
  ansible-vault create group_vars/all/vault.yml
- Test a playbook in check mode:
  ansible-playbook -i inventory/production.ini playbooks/system_init.yml --check
💡 The playbooks are written for Ubuntu 22.04 and CentOS 7+; run them with --check against a staging host before using them in production.