Deploying Airflow with Docker

I. Deployment

1.1 Install [[Docker]] and Docker Compose

1.2 Download the docker-compose.yaml file


mkdir -p /dockers/airflow

cd /dockers/airflow

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.2/docker-compose.yaml'

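1.3 Create the required directories:

  • Directory structure:
  airflow:  
    dags:   # DAG files, mounted into the container
    logs:   # log files
    plugins:    # extension plugins
    postgres-db:    # database files mapped out of the Postgres container
    test:   # test folder
    .env    # environment variables
    .gitignore  
    airflow.cfg     # Airflow configuration file
    docker-compose.yaml
    Dockerfile  # builds the Airflow image
    Makefile
    README.md
    requirements.txt    # Python packages to install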

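On Linux, the quick start needs to know your host user ID and needs the group ID set to 0. Otherwise, files created in dags, logs, and plugins will be owned by the root user. Make sure both are configured for docker compose: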


mkdir -p ./dags ./logs ./plugins ./postgres-db

echo -e "AIRFLOW_UID=$(id -u)" > .env
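The two commands above can be wrapped so they are safe to re-run. This is a minimal sketch, not part of the official quick start: the function name `init_airflow_dirs` and its directory argument are hypothetical. Unlike the plain `>` redirect, it keeps any other variables already present in `.env`.

```shell
# Hypothetical helper: prepare an Airflow project directory for docker compose.
# Usage: init_airflow_dirs /dockers/airflow   (defaults to the current directory)
init_airflow_dirs() {
    project_dir="${1:-.}"

    # Directories that are mounted into the containers.
    mkdir -p "$project_dir/dags" "$project_dir/logs" \
             "$project_dir/plugins" "$project_dir/postgres-db"

    env_file="$project_dir/.env"
    if [ -f "$env_file" ]; then
        # Drop a previous AIRFLOW_UID line but keep every other variable.
        grep -v '^AIRFLOW_UID=' "$env_file" > "$env_file.tmp" || true
        mv "$env_file.tmp" "$env_file"
    fi

    # Record the host user ID so files created in the mounts are not owned by root.
    echo "AIRFLOW_UID=$(id -u)" >> "$env_file"
}
```

Calling the function twice leaves a single `AIRFLOW_UID` line in `.env`, so it can be run again after the project is copied to another host or user.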


If AIRFLOW_UID is not set, docker-compose.yaml falls back to the default:

AIRFLOW_UID=50000

1.4 Build a private image


docker build  -t airflow:jiasen .
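These notes do not show the contents of the Dockerfile. As a rough sketch, a Dockerfile that extends the official image and installs requirements.txt could look like the one printed below. The base tag `apache/airflow:2.5.2` is an assumption (chosen to match the compose file version downloaded in 1.2), and the function name `print_airflow_dockerfile` is hypothetical.

```shell
# Hypothetical helper: print a minimal Dockerfile to stdout.
# Assumptions: base tag apache/airflow:2.5.2, and a requirements.txt
# sitting next to the Dockerfile in the build context.
print_airflow_dockerfile() {
    cat <<'EOF'
FROM apache/airflow:2.5.2
COPY requirements.txt /
RUN pip install --no-cache-dir -r /requirements.txt
EOF
}
```

The function only prints, so it never overwrites an existing Dockerfile. To try it in an empty directory, redirect the output with `print_airflow_dockerfile > Dockerfile` and then run the build command above.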

Then change the image in docker-compose.yaml:

image: ${AIRFLOW_IMAGE_NAME:-airflow:jiasen}
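The `${AIRFLOW_IMAGE_NAME:-airflow:jiasen}` expression uses the same default-value rule as shell parameter expansion: the default applies when the variable is unset or empty. Compose reads the variable from the environment or from `.env`. The sketch below reproduces the rule in plain shell; the function name `resolve_image` is hypothetical and only illustrates the substitution, it does not call Compose.

```shell
# Evaluate the same ${VAR:-default} expression that the compose file uses.
resolve_image() {
    echo "${AIRFLOW_IMAGE_NAME:-airflow:jiasen}"
}

unset AIRFLOW_IMAGE_NAME
resolve_image    # prints airflow:jiasen (variable unset, default used)

AIRFLOW_IMAGE_NAME="apache/airflow:2.5.2"
resolve_image    # prints apache/airflow:2.5.2 (variable wins over the default)
```

With this default in place, the private image is used unless AIRFLOW_IMAGE_NAME is set, so switching back to an official image only requires setting the variable in `.env`.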

II. Common Problems

2.1 Worker fails to start with: ERROR: Pidfile (/opt/airflow/airflow-worker.pid) already exists

The worker error log:

2024-03-04 09:15:37 /home/airflow/.local/lib/python3.7/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings.  Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Starting gunicorn 20.1.0
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Listening at: http://0.0.0.0:8793 (34)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Using worker: sync
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [35] [INFO] Booting worker with pid: 35
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [36] [INFO] Booting worker with pid: 36
2024-03-04 09:15:38 ERROR: Pidfile (/opt/airflow/airflow-worker.pid) already exists.
2024-03-04 09:15:38 Seems we're already running? (pid: 7)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Handling signal: term
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [36] [INFO] Worker exiting (pid: 36)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [35] [INFO] Worker exiting (pid: 35)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Shutting down: Master
2024-03-04 09:19:15 WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
2024-03-04 09:19:16 WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.

Solution: delete the pid file and restart the worker process:

➜  Airflow git:(master) ✗ docker restart airflow-airflow-worker-1
airflow-airflow-worker-1
➜  Airflow git:(master) ✗ docker exec -it airflow-airflow-worker-1 rm -rf /opt/airflow/airflow-worker.pid
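The same fix can be kept as a small function for the next time it happens. This is a sketch under assumptions: the container name is taken from the transcript above, and the `DOCKER` and `WORKER` variables and the function name `reset_worker_pidfile` are hypothetical. It deletes the file first and then restarts, and uses `rm -f` because the target is a single file.

```shell
# Set DOCKER=echo to preview the commands without running them.
DOCKER="${DOCKER:-docker}"
# Container name as it appears in the transcript; adjust for other projects.
WORKER="${WORKER:-airflow-airflow-worker-1}"
PIDFILE="/opt/airflow/airflow-worker.pid"

reset_worker_pidfile() {
    # docker exec needs the container to be running at this moment.
    "$DOCKER" exec "$WORKER" rm -f "$PIDFILE" || return 1
    # Restart so the worker comes up without the leftover pid file.
    "$DOCKER" restart "$WORKER"
}
```

Running `DOCKER=echo` before calling `reset_worker_pidfile` prints the two docker commands instead of executing them, which is a cheap way to check the container name first.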

Related resources