Deploying Airflow with Docker
I. Deployment
1.1 Install [[Docker]] and Docker Compose
1.2 Download the docker-compose.yaml file
mkdir -p /dockers/airflow
cd /dockers/airflow
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.2/docker-compose.yaml'
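In the curl command above, `-L` follows redirects, `-f` makes curl exit with an error on an HTTP failure instead of saving the error page, and `-O` saves the file under its remote name. As a minimal sketch, the version in the URL can be pulled out into a variable so the same command works for another release. The helper name `compose_url` and the variable `AIRFLOW_VERSION` are illustrative assumptions, not part of Airflow.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: build the docker-compose.yaml URL for a given Airflow version.
compose_url() {
    printf 'https://airflow.apache.org/docs/apache-airflow/%s/docker-compose.yaml\n' "$1"
}

AIRFLOW_VERSION="2.5.2"   # the version used in this note
url="$(compose_url "$AIRFLOW_VERSION")"
echo "$url"

# To actually download (requires network access):
# curl -LfO "$url"
```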
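1.3 Create the required directories:
- Directory layout:
airflow:
  dags:             # DAG files, mounted into the container
  logs:             # log files
  plugins:          # plugin packages
  postgres-db:      # database files mapped out of the SQL (PostgreSQL) container
  test:             # test folder
  .env              # environment variables
  .gitignore
  airflow.cfg       # Airflow configuration file
  docker-compose.yaml
  Dockerfile        # builds the Airflow image
  Makefile
  README.md
  requirements.txt  # Python dependencies to install
On Linux, the quick start needs to know your host user ID, and the group ID must be set to 0. Otherwise, files created in dags, logs, and plugins will be owned by root. Make sure both are configured for Docker Compose: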
mkdir -p ./dags ./logs ./plugins ./postgres-db
echo -e "AIRFLOW_UID=$(id -u)" > .env
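On other operating systems, the .env file next to docker-compose.yaml can be written by hand; 50000 is the UID that docker-compose.yaml falls back to when AIRFLOW_UID is unset: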
AIRFLOW_UID=50000
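The directory and .env setup for step 1.3 can be wrapped in a small repeatable script. This is a minimal sketch: the function name `init_airflow_dirs` is a hypothetical helper, and the demo runs against a temporary directory rather than /dockers/airflow so it is safe to try. It uses `printf` instead of `echo -e`, since no escape sequences are needed.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: create the mounted folders and a .env file under a project root.
init_airflow_dirs() {
    root="$1"
    mkdir -p "$root/dags" "$root/logs" "$root/plugins" "$root/postgres-db"
    # Write .env only if it does not exist yet, so manual edits are preserved.
    if [ ! -f "$root/.env" ]; then
        printf 'AIRFLOW_UID=%s\n' "$(id -u)" > "$root/.env"
    fi
}

# Demo against a throwaway directory (assumption: replace with your real project path).
demo_root="$(mktemp -d)"
init_airflow_dirs "$demo_root"
cat "$demo_root/.env"
```

Running it a second time leaves an existing .env untouched, which helps if the UID was set by hand.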
1.4 Build a custom image
docker build -t airflow:jiasen .
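The note does not show the Dockerfile itself. As a rough sketch under stated assumptions, a custom image is often just the official image plus the packages from requirements.txt. The base tag `apache/airflow:2.5.2` is assumed to match the compose file version downloaded earlier, and the Dockerfile contents below are an illustration, not the author's actual file. The script only writes the file into a temporary directory; the build command is left commented out.

```shell
#!/bin/sh
set -eu

# Assumption: a scratch directory stands in for the real project folder.
build_dir="$(mktemp -d)"

# Illustrative Dockerfile: extend the official image with extra Python packages.
# It expects a requirements.txt next to it, as in the directory layout of this note.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM apache/airflow:2.5.2
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
EOF

cat "$build_dir/Dockerfile"

# To build (requires Docker and a requirements.txt in the build directory):
# docker build -t airflow:jiasen "$build_dir"
```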
Change the image in docker-compose.yaml:
image: ${AIRFLOW_IMAGE_NAME:-airflow:jiasen}
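The `${AIRFLOW_IMAGE_NAME:-airflow:jiasen}` form means "use AIRFLOW_IMAGE_NAME if it is set and non-empty, otherwise fall back to airflow:jiasen". Compose interpolation follows the same rule as shell parameter expansion here, so the behavior can be checked in a plain shell. The variable names `image_default` and `image_override` are only for illustration.

```shell
#!/bin/sh
set -eu

# Case 1: variable unset, so the default after ":-" is used.
unset AIRFLOW_IMAGE_NAME
image_default="${AIRFLOW_IMAGE_NAME:-airflow:jiasen}"

# Case 2: variable set (for example in .env), so it overrides the default.
AIRFLOW_IMAGE_NAME="apache/airflow:2.5.2"
image_override="${AIRFLOW_IMAGE_NAME:-airflow:jiasen}"

echo "unset: $image_default"
echo "set:   $image_override"
```

In practice this means the custom image is used by default, while setting AIRFLOW_IMAGE_NAME in .env switches images without editing docker-compose.yaml.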
II. Common Issues
2.1 The worker fails to start with: ERROR: Pidfile (/opt/airflow/airflow-worker.pid) already exists
The worker error log is as follows:
2024-03-04 09:15:37 /home/airflow/.local/lib/python3.7/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Starting gunicorn 20.1.0
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Listening at: http://0.0.0.0:8793 (34)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Using worker: sync
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [35] [INFO] Booting worker with pid: 35
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [36] [INFO] Booting worker with pid: 36
2024-03-04 09:15:38 ERROR: Pidfile (/opt/airflow/airflow-worker.pid) already exists.
2024-03-04 09:15:38 Seems we're already running? (pid: 7)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Handling signal: term
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [36] [INFO] Worker exiting (pid: 36)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [35] [INFO] Worker exiting (pid: 35)
2024-03-04 09:15:38 [2024-03-04 01:15:38 +0000] [34] [INFO] Shutting down: Master
2024-03-04 09:19:15 WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
2024-03-04 09:19:16 WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
Fix: delete the stale PID file and restart the worker process.
➜ Airflow git:(master) ✗ docker restart airflow-airflow-worker-1
airflow-airflow-worker-1
➜ Airflow git:(master) ✗ docker exec -it airflow-airflow-worker-1 rm -rf /opt/airflow/airflow-worker.pid
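A likely explanation for this error (an assumption, not confirmed in this note): the PID file survives an unclean shutdown, and after the container restarts a different process holds the recorded PID (7 in the log above), so the worker believes it is already running. The sketch below illustrates that kind of liveness probe with `kill -0`. The function `pidfile_is_stale` is a hypothetical helper, and the demo uses temporary files rather than the real /opt/airflow/airflow-worker.pid.

```shell
#!/bin/sh
set -u

# Hypothetical helper: succeed if the pidfile exists and no process has that PID.
pidfile_is_stale() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1      # no file: nothing to clean up
    pid="$(cat "$pidfile")"
    # kill -0 sends no signal; it only checks whether the PID exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                        # some process currently owns this PID
    fi
    return 0
}

demo_dir="$(mktemp -d)"

# Case 1: PID of a process that has already exited.
sleep 0 &
dead_pid=$!
wait "$dead_pid"
echo "$dead_pid" > "$demo_dir/dead.pid"

# Case 2: PID of a live process (this shell), standing in for a reused PID.
echo "$$" > "$demo_dir/live.pid"

if pidfile_is_stale "$demo_dir/dead.pid"; then dead_result=stale; else dead_result=live; fi
if pidfile_is_stale "$demo_dir/live.pid"; then live_result=stale; else live_result=live; fi
echo "dead.pid: $dead_result, live.pid: $live_result"
```

Case 2 shows why such a probe cannot be trusted inside a freshly restarted container: low PIDs are reused quickly, so the file can look live even though the old worker is gone. That is why the fix above removes the file unconditionally.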
References
- Official documentation: https://airflow.apache.org/docs/apache-airflow/stable/index.html
- Airflow Chinese documentation: https://www.mianshigee.com/tutorial/AirflowZH/
- [Getting Started with Airflow] Quickly setting up Airflow locally with Docker (mkdir700's blog, CSDN)
- Running Airflow in Docker — Airflow Documentation (apache.org)
- Deploying Airflow with Docker (changing the URL path, switching the database from postgres to MySQL, LDAP login) (常名先生's blog, CSDN)
- Docker AirFlow: switching the database
- Set up a Database Backend — Airflow Documentation (apache.org)
- Docker AirFlow LDAP
- Security — Flask AppBuilder (flask-appbuilder.readthedocs.io)
- Operators — Airflow Documentation (apache.org)
- Airflow Chinese documentation: https://airflow.apachecn.org/
- airflow-cn/airflow-learning-document: Chinese-language materials on apache-airflow (github.com)
...