Apache Airflow tricks
12 Sep 2018
I will update this post from time to time with more learnings.
Problem: I fixed a problem in my pipeline, but Airflow doesn’t see the change.
Possible things you can do:
- check if you actually did fix it :)
- try to refresh the DAG through the UI
- remove *.pyc files from the dags directory (a cleanup sketch follows this list)
- check if you have installed the dependencies in the right Python environment (e.g. in the right version of system-wide Python or in the right virtualenv)
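A minimal sketch of the *.pyc cleanup, assuming the default layout where DAGs live under ~/airflow/dags (adjust the path to your setup):
import pathlib

dags_dir = pathlib.Path.home() / "airflow" / "dags"
# Delete stale compiled bytecode so Airflow re-parses the fixed source files
for pyc in dags_dir.rglob("*.pyc"):
    pyc.unlink()
    print("removed", pyc)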
Problem: This DAG isn’t available in the web server’s DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database.
- Refresh the DAG code from the UI
- Restart the webserver - this did the trick in my case. Some people report that there might be a stalled gunicorn process; if a restart doesn’t help, try to find the rogue processes and kill them manually, as sketched below (source, source 2)
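A minimal sketch for spotting stalled workers, assuming a Linux host where the webserver runs under gunicorn and pgrep is available (the process name is an assumption - check what your deployment actually runs):
import subprocess

# List PIDs and full command lines of gunicorn processes;
# stalled Airflow webserver workers can then be killed by PID.
try:
    output = subprocess.check_output(["pgrep", "-af", "gunicorn"], universal_newlines=True)
    print(output)
except subprocess.CalledProcessError:
    print("no gunicorn processes found")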
Problem: I want to delete a DAG
- Airflow 1.10 has a CLI command for this:
airflow delete_dag <dag_id>
- Prior to 1.10, you can use the following script:
import sys

from airflow.hooks.postgres_hook import PostgresHook

dag_input = sys.argv[1]
hook = PostgresHook(postgres_conn_id="airflow_db")
# Remove every metadata record that references the DAG, leaving the dag table for last
for t in ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]:
    sql = "delete from {} where dag_id='{}'".format(t, dag_input)
    hook.run(sql, autocommit=True)
(tested: works like a charm :))
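Assuming you save it as e.g. delete_dag.py (the filename is just an example), pass the DAG id as the only argument:
python delete_dag.py my_dag_id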