6 changes: 2 additions & 4 deletions Makefile
@@ -25,13 +25,11 @@ pack:
 	7z a ./functions/package.zip ./functions/*.py -stl
 
 s3-%: pack
-	# aws s3 rm $(bucket)-$*/$(prefix) --recursive
-	aws s3 sync --exclude '.*' --acl public-read . $(bucket)-$*/$(prefix)
+	aws s3 sync --delete --exclude '.*' --acl public-read . $(bucket)-$*/$(prefix)
 
 targets := $(addprefix s3-,$(regions))
 sync: pack $(targets)
-	# aws s3 rm $(bucket)/$(prefix) --recursive
-	aws s3 sync --exclude '.*' --acl public-read . $(bucket)/$(prefix)
+	aws s3 sync --delete --exclude '.*' --acl public-read . $(bucket)/$(prefix)
 
 test: pack
 	pytest -vv
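A note on the `s3-%` and `sync` targets above: with `--delete`, `aws s3 sync` now prunes remote objects that no longer exist locally, which supersedes the commented-out `aws s3 rm` cleanup lines. A minimal sketch of what one expanded invocation might look like, with a hypothetical bucket name and prefix standing in for the Makefile's `$(bucket)` and `$(prefix)` variables:

```bash
# Hypothetical expansion of `make s3-us-east-1`; the bucket and prefix are
# placeholders for the Makefile variables.
aws s3 sync --delete --exclude '.*' --acl public-read . \
    s3://my-turbine-bucket-us-east-1/quickstart-turbine-airflow
```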
37 changes: 15 additions & 22 deletions README.md
@@ -11,6 +11,7 @@ tinkered with, allowing it to be used in real production environments with
little extra effort. Deploy in a few clicks, personalize in a few fields,
configure in a few commands.


## Overview

![stack diagram](/.github/img/stack-diagram.png)
@@ -19,8 +20,7 @@ The stack is composed mainly of three services: the Airflow web server, the
Airflow scheduler, and the Airflow worker. Supporting resources include an RDS
instance to host the Airflow metadata database, an SQS queue to be used as the
broker backend, S3 buckets for logs and deployment bundles, an EFS volume to
serve as a shared directory,
and a custom CloudWatch metric measured by a timed AWS Lambda. All other
resources are the usual boilerplate to keep the wind blowing.
and a custom CloudWatch metric measured by a timed AWS Lambda.
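A quick way to see these pieces once a stack is up is to list its resources with the AWS CLI; a minimal sketch, assuming a hypothetical stack name of `turbine`:

```bash
# List every resource the CloudFormation stack created (the stack name is a
# placeholder; use whatever name was chosen at launch time).
aws cloudformation describe-stack-resources \
    --stack-name turbine \
    --query 'StackResources[].[ResourceType,LogicalResourceId]' \
    --output table
```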

### Deployment and File Sharing

@@ -53,6 +53,7 @@ the latter is a very advanced scenario and would be best handled by Celery's own
scaling mechanism, or by offloading the computation to another system (like
Spark or Kubernetes) and using Airflow only for orchestration.


## Get It Working

### 0. Prerequisites
@@ -70,8 +71,8 @@ branch (defaults to your last used region):
[![Launch](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/images/cloudformation-launch-stack-button.png)](https://console.aws.amazon.com/cloudformation/home#/stacks/new?templateURL=https://turbine-quickstart.s3.amazonaws.com/quickstart-turbine-airflow/templates/turbine-master.template)

The stack resources take around 15 minutes to create, while the airflow
installation and bootstrap another 3 to 5 minutes. After that you can already
access the Airflow UI and deploy your own Airflow DAGs.
installation takes another 3 to 5 minutes. After that you can access the
Airflow UI and deploy your own Airflow DAGs.
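The same template can also be launched from the command line rather than the console button; a minimal sketch, assuming default parameters are acceptable (the capabilities flags are a typical requirement for stacks that create IAM roles, so adjust them to whatever the template actually asks for):

```bash
# Launch the quickstart template via the CLI; the stack name is arbitrary.
aws cloudformation create-stack \
    --stack-name turbine \
    --template-url https://turbine-quickstart.s3.amazonaws.com/quickstart-turbine-airflow/templates/turbine-master.template \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

# Block until the roughly 15 minutes of resource creation are done.
aws cloudformation wait stack-create-complete --stack-name turbine
```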

### 2. Upstream your files

@@ -97,8 +98,6 @@ debug or just inspect the Airflow services and database. The stack is designed
to minimize this need, but just in case it also offers decent internal tooling
for those scenarios.

### Using Systems Manager Sessions

Instead of the usual SSH procedure, this stack encourages the use of AWS Systems
Manager Sessions for increased security and auditing capabilities. You can still
use the CLI after a bit more configuration, without having to expose your
@@ -125,7 +124,7 @@ coming, or the `--no-pager` to directly dump the text lines, but it offers [much
more](https://www.freedesktop.org/software/systemd/man/journalctl.html).

```bash
$ sudo journalctl -u airflow -n 50
$ sudo journalctl -u airflow-scheduler -n 50
```
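The `-f` and `--no-pager` flags mentioned above combine with the unit name like so (unit names other than `airflow-scheduler` are an assumption about how the services are named on the instances):

```bash
# Follow new log lines as they arrive (Ctrl+C to stop).
sudo journalctl -u airflow-scheduler -f

# Dump recent lines straight to stdout, without the interactive pager.
sudo journalctl -u airflow-scheduler -n 200 --no-pager
```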


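As for the Systems Manager Sessions mentioned earlier, a shell on any of the stack's instances can be opened straight from the CLI once the Session Manager plugin is installed; a minimal sketch with a placeholder instance ID:

```bash
# No SSH keys or open ports involved; requires the Session Manager plugin
# for the AWS CLI. The instance ID below is a placeholder.
aws ssm start-session --target i-0123456789abcdef0
```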
@@ -144,26 +143,19 @@ $ sudo journalctl -u airflow -n 50
Workers have lifecycle hooks that make sure to wait for Celery to finish its
tasks before allowing EC2 to terminate that instance (except maybe for Spot
Instances going out of capacity). If you want to kill running tasks, you
will need to SSH into worker instances and stop the airflow service
forcefully.
will need to forcefully stop the airflow systemd services (via AWS Systems
Manager; see the sketch after this FAQ).

3. Is there any documentation around the architectural decisions?

Yes, most of them should be available in the project's GitHub
[Wiki](https://github.com/villasv/aws-airflow-stack/wiki). It doesn't mean
those decisions are final, but reading them beforehand will help formulating
new proposals.
Yes, they should be available in the project's GitHub [Wiki][]. It doesn't
mean those decisions are final, but reading them beforehand will help in
formulating new proposals.
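Regarding the FAQ item above about killing running tasks: one way to stop the services without SSH is Systems Manager Run Command; a minimal sketch, where the `airflow-worker` unit name and the instance ID are assumptions about how the stack names things:

```bash
# Forcefully stop the worker service on one instance via SSM Run Command.
# The unit name and instance ID are placeholders; adjust to the actual stack.
aws ssm send-command \
    --document-name "AWS-RunShellScript" \
    --instance-ids i-0123456789abcdef0 \
    --parameters commands="sudo systemctl stop airflow-worker"
```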

## Contributing
[Wiki]: https://github.com/villasv/aws-airflow-stack/wiki

>This project aims to be constantly evolving with up to date tooling and newer
>AWS features, as well as improving its design qualities and maintainability.
>Requests for Enhancement should be abundant and anyone is welcome to pick them
>up.
>
>Stacks can get quite opinionated. If you have a divergent fork, you may open a
>Request for Comments and we will index it. Hopefully this will help to build a
>diverse set of possible deployment models for various production needs.

## Contributing

See the [contribution guidelines](/CONTRIBUTING.md) for details.

@@ -174,6 +166,7 @@ Did this project help you? Consider buying me a cup of coffee ;-)

[![Buy me a coffee!](https://www.buymeacoffee.com/assets/img/custom_images/white_img.png)](https://www.buymeacoffee.com/villasv)


## Licensing

> MIT License
3 changes: 2 additions & 1 deletion ci/taskcat.yaml
@@ -8,7 +8,8 @@ tests:
  master:
    template: templates/turbine-master.template
    regions:
      - sa-east-1
      - us-east-1
      - us-east-2
      - us-west-1
    parameters:
      QSS3BucketName: "$[taskcat_autobucket]"
3 changes: 3 additions & 0 deletions examples/project/airflow/appspec.yml
@@ -10,3 +10,6 @@ hooks:
   ApplicationStop:
     - location: scripts/cdapp_stop.sh
       runas: root
+  AfterInstall:
+    - location: scripts/cdapp_deps.sh
+      runas: root
44 changes: 37 additions & 7 deletions examples/project/airflow/dags/my_dag.py
@@ -1,17 +1,47 @@
-from datetime import datetime
+from datetime import datetime, timedelta
 
 from airflow import DAG
 from airflow.operators.bash_operator import BashOperator
+from airflow.operators.python_operator import PythonOperator
+import silly
 
 default_args = {
     "start_date": datetime(2019, 1, 1),
 }
 
-dag = DAG(dag_id="my_dag", default_args=default_args, schedule_interval="@daily",)
+with DAG(
+    "my_dag", default_args=default_args, schedule_interval=timedelta(days=1)
+) as dag:
 
-for i in range(5):
-    task = BashOperator(
-        task_id="runme_" + str(i),
-        bash_command='echo "{{ task_instance_key_str }}" && sleep 5 && echo "done"',
-        dag=dag,
+    setup_task = BashOperator(
+        task_id="setup",
+        bash_command='echo "setup initiated" && sleep 5 && echo "done"',
     )
+
+    def fetch_companies():
+        return [silly.company(capitalize=True) for _ in range(5)]
+
+    fetch_companies_task = PythonOperator(
+        task_id="fetch_companies", python_callable=fetch_companies,
+    )
+    setup_task >> fetch_companies_task
+
+    def generate_reports(**context):
+        companies = context["task_instance"].xcom_pull(task_ids="fetch_companies")
+        reports = [
+            f"# '{company}' Report\n\n{silly.markdown()}" for company in companies
+        ]
+        return reports
+
+    generate_reports_task = PythonOperator(
+        task_id="generate_reports",
+        python_callable=generate_reports,
+        provide_context=True,
+    )
+    fetch_companies_task >> generate_reports_task
+
+    teardown_task = BashOperator(
+        task_id="teardown",
+        bash_command='echo "teardown initiated" && sleep 5 && echo "done"',
+    )
+    generate_reports_task >> teardown_task
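The reworked example DAG can be exercised locally before being deployed to the stack; a minimal sketch, assuming the Airflow 1.10 CLI pinned in requirements.txt below and a scratch local installation:

```bash
# Install the example's dependencies and initialize a local metadata DB,
# then run single task instances in isolation (no scheduler or worker).
pip3 install -r examples/project/airflow/requirements.txt
airflow initdb
airflow test -sd examples/project/airflow/dags my_dag setup 2019-01-01
airflow test -sd examples/project/airflow/dags my_dag fetch_companies 2019-01-01
```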
2 changes: 2 additions & 0 deletions examples/project/airflow/requirements.txt
@@ -0,0 +1,2 @@
apache-airflow[aws]==1.10.10
silly
2 changes: 2 additions & 0 deletions examples/project/airflow/scripts/cdapp_deps.sh
@@ -0,0 +1,2 @@
#!/bin/bash -e
pip3 install -r /airflow/requirements.txt
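After a deployment, the effect of the new AfterInstall hook can be spot-checked from a Systems Manager session on any instance; a minimal sketch using the package pinned in requirements.txt above:

```bash
# Confirm the requirements were installed by the AfterInstall hook.
pip3 show silly
python3 -c "import silly; print(silly.company(capitalize=True))"
```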