For organisations seeking value from Big Data, time is of the essence: recurring tasks need to run as soon as the data is ready to be processed. Purely time-based execution oversimplifies this and will not always meet the business’s needs. By deploying Apache Airflow, Data Reply helps our clients benefit from both time-driven and event-driven task execution. This enables streamlined reporting and analytics, easier-to-manage Machine Learning pipelines, and more reliable delivery of data to customer-facing apps and websites.
Apache Airflow is an orchestration engine: it schedules and runs the tasks of a data pipeline while respecting the dependencies between them.
Apache Airflow can be used to build a data pipeline (ETL, Machine Learning, etc.) as a DAG (Directed Acyclic Graph) of tasks and their dependencies. It schedules those tasks and handles task failures, so that specific actions are triggered when a task results in an error: for example, issuing an alert, rerunning the task, or triggering an alternative workflow. Because independent tasks can run in parallel, the DAG can branch, so a task failure in one branch does not have to affect tasks in another.
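To make the dependency and failure-handling ideas concrete, here is a minimal plain-Python sketch (Python 3.9+, standard library only). It is not Airflow's API, and all task names are hypothetical. It runs tasks in dependency order, retries a failed task, calls an alert callback once retries are exhausted, and, like Airflow's default all_success trigger rule, marks tasks downstream of a failure as upstream_failed while an independent branch keeps running. In Airflow itself, the corresponding features are task dependencies declared with >>, the retries argument and on_failure_callback.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, upstream, retries=1, on_failure=None):
    """Run tasks in dependency order, retrying and alerting on failure.

    tasks:      task name -> zero-argument callable
    upstream:   task name -> set of upstream task names it depends on
    retries:    extra attempts allowed after a failure
    on_failure: callback(name, exc), called once a task has exhausted its retries
    Returns task name -> final state.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        sorter.add(name, *upstream.get(name, ()))
    state = {}
    # static_order() raises graphlib.CycleError if the dependencies contain a cycle.
    for name in sorter.static_order():
        # Mirror Airflow's default "all_success" trigger rule: a task only runs
        # if every upstream task succeeded.
        if any(state[up] != "success" for up in upstream.get(name, ())):
            state[name] = "upstream_failed"
            continue
        for attempt in range(retries + 1):
            try:
                tasks[name]()
            except Exception as exc:
                state[name] = "failed"
                if attempt == retries and on_failure is not None:
                    on_failure(name, exc)
            else:
                state[name] = "success"
                break
    return state


# Hypothetical example: one flaky task, one broken branch, one healthy branch.
alerts = []
calls = {"extract": 0}


def flaky_extract():
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise RuntimeError("source not ready")  # succeeds on the retry


def broken_report():
    raise ValueError("bad query")  # fails on every attempt


tasks = {
    "extract": flaky_extract,
    "report": broken_report,
    "publish_report": lambda: None,
    "train_model": lambda: None,
}
upstream = {
    "report": {"extract"},
    "publish_report": {"report"},
    "train_model": {"extract"},
}
states = run_pipeline(
    tasks, upstream, retries=1,
    on_failure=lambda name, exc: alerts.append(f"{name}: {exc}"),
)
```

Here extract fails once and succeeds on its retry. The report branch fails on both attempts, raises a single alert and leaves publish_report as upstream_failed, while train_model, on the other branch, still succeeds.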
Apache Airflow offers a user interface for close monitoring of both the entire workflow and individual task performance over time. This is essential for the continuous improvement of the data pipeline and gives you a reliable, transparent basis for enforcing SLAs. Airflow also scales with increasing workloads, and its record of past runs makes underperforming tasks easy to spot when debugging.
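Airflow stores the start time, end time and duration of every task run, and its UI charts this history. As a hedged illustration of how such a history supports SLA enforcement and continuous improvement, the sketch below uses made-up run records and made-up per-task limits (in minutes) in plain Python. It is not Airflow's own SLA mechanism; the task names, dates and thresholds are all hypothetical.

```python
from statistics import mean

# Hypothetical run history: (task_id, run_date, duration in minutes).
runs = [
    ("load_ods", "2024-01-01", 12.0),
    ("load_ods", "2024-01-02", 14.0),
    ("load_ods", "2024-01-03", 31.0),
    ("build_mart", "2024-01-01", 20.0),
    ("build_mart", "2024-01-02", 22.0),
    ("build_mart", "2024-01-03", 21.0),
]

# Hypothetical SLAs: maximum acceptable duration per task, in minutes.
sla_minutes = {"load_ods": 30.0, "build_mart": 25.0}


def sla_misses(runs, sla):
    """Return every (task_id, run_date, duration) that exceeded its task's SLA."""
    return [(task, day, dur) for task, day, dur in runs if dur > sla[task]]


def mean_durations(runs):
    """Average duration per task, e.g. to spot a task slowing down over time."""
    by_task = {}
    for task, _, dur in runs:
        by_task.setdefault(task, []).append(dur)
    return {task: mean(durs) for task, durs in by_task.items()}


print(sla_misses(runs, sla_minutes))  # [('load_ods', '2024-01-03', 31.0)]
print(mean_durations(runs))           # {'load_ods': 19.0, 'build_mart': 21.0}
```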
We have comprehensive experience in Big Data technologies, many of which can be orchestrated through Apache Airflow, and our work across a wide range of industries means we have already encountered the common problems and can share best practice. Data Reply also helps companies build custom features on top of Airflow to fit their needs and use cases.
By way of example, Data Reply built a configurable and automated Data Pipeline on Google Cloud Platform for a leading UK retailer. As soon as the data arrives in the Data Lake (Cloud Storage), Apache Airflow moves it to a staging bucket and then inserts it into an ODS (Operational Data Store) table in BigQuery (Google’s managed, petabyte-scale, low-cost enterprise data warehouse). Airflow then orchestrates joins to create a new table in a BigQuery Data Mart, which Data Visualisation tools such as Tableau can access. Automating the entire pipeline reduced its latency (the time from data arrival to report generation) from one week to a single day.
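The sequence of this pipeline can be sketched in plain Python, with in-memory dictionaries and lists standing in for the Cloud Storage buckets and BigQuery tables. Every name and row below is hypothetical, and the code only illustrates the order of steps (landing bucket, staging bucket, ODS table, Data Mart), triggered by data arrival rather than by a clock; it is not the pipeline Data Reply built. In a real Airflow deployment, each function would typically be a task, for example using GCSToBigQueryOperator for the ODS load and BigQueryInsertJobOperator for the join, both from the Google provider package.

```python
# In-memory stand-ins for Cloud Storage buckets and BigQuery tables
# (hypothetical names and data; a real pipeline would call GCS and BigQuery).
landing = {"sales_2024-01-03.csv": [("store_1", "sku_9", 3), ("store_2", "sku_9", 5)]}
staging = {}
ods_sales = []                                      # ODS table: one row per sale
stores = {"store_1": "London", "store_2": "Leeds"}  # reference table for the join
mart_sales = []                                     # Data Mart table read by BI tools


def move_to_staging():
    # Move every newly arrived file from the landing bucket to staging.
    for name in list(landing):
        staging[name] = landing.pop(name)


def load_to_ods():
    # Append staged rows to the ODS table, then clear the staging area.
    for name in list(staging):
        ods_sales.extend(staging.pop(name))


def build_mart():
    # Join ODS rows to the store reference table to add each sale's city.
    mart_sales[:] = [(stores[store], sku, units) for store, sku, units in ods_sales]


def on_data_arrival():
    # Event-driven trigger: run the chain only when the landing bucket has data.
    if not landing:
        return False
    move_to_staging()
    load_to_ods()
    build_mart()
    return True


on_data_arrival()
print(mart_sales)  # [('London', 'sku_9', 3), ('Leeds', 'sku_9', 5)]
```

Calling on_data_arrival() again with an empty landing bucket returns False and does nothing, which is the behaviour an event-driven schedule relies on.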
Data Reply is a Reply Group company that specialises in Big Data and Analytics. Our main focus is helping clients to run successful data engineering and machine learning projects. We are based in London, Munich and Milan. www.datareply.co.uk