26-28 November, 2019, Vilnius
Magnus Runesson is a Senior Data Engineer at Tink, responsible for architecting, developing, and operating their data environment. He has a Master of Science in Engineering from Linköping University, Sweden. Magnus has long experience developing and operating distributed systems with high requirements on availability, performance, and integrity at organizations such as Spotify, Svenska Spel, and the Swedish weather service.
Optimize your Data Pipeline without Rewriting it
"It is not fast enough!" That is one of the more common responses a data engineer hears when putting a data pipeline into production. It is easy to dig into the code and try to optimize it. My experience as a data engineer has shown me that it is often easier and more efficient, both in time spent and in outcome, to take a more holistic view of the pipeline.
In this talk, we will look at a structured process for optimizing batch pipelines. We will introduce steps that make the process data-driven rather than based on gut feeling. Using examples from real-world cases where delivery time was reduced by an order of magnitude, we will look at which actions were taken.
The intended audience is beginner to intermediate data engineers. After the talk, you will have a better understanding of how to optimize your pipeline and be able to explain the steps taken to a stakeholder. You will know:
* what metrics to look at
* how to visualize the metrics
* how to detect bottlenecks and other time thieves from the metrics
* what actions to take.
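
As a rough illustration of what a data-driven look at a batch pipeline could involve, here is a minimal Python sketch. Nothing in it comes from the talk itself: the `StageRun` record, its timestamp fields, the stage names, and the sample numbers are all hypothetical, and a real pipeline would read such timings from its scheduler or logs. The sketch splits each stage into wait time and run time, draws a crude text chart, and picks the single largest contributor to delivery time.

```python
from dataclasses import dataclass


@dataclass
class StageRun:
    """Hypothetical timing record for one pipeline stage (seconds since pipeline start)."""
    name: str
    queued_at: float    # when the stage's inputs were ready
    started_at: float   # when the stage actually began executing
    finished_at: float  # when the stage completed


def summarize(stages):
    """Split each stage into time spent waiting and time spent running."""
    return [
        {
            "name": s.name,
            "wait": s.started_at - s.queued_at,
            "run": s.finished_at - s.started_at,
        }
        for s in stages
    ]


def text_bars(rows, seconds_per_char=30):
    """Render one line per stage: '.' marks waiting, '#' marks running."""
    lines = []
    for r in rows:
        wait = "." * round(r["wait"] / seconds_per_char)
        run = "#" * round(r["run"] / seconds_per_char)
        lines.append(f"{r['name']:<10} {wait}{run}")
    return lines


def biggest_time_thief(rows):
    """Return (stage name, 'wait' or 'run', seconds) for the largest single cost."""
    candidates = []
    for r in rows:
        candidates.append((r["wait"], r["name"], "wait"))
        candidates.append((r["run"], r["name"], "run"))
    seconds, name, kind = max(candidates)
    return name, kind, seconds


# Made-up sample data: "transform" is ready at 65 s but does not start until 600 s.
stages = [
    StageRun("extract", 0, 5, 65),
    StageRun("transform", 65, 600, 720),
    StageRun("load", 720, 725, 785),
]

rows = summarize(stages)
for line in text_bars(rows):
    print(line)
print(biggest_time_thief(rows))
```

With this made-up data, the largest cost is the time the `transform` stage spends waiting to start rather than any stage's run time, which is the kind of finding that code-level optimization alone would not reveal.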