The amount of satellite data in Earth Observation (EO) is rapidly increasing, providing scientists with new opportunities to study how climate, weather events, and direct anthropogenic factors affect the Earth’s surface. However, the need for more accurate and holistic studies drives growth both in the size of analyzed data sets (i.e., higher spatial resolution, larger study areas, and greater temporal depth) and in the complexity of analyses, leading to more complex and more data-intensive workflows. Analyzing such large data sets requires distributed computing resources, which are hard to program and often require specialized expertise to achieve satisfactory runtimes. As a result, EO data are, to date, often underused.
Recently, scientific workflow management systems (SWMS) have emerged as a new programming paradigm for complex analysis pipelines that process large data sets on distributed compute resources. They promise simple development, improved portability across systems, automatic scalability on different infrastructures, easier reuse of workflows, and reproducibility of analysis results. Using SWMS for EO analysis could therefore promote more efficient use of EO data and facilitate the dissemination of standardized data pre-processing and processing pipelines.
In this book chapter, we examine the application of SWMS to Earth Observation. Specifically, we describe three research projects in which we used Nextflow, a popular open-source scientific workflow engine, to program portable and scalable data analysis pipelines. To this end, we assess the suitability of SWMS in general, and of Nextflow in particular, for EO data analysis, and we give practical examples that highlight the advantages and challenges of using SWMS to analyze large sets of satellite images.