BOOST-EDS

Using NERC EDS services to run data processing pipelines, host applications and archive live data products.

Thomas Zwagerman

DIT

David Wilby

DIT

James Byrne

DIT

Petra ten Hoopen

PDC

Paul Breen

PDC

Helen Peat

PDC

Scott Hosking

AI-LAB

2024-07-19

BOOST-EDS

  • Proof-of-concept cloud workflow
  • Goals:
    1. Demonstrate NERC Environmental Data Service (EDS) tools and how we can leverage them at BAS.
    2. Broader message on how to prepare research software for operationalisation.

NERC EDS and its services

  • What is the EDS?
  • What is JASMIN?
  • What is Datalabs?
  • How do the data centres fit in?

Archival into the UK Polar Data Centre

  • NERC Data Policy
  • Data Management Plan (DMP)
  • Minting of DOI
  • BODC, NGDC, CEDA, EIDC
  • DSW, FAIR Principles

Use-case 1: Amundsen Sea Low Index (ASLI)

  • Highly dynamic, mobile climatological low pressure system
  • Crucial in understanding regional change over West Antarctica
  • Good ASL representation improves climate models

Use-case 1: ASLI Image

The Amundsen Sea Low (ASL) (Hosking et al. 2016)

Use-case 1: User Requirements

  1. Run monthly, when new ERA5 data is released.
  2. Automated data processing pipeline which performs calculations.
  3. Data accessible to other systems: notebook, app, digital twin?
  4. Automatic archival into the PDC, appending dataset under same DOI.
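A minimal sketch of the "run monthly" requirement, assuming a month becomes available for processing once the calendar month has ended. That is a simplification: the actual ERA5 release schedule and the real pipeline's trigger may differ. The function name `months_to_process` and its arguments are hypothetical and are not part of asli or asli-pipeline.

```python
from datetime import date


def months_to_process(last_processed: date, today: date) -> list[str]:
    """Return 'YYYY-MM' labels for every complete month after `last_processed`.

    Hypothetical helper: assumes a month can be processed as soon as the
    calendar month is over, which ignores any real data release delay.
    """
    months = []
    year, month = last_processed.year, last_processed.month
    while True:
        # Step forward one calendar month, rolling over the year boundary.
        month += 1
        if month > 12:
            year, month = year + 1, 1
        # Stop at the current (still incomplete) month.
        if (year, month) >= (today.year, today.month):
            break
        months.append(f"{year:04d}-{month:02d}")
    return months


if __name__ == "__main__":
    # With April 2024 already processed, a run on 2024-07-19 picks up May and June.
    print(months_to_process(date(2024, 4, 1), date(2024, 7, 19)))
```

A scheduler (cron, for example) could call such a function on each run, so that a missed run is caught up automatically the next time the pipeline starts.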

Use-case 1: The Onion

  • asli (Hosking & Wilby 2024): Python package containing ASL calculations described in Hosking et al. (2016).
  • asli-pipeline: Data processing pipeline.
  • asliapp: Shiny application.

Use-case 1: Technical Implementation

Discussion: FAIR Principles

  • FAIR Principles: archival of “live” data products.
  • Providing full provenance for a derived dataset.

Discussion: Operationalising Research Software

  • Combining digital research infrastructure (DRI).
  • Separation of data - code - configuration.
  • Enabling deployment on different DRI.
  • Preparing research software for deployment to operational systems ahead of time.
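One way to picture the separation of data, code and configuration is sketched below: the code reads its data locations from the environment, so the same package can be deployed on different infrastructure without edits. The environment variable names (`PIPELINE_DATA_DIR`, `PIPELINE_OUTPUT_DIR`) and the `PipelineConfig` class are assumptions made for illustration, not the configuration scheme used by asli-pipeline.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PipelineConfig:
    """Deployment-specific settings, kept out of the processing code."""

    data_dir: Path    # where input data lives on this system
    output_dir: Path  # where results are written on this system


def load_config(env=None) -> PipelineConfig:
    """Build the configuration from environment variables (hypothetical names).

    Passing a plain dict as `env` makes the function easy to test.
    """
    env = os.environ if env is None else env
    return PipelineConfig(
        data_dir=Path(env.get("PIPELINE_DATA_DIR", "./data")),
        output_dir=Path(env.get("PIPELINE_OUTPUT_DIR", "./output")),
    )


if __name__ == "__main__":
    # Same code, two deployments: only the configuration changes.
    print(load_config({}))
    print(load_config({"PIPELINE_DATA_DIR": "/tmp/shared/era5"}))
```

The processing functions then take a `PipelineConfig` as an argument rather than hard-coding paths, which keeps the research code unchanged between a laptop, an HPC system and a cloud deployment.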

Discussion: Limitations of EDS services

  • Digital research infrastructure - not for production.
  • “Everything is prototype”.
  • No clear guidance on cloud-optimised data formats that balance accessibility of large datasets, performance, cost-effectiveness, reliability and collaboration benefits.

References

Brown, M. J., & Chevuturi, A. object_store_tutorial [Computer software]. https://github.com/NERC-CEH/object_store_tutorial

Hosking, J. S., A. Orr, T. J. Bracegirdle, and J. Turner (2016), Future circulation changes off West Antarctica: Sensitivity of the Amundsen Sea Low to projected anthropogenic forcing, Geophys. Res. Lett., 43, 367–376, doi:10.1002/2015GL067143.

Hosking, J. S., & Wilby, D. (2024). asli [Computer software]. https://github.com/scotthosking/amundsen-sea-low-index

Lawrence, B. N., Bennett, V. L., Churchill, J., Juckes, M., Kershaw, P., Pascoe, S., Pepler, S., Pritchard, M. and Stephens, A. (2013) Storing and manipulating environmental big data with JASMIN. In: IEEE Big Data, October 6-9, 2013, San Francisco.