
WIS 2.0 Demonstration Project: GCW Data Portal

The GCW Data Portal is the entry point to datasets describing the cryosphere and forms the information basis for the assessment activities of the GCW. It can interface with scientific and other data providers, as well as with WMO-specific systems such as real-time exchange through the WMO GTS.

Access the GCW Data Portal here.

GCW Demonstration Project Charter

Introduction

The World Meteorological Organization's Global Cryosphere Watch (GCW) is a mechanism for supporting all key cryospheric in-situ and remote sensing observations, and it facilitates the provision of authoritative data, information, and analyses on the state of the cryosphere.

To achieve this, real-time data and long-term time series of data and products must be made available to all consumers. Data and products are produced by NMHSs as well as by other operational and scientific communities. The latter often have limited resources and rely on a variety of data management approaches quite different from those of the WMO community. GCW is establishing a link between these communities through WIS and WIGOS. For GCW to be implemented successfully, the barriers between these communities need to be lowered.

GCW data management follows a metadata-driven, service-oriented approach, based on the FAIR guiding principles and well aligned with the WIS principles: datasets are documented by standardised discovery metadata that are exchanged through standardised web services. The GCW Data Portal can interface with scientific and other data providers, as well as with WMO-specific systems such as real-time exchange through the WMO GTS; for all other purposes, the Internet is used as the communication network. A critical component of the discovery metadata exchanged is standardised semantic annotation of data and interfaces, for example using ontologies, as well as linkages between datasets and additional information needed to fully understand a dataset (e.g. WIGOS information).

At the data level, standardised use metadata are required, along with containers for the data and services carrying the data. GCW currently promotes NetCDF following the Climate and Forecast (CF) convention as the preferred data format and would welcome a number of WMO CF profiles, accompanied by tools to simplify exchange. GCW is already serving free and open data extracted from the WMO GTS, converted from WMO BUFR to NetCDF-CF. It is an ambition to fully support the opposite workflow, so that data made available through GCW and requested by the WMO community can be published on the WMO GTS.

GCW aims to provide access to both real-time and archived data (in the form of climate-consistent time series), which requires cost-efficient mechanisms that serve both purposes. GCW currently relies on OGC WMS and OPeNDAP for the exchange of information. The combination of NetCDF-CF and OPeNDAP allows data streaming and on-the-fly services to be built on top of data in a distributed data management system. GCW currently supports on-the-fly visualisation and transformation of selected gridded products as well as time series; these services need to be extended to new areas. Transformation services include reformatting (e.g. NetCDF-CF to CSV or NetCDF-CF to WMO GRIB), reprojection, subsetting, etc.
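As a minimal sketch of how such on-the-fly subsetting can work over OPeNDAP (the URL and variable names below are hypothetical placeholders, not actual GCW endpoints), a client such as xarray can open a remote NetCDF-CF dataset lazily and stream only the requested slice:

```python
# Minimal sketch: lazy access to a remote NetCDF-CF dataset over OPeNDAP.
# The URL and variable names are hypothetical placeholders, not GCW endpoints.
import xarray as xr

# Opening via OPeNDAP transfers metadata only; data are streamed on demand.
url = "https://example.org/opendap/gcw/snow_depth_timeseries.nc"
ds = xr.open_dataset(url)  # no full download happens here

# Subset in time, space and parameter space; only this slice crosses the network.
subset = ds["snow_depth"].sel(
    time=slice("2020-01-01", "2020-12-31"),
    latitude=slice(60.0, 80.0),
    longitude=slice(-10.0, 30.0),
)

# Reformat the subset to CSV, one of the transformation services mentioned above.
subset.to_dataframe().to_csv("snow_depth_2020.csv")
```

Because OPeNDAP transfers only metadata at open time, the same pattern scales from small station time series to large gridded products.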

To support providers of relevant data who have limited resources for data management, GCW has developed a software stack that relies on MeteoIO to transform data from unstructured formats into structured, FAIR-compliant NetCDF-CF, and publishes these data using a lightweight OPeNDAP server based on pyDAP. This setup is still under development; the goal, as resources allow, is to establish web services based on MeteoIO. These data can then be accessed by the GCW Data Portal.

 

An outline of the data centres currently involved in GCW is provided in the illustration below.

Project objectives

  1. To facilitate access to available datasets from different institutions and projects, by bridging between scientific communities and WMO systems in support of WMO activities (e.g. the WMO operating plan).

  2. To improve the interoperability of WMO GCW-relevant datasets.

  3. To increase the amount of data available to support the cryosphere-related goals of WMO, as delivered by GCW.

  4. To link WIGOS and WIS metadata efficiently wherever possible.

WIS 2.0 Principles Demonstrated

GCW data management is aligned with the principles of WIS 2.0, as outlined below (using WIS 2.0 principles numbering).

Principle 1

GCW data management is based on harvesting discovery metadata through standardised web services (primarily OAI-PMH). The information exchanged is currently standardised according to ISO 19115 or GCMD DIF, and providers are encouraged to serve data as NetCDF following the Climate and Forecast (CF) convention. This links directly to a service-oriented approach relying on Semantic Web and Linked Data.
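As a minimal sketch of this harvesting pattern (the endpoint URL below is a hypothetical placeholder, not an actual GCW or contributor address), an OAI-PMH client pages through ListRecords responses by following resumption tokens:

```python
# Minimal sketch of OAI-PMH harvesting; the endpoint URL is a hypothetical
# placeholder, not an actual GCW or contributor address.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
endpoint = "https://example.org/oai"

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
while True:
    root = ET.fromstring(requests.get(endpoint, params=params).content)
    for record in root.iter(f"{OAI}record"):
        identifier = record.find(f"{OAI}header/{OAI}identifier")
        print(identifier.text)  # hand the record off to the local catalogue here

    # A resumptionToken means more pages remain; it replaces all other arguments.
    token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    if token is None or not (token.text or "").strip():
        break
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```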

Principle 2

Discovery metadata are harvested from contributing data centres using URLs. The harvested discovery metadata contain URLs for data access and licence information, as well as semantic annotation (of scientific parameters or of the purpose of a URL).

Principle 3

The backbone for all communication within GCW data management is the Internet. For specific purposes GCW will connect with private networks (e.g. WMO GTS).

Principle 4

GCW data management relies on web services for exchanging information on datasets, the data themselves, and higher-order services on top of the data. GCW does not currently expose a service catalogue as a web service; for now, this is an internal catalogue.

Principle 5

GCW offers transformation services on top of data that are served according to the CF convention through OPeNDAP. These transformation services allow users (or applications) to subset data in time, space, or parameter space.

Principle 6

GCW does not currently have a messaging protocol and would benefit from WIS efforts in this context.

Principle 7

GCW is currently not caching data; this will be implemented as part of the integration with the GTS. These data will be treated as transient datasets in the GCW Data Portal.

Principle 8

GCW currently considers the data provider and the host data centre to be the authoritative source for data. Direct access to a dataset is provided by forwarding the data consumer to the web services offered by the host data centre. The only exception in the current implementation is when higher-order services offered in the GCW Data Portal are used to modify or combine data prior to delivery.

Principle 9

GCW data management is currently not using the WMO GTS for transmission of data and relies on WMO efforts in this context. The critical part for GCW is how to connect efficiently to the relevant WMO services.

Principle 10

GCW maintains its own catalogue of discovery metadata but currently holds no catalogue of web services; integrating the existing GCW services with the WIS 2.0 catalogue would be preferable. The main effort of GCW at present is to ensure sufficient quality of the discovery and use metadata supplied by contributors and to transform these into WIS-compliant information.

Principle 11

GCW is reimplementing the web services offering discovery metadata and will in this context support OAI-PMH, OGC CSW, and OpenSearch. Details are still under discussion, as is how to ensure integrity in the value chain between the originating data centre and higher-order catalogues like WIS (to avoid duplication of records). GCW is also working with the ESIP community on extensions that will make Schema.org useful for dataset discovery. The current definition of Schema.org is insufficient for proper dataset discovery and filtering of information, but promising extensions are being discussed and the community working on this has good momentum.
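As a minimal sketch of the kind of markup under discussion (all names, values, and URLs below are hypothetical placeholders), a dataset landing page can embed a Schema.org Dataset description as JSON-LD for generic crawlers to index:

```python
# Minimal sketch of Schema.org Dataset markup for a landing page.
# All names, identifiers and URLs are hypothetical placeholders.
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example snow depth time series",
    "description": "Daily snow depth observations from an example CryoNet station.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "temporalCoverage": "2015-01-01/2020-12-31",
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoShape", "box": "60.0 -10.0 80.0 30.0"},
    },
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "application/x-netcdf",
        "contentUrl": "https://example.org/opendap/snow_depth.nc",
    },
}

# Embedded in HTML as: <script type="application/ld+json"> ... </script>
print(json.dumps(dataset, indent=2))
```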

Plan and milestones

Deliverables

No. | Deliverable name | Lead | Del. date | Status
D1 | Updated information model enabling linkage of datasets to WIGOS metadata | MET | 2020Q2 | Complete
D2 | Dynamic visualisation of time series from NetCDF-CF and OPeNDAP | MET | 2020Q2 | Complete
D3 | Updated harvesting of discovery metadata supporting OAI-PMH, OGC CSW and OpenSearch | MET | 2021Q2 | In progress
D4 | NetCDF-CF guidelines for time series and profiles (e.g. permafrost) | MET | 2021Q3 | In progress
D5 | Mapping of harvested discovery metadata to the WMO Core Profile | MET | 2021Q3 | In progress
D6 | Extension of metadata harvesting to support Schema.org, provided current ESIP activities are approved by Schema.org | MET | 2022Q3 | Pending funding
D7 | Conversion of NetCDF-CF to WMO BUFR for permafrost profiles | MET | 2023Q2 | Pending funding
D8 | Web service converting non-standardised data to NetCDF-CF using MeteoIO | WSL/SLF | 2023Q4 | Pending funding

 

Milestones

No. | Milestone name | Lead | Due | Status
M1 | New information model implemented | MET | 2021Q1 | In progress
M2 | Selected permafrost datasets available online and in real time | MET | 2021Q4 | In progress
M3 | Harvested discovery metadata exposed through WIS | MET | 2022Q1 | Not started
M4 | Transformation of NetCDF-CF to WMO BUFR for selected datasets | MET | 2023 | Not started

The Global Cryosphere Watch (GCW)

GCW fosters sustained and mutually beneficial international coordination and partnerships between research and operational institutions, by linking research and operations as well as scientists and practitioners. With the establishment of the GCW Data Portal, GCW supports research institutes, which often have neither the infrastructure, the resources, nor the mandate to enable the FAIR data management necessary for interoperability and discovery at the data level. Currently, without homogenisation work, most collected cryospheric data do not fit into standardised systems or dataflows for broader data access and exchange (such as exist at the WMO) and are thus unavailable for operational meteorological and climate applications. This lack of standardisation also impairs the reuse of data within the scientific community. Together with the software stack developed, the GCW Data Portal bridges this gap by enabling the transformation of sparsely documented and highly variable data into standardised, well-documented data suitable for downstream applications with data-level interoperability.

In this way GCW supports core WMO activities that rely on cryospheric information, such as hydrological services, water resource management, weather forecasting, climate monitoring, operational ice services, the preparation of early warnings, and the monitoring of natural hazards. Currently, weak institutional mandates, among other factors, limit data access in polar and many mountain regions. Insufficient data exchange mechanisms across sectors continue to hamper the development of hydro-meteorological and climate services for these regions, and existing data sources are underutilised or lost due to fragmentation across multiple operators and the lack of harmonised data policies. Through the establishment of the GCW Data Portal, GCW also provides an interface for GCW metadata to the WMO Information System (WIS) and the WMO Integrated Global Observing System (WIGOS).

For further information on GCW, please see the WMO public and community websites, or related content on the GCW website at the Space Science and Engineering Center of the University of Wisconsin-Madison.

GCW Data Portal Specifications

To satisfy user requirements, the NetCDF file format has been chosen, with the Climate and Forecast (CF) convention for the metadata. NetCDF provides a standard file format that can be read by many different applications while remaining compact and efficient for handling large amounts of data. The CF-1.6 convention provides standard names for the different meteorological parameters, as well as units and other metadata fields, allowing an application to read and interpret the data without manual intervention.

A processing engine converts the raw data provided by the data producers into NetCDF-CF files with NetCDF Attribute Convention for Dataset Discovery (ACDD) metadata. The ACDD standard provides standard search metadata describing the data origin and the spatial and temporal coverage. The data portal web front end harvests the metadata necessary for its search engine through an OPeNDAP server, so no manual editing of the metadata is necessary. Further, no data are stored on the data portal web front end; data are only requested on demand from a backend. When a user downloads data from the web portal, the portal retrieves the requested data through the OPeNDAP server. The OPeNDAP client/server architecture allows subset queries of datasets on a temporal, spatial, or per-variable basis. The search for scientific parameters is currently based on the GCMD Science Keywords.
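As a minimal sketch of the kind of file such a processing engine produces (the station, values, and attribute contents below are hypothetical placeholders), written with the netCDF4 Python library:

```python
# Minimal sketch of writing a NetCDF file with CF variable metadata and ACDD
# discovery attributes; all names and values are hypothetical placeholders.
import netCDF4
import numpy as np

with netCDF4.Dataset("example_station.nc", "w") as nc:
    # ACDD global attributes: searchable discovery metadata.
    nc.Conventions = "CF-1.6, ACDD-1.3"
    nc.title = "Example CryoNet station snow depth"
    nc.summary = "Daily snow depth from a hypothetical CryoNet station."
    nc.keywords = "EARTH SCIENCE > CRYOSPHERE > SNOW/ICE > SNOW DEPTH"
    nc.keywords_vocabulary = "GCMD Science Keywords"
    nc.geospatial_lat_min = 78.9
    nc.geospatial_lat_max = 78.9
    nc.time_coverage_start = "2020-01-01T00:00:00Z"
    nc.time_coverage_end = "2020-01-03T00:00:00Z"

    # CF variable metadata: standard names and units make the data self-describing.
    nc.createDimension("time", 3)
    time = nc.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "days since 2020-01-01 00:00:00"
    time[:] = [0, 1, 2]

    snow = nc.createVariable("snow_depth", "f4", ("time",))
    snow.standard_name = "surface_snow_thickness"
    snow.units = "m"
    snow[:] = np.array([0.42, 0.45, 0.44])
```

The ACDD attributes are what the portal's search engine harvests through OPeNDAP; the CF attributes are what lets an application interpret the variables without manual action.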

The GCW/SLF Open Source Software Package

GCW depends on a number of observing stations (CryoNet stations) to feed the GCW value chain with observations, and it requires both real-time and archived data. In the period 2015-2017, GCW worked with the WSL Institute for Snow and Avalanche Research (SLF) to set up interoperability with the WSL/SLF data centre, which is responsible for one of the CryoNet stations. WSL/SLF has kindly agreed to make the software stack it has developed available to a wider community, and all projects are now available under open-source licences. The provided software tools allow data to be processed and managed at various stages of the data cycle, from sensors to published datasets.

MeteoIO

The core element in the software package is the data preprocessor MeteoIO, which takes data from the sensor, through a quality-control procedure, into standardised NetCDF-CF files that can be published. MeteoIO was originally developed to provide robust meteorological forcing data to an operational model that forms part of the avalanche forecast at the SLF. However, it also happens to be very good at reading diverse data sources and producing standardised output. Its modular architecture makes it flexible and fast to develop new use cases. It can handle both gridded and time-series data, has various functions for cleaning and processing data to various quality standards, and produces QA reports. MeteoIO is a C++ library.

MeteoIO goes through several steps to prepare the data, aiming to offer within a single package all the tools required to bring raw data to an end data consumer. First, the data are read by one of the more than twenty available plugins, supporting as many different formats and protocols (such as CSV files, NetCDF files, databases, or web services). Then some basic data editing can be performed (such as merging stations that are next to each other or renaming sensors). The data can then be filtered by applying a stack of user-selected generic filters; these filters can either remove invalid data (such as despiking or low- and high-pass filters) or correct the data (such as precipitation undercatch correction, debiasing, or Kalman filtering). Once this is done, the data are resampled to the requested time steps by various temporal interpolation methods. It is important to keep in mind that throughout this process MeteoIO works with any sampling rate, including variable sampling rates, and can resample to any point in time. If there are still missing data points at the requested time steps, it is possible to rely on data generators to produce data from either parametrisations (such as converting a specific humidity into a relative humidity) or very basic strategies (such as generating null precipitation to fill gaps). Finally, the data are either forwarded to the data-consuming application or written back by a user-selected plugin.
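As a conceptual sketch of this read, edit, filter, resample, generate, and write pipeline (shown in plain Python/pandas rather than MeteoIO's C++ API; file names, parameter names, and thresholds are hypothetical placeholders):

```python
# Conceptual sketch of the read -> filter -> resample -> generate -> write
# pipeline described above, in plain Python/pandas rather than MeteoIO's C++
# API. File names, column names and thresholds are hypothetical placeholders.
import pandas as pd

# 1. Read: one "plugin" per input format; here, a CSV logger file.
df = pd.read_csv("raw_station.csv", parse_dates=["timestamp"], index_col="timestamp")

# 2. Filter: remove invalid data, e.g. a min/max check on air temperature (K).
df.loc[(df["TA"] < 240.0) | (df["TA"] > 320.0), "TA"] = None

# 3. Resample: interpolate onto regular time steps, whatever the raw sampling rate.
hourly = df.resample("1h").mean().interpolate(method="time", limit=3)

# 4. Generate: fill remaining gaps with a basic strategy, e.g. null precipitation.
hourly["PSUM"] = hourly["PSUM"].fillna(0.0)

# 5. Write: hand the result to the consumer, e.g. a file a NetCDF writer ingests.
hourly.to_csv("qc_station_hourly.csv")
```

In MeteoIO itself, each of these stages is selected and configured declaratively through its configuration files rather than coded by hand.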

For the MeteoIO git, please click here.

EnviDat

To publish discovery metadata for the data prepared through MeteoIO, software developed through the EnviDat project is used. EnviDat is WSL/SLF's main CKAN-based data portal and metadata repository. Core CKAN has been extended to cover specific requirements of research data management, including an OAI-PMH server, DOI publishing, and support for metadata standards. The advantage of CKAN is that it provides a robust and intuitive UI for structured metadata submission, enabling large parts of the data management process to be decentralised to the submitter.

For EnviDat extensions please click here.

For further information on the CKAN project, please click here.

To access the new release of the GCW Data Portal, please click here.

For a general description of MeteoIO, please click here.

For specific information and software download of MeteoIO, please click here.

Related Literature

For technical specifications, please see the following document:

For further information, please consider the following publications:

Related Media

For a short, creative summary of the GCW Data Portal, please see the following presentation.

Bavay_meteoio2019_extendedVersion4.mp4

Project team

Øystein Godøy (Norwegian Meteorological Institute, Oslo, NO) – project lead

Joel Fiddes (Norwegian Meteorological Institute, Oslo, NO; World Meteorological Organization, Geneva, CH)

Mathias Bavay (Institute for Snow and Avalanche Research SLF, Davos, CH)

Rodica Nitu (World Meteorological Organization, Geneva, CH)
