
WIS 2.0 Demonstration Project: GCW Data Portal

The GCW Data Portal is the entry point to datasets describing the cryosphere and forms the information basis for the assessment activities of GCW. It can interface both with scientific and other data providers and with WMO-specific systems, such as real-time exchange through the WMO GTS.

...



Access the GCW Data Portal here.

Read more about MeteoIO and the GCW/SLF Open Source Software Package.

GCW Demonstration Project Charter

Introduction

The World Meteorological Organization's Global Cryosphere Watch (GCW) is a mechanism for supporting all key cryospheric in-situ and remote sensing observations, and it facilitates the provision of authoritative data, information, and analyses on the state of the cryosphere.


To achieve this, both real-time data and long time series of data and products must be made available to all consumers. Data and products are generated by NMHSs and by other operational and scientific communities. These other communities often have limited resources and rely on a variety of data management approaches that differ considerably from those of the WMO community. GCW is establishing a link between these communities through WIS and WIGOS. To implement GCW successfully, the barriers between communities need to be lowered.

...


GCW data management follows a metadata-driven, service-oriented approach. It is based on the FAIR guiding principles and aligns well with the WIS principles. Datasets are documented by standardized discovery metadata, which are exchanged through standardized web services. The GCW Data Portal can interface both with scientific and other data providers and with WMO-specific systems, such as real-time exchange through the WMO GTS; for all other purposes, the Internet is used as the communication network. A critical component of the exchanged discovery metadata is the standardized semantic annotation of data and interfaces, for example using ontologies, together with links between datasets and additional information needed to fully understand a dataset (e.g. WIGOS information).

At the data level, standardised use metadata are required, along with containers for the data and services carrying the data. GCW currently promotes NetCDF following the Climate and Forecast (CF) convention as the preferred data format and would welcome a number of WMO CF profiles accompanied by tools that simplify exchange. GCW already serves free and open data extracted from the WMO GTS and converted from WMO BUFR to NetCDF-CF. GCW aims to fully support the reverse workflow (NetCDF-CF to BUFR) for data that are made available through GCW and that the WMO community requests to be available on the WMO GTS.
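The BUFR-to-NetCDF-CF conversion mentioned above essentially maps each BUFR element descriptor to a CF variable with a standard name, units and a fill value. The sketch below illustrates only that mapping step and assumes the BUFR message has already been decoded into per-descriptor value lists by a separate decoder. The two-entry lookup table (Table B descriptors 012101 for air temperature and 013013 for total snow depth) and the function name `bufr_to_cf` are illustrative assumptions, not taken from the GCW software.

```python
# Illustrative mapping from BUFR Table B element descriptors (FXXYYY) to CF
# variable attributes. A real converter would cover the full table.
BUFR_TO_CF = {
    "012101": {"standard_name": "air_temperature", "units": "K"},
    "013013": {"standard_name": "surface_snow_thickness", "units": "m"},
}


def bufr_to_cf(descriptor, values, fill_value=-999.0):
    """Turn decoded BUFR values for one descriptor into a CF-style variable.

    Missing BUFR values are assumed to arrive as None (as many decoders
    return them); they are replaced by `fill_value`, which is then declared
    in the `_FillValue` attribute so CF-aware readers can mask them.
    """
    if descriptor not in BUFR_TO_CF:
        raise ValueError(f"no CF mapping for BUFR descriptor {descriptor}")
    attrs = dict(BUFR_TO_CF[descriptor])  # copy so the lookup table stays unchanged
    attrs["_FillValue"] = fill_value
    data = [fill_value if v is None else v for v in values]
    return {"attrs": attrs, "data": data}


snow = bufr_to_cf("013013", [0.42, None, 0.45])
print(snow["attrs"]["standard_name"], snow["data"])
# surface_snow_thickness [0.42, -999.0, 0.45]
```

Declaring the fill value as an attribute, rather than dropping missing observations, keeps the time axis regular, which is what CF readers and OPeNDAP subsetting expect.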

GCW aims to provide access to both real-time and archived data (in the form of climate-consistent time series). This requires cost-efficient mechanisms that can serve both purposes. GCW currently relies on OGC WMS and OPeNDAP for the exchange of information. The combination of NetCDF-CF and OPeNDAP allows data streaming and on-the-fly services to be built on top of data in a distributed data management system. GCW currently supports on-the-fly visualisation and transformation of selected gridded products as well as time series. These services need to be extended to new areas. Transformation services include reformatting (e.g. NetCDF/CF to CSV or NetCDF/CF to WMO GRIB), reprojection, subsetting, etc.
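As an illustration of the reformatting service (NetCDF/CF to CSV), the sketch below converts a CF time series that has already been read into memory into CSV text. It assumes a single fill value shared by all variables, CF time units of the form `<unit> since <date>`, the standard Gregorian calendar and UTC. Reading the NetCDF file itself (locally or through OPeNDAP) is left out, and the function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_time(values, units):
    """Decode CF time values such as 'hours since 2020-01-01 00:00:00' (UTC assumed)."""
    unit, sep, origin = units.partition(" since ")
    if not sep or unit.strip() not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported CF time units: {units!r}")
    epoch = datetime.fromisoformat(origin.strip()).replace(tzinfo=timezone.utc)
    step = SECONDS_PER_UNIT[unit.strip()]
    return [epoch + timedelta(seconds=v * step) for v in values]


def cf_timeseries_to_csv(times, time_units, variables, fill_value):
    """Write one CSV row per time step; fill values become empty cells."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", *variables])  # header: time plus variable names
    for i, stamp in enumerate(decode_cf_time(times, time_units)):
        row = [stamp.strftime("%Y-%m-%dT%H:%M:%SZ")]
        row += ["" if data[i] == fill_value else data[i] for data in variables.values()]
        writer.writerow(row)
    return out.getvalue()


print(cf_timeseries_to_csv([0, 1], "hours since 2020-01-01 00:00:00",
                           {"air_temperature": [268.4, -999.0]}, -999.0))
```

Writing missing values as empty cells is one common CSV convention; a production service would more likely carry the CF `units` and `standard_name` into the header as well.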

To support other providers of relevant data whose sources have limited resources for data management, GCW has developed a software stack that relies on MeteoIO to transform unstructured data into structured NetCDF/CF (FAIR-compliant) and publishes these data using a lightweight OPeNDAP server based on pyDAP. This setup is still under development, and the goal is, as resources allow, to establish web services based on MeteoIO. These data can be accessed through the GCW Data Portal.

 

An outline of the data centres currently involved in GCW is provided in the illustration below.

...

Project

...

objectives

  1. To facilitate access to available datasets from different institutions and projects by bridging between scientific communities and WMO systems in support of WMO activities (e.g. the WMO Operating Plan).

  2. To improve the interoperability of WMO GCW-relevant datasets.

  3. To increase the amount of data available to support the cryosphere-related goals of WMO, as delivered by GCW.

  4. To link WIGOS and WIS metadata efficiently wherever possible.

WIS 2.0 Principles Demonstrated

GCW data management is aligned with the principles of WIS 2.0, as outlined below (using WIS 2.0 principles numbering).

Principle 1

GCW data management is based on harvesting discovery metadata through standardised web services for such exchange (primarily OAI-PMH). The information exchanged is standardised according to ISO19115 or GCMD DIF (currently) and data are encouraged to be served as NetCDF according to the Climate and Forecast convention. This links directly to a Service Oriented Approach relying on Semantic Web and Linked Data.
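Harvesting through OAI-PMH amounts to issuing `ListRecords` requests and following resumption tokens until the repository signals that the list is complete. The sketch below shows that loop with the Python standard library only; it collects record identifiers and skips records flagged as deleted. The `fetch` callable is an assumption made so the logic can be exercised offline, and `oai_dc` as default metadata prefix is illustrative: the prefix under which a contributor exposes ISO 19115 or DIF records depends on its repository. Error responses such as `noRecordsMatch` are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix, token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (identifiers of non-deleted records, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext(f"{OAI_NS}identifier")
        for header in root.iter(f"{OAI_NS}header")
        if header.get("status") != "deleted"
    ]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token or None  # an empty token marks the last page


def harvest(fetch, base_url, metadata_prefix="oai_dc"):
    """Follow resumption tokens until an empty or absent token ends the list."""
    identifiers, token = [], None
    while True:
        page_ids, token = parse_page(fetch(list_records_url(base_url, metadata_prefix, token)))
        identifiers.extend(page_ids)
        if token is None:
            return identifiers
```

Against a live repository, `harvest(lambda url: urllib.request.urlopen(url).read(), base_url)` would perform the HTTP requests; `ET.fromstring` accepts the returned bytes directly.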

Principle 2

Discovery metadata are harvested from contributing data centres using URLs. The harvested discovery metadata contain URLs for data access and licence information, as well as semantic annotations that allow the interpretation of scientific parameters and of the purpose of each URL.
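Because each harvested record annotates the purpose of its URLs, a client can choose an access route automatically. The sketch below assumes a simplified, hypothetical representation in which a record's URLs have already been extracted from the ISO 19115 or DIF XML into dictionaries with `type` and `url` keys; the type labels and the preference order are illustrative and are not a GCW vocabulary.

```python
PREFERRED_ACCESS = ("OPeNDAP", "OGC WMS", "HTTP")  # hypothetical labels, best first


def pick_access_url(related_urls, preference=PREFERRED_ACCESS):
    """Return (type, url) for the most preferred access type, or None."""
    first_by_type = {}
    for entry in related_urls:
        first_by_type.setdefault(entry["type"], entry["url"])  # keep first URL per type
    for url_type in preference:
        if url_type in first_by_type:
            return url_type, first_by_type[url_type]
    return None


record_urls = [
    {"type": "HTTP", "url": "https://example.org/data/station.nc"},
    {"type": "OPeNDAP", "url": "https://example.org/opendap/station.nc"},
]
print(pick_access_url(record_urls))
```

Preferring OPeNDAP over a plain download link reflects the portal's reliance on OPeNDAP for subsetting; a different client could pass its own `preference` tuple.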

Principle 3

The backbone for all communication within GCW data management is the Internet. For specific purposes GCW will connect with private networks (e.g. WMO GTS).

Principle 4

GCW data management relies on web services for exchanging information on datasets as well as the data themselves and higher-order services on top of the data. GCW does not currently expose its service catalogue as a web service; for now, this is an internal catalogue.

Principle 5

GCW offers transformation services on top of data that are served according to the CF convention through OPeNDAP. These transformation services allow users (or applications) to subset data in time, space or parameter space.
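Subsetting through OPeNDAP is expressed in the request URL. The sketch below builds a DAP2 constraint expression in which each dimension of a variable is selected with an index hyperslab `[start:stride:stop]`, where the stop index is inclusive. It assumes a DAP2 server offering the `.ascii` response (`.dods` would return binary data instead) and works on array indices only; translating times or coordinates into indices requires reading the coordinate variables first, which is not shown. The dataset URL, variable names and function names are placeholders.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """One DAP2 index range; DAP2 hyperslabs include the stop index."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError(f"invalid hyperslab {start}:{stride}:{stop}")
    return f"[{start}:{stride}:{stop}]"


def dap2_subset_url(dataset_url, request, suffix=".ascii"):
    """Build a DAP2 request URL.

    `request` maps a variable name to one (start, stop) or (start, stop, stride)
    tuple per dimension, in the variable's dimension order.
    """
    projections = [
        name + "".join(hyperslab(*dim_range) for dim_range in ranges)
        for name, ranges in request.items()
    ]
    # RFC 3986 reserves square brackets, and many servers reject them raw,
    # so they are percent-encoded while ':' and ',' are kept readable.
    return f"{dataset_url}{suffix}?{quote(','.join(projections), safe=':,')}"


url = dap2_subset_url(
    "https://example.org/opendap/station.nc",
    {"time": [(0, 23)], "air_temperature": [(0, 23)]},
)
print(url)
```

Requesting the coordinate variable (`time` here) alongside the data variable keeps the subset self-describing for the client.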

Principle 6

GCW does not currently have a messaging protocol and would benefit from WIS efforts in this context.

Principle 7

GCW does not currently cache data; caching will be implemented as part of an integration with the GTS. These data will be treated as transient datasets in the GCW Data Portal.

Principle 8

GCW currently considers the data provider and the data centre hosting the data as the authoritative source for data. Direct access to a dataset is provided by forwarding the data consumer to the web services offered by the host data centre. The only exception in the current implementation is when higher-order services offered in the GCW Data Portal are used to modify or combine data prior to delivery.

Principle 9

GCW data management is currently not using WMO GTS for transmission of data, and it relies on WMO efforts in this context. The critical part for GCW is how to connect in an efficient manner to relevant WMO services.

Principle 10

GCW maintains its own catalogue with discovery metadata but currently holds no catalogue of web services. Integrating the existing GCW services with the WIS 2.0 catalogue would be preferable. The main current effort of GCW is to ensure sufficient quality of the discovery and use metadata supplied by contributors and to transform these into WIS-compliant information.

Principle 11

GCW is reimplementing the web services offering discovery metadata and will in this context support OAI-PMH, OGC CSW and OpenSearch. Details are still under discussion, as is how to ensure integrity in the value chain between the originating data centre and higher-order catalogues such as WIS (to avoid duplicate records). GCW is also working with the ESIP community on extensions that will make Schema.org useful for dataset discovery. The current definition of Schema.org is insufficient for proper dataset discovery and filtering of information, but promising extensions are being discussed, and the community working on them has good momentum.
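For orientation, a minimal Schema.org `Dataset` description in JSON-LD can be built as below. The sketch uses only core Schema.org properties (`name`, `description`, `url`, `keywords`, `license`, `temporalCoverage`, `spatialCoverage`, `distribution`); the extensions under discussion in the ESIP community are not modelled, and the function name and all example values are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, keywords, bbox, start, end,
                   access_url, license_url):
    """Build a Schema.org Dataset description as a JSON-LD dict.

    bbox is (south, west, north, east) in decimal degrees; a Schema.org
    GeoShape `box` lists the lower corner, then the upper corner, each as
    'latitude longitude'. start/end are ISO 8601 dates joined as an interval.
    """
    south, west, north, east = bbox
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "license": license_url,
        "temporalCoverage": f"{start}/{end}",
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        },
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": "application/x-netcdf",
            "contentUrl": access_url,
        }],
    }


doc = dataset_jsonld(
    "Example permafrost borehole temperatures", "Hypothetical example record.",
    "https://example.org/dataset/123", ["permafrost", "ground temperature"],
    (78.0, 15.0, 79.0, 16.0), "2020-01-01", "2020-12-31",
    "https://example.org/opendap/borehole.nc",
    "https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(doc, indent=2))
```

Such a document is typically embedded in a dataset landing page inside a `<script type="application/ld+json">` element. What it cannot express well is exactly the gap noted above: controlled parameter vocabularies and filtering.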

Plan and milestones

Deliverables

No. | Deliverable name | Lead | Delivery date | Status
D1 | Updated information model enabling links from datasets to WIGOS metadata | MET | 2020Q2 | Complete
D2 | Dynamic visualisation of time series from NetCDF-CF and OPeNDAP | MET | 2020Q2 | Complete
D3 | Updated harvesting of discovery metadata supporting OAI-PMH, OGC CSW and OpenSearch | MET | 2021Q2 | In progress
D4 | NetCDF-CF guidelines for time series and profiles (e.g. permafrost) | MET | 2021Q3 | In progress
D5 | Mapping of harvested discovery metadata to the WMO Core Profile | MET | 2021Q3 | In progress
D6 | Extension of metadata harvesting to support Schema.org, provided that current ESIP activities are approved by Schema.org | MET | 2022Q3 | Pending funding
D7 | Conversion of NetCDF-CF to WMO BUFR for permafrost profiles | MET | 2023Q2 | Pending funding
D8 | Web service converting non-standardised data to NetCDF-CF using MeteoIO | WSL/SLF | 2023Q4 | Pending funding

 

Milestones

No. | Milestone name | Lead | Due | Status
M1 | New information model implemented | MET | 2021Q1 | In progress
M2 | Selected permafrost datasets available online and in real time | MET | 2021Q4 | In progress
M3 | Harvested discovery metadata exposed through WIS | MET | 2022Q1 | Not started
M4 | Transformation of NetCDF-CF to WMO BUFR for selected datasets | MET | 2023 | Not started

The Global Cryosphere Watch (GCW)

GCW fosters sustained and mutually beneficial international coordination and partnerships between research and operational institutions by linking research and operations as well as scientists and practitioners. With the establishment of the GCW Data Portal, GCW supports research institutes, which often lack the infrastructure, the resources or the mandate to enable FAIR data management, a prerequisite for interoperability and discovery at the data level. Currently, without homogenization work, most collected cryospheric data do not fit into the standardized systems or data flows for broader data access and exchange that exist at WMO, and are thus unavailable for operational meteorological and climate applications. This lack of standardization also impairs the reuse of data within the scientific community. Together with the developed software stack, the GCW Data Portal bridges this gap by transforming sparsely documented and highly variable data into standardized, well-documented data with data-level interoperability, suitable for downstream applications.

In this way, GCW supports core WMO activities that rely on cryospheric information, such as hydrological services, water resource management, weather forecasting, climate monitoring, operational ice services, the preparation of early warnings and the monitoring of natural hazards. Currently, weak institutional mandates, among other factors, limit data access in polar and many mountain regions. Insufficient data exchange mechanisms across sectors continue to hamper the development of hydro-meteorological and climate services for these regions, and existing data sources are underutilized or lost due to fragmentation across multiple operators and the lack of harmonized data policies. Through the GCW Data Portal, GCW also provides an interface for GCW metadata to the WMO Information System (WIS) and the WMO Integrated Global Observing System (WIGOS).

For further information on GCW, please refer to the WMO public and community website, or to related content on the GCW website at the Space Science and Engineering Center of the University of Wisconsin-Madison.

GCW Data Portal Specifications

The GCW Data Portal is connected to a number of observing stations (CryoNet stations) that feed the GCW value chain with observations. These may be operated by National Meteorological and Hydrological Services (NMHS) or research institutes. Complementary meteorological data is measured by Contributing Stations. Academia and projects provide additional data. Once registered, all station information and standardized WIGOS metadata can be accessed through OSCAR/Surface.

The table below summarizes all types of metadata involved in the GCW Data Portal.

Discovery metadata
Purpose: Used to find relevant data.
Description: Discovery metadata, also called index metadata, are a digital version of the library index card. They describe who did what, where and when, how to access the data, and any constraints on the data. They shall also link to further information on the data, such as site metadata. GCW is required to expose this information through the WMO Information System (WIS) as well. Discovery metadata are thus WIS metadata, although the GCW portal can translate to WIS for those not using WMO standards directly.
Examples: ISO19115; ISO19115 (WIS); GCMD DIF

Use metadata
Purpose: Used to understand the data found.
Description: Use metadata describe the actual content of a dataset and how it is encoded. Their purpose is to enable users to understand the data without any further communication. They describe the content of variables using standardised vocabularies, the units of variables, the encoding of missing values, map projections, etc.
Examples: Climate and Forecast Convention; BUFR; GRIB

Configuration metadata
Purpose: Used to tune the portal services offered for datasets.
Description: Configuration metadata are used to improve the services offered to the user community through a portal, e.g. how best to visualise a product. This information is maintained by the GCW portal and is not covered by discovery or use metadata standards.
Examples: (none)

Site metadata
Purpose: Used to understand the data found.
Description: Site metadata describe the context of observational data: the location of an observation, the instrumentation, procedures, etc. To a certain extent they overlap with discovery metadata, but they also extend them. Site metadata can also be used for observation network design.
Examples: WIGOS; OGC O&M
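The use metadata category can be made concrete with a simple completeness check on a variable's CF attributes. The sketch below flags a variable that lacks a `standard_name`, `units`, or a declared missing-value encoding (`_FillValue` or `missing_value`). Which attributes GCW actually requires is not specified here, so this rule set is an assumption; a full check would rely on a CF compliance checker and also cover items such as map projections.

```python
def missing_use_metadata(attrs):
    """List the CF use-metadata items absent from one variable's attributes."""
    missing = [key for key in ("standard_name", "units") if not attrs.get(key)]
    if "_FillValue" not in attrs and "missing_value" not in attrs:
        missing.append("_FillValue or missing_value")
    return missing


print(missing_use_metadata({"units": "K"}))
# ['standard_name', '_FillValue or missing_value']
```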

The GCW Data Portal is hosted by the Norwegian Meteorological Institute (MetNo). To keep it manageable, adding new data sources (i.e. new stations) must come with very low overhead, and all data assimilation steps must run automatically.

The system offers both a distributed operation, in which data producers retain full control over their data, and a centralized operation hosted by the WSL Institute for Snow and Avalanche Research (SLF), which offers more convenience to smaller data producers. Centralized operation significantly lowers the requirements on smaller data providers with limited resources and capabilities, since raw data and metadata can be sent without additional processing. Distributed operation, on the other hand, allows larger data providers to set up a customized processing chain that they host themselves. In this case, the data as well as the use and discovery metadata need to be exposed through the provider's interface so that they can be harvested by the GCW Data Portal. The setup of data processing and sharing mechanisms is facilitated by the GCW/SLF Open Source Software Package, which includes, among others, the data processing software MeteoIO and ENVIDAT, a user interface for structured metadata submission.

For further information on the GCW/SLF Open Source Software Package, please click here.

In the general workflow, a processing engine converts the raw data provided by the data producers into NetCDF-CF files. NetCDF is a standard file format that can be read by many different applications while remaining compact and efficient for handling large amounts of data. The Climate and Forecast Convention CF-1.6 provides standard names for the different meteorological parameters as well as units and other metadata fields, allowing an application to read and interpret the data without any manual action. CF-1.8 or higher is required for outline data. The data are accompanied by metadata following the NetCDF Attribute Convention for Dataset Discovery (ACDD), which provides standard search metadata describing the data origin and the spatial and temporal coverage. Because the data portal web front end harvests the discovery metadata through an OPeNDAP server, no manual editing of the metadata is necessary. Furthermore, no data are stored on the data portal web front end; they are requested on demand from a backend. When a user downloads data from the web portal, the requested data are delivered through the OPeNDAP server. The OPeNDAP client/server architecture allows datasets to be subset by time, space or variable. The search for scientific parameters is currently based on the GCMD Science Keywords.
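The ACDD attributes describing spatial and temporal coverage can be derived from the data themselves, which is what lets the portal harvest them without manual editing. The sketch below computes the ACDD 1.3 bounding-box and time-coverage attributes from station positions and timezone-aware observation times. The function name is hypothetical, the `Conventions` string assumes CF-1.6 as stated above, longitudes crossing the antimeridian are not handled, and the resulting dict would still have to be written as global attributes by whatever tool produces the NetCDF file.

```python
from datetime import datetime, timezone


def acdd_coverage(lats, lons, times):
    """Return ACDD global attributes for the spatial and temporal extent."""
    if any(t.tzinfo is None for t in times):
        raise ValueError("observation times must be timezone-aware")

    def iso(t):
        return t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "Conventions": "CF-1.6, ACDD-1.3",
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
        "time_coverage_start": iso(min(times)),
        "time_coverage_end": iso(max(times)),
        "keywords_vocabulary": "GCMD Science Keywords",
    }


attrs = acdd_coverage(
    [46.80, 46.83], [9.81, 9.85],
    [datetime(2021, 1, 1, tzinfo=timezone.utc),
     datetime(2021, 6, 30, 12, tzinfo=timezone.utc)],
)
print(attrs["time_coverage_start"], attrs["time_coverage_end"])
```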

To access the new release of the GCW Data Portal, please click here.

For a general description of MeteoIO, please click here.

For specific information and software download of MeteoIO, please click here.

Related Literature

For technical specifications, please refer to the following document:

Oystein_2017_Technical_Documentation_Data_Portal.pdf

For further information, please consider the following publications:

DAMEI_2021_report.pdf
Bavay_2018_MeteoIO_preprocessing_library.pdf
Bavay_2018_EGU_Enhancing_data_quality.pdf
Bavay_2020_Automatic_Data_Standardization.pdf

Related Media

For a short, creative summary of the GCW Data Portal, please see the following presentation.

...

Project team

Øystein Godøy (Norwegian Meteorological Institute, Oslo, NO) – project lead

Joel Fiddes (Norwegian Meteorological Institute, Oslo, NO; World Meteorological Organization, Geneva, CH)

Mathias Bavay (Institute for Snow and Avalanche Research SLF, Davos, CH)

Rodica Nitu (World Meteorological Organization, Geneva, CH)