...

The GCW Data Portal is hosted by the Norwegian Meteorological Institute (MetNo). To keep it manageable, it is necessary to establish a system in which adding new data sources (i.e. new stations) comes with very low overhead and all data assimilation steps run automatically. Moreover, the system offers both a distributed operation (in which the data producers retain full control over their data) and a centralized operation (offering more convenience for the smaller data producers) hosted by the WSL Institute for Snow and Avalanche Research (SLF). The centralized operation significantly lowers the requirements on smaller data providers (with limited resources and capabilities), since raw data and metadata can be sent without additional processing. The distributed operation, on the other hand, allows larger data providers to set up a customized processing chain hosted by the data provider itself. In this case the data, together with the use and discovery metadata, need to be exposed in an interface of the provider so that they can be harvested by the GCW Data Portal. The setup of data processing and sharing mechanisms is facilitated by the GCW/SLF Open Source Software Package, which includes, among others, the data processing software MeteoIO and the EnviDat user interface for structured metadata submission.

For further information on the GCW/SLF Open Source Software Package, please click here.

In the general workflow, a processing engine converts the raw data provided by the data producers into NetCDF-CF files. NetCDF is a standard file format that can be read by many different applications while remaining quite compact and efficient for handling large amounts of data. The Climate and Forecast Convention CF-1.6 provides standard names for the different meteorological parameters as well as their units and other metadata fields, allowing an application to read and interpret the data without any manual action (CF-1.8 or higher is required for outline data). The data are accompanied by discovery metadata following the NetCDF Attribute Convention for Dataset Discovery (ACDD), which provides standard search metadata describing the data origin and the spatial and temporal coverage. The data portal web front end harvests the discovery metadata through an OPeNDAP server, so no manual editing of the metadata is necessary. Furthermore, no data are stored on the web front end; data are only requested on demand from a backend. When a user downloads data from the web portal, the portal retrieves the requested data through the OPeNDAP server. The OPeNDAP client/server architecture allows datasets to be subset by time, space, or variable. The search for scientific parameters is currently based on the GCMD Science Keywords.
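As a minimal sketch of what such a file can look like (assuming the netCDF4 Python package; the file name, station values, and attribute values below are hypothetical), the following writes a small time series with a CF standard name and units plus a few ACDD global attributes:

```python
import netCDF4
import numpy as np

# Create a small NetCDF-CF file for one station (all values are made up).
with netCDF4.Dataset("station_example.nc", "w", format="NETCDF4") as nc:
    # ACDD discovery metadata: global attributes describing origin and coverage.
    nc.Conventions = "CF-1.6, ACDD-1.3"
    nc.title = "Example CryoNet station time series"
    nc.summary = "Hourly air temperature from a hypothetical station."
    nc.creator_name = "Example Data Provider"
    nc.geospatial_lat_min = 46.8
    nc.geospatial_lat_max = 46.8
    nc.geospatial_lon_min = 9.8
    nc.geospatial_lon_max = 9.8
    nc.time_coverage_start = "2020-01-01T00:00:00Z"
    nc.time_coverage_end = "2020-01-01T23:00:00Z"

    # CF use metadata: standard_name and units on each variable.
    nc.createDimension("time", 24)
    time = nc.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "hours since 2020-01-01 00:00:00"
    time[:] = np.arange(24)

    ta = nc.createVariable("ta", "f4", ("time",))
    ta.standard_name = "air_temperature"
    ta.units = "K"
    ta[:] = 260.0 + np.random.rand(24)
```

Because such files are served through an OPeNDAP server, a client can open the dataset URL directly and request only a temporal, spatial, or per-variable subset instead of downloading the whole file.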

The GCW/SLF Open Source Software Package

In the period 2015-2017, GCW worked with the WSL Institute for Snow and Avalanche Research (SLF) to set up interoperability with the WSL/SLF data centre, which is responsible for one of the CryoNet stations. WSL/SLF has kindly agreed to make the software stack it has developed available to the wider community. All projects are now available under open source licenses. The provided software tools allow data to be processed and managed at the various stages of the "data cycle", from sensor to published dataset.

MeteoIO

The core element of the software package is the data preprocessor MeteoIO, which takes data from the sensor through a quality-control procedure and into standardised NetCDF/CF files that can be published. MeteoIO was originally developed to provide robust meteorological forcing data to an operational model that forms part of the avalanche forecast at the SLF. However, it also happens to be very good at reading diverse data sources and producing a standardised output. Its modular architecture makes it flexible and fast to develop new use cases for. It can handle both gridded and time series data, offers various functions for cleaning and processing data to various quality standards, and produces QA reports. MeteoIO is a C++ library.

MeteoIO goes through several steps to prepare the data, aiming to offer within a single package all the tools required to bring raw data to an end data consumer. First, the data are read by one of the more than twenty available plugins, supporting as many different formats and protocols (such as CSV files, NetCDF files, databases, or web services). Then some basic data editing can be performed (such as merging stations that are next to each other or renaming sensors). The data can then be filtered by applying a stack of user-selected generic filters; these filters either remove invalid data (for example despiking or low- and high-pass filters) or correct the data (for example precipitation undercatch correction, debiasing, or Kalman filtering). Once this is done, the data are resampled to the requested time steps by various temporal interpolation methods. It is important to keep in mind that throughout this whole process MeteoIO works with any sampling rate, including variable sampling rates, and can resample to any point in time. If data points are still missing at the requested time steps, it is possible to rely on data generators to produce data from either parametrizations (such as converting a specific humidity into a relative humidity) or very basic strategies (such as generating null precipitation to fill gaps). Finally, the data are either forwarded to the data-consuming application or written back by a user-selected plugin.
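To make these stages concrete, here is a conceptual sketch in Python. MeteoIO itself is a C++ library configured through its own plugins and filter stacks; the function names, file names, and thresholds below are hypothetical illustrations of the read, filter, resample, generate, and write stages, not MeteoIO's API:

```python
import pandas as pd

def read_raw(path: str) -> pd.Series:
    """Plugin stage: read one sensor's raw time series from a CSV file."""
    df = pd.read_csv(path, parse_dates=["timestamp"], index_col="timestamp")
    return df["air_temperature_K"]

def min_max_filter(series: pd.Series, lo: float, hi: float) -> pd.Series:
    """Filter stage: mark physically implausible values as missing."""
    return series.where((series >= lo) & (series <= hi))

def resample_linear(series: pd.Series, freq: str) -> pd.Series:
    """Resampling stage: interpolate to regular, requested time steps."""
    return series.resample(freq).mean().interpolate(method="time")

def fill_gaps(series: pd.Series, constant: float) -> pd.Series:
    """Generator stage: fill remaining gaps with a very basic strategy."""
    return series.fillna(constant)

# Chain the stages, then write the result back (output stage).
ta = read_raw("raw_station.csv")             # hypothetical input file
ta = min_max_filter(ta, lo=200.0, hi=330.0)  # remove invalid data
ta = resample_linear(ta, freq="1h")          # hourly time steps
ta = fill_gaps(ta, constant=ta.mean())       # naive gap filling
ta.to_csv("clean_station.csv")
```

In MeteoIO the equivalent chain is assembled declaratively from its configuration file rather than coded by hand, which is what makes it fast to adapt to new stations and formats.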

For the MeteoIO git repository, please click here.

...

EnviDat

In order to publish discovery metadata for the data prepared with MeteoIO, software developed through the EnviDat project is used. EnviDat is WSL/SLF's main CKAN-based data portal and metadata repository. Core CKAN has been extended to cover specific requirements of research data management, including an OAI-PMH server, DOI publishing, and support for additional metadata standards. The advantage of CKAN is that it provides a robust and intuitive user interface for structured metadata submission, which enables large parts of the data management process to be decentralised to the submitter.
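As an illustration of structured metadata submission, the following minimal sketch calls CKAN's standard package_create action over its JSON API. The portal URL, API token, and metadata fields below are hypothetical, and EnviDat's extended schema may require additional fields:

```python
import json
import urllib.request

# Minimal sketch: create a dataset record via CKAN's action API.
CKAN_URL = "https://example-envidat-portal.ch/api/3/action/package_create"
API_TOKEN = "replace-with-your-api-token"  # hypothetical credential

metadata = {
    "name": "example-cryonet-station-2020",
    "title": "Example CryoNet station time series",
    "notes": "Hourly air temperature from a hypothetical station.",
    "tags": [{"name": "cryosphere"}, {"name": "air-temperature"}],
}

request = urllib.request.Request(
    CKAN_URL,
    data=json.dumps(metadata).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": API_TOKEN,
    },
)
with urllib.request.urlopen(request) as response:
    result = json.load(response)
    print(result["success"])  # True if the record was created
```

Discovery metadata published this way can then be exposed for harvesting, for example through the OAI-PMH server mentioned above.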

For EnviDat extensions please click here.

For further information on the CKAN project, please click here.

To access the new release of the GCW Data Portal, please click here.

...