WIS 2.0 Demonstration Project: GCW Data Portal
The GCW Data Portal is the entry point to datasets describing the cryosphere and forms the information basis for the assessment activities of the GCW. It can connect scientific and other data providers to WMO-specific services, such as real-time exchange through the WMO Global Telecommunication System (GTS).
...
Access the GCW Data Portal here.
Read more about MeteoIO and the GCW/SLF Open Source Software Package.
GCW Demonstration Project Charter
...
Plan and milestones
Deliverables
No. | Deliverable name | Lead | Del. date | Status |
---|---|---|---|---|
D1 | Updated information model enabling linkage of datasets to WIGOS metadata | MET | 2020Q2 | Complete |
D2 | Dynamic visualisation of time series from NetCDF-CF and OPeNDAP | MET | 2020Q2 | Complete |
D3 | Updated harvesting of discovery metadata supporting OAI-PMH, OGC CSW and OpenSearch | MET | 2021Q2 | In progress |
D4 | NetCDF-CF guidelines for timeseries and profiles (e.g. permafrost) | MET | 2021Q3 | In progress |
D5 | Mapping harvested discovery metadata to WMO Core Profile | MET | 2021Q3 | In progress |
D6 | Extension of metadata harvesting to support Schema.org, provided current ESIP activities are approved by Schema.org | MET | 2022Q3 | Pending funding |
D7 | Conversion of NetCDF-CF to WMO BUFR for permafrost profiles | MET | 2023Q2 | Pending funding |
D8 | Web service converting non-standardised data to NetCDF-CF using MeteoIO | WSL/SLF | 2023Q4 | Pending funding |
Milestones
No. | Milestone name | Lead | Due | Status |
---|---|---|---|---|
M1 | New information model implemented | MET | 2021Q1 | In progress |
M2 | Selected permafrost datasets available online and in real time | MET | 2021Q4 | In progress |
M3 | Harvested discovery metadata exposed through WIS | MET | 2022Q1 | Not started |
M4 | Transformation of NetCDF-CF to WMO BUFR for selected datasets | MET | 2023 | Not started |
Supporting information and links
...
GCW Data Portal Specifications
To satisfy user requirements, the NetCDF file format has been chosen, together with the Climate and Forecast (CF) convention for the metadata. NetCDF provides a standard file format that can be read by many different applications while remaining compact and efficient for handling large amounts of data. The CF-1.6 convention provides standard names for the different meteorological parameters, as well as their units and other metadata fields, allowing an application to read and interpret the data without any manual intervention.
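As a minimal sketch of what such a file looks like in practice, the following Python snippet writes a small CF-1.6 compliant air-temperature time series with the netCDF4 library; the file name and values are illustrative only, not taken from the portal.

```python
# Minimal sketch: a CF-1.6 compliant air-temperature time series written
# with the netCDF4 Python library. File name and values are illustrative.
from datetime import datetime, timedelta
from netCDF4 import Dataset, date2num

with Dataset("station_example.nc", "w", format="NETCDF4") as nc:
    nc.Conventions = "CF-1.6"

    nc.createDimension("time", None)          # unlimited time dimension
    time = nc.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "seconds since 1970-01-01 00:00:00"
    time.calendar = "standard"

    ta = nc.createVariable("air_temperature", "f4", ("time",),
                           fill_value=-9999.0)
    ta.standard_name = "air_temperature"      # CF standard name
    ta.units = "K"                            # CF canonical units

    start = datetime(2021, 1, 1)
    stamps = [start + timedelta(hours=h) for h in range(3)]
    time[:] = date2num(stamps, time.units, time.calendar)
    ta[:] = [263.1, 263.4, 262.9]
```

Because both the variable name semantics (standard_name) and the units are machine-readable, any CF-aware client can interpret the file without contacting the data producer.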
A processing engine converts the raw data provided by the data producers into NetCDF-CF standard files with NetCDF Attribute Convention for Dataset Discovery (ACDD) metadata. The ACDD standard provides standard search metadata describing the data origin and the spatial and temporal coverage. The data portal web front end harvests the metadata needed by its search engine through an OPeNDAP server, so no manual editing of the metadata is necessary. Furthermore, no data are stored on the web front end; they are only requested on demand from a backend. When a user downloads data from the web portal, the portal retrieves them through the OPeNDAP server. The OPeNDAP client/server architecture allows subset queries of datasets by time, space or variable. The search for scientific parameters is currently based on the GCMD Science Keywords.
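To illustrate the on-demand access pattern, the sketch below opens a dataset over OPeNDAP with the Python xarray library and requests only a temporal subset of one variable; the server URL is a hypothetical placeholder, not the portal's actual endpoint.

```python
# Sketch of an OPeNDAP subset request: the dataset is opened lazily and
# only the requested slice is transferred over the network.
import xarray as xr

# Hypothetical OPeNDAP endpoint, for illustration only
url = "https://thredds.example.org/thredds/dodsC/cryonet/station_example.nc"

ds = xr.open_dataset(url)                      # no data downloaded yet

# Temporal + variable subsetting: only this slice is fetched from the server
ta_january = ds["air_temperature"].sel(time=slice("2021-01-01", "2021-01-31"))
print(ta_january.mean().values)
```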
The GCW/SLF Open Source Software Package
GCW depends on a number of observing stations (CryoNet stations) to feed the GCW value chain with observations, and requires both real-time and archived data. In the period 2015-2017, GCW worked with the WSL Institute for Snow and Avalanche Research (SLF) to set up interoperability with the WSL/SLF data centre, which is responsible for one of the CryoNet stations. WSL/SLF has kindly agreed to make the software stack it developed available to the wider community, and all projects are now available under open source licenses. The provided software tools allow data to be processed and managed at various stages of the “data cycle”, from sensor to published dataset.
MeteoIO
The core element of the software package is the data preprocessor MeteoIO, which takes data from the sensor through a quality control procedure into standardised NetCDF-CF files that can be published. MeteoIO was originally developed to provide robust meteorological forcing data to an operational model that forms part of the avalanche forecast at the SLF. However, it also happens to be very good at reading diverse data sources and producing a standardised output. Its modular architecture makes it flexible and quick to adapt to new use cases. It can handle both gridded and time series data, offers various functions for cleaning and processing data to various quality standards, and produces QA reports. MeteoIO is a C++ library.
MeteoIO prepares the data in several steps, aiming to offer within a single package all the tools required to bring raw data to an end data consumer. First, the data are read by one of more than twenty available plugins, each supporting a different format or protocol (such as CSV files, NetCDF files, databases or web services). Some basic data editing can then be performed (such as merging stations that are next to each other or renaming sensors). Next, the data can be filtered by applying a stack of user-selected generic filters, which either remove invalid data (e.g. despiking, low- and high-pass filters) or correct the data (e.g. precipitation undercatch correction, debiasing, Kalman filtering). Once this is done, the data are resampled to the requested time steps by various temporal interpolation methods; throughout this whole process, MeteoIO works with any sampling rate, including variable sampling rates, and can resample to any point in time. If data points are still missing at the requested time steps, data generators can produce data from either parametrizations (such as converting a specific humidity into a relative humidity) or very basic strategies (such as generating null precipitation to fill gaps). Finally, the data are either forwarded to the data consuming application or written back by a user-selected plugin. A schematic sketch of this stage ordering follows below.
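As a purely illustrative outline of that stage ordering (read, edit, filter, resample, generate, write), the Python sketch below implements toy versions of two of the stages. MeteoIO itself is a C++ library, and none of the function names here come from its API; they are hypothetical stand-ins to show the flow.

```python
# Schematic sketch of the MeteoIO-style processing stages described above.
# Every function here is a hypothetical stand-in, not the MeteoIO C++ API.

def despike(series, max_jump=25.0):
    """Filtering stage: drop values that jump implausibly from their predecessor."""
    cleaned, prev = [], None
    for t, v in series:
        if prev is not None and v is not None and abs(v - prev) > max_jump:
            v = None                    # flag as invalid instead of keeping the spike
        cleaned.append((t, v))
        if v is not None:
            prev = v
    return cleaned

def resample_linear(series, step):
    """Resampling stage: linear interpolation onto a regular grid.
    Works with any (even variable) input sampling rate."""
    pts = [(t, v) for t, v in series if v is not None]
    out, i, t = [], 0, pts[0][0]
    while t <= pts[-1][0]:
        while pts[i + 1][0] < t:        # advance to the bracketing interval
            i += 1
        (ta, va), (tb, vb) = pts[i], pts[i + 1]
        out.append((t, va + (vb - va) * (t - ta) / (tb - ta)))
        t += step
    return out

# read (plugin) -> edit -> filter -> resample -> generate -> write, as in the text
raw = [(0, 263.1), (60, 310.0), (130, 263.4), (180, 262.9)]   # (seconds, kelvin)
clean = despike(raw)                     # the 310 K spike is removed
regular = resample_linear(clean, step=60)
print(regular)
```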
For the MeteoIO git repository, please click here.
...
EnviDat
In order to publish discovery metadata for the data prepared through MeteoIO, software developed through the EnviDat project is used. EnviDat is WSL/SLF's main CKAN-based data portal and metadata repository. Core CKAN has been extended to cover specific requirements of research data management, including an OAI-PMH server, DOI publishing and support for additional metadata standards. The advantage of CKAN is that it provides a robust and intuitive UI for structured metadata submission, which enables large parts of the data management process to be decentralised to the submitter.
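As a sketch of how such discovery metadata can be harvested over OAI-PMH (the protocol the EnviDat extension exposes), the snippet below issues a standard ListRecords request and prints the Dublin Core titles of the harvested records; the endpoint URL is a hypothetical placeholder.

```python
# Sketch of harvesting discovery metadata from an OAI-PMH endpoint.
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://envidat.example.org/oai"      # hypothetical endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# ListRecords with the mandatory oai_dc metadata format (standard OAI-PMH verb)
url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"
with urllib.request.urlopen(url) as resp:
    root = ET.fromstring(resp.read())

# Print identifier and title of each harvested Dublin Core record
for record in root.iter(OAI + "record"):
    ident = record.find(OAI + "header/" + OAI + "identifier")
    title = record.find(".//" + DC + "title")
    print(ident.text if ident is not None else "?", "-",
          title.text if title is not None else "(no title)")
```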
For EnviDat extensions please click here.
For further information on the CKAN project, please click here.
The GCW Data Portal is connected to a number of observing stations (CryoNet stations) that feed the GCW value chain with observations. These may be operated by National Meteorological and Hydrological Services (NMHS) or research institutes. Complementary meteorological data are measured by Contributing Stations, and academia and projects provide additional data. Once registered, all station information and standardized WIGOS metadata can be accessed through OSCAR/Surface.
The table below summarizes all types of metadata involved in the GCW Data Portal.
Type | Purpose | Description | Examples |
---|---|---|---|
Discovery metadata | Used to find relevant data | Discovery metadata are also called index metadata and are a digital version of the library index card. They describe who did what, where and when, how to access the data and potential constraints on it. They shall also link to further information on the data, such as site metadata. GCW is required to expose this information through the WMO Information System (WIS) as well; discovery metadata are thus WIS metadata, although the GCW portal can translate to WIS for providers not using WMO standards directly. | ISO19115; ISO19115 (WIS); GCMD DIF |
Use metadata | Used to understand data found | Use metadata describe the actual content of a dataset and how it is encoded. The purpose is to enable the user to understand the data without any further communication. They describe the content of variables using standardised vocabularies, the units of variables, the encoding of missing values, map projections, etc. | Climate and Forecast (CF) Convention; BUFR; GRIB |
Configuration metadata | Used to tune portal services for datasets and users | Configuration metadata are used to improve the services offered through the portal to the user community, e.g. how best to visualise a product. This information is maintained by the GCW portal and is not covered by discovery or use metadata standards. | |
Site metadata | Used to understand data found | Site metadata describe the context of observational data: the location of an observation, the instrumentation, procedures, etc. To a certain extent they overlap with discovery metadata, but also extend them. Site metadata can be used for observation network design. | WIGOS; OGC O&M |
The GCW Data Portal is hosted by the Norwegian Meteorological Institute (MetNo). To keep it manageable, adding new data sources (i.e. new stations) must come with very low overhead, and all data assimilation steps must operate automatically. The system therefore offers both distributed operation, in which data producers retain full control over their data, and centralized operation, hosted by the WSL Institute for Snow and Avalanche Research (SLF), which offers more convenience for smaller data producers. Centralized operation significantly lowers the requirements on smaller data providers (with limited resources and capabilities), since raw data and metadata can be sent without additional processing. Distributed operation, on the other hand, allows larger data providers to set up a customized processing chain hosted by the provider itself; in this case the data and the use and discovery metadata need to be exposed through the provider's interfaces so they can be harvested by the GCW Data Portal. The setup of data processing and sharing mechanisms is facilitated by the GCW/SLF Open Source Software Package, which includes, among others, the data processing software MeteoIO and the EnviDat user interface for structured metadata submission.
For further information on the GCW/SLF Open Source Software Package, please click here.
In the general workflow, a processing engine converts the raw data provided by the data producers into NetCDF-CF files with ACDD discovery metadata, following the GCW Data Portal Specifications described above; CF-1.8 or higher is required for outline data. Because the web front end harvests the discovery metadata through an OPeNDAP server, no manual editing of the metadata is necessary, and data are only requested on demand from the backend, with subsetting by time, space or variable, when a user downloads them.
Links
To access the new release of the GCW Data Portal, please click here.
...