2022-02-21 ET-W2AT Meeting

Date

Feb 21, 2022 14:30-16:30 UTC

Participants

ET-W2AT

  • @Jeremy Tandy

  • @Rémy Giraud

  • @Dana Ostrenga (absent)

  • @thorsten.buesselberg

  • @Kai Wirt

  • @Henning Weber

  • @Tom Kralidis (absent)

  • @peter.silva

  • @Kenji Tsunoda

  • @Li Xiang (absent)

  • @Baudouin Raoult (absent)

WMO Secretariat

  • @Peiliang Shi

  • @Enrico Fucile

  • @HADDOUCH Hassan

  • @Timo Proescholdt

  • @David Berry

  • @Anna Milan

  • @Xiaoxia Chen

Goals

To discuss WIS2 Monitoring

Discussion topics

Item

Presenter

Notes

1

Rémy

Rémy introduces the topic and presents his slides on Monitoring Toward an Open Standard Approach

  • Monitoring needs:

    • GBON

    • Data quality (WDQMS - developed by ECMWF)

    • GTS to WIS2

    • WIS2 operations

  • Prometheus: a three-part solution:

1. a standard way of describing metrics; the "exposition format"

  • now a draft RFC with the IETF: "OpenMetrics"

  • many tools (Mosquitto, nginx, Docker) are making metrics available in this form

2. storing data in a time-series database

3. exporting the data to a dashboard such as Grafana
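As an illustration of point 1, the following is a minimal sketch of how a single gauge could be written in the Prometheus text exposition format, using only the Python standard library. The metric name `wis2_reports_received` and the label names and values are hypothetical; no metric names were agreed at the meeting.

```python
from typing import Mapping, Optional


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, value: float,
                 labels: Optional[Mapping[str, str]] = None) -> str:
    """Render one gauge sample as HELP, TYPE and sample lines."""
    label_part = ""
    if labels:
        # Sort labels so the output is stable between runs.
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        label_part = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_part} {value}\n"
    )


if __name__ == "__main__":
    # Hypothetical metric and labels, for illustration only.
    print(render_gauge(
        "wis2_reports_received",
        "Number of reports seen in the last hour (hypothetical metric).",
        42,
        {"centre": "example-centre", "format": "bufr"},
    ))
```

The point of the sketch is that the format is plain text and easy to produce from any language, which is what makes it attractive as a common mechanism across application domains.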

We can't say "everyone must use Prometheus", but we can assert the standards to use: the metrics document

What should the metrics standard include? Encoding, data model for describing metrics, the method to present the metrics (e.g. HTTPS or MQTT) etc.

Remy > The objective is to have one method for collecting metrics - not ad-hoc methods for different types of metrics. Push for a standard solution. The metrics themselves will differ across the application domains, but we want the same mechanism for gathering and describing those metrics. So, we can develop specifications for particular "exposition documents", agreeing a standard metric format for each WMO application domain (e.g. WDQMS), and for specific domain applications we develop a mechanism to publish those documents.

Jeremy > Do we need a Global Monitoring Centre to provide a (real-time) dashboard view of WIS2 (etc.)?

Remy > Let's first agree how we collect the metrics; we know that we'll need something for GBON, but we don't yet know what's needed for WIS2 operations. Let's build toward that goal.

[Kai > But there is a requirement for a Global Coordinator to collect the metrics and do something useful with them. We will need to agree what we're presenting on the dashboard - and how we're presenting it]

[Remy > We shouldn't try to standardise the UX for the dashboard - only the information presented, but for each "mission" (like GBON or WDQMS) there should be a reference implementation, and then others could look different - e.g. in different languages]

[David > I keep finding myself coming back to the Ocean-Ops dashboard, which has proved very useful for the ocean observing community (https://www.ocean-ops.org/board)]

To be discussed:

  • Should we use OpenMetrics?

  • What frequency of update do we need for metrics?

2

Kai

Kai introduces his slides on WIS Monitoring as lead of TT-WISMon

Two types of monitoring:

  1. availability of data - relates to WDQMS; GBON requirements, and the GTS2WIS2 transition

  2. services monitoring - is component X available, is the system healthy, availability of broker, cache, catalogue

 TT-WISMon provides software to "Sensor centres" to decode incoming data streams (weather and climate data from GTS or WIS2) and generate metrics. Then send these metrics to the [Global] WIS Monitoring centre via pub/sub.

Jeremy > Is "Sensor center" a new role in WIS2?

[Kai > for services monitoring, propose that all centers are Sensor centers - but only need some Sensor centers looking at data availability and exposing the related metrics]

[Remy > Is this just for WDQMS; improving the monitoring by establishing additional monitoring points?]

[Kai > The role of a Sensor center is to share their view of the world - be that data or service availability]

[Remy > so a Sensor center might affiliate(?) to a particular type of monitoring - e.g. WDQMS, GBON, WIS2 ops etc.]

[Hassan > need to avoid "double counting" if data arrives twice.]
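One possible way to address the double-counting concern is sketched below, assuming that duplicates are byte-identical copies of the same message. The class name `ArrivalCounter` is hypothetical and is not taken from the TT-WISMon software.

```python
import hashlib


class ArrivalCounter:
    """Count each distinct message once, even if it arrives several times."""

    def __init__(self) -> None:
        self._seen = set()   # content hashes of messages already counted
        self.unique = 0      # number of distinct messages
        self.duplicates = 0  # number of repeat arrivals

    def record(self, payload: bytes) -> bool:
        """Record one arrival; return True if it was new, False if a repeat."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            self.duplicates += 1
            return False
        self._seen.add(digest)
        self.unique += 1
        return True


if __name__ == "__main__":
    counter = ArrivalCounter()
    for message in (b"report-A", b"report-B", b"report-A"):
        counter.record(message)
    print(counter.unique, counter.duplicates)
```

In this sketch the set of hashes grows without bound; a real Sensor center would presumably expire old entries after some time window.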

Kai > Sensor centers can be independently operated components - they don't need to be integrated into other components. Currently the Sensor center s/w is packaged as a Docker container. Support for TAC (FM12, FM13, FM35), BUFR, GRIB.

To do: define metrics format (JSON?), and topic structure.
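Since both the metrics format and the topic structure are still to be defined, the following is only a sketch of what a JSON metrics message for pub/sub might look like. The topic layout `monitoring/<centre>/<type>`, the field names and the metric `bufr_messages` are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def build_metric_message(centre_id, monitoring_type, metrics, observed_at=None):
    """Return a (topic, payload) pair for one set of metrics.

    The topic layout and JSON field names are hypothetical placeholders.
    """
    observed_at = observed_at or datetime.now(timezone.utc)
    topic = f"monitoring/{centre_id}/{monitoring_type}"
    payload = json.dumps(
        {
            "centre": centre_id,
            "type": monitoring_type,
            "observed_at": observed_at.isoformat(),
            "metrics": metrics,
        },
        sort_keys=True,
    )
    return topic, payload


if __name__ == "__main__":
    topic, payload = build_metric_message(
        "example-centre", "data-availability", {"bufr_messages": 10}
    )
    print(topic)
    print(payload)
```

The returned pair could then be handed to whichever MQTT client the Sensor center uses; publishing itself is left out of the sketch.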

 Roles:

  • Sensor center - at least one per region, sharing their perspective on what they're seeing in terms of data sharing (don't need this from _every_ center)

  • Monitoring center - (at least one) to provide the dashboard

Enrico > notes that we need enough Sensor centers to make sure that data from every WMC is covered - this is a minimum requirement. WDQMS looks at data passing through data assimilation, not necessarily the data being published on the GTS (or WIS2) [?]

Peiliang > I think a few typical sensor centers at small met services are needed. We should be able to know: is that small met service getting a reasonable portion of globally exchanged data?

Remy > ECMWF is looking at data which can be ingested into their data assimilation - this is not the same as looking at all the data that is available to them. It also doesn't matter where the data came from - we just need to know what data is available at that location.

 [Kai continues his presentation]

 Services monitoring.

Propose to monitor the health of components at NC/DCPC as well as Global "shared services" components. All WIS Centers need to run Prometheus exporters - so need to add this to the WIS2BOX project.

Everything we need already has Prometheus exporters:

  • MQTT, HTTP(S), SSH, nginx, and a Python client to write your own.

  • The Prometheus exporters include an HTTP server which enables the "exposition documents" to be gathered by the monitoring centre and those metrics included in the dashboard
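To make the exporter idea concrete, here is a minimal sketch of an HTTP endpoint that serves an exposition document at `/metrics`, together with a single local scrape. It uses only the Python standard library rather than the official Prometheus client, and the metric `example_broker_up` is hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical exposition document; a real exporter would compute the values.
METRICS_TEXT = (
    "# HELP example_broker_up Whether the broker answered the last check.\n"
    "# TYPE example_broker_up gauge\n"
    "example_broker_up 1\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def scrape_once() -> str:
    """Start the exporter on a free local port, fetch /metrics once, stop it."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # Bypass any configured proxy: the target is on localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/metrics", timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(scrape_once())
```

Here `scrape_once` plays the part of the monitoring centre: it pulls the document over HTTP, which is the same pattern a Prometheus server would follow on a schedule.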

Timo > What are the business requirements for reporting? Non-functional (e.g. service performance) vs. business intelligence (e.g. this much data arrived here). Proposal: map the requirements for monitoring to the metrics we might collect. We should avoid generating ad-hoc solutions [or a cottage industry of reporting metrics that aren't very useful]

Kenji > The purpose of monitoring is to ensure performance is visible via the dashboard, but who monitors the dashboard; what do we do if something goes from green to red?

Remy > appreciates Timo's comments; would like to assess whether the WDQMS monitoring requirement can be covered by OpenMetrics. Need to see if OpenMetrics is suitable. Also, do we need every center to run Prometheus exporters? Not sure. I would prefer to have some neutral party assess whether an NC's HTTP server is available (for example). I suggest that we have a 3rd party act as Sensor center to monitor the availability of a service, and then provide (export) metrics based on this.
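The third-party availability check suggested here could look roughly like the sketch below: a neutral Sensor center probes a service URL and turns the result into exposition-format metrics. The metric names `probe_success` and `probe_duration_seconds` and the target URL are illustrative assumptions, and the target label is not escaped in this simplified version.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False


def probe_to_metrics(target: str,
                     check: Callable[[str], bool] = http_is_up) -> str:
    """Probe one target and report the outcome as two gauge samples."""
    started = time.monotonic()
    up = check(target)
    elapsed = time.monotonic() - started
    return (
        "# TYPE probe_success gauge\n"
        f'probe_success{{target="{target}"}} {1 if up else 0}\n'
        "# TYPE probe_duration_seconds gauge\n"
        f'probe_duration_seconds{{target="{target}"}} {elapsed:.3f}\n'
    )


if __name__ == "__main__":
    # Hypothetical NC endpoint; replace with a real service URL to try it.
    print(probe_to_metrics("https://nc.example.invalid/"))
```

Because the check is passed in as a function, the same wrapper could cover other services (for example an MQTT broker) without changing how the metrics are exposed.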

Answering Kenji - it depends on the topic:

  • if a Global Broker is not performing, then we kick it out

  • if a NC isn't meeting GBON requirements, then maybe the SOFF funding stops?

  • it's not for us to say what happens

  • it's for us to collect the metrics to inform those decisions

Kai > having a view at all Centers would be additional to the external service availability check; internal metrics could be collected

Remy > yes - we could add this to WIS2BOX implementation, but do we want to add this to the Technical Regulations - mandatory for all Centers? I'm reluctant to do this.

Jeremy > So where metrics are available at NC/DCPC, these _should_ be made available in the standard form, but provision of metrics is not mandatory. Technical Regulations will include requirements for Sensor centers. Both data and service availability types; at a minimum in terms of how they expose metrics to the Global Monitoring dashboard(s) [the definition of the metrics themselves may be part of other WMO manuals?]

Enrico > This kind of monitoring will help us diagnose the problem when there is a data delivery issue. I don't think that we need someone to be sitting in front of the dashboard at all times

Hassan > We need the monitoring to follow up issues with data supply. We could also use the monitoring to send messages to "poor performing" centers so they become aware of the issue and resolve it.

Jeremy > for Global Broker etc. we need an SLA and escalation route for resolving performance issues

Kai > I will investigate the openmetrics format; assess against WDQMS requirements

Timo > already started looking at high-level requirements - started on "service monitoring", but not got so far on the "business intelligence" stuff … happy to support Kai.

Jeremy > timescale of initial assessment? [Kai, Timo] 3-4 weeks

Kenji > in addition to observation data, we also need to consider [NWP] data products; some manuals identify mandatory products, e.g. GDPFS manual, do we need to monitor products; is this in scope of WIS2?

Jeremy > WIS2 provides the "plumbing" for data, but doesn't define which data needs to be shared. WIS2 provides the "plumbing" for capturing [performance] metrics, but we don't define which metrics are needed - this is the responsibility of the other programmes/activities.

Remy > I like this approach of WIS2 providing the foundation that others can build on.

Action items

Kai (together with Timo) to prepare the requirements assessment for WIS2 monitoring (functional and non-functional metrics) in 3-4 weeks
Tom to provide a presentation for next Monday on metadata and catalog

Decision

  1. A global monitoring dashboard for a real-time view
  2. Where metrics are available at NC/DCPC, these _should_ be made available in the standard form, but provision of metrics is not mandatory. Technical Regulations will include requirements for Sensor centers. Both data and service availability types; at a minimum in terms of how they expose metrics to the Global Monitoring dashboard(s) [the definition of the metrics themselves may be part of other WMO manuals?]
  3. WIS2 provides the "plumbing" for data but doesn't define which data needs to be shared. WIS2 provides the "plumbing" for capturing [performance] metrics, but we don't define which metrics are needed - this is the responsibility of the other programmes/activities
