2022-02-21 ET-W2AT Meeting
Date
Feb 21, 2022 14:30-16:30 UTC
Participants
ET-W2AT
@Jeremy Tandy
@Rémy Giraud
@Dana Ostrenga (absent)
@thorsten.buesselberg
@Kai Wirt
@Henning Weber
@Tom Kralidis (absent)
@peter.silva
@Kenji Tsunoda
@Li Xiang (absent)
@Baudouin Raoult (absent)
WMO Secretariat
@Peiliang Shi
@Enrico Fucile
@HADDOUCH Hassan
@Timo Proescholdt
@David Berry
@Anna Milan
@Xiaoxia Chen
Goals
To discuss WIS2 Monitoring
Discussion topics
Item | Presenter | Notes |
---|---|---|
1 | Rémy | Rémy introduces the topic and presents his slides, "Monitoring: Toward an Open Standard Approach", which cover three steps:
1. a standard way of describing metrics (the "exposition format");
2. storing the data in a time-series database;
3. exporting the data to a dashboard such as Grafana.
We can't say "everyone must use Prometheus", but we can specify the standards to use: the metrics document. What should the metrics standard include? The encoding, the data model for describing metrics, the method for presenting the metrics (e.g. HTTPS or MQTT), etc.
Rémy > The objective is to have one method for collecting metrics, not ad-hoc methods for different types of metrics. We should push for a standard solution. The metrics themselves will differ across application domains, but we want the same mechanism for gathering and describing them. So we can develop specifications for particular "exposition documents", agreeing a standard metric format for each WMO application domain (e.g. WDQMS), and for specific domain applications we develop a mechanism to publish those documents.
Jeremy > Do we need a Global Monitoring Centre to provide a (real-time) dashboard view of WIS2 (etc.)?
Rémy > Let's first agree how we collect the metrics. We know that we'll need something for GBON, but we don't yet know what's needed for WIS2 operations; let's build toward that goal.
Kai > But there is a requirement for a global coordinator to collect the metrics and do something useful with them. We will need to agree what we present on the dashboard, and how we present it.
Rémy > We shouldn't try to standardise the UX of the dashboard, only the information presented. For each "mission" (such as GBON or WDQMS) there should be a reference implementation; other implementations could look different, e.g. in different languages.
David > I keep coming back to the Ocean-Ops dashboard, which has proved very useful for the ocean observing community (https://www.ocean-ops.org/board).
To be discussed:
|
2 | Kai | Kai introduces his slides on WIS monitoring, as lead of TT-WISMon. Two types of monitoring:
TT-WISMon provides software for "Sensor centres" that decodes incoming data streams (weather and climate data from the GTS or WIS2) and generates metrics. The Sensor centres then send these metrics to the [Global] WIS Monitoring Centre via pub/sub.
Jeremy > Is "Sensor centre" a new role in WIS2?
Kai > For services monitoring, we propose that all centres are Sensor centres, but only some Sensor centres need to look at data availability and expose the related metrics.
Rémy > Is this just for WDQMS, i.e. improving the monitoring by establishing additional monitoring points?
Kai > The role of a Sensor centre is to share its view of the world, be that data or service availability.
Rémy > So a Sensor centre might affiliate(?) to a particular type of monitoring, e.g. WDQMS, GBON, WIS2 operations etc.
Hassan > We need to avoid "double counting" if data arrives twice.
Kai > Sensor centres can be independently operated components; they don't need to be integrated into other components. Currently the Sensor centre software is packaged as a Docker container. It supports TAC (FM12, FM13, FM35), BUFR and GRIB. To do: define the metrics format (JSON?) and the topic structure. Roles:
Enrico > notes that we need enough Sensor centres to make sure that data from every WMC is covered; this is a minimum requirement. WDQMS looks at data passing through data assimilation, not necessarily the data being published on the GTS (or WIS2) [?]
Peiliang > I think a few typical Sensor centres at small Met services are needed. We should be able to know whether a small Met service is receiving a reasonable portion of the globally exchanged data.
Rémy > ECMWF is looking at data which can be ingested into their data assimilation; this is not the same as looking at all the data that is available to them. It also doesn't matter where the data came from; we just need to know what data is available at that location.
[Kai continues his presentation]
Services monitoring: we propose to monitor the health of components at NC/DCPC as well as the global "shared services" components. All WIS Centres need to run "Prometheus exporters", so this needs to be added to the WIS2BOX project. Everything we need already has Prometheus exporters:
Timo > What are the business requirements for reporting? Non-functional (e.g. service performance) vs. business intelligence (e.g. this much data arrived here). Proposal: map the monitoring requirements to the metrics we might collect. We should avoid generating ad-hoc solutions (or a cottage industry of reporting metrics that aren't very useful).
Kenji > The purpose of monitoring is to make performance visible via the dashboard, but who monitors the dashboard? What do we do if something goes from green to red?
Rémy > appreciates Timo's comments and would like to assess whether the WDQMS monitoring requirements can be covered by OpenMetrics; we need to see if OpenMetrics is suitable. Also, do we need every centre to run Prometheus exporters? Not sure. I would prefer to have a neutral party assess whether an NC's HTTP server is available (for example). I suggest that a 3rd party acts as a Sensor centre to monitor the availability of a service, and then provides (exports) metrics based on this. Answering Kenji: it depends on the topic:
Kai > Having a view at all centres would be in addition to the external service availability check; internal metrics could be collected.
Rémy > Yes, we could add this to the WIS2BOX implementation, but do we want to add this to the Technical Regulations, making it mandatory for all centres? I'm reluctant to do this.
Jeremy > So where metrics are available at an NC/DCPC, these _should_ be made available in the standard form, but provision of metrics is not mandatory. The Technical Regulations will include requirements for Sensor centres, covering both data and service availability, at a minimum specifying how they expose metrics to the Global Monitoring dashboard(s) [the definition of the metrics themselves may be part of other WMO manuals?]
Enrico > This kind of monitoring will help us diagnose the problem when there is a data delivery issue. I don't think that we need someone sitting in front of the dashboard at all times.
Hassan > We need the monitoring to follow up on issues with data supply; we could also use the monitoring to send messages to "poorly performing" centres so they become aware of the issue and resolve it.
Jeremy > For the Global Broker etc. we need an SLA and an escalation route for resolving performance issues.
Kai > I will investigate the OpenMetrics format and assess it against the WDQMS requirements.
Timo > I have already started looking at high-level requirements, starting with "service monitoring", but haven't got as far on the "business intelligence" side … happy to support Kai.
Jeremy > Timescale for the initial assessment?
Kai, Timo > 3-4 weeks.
Kenji > In addition to observation data, we also need to consider [NWP] data products. Some manuals identify mandatory products (e.g. the GDPFS manual); do we need to monitor products, and is this in the scope of WIS2?
Jeremy > WIS2 provides the "plumbing" for data, but doesn't define which data need to be shared. Likewise, WIS2 provides the "plumbing" for capturing [performance] metrics, but doesn't define which metrics are needed; that is the responsibility of the other programmes/activities.
Rémy > I like this approach of WIS2 providing the foundation that others can build on. |
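To make the "exposition format" discussion concrete, here is a minimal Python sketch of what a data-availability Sensor centre might expose, assuming the OpenMetrics text format that Kai and Timo are evaluating. It counts received messages per originating centre and addresses Hassan's "double counting" concern by counting each message identifier only once. The metric name `wis2_messages_received`, the `centre_id` label, and the assumption that every message carries a stable identifier are hypothetical illustrations, not agreed WIS2 definitions.

```python
from collections import defaultdict


def escape_label_value(value: str) -> str:
    """Escape a label value as OpenMetrics text requires (backslash, quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class SensorCentreCounter:
    """Hypothetical Sensor centre counter of unique messages per originating centre."""

    def __init__(self) -> None:
        # Message identifiers already counted; a real service would expire old entries.
        self._seen: set[str] = set()
        # centre_id (hypothetical label) -> number of unique messages received
        self._counts: dict[str, int] = defaultdict(int)

    def record(self, message_id: str, centre_id: str) -> bool:
        """Count a message once; return False if it is a duplicate arrival."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self._counts[centre_id] += 1
        return True

    def exposition(self) -> str:
        """Render the counts as an OpenMetrics counter family, ending with # EOF."""
        name = "wis2_messages_received"  # hypothetical metric name
        lines = [
            f"# TYPE {name} counter",
            f"# HELP {name} Unique messages received per originating centre.",
        ]
        for centre_id in sorted(self._counts):
            label = escape_label_value(centre_id)
            # Counter samples carry the _total suffix in OpenMetrics
            lines.append(f'{name}_total{{centre_id="{label}"}} {self._counts[centre_id]}')
        lines.append("# EOF")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    counter = SensorCentreCounter()
    counter.record("msg-001", "centre-a")
    counter.record("msg-001", "centre-a")  # same message arriving by a second route
    counter.record("msg-002", "centre-b")
    print(counter.exposition(), end="")
```

Keying on a message identifier is only one way to de-duplicate; it relies on identifiers being stable across routes, and the `_seen` set would need a retention window in practice. The point of the sketch is that the metrics differ per domain while the exposition mechanism stays the same.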
Action items
Decision
- A global monitoring dashboard for a real-time view
- Where metrics are available at an NC/DCPC, these _should_ be made available in the standard form, but provision of metrics is not mandatory. The Technical Regulations will include requirements for Sensor centres, covering both data and service availability, at a minimum specifying how they expose metrics to the Global Monitoring dashboard(s) [the definition of the metrics themselves may be part of other WMO manuals?]
- WIS2 provides the "plumbing" for data but doesn't define which data need to be shared. WIS2 provides the "plumbing" for capturing [performance] metrics, but doesn't define which metrics are needed; this is the responsibility of the other programmes/activities.
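Rémy's suggestion that a neutral third party act as a Sensor centre for service availability, together with the decision that Sensor centres expose availability metrics in a standard form, could look something like the minimal Python sketch below. It probes each endpoint over HTTP and renders a 0/1 gauge per target, similar in spirit to the `probe_success` gauge of the Prometheus blackbox exporter. The metric name `wis2_service_up` and the `target` label are hypothetical; only the Python standard library is used.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 1 if the endpoint answers with a non-error HTTP status, else 0."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 1 if response.status < 400 else 0
    except (urllib.error.URLError, OSError, ValueError):
        # Refused connection, DNS failure, timeout, HTTP error status or malformed URL
        return 0


def availability_exposition(results: dict[str, int]) -> str:
    """Render probe results as an OpenMetrics gauge family (hypothetical metric name)."""
    name = "wis2_service_up"
    lines = [
        f"# TYPE {name} gauge",
        f"# HELP {name} 1 if the monitored endpoint responded without error, else 0.",
    ]
    for target in sorted(results):
        # Escape backslash, double quote and newline in the label value
        label = target.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'{name}{{target="{label}"}} {results[target]}')
    lines.append("# EOF")
    return "\n".join(lines) + "\n"
```

A Sensor centre could run `probe()` on a schedule for each endpoint it is asked to watch and publish `availability_exposition({url: probe(url) for url in targets})`; whether that output is scraped over HTTPS or published over MQTT was left open in the meeting.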