2024-03-25 WIS2 Monitoring and ET-W2AT Meeting

 Date

Mar 25, 2024 13:00-15:00 UTC

 Participants

ET-W2AT

  • Rémy GIRAUD

  • Jeremy TANDY

  • Tom KRALIDIS

Experts

  • Kai Wirt

  • Chems eddine ELGARRAI

  • Max Marno (Synoptic)

  • Lei XUE (CMA)

  • Wenjing GU (CMA)

WMO Secretariat

  • Hassan Haddouch

  • Maaike Limper

  • Xiaoxia Chen

  • Anna Milan

 Discussion topics

No

Notes

1

Rémy gave a brief introduction to the monitoring architecture, the agreed metrics, and alerts.

(XUE Lei) Q1: Which protocol is used for metrics collection? (HTTPS)

Q2: An alert message is a special type of notification message.

Alertmanager is an open-source solution; it uses a webhook to generate the notification message.

Q3: Is Grafana used to develop the dashboard? (Yes.) Prometheus scrapes the metrics and stores them in its database.

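At CMA, metrics are generated by the GTS or the data platform and sent (pushed) to the monitor: the client is active and the monitor is passive. This is the opposite of the approach above, where the monitor (Prometheus) actively scrapes the metrics.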
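Alertmanager delivers alerts to a webhook receiver as an HTTP POST with a JSON body. As a minimal sketch, assuming Alertmanager's standard webhook payload (version 4) and hypothetical `severity` and `centre_id` labels, a receiver could flatten each alert into a record from which a notification message is then built. The alert name `GBConnectionLost` and all output field names are illustrative only, not an agreed WIS2 schema.

```python
import json
from typing import Any


def alerts_from_webhook(body: str) -> list[dict[str, Any]]:
    """Flatten an Alertmanager webhook payload into simple alert records.

    The `severity` and `centre_id` labels are hypothetical conventions,
    not an agreed WIS2 schema.
    """
    payload = json.loads(body)
    records = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        records.append({
            "name": labels.get("alertname"),
            "status": alert.get("status"),  # "firing" or "resolved"
            "level": labels.get("severity"),  # hypothetical label
            "centre_id": labels.get("centre_id"),  # hypothetical label
            "summary": annotations.get("summary"),
            "description": annotations.get("description"),
            "starts_at": alert.get("startsAt"),
        })
    return records


# Hypothetical payload with a single firing alert.
example = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "GBConnectionLost", "severity": "critical",
                   "centre_id": "example-centre"},
        "annotations": {"summary": "Global Broker unreachable"},
        "startsAt": "2024-03-25T13:00:00Z",
    }],
})
print(alerts_from_webhook(example))
```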

2

Alerts: https://github.com/wmo-im/wis2-metric-hierarchy/tree/main/alerts

The Global Monitor (GM) implements these alerts.

There are three alert levels (warning, error, critical), each followed by corresponding actions.

Alert timeout? (To be discussed later.)

(Jeremy) The GM raises a ticket in case of a critical alert. How should multiple tickets raised by different GMs for the same problem be deconflicted?

To agree on the type and data in the event.
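To make the deconfliction question concrete, here is a hypothetical sketch in which only critical alerts open a ticket, and a second report of the same alert for the same centre is attached to the existing ticket instead of opening a new one. It assumes the GMs share a single registry (for example, the ticketing system itself) and keys tickets on (alert name, centre-id); both are assumptions, not decisions from the meeting.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


@dataclass
class TicketRegistry:
    """Hypothetical shared registry: one open ticket per (alert, centre)."""

    open_tickets: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def handle(self, gm_id: str, alert: str, centre_id: str, level: Level) -> bool:
        """Return True if a new ticket is opened, False otherwise."""
        if level is not Level.CRITICAL:
            # Actions for warning/error were not specified; no ticket here.
            return False
        key = (alert, centre_id)
        if key in self.open_tickets:
            # Another GM already raised this ticket: record the extra reporter.
            self.open_tickets[key].append(gm_id)
            return False
        self.open_tickets[key] = [gm_id]
        return True


registry = TicketRegistry()
print(registry.handle("gm-a", "GBConnectionLost", "example-centre", Level.CRITICAL))
print(registry.handle("gm-b", "GBConnectionLost", "example-centre", Level.CRITICAL))
```

The first call opens a ticket and the second, from another GM, is merged into it.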

(XUE) We are preparing the metrics for the Global Services.

(Wenjing) To share the current metrics provided by CMA.

[Image: image-20240325-144211.png]

(Tom) shares the endpoint for GDC metrics (ECCC): https://gdc.wis2dev.io/wis2-gdc-metrics.txt
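Since Prometheus scrapes this endpoint, its content is presumably in the Prometheus text exposition format (`# HELP`/`# TYPE` comments followed by `name{labels} value` samples). The minimal parser below assumes that format; the `centre_id` label in the sample is hypothetical. It does not unescape label values or handle `}` inside them, so for real use the `prometheus_client` package's `text_string_to_metric_families` parser is a safer choice.

```python
import re

# Matches `name{labels} value` or `name value`; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
# Matches label pairs such as centre_id="example-centre".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical excerpt of a metrics file.
sample = """\
# HELP wmo_wis2_gdc_downloaded_errors_total Number of download errors
# TYPE wmo_wis2_gdc_downloaded_errors_total counter
wmo_wis2_gdc_downloaded_errors_total{centre_id="example-centre"} 3
"""
print(parse_metrics(sample))
```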

3

(Tom) introduces the GDC metrics: https://github.com/wmo-im/wis2-metric-hierarchy/blob/main/metrics/gdc.csv

(XUE) Q1: wmo_wis2_gdc_downloaded_errors_total

Q2: Archive data? Make the URL link to a zip file of the archived metadata: https://gdc.wis2dev.io/collections/wis2-discovery-metadata?f=json

(Rémy) Items 11 and 12 are to be discussed later; they are more about quality assessment.

4

(Jeremy) shows the WIS 2.0 monitoring management hierarchy (image).

Global Services may publish alerts, but this is not mandatory.

5

Schema for notification messages

(Jeremy) Do we want one schema, or multiple simpler schemas, one for each purpose?

  • (Tom) Start with one and extend it as needed.

Data: a level and a text message/description? (Rémy) proposes using annotations and a summary instead of free text.
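A sketch of the single, extensible schema suggested by Tom, combining the level, Rémy's summary and annotations, and the proposed centre-id property. All field names (`event_type`, `level`, `centre_id`, `summary`, `annotations`) are hypothetical placeholders pending agreement.

```python
from dataclasses import asdict, dataclass, field

# Alert levels listed in the meeting notes.
LEVELS = ("warning", "error", "critical")


@dataclass
class AlertEvent:
    """One extensible alert event (hypothetical field names)."""

    event_type: str  # type of event, still to be agreed
    level: str  # one of LEVELS
    centre_id: str  # centre the alert refers to (proposed property)
    summary: str  # short human-readable description
    annotations: dict[str, str] = field(default_factory=dict)  # extra details

    def __post_init__(self) -> None:
        # Reject levels outside the three listed ones.
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")

    def to_dict(self) -> dict:
        return asdict(self)


event = AlertEvent(
    event_type="example-alert",
    level="critical",
    centre_id="example-centre",
    summary="Global Broker unreachable",
    annotations={"description": "No connection for 10 minutes"},
)
print(event.to_dict())
```

New purposes can then be served by adding keys under `annotations` or optional fields, rather than by defining separate schemas.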

6

Review of the current progress of the GM work:

  • monitor topic hierarchy (agreed)

  • access control (not yet decided)

  • add centre-id to the data as a property of the event (Rémy and Jeremy to look into this)

  • roles and responsibilities

    • WIS2 Node: no responsibility

    • Global Services: provide metrics

    • Global Monitor: collect metrics, raise alerts

    • GISC: handle Jira tickets, working with the WIS2 Nodes in its Area of Responsibility (AoR)

    • Secretariat: follow up to ensure that the above-mentioned roles are being fulfilled. (Enrico: the Secretariat runs Jira for the Regional WIGOS Centres as a cloud service on Azure infrastructure.) (Expected to be available by September.)

      • (Jeremy) for GM operators to develop the ticketing system

      • (Enrico) If Jira procurement takes too long, we intend to have a temporary solution running on Azure.

      • (Rémy) Due to the time constraints, the tender needs to require that tickets can be created automatically.

      • (Enrico) We are aiming to have Jira.

 Action items

Next meeting