🗣 Discussion topics

No

Notes

1

Agree on today's agenda

(metrics, roles and responsibilities, wis2node registry)

2

Metrics hierarchy https://github.com/wmo-im/wis2-metric-hierarchy/blob/main/metric-hierarchy/gb.csv

  • Global Broker Metrics

    • wmo_wis2_gb_messages_received_total:

(Rémy shows the currently disconnected wis2nodes: Argentina/ar-smn, Cuba/cu-insmet, Indonesia/id-bmkg, Sweden/se-smhi, Trinidad and Tobago/tt-)

To decide: how long a wis2node must be disconnected before the GB raises a local alert (e.g. 10m), along with the alert expression, the labels (severity) and the annotations.

To decide: the workflow of the alerting mechanism, e.g. local alerting followed by global alerting.
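The local-alert decision above can be illustrated with a minimal Python sketch. It is not an agreed rule: the 10-minute threshold is only the example value from the notes, and the alert name, severity value, centre IDs and timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 10 minutes is only the example duration from the notes,
# not an agreed value.
DISCONNECT_THRESHOLD = timedelta(minutes=10)


def local_alerts(last_received, now, threshold=DISCONNECT_THRESHOLD):
    """Return one alert per centre whose last message is older than threshold."""
    alerts = []
    for centre_id, last_seen in sorted(last_received.items()):
        silent_for = now - last_seen
        if silent_for > threshold:
            minutes = int(silent_for.total_seconds() // 60)
            alerts.append({
                "alert": "Wis2NodeDisconnected",  # hypothetical alert name
                "labels": {"centre_id": centre_id, "severity": "warning"},
                "annotations": {
                    "summary": f"No messages from {centre_id} for {minutes} minutes",
                },
            })
    return alerts


# Illustrative data only: placeholder centre IDs and invented timestamps.
now = datetime(2024, 2, 26, 14, 30, tzinfo=timezone.utc)
last_received = {
    "aa-centre": now - timedelta(minutes=45),
    "bb-centre": now - timedelta(minutes=2),
}
alerts = local_alerts(last_received, now)
```

Each returned alert carries the three parts still to be decided (duration via the threshold, labels, annotations), so the sketch may help when discussing the actual values.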

Metric names: the 6 metrics are listed at https://github.com/wmo-im/wis2-metric-hierarchy/blob/main/metric-hierarchy/gb.csv

There could be sensor centres creating local and global alerts.

(Image: image-20240226-142952.png)

Rémy shares an Alertmanager cheat sheet: https://blog.ruanbekker.com/cheatsheets/alertmanager/

  • 3 levels:

    • level 1: one GB not connecting for some time

    • level 2: all GS reporting the same issue for some time

    • level 3: all GS reporting the same issue for a longer time; action: raise a ticket automatically

  • metric: timestamp

    • expression:

  • Question (Jeremy): should the metrics differentiate between the data and metadata channels?

    • (Rémy) this matters more on the GC side than on the GB side
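The three alert levels above could be sketched roughly as follows. This is one possible reading of the notes, not an agreed design: level 1 is interpreted as "at least one Global Service sees the outage", and both thresholds are placeholders because the meeting did not fix "some time" or "a longer time". The function and service names are hypothetical.

```python
from datetime import timedelta

# Assumptions: placeholder thresholds, not values agreed in the meeting.
SOME_TIME = timedelta(minutes=10)
LONGER_TIME = timedelta(hours=1)


def alert_level(outage_by_service):
    """Map per-service outage durations for one centre to a level from 0 to 3.

    outage_by_service maps a Global Service name to how long that service
    has seen the centre as disconnected (timedelta(0) if connected).
    """
    durations = list(outage_by_service.values())
    if not durations:
        return 0
    if all(d >= LONGER_TIME for d in durations):
        return 3  # all services agree for a longer time: raise a ticket
    if all(d >= SOME_TIME for d in durations):
        return 2  # all services agree for some time
    if any(d >= SOME_TIME for d in durations):
        return 1  # only some services see the outage
    return 0
```

For example, `alert_level({"gb-a": timedelta(minutes=15), "gb-b": timedelta(0)})` gives level 1, while the same call with both services above one hour gives level 3.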

3

  • Global Cache Metrics https://github.com/wmo-im/wis2-metric-hierarchy/blob/main/metric-hierarchy/gc.csv

    • The DWD GC is currently the only one providing the metrics

    • centre-id and hostname (scenario: the connection comes from another Global Cache rather than from a data centre)

  • Discussion

    • (Jeremy) GB and GC both validate the notification message.

    • (Maaike) Is there any metric for the case where a GC fails to connect to a GB after several tries?

      • Decision: no, but sgc metrics may include this. Principle: keep the number of GB and GC metrics to a minimum.

    • (Anna) Should the GC record statistics on data cached versus not cached?

      • (Rémy) Such statistics are not useful for WIS2 operations, but sensor centres can collect them.

      • added two metrics: gc_cache_override_total, gc_integrity_failed_total

      • Action: Anna to raise a ticket to WIS2 Guide to update the WIS2 metrics

    • (Jeremy) Question: should there be GC metrics recording who downloads the data?

    • (Rémy) to test if fake messages are sent to the system (stress test in May)

    • OpenMetrics endpoint: GC to run Prometheus

      • Currently, DWD doesn’t have all the metrics open.

4

  • Global Discovery Catalogue Metrics

(to be discussed next time)

5

  • GM metrics

(Kai) to create a gm.csv, including wmo_wis2_gm_metrics_server_last_download, wmo_wis2_gm_metrics_server_status

6

centre_id for all Global Services

(Jeremy) action: to come up with a Global Cache name for the joint US-UK GC.

✅ Action items

  • Hassan to send a notification to GS operators asking them to provide the metrics by 1 May

Next meeting