Date

13:00-15:00 UTC

Invited Participants

  • Jeremy TANDY (ET-W2IT Chair)

  • Rémy GIRAUD (SC-IMT Chair)

  • Hyumin EOM (KMA)

  • Masato FUJIMOTO (JMA)

  • Kari Sheets (NOAA)

  • Steve Olson (NOAA)

  • Max Marno (Synoptic)

  • José Mauro (INMET)

  • Tom Kralidis (ECCC)

  • Kai Wirt-Thorsten (DWD)

  • Elena Arenskotter (DWD)

  • Saad Mohammed Almajnooni

  • Majed Mahjoub (NCM)

  • Chems eddine ELGARRAI (DGM)

  • Lei XUE (CMA)

  • Wenjing GU (CMA)

  • Xinqiang HAN (CMA)

WMO Secretariat

  • Enrico Fucile

  • Hassan Haddouch

  • Xiaoxia Chen

  • David Inglis Berry

  • Anna Milan

  • Timo Proescholdt

Apologies

  • Ping GUO (CMA)

  • Yoritsugi YUGE (JMA)

Meeting Note

  1. Assessment of metrics implementation in Global Services

  2. Agreement on consistent metrics behavior in Global Services

  3. Validation of discovery metadata at Global Discovery Catalogue; implementation of Global Broker “discard”

WIS2 Global Services metrics

Commentary based on analysis performed by WMO Secretariat (Maaike), 1-hour period on 20-Feb-2025

Key outcomes from discussion: agree what the correct reporting is for each metric

  • Connected: 0 = not connected, 1 = connected; report 0 if you cannot connect (null / no data if you’ve never tried to connect)

  • Numbers: _after_ de-duplication, including errors or not?

This is needed to provide baseline information to assess whether Global Service behaviour is consistent.

Metrics can be used to understand WIS2 performance in a “stepwise” fashion - i.e., start with “connection”, then look at the numbers of messages/data-objects.
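The agreed reporting for the connected flag (0, 1, or no data) can be sketched as below. This is only an illustration of the rule as discussed; the function name and its two boolean inputs are hypothetical and not taken from any Global Service implementation.

```python
from typing import Optional


def connected_flag(attempted: bool, connected: bool) -> Optional[int]:
    """Value to report for the connected flag of one remote centre.

    Returns None when no connection has ever been attempted, meaning the
    metric should not be emitted at all (null / no data).
    """
    if not attempted:
        return None  # never tried: report nothing
    return 1 if connected else 0  # tried: 1 = connected, 0 = cannot connect


if __name__ == "__main__":
    print(connected_flag(attempted=False, connected=False))
    print(connected_flag(attempted=True, connected=False))
    print(connected_flag(attempted=True, connected=True))
```

Keeping "never tried" distinct from "tried and failed" is what allows a missing value to be read as "no subscription configured" rather than as an outage.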

(Rémy) It is good to have a sensor centre. Does the GM have a broker that the GB can subscribe to for notifications?

(Action) GM (Morocco) and GM (China) to share the GM broker endpoint with the Secretariat.

Global Monitor:

Looking at the metrics available from ma-meteomaroc-global-monitor there are gaps in the metrics from 18h-20h UTC, 19-Feb-2025.

This was due to an outage of the Global Monitor, which happens …

But: we’re not currently monitoring the Global Monitors so we cannot easily identify if “gaps” are due to an upstream Global Service not reporting data, or whether the GM was offline and didn’t scrape the metrics every 15 seconds.

** Recommendation: we monitor the Global Monitors

… should be able to see this from wmo_wis2_gb_connected_flag - but neither ma-meteomaroc-global-monitor nor cn-cma-global-monitor appear in the metrics for Global Brokers (below).

… do we need a dashboard indicating “Global Services available now” (red/green traffic light) etc. - based on connection to HTTP or MQTT end-points
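One possible way to tell a Global Monitor outage from an upstream Global Service not reporting is to look for gaps shared by every scraped series. The sketch below assumes we have, per series, a sorted list of sample timestamps in seconds; the function names, the 15-second interval default and the tolerance factor are illustrative assumptions, not an agreed method.

```python
from typing import Dict, List, Sequence, Tuple

Gap = Tuple[float, float]


def find_gaps(timestamps: Sequence[float], interval: float = 15.0,
              tolerance: float = 2.0) -> List[Gap]:
    """Return (start, end) pairs where consecutive samples are further
    apart than interval * tolerance seconds."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval * tolerance:
            gaps.append((prev, cur))
    return gaps


def shared_gaps(series: Dict[str, Sequence[float]]) -> List[Gap]:
    """Intervals during which every series is missing samples.

    A gap shared by all series suggests the monitor itself was offline;
    a gap in only some series points at the upstream service instead.
    """
    per_series = [find_gaps(ts) for ts in series.values()]
    if not per_series:
        return []
    shared = per_series[0]
    for gaps in per_series[1:]:
        # Keep only the overlapping part of each pair of gaps.
        shared = [(max(a0, b0), min(a1, b1))
                  for a0, a1 in shared
                  for b0, b1 in gaps
                  if max(a0, b0) < min(a1, b1)]
    return shared


if __name__ == "__main__":
    # Made-up timestamps for two hypothetical series.
    demo = {"series-a": [0, 15, 30, 120, 135],
            "series-b": [5, 20, 35, 125, 140]}
    print(shared_gaps(demo))
```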

Global Broker metrics:

gb_metrics_analysis.xlsx

Nomenclature:

  • BR-GB = br-inmet-global-broker

  • CN-GB = cn-cma-global-broker

  • FR-GB = fr-meteofrance-global-broker

  • US-GB = us-noaa-global-broker

Sheet #1: wmo_wis2_gb_connected_flag

Values are the “highest” value recorded during the hour (i.e., if “1” was recorded at any point during the hour, the value is “1”)

First challenge: getting consistent metrics on _connections_

Important so that we can trigger warnings/alerts from Global Monitors - if multiple GBs cannot connect to a WIS2 Node (or Global Service) this increases the likelihood that the centre is offline rather than there being a point-to-point connection issue.
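The hourly "highest value" roll-up and the idea of comparing what several GBs report for the same centre can be sketched as follows. The function names, the status strings and the minimum of two reporting GBs are assumptions made for illustration; the flag values in the usage lines are made up and not taken from the spreadsheet.

```python
from typing import Dict, Iterable, Optional


def hourly_flag(samples: Iterable[Optional[int]]) -> Optional[int]:
    """Highest flag value seen in the hour; None if nothing was reported."""
    present = [s for s in samples if s is not None]
    return max(present) if present else None


def classify_centre(flags: Dict[str, Optional[int]],
                    min_reports: int = 2) -> str:
    """Combine the hourly flags that each GB reports for one centre."""
    reported = {gb: f for gb, f in flags.items() if f is not None}
    if len(reported) < min_reports:
        return "insufficient data"  # too few GBs report to draw a conclusion
    failing = [gb for gb, f in reported.items() if f == 0]
    if not failing:
        return "ok"
    if len(failing) == len(reported):
        return "centre likely offline"  # every reporting GB fails to connect
    return "possible point-to-point issue"  # only some GBs fail


if __name__ == "__main__":
    print(hourly_flag([0, 0, 1, 0]))
    print(classify_centre({"BR-GB": 0, "CN-GB": 0, "FR-GB": 0, "US-GB": None}))
    print(classify_centre({"BR-GB": 1, "CN-GB": 0, "FR-GB": 1, "US-GB": 1}))
```

A GB that reports nothing for a centre is excluded from the comparison here, which is why consistent reporting of "0" by all GBs matters for this kind of alerting.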

  1. [lines 6, 8, 92] US-GB does not report a metric on WIS2 Nodes (au-bom, bf-anam, za-weathersa) that are known to be offline - all other GBs report “0” (not connected)

Action: Steve to follow up on this.

  1. [lines 10, 15, etc.] CN-GB does not appear to report metrics on other Global Services

Action: Lei to follow up on this.

  1. [lines 17, 18, 23, 48, 62] CN-GB cannot connect to WIS2 Nodes from Chile (cl-meteochile), Cameroon (cm-meteocameroon), Cuba (cu-insmet), Italy (it-meteoam), Morocco (ma-marocmeteo) - confirm?

Action: the Secretariat to resend the IP addresses of GB (China); GB (China) needs to check the username and password for these WIS2 Nodes, e.g. Morocco.

Action:

  1. [line 35] Only BR-GB appears to be subscribing to FR-GB (specifically, only BR-GB is reporting being connected) - more connections expected (albeit CN-GB probably not reporting its connection)

  1. [line 36] US-GB is using the wrong centre-id for FR-GB (which probably also explains why it looks like there are so few connections to FR-GB - see above)

  2. [line 41, 47] Only FR-GB connected to WIS2 Nodes hk-hko-swic and ir-irimo - recommended to have at least 2 subscriptions

Action (hk-hko-swic): Secretariat to check the correct IP address of GB (Brazil) and share it with the HKO colleague, and also to check with the Iran colleague about their access control.

  1. [lines 46, 47, 48] US-GB is not reporting connection to EUMETSAT, Iran, and Italy - is this because US-GB never attempts to establish a connection?

Action: Secretariat to share the credentials with GB (US); Italy to check that GB (US) is on their whitelist.

  1. [line 53] US-GB reports connection to ke-kmd which isn’t in the WIS2 Registry - what’s happening here?

  2. [line 59] Only US-GB reports connection to WIS2 Node from Kazakhstan (kz-kazhydromet) - confirm?

  3. [lines 75, 76] Confusion on centre-id for Sint Maarten (sx-met or sx-metservice)

Sheet #2: wmo_wis2_gb_msg_received_total

Values are the _increase_ in numbers of messages received during the hour
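Since the _total metrics are counters, the hourly increase has to cope with a counter restarting from zero (for example after a service restart). A simplified sketch of that calculation is below; it is an assumption about how such values could be derived, the function name is hypothetical, and it does not reproduce the window-boundary extrapolation that Prometheus applies in its own increase() function.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase of a monotonically increasing counter over a window.

    A drop between two consecutive samples is treated as a counter reset,
    in which case the new value counts in full.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur  # counter restarted from zero
    return total


if __name__ == "__main__":
    print(counter_increase([10, 15, 20]))
    print(counter_increase([10, 15, 3, 8]))
```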

Second challenge: diagnosing whether each GB is handling (roughly) the same number of messages

Important so that we can see if messages are getting lost, which would mean that subscribers to such a GB would be under-served
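A simple way to spot a GB whose message count is out of line with the others is to compare each count against the median. The sketch below is one possible check, not an agreed KPI; the function name, the factor of 2 and the counts in the usage line are all made up for illustration.

```python
from statistics import median
from typing import Dict


def outlier_brokers(counts: Dict[str, float],
                    factor: float = 2.0) -> Dict[str, float]:
    """Return brokers whose count differs from the median by more than
    `factor`, mapped to their ratio against the median."""
    positive = [c for c in counts.values() if c > 0]
    if not positive:
        return {}
    ref = median(positive)
    return {gb: c / ref
            for gb, c in counts.items()
            if c > ref * factor or c < ref / factor}


if __name__ == "__main__":
    # Hypothetical hourly counts for four brokers.
    demo = {"gb-a": 1000, "gb-b": 1100, "gb-c": 20000, "gb-d": 900}
    print(outlier_brokers(demo))
```

Using the median rather than the mean keeps one very large count from hiding itself by dragging the reference value upwards.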

  1. [summary] FR-GB and BR-GB report roughly similar numbers (they’re running the same software!), CN-GB reports roughly 20x more messages, 3000-3500x more messages 

Action: Steve and Marc to follow up on this 

  1. [line 8] CN-GB reports receiving 25k messages from Burkina Faso while that WIS2 Node is known to be offline (and CN-GB reports no connection)

Action: Lei to follow up on this

Decision: no need to run the functional test. Select 3 or 4 centres and monitor whether all 4 GBs behave consistently for them. Each GB to test the metrics. GB (France) archives two weeks of metrics. For the GM, is two weeks enough? (Hassan) On how to report on the Global Services: we are considering using the Jira system to provide statistics of GS performance.

How long does GM (Morocco) retain the metrics and how much storage do they take? For GM (China), the retention period for metrics is one month. (Action) Lei to report the size of the retained metrics. For GM (Morocco), the retention period is 15 days, at 50 GB a week; the scraping frequency is 10 seconds. (Action) Jeremy and Rémy to discuss the reporting mechanism.

(Rémy) We need to collectively define the KPIs of Global Services performance to be presented at INFCOM-4.

Sheet #3: wmo_wis2_gb_msg_no_metadata_tot

(Similar to sheet #2)

[lines 14, 17, 24, 25, etc.] CN-GB is reporting no metadata when the other GBs are not - is the metadata validation only enabled for CN-GB?

[line 82] CN-GB reports no missing metadata from UK, BR-GB and FR-GB report errors, US-GB doesn’t report

Global Cache metrics

gc_metrics_analysis.xlsx

Nomenclature:

  • CN-GC = cn-cma-global-cache

  • UK/USA-GC = data-metoffice-noaa-global-cache

  • DE-GC = de-dwd-global-cache

  • JP-GC = jp-jma-global-cache

  • KR-GC = kr-kma-global-cache

UK/USA-GC is still reporting metrics for the 100+ test nodes used in the Global Services testing - these are excluded from the analysis for clarity. But - these “old” centre-ids must be removed.

Update: Max fixed it.

Sheet #1: wmo_wis2_gb_connected_flag

  1. [line 3, 4] JP-GC reports no metrics for Antigua or ai-metservice (?) even though a small number of messages are sent

  1. [line 5] DE-GC reports no metric for Argentina even though messages are being sent according to GB metrics; from looking at Grafana DE-GC appears to have gaps in provision of metrics 

  1. [lines 17, 28, etc.] DE-GC and JP-GC often don’t report a metric (no value available, e.g., Cameroon, Guinea) whereas other GCs are able to connect; GB metrics indicate that Cameroon and Guinea are connected but (excepting US-GB) not sending any messages. How are CN-GC, UK/USA-GC and KR-GC reporting “connected” when there probably wasn’t anything to download?

  2. [line 21] UK/USA-GC can’t connect to Cyprus - confirm?

Note: “last-download” timestamp - if nothing has ever been downloaded for a given data server, the value will be null (not reported)

… Generally, metrics are only set once something has been tried. They are (mostly?) not initialised. CN-GC appears to initialise _download_total to zero even when there’s been no connection. Confirm?

Decision: we need to agree on consistent metrics, given that the DWD and UK/USA implementations currently differ.

Action: Jeremy, Rémy, Kai and Max to have a meeting this week to discuss the GC metrics.

Sheet #2: wmo_wis2_gc_download_total

  1. [summary] Generally, CN-GC, UK/USA-GC and KR-GC tend to agree on numbers; DE-GC and JP-GC appear different.

  2. [line 15] ca-eccc-msc-global-discovery-catalogue published ~30 cacheable data objects in the hour - what’s being published? (Metadata tarballs are once per day)

  3. [line 20] GCs are not reporting a metric for cn-cma-global-discovery-catalogue - suggesting there has never been a download from the CN-GDC; is CN-GDC providing daily tarballs of the metadata records?

Action: pause for discussion

Sheet #3: wmo_wis2_gc_download_errors_total

  1. [line 14] GC downloads from ca-eccc-msc: CN-GC and UK/USA-GC report ~200 errors, KR-GC reports 13k errors, DE-GC reports 172k errors, JP-GC reports 550k errors … what is happening?

  2. [line 24] CN-GC, UK/USA-GC and KR-GC all report broadly consistent total-download from de-dwd-gts-to-wis2 (~500k messages), CN-GC and UK/USA-GC report ~300 errors, but KR-GC reports 147k errors … does the _total_downloads include _error_downloads?

Action: pause for discussion

Global Discovery Catalogue metrics

gdc_metrics_analysis.xlsx

Global Discovery Catalogues (GDC) also appear to suffer from inconsistent metrics implementation.

But GDCs are not part of the data-exchange operations (mostly, the metrics are about metadata quality)

… so recommend that we prioritise getting metrics for GB and GC consistent first.

Topic 3:

(Rémy) Currently, the metadata properties are not mandatory. During the ET-WISOP kick-off meeting, we announced that metadata validation will be enabled on 1st September. Two different datasets having one set of metadata is enough. Before 2027, use the channel in the topic hierarchy.

(Kai) We should enforce metadata in WIS2 exchange. The problem is that if something is wrong in a GDC, it will break data exchange. The Canada GDC and the Germany GDC enforce the check. The GDC becomes a single point of failure. On the GM, data from DWD is missing even though we did not update any metadata.

(Jeremy) On the first part: because the GB will need to look through the GDC, if a metadata record is broken (or if the metadata fails validation by a GDC), the channel will be parsed down by the GB, and the data will then be missing.

(Rémy) We cannot use the metadata id on 1st September 2025. There is a risk that one GDC having problems may impact data sharing. To improve reliability, it is better to be lenient than too strict.

Actions

  1. GM (Morocco) and GM (China) to share the GM broker endpoint with the Secretariat.

  2. Kari and Steve to follow up to ensure more consistent metrics reporting from GB (US).

  3. All to go through the list shared by email in order to arrive at consistent metrics.

Next meeting

10 March
