UI - display no of passed/skipped/failed Nemesis runs #597

mykaul · 2025-02-26T12:41:12Z

This is what I see today - pass or fail per run:

It'd be much more useful to get a clearer picture from the overview: how many Nemesis runs passed/failed/skipped. That would let me weigh the importance of investigating a run.

Moreover, if I could see the 1st Nemesis run, that would be even more helpful: if I see it's the same Nemesis, I may decide to de-prioritize the investigation.

fruch · 2025-02-26T16:09:40Z

I'm not sure I understand what you want here.

This panel was introduced for the sake of selecting a set of jobs for further investigation.

I don't see the point of putting more details into it.

mykaul · 2025-02-26T16:18:31Z

> I'm not sure I understand what you want here.
>
> This panel was introduced for the sake of selecting a set of jobs for further investigation.
>
> I don't see the point of putting more details into it.

Since we are not doing a great job of investigating all failures, we need to prioritize. This (along with runtime information, btw) would help me do that:

I'd start with all failed executions that took 15 minutes or less - an infra / SCT issue most likely.
I'd continue with those that have a single failure.
I'd skip those that look similar (see my screenshot above: 599-603 all have the exact same failure).
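
The prioritization described above could be sketched roughly as follows. This is only an illustration: the `Run` record, its field names, and the Nemesis names in the sample data are hypothetical and not taken from Argus or SCT, and matching on the first failure is a simplifying stand-in for "looks similar".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    # Hypothetical record; these field names are assumptions, not Argus's schema.
    run_id: int
    status: str                  # "passed" or "failed"
    duration_minutes: float
    failed_nemeses: int
    first_failure: Optional[str] = None  # e.g. name of the first failed Nemesis


def triage_order(runs: list[Run], quick_minutes: float = 15.0) -> list[Run]:
    """Order failed runs for investigation, dropping look-alike failures."""
    seen: set[str] = set()
    candidates: list[Run] = []
    for run in sorted(runs, key=lambda r: r.run_id):
        if run.status != "failed":
            continue
        # Skip runs whose first failure matches one that is already queued.
        if run.first_failure is not None:
            if run.first_failure in seen:
                continue
            seen.add(run.first_failure)
        candidates.append(run)

    def bucket(run: Run) -> int:
        if run.duration_minutes <= quick_minutes:
            return 0  # short failed runs first: likely infra / SCT issues
        if run.failed_nemeses == 1:
            return 1  # then runs with a single failure
        return 2      # everything else last

    return sorted(candidates, key=lambda r: (bucket(r), r.run_id))


sample = [
    Run(598, "failed", 10, 0),
    Run(599, "failed", 120, 1, "NemesisA"),
    Run(600, "failed", 130, 1, "NemesisA"),  # same first failure as 599
    Run(604, "failed", 200, 3, "NemesisB"),
    Run(605, "passed", 240, 0),
    Run(606, "failed", 300, 1, "NemesisC"),
]
print([r.run_id for r in triage_order(sample)])  # [598, 599, 606, 604]
```

The 15-minute threshold is exposed as `quick_minutes` so it can be tuned; none of this depends on how the UI ends up presenting the data.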

fruch · 2025-02-26T20:57:50Z

>> I'm not sure I understand what you want here.
>>
>> This panel was introduced for the sake of selecting a set of jobs for further investigation.
>>
>> I don't see the point of putting more details into it.
>
> Since we are not doing a great job of investigating all failures, we need to prioritize. This (along with runtime information, btw) would help me do that:
>
> I'd start with all failed executions that took 15 minutes or less - an infra / SCT issue most likely.

You asked for that in a different issue; I don't see how it's relevant to what is asked in this one.

> I'd continue with those that have a single failure.

In most cases you'll have more than one error event; I'm not sure how this metric helps with triage.

> I'd skip those that look similar (see my screenshot above: 599-603 all have the exact same failure).

As for "look similar": we are working on something that can help classify events as having happened in other runs. Once it's operational, we might be able to show indications of it.

As for where this kind of thing should be shown, I'm not sure; maybe a widget with a table would be a better fit for this kind of requirement.

mykaul · 2025-02-27T07:44:13Z

>>> I'm not sure I understand what you want here.
>>> This panel was introduced for the sake of selecting a set of jobs for further investigation.
>>> I don't see the point of putting more details into it.
>>
>> Since we are not doing a great job of investigating all failures, we need to prioritize. This (along with runtime information, btw) would help me do that:
>>
>> I'd start with all failed executions that took 15 minutes or less - an infra / SCT issue most likely.
>
> You asked for that in a different issue; I don't see how it's relevant to what is asked in this one.

I did. It did not happen (yet?), so I'm asking for alternatives, which are not contradictory, btw.

>> I'd continue with those that have a single failure.
>
> In most cases you'll have more than one error event; I'm not sure how this metric helps with triage.

More than a single Nemesis failure?
Then show me the first. Or the last. Or just a number if it's >1. I'll know that there's more work in analyzing that one than the others.

>> I'd skip those that look similar (see my screenshot above: 599-603 all have the exact same failure).
>
> As for "look similar": we are working on something that can help classify events as having happened in other runs. Once it's operational, we might be able to show indications of it.

That's great. I was hoping we could have AI help us here.

> As for where this kind of thing should be shown, I'm not sure; maybe a widget with a table would be a better fit for this kind of requirement.

Yes, I'm open to better UI suggestions.
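
To make the "show me the first, or just a number if it's >1" idea concrete, here is a minimal sketch. It assumes each run exposes its Nemesis results as `(name, status)` pairs in execution order; that shape, the function name, and the wording of the summary strings are all hypothetical, not an existing Argus API.

```python
from __future__ import annotations


def failure_summary(nemeses: list[tuple[str, str]]) -> str:
    """Summarize failed Nemeses for one run.

    `nemeses` holds (name, status) pairs in execution order, where status is
    "passed", "failed" or "skipped" (an assumed data shape).
    """
    failed = [name for name, status in nemeses if status == "failed"]
    if not failed:
        return "no failed Nemesis"
    if len(failed) == 1:
        return f"failed: {failed[0]}"
    # More than one failure: show the first and how many more there are.
    return f"first failed: {failed[0]} (+{len(failed) - 1} more)"
```

A run whose summary ends in "(+N more)" is one that will take more work to analyze than a run with a single failure.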

k0machi · 2025-02-27T11:47:55Z

I could experiment with adding this information both to the cards and to the selector. We could fit a simple counter: v 15 / x 3 for nemeses, and a duration field: took 15 minutes. I think it should look nice and not clutter things too much.
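
A rough sketch of how such labels might be built, purely as an illustration: the function names are made up, the input is assumed to be a flat list of per-Nemesis status strings plus a duration in seconds, and the `- N` marker for skipped Nemeses is an assumption added to cover the "skipped" case from the issue title.

```python
from __future__ import annotations

from collections import Counter


def nemesis_counter(statuses: list[str]) -> str:
    """Build a compact counter such as 'v 15 / x 3' from per-Nemesis statuses."""
    counts = Counter(statuses)
    label = f"v {counts['passed']} / x {counts['failed']}"
    # Only show skipped Nemeses when there are any, to keep the label short.
    if counts["skipped"]:
        label += f" / - {counts['skipped']}"
    return label


def duration_label(seconds: float) -> str:
    """Render a run duration as 'took N minutes', rounded to whole minutes."""
    minutes = round(seconds / 60)
    unit = "minute" if minutes == 1 else "minutes"
    return f"took {minutes} {unit}"


print(nemesis_counter(["passed"] * 15 + ["failed"] * 3))  # v 15 / x 3
print(duration_label(15 * 60))                             # took 15 minutes
```

Keeping the formatting in small pure functions like these would make it easy to reuse the same labels on both the cards and the selector.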

mykaul added the Argus and enhancement (New feature or request) labels on Feb 26, 2025
