mvcc: negative watcher count gauges #19577

Open
4 tasks done
kjgorman opened this issue Mar 12, 2025 · 3 comments

Comments

@kjgorman

Bug report criteria

What happened?

We observe a slow negative trend on the etcd_debugging_mvcc_watcher_total gauge—implying there are unpaired increment/decrement operations:

[Image: graph of the etcd_debugging_mvcc_watcher_total gauge trending below zero]

What did you expect to happen?

Logically the active watcher gauge should never be negative.

How can we reproduce it (as minimally and precisely as possible)?

I think this issue is very closely related to this PR, which previously attempted to address negative watch counts caused by double decrements; the case here is specifically the one where wa.compacted is true and there is a cancel/close race. Because it is a race condition it is not easy to reproduce reliably, but I believe you can provoke the scenario with a contrived unit test.

Consider, for example, adding this test to mvcc/watcher_test.go:

func TestDoubledDecrementWithCompactedWatcher(t *testing.T) {
	b, _ := betesting.NewDefaultTmpBackend(t)
	s := New(zaptest.NewLogger(t), b, &lease.FakeLessor{}, StoreConfig{})
	defer cleanup(s, b)

	w := s.NewWatchStream()

	wid, err := w.Watch(0, []byte("foo"), nil, 0)
	if err != nil {
		t.Fatalf("failed to watch: %v", err)
	}

	// pretend this is now compacted
	for id, wa := range w.(*watchStream).watchers {
		if wid == id {
			wa.compacted = true
		}
	}

	// race Cancel against Close; the sleep below, together with the delay
	// injected into Cancel (see the diff after the note), lines up the interleaving
	wg := sync.WaitGroup{}
	wg.Add(2)
	go func() {
		w.Cancel(wid)
		wg.Done()
	}()

	// give Cancel a head start so it is already inside its unlocked section
	time.Sleep(time.Millisecond * 50)

	go func() {
		w.Close()
		wg.Done()
	}()

	wg.Wait()

	// fail deliberately so `go test` prints the captured output
	t.Fatalf("print output")
}

Note

I provoke the wa.compacted state by setting it directly here. I'm not sure whether an actual compaction would somehow change this scenario in practice, but just reading the code I struggle to see how a double decrement could occur on any other branch, so I'm surmising this must be the state we're in.

Then, to provoke the race, add a delay in the unlocked portion of watcher.go's Cancel(id WatchID) method:

	ws.mu.Unlock()

	if !ok {
		return ErrWatcherNotExist
	}
+
+	time.Sleep(time.Second * 1)
+
	cancel()

	ws.mu.Lock()
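
With the test and the injected delay in place, the race can be provoked by running the test from the package that holds watcher_test.go (the exact package path depends on the source layout, so ./mvcc/ here is an assumption):

	go test -race -v -run TestDoubledDecrementWithCompactedWatcher ./mvcc/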

What can then be observed is that:

  1. We start the Cancel method
  2. We interleave the Close method with our specifically placed Sleeps...
  3. Close will find a synced watcher, delete it, and decrement the counter in this branch
  4. Cancel wakes up and resumes, and invokes the cancelFunc again
  5. Now we encounter the wa.compacted branch and decrement again

It seems as though the wa.ch == nil case is designed to prevent re-processing a cancelled watcher, and both the if s.synced.delete and if s.unsynced.delete branches are "idempotent", since once the watcher is deleted subsequent invocations return false. However, checking wa.compacted is not idempotent, and because it is placed before the if wa.ch == nil case it can be hit again on a subsequent invocation of cancelWatch.
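
To make the shape of the bug concrete outside of etcd, here is a minimal, self-contained Go sketch using hypothetical names (a plain int stands in for the Prometheus gauge, and the branch ordering mirrors the description above; this is not etcd code):

package main

import (
	"fmt"
	"sync"
)

type watcher struct {
	compacted bool
	ch        chan struct{} // nil once the watcher has been canceled
}

type store struct {
	mu       sync.Mutex
	synced   map[*watcher]struct{}
	watchers int // stands in for the etcd_debugging_mvcc_watcher_total gauge
}

func (s *store) cancelWatch(wa *watcher) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.synced[wa]; ok {
		// idempotent: only the first caller finds and deletes the watcher
		delete(s.synced, wa)
		s.watchers--
	} else if wa.compacted {
		// NOT idempotent: any later caller that reaches this branch decrements again
		s.watchers--
	} else if wa.ch == nil {
		// the "already canceled" guard, but it sits below the compacted check
	}
	wa.ch = nil
}

func main() {
	wa := &watcher{compacted: true, ch: make(chan struct{})}
	s := &store{synced: map[*watcher]struct{}{wa: {}}, watchers: 1}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.cancelWatch(wa) }() // plays the role of Cancel
	go func() { defer wg.Done(); s.cancelWatch(wa) }() // plays the role of Close
	wg.Wait()

	fmt.Println("watcher count:", s.watchers) // always -1: two decrements for one watcher
}

Whichever caller wins the lock takes the synced-delete branch; the loser no longer finds the watcher in the set, falls through to the compacted branch, and decrements a second time.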

Possibly a fix would be to move the "has this already been cancelled" check higher, e.g.:

                } else if s.synced.delete(wa) {
                        watcherGauge.Dec()
                        break
-               } else if wa.compacted {
-                       watcherGauge.Dec()
-                       break
                } else if wa.ch == nil {
                        // already canceled (e.g., cancel/close race)
                        break
+               } else if wa.compacted {
+                       watcherGauge.Dec()
+                       break
                }

This would avoid the possibility of performing the decrement twice.
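
Applying the same reordering to the toy cancelWatch from the sketch above illustrates why this helps: the losing caller now hits the already-canceled check before the compacted one and leaves the counter alone (again a hypothetical sketch, not the actual patch in the linked PR):

func (s *store) cancelWatchReordered(wa *watcher) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.synced[wa]; ok {
		delete(s.synced, wa)
		s.watchers--
	} else if wa.ch == nil {
		// already canceled (e.g., cancel/close race): nothing left to undo
	} else if wa.compacted {
		// reachable at most once, while wa.ch is still non-nil
		s.watchers--
	}
	wa.ch = nil
}

With this ordering the toy counter ends at 0 regardless of how the two callers interleave.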

Anything else we need to know?

No response

Etcd version (please run commands below)

/ # /usr/local/bin/etcd --version
etcd Version: 3.5.17
Git SHA: 507c0de87
Go Version: go1.23.4
Go OS/Arch: linux/amd64

/ # /usr/local/bin/etcdctl version
etcdctl version: 3.5.17
API version: 3.5

Etcd configuration (command line flags or environment variables)

No response

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

@ivanvc
Member

ivanvc commented Mar 13, 2025

Discussed during our bi-weekly triage meeting. Thanks, @kjgorman, for raising this issue; it looks valid. Given that you have invested time in tracing the possible culprit and have written a test case, would you be interested in opening a pull request to address it? Thanks.

@kjgorman
Author

Thanks for the triage @ivanvc — I've raised a PR with the proposed tweak here: #19600

@ivanvc
Member

ivanvc commented Mar 14, 2025

/assign @kjgorman
