Fix CPU and database spinning when retrying sending events to servers whilst at the same time purging those events. (#18499)

Fixes: #18491

Fix hot-looping due to skipped PDUs if there is still no progress to be made.
This could bite if the event was purged after being skipped during catch-up.

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
2025-12-05 01:10:13 +00:00 · 2025-07-15 11:01:41 +00:00
parent 945e22303c
commit 97d2738eef
2 changed files with 14 additions and 1 deletions
--- a/changelog.d/18499.bugfix
+++ b/changelog.d/18499.bugfix
@@ -0,0 +1 @@
+Fix CPU and database spinning when retrying sending events to servers whilst at the same time purging those events.
--- a/synapse/federation/sender/per_destination_queue.py
+++ b/synapse/federation/sender/per_destination_queue.py
@@ -129,6 +129,8 @@ class PerDestinationQueue:

        # The stream_ordering of the most recent PDU that was discarded due to
        # being in catch-up mode.
+        # Can be set to zero if no PDU has been discarded since the last time
+        # we queried for new PDUs during catch-up.
        self._catchup_last_skipped: int = 0

        # Cache of the last successfully-transmitted stream ordering for this
@@ -462,8 +464,18 @@ class PerDestinationQueue:
                # of a race condition, so we check that no new events have been
                # skipped due to us being in catch-up mode

-                if self._catchup_last_skipped > last_successful_stream_ordering:
+                if (
+                    self._catchup_last_skipped != 0
+                    and self._catchup_last_skipped > last_successful_stream_ordering
+                ):
                    # another event has been skipped because we were in catch-up mode
+                    # As an exception to this case: we can hit this branch if the
+                    # room has been purged whilst we have been looping.
+                    # In that case we avoid hot-looping by resetting the 'catch-up skipped
+                    # PDU' flag.
+                    # Then if there is still no progress to be made at the next iteration,
+                    # we can exit catch-up mode.
+                    self._catchup_last_skipped = 0
                    continue

                # we are done catching up!
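The effect of the second hunk can be illustrated with a small, self-contained Python model. This is a hypothetical sketch, not Synapse code: the class `CatchUpLoopModel`, its `skip_pdu` and `run_catch_up` methods, the in-memory `stored_pdus` list and the `max_iterations` cap are all illustrative stand-ins for the real database query and sending path. Only the `_catchup_last_skipped` bookkeeping mirrors the diff. A purged PDU is modelled as one that was skipped but is absent from `stored_pdus`, so `last_successful_stream_ordering` can never advance past it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatchUpLoopModel:
    """Hypothetical, simplified model of the per-destination catch-up loop.

    Only the `_catchup_last_skipped` bookkeeping follows the diff; the database
    query and transaction sending are replaced by an in-memory list.
    """

    # Stream orderings of PDUs still present in the (simulated) database.
    # A purged PDU is simply absent from this list.
    stored_pdus: List[int] = field(default_factory=list)
    last_successful_stream_ordering: int = 0
    # 0 means "no PDU skipped since we last queried", as in the patched comment.
    _catchup_last_skipped: int = 0
    iterations: int = 0

    def skip_pdu(self, stream_ordering: int) -> None:
        """Record a PDU that arrived while catching up and was not sent directly."""
        self._catchup_last_skipped = stream_ordering

    def run_catch_up(self, apply_fix: bool, max_iterations: int = 100) -> bool:
        """Return True if catch-up finishes, False if the iteration cap is hit.

        The cap is an assumption of this model, added only so that the
        unpatched variant terminates when demonstrated.
        """
        while self.iterations < max_iterations:
            self.iterations += 1
            # Stand-in for querying the database for PDUs not yet sent.
            pending = [
                so
                for so in self.stored_pdus
                if so > self.last_successful_stream_ordering
            ]
            if pending:
                # Pretend every pending PDU was transmitted successfully.
                self.last_successful_stream_ordering = max(pending)
                continue

            # Nothing left to send: check whether a PDU was skipped meanwhile.
            if apply_fix:
                if (
                    self._catchup_last_skipped != 0
                    and self._catchup_last_skipped
                    > self.last_successful_stream_ordering
                ):
                    # Reset the flag so an unreachable (purged) PDU cannot keep
                    # us looping; the next pass re-queries and can then exit.
                    self._catchup_last_skipped = 0
                    continue
            elif self._catchup_last_skipped > self.last_successful_stream_ordering:
                # Unpatched behaviour: a purged PDU keeps this true forever.
                continue

            return True  # we are done catching up
        return False
```

For example, with `stored_pdus=[1, 2, 3]` and `skip_pdu(4)` (PDU 4 skipped and then purged), the unpatched loop spins until the cap. The patched loop sends PDUs 1 to 3 on the first pass, resets the flag on the second and returns `True` on the third. In this model the reset costs at most one extra query; a PDU skipped after the reset would set the flag again and be picked up by the next query.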