Requires https://github.com/matrix-org/gomatrixserverlib/pull/376
This has numerous upsides:
- Less type casting to `*Event` is required.
- Making Dendrite work with `PDU` interfaces means we can swap out Event
impls more easily.
- Tests which represent weird event shapes are easier to write.
Part of a series of refactors on GMSL.
Replaced with types.HeaderedEvent _for now_. In reality we want to move
them all to gmsl.Event and only use HeaderedEvent when we _need_ to
bundle the version/event ID with the event (seriailsation boundaries,
and even then only when we don't have the room version).
Requires https://github.com/matrix-org/gomatrixserverlib/pull/373
This should hopefully fix an entire class of problems where components
downstream from the roomserver (i.e. the sync API) could just lose a
whole bunch of state after a rewrite operation like a federated join.
The root of the bug is that we set `RewritesState` in the output event
which instructs downstream components to purge their copy of any room
state, but then didn't send the entire state snapshot in
`adds_state_event_ids` so the downstream state ends up being incomplete
as a result.
* Check state before event
* Tweaks
* Refactor a bit, include in output events
* Don't waste time if soft failed either
* Tweak control flow, comments, use GMSL history visibility type
Squashed commit of the following:
commit 2bd0daf4d61376d2dd56628eaff267b0bc63e116
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Wed Jun 1 09:55:54 2022 +0100
Revert resolving old extremities as well as new
This may no longer be needed with the new state fixes and probably just burns more CPU time than is strictly necessary.
* Fix bugs related to state resolution
* Clean up `resolve-state`
* Don't panic when entries can't be found
* Ensure we have state entries for the auth events
* Revert "Ensure we have state entries for the auth events"
This reverts commit 9b13b7ed37.
* Revert "Revert "Ensure we have state entries for the auth events""
This reverts commit d86db197e3.
* Fix bug
* Try that again
* Update gomatrixserverlib
* Remove recursion from `loadAuthEvents`
* Feed existing state into state res when calculating state from new extremities
* Remove duplicates
* Fix bug
* Sort and unique
* Update to matrix-org/gomatrixserverlib#308
* Trim the slice properly
* Update gomatrixserverlib again
* Update to matrix-org/gomatrixserverlib#308
* Don't send `adds_state_events` in roomserver output events anymore
* Set `omitempty` on some output fields that aren't always set
* Add `AddsState` helper function
* No-op if no added state event IDs
* Revert "No-op if no added state event IDs"
This reverts commit 71a0ef3df1.
* Revert "Add `AddsState` helper function"
This reverts commit c9fbe45475.
* Only add events to `add_state_events` that haven't already been sent to the roomserver output before
* Filter on event NIDs instead, hopefully bring joy to SQLite
* UnsentFilter, review comments
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input
* Better transaction management
* Tweak order
* Handle cases where the room does not exist
* Other fixes
* More tweaks
* Fill some gaps
* Fill in the gaps
* good lord it gets worse
* Don't roll back transactions when events rejected
* Pass through errors properly
* Fix bugs
* Fix incorrect error check
* Don't panic on nil txns
* Tweaks
* Hopefully fix panics for good in SQLite this time
* Fix rollback
* Minor bug fixes with latest event updater
* Some review comments
* Revert "Some review comments"
This reverts commit 0caf8cf53e.
* Fix a couple of bugs
* Clearer commit and rollback results
* Remove unnecessary prepares
* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes:
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>
* Light-weight checking of state changes when updating forward extremities
* Only do this for non-state events, since state events will always result in state change at extremities
* Set RewritesState once
* Check if any new state provided
* Obey rewritesState
* Don't nuke everything the sync API knows when purging state
* Fix panic from duplicate insert
* Consistency
* Use HasState
* Remove nolint
* Clean up joined rooms on state rewrite
* Add resolve-state helper
* Tweaks
* Refactor forward extremities, again
* Tweaks
* Minor optimisation
* Make path a bit clearer
* Only process state/membership if forward extremities have changed
* Usage comments in resolve-state
* Fix sqite locking bugs present on sytest
Comments do the explaining.
* Fix deadlock in sqlite mode
Caused by starting a writer whilst within a writer
* Only complain about invalid state deltas for non-overwrite events
* Do not re-process outlier unnecessarily
* Add KindOld
* Don't process latest events/memberships for old events
* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events
* Signal to downstream components if an event has become a forward extremity
* Don't exclude from sync
* Soft-fail checks on KindNew
* Don't run the latest events updater at all for KindOld
* Don't make federation sender change after all
* Kind in federation sender join
* Don't send isForwardExtremity
* Fix syncapi
* Update comments
* Fix SendEventWithState
* Update sytest-whitelist
* Generate old output events
* Sync API consumes old room events
* Update comments
* Capture errors
* Don't request only state key tuples needed for auth (we end up discarding room state this way)
* QueryStateAfterEvent returns all state when no tuples supplied
* Resolve state
* Comments
* Deep forward extremity calculation
* Use updater txn
* Update error
* Update error
* Create previous event references in StoreEvent
* Use latest events updater to row-lock prev events
* Fix unexpected fallthrough
* Fix deadlock
* Don't roll back
* Update comments in calculateLatest
* Don't include events that we can't find references for in the forward extremities
* Add another passing test
* Resolve state after event against current room state when determining latest state changes
* Update sytest-whitelist
* Update sytest-whitelist, blacklist