* Membership updater refactoring
* Pass in membership state
* Use membership check rather than referring to state directly
* Delete irrelevant membership states
* We don't need the leave event after all
* Tweaks
* Put a log entry in that I might stand a chance of finding
* Be less panicky
* Tweak invite handling
* Don't freak if we can't find the event NID
* Use event NID from `types.Event`
* Clean up
* Better invite handling
* Placate the almighty linter
* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand
* Fix the sytest after all (thanks @S7evinK for the spot)
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizees
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
* Add `evacuateUser` endpoint, use it when deactivating accounts
* Populate the API
* Clean up user devices when deactivating
* Include invites, delete pushers
* Check state before event
* Tweaks
* Refactor a bit, include in output events
* Don't waste time if soft failed either
* Tweak control flow, comments, use GMSL history visibility type
* Ensure we check powerlevel/origin before redacting an event
* Add passing test
* Use pl.UserLevel
* Make check more readable, also check for the sender
Squashed commit of the following:
commit 7a1568c716866594af6d0b1d561c58c96de29b20
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 6 15:17:49 2022 +0100
Make errors more useful
commit 64befe7c9a901b00650442171660c2dc4ea575fa
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 6 15:02:40 2022 +0100
Tweak ordering a bit
Squashed commit of the following:
commit 2bd0daf4d61376d2dd56628eaff267b0bc63e116
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Wed Jun 1 09:55:54 2022 +0100
Revert resolving old extremities as well as new
This may no longer be needed with the new state fixes and probably just burns more CPU time than is strictly necessary.
* Fix bugs related to state resolution
* Clean up `resolve-state`
* Don't panic when entries can't be found
* Ensure we have state entries for the auth events
* Revert "Ensure we have state entries for the auth events"
This reverts commit 9b13b7ed37.
* Revert "Revert "Ensure we have state entries for the auth events""
This reverts commit d86db197e3.
* Fix bug
* Try that again
* Update gomatrixserverlib
* Remove recursion from `loadAuthEvents`
* Add `QueryRestrictedJoinAllowed`
* Add `Resident` flag to `QueryRestrictedJoinAllowedResponse`
* Check restricted joins on federation API
* Return `Restricted` to determine if the room was restricted or not
* Populate `AuthorisedVia` properly
* Sign the event on `/send_join`, return it in the `/send_join` response in the `"event"` key
* Kick back joins with invalid authorising user IDs, use event from `"event"` key if returned in `RespSendJoin`
* Use invite helper in `QueryRestrictedJoinAllowed`
* Only use users with the power to invite, change error bubbling a bit
* Placate the almighty linter
One day I will nuke `gocyclo` from orbit and everything in the world will be much better for it.
* Review comments
* Fix flakey sytest 'Local device key changes get to remote servers'
* Debug logs
* Remove internal/test and use /test only
Remove a lot of ancient code too.
* Use FederationRoomserverAPI in more places
* Use more interfaces in federationapi; begin adding regression test
* Linting
* Add regression test
* Unbreak tests
* ALL THE LOGS
* Fix a race condition which could cause events to not be sent to servers
If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.
* Unbreak new tests
* Linting
* Add Room Aliases tests
* Add Rooms table test
* Move StateKeyTuplerSorter to the types package
* Add StateBlock tests
Some optimizations
* Add State Snapshot tests
Some optimization
* Return []int64 and convert to pq.Int64Array for postgres
* Move []types.EventNID back to rows.Next()
* Update tests, rename SelectRoomIDs
* Feed existing state into state res when calculating state from new extremities
* Remove duplicates
* Fix bug
* Sort and unique
* Update to matrix-org/gomatrixserverlib#308
* Trim the slice properly
* Update gomatrixserverlib again
* Update to matrix-org/gomatrixserverlib#308
* Don't ask roomserver for events we already have in federation API
* Check number of events returned is as expected
* Preallocate array
* Improve shape a bit
* Add EventJSONTable tests
* Add eventJSON tests
* Add EventStateKeysTable tests
* Add EventTypesTable tests
* Add Events Table tests
Move variable declaration outside loops
Switch to testify/assert for tests
* Move variable declaration outside loop
* Remove random data
* Fix issue where the EventReferenceSHA256 is not set
* Add more tests
* Revert "Fix issue where the EventReferenceSHA256 is not set"
This reverts commit 8ae34c4e5f.
* Update GMSL
* Add tests for duplicate entries
* Test what happens if we select non-existing NIDs
* Add test for non-existing eventType
* Really update GMSL
* tidy up interfaces
* remove unused GetCreatorIDForAlias
* Add RoomserverUserAPI interface
* Define more interfaces
* Use AppServiceInternalAPI for consistent naming
* clean up federationapi constructor a bit
* Fix monolith in -http mode
* Specify interfaces used by appservice, do half of clientapi
* convert more deps of clientapi to finer-grained interfaces
* Convert mediaapi and rest of clientapi
* Somehow this got missed
* syncapi: use finer-grained interfaces when making the syncapi
* Use specific interfaces for syncapi-roomserver interactions
* Define query access token api for shared http auth code
* Add new endpoint to allow admins to evacuate the local server from the room
* Guard endpoint
* Use right prefix
* Auth API
* More useful return error rather than a panic
* More useful return value again
* Update the path
* Try using inputer instead
* oh provide the config
* Try that again
* Return affected user IDs
* Don't create so many forward extremities
* Add missing `Path` to name
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
* Add response size and requests total to internal handler
* Move MustRegister calls to New* funcs
* Move MustRegister back to init
* Init at some place, minimize changes
* Add test infrastructure code for dendrite unit/integ tests
Start re-enabling some syncapi storage tests in the process.
* Linting
* Add postgres service to unit tests
* dendrite not syncv3
* Skip test which doesn't work
* Linting
* Add `jetstream.PrepareForTests`
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Added /upgrade endpoint
* fix
* Fix lints
* More lint lifex
* Move room upgrading to the roomserver
* Remove extraneous arg
* Fix HTTP API for `PerformUpgrade`
* Reduce number of API calls in `generateInitialEvents`, preserve membership fields
* Refactor `generateInitialEvents` to preserve old state events for all but the essential room setup events
* Handle ban events in the state transfer
* Refactor and comment `createTemporaryPowerLevels`
* Only send two power levels if we needed to override the levels, preserve miscellaneous fields in the create event
* Fix copyrights
* Review comments @S7evinK
* Update sytest whitelist
* Specify empty state keys, use `EventLevel`, remove unnecessary check on state copy
* Add comment to `restrictOldRoomPowerLevels`
* Ensure canonical aliases exist before clearing
* Copy invites as well as bans
* Fix return error on `m.room.tombstone` handling in client API
* Relax checks for well-formedness of join rules, membership event etc
Co-authored-by: Alex Kursell <alex@awk.run>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Roomserver input refactoring — again!
* Ensure the actor runs again
* Preserve consumer after unsubscribe
* Another sprinkling of magic
* Rename `TopicFor` to `Prefixed`
* Recreate the stream if the config is bad
* Check streams too
* Prefix subjects, preserve inboxes
* Recreate if subjects wrong
* Remove stream subject
* Reconstruct properly
* Fix mutex unlock
* Comments
* Fix tests
* Don't drop events
* Review comments
* Separate `queueInputRoomEvents` function
* Re-jig control flow a bit
* Assign room NIDs and state key NIDs outside of database transactions
* In roomserver storage package too
* Don't take a `txn` parameter, clean up SQLite
* Let's try to work out why this endpoint lies
* Try that again
* Fix `QueryPublishedRooms`
* Remove logging
* Remove unnecessary change
* Remove unnecessary change
* Don't send `adds_state_events` in roomserver output events anymore
* Set `omitempty` on some output fields that aren't always set
* Add `AddsState` helper function
* No-op if no added state event IDs
* Revert "No-op if no added state event IDs"
This reverts commit 71a0ef3df1.
* Revert "Add `AddsState` helper function"
This reverts commit c9fbe45475.
* Add canonical support
* Add test
* Check that the send event is actually an m.room.canonical_alias
Check that we got an event from the database
* Update to get correct required events
* Add flakey test to blacklist
Previously this error line would print because we were pulling out all user memberships, but now this is no longer necessary — an event state key that we don't know will no longer get passed to `SelectJoinedUsersSetForRooms` at all.
* Initial cut at fixing up MSC2946 to work with latest spec
* bugfix: send response back correctly
* Initial working version of MSC2946
* msc2946: handle suggested_only; remove custom database
As the MSC doesn't require reverse lookups, we can just pull
the room state and inspect via the roomserver database. To
handle this, expand QueryCurrentState to support wildcards.
Use all this and handle `?suggested_only`.
* Sort child rooms
* msc2946: Make TestClientSpacesSummary pass
* msc2946: allow invited rooms to be spidered
* msc2946: support basic federation requests
* fix up go mod
* Topologically sort with `SendEventWithState`, so that earlier events should satisfy auth for later ones
* Revert "Topologically sort with `SendEventWithState`, so that earlier events should satisfy auth for later ones"
This reverts commit b0cd706012.
* Update to matrix-org/gomatrixserverlib#293
* `Events` no longer returns an error, other tweaks
* Make sure `Events` is sorted for `parsedRespState` too
* Remove error when state keys are missing for user NIDs
There is still an actual bug here somewhere in the membership updater, but this check does more harm than good, since it means that the key consumers don't actually distribute updates to *anyone*. It's better just to deal with this silently for now.
To find these broken rows:
```
SELECT * FROM roomserver_membership AS m WHERE NOT EXISTS (
SELECT event_state_key_nid FROM roomserver_event_state_keys AS s
WHERE m.sender_nid = s.event_state_key_nid
);
```
* Logging
* Add server_notices config
* Disallow rejecting "server notice" invites
* Update config
* Slightly refactor sendEvent and CreateRoom so it can be reused
* Implement unspecced server notices
* Validate the request
* Set the user api when starting
* Rename function/variables
* Update comments
* Update config
* Set the avatar on account creation
* Update test
* Only create the account when starting
Only add routes if sever notices are enabled
* Use reserver username
Check that we actually got roomData
* Add check for admin account
Enable server notices for CI
Return same values as Synapse
* Add custom error for rejecting server notice invite
* Move building an invite to it's own function, for reusability
* Don't create new rooms, use the existing one (follow Synapse behavior)
Co-authored-by: kegsay <kegan@matrix.org>
* Don't proactively cache event types and state keys when we don't know if the transaction has persisted yet
* Remove event type and state key caches altogether
* Only add events to `add_state_events` that haven't already been sent to the roomserver output before
* Filter on event NIDs instead, hopefully bring joy to SQLite
* UnsentFilter, review comments
* Ensure the input API only uses a single transaction
* Remove more of the dead query API call
* Tidy up
* Fix tests hopefully
* Don't do unnecessary work for rooms that don't exist
* Improve error, fix another case where transaction wasn't used properly
* Add a unit test for checking single transaction on RS input API
* Fix logic oops when deciding whether to use a transaction in storeEvent
* Check that we have a populated state snapshot when determining if we closed the gap
* Do the same in the query API
* Use HasState more opportunistically
* Try to avoid falling down the hole of using a trustworthy but empty state snapshot for non-create events
* Refactor missing state and make sure that we really solve the problem for the new event
* Comments
* Review comments
* Tweak that check again
* Tidy up that create check further
* Fix build hopefully
* Update sendOutliers to use OrderAuthAndStateEvents
* Don't go out of bounds on missingEvents
* Revert "Revert "Fix storage bug in PSQL events table""
This reverts commit cf447dd52a.
* Membership updater to use updater
* Fix membership updater to use transactions properly
* Use new event json types in gmsl
* Fix EventJSON to actually unmarshal events
* Update GMSL
* Bump GMSL and improve error messages
* Send back the correct RespState
* Update GMSL
* Don't flake so badly for rejected events
* Moar
* Fix panic
* Don't count rejected events as missing
* Don't treat rejected events without state as missing
* Revert "Don't count rejected events as missing"
This reverts commit 4b6139b62e.
* Missing events should be KindOld
* If we have state, use it, regardless of memberships which could be stale now
* Fetch missing state for KindOld too
* Tweak the condition again
* Clean up a bit
* Use room updater to get latest events in a race-free way
* Return the correct error
* Improve errors
* Remove dependency on saramajetstream & sarama
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Remove internal.ContinualConsumer from federationapi
* Remove internal.ContinualConsumer from syncapi
* Remove internal.ContinualConsumer from keyserver
* Move to new Prepare function
* Remove saramajetstream & sarama dependency
* Delete unneeded file
* Remove duplicate import
* Log error instead of silently irgnoring it
* Move `OffsetNewest` and `OffsetOldest` into keyserver types, change them to be more sane values
* Fix comments
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
It isn't really clear that the deadlines actually help in any way. Currently we can use up our 2 minutes doing something, run out of context time and then return an error which causes the transaction to rollback and forgetting everything we've done. If the message came to us from NATS then we probably will end up retrying just to be in the same situation. We'd be really a lot better if we just spent the time reconciling the problem in the first place, and then we're much less likely to need to fetch those missing auth or prev events in the future.
Also includes matrix-org/gomatrixserverlib#287 so we don't wait so long for servers that are obviously dead.
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input
* Better transaction management
* Tweak order
* Handle cases where the room does not exist
* Other fixes
* More tweaks
* Fill some gaps
* Fill in the gaps
* good lord it gets worse
* Don't roll back transactions when events rejected
* Pass through errors properly
* Fix bugs
* Fix incorrect error check
* Don't panic on nil txns
* Tweaks
* Hopefully fix panics for good in SQLite this time
* Fix rollback
* Minor bug fixes with latest event updater
* Some review comments
* Revert "Some review comments"
This reverts commit 0caf8cf53e.
* Fix a couple of bugs
* Clearer commit and rollback results
* Remove unnecessary prepares
* PerformInvite: bugfix and rejig control flow
Local clients would not be notified of invites to rooms
Dendrite had already joined in all cases due to not returning
an `api.OutputNewInviteEvent` for local invites. We now do this.
This was an easy mistake to make due to the control flow of the
function which doesn't handle the happy case at the end of the
function and instead forks the function depending on if the
invite was via federation or not. This has now been changed to
handle the federated invite as if it were an error (in that we
check it, do it and bail out) rather than outstay our welcome.
This ends up with the local invite being the happy case, which
now both sends an `InputRoomEvent` to the roomserver _and_ a
`api.OutputNewInviteEvent` is returned.
* Don't send invite pokes in PerformInvite
* Move event ID into logger
* Improve server selection somewhat
* Remove things from the map when we're done
* Be less panicky about auth event signatures in case they are not fatal after all
* Accept HasState in all cases
* Send join asynchronously
* Revert "Send join asynchronously"
This reverts commit 5b685bfcd0.
* Joins and leaves use background context
* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes:
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>