Preparations to actually remove/replace `BaseDendrite`.
Quite a few changes:
- SyncAPI accepts an `fulltext.Indexer` interface (fulltext is removed
from `BaseDendrite`)
- Caches are removed from `BaseDendrite`
- Introduces a `Router` struct (likely to change)
- also fixes#2903
- Introduces a `sqlutil.ConnectionManager`, which should remove
`base.DatabaseConnection` later on
- probably more
We need to check the redaction PL in Dendrite, if we do it in GMSL, we
end up not sending the event to the output stream because it will be
rejected.
---------
Co-authored-by: kegsay <kegan@matrix.org>
This PR changes the following:
- `StoreEvent` now only stores an event (and possibly prev event),
instead of also doing redactions
- Adds a `MaybeRedactEvent` (pulled out from `StoreEvent`), which should
be called after storing events
- a few other things
This PR changes a few things:
- It pulls out the creation of several NIDs from the `StoreEvent`
function to make the functions more reusable
- Uses more caching when using those NIDs to avoid DB round trips
Should fix the following issues or make a lot less worse when using
Postgres:
The main issue behind #2911: The client gives up after a certain time,
causing a cascade of context errors, because the response couldn't be
built up fast enough. This mostly happens on accounts with many rooms,
due to the inefficient way we're getting recent events and current state
For #2777: The queries for getting the membership events for history
visibility were being executed for each room (I think 185?), resulting
in a whooping 2k queries for membership events. (Getting the
statesnapshot -> block nids -> actual wanted membership event)
Both should now be better by:
- Using a LATERAL join to get all recent events for all joined rooms in
one go (TODO: maybe do the same for room summary and current state etc)
- If we're lazy loading on initial syncs, we're now not getting the
whole current state, just to drop the majority of it because we're lazy
loading members - we add a filter to exclude membership events on the
first call to `CurrentState`.
- Using an optimized query to get the membership events needed to
calculate history visibility
---------
Co-authored-by: kegsay <kegan@matrix.org>
This adds Sytest and Complement coverage reporting to the nightly
scheduled CI runs.
Fixes a few API mode related issues as well, since we seemingly never
really ran them with Complement.
Also fixes a bug related to device list changes: When we pass in an
empty `newlyLeftRooms` slice, we got a list of all currently joined
rooms with the corresponding members. When we then got the
`newlyJoinedRooms`, we wouldn't update the `changed` slice, because we
already got the user from the `newlyLeftRooms` query. This is fixed by
simply ignoring empty `newlyLeftRooms`.
This adds a new admin endpoint `/_dendrite/admin/purgeRoom/{roomID}`. It
completely erases all database entries for a given room ID.
The roomserver will start by clearing all data for that room and then
will generate an output event to notify downstream components (i.e. the
sync API and federation API) to do the same.
It does not currently clear media and it is currently not implemented
for SQLite since it relies on SQL array operations right now.
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: Till Faelligen <2353100+S7evinK@users.noreply.github.com>
Needs https://github.com/matrix-org/sytest/pull/1315, as otherwise the
membership events aren't persisted yet when hitting `/state` after
kicking guest users.
Makes the following tests pass:
```
Guest users denied access over federation if guest access prohibited
Guest users are kicked from guest_access rooms on revocation of guest_access
Guest users are kicked from guest_access rooms on revocation of guest_access over federation
```
Todo (in a follow up PR):
- Restrict access to CS API Endpoints as per
https://spec.matrix.org/v1.4/client-server-api/#client-behaviour-14
Co-authored-by: kegsay <kegan@matrix.org>
The stale device lists table might contain entries for users we don't
share a room with anymore. This now asks the roomserver about left users
and removes those entries from the table.
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Makes the following tests pass
```
/upgrade moves remote aliases to the new room
Local and remote users' homeservers remove a room from their public directory on upgrade
```
This optimizes history visibility checks by (mostly) avoiding database
hits.
Possibly solves https://github.com/matrix-org/dendrite/issues/2777
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Adds `PUT
/_matrix/client/v3/directory/list/appservice/{networkId}/{roomId}` and
`DELTE
/_matrix/client/v3/directory/list/appservice/{networkId}/{roomId}`
support, as well as the ability to filter `/publicRooms` on networkID
and including all networks.
This prevents us from holding onto durable consumers indefinitely for
rooms that have long since turned inactive, since they do have a bit of
a processing overhead in the NATS Server. If we clear up a consumer and
then a room becomes active again, the consumer gets recreated as needed.
The threshold is set to 24 hours for now, we can tweak it later if needs
be.
Fixes `outliers whose auth_events are in a different room are correctly
rejected`, by validating that auth events are all from the same room and
not using rejected events for event auth.
Sytest was using a wrong `history_visibility` for `invited`
(https://github.com/matrix-org/sytest/pull/1303), so `invited` was
passing for the wrong reason (-> defaulted to `shared`, as `invite`
wasn't understood).
This change now handles missing events like Synapse, if a server isn't
allowed to see the event, it gets a redacted version of it, making the
`get_missing_events` tests pass.
This should hopefully fix an entire class of problems where components
downstream from the roomserver (i.e. the sync API) could just lose a
whole bunch of state after a rewrite operation like a federated join.
The root of the bug is that we set `RewritesState` in the output event
which instructs downstream components to purge their copy of any room
state, but then didn't send the entire state snapshot in
`adds_state_event_ids` so the downstream state ends up being incomplete
as a result.
Previously `LoadMembershipAtEvent` would fail if the state before one of
the events was not known, i.e. because it was an outlier. This modifies
it so that it gracefully handles not knowing the state and returns no
memberships instead, so that history visibility doesn't freak out and
kill `/sync` requests dead.
commit 1929b688e31987c46e0c8a546f0f9cb0a46bf9a3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Aug 22 10:09:44 2022 +0100
Still process state-before for soft-failed events
commit e83c0b701d40d78b92072c4643f6bc6f71b72800
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Aug 22 10:06:50 2022 +0100
Improve logging
commit 29e26124bc27cb83d449de2a4214b253c594aa93
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Aug 22 09:58:13 2022 +0100
Don't store soft-failed events as rejected
This should hopefully deflake Backfill works correctly with history visibility set to joined as we were using the default shared visibility, even if the events are set to joined (or something else)
* Reprocess outliers that were previously rejected
* Might as well do all events this way
* More useful errors
* Fix queries
* Tweak condition
* Don't wrap errors
* Report more useful error
* Flatten error on `r.Queryer.QueryStateAfterEvents`
* Some more debug logging
* Flatten error in `QueryRestrictedJoinAllowed`
* Revert "Flatten error in `QueryRestrictedJoinAllowed`"
This reverts commit 1238b4184c.
* Tweak `QueryStateAfterEvents`
* Handle MissingStateError too
* Scope to room
* Clean up
* Fix the error
* Only apply rejection check to outliers
* CS API changes
* Query remote profiles
* Add passing tests
* Don't create a new FullyQualifiedProfile
* Handle sql.ErrNoRows
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add possibility to set history_visibility and user AccountType
* Add new DB queries
* Add actual history_visibility changes for /messages
* Add passing tests
* Extract check function
* Cleanup
* Cleanup
* Fix build on 386
* Move ApplyHistoryVisibilityFilter to internal
* Move queries to topology table
* Add filtering to /sync and /context
Some cleanup
* Add passing tests; Remove failing tests :(
* Re-add passing tests
* Move filtering to own function to avoid duplication
* Re-add passing test
* Use newly added GMSL HistoryVisibility
* Update gomatrixserverlib
* Set the visibility when creating events
* Default to shared history visibility
* Remove unused query
* Update history visibility checks to use gmsl
Update tests
* Remove unused statement
* Update migrations to set "correct" history visibility
* Add method to fetch the membership at a given event
* Tweaks and logging
* Use actual internal rsAPI, default to shared visibility in tests
* Revert "Move queries to topology table"
This reverts commit 4f0d41be9c.
* Remove noise/unneeded code
* More cleanup
* Try to optimize database requests
* Fix imports
* PR peview fixes/changes
* Move setting history visibility to own migration, be more restrictive
* Fix unit tests
* Lint
* Fix missing entries
* Tweaks for incremental syncs
* Adapt generic changes
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)
* Add `PerformInvite`
* More tweaks
* Fix metric name
* Fix LookupStateIDs
* Lots of changes to clients
* Some serverside stuff
* Some error handling
* Use paths as metric names
* Revert "Use paths as metric names"
This reverts commit a9323a6a34.
* Namespace metric names
* Remove duplicate entry
* Remove another duplicate entry
* Tweak error handling
* Some more tweaks
* Update error behaviour
* Some more error tweaking
* Fix API path for `PerformDeleteKeys`
* Fix another path
* Tweak federation client proxying
* Fix another path
* Don't return typed nils
* Some more tweaks, not that it makes any difference
* Tweak federation client proxying
* Maybe fix the key backup test
* Try more servers when calling `/state_ids`
* More logging
* Maybe fix concurrent map write
* Revert "Maybe fix concurrent map write"
This reverts commit da0dbb8362.
* Enforce a limit of 20s per server, 5 mins total
* Try optimising checking if server is allowed to see event
* Fix error
* Handle case where snapshot NID is 0
* Fix query
* Update SQL
* Clean up `CheckServerAllowedToSeeEvent`
* Not supported on SQLite
* Maybe placate the unit tests
* Review comments
* Membership updater refactoring
* Pass in membership state
* Use membership check rather than referring to state directly
* Delete irrelevant membership states
* We don't need the leave event after all
* Tweaks
* Put a log entry in that I might stand a chance of finding
* Be less panicky
* Tweak invite handling
* Don't freak if we can't find the event NID
* Use event NID from `types.Event`
* Clean up
* Better invite handling
* Placate the almighty linter
* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand
* Fix the sytest after all (thanks @S7evinK for the spot)
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizees
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
* Add `evacuateUser` endpoint, use it when deactivating accounts
* Populate the API
* Clean up user devices when deactivating
* Include invites, delete pushers
* Check state before event
* Tweaks
* Refactor a bit, include in output events
* Don't waste time if soft failed either
* Tweak control flow, comments, use GMSL history visibility type
Squashed commit of the following:
commit 7a1568c716866594af6d0b1d561c58c96de29b20
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 6 15:17:49 2022 +0100
Make errors more useful
commit 64befe7c9a901b00650442171660c2dc4ea575fa
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 6 15:02:40 2022 +0100
Tweak ordering a bit
Squashed commit of the following:
commit 2bd0daf4d61376d2dd56628eaff267b0bc63e116
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Wed Jun 1 09:55:54 2022 +0100
Revert resolving old extremities as well as new
This may no longer be needed with the new state fixes and probably just burns more CPU time than is strictly necessary.
* Fix bugs related to state resolution
* Clean up `resolve-state`
* Don't panic when entries can't be found
* Ensure we have state entries for the auth events
* Revert "Ensure we have state entries for the auth events"
This reverts commit 9b13b7ed37.
* Revert "Revert "Ensure we have state entries for the auth events""
This reverts commit d86db197e3.
* Fix bug
* Try that again
* Update gomatrixserverlib
* Remove recursion from `loadAuthEvents`
* Add `QueryRestrictedJoinAllowed`
* Add `Resident` flag to `QueryRestrictedJoinAllowedResponse`
* Check restricted joins on federation API
* Return `Restricted` to determine if the room was restricted or not
* Populate `AuthorisedVia` properly
* Sign the event on `/send_join`, return it in the `/send_join` response in the `"event"` key
* Kick back joins with invalid authorising user IDs, use event from `"event"` key if returned in `RespSendJoin`
* Use invite helper in `QueryRestrictedJoinAllowed`
* Only use users with the power to invite, change error bubbling a bit
* Placate the almighty linter
One day I will nuke `gocyclo` from orbit and everything in the world will be much better for it.
* Review comments
* Fix flakey sytest 'Local device key changes get to remote servers'
* Debug logs
* Remove internal/test and use /test only
Remove a lot of ancient code too.
* Use FederationRoomserverAPI in more places
* Use more interfaces in federationapi; begin adding regression test
* Linting
* Add regression test
* Unbreak tests
* ALL THE LOGS
* Fix a race condition which could cause events to not be sent to servers
If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.
* Unbreak new tests
* Linting
* Feed existing state into state res when calculating state from new extremities
* Remove duplicates
* Fix bug
* Sort and unique
* Update to matrix-org/gomatrixserverlib#308
* Trim the slice properly
* Update gomatrixserverlib again
* Update to matrix-org/gomatrixserverlib#308
* tidy up interfaces
* remove unused GetCreatorIDForAlias
* Add RoomserverUserAPI interface
* Define more interfaces
* Use AppServiceInternalAPI for consistent naming
* clean up federationapi constructor a bit
* Fix monolith in -http mode
* Add new endpoint to allow admins to evacuate the local server from the room
* Guard endpoint
* Use right prefix
* Auth API
* More useful return error rather than a panic
* More useful return value again
* Update the path
* Try using inputer instead
* oh provide the config
* Try that again
* Return affected user IDs
* Don't create so many forward extremities
* Add missing `Path` to name
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
* Add response size and requests total to internal handler
* Move MustRegister calls to New* funcs
* Move MustRegister back to init
* Init at some place, minimize changes
* Add test infrastructure code for dendrite unit/integ tests
Start re-enabling some syncapi storage tests in the process.
* Linting
* Add postgres service to unit tests
* dendrite not syncv3
* Skip test which doesn't work
* Linting
* Add `jetstream.PrepareForTests`
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Added /upgrade endpoint
* fix
* Fix lints
* More lint lifex
* Move room upgrading to the roomserver
* Remove extraneous arg
* Fix HTTP API for `PerformUpgrade`
* Reduce number of API calls in `generateInitialEvents`, preserve membership fields
* Refactor `generateInitialEvents` to preserve old state events for all but the essential room setup events
* Handle ban events in the state transfer
* Refactor and comment `createTemporaryPowerLevels`
* Only send two power levels if we needed to override the levels, preserve miscellaneous fields in the create event
* Fix copyrights
* Review comments @S7evinK
* Update sytest whitelist
* Specify empty state keys, use `EventLevel`, remove unnecessary check on state copy
* Add comment to `restrictOldRoomPowerLevels`
* Ensure canonical aliases exist before clearing
* Copy invites as well as bans
* Fix return error on `m.room.tombstone` handling in client API
* Relax checks for well-formedness of join rules, membership event etc
Co-authored-by: Alex Kursell <alex@awk.run>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Roomserver input refactoring — again!
* Ensure the actor runs again
* Preserve consumer after unsubscribe
* Another sprinkling of magic
* Rename `TopicFor` to `Prefixed`
* Recreate the stream if the config is bad
* Check streams too
* Prefix subjects, preserve inboxes
* Recreate if subjects wrong
* Remove stream subject
* Reconstruct properly
* Fix mutex unlock
* Comments
* Fix tests
* Don't drop events
* Review comments
* Separate `queueInputRoomEvents` function
* Re-jig control flow a bit
* Let's try to work out why this endpoint lies
* Try that again
* Fix `QueryPublishedRooms`
* Remove logging
* Remove unnecessary change
* Remove unnecessary change