Should fix the following issues or make a lot less worse when using
Postgres:
The main issue behind #2911: The client gives up after a certain time,
causing a cascade of context errors, because the response couldn't be
built up fast enough. This mostly happens on accounts with many rooms,
due to the inefficient way we're getting recent events and current state
For #2777: The queries for getting the membership events for history
visibility were being executed for each room (I think 185?), resulting
in a whooping 2k queries for membership events. (Getting the
statesnapshot -> block nids -> actual wanted membership event)
Both should now be better by:
- Using a LATERAL join to get all recent events for all joined rooms in
one go (TODO: maybe do the same for room summary and current state etc)
- If we're lazy loading on initial syncs, we're now not getting the
whole current state, just to drop the majority of it because we're lazy
loading members - we add a filter to exclude membership events on the
first call to `CurrentState`.
- Using an optimized query to get the membership events needed to
calculate history visibility
---------
Co-authored-by: kegsay <kegan@matrix.org>
This adds a new admin endpoint `/_dendrite/admin/purgeRoom/{roomID}`. It
completely erases all database entries for a given room ID.
The roomserver will start by clearing all data for that room and then
will generate an output event to notify downstream components (i.e. the
sync API and federation API) to do the same.
It does not currently clear media and it is currently not implemented
for SQLite since it relies on SQL array operations right now.
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: Till Faelligen <2353100+S7evinK@users.noreply.github.com>
Since #2849 there is no limit for the current state we fetch to
calculate history visibility. In large rooms this can cause us to fetch
thousands of membership events we don't really care about.
This now only gets the state event types and senders in our timeline,
which should significantly reduce the amount of events we fetch from the
database.
Also removes `MaxTopologicalPosition`, as it is an unnecessary DB call,
given we use the result in `topological_position < $1` calls.
* Multiroom feature
* Run multiroom visibility expiration conditionally
Remove SQLite and go 1.18 for tests matrixes
* Remove sqlite from unit tests
* Fix linter errors
* Do not build with go1.18
* Do not run upgrade tests
* Fix dendrite workflow
* Add forgotten content and timestamp fields to multiroom in sync response
* Fix syncapi multiroom unit tests
* Review adjustments in queries and naming
* Remove no longer maintained linters from golangci-lint configuration
* Document sqlc code generation
Makes the tests
```
Can get rooms/{roomId}/members at a given point
Can filter rooms/{roomId}/members
```
pass, by moving `/members` and `/joined_members` to the SyncAPI.
This fixes a temporary workaround with the `selectEventsWithEventIDsSQL`
queries where fields need to be artificially added to the queries so the
row results match the format of the `syncapi_output_room_events` table.
I made similar functions that accept row results from the
`syncapi_current_room_state` table and convert them into StreamEvents
without the fields that are specific to output room events.
There is also a unit test in the first commit to ensure the resulting
behavior doesn't change from the modified queries and functions.
Fixes#601.
### Pull Request Checklist
<!-- Please read docs/CONTRIBUTING.md before submitting your pull
request -->
* [x] I have added tests for PR _or_ I have justified why this PR
doesn't need tests.
* [x] Pull request includes a [sign
off](https://github.com/matrix-org/dendrite/blob/main/docs/CONTRIBUTING.md#sign-off)
Signed-off-by: `Ashley Nelson <fant@shley.email>`
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
This should transactional snapshot isolation for `/sync` etc requests.
For now we don't use repeatable read due to some odd test failures with
invites.
Based on #2480
This actually indexes events based on their event type. They are removed
from the index if we receive a `m.room.redaction` event on the
`OutputRoomEvent` stream.
An admin endpoint is added to reindex all existing events.
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
This PR changes the handling of notifications
- removes the `StreamEvent` and `ReadUpdate` stream
- listens on the `OutputRoomEvent` stream in the UserAPI to inform the
SyncAPI about unread notifications
- listens on the `OutputReceiptEvent` stream in the UserAPI to set
receipts/update notifications
- sets the `read_markers` directly from within the internal UserAPI
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add possibility to set history_visibility and user AccountType
* Add new DB queries
* Add actual history_visibility changes for /messages
* Add passing tests
* Extract check function
* Cleanup
* Cleanup
* Fix build on 386
* Move ApplyHistoryVisibilityFilter to internal
* Move queries to topology table
* Add filtering to /sync and /context
Some cleanup
* Add passing tests; Remove failing tests :(
* Re-add passing tests
* Move filtering to own function to avoid duplication
* Re-add passing test
* Use newly added GMSL HistoryVisibility
* Update gomatrixserverlib
* Set the visibility when creating events
* Default to shared history visibility
* Remove unused query
* Update history visibility checks to use gmsl
Update tests
* Remove unused statement
* Update migrations to set "correct" history visibility
* Add method to fetch the membership at a given event
* Tweaks and logging
* Use actual internal rsAPI, default to shared visibility in tests
* Revert "Move queries to topology table"
This reverts commit 4f0d41be9c.
* Remove noise/unneeded code
* More cleanup
* Try to optimize database requests
* Fix imports
* PR peview fixes/changes
* Move setting history visibility to own migration, be more restrictive
* Fix unit tests
* Lint
* Fix missing entries
* Tweaks for incremental syncs
* Adapt generic changes
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
* Correctly redact events over federation (#2526)
* Ensure we check powerlevel/origin before redacting an event
* Add passing test
* Use pl.UserLevel
* Make check more readable, also check for the sender
* Add new next steps page to the documentation
* Highlighting in docs
* Rename the page to "Optimise your installation"
* Attempt to raise the file descriptor limit at startup (#2527)
* Add `--difference` to `resolve-state` tool
* Make the linter happy again
* generic CaddyFile in front of Dendrite (monolith) (#2531)
for Caddy 2.5.x
Co-authored-by: emanuele.aliberti <emanuele.aliberti@mtka.eu>
* Handle state before, send history visibility in output (#2532)
* Check state before event
* Tweaks
* Refactor a bit, include in output events
* Don't waste time if soft failed either
* Tweak control flow, comments, use GMSL history visibility type
* Fix rare panic when returning user devices over federation (#2534)
* Add `InputDeviceListUpdate` to the keyserver, remove old input API (#2536)
* Add `InputDeviceListUpdate` to the keyserver, remove old input API
* Fix copyright
* Log more information when a device list update fails
* Fix nats.go commit (#2540)
Signed-off-by: Jean Lucas <jean@4ray.co>
* Don't return `end` if there are not more messages (#2542)
* Be more spec compliant
* Move lazyLoadMembers to own method
* Return an error if trying to invite a malformed user ID (#2543)
* Add `evacuateUser` endpoint, use it when deactivating accounts (#2545)
* Add `evacuateUser` endpoint, use it when deactivating accounts
* Populate the API
* Clean up user devices when deactivating
* Include invites, delete pushers
* Silence presence logs (#2547)
* Blacklist `Guest users can join guest_access rooms` test until it can be investigated
* Disable WebAssembly builds for now
* Try to fix backfilling (#2548)
* Try to fix backfilling
* Return start/end to not confuse clients
* Update GMSL
* Update GMSL
* Roomserver producers package (#2546)
* Give the roomserver a producers package
* Change init point
* Populate ACLs API
* Fix build issues
* `RoomEventProducer` naming
* Version 0.8.9 (#2549)
* Version 0.8.9
* Update changelog
* feat+fix: Ignore unknown keys and verify required fields are present in appservice registration files (#2550)
* fix: ignore unknown keys in appservice configs
fixesmatrix-org/dendrite#1567
* feat: verify required fields in appservice configs
* Use new testrig for key changes tests (#2552)
* Use new testrig for tests
* Log the error message
* Fix QuerySharedUsers for the SyncAPI keychange consumer (#2554)
* Make more use of base.BaseDendrite
* Fix QuerySharedUsers if no UserIDs are supplied
* Return clearer error when no state NID exists for an event (#2555)
* Wrap error from `SnapshotNIDFromEventID`
* Hopefully fix read receipts timestamps (#2557)
This should avoid coercions between signed and unsigned ints which might fix problems like `sql: converting argument $5 type: uint64 values with high bit set are not supported`.
* Fix nil pointer access when redacting events (#2560)
* Fix issue `uint64 values with high bit are not supported` in presence (#2562)
* Fix issue #2528
* Use gomatrixserverlib.Timestamp
* Use ParseUint instead of ParseInt
* Update Pinecone to matrix-org/pinecone@1ce778f
* Ristretto cache (#2563)
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizees
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
* Minor SendToDevice fix (#2565)
* Avoid unnecessary marshalling if sending to the local server
* Fix ordering of ToDevice messages
* Revive SendToDevice test
* Use `/v3` to request media from remote servers (update to matrix-org/gomatrixserverlib#324)
* Pointerise `types.RoomInfo` in the cache so we can update it in-place in the latest events updater
* Add a Troubleshooting page
* Update `sytest-whitelist`
* Use sync API database in `filterSharedUsers` (#2572)
* Add function to the sync API storage package for filtering shared users
* Use the database instead of asking the RS API
* Fix unit tests
* Fix map handling in `filterSharedUsers`
* Update 1_createusers.md (#2571)
* Update 1_createusers.md
Added description on how to create user accounts when running in docker.
* Update 1_createusers.md
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Fix connection_string format in dendrite-sample.polylith.yaml (#2574)
* History visibility database changes (#2533)
* Add new history_visibility column
* Update SQL queries to include history_visibility
* Store the history visibilty calculated by the roomserver
* Update GMSL
* Update migrations
* Fix migration
* Update GMSL
* Fix `go.sum`
* Update GMSL to use sql.Scanner & sql.Valuer
* Re-order migration/table creation
* Update gomatrixserverlib
* Add history_visibility column to current_room_state
* Fix migrations
* Return error instead of Fatal log
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Tweak cache counters (#2575)
* Tweak cache counters
This makes the number of counters relative to the
maximum cache size. Since the counters
effectively manage the size of the bloom filter,
larger caches need more counters and smaller
caches need less.
10 counters per 1KB data means that the default
cache size of 1GB should result in a bloom filter
and TinyLRU admission set of about 16MB
estimated.
* Remove line left by accident
* Set historyVisibility in rowsToStreamEvents
* Update FAQ
* Add event state key cache (#2576)
* Explain how SRV works in Matrix and discourage using it (#2577)
* Explain how SRV works in Matrix and discourage using it
* Minor tweaks to formatting
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Fix issue with membership event_nid being 0 (#2580)
* docs: Add build page; correct proxy info; fix Caddy example (#2579)
* Add build page; correct proxy info; fix Caddy example
* Improve Caddyfile example
* Apply review comments; add polylith Caddyfile
* Bump tzinfo from 1.2.9 to 1.2.10 in /docs (#2584)
Bumps [tzinfo](https://github.com/tzinfo/tzinfo) from 1.2.9 to 1.2.10.
- [Release notes](https://github.com/tzinfo/tzinfo/releases)
- [Changelog](https://github.com/tzinfo/tzinfo/blob/master/CHANGES.md)
- [Commits](https://github.com/tzinfo/tzinfo/compare/v1.2.9...v1.2.10)
---
updated-dependencies:
- dependency-name: tzinfo
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Membership updater refactoring (#2541)
* Membership updater refactoring
* Pass in membership state
* Use membership check rather than referring to state directly
* Delete irrelevant membership states
* We don't need the leave event after all
* Tweaks
* Put a log entry in that I might stand a chance of finding
* Be less panicky
* Tweak invite handling
* Don't freak if we can't find the event NID
* Use event NID from `types.Event`
* Clean up
* Better invite handling
* Placate the almighty linter
* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand
* Fix the sytest after all (thanks @S7evinK for the spot)
* Try to fix HTTP 500s on `/members` (#2581)
* Update database migrations, remove goose (#2264)
* Add new db migration
* Update migrations
Remove goose
* Add possibility to test direct upgrades
* Try to fix WASM test
* Add checks for specific migrations
* Remove AddMigration
Use WithTransaction
Add Dendrite version to table
* Fix linter issues
* Update tests
* Update comments, outdent if
* Namespace migrations
* Add direct upgrade tests, skipping over one version
* Split migrations
* Update go version in CI
* Fix copy&paste mistake
* Use contexts in migrations
Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add .well-known/matrix/client to clientapi (#2551)
Signed-off-by: Jonathan Bartlett <jonathan@jonnobrow.co.uk>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Remove `room_id` field from MSC2946 stripped events (closes#2588)
* Remove `goose` from Dockerfiles
* Make the User API responsible for sending account data output events (#2592)
* Make the User API responsible for sending account data output events
* Clean up producer
* Review comments
* Update NATS Server and nats.go to use upstream
* Set CORS headers for HTTP 404 and 405 errors (#2599)
* Set CORS headers for the 404s
* Use custom handlers, plus one for HTTP 405 too
* Tweak setup
* Add to muxes too
* Tidy up some more
* Use built-in HTTP 404 handler
* Don't bother setting it for federation-facing
* Optimise checking other servers allowed to see events (#2596)
* Try optimising checking if server is allowed to see event
* Fix error
* Handle case where snapshot NID is 0
* Fix query
* Update SQL
* Clean up `CheckServerAllowedToSeeEvent`
* Not supported on SQLite
* Maybe placate the unit tests
* Review comments
* De-race `types.RoomInfo` (#2600)
* De-race `CompleteSync` (#2601)
The `err` was coming from outside of the goroutine and being written to by concurrent goroutines.
* Version 0.9.0 (#2602)
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: Till Faelligen <davidf@element.io>
Co-authored-by: Emanuele Aliberti <dev@mtka.eu>
Co-authored-by: emanuele.aliberti <emanuele.aliberti@mtka.eu>
Co-authored-by: Jean Lucas <jean@4ray.co>
Co-authored-by: Kabir Kwatra <kabir@kwatra.me>
Co-authored-by: andreever <52261463+andreever@users.noreply.github.com>
Co-authored-by: Maximilian Gaedig <38767445+MaximilianGaedig@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
Co-authored-by: Matt Holt <mholt@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Jonathan Bartlett <34320158+Jonnobrow@users.noreply.github.com>
* Add function to the sync API storage package for filtering shared users
* Use the database instead of asking the RS API
* Fix unit tests
* Fix map handling in `filterSharedUsers`
* Fix flakey sytest 'Local device key changes get to remote servers'
* Debug logs
* Remove internal/test and use /test only
Remove a lot of ancient code too.
* Use FederationRoomserverAPI in more places
* Use more interfaces in federationapi; begin adding regression test
* Linting
* Add regression test
* Unbreak tests
* ALL THE LOGS
* Fix a race condition which could cause events to not be sent to servers
If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.
* Unbreak new tests
* Linting
* Only load members of newly joined rooms
* Comment that the query is prepared at runtime
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Use filter and limit presence count
* More limiting
* More limiting
* Fix unit test
* Also limit presence by last_active_ts
* Update query, use "from" as the initial lastPos
* Get 1000 presence events, they are filtered later
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Squashed commit of the following:
commit 0ec8de57261d573a5f88577aa9d7a1174d3999b9
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Apr 26 16:56:30 2022 +0100
Select filter onto provided target filter
commit da40b6fffbf5737864b223f49900048f557941f9
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Apr 26 16:48:00 2022 +0100
Specify other field too
commit ffc0b0801f63bb4d3061b6813e3ce5f3b4c8fbcb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Apr 26 16:45:44 2022 +0100
Send as much account data as possible during complete sync
* syncapi: add more tests; fix more bugs
bugfixes:
- The postgres impl of TopologyTable.SelectEventIDsInRange did not use the provided txn
- The postgres impl of EventsTable.SelectEvents did not preserve the ordering of the input event IDs in the output events slice
- The sqlite impl of EventsTable.SelectEvents did not use a bulk `IN ($1)` query.
Added tests:
- `TestGetEventsInRangeWithTopologyToken`
- `TestOutputRoomEventsTable`
- `TestTopologyTable`
* -p 1 for now
* Add ignore users
* Ignore users in pushrules
Add passing tests
* Update sytest lists
* Store ignore knowledge in the sync API
* Fix copyrights
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Include joined and invite member counts in room summary
This should fix#2314 and also fix the problem where some clients like Element Android, Fluffychat etc would display the wrong member count for a given room.
* Improve SQLite query precision
* Check existence of state key for membership events
* Move receipt sending to own JetStream producer
* Move SendToDevice to producer
* Remove most parts of the EDU server
* Fix SendToDevice & copyrights
* Move structs, cleanup EDU Server traces
* Use HeadersOnly subscription
* Missing file
* Fix linter issues
* Move consumers to own files
* Rename durable consumer; Consumer cleanup
* Docs/config cleanup
* Convert stream positions into topological positions for both `from` and `to` in `/messages`
* Hopefully it works now
* Remove unnecessary logging
* Return sane values if `StreamToTopologicalPosition` can't work out the right thing to do
* Revert logging change
* tweaks
* Fix `selectEventIDsInRangeASCSQL`
* Test `Getting messages going forward is limited for a departed room (SPEC-216)` was passing incorrectly so un-whitelist it
* Add Pushserver component with Pushers API
Co-authored-by: Tommie Gannert <tommie@gannert.se>
Co-authored-by: Dan Peleg <dan@globekeeper.com>
* Wire Pushserver component
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add PushGatewayClient.
The full event format is required for Sytest.
* Add a pushrules module.
* Change user API account creation to use the new pushrules module's defaults.
Introduces "scope" as required by client API, and some small field
tweaks to make some 61push Sytests pass.
* Add push rules query/put API in Pushserver.
This manipulates account data over User API, and fires sync messages
for changes. Those sync messages should, according to an existing TODO
in clientapi, be moved to userapi.
Forks clientapi/producers/syncapi.go to pushserver/ for later extension.
* Add clientapi routes for push rules to Pushserver.
A cleanup would be to move more of the name-splitting logic into
pushrules.go, to depollute routing.go.
* Output rooms.join.unread_notifications in /sync.
This is the read-side. Pushserver will be the write-side.
* Implement pushserver/storage for notifications.
* Use PushGatewayClient and the pushrules module in Pushserver's room consumer.
* Use one goroutine per user to avoid locking up the entire server for
one bad push gateway.
* Split pushing by format.
* Send one device per push. Sytest does not support coalescing
multiple devices into one push. Matches Synapse. Either we change
Sytest, or remove the group-by-url-and-format logic.
* Write OutputNotificationData from push server. Sync API is already
the consumer.
* Implement read receipt consumers in Pushserver.
Supports m.read and m.fully_read receipts.
* Add clientapi route for /unstable/notifications.
* Rename to UpsertPusher for clarity and handle pusher update
* Fix linter errors
* Ignore body.Close() error check
* Fix push server internal http wiring
* Add 40 newly passing 61push tests to whitelist
* Add next 12 newly passing 61push tests to whitelist
* Send notification data before notifying users in EDU server consumer
* NATS JetStream
* Goodbye sarama
* Fix `NewStreamTokenFromString`
* Consume on the correct topic for the roomserver
* Don't panic, NAK instead
* Move push notifications into the User API
* Don't set null values since that apparently causes Element upsetti
* Also set omitempty on conditions
* Fix bug so that we don't override the push rules unnecessarily
* Tweak defaults
* Update defaults
* More tweaks
* Move `/notifications` onto `r0`/`v3` mux
* User API will consume events and read/fully read markers from the sync API with stream positions, instead of consuming directly
Co-authored-by: Piotr Kozimor <p1996k@gmail.com>
Co-authored-by: Tommie Gannert <tommie@gannert.se>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Don't request state events if we already have the timeline events (Postgres only)
* Rename variable
* nocyclo
* Add SQLite
* Tweaks
* Revert query change
* Don't dedupe if asking for full state
* Update query
* Add some filtering (postgres only for now)
* Fix build error
* Try to use request filter
* Use default filter as a template when retrieving from the database
* Remove unused strut
* Update sytest-whitelist
* Add filtering to SelectEarlyEvents
* Fix Postgres selectEarlyEvents query
* Attempt filtering on SQLite
* Test limit, set field for limit/order in prepareWithFilters
* Remove debug logging, add comments
* Tweaks, debug logging
* Separate SQLite stream IDs
* Fix filtering in current state table
* Fix lock issues
* More tweaks
* Current state requires room ID
* Review comments