Commit graph

367 commits

Author SHA1 Message Date
Neil Alexander 8ff3f1a7c9
Remove a couple unnecessary Sentry captures from backfill 2022-08-25 11:01:07 +01:00
Neil Alexander cd7fa34595
Tweak logging and Sentry reporting for roomserver input 2022-08-25 10:57:27 +01:00
Neil Alexander 16156b0b09
Fix 500s on /state, /state_ids when state not known (#2672)
This was due to bad error bubbling.
2022-08-25 09:51:36 +01:00
Neil Alexander 522bd2999f
Allow un-rejecting events on reprocessing 2022-08-24 14:03:06 +01:00
Neil Alexander 14fea600bb
Detect types.MissingStateError in CheckServerAllowedToSeeEvent (#2667)
This will hopefully stop some 500 errors on `/event` where there is no state-before known.
2022-08-23 13:57:11 +01:00
Neil Alexander 2668050e53
Tweak soft-failure handling in roomserver
commit 1929b688e31987c46e0c8a546f0f9cb0a46bf9a3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Aug 22 10:09:44 2022 +0100

    Still process state-before for soft-failed events

commit e83c0b701d40d78b92072c4643f6bc6f71b72800
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Aug 22 10:06:50 2022 +0100

    Improve logging

commit 29e26124bc27cb83d449de2a4214b253c594aa93
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Aug 22 09:58:13 2022 +0100

    Don't store soft-failed events as rejected
2022-08-22 10:34:07 +01:00
Till 365da70a23
Set historyVisibility for backfilled events over federation (#2656)
This should hopefully deflake Backfill works correctly with history visibility set to joined as we were using the default shared visibility, even if the events are set to joined (or something else)
2022-08-19 11:04:26 +02:00
Neil Alexander 6b48ce0d75
State handling tweaks (#2652)
This tweaks how rejected events are handled in room state and also to not apply checks we can't complete to outliers.
2022-08-18 17:06:13 +01:00
Neil Alexander 59bc0a6f4e
Reprocess rejected input events (#2647)
* Reprocess outliers that were previously rejected

* Might as well do all events this way

* More useful errors

* Fix queries

* Tweak condition

* Don't wrap errors

* Report more useful error

* Flatten error on `r.Queryer.QueryStateAfterEvents`

* Some more debug logging

* Flatten error in `QueryRestrictedJoinAllowed`

* Revert "Flatten error in `QueryRestrictedJoinAllowed`"

This reverts commit 1238b4184c.

* Tweak `QueryStateAfterEvents`

* Handle MissingStateError too

* Scope to room

* Clean up

* Fix the error

* Only apply rejection check to outliers
2022-08-18 10:37:47 +01:00
Till b4647fbb7e
Show/hide users in user directory (#2637)
* CS API changes

* Query remote profiles

* Add passing tests

* Don't create a new FullyQualifiedProfile

* Handle sql.ErrNoRows

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-08-12 13:33:31 +02:00
Neil Alexander a01af55ec6
Restore the room version cache in the roomserver internal API HTTP client 2022-08-11 17:34:09 +01:00
Till 05cafbd197
Implement history visibility on /messages, /context, /sync (#2511)
* Add possibility to set history_visibility and user AccountType

* Add new DB queries

* Add actual history_visibility changes for /messages

* Add passing tests

* Extract check function

* Cleanup

* Cleanup

* Fix build on 386

* Move ApplyHistoryVisibilityFilter to internal

* Move queries to topology table

* Add filtering to /sync and /context
Some cleanup

* Add passing tests; Remove failing tests :(

* Re-add passing tests

* Move filtering to own function to avoid duplication

* Re-add passing test

* Use newly added GMSL HistoryVisibility

* Update gomatrixserverlib

* Set the visibility when creating events

* Default to shared history visibility

* Remove unused query

* Update history visibility checks to use gmsl
Update tests

* Remove unused statement

* Update migrations to set "correct" history visibility

* Add method to fetch the membership at a given event

* Tweaks and logging

* Use actual internal rsAPI, default to shared visibility in tests

* Revert "Move queries to topology table"

This reverts commit 4f0d41be9c.

* Remove noise/unneeded code

* More cleanup

* Try to optimize database requests

* Fix imports

* PR peview fixes/changes

* Move setting history visibility to own migration, be more restrictive

* Fix unit tests

* Lint

* Fix missing entries

* Tweaks for incremental syncs

* Adapt generic changes

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
2022-08-11 18:23:35 +02:00
Neil Alexander 371336c6b5
Set default room version to 9 2022-08-11 16:31:44 +01:00
Neil Alexander c45d0936b5
Generic-based internal HTTP API (#2626)
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)

* Add `PerformInvite`

* More tweaks

* Fix metric name

* Fix LookupStateIDs

* Lots of changes to clients

* Some serverside stuff

* Some error handling

* Use paths as metric names

* Revert "Use paths as metric names"

This reverts commit a9323a6a34.

* Namespace metric names

* Remove duplicate entry

* Remove another duplicate entry

* Tweak error handling

* Some more tweaks

* Update error behaviour

* Some more error tweaking

* Fix API path for `PerformDeleteKeys`

* Fix another path

* Tweak federation client proxying

* Fix another path

* Don't return typed nils

* Some more tweaks, not that it makes any difference

* Tweak federation client proxying

* Maybe fix the key backup test
2022-08-11 15:29:33 +01:00
Till 03ddd98f5e
Fix issues with migrations not getting executed (#2628)
* Fix issues with migrations not getting executed

* Check actual postgres error

* Return error if it's not "column does not exist"
2022-08-08 10:18:57 +02:00
Till 1b7f84250a
Fix linter issues (#2624)
* Try that again

* All hail the mighty linter?

* And once again

* goimport all the things
2022-08-05 11:12:41 +02:00
Neil Alexander 3bf5ae5ffe
Try more servers when calling /state_ids (#2610)
* Try more servers when calling `/state_ids`

* More logging

* Maybe fix concurrent map write

* Revert "Maybe fix concurrent map write"

This reverts commit da0dbb8362.

* Enforce a limit of 20s per server, 5 mins total
2022-08-03 17:37:27 +01:00
Neil Alexander 2250768be1
Remove roominfo cache (#2615)
* Remove roominfo cache

It's the source of a number of race conditions which are seemingly causing bugs and CI failures.

* Make the linter less sad
2022-08-03 17:14:21 +01:00
Neil Alexander f4345dafde
Fix data race in lookupMissingStateViaStateIDs 2022-08-02 13:01:03 +01:00
Neil Alexander ca3fa58388
Various roominfo tweaks (#2607) 2022-08-02 12:27:15 +01:00
Neil Alexander 119cde3766
De-race types.RoomInfo (#2600) 2022-08-01 15:29:19 +01:00
Neil Alexander 05c83923e3
Optimise checking other servers allowed to see events (#2596)
* Try optimising checking if server is allowed to see event

* Fix error

* Handle case where snapshot NID is 0

* Fix query

* Update SQL

* Clean up `CheckServerAllowedToSeeEvent`

* Not supported on SQLite

* Maybe placate the unit tests

* Review comments
2022-08-01 14:11:00 +01:00
Till 081f5e7226
Update database migrations, remove goose (#2264)
* Add new db migration

* Update migrations
Remove goose

* Add possibility to test direct upgrades

* Try to fix WASM test

* Add checks for specific migrations

* Remove AddMigration
Use WithTransaction
Add Dendrite version to table

* Fix linter issues

* Update tests

* Update comments, outdent if

* Namespace migrations

* Add direct upgrade tests, skipping over one version

* Split migrations

* Update go version in CI

* Fix copy&paste mistake

* Use contexts in migrations

Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-25 10:39:22 +01:00
Neil Alexander c7d978274d
Try to fix HTTP 500s on /members (#2581) 2022-07-22 19:43:48 +01:00
Neil Alexander f0c8a03649
Membership updater refactoring (#2541)
* Membership updater refactoring

* Pass in membership state

* Use membership check rather than referring to state directly

* Delete irrelevant membership states

* We don't need the leave event after all

* Tweaks

* Put a log entry in that I might stand a chance of finding

* Be less panicky

* Tweak invite handling

* Don't freak if we can't find the event NID

* Use event NID from `types.Event`

* Clean up

* Better invite handling

* Placate the almighty linter

* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand

* Fix the sytest after all (thanks @S7evinK for the spot)
2022-07-22 14:44:04 +01:00
Till 9507966ebd
Fix issue with membership event_nid being 0 (#2580) 2022-07-20 12:39:06 +02:00
Neil Alexander 5c01306bb5
Add event state key cache (#2576) 2022-07-19 12:15:48 +01:00
Neil Alexander a1f9b02edf
Pointerise types.RoomInfo in the cache so we can update it in-place in the latest events updater 2022-07-13 10:13:34 +01:00
Neil Alexander 3ea21273bc
Ristretto cache (#2563)
* Try Ristretto cache

* Tweak

* It's beautiful

* Update GMSL

* More strict keyable interface

* Fix that some more

* Make less panicky

* Don't enforce mutability checks for now

* Determine mutability using deep equality

* Tweaks

* Namespace keys

* Make federation caches mutable

* Update cost estimation, add metric

* Update GMSL

* Estimate cost for metrics better

* Reduce counters a bit

* Try caching events

* Some guards

* Try again

* Try this

* Use separate caches for hopefully better hash distribution

* Fix bug with admitting events into cache

* Try to fix bugs

* Check nil

* Try that again

* Preserve order jeezo this is messy

* thanks VS Code for doing exactly the wrong thing

* Try this again

* Be more specific

* aaaaargh

* One more time

* That might be better

* Stronger sorting

* Cache expiries, async publishing of EDUs

* Put it back

* Use a shared cache again

* Cost estimation fixes

* Update ristretto

* Reduce counters a bit

* Clean up a bit

* Update GMSL

* 1GB

* Configurable cache sizees

* Tweaks

* Add `config.DataUnit` for specifying friendly cache sizes

* Various tweaks

* Update GMSL

* Add back some lazy loading caching

* Include key in cost

* Include key in cost

* Tweak max age handling, config key name

* Only register prometheus metrics if requested

* Review comments @S7evinK

* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)

* Review comments

* Update sample configs

* Update GHA Workflow

* Update Complement images to Go 1.18

* Remove the cache test from the federation API as we no longer guarantee immediate cache admission

* Don't check the caches in the renewal test

* Possibly fix the upgrade tests

* Update to matrix-org/gomatrixserverlib#322

* Update documentation to refer to Go 1.18
2022-07-11 14:31:31 +01:00
Till f3e8a9a4cb
Fix nil pointer access when redacting events (#2560) 2022-07-07 11:40:53 +02:00
Neil Alexander c0f824d437
Wrap error from SnapshotNIDFromEventID 2022-07-05 15:06:10 +01:00
Neil Alexander d4341a2d97
Return clearer error when no state NID exists for an event (#2555) 2022-07-05 15:01:34 +01:00
Till 5087b36af0
Fix QuerySharedUsers for the SyncAPI keychange consumer (#2554)
* Make more use of base.BaseDendrite

* Fix QuerySharedUsers if no UserIDs are supplied
2022-07-05 14:50:56 +02:00
Neil Alexander b50a24c666
Roomserver producers package (#2546)
* Give the roomserver a producers package

* Change init point

* Populate ACLs API

* Fix build issues

* `RoomEventProducer` naming
2022-07-01 10:54:07 +01:00
Till 89cd0e8fc1
Try to fix backfilling (#2548)
* Try to fix backfilling

* Return start/end to not confuse clients

* Update GMSL

* Update GMSL
2022-07-01 11:49:26 +02:00
Neil Alexander 519bc1124b
Add evacuateUser endpoint, use it when deactivating accounts (#2545)
* Add `evacuateUser` endpoint, use it when deactivating accounts

* Populate the API

* Clean up user devices when deactivating

* Include invites, delete pushers
2022-06-29 15:29:39 +01:00
Neil Alexander 2dea466685
Return an error if trying to invite a malformed user ID (#2543) 2022-06-29 12:32:24 +01:00
Neil Alexander 4c2a10f1a6
Handle state before, send history visibility in output (#2532)
* Check state before event

* Tweaks

* Refactor a bit, include in output events

* Don't waste time if soft failed either

* Tweak control flow, comments, use GMSL history visibility type
2022-06-13 15:11:10 +01:00
Till 660f7839f5
Correctly redact events over federation (#2526)
* Ensure we check powerlevel/origin before redacting an event

* Add passing test

* Use pl.UserLevel

* Make check more readable, also check for the sender
2022-06-09 18:38:07 +02:00
Neil Alexander 27948fb304
Optimise loadAuthEvents, add roomserver tracing 2022-06-07 14:23:26 +01:00
Neil Alexander 0d7020fbaf
Send tombstone to other servers when upgrading rooms 2022-06-06 17:27:50 +01:00
Neil Alexander 2cb609c428
Room upgrade tweaks
Squashed commit of the following:

commit 7a1568c716866594af6d0b1d561c58c96de29b20
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 6 15:17:49 2022 +0100

    Make errors more useful

commit 64befe7c9a901b00650442171660c2dc4ea575fa
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 6 15:02:40 2022 +0100

    Tweak ordering a bit
2022-06-06 15:18:02 +01:00
Neil Alexander 02597f15f0
Fix panic in QueryRestrictedJoinAllowed 2022-06-06 08:56:06 +01:00
Neil Alexander 02e5c74101
Revert #2457
Squashed commit of the following:

commit 2bd0daf4d61376d2dd56628eaff267b0bc63e116
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jun 1 09:55:54 2022 +0100

    Revert resolving old extremities as well as new

    This may no longer be needed with the new state fixes and probably just burns more CPU time than is strictly necessary.
2022-06-01 10:09:27 +01:00
Neil Alexander 3d9fe20748
Fix bugs related to state resolution (#2507)
* Fix bugs related to state resolution

* Clean up `resolve-state`

* Don't panic when entries can't be found

* Ensure we have state entries for the auth events

* Revert "Ensure we have state entries for the auth events"

This reverts commit 9b13b7ed37.

* Revert "Revert "Ensure we have state entries for the auth events""

This reverts commit d86db197e3.

* Fix bug

* Try that again

* Update gomatrixserverlib

* Remove recursion from `loadAuthEvents`
2022-06-01 09:46:21 +01:00
Neil Alexander b541f3043f
Add support for MSC3787 and org.matrix.msc3787 room version (update to matrix-org/gomatrixserverlib#310) 2022-05-26 15:08:17 +01:00
Neil Alexander 9eb4fec33b
Make logging output for state deletions a bit better 2022-05-26 10:38:46 +01:00
Neil Alexander 6940c7c7dd
Try to spot state deletions when they happen (#2489) 2022-05-25 16:40:31 +01:00
Neil Alexander 81843e8836
Restricted join support on /make_join, /send_join (#2478)
* Add `QueryRestrictedJoinAllowed`

* Add `Resident` flag to `QueryRestrictedJoinAllowedResponse`

* Check restricted joins on federation API

* Return `Restricted` to determine if the room was restricted or not

* Populate `AuthorisedVia` properly

* Sign the event on `/send_join`, return it in the `/send_join` response in the `"event"` key

* Kick back joins with invalid authorising user IDs, use event from `"event"` key if returned in `RespSendJoin`

* Use invite helper in `QueryRestrictedJoinAllowed`

* Only use users with the power to invite, change error bubbling a bit

* Placate the almighty linter

One day I will nuke `gocyclo` from orbit and everything in the world will be much better for it.

* Review comments
2022-05-25 10:05:30 +01:00
kegsay 6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00