Commit graph

512 commits

Author SHA1 Message Date
Till Faelligen 56c979d117
Fix nil pointer dereference for userRoomKeysStatements 2024-05-22 20:38:12 +02:00
Till b732eede27
Fix spaces over federation (#3347)
Fixes #2504

 A few issues with the previous iteration:
- We never returned `inaccessible_children`, which (if I read the code
correctly), made Synapse raise an error and thus not returning the
requested rooms
- For restricted rooms, we didn't return the list of allowed rooms
2024-03-28 20:40:45 +01:00
Till ad0a7d09e8
Add getting/deleting single event report (#3344)
Based on https://github.com/matrix-org/dendrite/pull/3342

Adds `GET /_synapse/admin/v1/event_reports/{reportID}` and `DELETE
/_synapse/admin/v1/event_reports/{reportID}`
2024-03-22 21:54:29 +00:00
Till 79072c3dcd
Add /_synapse/admin/v1/event_reports endpoint (#3342)
Based on #3340 

This adds a `/_synapse/admin/v1/event_reports` endpoint, the same
Synapse has. This way existing tools also work with Dendrite.
Given this is already getting huge (even though many test lines),
splitting this into two PRs. (The next adds "getting one report" and
"deleting reports")

[skip ci]
2024-03-22 22:32:30 +01:00
Till b9abbf7b20
Add event reporting (#3340)
Part of #3216 and #3226 

There will be a follow up PR which is going to add the same admin
endpoints Synapse has, so existing tools also work for Dendrite.
2024-03-21 19:27:34 +01:00
Till 928c8c8c4a
Query rooms with ACLs instead of all rooms (#3338)
This now should actually speed up startup times.
This is because _many_ rooms (like DMs) don't have room ACLs, this means
that we had around 95% pointless DB queries. (as queried on d.m.org)
2024-03-05 20:41:35 +01:00
Till 4ccf6d6f67
Cache ACLs regexes (#3336)
Since #3334 didn't change much on d.m.org, this is another attempt to
speed up startup.

Given moderation bots like Mjolnir/Draupnir are in many rooms with quite
often the same or similar ACLs, caching the compiled regexes _should_
reduce the startup time.

Using a pointer to the `*regexp.Regex` ensures we only store _one_
instance of a regex in memory, instead of potentially storing it hundred
of times. This should reduce memory consumption on servers with many
rooms with ACLs drastically. (5.1MB vs 1.7MB with this change on my
server with 8 ACL'd rooms [3 using the same ACLs])

[skip ci]
2024-02-28 20:58:56 +01:00
Till f4e77453cb
Speed up start up time by batch querying ACL events (#3334)
This should significantly speed up start up times on servers with many
rooms.
2024-02-21 14:10:22 +01:00
Till e9deb5244e
Fix /createRoom and /invite containing displayname/avatarURL of inviter (#3326)
Fixes #3324
2024-02-13 19:28:52 +01:00
Till d58daf9665
Update sentry reporting (#3305)
This hopefully reduces the garbage we currently produce.
(Using [GlitchTip](https://glitchtip.com/) on my personal instance, this
seems to look better)
2024-01-24 19:24:04 +01:00
Till 8e4dc6b4ae
Optimize PrevEventIDs when getting thousands of backwards extremeties (#3308)
Changes how many `PrevEventIDs` we send to other servers when
backfilling, capped to 100 events.

Unsure about how representative this benchmark is..
```
goos: linux
goarch: amd64
pkg: github.com/matrix-org/dendrite/roomserver/api
cpu: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
                            │    old.txt     │               new.txt               │
                            │     sec/op     │   sec/op     vs base                │
PrevEventIDs/Original1-8         264.9n ± 5%   237.4n ± 7%  -10.36% (p=0.000 n=10)
PrevEventIDs/Original10-8        3.101µ ± 4%   1.590µ ± 2%  -48.72% (p=0.000 n=10)
PrevEventIDs/Original100-8       44.32µ ± 2%   12.80µ ± 4%  -71.11% (p=0.000 n=10)
PrevEventIDs/Original500-8     263.835µ ± 4%   7.907µ ± 4%  -97.00% (p=0.000 n=10)
PrevEventIDs/Original1000-8    578.798µ ± 2%   7.620µ ± 2%  -98.68% (p=0.000 n=10)
PrevEventIDs/Original2000-8   1272.039µ ± 2%   8.241µ ± 9%  -99.35% (p=0.000 n=10)
geomean                          43.81µ        3.659µ       -91.65%

                            │    old.txt     │               new.txt                │
                            │      B/op      │     B/op      vs base                │
PrevEventIDs/Original1-8          72.00 ± 0%     48.00 ± 0%  -33.33% (p=0.000 n=10)
PrevEventIDs/Original10-8        1512.0 ± 0%     500.0 ± 0%  -66.93% (p=0.000 n=10)
PrevEventIDs/Original100-8     11.977Ki ± 0%   7.023Ki ± 0%  -41.36% (p=0.000 n=10)
PrevEventIDs/Original500-8     67.227Ki ± 0%   7.023Ki ± 0%  -89.55% (p=0.000 n=10)
PrevEventIDs/Original1000-8   163.227Ki ± 0%   7.023Ki ± 0%  -95.70% (p=0.000 n=10)
PrevEventIDs/Original2000-8   347.227Ki ± 0%   7.023Ki ± 0%  -97.98% (p=0.000 n=10)
geomean                         12.96Ki        1.954Ki       -84.92%

                            │   old.txt   │              new.txt               │
                            │  allocs/op  │ allocs/op   vs base                │
PrevEventIDs/Original1-8       2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.000 n=10)
PrevEventIDs/Original10-8      6.000 ± 0%   2.000 ± 0%  -66.67% (p=0.000 n=10)
PrevEventIDs/Original100-8     9.000 ± 0%   3.000 ± 0%  -66.67% (p=0.000 n=10)
PrevEventIDs/Original500-8    12.000 ± 0%   3.000 ± 0%  -75.00% (p=0.000 n=10)
PrevEventIDs/Original1000-8   14.000 ± 0%   3.000 ± 0%  -78.57% (p=0.000 n=10)
PrevEventIDs/Original2000-8   16.000 ± 0%   3.000 ± 0%  -81.25% (p=0.000 n=10)
geomean                        8.137        2.335       -71.31%
```
2024-01-20 22:26:57 +01:00
Till edd02ec468
Fix panic if unable to assign a state key NID (#3294) 2023-12-30 18:34:36 +01:00
Till f93d1c4790
Use AckExplicitPolicy instead of AckAllPolicy (#3288)
Fixes https://github.com/matrix-org/dendrite/issues/3240 and potentially
a root cause for state resets.

While testing, I've had added some more debug logging:
```
time="2023-12-16T18:13:11.319458084Z" level=warning msg="already processed event" event_id="$qFYMl_F2vb1N0yxmvlFAMhqhGhLKq4kA-o_YCQKH7tQ" kind=KindNew times=2
time="2023-12-16T18:13:14.537389126Z" level=warning msg="already processed event" event_id="$EU-LTsKErT6Mt1k12-p_3xOHfiLaK6gtwVDlZ35lSuo" kind=KindNew times=5
time="2023-12-16T18:13:16.789551206Z" level=warning msg="already processed event" event_id="$dIPuAfTL5x0VyG873LKPslQeljCSxFT1WKxUtjIMUGE" kind=KindNew times=5
time="2023-12-16T18:13:17.383838767Z" level=warning msg="already processed event" event_id="$7noSZiCkzerpkz_UBO3iatpRnaOiPx-3IXc0GPDQVGE" kind=KindNew times=2
time="2023-12-16T18:13:22.091946597Z" level=warning msg="already processed event" event_id="$3Lvo3Wbi2ol9-nNbQ93N-E2MuGQCJZo5397KkFH-W6E" kind=KindNew times=1
time="2023-12-16T18:13:23.026417446Z" level=warning msg="already processed event" event_id="$lj1xS46zsLBCChhKOLJEG-bu7z-_pq9i_Y2DUIjzGy4" kind=KindNew times=4
```

So we did receive the same event over and over again. Given they are
`KindNew`, we don't short circuit if we already processed them, which
potentially caused the state to be calculated with a now wrong state
snapshot.

Also fixes the back pressure metric. We now correctly increment the
counter once we sent the message to NATS and decrement it once we
actually processed an event.
2023-12-19 08:25:47 +01:00
Till b8f91485b4
Update ACLs when received as outliers (#3008)
This should fix #3004 by making sure we also update our in-memory ACLs
after joining a new room.
Also makes use of more caching in `GetStateEvent`

Bonus: Adds some tests, as I was about to use `GetBulkStateContent`, but
turns out that `GetStateEvent` is basically doing the same, just that it
only gets the `eventTypeNID`/`eventStateKeyNID` once and not for every
call.
2023-11-22 15:38:04 +01:00
Till 7863a405a5
Use IsBlacklistedOrBackingOff to determine if we should try to fetch devices (#3254)
Use `IsBlacklistedOrBackingOff` from the federation API to check if we
should fetch devices.

To reduce back pressure, we now only queue retrying servers if there's
space in the channel.
2023-11-09 08:43:27 +01:00
Till 699f5ca8c1
More rows.Close() and rows.Err() (#3262)
Looks like we missed some `rows.Close()`

Even though `rows.Err()` is mostly not necessary, we should be more
consistent in the DB layer.

[skip ci]
2023-11-09 08:42:33 +01:00
Till 5f872f4a82
Fix panic in QueryNextRoomHierarchyPage (#3253)
Sentry reported the following panic:
```
time="2023-11-01T01:33:56.220583478Z" level=error msg="Request panicked!
goroutine 43763845 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x5e
github.com/matrix-org/dendrite/internal/httputil.MakeExternalAPI.MakeJSONAPI.Protect.func3.1()
	github.com/matrix-org/util@v0.0.0-20221111132719-399730281e66/json.go:98 +0x13e
panic({0x15b5540?, 0x2453560?})
	runtime/panic.go:914 +0x21f
github.com/matrix-org/dendrite/internal/httputil.MakeAuthAPI.func1.1()
	github.com/matrix-org/dendrite/internal/httputil/httpapi.go:91 +0x4a
panic({0x15b5540?, 0x2453560?})
	runtime/panic.go:914 +0x21f
github.com/matrix-org/dendrite/roomserver/internal/query.(*Queryer).QueryNextRoomHierarchyPage(0x413185?, {0x1a576e0, 0xc0436705a0}, {{{0xc01e5fd260, 0x1f}, {0xc01e5fd261, 0x12}, {0xc01e5fd274, 0xb}}, {0xc145cb5200, ...}, ...}, ...)
	github.com/matrix-org/dendrite/roomserver/internal/query/query_room_hierarchy.go:116 +0xbfe
github.com/matrix-org/dendrite/clientapi/routing.QueryRoomHierarchy(0xc0be13b200, 0xc144e65dd0, {0xc01e5fd260?, 0x6?}, {0x7faf140639c8, 0xc00059af20}, 0xc08adca000?)
	github.com/matrix-org/dendrite/clientapi/routing/room_hierarchy.go:141 +0x68b
github.com/matrix-org/dendrite/clientapi/routing.Setup.func35(0xc03e7d5c20?, 0x17c3a57?)
	github.com/matrix-org/dendrite/clientapi/routing/routing.go:534 +0xbe
github.com/matrix-org/dendrite/internal/httputil.MakeAuthAPI.func1(0xc0bd097300)
	github.com/matrix-org/dendrite/internal/httputil/httpapi.go:108 +0x5ed
github.com/matrix-org/util.(*jsonRequestHandlerWrapper).OnIncomingRequest(0xc0bd097200?, 0xc13b7d6fc0?)
	github.com/matrix-org/util@v0.0.0-20221111132719-399730281e66/json.go:79 +0x19
github.com/matrix-org/dendrite/internal/httputil.MakeExternalAPI.MakeJSONAPI.func2({0x1a54880, 0xc138f28b60}, 0xc0bd097200?)
	github.com/matrix-org/util@v0.0.0-20221111132719-399730281e66/json.go:141 +0xaa
github.com/matrix-org/dendrite/internal/httputil.MakeExternalAPI.MakeJSONAPI.Protect.func3({0x1a54880?, 0xc138f28b60?}, 0x17c01d9?)
	github.com/matrix-org/util@v0.0.0-20221111132719-399730281e66/json.go:103 +0x63
net/http.HandlerFunc.ServeHTTP(...)
	net/http/server.go:2136
github.com/matrix-org/dendrite/internal/httputil.MakeExternalAPI.func1({0x1a54880?, 0xc138f28b60?}, 0xc0bd097100)
	github.com/matrix-org/dendrite/internal/httputil/httpapi.go:191 +0x411
net/http.HandlerFunc.ServeHTTP(0xc0bd097000?, {0x1a54880?, 0xc138f28b60?}, 0xbe1348905308878e?)
	net/http/server.go:2136 +0x29
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000000000, {0x1a54880, 0xc138f28b60}, 0xc0bd096f00)
	github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1c5
github.com/matrix-org/dendrite/setup/base.SetupAndServeHTTP.(*Handler).Handle.(*Handler).handle.func5({0x1a54880, 0xc138f28b60}, 0xc0bd096e00)
	github.com/getsentry/sentry-go@v0.14.0/http/sentryhttp.go:103 +0x298
net/http.HandlerFunc.ServeHTTP(0xc0bd096a00?, {0x1a54880?, 0xc138f28b60?}, 0x7fae6812f5d0?)
	net/http/server.go:2136 +0x29
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000000a80, {0x1a54880, 0xc138f28b60}, 0xc0bd096900)
	github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1c5
net/http.serverHandler.ServeHTTP({0xc02884c4e0?}, {0x1a54880?, 0xc138f28b60?}, 0x6?)
	net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc1926922d0, {0x1a576e0, 0xc024a6ec90})
	net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 16979
	net/http/server.go:3086 +0x5cb
" context=missing panic="runtime error: invalid memory address or nil pointer dereference"
```

[skip ci]
2023-11-08 14:22:02 +01:00
Till da7bca0224
Some tweaks for the device list updater (#3251)
This makes the following changes:
- Adds two new metrics observing the usage of the `DeviceListUpdater`
workers
- Makes the number of workers configurable
- Adds a 30s timeout for DB requests when receiving a device list update
over federation
2023-10-31 16:39:45 +01:00
Till 4fa8512d57
Check event is not rejected (#3243)
Companion PR to https://github.com/matrix-org/gomatrixserverlib/pull/421
2023-10-25 09:47:21 +02:00
Till 8b3adaf244
Fix state resets (#3231)
Needs https://github.com/matrix-org/gomatrixserverlib/pull/419

May fix: https://github.com/matrix-org/dendrite/issues/2508,
https://github.com/matrix-org/dendrite/issues/1760
2023-10-23 15:17:21 +02:00
Till f02d998253
Remove the creator field when upgrading to v11 (#3210)
Minor oversight
2023-09-28 07:36:57 +02:00
Till 05a8f1ede3
Support for room version v11 (#3204)
Fixes #3203
2023-09-27 08:27:08 +02:00
devonh 16d922de70
Complement fixes for pseudoIDs (#3206) 2023-09-26 17:44:49 +00:00
devonh 8245b24100
Update gmsl to use new validated RoomID on PDUs (#3200)
GMSL returns a `spec.RoomID` when calling `PDU.RoomID()`
2023-09-15 14:39:06 +00:00
Sam Wedgwood 9b5be6b9c5
[pseudoIDs] More pseudo ID fixes - Part 2 (#3181)
Fixes include:
- Translating state keys that contain user IDs to their respective room
keys for both querying and sending state events
- **NOTE**: there may be design discussion needed on what should happen
when sender keys cannot be found for users
- A simple fix for kicking guests from rooms properly
- Logic for boundary history visibilities was slightly off (I'm
surprised this only manifested in pseudo ID room versions)

Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
2023-08-24 16:43:51 +01:00
Sam Wedgwood 9a12420428
[pseudoID] More pseudo ID fixes (#3167)
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
2023-08-15 12:37:04 +01:00
Sam Wedgwood 35804f8493
Add config key for default room version (#3171)
This PR adds a config key `room_server.default_config_key` to set the
default room version for the room server.

Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
2023-08-08 14:20:05 +01:00
Sam Wedgwood c7193e24d0
Use *spec.SenderID for QuerySenderIDForUser (#3164)
There are cases where a dendrite instance is unaware of a pseudo ID for
a user, the user is not a member of that room. To represent this case,
we currently use the 'zero' value, which is often not checked and so
causes errors later down the line. To make this case more explict, and
to be consistent with `QueryUserIDForSender`, this PR changes this to
use a pointer (and `nil` to mean no sender ID).

Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
2023-08-02 11:12:14 +01:00
Sam Wedgwood af13fa1c75
[pseudoIDs] Fixes for room alias tests (#3159)
Some (deceptively) simple fixes for some bugs that caused room alias
tests to fail (sytext `tests/30rooms/05aliases.pl`). Each commit has
details about what it fixes.

Sytest results:

- Sytest before (79d4a0e):
https://gist.github.com/swedgwood/972ac4ef93edd130d3db0930703d6c82
- Sytest after (4b09bed):
https://gist.github.com/swedgwood/504b00ac4ee892acb757b7fac55fa28a

Room aliases go from `8/15` to `15/15`, but looks like these fixes also
managed to fix about `4` other tests, which is a nice bonus :)

Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
2023-07-31 14:39:41 +01:00
Till Faelligen 79d4a0e399
Restore old behaviour of PurgeRoom 2023-07-26 09:09:04 +02:00
devonh c809e95335
Fix event federation with pseudoID rooms (#3156) 2023-07-21 16:08:40 +00:00
Sam Wedgwood 9582827493
de-MSC-ifying space summaries (MSC2946) (#3134)
- This PR moves and refactors the
[code](https://github.com/matrix-org/dendrite/blob/main/setup/mscs/msc2946/msc2946.go)
for
[MSC2946](https://github.com/matrix-org/matrix-spec-proposals/pull/2946)
('Space Summaries') to integrate it into the rest of the codebase.
- Means space summaries are no longer hidden behind an MSC flag
- Solves #3096

Signed-off-by: Sam Wedgwood <sam@wedgwood.dev>
2023-07-20 15:06:05 +01:00
Till 297479ea49
Use pointer when passing the connection manager around (#3152)
As otherwise existing connections aren't reused.
2023-07-19 13:37:04 +02:00
Till 5267cc0f54
Optimise getting local members and membership counts (#3150)
The previous version was getting **ALL** membership events (as
`ClientEvents`, so going through `NewEventFromTrustedJSONWithID`) for a
given room.
Now we are querying only locally joined users as `ClientEvents`, which
should **significantly** reduce allocations.

Take for example a large room with 2k membership events, but only 1
local user - avoiding 1999 `NewEventFromTrustedJSONWithID` calls just to
calculate the `roomSize` which we can also query by other means.

This is also getting called for every `OutputRoomEvent` in the userAPI.

Benchmark with 1 local user and 100 remote users.
```
pkg: github.com/matrix-org/dendrite/userapi/consumers
cpu: 12th Gen Intel(R) Core(TM) i5-12500H
                    │   old.txt   │               new.txt               │
                    │   sec/op    │   sec/op     vs base                │
LocalRoomMembers-16   375.9µ ± 7%   327.6µ ± 6%  -12.85% (p=0.000 n=10)

                    │    old.txt    │               new.txt                │
                    │     B/op      │     B/op      vs base                │
LocalRoomMembers-16   79.426Ki ± 0%   8.507Ki ± 0%  -89.29% (p=0.000 n=10)

                    │   old.txt   │              new.txt               │
                    │  allocs/op  │ allocs/op   vs base                │
LocalRoomMembers-16   1015.0 ± 0%   277.0 ± 0%  -72.71% (p=0.000 n=10)
```
2023-07-13 14:19:08 +02:00
Till eb9e90379d
Add event size checks similar to Synapse (#3140)
Companion to https://github.com/matrix-org/gomatrixserverlib/pull/400
This tries to mimic the logic found in Synapse, as dropping events can
break rooms (and we may end up in endless loops..)
2023-07-07 20:37:23 +02:00
devonh d507c5fc95
Add pseudoID compatibility to Invites (#3126) 2023-07-06 15:15:24 +00:00
Till Faelligen 5a87c703fa
Fix metrics.. 2023-07-05 12:34:53 +02:00
Till 4c3a526e1b
Fix adding state events to the database (#3133)
When we're adding state to the database, we check which eventNIDs are
already in a block, if we already have that eventNID, we remove it from
the list. In its current form we would skip over eventNIDs in the case
we already found a match (we're decrementing `i` twice)
My theory is, that when we later get the state blocks, we are receiving
"too many" eventNIDs (well, yea, we stored too many), which may or may
not can result in state resets when comparing different state snapshots.
(e.g. when adding state we stored a eventNID by accident because we
skipped it, later we add more state and are not adding it because we
don't skip it)
2023-07-04 17:15:44 +02:00
Till Faelligen 939ee325f8
Actually use the parameter 2023-06-29 18:02:11 +02:00
Till 23cd7877a1
Add MXIDMapping for pseudoID rooms (#3112)
Add `MXIDMapping` on membership events when
creating/joining rooms.
2023-06-28 20:29:49 +02:00
Till a734b112c6
Fix backfilling (#3117)
This should fix two issues with backfilling:
1. right after creating and joining a room over federation, we are doing
a `/backfill` request, which would return redacted events, because the
`authEvents` are empty. Even though the spec states that, in the absence
of a history visibility event, it should be handled as `shared`.
2. `gomatrixserverlib: unsupported room version ''` - because, well, we
were never setting the `roomInfo` field..
2023-06-20 16:52:29 +02:00
Devon Hudson 8cf6c381e2
Fix senderID/key conversion unit tests 2023-06-14 17:11:27 +01:00
Devon Hudson 3f4df25b31
Add missing dep 2023-06-14 17:04:19 +01:00
Devon Hudson 5aaa539e3e
Fix senderID/key conversions 2023-06-14 16:42:09 +01:00
devonh e4665979bf
Merge SenderID & Per Room User Key work (#3109) 2023-06-14 14:23:46 +00:00
Till 7a2e325d10
Add AssignRoomNID to pre-assign roomNIDs (#3111) 2023-06-13 16:28:41 +02:00
Till 2c87972a3a
Create user room key if needed (#3108) 2023-06-13 14:19:31 +02:00
devonh 77d9e4e93d
Cleanup remaining statekey usage for senderIDs (#3106) 2023-06-12 11:19:25 +00:00
Till 832ccc32f6
Add initial support for storing user room keys (#3098) 2023-06-12 12:45:42 +02:00
devonh 8ea1a11105
Use SenderID Type (#3105) 2023-06-07 17:14:35 +00:00