Discovered while running
https://gitlab.futo.org/load-testing/matrix-goose.
Dendrite locks up and runs into `context cancelled`, so the error is
neither `sql.ErrNoRows` nor handled by the `default` case (and we
definitely shouldn't report that the account exists in this case).
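For illustration, the distinction is roughly the following (a sketch with made-up helper and table names, not the actual Dendrite code):

```go
import (
	"context"
	"database/sql"
	"errors"
)

// Helper and table names are made up for illustration; this is not the
// actual Dendrite code.
func accountExists(ctx context.Context, db *sql.DB, localpart string) (bool, error) {
	var id int64
	err := db.QueryRowContext(ctx,
		"SELECT id FROM accounts WHERE localpart = $1", localpart).Scan(&id)
	switch {
	case err == nil:
		return true, nil // the account really does exist
	case errors.Is(err, sql.ErrNoRows):
		return false, nil // the account does not exist
	default:
		// e.g. context.Canceled: a genuine error which must not be
		// reported as "the account exists".
		return false, err
	}
}
```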
Fixes #2504
A few issues with the previous iteration:
- We never returned `inaccessible_children`, which (if I read the code
correctly) made Synapse raise an error and thus not return the
requested rooms
- For restricted rooms, we didn't return the list of allowed rooms
Based on #3340
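For reference, the response shape in question looks roughly like this (field names per MSC2946; the Go structs themselves are purely illustrative):

```go
// HierarchyResponse sketches the relevant federation response fields.
type HierarchyResponse struct {
	Room     RoomEntry   `json:"room"`
	Children []RoomEntry `json:"children"`
	// Per the description above, Synapse errors out when this field
	// is missing, so it should be present even when empty.
	InaccessibleChildren []string `json:"inaccessible_children"`
}

// RoomEntry sketches a room summary entry.
type RoomEntry struct {
	RoomID string `json:"room_id"`
	// For restricted rooms, the rooms whose members are allowed in.
	AllowedRoomIDs []string `json:"allowed_room_ids,omitempty"`
}
```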
This adds a `/_synapse/admin/v1/event_reports` endpoint, the same one
Synapse has. This way, existing tools also work with Dendrite.
Given this is already getting huge (even though many of the lines are
tests), I'm splitting this into two PRs. (The next one adds "getting one
report" and "deleting reports".)
[skip ci]
Part of #3216 and #3226
There will be a follow-up PR which is going to add the same admin
endpoints Synapse has, so existing tools also work for Dendrite.
This should now actually speed up startup times.
This is because _many_ rooms (like DMs) don't have room ACLs, which
means around 95% of our DB queries were pointless. (as measured on d.m.org)
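The idea, sketched with made-up table/column names (not the actual schema):

```go
import (
	"context"
	"database/sql"
)

// Table and column names are made up: the point is to fetch only the
// rooms that actually have an m.room.server_acl event, in one query,
// instead of issuing a query per room.
func loadACLs(ctx context.Context, db *sql.DB) (map[string]string, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT room_id, content FROM current_state
		WHERE type = 'm.room.server_acl' AND state_key = ''`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	acls := map[string]string{}
	for rows.Next() {
		var roomID, content string
		if err := rows.Scan(&roomID, &content); err != nil {
			return nil, err
		}
		acls[roomID] = content
	}
	return acls, rows.Err()
}
```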
Since #3334 didn't change much on d.m.org, this is another attempt to
speed up startup.
Given moderation bots like Mjolnir/Draupnir are in many rooms, quite
often with the same or similar ACLs, caching the compiled regexes
_should_ reduce the startup time.
Storing a pointer to the compiled `regexp.Regexp` ensures we only keep
_one_ instance of each regex in memory, instead of potentially storing
it hundreds of times. This should drastically reduce memory consumption
on servers with many ACL'd rooms. (5.1MB vs 1.7MB with this change on my
server with 8 ACL'd rooms [3 using the same ACLs])
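A minimal sketch of the caching approach (not the actual Dendrite code):

```go
import (
	"regexp"
	"sync"
)

// Compile each distinct ACL pattern once and hand out the same
// *regexp.Regexp to every room that uses it.
var (
	aclRegexCacheMu sync.Mutex
	aclRegexCache   = map[string]*regexp.Regexp{}
)

func compileACLRegex(pattern string) (*regexp.Regexp, error) {
	aclRegexCacheMu.Lock()
	defer aclRegexCacheMu.Unlock()
	if re, ok := aclRegexCache[pattern]; ok {
		return re, nil // reuse the already compiled regex
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	aclRegexCache[pattern] = re
	return re, nil
}
```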
[skip ci]
This hopefully reduces the garbage we currently produce.
(Using [GlitchTip](https://glitchtip.com/) on my personal instance,
things seem to look better.)
Fixes https://github.com/matrix-org/dendrite/issues/3240 and potentially
a root cause for state resets.
While testing, I had added some more debug logging:
```
time="2023-12-16T18:13:11.319458084Z" level=warning msg="already processed event" event_id="$qFYMl_F2vb1N0yxmvlFAMhqhGhLKq4kA-o_YCQKH7tQ" kind=KindNew times=2
time="2023-12-16T18:13:14.537389126Z" level=warning msg="already processed event" event_id="$EU-LTsKErT6Mt1k12-p_3xOHfiLaK6gtwVDlZ35lSuo" kind=KindNew times=5
time="2023-12-16T18:13:16.789551206Z" level=warning msg="already processed event" event_id="$dIPuAfTL5x0VyG873LKPslQeljCSxFT1WKxUtjIMUGE" kind=KindNew times=5
time="2023-12-16T18:13:17.383838767Z" level=warning msg="already processed event" event_id="$7noSZiCkzerpkz_UBO3iatpRnaOiPx-3IXc0GPDQVGE" kind=KindNew times=2
time="2023-12-16T18:13:22.091946597Z" level=warning msg="already processed event" event_id="$3Lvo3Wbi2ol9-nNbQ93N-E2MuGQCJZo5397KkFH-W6E" kind=KindNew times=1
time="2023-12-16T18:13:23.026417446Z" level=warning msg="already processed event" event_id="$lj1xS46zsLBCChhKOLJEG-bu7z-_pq9i_Y2DUIjzGy4" kind=KindNew times=4
```
So we did receive the same event over and over again. Given they are
`KindNew`, we don't short-circuit if we have already processed them,
which potentially caused the state to be calculated against a by-now
wrong state snapshot.
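Sketched with illustrative names (not the actual roomserver code), the short-circuit being added looks roughly like this:

```go
// Skip events we have already processed instead of recalculating state
// against a snapshot that has since moved on.
var processed = make(map[string]struct{})

func shouldProcess(eventID string) bool {
	if _, ok := processed[eventID]; ok {
		return false // already handled, short-circuit
	}
	processed[eventID] = struct{}{}
	return true
}
```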
Also fixes the back pressure metric: we now correctly increment the
counter once we have sent the message to NATS, and decrement it once we
have actually processed an event.
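Roughly, the corrected accounting (metric and helper names are illustrative, not the actual code):

```go
import "github.com/prometheus/client_golang/prometheus"

var backPressure = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: "dendrite",
	Name:      "backpressure_events",
	Help:      "Events published to NATS but not yet processed.",
})

func publish(msg []byte) error {
	if err := publishToNATS(msg); err != nil {
		return err
	}
	backPressure.Inc() // count only after the send actually succeeded
	return nil
}

func handle(msg []byte) {
	processEvent(msg)
	backPressure.Dec() // drop the counter once processing has finished
}

// Stubs standing in for the real NATS publish and event processing.
func publishToNATS([]byte) error { return nil }
func processEvent([]byte)        {}
```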
This should fix #3004 by making sure we also update our in-memory ACLs
after joining a new room.
Also makes use of more caching in `GetStateEvent`.
Bonus: Adds some tests, as I was about to use `GetBulkStateContent`, but
it turns out that `GetStateEvent` does basically the same thing, except
it only resolves the `eventTypeNID`/`eventStateKeyNID` once instead of
on every call.
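To illustrate why that matters in a loop (all names here are made up, not Dendrite's storage API):

```go
// The numeric IDs are resolved once, then reused for every room.
type StateEvent struct{ RoomID string }

func lookupEventTypeNID(eventType string) int64    { return 0 } // stub
func lookupEventStateKeyNID(stateKey string) int64 { return 0 } // stub
func fetchStateEvent(roomID string, typeNID, stateKeyNID int64) StateEvent {
	return StateEvent{RoomID: roomID} // stub
}

func getStateEvents(roomIDs []string, eventType, stateKey string) []StateEvent {
	typeNID := lookupEventTypeNID(eventType)        // resolved once
	stateKeyNID := lookupEventStateKeyNID(stateKey) // resolved once
	events := make([]StateEvent, 0, len(roomIDs))
	for _, roomID := range roomIDs {
		events = append(events, fetchStateEvent(roomID, typeNID, stateKeyNID))
	}
	return events
}
```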
Use `IsBlacklistedOrBackingOff` from the federation API to check if we
should fetch devices.
To reduce back pressure, we now only queue retrying servers if there's
space in the channel (see the sketch below).
This makes the following changes:
- Adds two new metrics observing the usage of the `DeviceListUpdater`
workers
- Makes the number of workers configurable
- Adds a 30s timeout for DB requests when receiving a device list update
over federation
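The non-blocking enqueue mentioned above could look roughly like this (names are illustrative, not the actual `DeviceListUpdater` code):

```go
func queueRetry(retryChan chan<- string, serverName string) {
	select {
	case retryChan <- serverName:
		// Queued for a retry.
	default:
		// The channel is full: skip queueing rather than block, so we
		// don't build up back pressure; the server will be picked up
		// again on a later device list update.
	}
}
```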
Fixes include:
- Translating state keys that contain user IDs to their respective room
keys for both querying and sending state events
- **NOTE**: there may be design discussion needed on what should happen
when sender keys cannot be found for users
- A simple fix for kicking guests from rooms properly
- Logic for boundary history visibilities was slightly off (I'm
surprised this only manifested in pseudo ID room versions)
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
This PR adds a config key `room_server.default_room_version` to set the
default room version for the room server.
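In `dendrite.yaml` this would look roughly like the following (the version value is just an example):

```yaml
room_server:
  # The room version used when creating rooms without an explicit version.
  default_room_version: "10"
```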
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
There are cases where a Dendrite instance is unaware of a pseudo ID for
a user, e.g. when the user is not a member of that room. To represent
this case, we currently use the 'zero' value, which is often not checked
and so causes errors later down the line. To make this case more
explicit, and to be consistent with `QueryUserIDForSender`, this PR
changes this to use a pointer (and `nil` to mean no sender ID).
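A sketch of the shape of the change (types and names simplified for illustration):

```go
type SenderID string

// nil explicitly means "no sender ID known for this user in this room",
// mirroring how QueryUserIDForSender reports a missing user ID.
func querySenderIDForUser(roomID, userID string) (*SenderID, error) {
	// ... look up the pseudo ID; return nil if the user has none,
	// e.g. because they are not a member of the room.
	return nil, nil
}

func example(roomID, userID string) error {
	senderID, err := querySenderIDForUser(roomID, userID)
	if err != nil {
		return err
	}
	if senderID == nil {
		// Previously this was the ambiguous zero value ""; now the
		// "no sender ID" case cannot be missed by accident.
		return nil
	}
	_ = *senderID // safe to dereference after the nil check
	return nil
}
```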
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`