Commit graph

95 commits

Author SHA1 Message Date
Neil Alexander b86529f3f8
Cap timeout, move cross-signing section 2022-10-24 09:32:28 +01:00
Neil Alexander c762abe1ef
Log what's going on 2022-10-24 09:14:07 +01:00
devonh 9041491201
Mutex protect query keys response (#2812) 2022-10-20 15:54:18 +00:00
Neil Alexander 8cbe14bd6d
Fix lock contention 2022-10-19 12:27:34 +01:00
Neil Alexander c1463db6c9
Fix concurrent map write in key server 2022-10-19 12:03:12 +01:00
Till b9d0e9f7ed
Add test for QueryDeviceMessages (#2773)
Adds tests for `QueryDeviceMessages` and also includes some
optimizations to reduce allocations in the DB layer.
2022-10-07 10:54:42 +02:00
Till ec5d1d681d
Always return one_time_key_counts on /keys/upload (#2769)
The OTK count is
[required](https://spec.matrix.org/v1.4/client-server-api/#post_matrixclientv3keysupload)
in responses to `/keys/upload`, so return those.
2022-10-06 12:30:24 +02:00
Neil Alexander 6f602bb096
Demote Failed to query device keys for some users warning to level=debug
Many of these warnings are due to dead servers and are quite annoying when they fill up the logs.
2022-10-05 11:16:05 +01:00
Neil Alexander fec3ee384b
Stop CPU burn in PerformMarkAsStaleIfNeeded 2022-10-03 12:59:56 +01:00
Neil Alexander 8a82f10046
Allow more time for device list updates (#2749)
This updates the device list updater so that it has a context
per-request, rather than a global 30 seconds for the entire server. This
could mean that talking to a slow remote server or requesting a lot of
user IDs was pretty much guaranteed to fail.

It also uses the process context to allow correct cancellation when
Dendrite wants to shut down cleanly.
2022-09-30 09:41:16 +01:00
Till 9005e5b4a8
Add /_dendrite/admin/refreshDevices/{userID} (#2746)
Allows to immediately query `/devices/{userID}` over federation to
(hopefully) resolve E2EE issues.
2022-09-30 09:32:31 +01:00
Till e007b8038f
Mark device list as stale, if we don't have the requesting device (#2728)
This hopefully makes E2EE chats a little bit more reliable by re-syncing
devices if we don't have the `requesting_device_id` in our database. (As
seen in
[Synapse](c52abc1cfd/synapse/handlers/devicemessage.py (L157-L201)))
2022-09-20 11:32:03 +02:00
Till 42a82091a8
Fix issue with stale device lists (#2702)
We were only sending the last entry to the worker, so most likely missed
updates.
2022-09-08 12:03:44 +02:00
Till 0d697f6754
Add HTTP status code to FederationClientError (#2699)
Also ensures we wait on more HTTP status codes.
2022-09-07 16:14:09 +02:00
Till Faelligen 4e352390b6
Re-add waitTime if we're not blacklisted and no RetryAfter was
specified.
2022-09-07 12:13:02 +02:00
Till 440eb0f3a2
Handle errors differently in the DeviceListUpdater (#2695)
`If a device list update goes missing, the server resyncs on the next
one` was failing because a previous test would receive a `waitTime` of
1h, resulting in the test timing out.
This now tries to handle the returned errors differently, e.g. by using
the default `waitTime` of 2s. Also doesn't try further users in the
list, if one of the errors would cause a longer `waitTime`.
2022-09-07 11:44:27 +02:00
Neil Alexander 5513f182cc
Enforce device list backoffs (#2653)
This ensures that if the device list updater is already backing off a node, we don't try to call processServer again anyway for server just because the server name arrived in the channel. Otherwise we can keep trying to hit a remote server that is offline or not behaving every second and that spams the logs too.
2022-08-19 10:23:09 +01:00
Neil Alexander c45d0936b5
Generic-based internal HTTP API (#2626)
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)

* Add `PerformInvite`

* More tweaks

* Fix metric name

* Fix LookupStateIDs

* Lots of changes to clients

* Some serverside stuff

* Some error handling

* Use paths as metric names

* Revert "Use paths as metric names"

This reverts commit a9323a6a34.

* Namespace metric names

* Remove duplicate entry

* Remove another duplicate entry

* Tweak error handling

* Some more tweaks

* Update error behaviour

* Some more error tweaking

* Fix API path for `PerformDeleteKeys`

* Fix another path

* Tweak federation client proxying

* Fix another path

* Don't return typed nils

* Some more tweaks, not that it makes any difference

* Tweak federation client proxying

* Maybe fix the key backup test
2022-08-11 15:29:33 +01:00
Neil Alexander c8935fb53f
Do not use ioutil as it is deprecated (#2625) 2022-08-05 10:26:59 +01:00
Till 1b7f84250a
Fix linter issues (#2624)
* Try that again

* All hail the mighty linter?

* And once again

* goimport all the things
2022-08-05 11:12:41 +02:00
Till 9fe509b18d
Fix syncapi shared users query & device lists (#2614)
* Fix query issue, only add "changed" users if we actually share a room

* Avoid log spam if context is done

* Undo changes to filterSharedUsers

* Add logging again..

* Fix SQLite shared users query

* Change query to include invited users
2022-08-03 18:35:17 +02:00
Neil Alexander 7120eb6bc9
Add InputDeviceListUpdate to the keyserver, remove old input API (#2536)
* Add `InputDeviceListUpdate` to the keyserver, remove old input API

* Fix copyright

* Log more information when a device list update fails
2022-06-15 14:27:07 +01:00
Neil Alexander 70cd8c68c2
Reduce error levels on device list update 2022-06-01 09:49:46 +01:00
kegsay 6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
kegsay 85704eff20
Clean up interface definitions (#2427)
* tidy up interfaces

* remove unused GetCreatorIDForAlias

* Add RoomserverUserAPI interface

* Define more interfaces

* Use AppServiceInternalAPI for consistent naming

* clean up federationapi constructor a bit

* Fix monolith in -http mode
2022-05-06 12:39:26 +01:00
Neil Alexander 31799a3b2a
Device list display name fixes (#2405)
* Get device names from `unsigned` in `/user/devices`

* Fix display name updates

* Fix bug

* Fix another bug
2022-04-29 16:02:55 +01:00
Neil Alexander 2ff75b7c80
Ensure signature map exists (fixes #2393) (#2397) 2022-04-28 11:34:19 +01:00
Neil Alexander 5306c73b00
Fix bug when uploading device signatures (#2377)
* Find the complete key ID when uploading signatures

* Try that again

* Try splitting the right thing

* Don't do it for device keys

* Refactor `QuerySignatures`

* Revert "Refactor `QuerySignatures`"

This reverts commit c02832a3e9.

* Both requested key IDs and master/self/user keys

* Fix uniqueness

* Try tweaking GMSL

* Update GMSL again

* Revert "Update GMSL again"

This reverts commit bd6916cc37.

* Revert "Try tweaking GMSL"

This reverts commit 2a054524da.

* Database migrations
2022-04-26 13:08:54 +01:00
Neil Alexander aad81b7b4d
Only call key update process functions if there are updates, don't send things to ourselves over federation 2022-04-25 14:22:46 +01:00
Neil Alexander 6d78c4d67d
Fix retrieving cross-signing signatures in /user/devices/{userId} (#2368)
* Fix retrieving cross-signing signatures in `/user/devices/{userId}`

We need to know the target device IDs in order to get the signatures and we weren't populating those.

* Fix up signature retrieval

* Fix SQLite

* Always include the target's own signatures as well as the requesting user
2022-04-22 14:58:24 +01:00
Neil Alexander 9b316ac64c
Slower federation warm-up (#2320)
* Wake destination queues gradually, rather than all at once

* Delay device list updates too

* Maximum two minute warmup period
2022-04-04 15:14:10 +01:00
S7evinK 49dc49b232
Remove eduserver (#2306)
* Move receipt sending to own JetStream producer

* Move SendToDevice to producer

* Remove most parts of the EDU server

* Fix SendToDevice & copyrights

* Move structs, cleanup EDU Server traces

* Use HeadersOnly subscription

* Missing file

* Fix linter issues

* Move consumers to own files

* Rename durable consumer; Consumer cleanup

* Docs/config cleanup
2022-03-29 14:14:35 +02:00
Neil Alexander d983d17355
Fix lint errors 2022-03-24 10:03:22 +00:00
Neil Alexander e485f9c2bd
64-bit stream IDs for device list updates (#2267) 2022-03-10 13:17:28 +00:00
Kegan Dougal e46a61c49e Skip flakey test for now 2022-03-02 11:38:13 +00:00
Kegan Dougal a4c918ee17 Fix data race in unit tests 2022-03-02 10:49:36 +00:00
kegsay 23f028cf6e
Add unit test for device list update debouncing (#2220)
* Add unit test for device list update debouncing

* bugfix: actually return stale device lists in the test...

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-03-01 17:18:06 +00:00
Neil Alexander 58bf91a585
Check for changes in PerformUploadDeviceKeys (#2233)
* Don't generate key change notifs if nothing changed on cross-signing upload

* Check both directions of changes
2022-03-01 11:00:54 +00:00
S7evinK 41dc651b25
Send device update to local users if remote display name changes (#2215)
* Send device_list update to satisfy sytest

* Fix build issue from merged in change

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-02-22 16:34:53 +00:00
Neil Alexander c7811e9d71
Add DeviceKeysEqual (#2219)
* Add `DeviceKeysEqual`

* Update check order

* Fix check

* Tweak conditions again

* One more time

* Single return value
2022-02-22 15:43:17 +00:00
Neil Alexander 600fbae31f
Only emit key change notifications from federation when changes are made (#2217)
* Only emit key changes when poked over federation

* Remove logging

* Fix unit test possibly
2022-02-22 13:35:06 +00:00
Neil Alexander 9bd5e414c9
Missing commit from #2186 2022-02-18 11:32:45 +00:00
Neil Alexander 153bfbbea5
Merge both user API databases into one (#2186)
* Merge user API databases into one

* Remove DeviceDatabase from config

* Fix tests

* Try that again

* Clean up keyserver device keys when the devices no longer exist in the user API

* Tweak ordering

* Fix UserExists flag, device check

* Allow including empty entries so we can clean them up

* Remove logging
2022-02-18 11:31:05 +00:00
Neil Alexander 140077265e
Make GetUserDevices logging entry more useful 2022-02-17 15:02:06 +00:00
S7evinK 89b7519089
Raise waitTime for network related issues (#2192) 2022-02-17 13:15:49 +00:00
S7evinK e9b672a34e
Make "Device list doesn't change if remote server is down" pass (#2190) 2022-02-16 16:56:45 +00:00
S7evinK ac25065a54
Fix sytest uploading signed devices gets propagated over federation (#2162)
* Remove unneeded logging

* Add MasterKey & SelfSigningKey to update
Avoid panic if signatures are not present

* Add passing test

* Revert "Add MasterKey & SelfSigningKey to update"

This reverts commit 2c81b34884.

* Send MasterKey & SelfSigningKey with update

* Debugging

* Remove delete() so we also query signingkeys
2022-02-09 13:11:43 +01:00
S7evinK 2771d93748
Remove OutputKeyChangeEvent consumer on keyserver (#2160)
* Remove keyserver consumer

* Remove keyserver from eduserver

* Directly upload device keys without eduserver

* Add passing tests
2022-02-08 18:13:38 +01:00
kegsay 2c581377a5
Remodel how device list change IDs are created (#2098)
* Remodel how device list change IDs are created

Previously we made them using the offset Kafka supplied.
We don't run Kafka anymore, so now we make the SQL table assign
the change ID via an AUTOINCREMENTing ID. Redesign the
`keyserver_key_changes` table to have `UNIQUE(user_id)` so we
don't accumulate key changes forevermore, we now have at most 1
row per user which contains the highest change ID.

This needs a SQL migration.

* Ensure we bump the change ID on sqlite

* Actually read the DeviceChangeID not the Offset in synapi

* Add SQL migrations

* Prepare after migration; fixup dendrite-upgrade-test logging

* Use higher version numbers; fix sqlite query to increment better

* Default 0 on postgres

* fixup postgres migration on fresh dendrite instances
2022-01-21 09:56:06 +00:00
kegsay db7d9cba8a
BREAKING: Remove Partitioned Stream Positions (#2096)
* go mod tidy

* Break complement to check it fails CI

* Remove partitioned stream positions

This was used by the device list stream position. The device list position
now corresponds to the `Offset`, and the partition is always 0, in prep
for removing reliance on Kafka topics for device list changes.

* Linting

* Migrate old style tokens to new style because element-web doesn't soft-logoout on 4xx errors on /sync
2022-01-20 15:26:45 +00:00