Commit graph

35 commits

Author SHA1 Message Date
Till 42a82091a8
Fix issue with stale device lists ()
We were only sending the last entry to the worker, so most likely missed
updates.
2022-09-08 12:03:44 +02:00
Till 0d697f6754
Add HTTP status code to FederationClientError ()
Also ensures we wait on more HTTP status codes.
2022-09-07 16:14:09 +02:00
Till Faelligen 4e352390b6
Re-add waitTime if we're not blacklisted and no RetryAfter was
specified.
2022-09-07 12:13:02 +02:00
Till 440eb0f3a2
Handle errors differently in the DeviceListUpdater ()
`If a device list update goes missing, the server resyncs on the next
one` was failing because a previous test would receive a `waitTime` of
1h, resulting in the test timing out.
This now tries to handle the returned errors differently, e.g. by using
the default `waitTime` of 2s. Also doesn't try further users in the
list, if one of the errors would cause a longer `waitTime`.
2022-09-07 11:44:27 +02:00
Neil Alexander 5513f182cc
Enforce device list backoffs ()
This ensures that if the device list updater is already backing off a node, we don't try to call processServer again anyway for server just because the server name arrived in the channel. Otherwise we can keep trying to hit a remote server that is offline or not behaving every second and that spams the logs too.
2022-08-19 10:23:09 +01:00
Neil Alexander c45d0936b5
Generic-based internal HTTP API ()
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)

* Add `PerformInvite`

* More tweaks

* Fix metric name

* Fix LookupStateIDs

* Lots of changes to clients

* Some serverside stuff

* Some error handling

* Use paths as metric names

* Revert "Use paths as metric names"

This reverts commit a9323a6a34.

* Namespace metric names

* Remove duplicate entry

* Remove another duplicate entry

* Tweak error handling

* Some more tweaks

* Update error behaviour

* Some more error tweaking

* Fix API path for `PerformDeleteKeys`

* Fix another path

* Tweak federation client proxying

* Fix another path

* Don't return typed nils

* Some more tweaks, not that it makes any difference

* Tweak federation client proxying

* Maybe fix the key backup test
2022-08-11 15:29:33 +01:00
Till 1b7f84250a
Fix linter issues ()
* Try that again

* All hail the mighty linter?

* And once again

* goimport all the things
2022-08-05 11:12:41 +02:00
Neil Alexander 70cd8c68c2
Reduce error levels on device list update 2022-06-01 09:49:46 +01:00
kegsay 6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers ()
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
Neil Alexander 9b316ac64c
Slower federation warm-up ()
* Wake destination queues gradually, rather than all at once

* Delay device list updates too

* Maximum two minute warmup period
2022-04-04 15:14:10 +01:00
Neil Alexander e485f9c2bd
64-bit stream IDs for device list updates () 2022-03-10 13:17:28 +00:00
S7evinK 41dc651b25
Send device update to local users if remote display name changes ()
* Send device_list update to satisfy sytest

* Fix build issue from merged in change

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-02-22 16:34:53 +00:00
Neil Alexander 600fbae31f
Only emit key change notifications from federation when changes are made ()
* Only emit key changes when poked over federation

* Remove logging

* Fix unit test possibly
2022-02-22 13:35:06 +00:00
Neil Alexander 140077265e
Make GetUserDevices logging entry more useful 2022-02-17 15:02:06 +00:00
S7evinK 89b7519089
Raise waitTime for network related issues () 2022-02-17 13:15:49 +00:00
Neil Alexander ec716793eb
Merge federationapi, federationsender, signingkeyserver components ()
* Initial federation sender -> federation API refactoring

* Move base into own package, avoids import cycle

* Fix build errors

* Fix tests

* Add signing key server tables

* Try to fold signing key server into federation API

* Fix dendritejs builds

* Update embedded interfaces

* Fix panic, fix lint error

* Update configs, docker

* Rename some things

* Reuse same keyring on the implementing side

* Fix federation tests, `NewBaseDendrite` can accept freeform options

* Fix build

* Update create_db, configs

* Name tables back

* Don't rename federationsender consumer for now
2021-11-24 10:45:23 +00:00
Neil Alexander 125ea75b24
Add type field to DeviceMessage, allow fields to be nullable () 2021-08-11 09:44:14 +01:00
Neil Alexander eb0efa4636
Cross-signing groundwork ()
* Cross-signing groundwork

* Update to 

* Fix gobind builds, which stops unit tests in CI from yelling

* Some changes from review comments

* Fix build by passing in UIA

* Update to matrix-org/gomatrixserverlib@bec8d22

* Process master/self-signing keys from devices call

* nolint

* Enum-ify the key type in the database

* Process self-signing key too

* Fix sanity check in device list updater

* Fix check

* Fix sytest, hopefully

* Fix build
2021-08-04 17:56:29 +01:00
Kegsay b769d5a25e
Optimise memory usage when calling /g_m_e ()
* Optimise memory usage when calling /g_m_e

* cache more events

* refactor handling of device list update pokes

* Sigh
2021-04-08 13:50:39 +01:00
Kegsay 802f1c96f8
Add more metrics ()
* Add more metrics

* Linting
2021-03-23 15:22:00 +00:00
Kegsay a1b7e4ef3f
log less for failed key querys, add counters for incoming pdus/edus ()
* log less for failed key querys, add counters for incoming pdus/edus

* use labels

* Blacklist flakey test

* Fix metrics
2021-03-23 11:33:36 +00:00
Kegsay 77fb981da5
device lists: backoff for longer if the wrong error type is returned () 2021-03-08 17:45:20 +00:00
Loïck Bonniot 940577cd3c
Fix integer overflow in device_list_update.go ()
Fix 
On 32-bits systems, int(hash.Sum32()) can be negative.
This makes the computation of array indices using modulo invalid, crashing dendrite.
Signed-off-by: Loïck Bonniot <git@lesterpig.com>
2021-01-18 12:43:15 +00:00
Neil Alexander fa65c40bae
Reduce device list GetUserDevices timeout () 2021-01-12 16:13:21 +00:00
Kegsay 29d6481842
Wait for 8h between device list updates for blacklisted servers () 2020-08-26 15:38:21 +01:00
Kegsay abd16ff4a0
Modify DeviceListUpdater to retry requests according to RetryAfter ()
* Modify DeviceListUpdater to retry requests according to RetryAfter

* Reduce wait time for sytest test pollution
2020-08-26 12:03:09 +01:00
Kegsay 6d6bb75137
Add FederationClient interface to federationsender ()
* Add FederationClient interface to federationsender

- Use a shim struct in HTTP mode to keep the same API as `FederationClient`.
- Use `federationsender` instead of `FederationClient` in `keyserver`.

* Pointers not values

* Review comments

* Fix unit tests

* Rejig backoff

* Unbreak test

* Remove debug logs

* Review comments and linting
2020-08-20 17:03:07 +01:00
Kegsay 02a8515e99
Only emit key changes which are different from what we had before ()
We did this already for local `/keys/upload` but didn't for
remote `/users/devices`. This meant any resyncs would spam produce
events, hammering disk i/o and spamming the logs.
2020-08-18 11:14:20 +01:00
Kegsay 20c8f252a7
Make 'Device list doesn't change if remote server is down' pass ()
- As a last resort, query the DB when exhausting all possible remote query
  endpoints, but keep the field in `failures` so clients can detect that this
  is stale data.
- Unblock `DeviceListUpdater.Update` on failures rather than timing out.
- Use a mutex when writing directly to `res`, not just for failures.
2020-08-13 16:43:27 +01:00
Kegsay 820c56c165
Fix more E2E sytests ()
* WIP: Eagerly sync device lists on /user/keys/query requests

Also notify servers when a user's device display name changes. Few
caveats:
 - sytest `Device deletion propagates over federation` fails
 - `populateResponseWithDeviceKeysFromDatabase` is called from multiple
   goroutines and hence is unsafe.

* Handle deleted devices correctly over federation
2020-08-12 22:43:02 +01:00
Kegsay d98ec12422
Add sync mechanism to block when updating device lists ()
* Add sync mechanism to block when updating device lists

With a timeout, mainly for sytest to fix the test
"Server correctly handles incoming m.device_list_update"
which is flakey because it assumes that when `/send` 200 OKs
that the server has updated the device lists in prep for
`/keys/query` which is not always true when using workers.

* Fix UT

* Add new working test
2020-08-12 13:50:54 +01:00
Kegsay b8b854d642
Bugfixes for 'If remote user leaves room we no longer receive device updates' ()
* Bugfixes for 'If remote user leaves room we no longer receive device updates'

* Update whitelist and README
2020-08-12 10:50:52 +01:00
Kegsay befccd7d51
Reduce cooldown to make sure sytest doesn't give up ()
* Reduce cooldown to make sure sytest doesn't give up

* More sytests pass weeeeeee
2020-08-11 10:44:59 +01:00
Kegsay f371783da7
Finish inbound E2E device lists ()
* Add tests for device list updates

* Add stale_device_lists table and use db before asking remote for device keys

* Fetch remote keys if all devices are requested

* Add display_name col to store remote device names

Few other tweaks to make `Server correctly handles incoming m.device_list_update`
pass.

* Fix sqlite otk bug

* Unbuffered channel to block /send causing sytest to not race anymore

* Linting and fix bug whereby we didn't send updated dl tokens to the client causing a tightloop on /sync sometimes

* No longer assert staleness as Update blocks on workers now

* Back out tweaks

* Bugfixes
2020-08-07 17:32:13 +01:00
Kegsay 32a4565b55
Add device list updater which manages updating remote device lists ()
* Add device list updater which manages updating remote device lists

- Doesn't persist stale lists to the database yet
- Doesn't have tests yet

* Mark device lists as fresh when we persist
2020-08-06 17:48:10 +01:00