Commit graph

442 commits

Author SHA1 Message Date
Neil Alexander 16048be236
Handle case when applying history visibility failed 2022-09-30 16:55:10 +01:00
Neil Alexander 7d9545ceea
Another /sync fix 2022-09-30 16:34:06 +01:00
Neil Alexander ee40a29e55
Fix broken /sync due to transaction error 2022-09-30 16:07:18 +01:00
Neil Alexander 6348486a13
Transactional isolation for /sync (#2745)
This should transactional snapshot isolation for `/sync` etc requests.

For now we don't use repeatable read due to some odd test failures with
invites.
2022-09-30 12:48:10 +01:00
Neil Alexander 3f9e38e80a
Consistent *sql.Tx usage across sync API (#2744)
This tidies up the `storage` package so that everything takes a
transaction parameter instead of something things that do and some that
don't.
2022-09-28 10:18:03 +01:00
texuf a574ed5369
Fix for `sql: converting argument $1 type: unsupported type []interfa… (#2743)
…ce {}, a slice of interface` in new notifications select

The sqlite3 version was just not working, original pr here:
https://github.com/matrix-org/dendrite/pull/2688

signed off by: austin ellis <austin@hntlabs.com>

This doesn't fix the notification counts, they still only work about 1
out of every 5 times in my tests. I will stick with my other fix locally
for reliable notification delivery:
https://github.com/matrix-org/dendrite/pull/2701
2022-09-28 06:19:34 +02:00
Neil Alexander 083ae01520
Promote reindexing log level 2022-09-27 17:30:40 +01:00
Till 87be32ca26
Fulltext implementation using Bleve (#2675)
Based on #2480

This actually indexes events based on their event type. They are removed
from the index if we receive a `m.room.redaction` event on the
`OutputRoomEvent` stream.
An admin endpoint is added to reindex all existing events.


Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-09-27 18:06:49 +02:00
Till 249b32c4f3
Refactor notifications (#2688)
This PR changes the handling of notifications
- removes the `StreamEvent` and `ReadUpdate` stream
- listens on the `OutputRoomEvent` stream in the UserAPI to inform the
SyncAPI about unread notifications
- listens on the `OutputReceiptEvent` stream in the UserAPI to set
receipts/update notifications
- sets the `read_markers` directly from within the internal UserAPI

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-09-27 15:01:34 +02:00
PiotrKozimor 12649ccedd
Improve selectRoomIDsWithAnyMembershipSQL performance (#2738)
Recently I have observed that dendrite spends a lot of time (~390s) in
`selectRoomIDsWithAnyMembershipSQL` query

```
dendrite_syncapi=# select total_exec_time, left(query,100) from pg_stat_statements order by total_exec_time desc limit 5 ;
  total_exec_time   |                                                 left
--------------------+------------------------------------------------------------------------------------------------------
  747826.5800519128 | SELECT event_id, id, headered_event_json, session_id, exclude_from_sync, transaction_id, history_vis
  389130.5490339942 | SELECT DISTINCT room_id, membership FROM syncapi_current_room_state WHERE type = $2 AND state_key =
 376104.17514700035 | SELECT psd.datname, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_
   363644.164092031 | SELECT event_type_nid, event_state_key_nid, event_nid FROM roomserver_events WHERE event_nid = ANY($
  58570.48104699995 | SELECT event_id, headered_event_json FROM syncapi_current_room_state WHERE room_id = $1 AND ( $2::te
(5 rows)
```

Explain analyze showed correct usage of `syncapi_room_state_unique`
index:

```
dendrite_syncapi=#
explain analyze SELECT distinct room_id, membership FROM syncapi_current_room_state WHERE type = 'm.room.member' AND state_key = '@qjfl:dendrite.stg.globekeeper.com';
                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2749.38..2749.56 rows=24 width=52) (actual time=2.933..2.956 rows=65 loops=1)
   ->  Sort  (cost=2749.38..2749.44 rows=24 width=52) (actual time=2.932..2.937 rows=65 loops=1)
         Sort Key: room_id, membership
         Sort Method: quicksort  Memory: 34kB
         ->  Index Scan using syncapi_room_state_unique on syncapi_current_room_state  (cost=0.41..2748.83 rows=24 width=52) (actual time=0.030..2.890 rows=65 loops=1)
               Index Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
 Planning Time: 0.140 ms
 Execution Time: 2.990 ms
(8 rows)
```

Multi-column indexes in Postgres shall perform well for leftmost
columns, but I gave it a try and created
`syncapi_current_room_state_type_state_key_idx` index. I could observe
significant performance improvement. Execution time dropped from 2.9 ms
to 0.24 ms:

```
explain analyze SELECT distinct room_id, membership FROM syncapi_current_room_state WHERE type = 'm.room.member' AND state_key = '@qjfl:dendrite.stg.globekeeper.com';
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=96.46..96.64 rows=24 width=52) (actual time=0.199..0.218 rows=65 loops=1)
   ->  Sort  (cost=96.46..96.52 rows=24 width=52) (actual time=0.199..0.202 rows=65 loops=1)
         Sort Key: room_id, membership
         Sort Method: quicksort  Memory: 34kB
         ->  Bitmap Heap Scan on syncapi_current_room_state  (cost=4.53..95.91 rows=24 width=52) (actual time=0.048..0.139 rows=65 loops=1)
               Recheck Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
               Heap Blocks: exact=59
               ->  Bitmap Index Scan on syncapi_current_room_state_type_state_key_idx  (cost=0.00..4.53 rows=24 width=0) (actual time=0.037..0.037 rows=65 loops=1)
                     Index Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
 Planning Time: 0.236 ms
 Execution Time: 0.242 ms
(11 rows)
```

Next improvement is skipping DISTINCT and rely on map assignment in
`SelectRoomIDsWithAnyMembership`. Execution time drops by almost half:

```
explain analyze SELECT room_id, membership FROM syncapi_current_room_state WHERE type = 'm.room.member' AND state_key = '@qjfl:dendrite.stg.globekeeper.com';
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on syncapi_current_room_state  (cost=4.53..95.91 rows=24 width=52) (actual time=0.032..0.113 rows=65 loops=1)
   Recheck Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
   Heap Blocks: exact=59
   ->  Bitmap Index Scan on syncapi_current_room_state_type_state_key_idx  (cost=0.00..4.53 rows=24 width=0) (actual time=0.021..0.021 rows=65 loops=1)
         Index Cond: ((type = 'm.room.member'::text) AND (state_key = '@qjfl:dendrite.stg.globekeeper.com'::text))
 Planning Time: 0.087 ms
 Execution Time: 0.136 ms
(7 rows)
```

In our env we spend only 1s on inserting to table, so the write penalty
of creating an index should be small.
```
dendrite_syncapi=# select total_exec_time, left(query,100) from pg_stat_statements where query like '%INSERT%syncapi_current_room_state%' order by total_exec_time desc;
  total_exec_time   |                                                 left
--------------------+------------------------------------------------------------------------------------------------------
 1139.9057619999971 | INSERT INTO syncapi_current_room_state (room_id, event_id, type, sender, contains_url, state_key, he
(1 row)
``` 

This PR does not require test modifications.

### Pull Request Checklist

<!-- Please read docs/CONTRIBUTING.md before submitting your pull
request -->

* [x] I have added added tests for PR _or_ I have justified why this PR
doesn't need tests.
* [x] Pull request includes a [sign
off](https://github.com/matrix-org/dendrite/blob/main/docs/CONTRIBUTING.md#sign-off)

Signed-off-by: `Piotr Kozimor <p1996k@gmail.com>`
2022-09-27 09:41:36 +01:00
Till c53f284fdb
Get the DeviceListPosition before anything else in complete syncs (#2733)
This should hopefully unflake `Can query remote device keys using POST`
in Complement.
2022-09-22 17:49:35 +02:00
Neil Alexander 97d7cf2232
Remove deleted state logging lines from sync API (they are pointless) 2022-09-20 11:25:18 +01:00
Till e007b8038f
Mark device list as stale, if we don't have the requesting device (#2728)
This hopefully makes E2EE chats a little bit more reliable by re-syncing
devices if we don't have the `requesting_device_id` in our database. (As
seen in
[Synapse](c52abc1cfd/synapse/handlers/devicemessage.py (L157-L201)))
2022-09-20 11:32:03 +02:00
Neil Alexander b05e028f7d
Tweak LoadMembershipAtEvent behaviour when state not known (#2716)
Previously `LoadMembershipAtEvent` would fail if the state before one of
the events was not known, i.e. because it was an outlier. This modifies
it so that it gracefully handles not knowing the state and returns no
memberships instead, so that history visibility doesn't freak out and
kill `/sync` requests dead.
2022-09-13 12:52:09 +01:00
Till c366ccdfca
Send-to-device consumer/producer tweaks (#2713)
Some tweaks for the send-to-device consumers/producers:
- use `json.RawMessage` without marshalling it first
- try further devices (if available) if we failed to `PublishMsg` in the
producers
- some logging changes (to better debug E2EE issues)
2022-09-13 09:35:45 +02:00
Neil Alexander 955e69a3b7
Optimise SharedUsers again by using complete composite index 2022-09-09 14:18:45 +01:00
Neil Alexander 6ee758df63
Optimise shared users query in Synx API slightly by removing a potential sort 2022-09-09 13:50:50 +01:00
Neil Alexander 646de03d60
More writer fixes in the Sync API 2022-09-09 13:06:42 +01:00
Neil Alexander 175f65407a
Allow batching in JetStreamConsumer (#2686)
This allows us to receive more than one message from NATS at a time if we want.
2022-08-31 12:21:56 +01:00
PiotrKozimor 2be43560ca
Index on syncapi_send_to_device table (#2684)
Introduced index improves select query performance. Example execution time of `selectSendToDeviceMessagesSQL` query dropped from 80 ms to 15 ms. No sytest modifications are required.

### Pull Request Checklist

* [x] I have added added tests for PR _or_ I have justified why this PR doesn't need tests.
* [x] Pull request includes a [sign off](https://github.com/matrix-org/dendrite/blob/main/docs/CONTRIBUTING.md#sign-off)

Signed-off-by: `Piotr Kozimor <p1996k@gmail.com>`
2022-08-30 14:47:54 +01:00
Till 7313f56f44
Use existing limit instead of default limit when lazy loading members (#2682)
This should fix an issue where we return less than the expected membership events, when doing an initial sync.
When doing an initial sync, the state limit is set to `math.MaxInt32`, while the default filter is set to 20.
2022-08-30 14:18:47 +02:00
Till 07dd9bd995
SyncAPI tweaks/fixes (#2671)
- Reverts 9dc57122d9 as it was causing issues https://github.com/matrix-org/dendrite/issues/2660
- Updates the GMSL `DefaultStateFilter` to use a limit of 20 events
- Uses the timeline events to determine the new position instead of the state events
2022-08-25 13:42:47 +01:00
Neil Alexander 522bd2999f
Allow un-rejecting events on reprocessing 2022-08-24 14:03:06 +01:00
Till 9dc57122d9
Fetch more data for newly joined rooms in an incremental sync (#2657)
If we've joined a new room in an incremental sync, try fetching more data.
This deflakes the complement server notices test (at least in my tests).
2022-08-19 15:32:24 +02:00
Till 365da70a23
Set historyVisibility for backfilled events over federation (#2656)
This should hopefully deflake Backfill works correctly with history visibility set to joined as we were using the default shared visibility, even if the events are set to joined (or something else)
2022-08-19 11:04:26 +02:00
Till 5cacca92d2
Make SyncAPI unit tests more reliable (#2655)
This should hopefully make some SyncAPI tests more reliable
2022-08-19 11:03:55 +02:00
Till Faelligen 8d9c8f11c5
Add a delay after sending events to the roomserver 2022-08-18 08:56:57 +02:00
Neil Alexander ad4ac2c016
Stop spamming the logs with StateBetween: ignoring deleted state event IDs 2022-08-16 14:42:35 +01:00
Neil Alexander ec16c944eb
Lazy-loading fixes (#2646)
* Use existing current room state if we have it

* Don't dedupe before applying the history vis filter

* Revert "Don't dedupe before applying the history vis filter"

This reverts commit d27c4a0874.

* Revert "Use existing current room state if we have it"

This reverts commit 5819b4a7ce.

* Tweaks
2022-08-16 14:42:00 +01:00
Till 0642ffc0f6
Only return non-retired invites (#2643)
* Only return non-retired invites

* Revert "Only return non-retired invites"

This reverts commit 1150aa7f38.

* Check if we're doing an initial sync in the stream
2022-08-16 10:29:36 +02:00
Till 05cafbd197
Implement history visibility on /messages, /context, /sync (#2511)
* Add possibility to set history_visibility and user AccountType

* Add new DB queries

* Add actual history_visibility changes for /messages

* Add passing tests

* Extract check function

* Cleanup

* Cleanup

* Fix build on 386

* Move ApplyHistoryVisibilityFilter to internal

* Move queries to topology table

* Add filtering to /sync and /context
Some cleanup

* Add passing tests; Remove failing tests :(

* Re-add passing tests

* Move filtering to own function to avoid duplication

* Re-add passing test

* Use newly added GMSL HistoryVisibility

* Update gomatrixserverlib

* Set the visibility when creating events

* Default to shared history visibility

* Remove unused query

* Update history visibility checks to use gmsl
Update tests

* Remove unused statement

* Update migrations to set "correct" history visibility

* Add method to fetch the membership at a given event

* Tweaks and logging

* Use actual internal rsAPI, default to shared visibility in tests

* Revert "Move queries to topology table"

This reverts commit 4f0d41be9c.

* Remove noise/unneeded code

* More cleanup

* Try to optimize database requests

* Fix imports

* PR peview fixes/changes

* Move setting history visibility to own migration, be more restrictive

* Fix unit tests

* Lint

* Fix missing entries

* Tweaks for incremental syncs

* Adapt generic changes

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
2022-08-11 18:23:35 +02:00
Neil Alexander c45d0936b5
Generic-based internal HTTP API (#2626)
* Generic-based internal HTTP API (tested out on a few endpoints in the federation API)

* Add `PerformInvite`

* More tweaks

* Fix metric name

* Fix LookupStateIDs

* Lots of changes to clients

* Some serverside stuff

* Some error handling

* Use paths as metric names

* Revert "Use paths as metric names"

This reverts commit a9323a6a34.

* Namespace metric names

* Remove duplicate entry

* Remove another duplicate entry

* Tweak error handling

* Some more tweaks

* Update error behaviour

* Some more error tweaking

* Fix API path for `PerformDeleteKeys`

* Fix another path

* Tweak federation client proxying

* Fix another path

* Don't return typed nils

* Some more tweaks, not that it makes any difference

* Tweak federation client proxying

* Maybe fix the key backup test
2022-08-11 15:29:33 +01:00
Till e930959e49
Send-to-device/sync tweaks (#2630)
* Always delete send to device messages

* Omit empty to_device

* Tweak /sync response to omit empty values
2022-08-09 10:40:46 +02:00
Till Faelligen 10a151cb55
Don't panic if we fail to upsert account data 2022-08-05 15:37:13 +02:00
Till 3a156a434a
Invalidate lazyLoadCache if we're doing an initial sync (#2623)
* Bypass lazyLoadCache if we're doing an initial sync

* Make the linter happy again?

* Revert "Make the linter happy again?"

This reverts commit 52a5691ba3.

* Try that again

* Invalidate LazyLoadCache on initial syncs

* Remove unneeded check

* Add TODO

* Rename Invalite -> InvalidateLazyLoadedUser

* Thanks IDE
2022-08-05 14:27:27 +02:00
Till cecd11be9a
Partly fix notification counts (#2621)
* Fix notification query

* Also for SQLite

* Move tests to whitelist

* Revert "Move tests to whitelist"

This reverts commit a7d0120019.
2022-08-05 13:44:20 +02:00
Neil Alexander c8935fb53f
Do not use ioutil as it is deprecated (#2625) 2022-08-05 10:26:59 +01:00
Till 1b7f84250a
Fix linter issues (#2624)
* Try that again

* All hail the mighty linter?

* And once again

* goimport all the things
2022-08-05 11:12:41 +02:00
Brian Meek de78eab63a
Add race testing to tests, and fix a few small race conditions in the tests (#2587)
* Add race testing to tests, and fix a few small race conditions in the tests

* Enable run-sytest on MacOS

* Remove deadlock detecting mutex, per code review feedback

* Remove autoformatting related changes and a closure that is not needed

* Adjust to importing nats client as 'natsclient'

Signed-off-by: Brian Meek <brian@hntlabs.com>

* Clarify the use of gooseMutex to proect goose internal state

Signed-off-by: Brian Meek <brian@hntlabs.com>

* Remove no longer needed mutex for guarding goose

Signed-off-by: Brian Meek <brian@hntlabs.com>
2022-08-05 09:19:33 +01:00
Till 9fe509b18d
Fix syncapi shared users query & device lists (#2614)
* Fix query issue, only add "changed" users if we actually share a room

* Avoid log spam if context is done

* Undo changes to filterSharedUsers

* Add logging again..

* Fix SQLite shared users query

* Change query to include invited users
2022-08-03 18:35:17 +02:00
Till df5d4dc7a3
Delete correct Send-to-Device messages (#2608)
* Add send-to-device tests

* Update tests, fix message deletion

* PR comments
2022-08-02 17:00:16 +02:00
sergekh2 6b6b420b9f
Fix issue with sync API not advancing. (#2603)
Issue: During conversation, under some conditions, sync cookie is not advanced, and, as a result, client loops on the same sync API call creating high traffic and CPU load.
Fix: pdu component of cookie was updated incorrectly.
2022-08-02 09:43:48 +01:00
Neil Alexander e94ef84aab
De-race CompleteSync (#2601)
The `err` was coming from outside of the goroutine and being written to by concurrent goroutines.
2022-08-01 15:55:56 +01:00
Till 081f5e7226
Update database migrations, remove goose (#2264)
* Add new db migration

* Update migrations
Remove goose

* Add possibility to test direct upgrades

* Try to fix WASM test

* Add checks for specific migrations

* Remove AddMigration
Use WithTransaction
Add Dendrite version to table

* Fix linter issues

* Update tests

* Update comments, outdent if

* Namespace migrations

* Add direct upgrade tests, skipping over one version

* Split migrations

* Update go version in CI

* Fix copy&paste mistake

* Use contexts in migrations

Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-25 10:39:22 +01:00
Neil Alexander f0c8a03649
Membership updater refactoring (#2541)
* Membership updater refactoring

* Pass in membership state

* Use membership check rather than referring to state directly

* Delete irrelevant membership states

* We don't need the leave event after all

* Tweaks

* Put a log entry in that I might stand a chance of finding

* Be less panicky

* Tweak invite handling

* Don't freak if we can't find the event NID

* Use event NID from `types.Event`

* Clean up

* Better invite handling

* Placate the almighty linter

* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand

* Fix the sytest after all (thanks @S7evinK for the spot)
2022-07-22 14:44:04 +01:00
Till Faelligen bcff14adea Set historyVisibility in rowsToStreamEvents 2022-07-18 18:19:44 +02:00
Till a7e92f8cb9
History visibility database changes (#2533)
* Add new history_visibility column

* Update SQL queries to include history_visibility

* Store the history visibilty calculated by the roomserver

* Update GMSL

* Update migrations

* Fix migration

* Update GMSL

* Fix `go.sum`

* Update GMSL to use sql.Scanner & sql.Valuer

* Re-order migration/table creation

* Update gomatrixserverlib

* Add history_visibility column to current_room_state

* Fix migrations

* Return error instead of Fatal log

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-18 14:46:15 +02:00
Neil Alexander 90bf01d8b1
Use sync API database in filterSharedUsers (#2572)
* Add function to the sync API storage package for filtering shared users

* Use the database instead of asking the RS API

* Fix unit tests

* Fix map handling in `filterSharedUsers`
2022-07-15 16:25:26 +01:00
Till 09f0ff14c8
Minor SendToDevice fix (#2565)
* Avoid unnecessary marshalling if sending to the local server

* Fix ordering of ToDevice messages

* Revive SendToDevice test
2022-07-12 08:23:58 +02:00
Neil Alexander 3ea21273bc
Ristretto cache (#2563)
* Try Ristretto cache

* Tweak

* It's beautiful

* Update GMSL

* More strict keyable interface

* Fix that some more

* Make less panicky

* Don't enforce mutability checks for now

* Determine mutability using deep equality

* Tweaks

* Namespace keys

* Make federation caches mutable

* Update cost estimation, add metric

* Update GMSL

* Estimate cost for metrics better

* Reduce counters a bit

* Try caching events

* Some guards

* Try again

* Try this

* Use separate caches for hopefully better hash distribution

* Fix bug with admitting events into cache

* Try to fix bugs

* Check nil

* Try that again

* Preserve order jeezo this is messy

* thanks VS Code for doing exactly the wrong thing

* Try this again

* Be more specific

* aaaaargh

* One more time

* That might be better

* Stronger sorting

* Cache expiries, async publishing of EDUs

* Put it back

* Use a shared cache again

* Cost estimation fixes

* Update ristretto

* Reduce counters a bit

* Clean up a bit

* Update GMSL

* 1GB

* Configurable cache sizees

* Tweaks

* Add `config.DataUnit` for specifying friendly cache sizes

* Various tweaks

* Update GMSL

* Add back some lazy loading caching

* Include key in cost

* Include key in cost

* Tweak max age handling, config key name

* Only register prometheus metrics if requested

* Review comments @S7evinK

* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)

* Review comments

* Update sample configs

* Update GHA Workflow

* Update Complement images to Go 1.18

* Remove the cache test from the federation API as we no longer guarantee immediate cache admission

* Don't check the caches in the renewal test

* Possibly fix the upgrade tests

* Update to matrix-org/gomatrixserverlib#322

* Update documentation to refer to Go 1.18
2022-07-11 14:31:31 +01:00
Till f76f28e6db
Fix issue uint64 values with high bit are not supported in presence (#2562)
* Fix issue #2528

* Use gomatrixserverlib.Timestamp

* Use ParseUint instead of ParseInt
2022-07-07 16:29:25 +02:00
Neil Alexander 460dccf93d
Hopefully fix read receipts timestamps (#2557)
This should avoid coercions between signed and unsigned ints which might fix problems like `sql: converting argument $5 type: uint64 values with high bit set are not supported`.
2022-07-05 17:13:26 +01:00
Till 89cd0e8fc1
Try to fix backfilling (#2548)
* Try to fix backfilling

* Return start/end to not confuse clients

* Update GMSL

* Update GMSL
2022-07-01 11:49:26 +02:00
Till 561c159ad7
Silence presence logs (#2547) 2022-06-30 12:34:37 +02:00
Till 2086992caf
Don't return end if there are not more messages (#2542)
* Be more spec compliant

* Move lazyLoadMembers to own method
2022-06-29 10:49:12 +02:00
Neil Alexander 7120eb6bc9
Add InputDeviceListUpdate to the keyserver, remove old input API (#2536)
* Add `InputDeviceListUpdate` to the keyserver, remove old input API

* Fix copyright

* Log more information when a device list update fails
2022-06-15 14:27:07 +01:00
Neil Alexander 4c2a10f1a6
Handle state before, send history visibility in output (#2532)
* Check state before event

* Tweaks

* Refactor a bit, include in output events

* Don't waste time if soft failed either

* Tweak control flow, comments, use GMSL history visibility type
2022-06-13 15:11:10 +01:00
kegsay 21dd5a7176
syncapi: don't return early for no-op incremental syncs (#2473)
* syncapi: don't return early for no-op incremental syncs

Comments explain why, but basically it's an inefficient use
of bandwidth and some sytests rely on /sync to block.

* Honour timeouts

* Actually return a response with timeout=0
2022-05-19 09:00:56 +01:00
kegsay b3162755a9
bugfix: fix race condition when updating presence via /sync (#2470)
* bugfix: fix race condition when updating presence via /sync

Previously when presence is updated via /sync, we would send the presence update
asyncly via NATS. This created a race condition:
 - If the presence update is processed quickly, the /sync which triggered the presence
   update would see an online presence.
 - If the presence update was processed slowly, the /sync which triggered the presence
   update would see an offline presence.

This is the root cause behind the flakey sytest: 'User sees their own presence in a sync'.

The fix is to ensure we update the database/advance the stream position synchronously
for local users.

* Bugfix for test
2022-05-17 15:53:08 +01:00
kegsay 6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
Till Faelligen b57fdcc82d Only try to get OTKs if the context isn't done yet 2022-05-13 10:28:00 +02:00
Kegan Dougal 3437adf597 Wait 100ms for events to be processed by syncapi 2022-05-12 10:11:46 +01:00
Till 58af7f61b6
Fix OTK upload spam (#2448)
* Fix OTK spam

* Update comment

* Optimize selectKeysCountSQL to only return max 100 keys

* Return CurrentPosition if the request timed out

* Revert "Return CurrentPosition if the request timed out"

This reverts commit 7dbdda9641.

Co-authored-by: kegsay <kegan@matrix.org>
2022-05-11 17:15:18 +01:00
kegsay 9599b3686e
More syncapi tests (#2451)
* WIP tests for flakey create event

* Uncomment all database test
2022-05-11 13:44:32 +01:00
kegsay c15bfefd0d
Add RoomExists flag to QueryMembershipForUser (#2450)
Fixes https://github.com/matrix-org/complement/pull/369
2022-05-11 11:29:23 +01:00
Neil Alexander e2a932ec0b
Add indexes to syncapi_output_room_events table that satisfy the filters (#2446) 2022-05-10 11:23:36 +01:00
kegsay 236b16aa6c
Begin adding syncapi component tests (#2442)
* Add very basic syncapi tests

* Add a way to inject jetstream messages

* implement add_state_ids

* bugfixes

* Unbreak tests

* Remove now un-needed API call

* Linting
2022-05-09 17:23:02 +01:00
Neil Alexander a443d1e5f3
Don't store invites in sync API that aren't relevant to local users (#2439) 2022-05-09 16:25:22 +01:00
Neil Alexander 79da75d483
Federation consumer adds_state_event_ids tweak (#2441)
* Don't ask roomserver for events we already have in federation API

* Check number of events returned is as expected

* Preallocate array

* Improve shape a bit
2022-05-09 16:19:35 +01:00
Neil Alexander 1a7f4c8aa9
Don't try to re-fetch the event if it is listed in adds_state_event_ids (#2437)
* Don't try to re-fetch the event in the output message

* Try that again

* Add the initial event into the set
2022-05-09 15:22:33 +01:00
Neil Alexander 09d754cfbf
One NATS instance per BaseDendrite (#2438)
* One NATS instance per `BaseDendrite`

* Fix roomserver
2022-05-09 14:15:24 +01:00
Till 6493c0c0f2
Move LL cache (#2429) 2022-05-06 15:33:34 +02:00
kegsay d86dcbef66
syncapi: define specific interfaces for internal HTTP communications (#2416)
* syncapi: use finer-grained interfaces when making the syncapi

* Use specific interfaces for syncapi-roomserver interactions

* Define query access token api for shared http auth code
2022-05-05 09:56:03 +01:00
Till 3c940c428d
Add opt-in anonymous stats reporting (#2249)
* Initial phone home stats queries

* Add userAgent to UpdateDeviceLastSeen
Add new Table for tracking daily user vists

* Add user_daily_visits table

* Fix queries

* userapi stats tables & queries

* userapi interface and internal api

* sycnapi stats queries

* testing phone home stats

* Add complete config to syncapi

* add missing files

* Fix queries

* Send empty request

* Add version & monolith stats

* Add configuration for phone home stats

* Move WASM to its own file, add config and comments

* Add tracing methods

* Add total rooms

* Add more fields, actually send data somewhere

* Move stats to the userapi

* Move phone home stats to util package

* Cleanup

* Linter & parts of GH comments

* More GH comments changes
- Move comments to SQL statements
- Shrink interface, add struct for stats
- No fatal errors, use defaults

* Be more explicit when querying

* Fix wrong calculation & wrong query params
Add tests

* Add Windows stats

* ADd build constraint

* Use new testing structure
Fix issues with getting values when using SQLite
Fix wrong AddDate value
Export UpdateUserDailyVisits

* Fix query params

* Fix test

* Add comment about countR30UsersSQL and countR30UsersV2SQL; fix test

* Update config

* Also update example config file

* Use OS level proxy, update logging

Co-authored-by: kegsay <kegan@matrix.org>
2022-05-04 19:04:28 +02:00
Neil Alexander dd061a172e
Tidy up AddPublicRoutes (#2412)
* Simplify federation API `AddPublicRoutes`

* Simplify client API `AddPublicRoutes`

* Simplify media API `AddPublicRoutes`

* Simplify sync API `AddPublicRoutes`

* Simplify `AddAllPublicRoutes`
2022-05-03 17:17:02 +01:00
Neil Alexander 4ad5f9c982
Global database connection pool (for monolith mode) (#2411)
* Allow monolith components to share a single database pool

* Don't yell about missing connection strings

* Rename field

* Setup tweaks

* Fix panic

* Improve configuration checks

* Update config

* Fix lint errors

* Update comments
2022-05-03 16:35:06 +01:00
Till 987d7adc5d
Return "to", if we didn't return any presence events (#2407)
Return correct stream position, if we didn't return any presence events
2022-04-30 00:07:50 +02:00
Till 2a5b8e0306
Only load members of newly joined rooms (#2389)
* Only load members of newly joined rooms

* Comment that the query is prepared at runtime

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-28 18:53:28 +02:00
Till 21ee5b36a4
Limit presence in /sync responses (#2394)
* Use filter and limit presence count

* More limiting

* More limiting

* Fix unit test

* Also limit presence by last_active_ts

* Update query, use "from" as the initial lastPos

* Get 1000 presence events, they are filtered later

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-28 15:12:40 +01:00
Till 74259f296f
Fix #2390 (#2392)
Fix duplicate heroes in `/sync` response.
2022-04-27 21:31:30 +02:00
Neil Alexander 655ac3e8fb
Try that again 2022-04-27 15:10:15 +01:00
Neil Alexander 6ee8507955
Correct account data position mapping 2022-04-27 15:10:10 +01:00
Neil Alexander dca4afd2f0
Don't send account data or receipts for left/forgotten rooms (#2382)
* Only include account data and receipts for rooms in a complete sync that we care about

* Fix global account data
2022-04-27 12:03:34 +01:00
Neil Alexander 66b397b3c6
Don't create fictitious presence entries (#2381)
* Don't create fictitious presence entries for users that don't have any

* Update whitelist, since that test probably shouldn't be passing

* Fix panics
2022-04-27 11:25:07 +01:00
Neil Alexander 6c5c6d73d7
Use a value that is Go 1.16-friendly 2022-04-26 17:05:31 +01:00
Neil Alexander b527e33c16
Send all account data on complete sync by default
Squashed commit of the following:

commit 0ec8de57261d573a5f88577aa9d7a1174d3999b9
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Apr 26 16:56:30 2022 +0100

    Select filter onto provided target filter

commit da40b6fffbf5737864b223f49900048f557941f9
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Apr 26 16:48:00 2022 +0100

    Specify other field too

commit ffc0b0801f63bb4d3061b6813e3ce5f3b4c8fbcb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Apr 26 16:45:44 2022 +0100

    Send as much account data as possible during complete sync
2022-04-26 16:58:20 +01:00
Neil Alexander f6d07768a8
Fix account data position 2022-04-26 16:08:01 +01:00
Neil Alexander 6892e0f0e0
Start account data ID from from 2022-04-26 16:02:21 +01:00
Till 4c19f22725
Fix account_data not correctly send in a complete sync (#2379)
* Return the StreamPosition from the database and not the latest

* Fix linter issue
2022-04-26 14:50:56 +01:00
Till e8be2b234f
Add heroes to the room summary (#2373)
* Implement room summary heroes

* Add passing tests

* Move MembershipCount to addRoomSummary

* Add comments, close Statement
2022-04-26 10:53:17 +02:00
Till e95fc5c5e3
Use provided filter for account_data (#2372)
* Reuse IncrementalSync, use provided filter

* Inform SyncAPI about newly created push_rules
2022-04-25 19:04:46 +02:00
Till c07f347f00
Reuse the existing lazyload cache on /context and /messages (#2367) 2022-04-22 11:38:29 +02:00
Neil Alexander 54e7ea41c6
Eliminate more SQL no row errors in sync API (#2363)
* Handle `sql.ErrNoRows` in main `/sync` codepaths

* Catch more
2022-04-20 16:51:37 +01:00
Neil Alexander bb987cd64b
Lazy loading fixes (#2362)
* Return some more usefully wrapped errors when doing sync

* Remove unnecessary error check

* Couple of guards around `sql.ErrNoRows`

* Nolint
2022-04-20 16:06:46 +01:00
Till 57e3622b85
Implement lazy loading on /sync (#2346)
* Initial work on lazyloading

* Partially implement lazy loading on /sync

* Rename methods

* Make missing tests pass

* Preallocate slice, even if it will end up with fewer values

* Let the cache handle the user mapping

* Linter

* Cap cache growth
2022-04-19 09:46:45 +01:00
Neil Alexander 3a5e9a0f28
Use default sync filter if specified filter is not found (should fix #2350) (#2351) 2022-04-13 16:41:22 +01:00
Neil Alexander 1140f39993
Precompute values for userIDSet in sync notifier (#2348)
* Precompute values for `userIDSet` in sync notifier

* Mutexes

* Fixes

* Sensible initial value

* Update syncapi/notifier/notifier.go

Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>

* Placate the almighty linter

Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
2022-04-13 12:35:30 +01:00
Till 29f2168789
Make /messages filterable (#2347)
* Make /messages filterable
Fix bug when determining if an event contains an URL

* Add newly passing test

* Fix test
2022-04-13 13:16:02 +02:00
Till 69f2ff7c82
Correctly use provided filters (#2339)
* Apply filters correctly

* Fix issues; Use prepareWithFilters

* Update gmsl & tests

* go.mod..

* PR comments
2022-04-11 09:05:23 +02:00
kegsay b4b2fbc36b Remove dead code in the sync api (#2341) 2022-04-09 00:37:50 +01:00
kegsay 6d25bd6ca5
syncapi: add more tests; fix more bugs (#2338)
* syncapi: add more tests; fix more bugs

bugfixes:
 - The postgres impl of TopologyTable.SelectEventIDsInRange did not use the provided txn
 - The postgres impl of EventsTable.SelectEvents did not preserve the ordering of the input event IDs in the output events slice
 - The sqlite impl of EventsTable.SelectEvents did not use a bulk `IN ($1)` query.

Added tests:
 - `TestGetEventsInRangeWithTopologyToken`
 - `TestOutputRoomEventsTable`
 - `TestTopologyTable`

* -p 1 for now
2022-04-08 17:53:24 +01:00
Till e8dd37d533
Add metrics for internal API requests (#2310)
* Add response size and requests total to internal handler

* Move MustRegister calls to New* funcs

* Move MustRegister back to init

* Init at some place, minimize changes
2022-04-08 12:24:40 +02:00
Neil Alexander f299f97e0a
Fix data race in TestCorrectStreamWakeup (#2334) 2022-04-08 10:46:36 +01:00
kegsay 7499147550
Add test infrastructure code for dendrite unit/integ tests (#2331)
* Add test infrastructure code for dendrite unit/integ tests

Start re-enabling some syncapi storage tests in the process.

* Linting

* Add postgres service to unit tests

* dendrite not syncv3

* Skip test which doesn't work

* Linting

* Add `jetstream.PrepareForTests`

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-08 10:12:30 +01:00
Neil Alexander 7902859bc1
Fix lock contention in sync notifier 2022-04-07 17:08:15 +01:00
Till 60ee7eef4c
Add possibility to ignore users (#2329)
* Add ignore users

* Ignore users in pushrules
Add passing tests

* Update sytest lists

* Store ignore knowledge in the sync API

* Fix copyrights

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-07 15:08:19 +01:00
Neil Alexander 99ef547295
Simplify presence stringification (should help with vector-im/element-android#5712) 2022-04-07 10:10:28 +01:00
Neil Alexander c28f00bf96
Sync notifier tweaks (#2327)
* Micro-optimisations, lock fixes

* Refactor `SharedUsers`

* Reuse map to reduce allocations/GC pressure

* oh yeah, initialise it

* Leave room for the user ID we'll no doubt append afterward
2022-04-06 15:04:41 +01:00
Neil Alexander 602818460d
Reduce allocations in /sync presence stream (#2326)
* Reduce allocations on presence

* Try to reduce allocations further

* Tweak `IsSharedUser` some more

* Take map lock
2022-04-06 13:31:44 +01:00
Till e5e3350ce1
Add presence module V2 (#2312)
* Syncapi presence

* Clientapi http presence handler

* Why is this here?

* Missing files

* FederationAPI presence implementation

* Add new presence stream

* Pinecone update

* Pinecone update

* Add passing tests

* Make linter happy

* Add presence producer

* Add presence config option

* Set user to unavailable after x minutes

* Only set currently_active if online
Avoid unneeded presence updates when syncing

* Tweaks

* Query devices for last_active_ts
Fixes & tweaks

* Export SharedUsers/SharedUsers

* Presence stream in MemoryStorage

* Remove status_msg_nil

* Fix sytest crashes

* Make presence types const and use stringer for it

* Change options to allow inbound/outbound presence

* Fix option & typo

* Update configs

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-06 13:11:19 +02:00
Neil Alexander cd8fac152e
Include joined and invite member counts in room summary (#2315)
* Include joined and invite member counts in room summary

This should fix #2314 and also fix the problem where some clients like Element Android, Fluffychat etc would display the wrong member count for a given room.

* Improve SQLite query precision

* Check existence of state key for membership events
2022-04-01 16:14:38 +01:00
S7evinK 49dc49b232
Remove eduserver (#2306)
* Move receipt sending to own JetStream producer

* Move SendToDevice to producer

* Remove most parts of the EDU server

* Fix SendToDevice & copyrights

* Move structs, cleanup EDU Server traces

* Use HeadersOnly subscription

* Missing file

* Fix linter issues

* Move consumers to own files

* Rename durable consumer; Consumer cleanup

* Docs/config cleanup
2022-03-29 14:14:35 +02:00
Neil Alexander b113217a6d
Use most recent event in response to get latest stream position in incremental sync (#2302)
* Use latest event position in response for advancing the stream position in an incremental sync

* Create some calm

* Use To in worst case

* Don't waste CPU cycles on an empty response after all

* Bug fixes

* Fix another bug
2022-03-25 12:38:16 +00:00
Neil Alexander d983d17355
Fix lint errors 2022-03-24 10:03:22 +00:00
Neil Alexander 98a5e410d7
Per-room consumers (#2293)
* Roomserver input refactoring — again!

* Ensure the actor runs again

* Preserve consumer after unsubscribe

* Another sprinkling of magic

* Rename `TopicFor` to `Prefixed`

* Recreate the stream if the config is bad

* Check streams too

* Prefix subjects, preserve inboxes

* Recreate if subjects wrong

* Remove stream subject

* Reconstruct properly

* Fix mutex unlock

* Comments

* Fix tests

* Don't drop events

* Review comments

* Separate `queueInputRoomEvents` function

* Re-jig control flow a bit
2022-03-23 10:20:18 +00:00
Neil Alexander 9572f5ed19
Wait for safe shutdown of NATS Server (#2289) 2022-03-21 10:32:34 +00:00
S7evinK 8336ce972e
Remove unused partition_offset_table (#2288) 2022-03-21 10:47:41 +01:00
Neil Alexander 475d3c1af9
Better mapping of stream positions to topological positions in /messages (#2263)
* Convert stream positions into topological positions for both `from` and `to` in `/messages`

* Hopefully it works now

* Remove unnecessary logging

* Return sane values if `StreamToTopologicalPosition` can't work out the right thing to do

* Revert logging change

* tweaks

* Fix `selectEventIDsInRangeASCSQL`

* Test `Getting messages going forward is limited for a departed room (SPEC-216)` was passing incorrectly so un-whitelist it
2022-03-18 10:40:01 +00:00
Neil Alexander 4e64c270db
Various bug fixes and tweaks around invites and membership 2022-03-17 17:05:21 +00:00
Neil Alexander e30aa38fb0
Stream tweaks, use same codepath for sync vs async input room events, wait for error response via NATS messages (#2283) 2022-03-16 14:21:11 +00:00
S7evinK d8facd6308
Fix SQL statement for PurgeRoomState (#2280) 2022-03-16 11:25:50 +01:00
Neil Alexander fc0bdf5d88
Truncate recentStreamEvents before working out which event IDs to exclude from stateEvents (#2281) 2022-03-16 10:18:08 +00:00
S7evinK a2cf1aaf48
Fix /context with lazy_load_members (#2277)
* Add membership events to the end of the list, to ensure Sytest sees them

* Move tests to allowlist

* Append to correct list, fix logging message

* Add flakey tests to blacklist

* Remove flakey tests from whitelist
2022-03-14 20:04:24 +01:00
Neil Alexander 507a8e6773
Don't range entire state for /sync (#2270)
* Don't range entire state for rooms the user has no reason to care about

* Remove unnecessary db field in postgresql
2022-03-11 12:48:45 +00:00
Neil Alexander 67de4dbd0c
Don't send adds_state_events in roomserver output events anymore (#2258)
* Don't send `adds_state_events` in roomserver output events anymore

* Set `omitempty` on some output fields that aren't always set

* Add `AddsState` helper function

* No-op if no added state event IDs

* Revert "No-op if no added state event IDs"

This reverts commit 71a0ef3df1.

* Revert "Add `AddsState` helper function"

This reverts commit c9fbe45475.
2022-03-07 17:17:16 +00:00
Neil Alexander 7fc62d8178
Fix a panic in OnIncomingMessagesRequest (#2250)
It's possible for `GetStateEvent` to return `nil` if there was no error but the state event wasn't found. Therefore we need to be prepared for that case.

This should fix #2247.
2022-03-04 10:24:26 +00:00
Neil Alexander 72022a6ecf
Return 404 if event given to /context was not found (#2245) 2022-03-03 17:58:24 +00:00
Neil Alexander 6ed8cf0e07
Handle ErrNoRows when sending read updates 2022-03-03 12:09:16 +00:00
Dan f05ce478f0
Implement Push Notifications (#1842)
* Add Pushserver component with Pushers API

Co-authored-by: Tommie Gannert <tommie@gannert.se>
Co-authored-by: Dan Peleg <dan@globekeeper.com>

* Wire Pushserver component

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>

* Add PushGatewayClient.

The full event format is required for Sytest.

* Add a pushrules module.

* Change user API account creation to use the new pushrules module's defaults.

Introduces "scope" as required by client API, and some small field
tweaks to make some 61push Sytests pass.

* Add push rules query/put API in Pushserver.

This manipulates account data over User API, and fires sync messages
for changes. Those sync messages should, according to an existing TODO
in clientapi, be moved to userapi.

Forks clientapi/producers/syncapi.go to pushserver/ for later extension.

* Add clientapi routes for push rules to Pushserver.

A cleanup would be to move more of the name-splitting logic into
pushrules.go, to depollute routing.go.

* Output rooms.join.unread_notifications in /sync.

This is the read-side. Pushserver will be the write-side.

* Implement pushserver/storage for notifications.

* Use PushGatewayClient and the pushrules module in Pushserver's room consumer.

* Use one goroutine per user to avoid locking up the entire server for
  one bad push gateway.
* Split pushing by format.
* Send one device per push. Sytest does not support coalescing
  multiple devices into one push. Matches Synapse. Either we change
  Sytest, or remove the group-by-url-and-format logic.
* Write OutputNotificationData from push server. Sync API is already
  the consumer.

* Implement read receipt consumers in Pushserver.

Supports m.read and m.fully_read receipts.

* Add clientapi route for /unstable/notifications.

* Rename to UpsertPusher for clarity and handle pusher update

* Fix linter errors

* Ignore body.Close() error check

* Fix push server internal http wiring

* Add 40 newly passing 61push tests to whitelist

* Add next 12 newly passing 61push tests to whitelist

* Send notification data before notifying users in EDU server consumer

* NATS JetStream

* Goodbye sarama

* Fix `NewStreamTokenFromString`

* Consume on the correct topic for the roomserver

* Don't panic, NAK instead

* Move push notifications into the User API

* Don't set null values since that apparently causes Element upsetti

* Also set omitempty on conditions

* Fix bug so that we don't override the push rules unnecessarily

* Tweak defaults

* Update defaults

* More tweaks

* Move `/notifications` onto `r0`/`v3` mux

* User API will consume events and read/fully read markers from the sync API with stream positions, instead of consuming directly

Co-authored-by: Piotr Kozimor <p1996k@gmail.com>
Co-authored-by: Tommie Gannert <tommie@gannert.se>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-03-03 11:40:53 +00:00
Neil Alexander 849e40d456
Use correct stream provider in Latest for ReceiptPosition 2022-03-01 17:25:26 +00:00
Neil Alexander 726529fe99
Hopefully fix read receipts (#2241) 2022-03-01 16:59:11 +00:00
S7evinK af610df85a
Return state on calls to /message and lazy load members (#2218)
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-03-01 14:39:56 +00:00
Neil Alexander 530f05885d
Limit JoinedUsersSetInRooms to interested users (#2234)
* Limit database work in `JoinedUsersSetInRooms` to changed user IDs only

* Comments

* Fix variadic params for SQLite, update comments
2022-03-01 13:01:38 +00:00
Neil Alexander bbe7d37928
Fix logic error on context history visibility (#2211) 2022-02-21 16:38:53 +00:00
S7evinK cf525d1f61
Implement /context (#2207)
* Add QueryEventsAfter

* Add /context

* Make all tests pass on sqlite

* Add queries to get the events for /context requests

* Move /context to the syncapi

* Revert "Add QueryEventsAfter"

This reverts commit 440a771d10.

* Simplify getting the required events

* Apply RoomEventFilter when getting events

* Add passing tests

* Remove logging

* Remove unused SQL statements
Update comments & add TODO
2022-02-21 17:12:22 +01:00
Neil Alexander dbded87525
Expose sync endpoints via /v3 (#2203) 2022-02-18 14:14:16 +00:00
Neil Alexander 353168a9e9
Fix potential panic in NewStreamTokenFromString caused by off-by-one error (#2196)
Line 291 could panic when trying to set `positions[i]` if `i == len(positions)`.
2022-02-17 13:25:41 +00:00
Neil Alexander fa1e12b503
Don't panic on retiring an invite that we haven't seen yet (#2189) 2022-02-16 11:56:08 +00:00
S7evinK 2771d93748
Remove OutputKeyChangeEvent consumer on keyserver (#2160)
* Remove keyserver consumer

* Remove keyserver from eduserver

* Directly upload device keys without eduserver

* Add passing tests
2022-02-08 18:13:38 +01:00
S7evinK 9de7efa0b0
Remove sarama/saramajetstream dependencies (#2138)
* Remove dependency on saramajetstream & sarama

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Remove internal.ContinualConsumer from federationapi

* Remove internal.ContinualConsumer from syncapi

* Remove internal.ContinualConsumer from keyserver

* Move to new Prepare function

* Remove saramajetstream & sarama dependency

* Delete unneeded file

* Remove duplicate import

* Log error instead of silently irgnoring it

* Move `OffsetNewest` and `OffsetOldest` into keyserver types, change them to be more sane values

* Fix comments

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-02-04 13:08:13 +00:00
Neil Alexander c773b038bb
Use pull consumers (#2140)
* Pull consumers

* Pull consumers

* Only nuke consumers if they are push consumers

* Clean up old consumers

* Better error handling

* Update comments
2022-02-02 13:32:48 +00:00
Neil Alexander ba1a9b98b7
Tweak some logging (#2130)
* Modify some log levels

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@336334f

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@cde7ac8

* Demote warning about key change producer

* Add more useful roomserver logging

* Further tweaking
2022-01-31 10:48:28 +00:00
Neil Alexander a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes:

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
kegsay 2c581377a5
Remodel how device list change IDs are created (#2098)
* Remodel how device list change IDs are created

Previously we made them using the offset Kafka supplied.
We don't run Kafka anymore, so now we make the SQL table assign
the change ID via an AUTOINCREMENTing ID. Redesign the
`keyserver_key_changes` table to have `UNIQUE(user_id)` so we
don't accumulate key changes forevermore, we now have at most 1
row per user which contains the highest change ID.

This needs a SQL migration.

* Ensure we bump the change ID on sqlite

* Actually read the DeviceChangeID not the Offset in synapi

* Add SQL migrations

* Prepare after migration; fixup dendrite-upgrade-test logging

* Use higher version numbers; fix sqlite query to increment better

* Default 0 on postgres

* fixup postgres migration on fresh dendrite instances
2022-01-21 09:56:06 +00:00
kegsay db7d9cba8a
BREAKING: Remove Partitioned Stream Positions (#2096)
* go mod tidy

* Break complement to check it fails CI

* Remove partitioned stream positions

This was used by the device list stream position. The device list position
now corresponds to the `Offset`, and the partition is always 0, in prep
for removing reliance on Kafka topics for device list changes.

* Linting

* Migrate old style tokens to new style because element-web doesn't soft-logoout on 4xx errors on /sync
2022-01-20 15:26:45 +00:00
Neil Alexander 16035b9737
NATS JetStream tweaks (#2086)
* Use named NATS durable consumers

* Build fixes

* Remove dupe call to SetFederationAPI

* Use namespaced consumer name

* Fix namespacing

* Fix unit tests hopefully
2022-01-07 17:31:57 +00:00
S7evinK 161f145176
Add NATS JetStream support (#1866)
* Add NATS JetStream support
Update shopify/sarama

* Fix addresses

* Don't change Addresses in Defaults

* Update saramajetstream

* Add missing error check

Keep typing events for at least one minute

* Use all configured NATS addresses

* Update saramajetstream

* Try setting up with NATS

* Make sure NATS uses own persistent directory (TODO: make this configurable)

* Update go.mod/go.sum

* Jetstream package

* Various other refactoring

* Build fixes

* Config tweaks, make random jetstream storage path for CI

* Disable interest policies

* Try to sane default on jetstream base path

* Try to use in-memory for CI

* Restore storage/retention

* Update nats.go dependency

* Adapt changes to config

* Remove unneeded TopicFor

* Dep update

* Revert "Remove unneeded TopicFor"

This reverts commit f5a4e4a339.

* Revert changes made to streams

* Fix build problems

* Update nats-server

* Update go.mod/go.sum

* Roomserver input API queuing using NATS

* Fix topic naming

* Prometheus metrics

* More refactoring to remove saramajetstream

* Add missing topic

* Don't try to populate map that doesn't exist

* Roomserver output topic

* Update go.mod/go.sum

* Message acknowledgements

* Ack tweaks

* Try to resume transaction re-sends

* Try to resume transaction re-sends

* Update to matrix-org/gomatrixserverlib@91dadfb

* Remove internal.PartitionStorer from components that don't consume keychanges

* Try to reduce re-allocations a bit in resolveConflictsV2

* Tweak delivery options on RS input

* Publish send-to-device messages into correct JetStream subject

* Async and sync roomserver input

* Update dendrite-config.yaml

* Remove roomserver tests for now (they need rewriting)

* Remove roomserver test again (was merged back in)

* Update documentation

* Docker updates

* More Docker updates

* Update Docker readme again

* Fix lint issues

* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

* Go 1.16 instead of Go 1.13 for upgrade tests and Complement

* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

This reverts commit 368675283f.

* Don't report any errors on `/send` to see what fun that creates

* Fix panics on closed channel sends

* Enforce state key matches sender

* Do the same for leave

* Various tweaks to make tests happier

Squashed commit of the following:

commit 13f9028e7a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:47:14 2022 +0000

    Do the same for leave

commit e6be7f05c3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:33:42 2022 +0000

    Enforce state key matches sender

commit 85ede6d64b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 14:07:04 2022 +0000

    Fix panics on closed channel sends

commit 9755494a98
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:38:22 2022 +0000

    Don't report any errors on `/send` to see what fun that creates

commit 3bb4f87b5d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:00:26 2022 +0000

    Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

    This reverts commit 368675283f.

commit fe2673ed7b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 12:09:34 2022 +0000

    Go 1.16 instead of Go 1.13 for upgrade tests and Complement

commit 368675283f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 11:51:45 2022 +0000

    Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

commit b028dfc085
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 10:29:08 2022 +0000

    Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork

* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers

* Fix consumer component name in  federation API

* Add comment explaining where streams are defined

* Tweaks to roomserver input with comments

* Finish that sentence that I apparently forgot to finish in INSTALL.md

* Bump version number of config to 2

* Add comments around asynchronous sends to roomserver in processEventWithMissingState

* More useful error message when the config version does not match

* Set version in generate-config

* Fix version in config.Defaults

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-01-05 17:44:49 +00:00
Neil Alexander b7f09f78b0
Cherry-pick typing fix from #2061
Co-authored-by: Tommie Gannert <tommie@gannert.se>
2021-12-03 17:26:30 +00:00
Neil Alexander a9e715b5c5
Guard in all key consumers 2021-11-16 09:27:49 +00:00
Neil Alexander 837f50ac89
Reduce CPU usage of SelectStateInRange (#2038) 2021-11-03 09:53:37 +00:00