Commit graph

67 commits

Author SHA1 Message Date
Neil Alexander fe56651fa2
Merge branch 'master' into neilalexander/rstxn 2022-02-02 17:47:20 +00:00
Neil Alexander 4d9f5b2e57
Fix panic from closing the input channel before the workers complete (it'll get GC'd either way) 2022-02-02 17:46:37 +00:00
Neil Alexander 885c70c31c
Tweaks 2022-02-02 16:40:59 +00:00
Neil Alexander f49f4e55e0
Fix incorrect error check 2022-02-02 15:52:02 +00:00
Neil Alexander f58ee67a7a
Fix bugs 2022-02-02 15:42:22 +00:00
Neil Alexander 2ca972ef76
Pass through errors properly 2022-02-02 15:24:45 +00:00
Neil Alexander 15038eb2e7
Don't roll back transactions when events rejected 2022-02-02 15:04:33 +00:00
Neil Alexander 9fb2503493
good lord it gets worse 2022-02-02 14:47:39 +00:00
Neil Alexander 250a0ee946
Fill some gaps 2022-02-02 14:12:55 +00:00
Neil Alexander b4c136a9c4
Handle cases where the room does not exist 2022-02-01 13:31:45 +00:00
Neil Alexander e0a485c50d
Tweak order 2022-02-01 13:15:20 +00:00
Neil Alexander 8cbf67a2f2
Better transaction management 2022-02-01 13:14:21 +00:00
Neil Alexander f2c0bb165e
Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input 2022-02-01 12:52:37 +00:00
Neil Alexander 893aa3b141
More logging tweaks 2022-01-31 16:01:54 +00:00
Neil Alexander 07d0e72a8b
Improve roomserver logging 2022-01-31 15:33:00 +00:00
Neil Alexander d21f3eace0
Roomserver fixes (#2133)
* Improve server selection somewhat

* Remove things from the map when we're done

* Be less panicky about auth event signatures in case they are not fatal after all

* Accept HasState in all cases

* Send join asynchronously

* Revert "Send join asynchronously"

This reverts commit 5b685bfcd0.

* Joins and leaves use background context
2022-01-31 14:36:59 +00:00
Neil Alexander f9547a53d2
Tweak roomserver logging for rejected events 2022-01-31 12:01:53 +00:00
Neil Alexander ba1a9b98b7
Tweak some logging (#2130)
* Modify some log levels

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@336334f

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@cde7ac8

* Demote warning about key change producer

* Add more useful roomserver logging

* Further tweaking
2022-01-31 10:48:28 +00:00
Neil Alexander eb8e770e99
Revert consumer change 2022-01-31 10:42:41 +00:00
Neil Alexander a271fde8f5
Only limit context for fetching missing auth/prev events (#2131) 2022-01-31 10:39:33 +00:00
Neil Alexander 8e4002831f
Call hooks for outliers (#2119)
* Move hook call when processing room events

* Fix build

* Call hooks for outliers too
2022-01-28 13:11:56 +00:00
Neil Alexander e9fbad6f20
Move hook call when processing room events (#2118)
* Move hook call when processing room events

* Fix build
2022-01-28 12:33:31 +00:00
Neil Alexander 48789ebec5
Don't flood Sentry with context cancelled/deadline exceeded errors (#2115) 2022-01-28 10:27:28 +00:00
Neil Alexander a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander 16035b9737
NATS JetStream tweaks (#2086)
* Use named NATS durable consumers

* Build fixes

* Remove dupe call to SetFederationAPI

* Use namespaced consumer name

* Fix namespacing

* Fix unit tests hopefully
2022-01-07 17:31:57 +00:00
Neil Alexander a422321435
Fix panic at startup if roomserver was not given federation API reference by the time NATS consumes an event, tweak backpressure metrics 2022-01-07 13:41:53 +00:00
S7evinK 161f145176
Add NATS JetStream support (#1866)
* Add NATS JetStream support
Update shopify/sarama

* Fix addresses

* Don't change Addresses in Defaults

* Update saramajetstream

* Add missing error check

Keep typing events for at least one minute

* Use all configured NATS addresses

* Update saramajetstream

* Try setting up with NATS

* Make sure NATS uses own persistent directory (TODO: make this configurable)

* Update go.mod/go.sum

* Jetstream package

* Various other refactoring

* Build fixes

* Config tweaks, make random jetstream storage path for CI

* Disable interest policies

* Try to sane default on jetstream base path

* Try to use in-memory for CI

* Restore storage/retention

* Update nats.go dependency

* Adapt changes to config

* Remove unneeded TopicFor

* Dep update

* Revert "Remove unneeded TopicFor"

This reverts commit f5a4e4a339.

* Revert changes made to streams

* Fix build problems

* Update nats-server

* Update go.mod/go.sum

* Roomserver input API queuing using NATS

* Fix topic naming

* Prometheus metrics

* More refactoring to remove saramajetstream

* Add missing topic

* Don't try to populate map that doesn't exist

* Roomserver output topic

* Update go.mod/go.sum

* Message acknowledgements

* Ack tweaks

* Try to resume transaction re-sends

* Try to resume transaction re-sends

* Update to matrix-org/gomatrixserverlib@91dadfb

* Remove internal.PartitionStorer from components that don't consume keychanges

* Try to reduce re-allocations a bit in resolveConflictsV2

* Tweak delivery options on RS input

* Publish send-to-device messages into correct JetStream subject

* Async and sync roomserver input

* Update dendrite-config.yaml

* Remove roomserver tests for now (they need rewriting)

* Remove roomserver test again (was merged back in)

* Update documentation

* Docker updates

* More Docker updates

* Update Docker readme again

* Fix lint issues

* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

* Go 1.16 instead of Go 1.13 for upgrade tests and Complement

* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

This reverts commit 368675283f.

* Don't report any errors on `/send` to see what fun that creates

* Fix panics on closed channel sends

* Enforce state key matches sender

* Do the same for leave

* Various tweaks to make tests happier

Squashed commit of the following:

commit 13f9028e7a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:47:14 2022 +0000

    Do the same for leave

commit e6be7f05c3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:33:42 2022 +0000

    Enforce state key matches sender

commit 85ede6d64b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 14:07:04 2022 +0000

    Fix panics on closed channel sends

commit 9755494a98
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:38:22 2022 +0000

    Don't report any errors on `/send` to see what fun that creates

commit 3bb4f87b5d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:00:26 2022 +0000

    Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

    This reverts commit 368675283f.

commit fe2673ed7b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 12:09:34 2022 +0000

    Go 1.16 instead of Go 1.13 for upgrade tests and Complement

commit 368675283f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 11:51:45 2022 +0000

    Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

commit b028dfc085
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 10:29:08 2022 +0000

    Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork

* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers

* Fix consumer component name in federation API

* Add comment explaining where streams are defined

* Tweaks to roomserver input with comments

* Finish that sentence that I apparently forgot to finish in INSTALL.md

* Bump version number of config to 2

* Add comments around asynchronous sends to roomserver in processEventWithMissingState

* More useful error message when the config version does not match

* Set version in generate-config

* Fix version in config.Defaults

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-01-05 17:44:49 +00:00
Neil Alexander c3dda0779d
Return event NID from StoreEvent, match PSQL vs SQLite behaviour, tweak backfill persistence (#2071) 2021-12-09 15:03:26 +00:00
Neil Alexander 6e93531e94
Don't persist transaction IDs in the roomserver (#2048) 2021-11-22 09:13:12 +00:00
Neil Alexander 39e8d1cc6f
Track knocking in membership updater (#1935)
* Topologically sort outliers in SendEventWithState

* Knock in membership updater

* Update gomatrixserverlib

* Update gomatrixserverlib

* Get the NID of the knock event properly for the membership updater
2021-07-22 12:26:58 +01:00
kegsay f8ae391a5b
Expose more data when outputting output room events (#1916)
* Add more logging for content fields

* Fix fields
2021-07-13 11:19:21 +01:00
Neil Alexander 192a7a7923
Roomserver input backpressure metric
Squashed commit of the following:

commit 56e934ac0aeedcfb2c072010959ba49734d4e0cb
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 2 09:39:30 2021 +0100

    Fix metric

commit 3911f3a0c17b164b012e881c085ceca30f5de408
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 2 09:36:29 2021 +0100

    Register correct metric

commit a9ddbfaed421538a701151801e9451198a8be4f3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Fri Jul 2 09:33:33 2021 +0100

    Try to capture RS input backpressure metric
2021-07-02 09:48:55 +01:00
Neil Alexander 7c3991ee2f
Use a custom FIFO queue for the RS input API (#1888)
* Use a FIFO queue instead of a channel to reduce backpressure

* Make sure someone wakes up

* Tweaks

* Add comments
2021-06-28 15:11:36 +01:00
Kegsay af41f6d454
Add Sentry support (#1803)
* Add Sentry support

* Use HTTP Sentry properly maybe

* Capture panics

* Log fed Sentry stuff correctly

* British English linter
2021-03-24 10:25:24 +00:00
Neil Alexander 02e6d89cc2
Fix crash in membership updater (#1753)
* Fix nil pointer exception in membership updater

* goimports
2021-02-06 11:49:18 +00:00
Neil Alexander de5f22a469
Remove redundant check (#1748) 2021-02-04 11:12:52 +00:00
Neil Alexander 244ff0dccb
Don't create so many state snapshots when updating forward extremities (#1718)
* Light-weight checking of state changes when updating forward extremities

* Only do this for non-state events, since state events will always result in state change at extremities
2021-01-18 13:21:33 +00:00
Neil Alexander 3ac693c7a5
Add dendrite_roomserver_processroomevent_duration_millis to prometheus
Squashed commit of the following:

commit e5e2d793119733ecbcf9b85f966e018ab0318741
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Wed Jan 13 17:28:12 2021 +0000

    Add dendrite_roomserver_processroomevent_duration_millis to prometheus
2021-01-13 17:31:46 +00:00
Neil Alexander e1e34b8994
Deep-checking forward extremities (#1698) 2021-01-11 12:47:25 +00:00
Neil Alexander 2885eb0422
Don't use request context for input room event queued tasks (#1640) 2020-12-14 14:40:57 +00:00
Neil Alexander f5869daaab
Don't start more goroutines than needed on RS input, increase input worker buffer size (#1638) 2020-12-14 10:42:21 +00:00
Neil Alexander d9b3035342
Adjust latest events updater (#1623)
* Adjust latest events updater

* Populate newLatest in all cases

* Re-add existingPrevs loop
2020-12-09 13:34:37 +00:00
Kegsay b507312d4c
MSC2836 threading: part 2 (#1596)
* Update GMSL

* Add MSC2836EventRelationships to fedsender

* Call MSC2836EventRelationships in reqCtx

* auth remote servers

* Extract room ID and servers from previous events; refactor a bit

* initial cut of federated threading

* Use the right client/fed struct in the response

* Add QueryAuthChain for use with MSC2836

* Add auth chain to federated response

* Fix pointers

* under CI: more logging and enable mscs, nil fix

* Handle direction: up

* Actually send message events to the roomserver..

* Add children and children_hash to unsigned, with tests

* Add logic for exploring threads and tracking children; missing storage functions

* Implement storage functions for children

* Add fetchUnknownEvent

* Do federated hits for include_children if we have unexplored children

* Use /ev_rel rather than /event as the former includes child metadata

* Remove cross-room threading impl

* Enable MSC2836 in the p2p demo

* Namespace mscs db

* Enable msc2836 for ygg

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-12-04 14:11:01 +00:00
Kegsay 6353b0b7e4
MSC2836: Threading - part one (#1589)
* Add mscs/hooks package, begin work for msc2836

* Flesh out hooks and add SQL schema

* Begin implementing core msc2836 logic

* Add test harness

* Linting

* Implement visibility checks; stub out APIs for tests

* Flesh out testing

* Flesh out walkThread a bit

* Persist the origin_server_ts as well

* Edges table instead of relationships

* Add nodes table for event metadata

* LEFT JOIN to extract origin_server_ts for children

* Add graph walking structs

* Implement walking algorithm

* Add more graph walking tests

* Add auto_join for local rooms

* Fix create table syntax on postgres

* Add relationship_room_id|servers to the unsigned section of events

* Persist the parent room_id/servers in edge metadata

Other events cannot assert the true room_id/servers for the
parent event, only make claims to them, hence why this is
edge metadata.

* guts to pass through room_id/servers

* Refactor msc2836 to allow handling from federation

* Add JoinedVia to PerformJoin responses

* Fix tests; review comments
2020-11-19 11:34:59 +00:00
Neil Alexander 20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
S7evinK d5675feb96
Add possibility to configure MaxMessageBytes for sarama (#1563)
* Add configuration for max_message_bytes for sarama

* Log all errors when sending multiple messages

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Add missing config

* - Better comments on what MaxMessageBytes is used for
- Also sets the size the consumer may use
2020-10-27 14:11:37 +00:00
Neil Alexander 3afc623098
Fix RewritesState bug (#1557)
* Set RewritesState once

* Check if any new state provided

* Obey rewritesState

* Don't nuke everything the sync API knows when purging state

* Fix panic from duplicate insert

* Consistency

* Use HasState

* Remove nolint

* Clean up joined rooms on state rewrite
2020-10-22 10:39:16 +01:00
Neil Alexander 04dc019e5e
Don't set empty state snapshots 2020-10-21 16:21:36 +01:00
Neil Alexander 534f9a9eb6
Refactor forward extremities (#1556)
* Add resolve-state helper

* Tweaks

* Refactor forward extremities, again

* Tweaks

* Minor optimisation

* Make path a bit clearer

* Only process state/membership if forward extremities have changed

* Usage comments in resolve-state
2020-10-21 15:37:07 +01:00
Neil Alexander 6c3c621de0
Remove invalid state delta check (#1550) 2020-10-20 12:36:16 +01:00