Commit graph

41 commits

Author SHA1 Message Date
Paige Thompson 1e0e935699
add option for credentials file for NATS; more info: https://docs.nat… (#3415)
Not 100% on how you would want to test this; you would need a NATS
server configured with NKey:

https://docs.nats.io/using-nats/developer/connecting/creds

This was tested with Synadia's free NATS SaaS and it does appear to be
working, however there's an issue with how NATS is used in general:

```
time="2024-09-10T14:40:05.105105731Z" level=fatal msg="Unable to add in-memory stream" error="nats: account requires a stream config to have max bytes set" stream=DendriteInputRoomEvent subjects="[DendriteInputRoomEvent DendriteInputRoomEvent.>]"
```

I tried creating the topic manually, however dendrite insists on
deleting/recreating the topic, so getting this to work is an issue I'm
going ot have to deal with later unless somebody gets to it before then.

If you feel more competent than me and wanna draw from this PR as an
example (if you have another way you'd prefer to see this done) go ahead
feel free I just wanna see it get done and I'm not particularly good at
working with golang.

Signed-off-by: `Paige Thompson <paige@paige.bio>`
2024-09-10 21:28:04 +02:00
Neil 117ed66037
Update NATS to 2.10.20, use SyncAlways (#3418)
The internal NATS instance is definitely convenient but it does have one
problem: its lifecycle is tied to the Dendrite process. That means if
Dendrite panics or OOMs, it takes out NATS with it. I suspect this is
sometimes contributing to what people see with stuck streams, as some
operations or state might not be written to disk fully before it gets
interrupted.

Using `SyncAlways` means that NATS will effectively use `O_SYNC` and
block writes on flushes, which should improve resiliency against this
kind of failure considerably. It might affect performance a little but
shouldn't be significant.

Also updates NATS to 2.10.20 as there have been all sorts of fixes since
2.10.7, including better `SyncAlways` handling.

Signed-off-by: Neil Alexander <git@neilalexander.dev>

---------

Signed-off-by: Neil Alexander <git@neilalexander.dev>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2024-09-10 20:54:38 +02:00
Till 81f73c9f8d
Reuse existing NATS connection (#3345)
If using external NATS, we opened unnecessary connections. This now
re-uses existing connections.

[skip ci]
2024-03-22 22:33:23 +01:00
Till 1555b3542d
Introduce a new stream for the appservice consumer (#3277)
This introduces a new stream the syncAPI produces to once it processed a
`OutputRoomEvent` and the appservices consumes.
This is to work around a race condition where appservices receive an
event before the syncAPI has handled it, this can result in e.g. calls
to `/joined_members` returning a wrong membership list.
2023-12-12 12:13:55 +01:00
Neil e93bdd56fd
Set max age for roomserver input stream to avoid excessive interior deletes (#3145)
If old messages build up in the input stream and do not get processed
successfully, this can create a significant drift between the stream
first sequence and the consumer ack floors, which results in a slow and
expensive start-up when interest-based retention is in use.

If a message is sat in the stream for 24 hours, it's probably not going
to get processed successfully, so let NATS drop them instead. Dendrite
can reconcile by fetching missing events later if it needs to.

---------

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2023-07-07 19:59:34 +02:00
Till Faelligen e1d76de6c6
Increase NATS server startup timeout 2023-07-06 10:04:46 +02:00
Till 5e85a00cb3
Remove BaseDendrite (#3023)
Removes `BaseDendrite` to, hopefully, make testing and composing of
components easier in the future.
2023-03-22 09:21:32 +01:00
Till 5579121c6f
Preparations for removing BaseDendrite (#3016)
Preparations to actually remove/replace `BaseDendrite`.
Quite a few changes:
- SyncAPI accepts an `fulltext.Indexer` interface (fulltext is removed
from `BaseDendrite`)
- Caches are removed from `BaseDendrite`
- Introduces a `Router` struct (likely to change)
  - also fixes #2903
- Introduces a `sqlutil.ConnectionManager`, which should remove
`base.DatabaseConnection` later on
- probably more
2023-03-17 11:09:45 +00:00
David Schneider e6aa0955ff
Unify logging by using logrus for jetstream logs (#2976)
I guess tests for the logging is rather unusual so I omitted tests for
this change.

* [x] I have added Go unit tests or [Complement integration
tests](https://github.com/matrix-org/complement) for this PR _or_ I have
justified why this PR doesn't need tests
* [x] Pull request includes a [sign off below using a legally
identifiable
name](https://matrix-org.github.io/dendrite/development/contributing#sign-off)
_or_ I have already signed off privately

Signed-off-by: `David Schneider <dsbrng25b@gmail.com>`

---------

Signed-off-by: David Schneider <dsbrng25b@gmail.com>
2023-02-24 08:56:53 +01:00
Till 48fa869fa3
Use t.TempDir for SQLite databases, so tests don't rip out each others databases (#2950)
This should hopefully finally fix issues about `disk I/O error` as seen
[here](https://gitlab.alpinelinux.org/alpine/aports/-/jobs/955030/raw)

Hopefully this will also fix `SSL accept attempt failed` issues by
disabling HTTP keep alives when generating a config for CI.
2023-01-23 13:17:15 +01:00
Neil Alexander a916b041b1
Detect consumer being deleted in JetStreamConsumer 2022-11-16 10:28:22 +00:00
Till 8c0c3441d8
Add RoomEventType nats.Header to avoid unneeded unmarshalling (#2765) 2022-10-05 12:12:42 +02:00
Neil Alexander 3da182212e
Track reasons why the process is in a degraded state 2022-10-04 13:02:41 +01:00
Till 249b32c4f3
Refactor notifications (#2688)
This PR changes the handling of notifications
- removes the `StreamEvent` and `ReadUpdate` stream
- listens on the `OutputRoomEvent` stream in the UserAPI to inform the
SyncAPI about unread notifications
- listens on the `OutputReceiptEvent` stream in the UserAPI to set
receipts/update notifications
- sets the `read_markers` directly from within the internal UserAPI

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-09-27 15:01:34 +02:00
Till d5876abbe9
Fulltext implementation incl. config (#2480)
This adds the main component of the fulltext search.
This PR doesn't do anything yet, besides creating an empty fulltextindex
folder if enabled. Indexing events is done in a separate PR.
2022-09-07 18:15:54 +02:00
Till 2cfcfddecc
Add a SigningKeyUpdate producer (#2697)
This adds a new stream for signing key updates, this should ensure we
don't lose any updates over federation.
2022-09-07 11:45:12 +02:00
Neil Alexander ad6b902b84
Refactor appservices component (#2687)
This PR refactors the app services component. It makes the following changes:

* Each appservice now gets its own NATS JetStream consumer
* The appservice database is now removed entirely, since we just use JetStream as a data source instead
* The entire component is now much simpler and we deleted lots of lines of code 💅

The result is that it should be much lighter and hopefully much more performant.
2022-09-01 09:20:40 +01:00
Neil Alexander 175f65407a
Allow batching in JetStreamConsumer (#2686)
This allows us to receive more than one message from NATS at a time if we want.
2022-08-31 12:21:56 +01:00
Brian Meek de78eab63a
Add race testing to tests, and fix a few small race conditions in the tests (#2587)
* Add race testing to tests, and fix a few small race conditions in the tests

* Enable run-sytest on MacOS

* Remove deadlock detecting mutex, per code review feedback

* Remove autoformatting related changes and a closure that is not needed

* Adjust to importing nats client as 'natsclient'

Signed-off-by: Brian Meek <brian@hntlabs.com>

* Clarify the use of gooseMutex to proect goose internal state

Signed-off-by: Brian Meek <brian@hntlabs.com>

* Remove no longer needed mutex for guarding goose

Signed-off-by: Brian Meek <brian@hntlabs.com>
2022-08-05 09:19:33 +01:00
Till 7ec70272d2
Disable NATS Server logging, allow self-signed certificates (#2605)
* Disable NATS Server logs in CI

* Add option to disable TLS validation for NATS
2022-08-02 13:58:08 +02:00
Neil Alexander 7120eb6bc9
Add InputDeviceListUpdate to the keyserver, remove old input API (#2536)
* Add `InputDeviceListUpdate` to the keyserver, remove old input API

* Fix copyright

* Log more information when a device list update fails
2022-06-15 14:27:07 +01:00
kegsay 236b16aa6c
Begin adding syncapi component tests (#2442)
* Add very basic syncapi tests

* Add a way to inject jetstream messages

* implement add_state_ids

* bugfixes

* Unbreak tests

* Remove now un-needed API call

* Linting
2022-05-09 17:23:02 +01:00
Neil Alexander 09d754cfbf
One NATS instance per BaseDendrite (#2438)
* One NATS instance per `BaseDendrite`

* Fix roomserver
2022-05-09 14:15:24 +01:00
Neil Alexander 923f789ca3
Fix graceful shutdown 2022-04-27 15:29:49 +01:00
Neil Alexander d7cc187ec0
Prevent JetStream from handling OS signals, allow running as a Windows service (#2385)
* Prevent JetStream from handling OS signals, allow running as a Windows service (fixes #2374)

* Remove double import
2022-04-27 13:36:40 +01:00
kegsay 7499147550
Add test infrastructure code for dendrite unit/integ tests (#2331)
* Add test infrastructure code for dendrite unit/integ tests

Start re-enabling some syncapi storage tests in the process.

* Linting

* Add postgres service to unit tests

* dendrite not syncv3

* Skip test which doesn't work

* Linting

* Add `jetstream.PrepareForTests`

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-08 10:12:30 +01:00
Till e5e3350ce1
Add presence module V2 (#2312)
* Syncapi presence

* Clientapi http presence handler

* Why is this here?

* Missing files

* FederationAPI presence implementation

* Add new presence stream

* Pinecone update

* Pinecone update

* Add passing tests

* Make linter happy

* Add presence producer

* Add presence config option

* Set user to unavailable after x minutes

* Only set currently_active if online
Avoid unneeded presence updates when syncing

* Tweaks

* Query devices for last_active_ts
Fixes & tweaks

* Export SharedUsers/SharedUsers

* Presence stream in MemoryStorage

* Remove status_msg_nil

* Fix sytest crashes

* Make presence types const and use stringer for it

* Change options to allow inbound/outbound presence

* Fix option & typo

* Update configs

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-04-06 13:11:19 +02:00
S7evinK 49dc49b232
Remove eduserver (#2306)
* Move receipt sending to own JetStream producer

* Move SendToDevice to producer

* Remove most parts of the EDU server

* Fix SendToDevice & copyrights

* Move structs, cleanup EDU Server traces

* Use HeadersOnly subscription

* Missing file

* Fix linter issues

* Move consumers to own files

* Rename durable consumer; Consumer cleanup

* Docs/config cleanup
2022-03-29 14:14:35 +02:00
Neil Alexander e6d4bdeed5
Try to recover from corrupted NATS streams in memory temporarily (#2301) 2022-03-25 12:24:21 +00:00
Neil Alexander 98a5e410d7
Per-room consumers (#2293)
* Roomserver input refactoring — again!

* Ensure the actor runs again

* Preserve consumer after unsubscribe

* Another sprinkling of magic

* Rename `TopicFor` to `Prefixed`

* Recreate the stream if the config is bad

* Check streams too

* Prefix subjects, preserve inboxes

* Recreate if subjects wrong

* Remove stream subject

* Reconstruct properly

* Fix mutex unlock

* Comments

* Fix tests

* Don't drop events

* Review comments

* Separate `queueInputRoomEvents` function

* Re-jig control flow a bit
2022-03-23 10:20:18 +00:00
Neil Alexander 9572f5ed19
Wait for safe shutdown of NATS Server (#2289) 2022-03-21 10:32:34 +00:00
Neil Alexander e30aa38fb0
Stream tweaks, use same codepath for sync vs async input room events, wait for error response via NATS messages (#2283) 2022-03-16 14:21:11 +00:00
Neil Alexander 626d3f6cf5
Capture Sentry exceptions for errors in JetStreamConsumer 2022-03-07 16:40:56 +00:00
Dan f05ce478f0
Implement Push Notifications (#1842)
* Add Pushserver component with Pushers API

Co-authored-by: Tommie Gannert <tommie@gannert.se>
Co-authored-by: Dan Peleg <dan@globekeeper.com>

* Wire Pushserver component

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>

* Add PushGatewayClient.

The full event format is required for Sytest.

* Add a pushrules module.

* Change user API account creation to use the new pushrules module's defaults.

Introduces "scope" as required by client API, and some small field
tweaks to make some 61push Sytests pass.

* Add push rules query/put API in Pushserver.

This manipulates account data over User API, and fires sync messages
for changes. Those sync messages should, according to an existing TODO
in clientapi, be moved to userapi.

Forks clientapi/producers/syncapi.go to pushserver/ for later extension.

* Add clientapi routes for push rules to Pushserver.

A cleanup would be to move more of the name-splitting logic into
pushrules.go, to depollute routing.go.

* Output rooms.join.unread_notifications in /sync.

This is the read-side. Pushserver will be the write-side.

* Implement pushserver/storage for notifications.

* Use PushGatewayClient and the pushrules module in Pushserver's room consumer.

* Use one goroutine per user to avoid locking up the entire server for
  one bad push gateway.
* Split pushing by format.
* Send one device per push. Sytest does not support coalescing
  multiple devices into one push. Matches Synapse. Either we change
  Sytest, or remove the group-by-url-and-format logic.
* Write OutputNotificationData from push server. Sync API is already
  the consumer.

* Implement read receipt consumers in Pushserver.

Supports m.read and m.fully_read receipts.

* Add clientapi route for /unstable/notifications.

* Rename to UpsertPusher for clarity and handle pusher update

* Fix linter errors

* Ignore body.Close() error check

* Fix push server internal http wiring

* Add 40 newly passing 61push tests to whitelist

* Add next 12 newly passing 61push tests to whitelist

* Send notification data before notifying users in EDU server consumer

* NATS JetStream

* Goodbye sarama

* Fix `NewStreamTokenFromString`

* Consume on the correct topic for the roomserver

* Don't panic, NAK instead

* Move push notifications into the User API

* Don't set null values since that apparently causes Element upsetti

* Also set omitempty on conditions

* Fix bug so that we don't override the push rules unnecessarily

* Tweak defaults

* Update defaults

* More tweaks

* Move `/notifications` onto `r0`/`v3` mux

* User API will consume events and read/fully read markers from the sync API with stream positions, instead of consuming directly

Co-authored-by: Piotr Kozimor <p1996k@gmail.com>
Co-authored-by: Tommie Gannert <tommie@gannert.se>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-03-03 11:40:53 +00:00
Neil Alexander 934491eda5
Update NATS Server to v2.7.2 (#2193)
* Update NATS JetStream to v2.7.2

* Remove deprecated option
2022-02-17 13:15:35 +00:00
S7evinK 9de7efa0b0
Remove sarama/saramajetstream dependencies (#2138)
* Remove dependency on saramajetstream & sarama

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Remove internal.ContinualConsumer from federationapi

* Remove internal.ContinualConsumer from syncapi

* Remove internal.ContinualConsumer from keyserver

* Move to new Prepare function

* Remove saramajetstream & sarama dependency

* Delete unneeded file

* Remove duplicate import

* Log error instead of silently irgnoring it

* Move `OffsetNewest` and `OffsetOldest` into keyserver types, change them to be more sane values

* Fix comments

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-02-04 13:08:13 +00:00
Neil Alexander c773b038bb
Use pull consumers (#2140)
* Pull consumers

* Pull consumers

* Only nuke consumers if they are push consumers

* Clean up old consumers

* Better error handling

* Update comments
2022-02-02 13:32:48 +00:00
Neil Alexander a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes:

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander 49a618dfe2
Increase maximum message size to 16MB (#2109) 2022-01-25 14:20:12 +00:00
Neil Alexander 16035b9737
NATS JetStream tweaks (#2086)
* Use named NATS durable consumers

* Build fixes

* Remove dupe call to SetFederationAPI

* Use namespaced consumer name

* Fix namespacing

* Fix unit tests hopefully
2022-01-07 17:31:57 +00:00
S7evinK 161f145176
Add NATS JetStream support (#1866)
* Add NATS JetStream support
Update shopify/sarama

* Fix addresses

* Don't change Addresses in Defaults

* Update saramajetstream

* Add missing error check

Keep typing events for at least one minute

* Use all configured NATS addresses

* Update saramajetstream

* Try setting up with NATS

* Make sure NATS uses own persistent directory (TODO: make this configurable)

* Update go.mod/go.sum

* Jetstream package

* Various other refactoring

* Build fixes

* Config tweaks, make random jetstream storage path for CI

* Disable interest policies

* Try to sane default on jetstream base path

* Try to use in-memory for CI

* Restore storage/retention

* Update nats.go dependency

* Adapt changes to config

* Remove unneeded TopicFor

* Dep update

* Revert "Remove unneeded TopicFor"

This reverts commit f5a4e4a339.

* Revert changes made to streams

* Fix build problems

* Update nats-server

* Update go.mod/go.sum

* Roomserver input API queuing using NATS

* Fix topic naming

* Prometheus metrics

* More refactoring to remove saramajetstream

* Add missing topic

* Don't try to populate map that doesn't exist

* Roomserver output topic

* Update go.mod/go.sum

* Message acknowledgements

* Ack tweaks

* Try to resume transaction re-sends

* Try to resume transaction re-sends

* Update to matrix-org/gomatrixserverlib@91dadfb

* Remove internal.PartitionStorer from components that don't consume keychanges

* Try to reduce re-allocations a bit in resolveConflictsV2

* Tweak delivery options on RS input

* Publish send-to-device messages into correct JetStream subject

* Async and sync roomserver input

* Update dendrite-config.yaml

* Remove roomserver tests for now (they need rewriting)

* Remove roomserver test again (was merged back in)

* Update documentation

* Docker updates

* More Docker updates

* Update Docker readme again

* Fix lint issues

* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

* Go 1.16 instead of Go 1.13 for upgrade tests and Complement

* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

This reverts commit 368675283f.

* Don't report any errors on `/send` to see what fun that creates

* Fix panics on closed channel sends

* Enforce state key matches sender

* Do the same for leave

* Various tweaks to make tests happier

Squashed commit of the following:

commit 13f9028e7a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:47:14 2022 +0000

    Do the same for leave

commit e6be7f05c3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:33:42 2022 +0000

    Enforce state key matches sender

commit 85ede6d64b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 14:07:04 2022 +0000

    Fix panics on closed channel sends

commit 9755494a98
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:38:22 2022 +0000

    Don't report any errors on `/send` to see what fun that creates

commit 3bb4f87b5d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:00:26 2022 +0000

    Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

    This reverts commit 368675283f.

commit fe2673ed7b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 12:09:34 2022 +0000

    Go 1.16 instead of Go 1.13 for upgrade tests and Complement

commit 368675283f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 11:51:45 2022 +0000

    Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

commit b028dfc085
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 10:29:08 2022 +0000

    Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork

* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers

* Fix consumer component name in  federation API

* Add comment explaining where streams are defined

* Tweaks to roomserver input with comments

* Finish that sentence that I apparently forgot to finish in INSTALL.md

* Bump version number of config to 2

* Add comments around asynchronous sends to roomserver in processEventWithMissingState

* More useful error message when the config version does not match

* Set version in generate-config

* Fix version in config.Defaults

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-01-05 17:44:49 +00:00