Commit graph

88 commits

Author SHA1 Message Date
Neil Alexander 95eb3545d6
Tweak metrics 2022-01-17 13:39:34 +00:00
Neil Alexander 3a62494082
Report errors on sending to RS input 2022-01-17 11:53:42 +00:00
Neil Alexander 14327faaf6
Submit async 2022-01-17 11:45:03 +00:00
Neil Alexander 8236478dc3
Simplify /send endpoint significantly 2022-01-17 11:40:42 +00:00
Neil Alexander 0f5049279c
Clean up some commented out bits 2022-01-17 11:06:22 +00:00
Neil Alexander ad19c2b81a
Merge branch 'master' into neilalexander/federationinput 2022-01-06 13:14:48 +00:00
S7evinK 161f145176
Add NATS JetStream support (#1866)
* Add NATS JetStream support
Update shopify/sarama

* Fix addresses

* Don't change Addresses in Defaults

* Update saramajetstream

* Add missing error check

Keep typing events for at least one minute

* Use all configured NATS addresses

* Update saramajetstream

* Try setting up with NATS

* Make sure NATS uses own persistent directory (TODO: make this configurable)

* Update go.mod/go.sum

* Jetstream package

* Various other refactoring

* Build fixes

* Config tweaks, make random jetstream storage path for CI

* Disable interest policies

* Try to sane default on jetstream base path

* Try to use in-memory for CI

* Restore storage/retention

* Update nats.go dependency

* Adapt changes to config

* Remove unneeded TopicFor

* Dep update

* Revert "Remove unneeded TopicFor"

This reverts commit f5a4e4a339.

* Revert changes made to streams

* Fix build problems

* Update nats-server

* Update go.mod/go.sum

* Roomserver input API queuing using NATS

* Fix topic naming

* Prometheus metrics

* More refactoring to remove saramajetstream

* Add missing topic

* Don't try to populate map that doesn't exist

* Roomserver output topic

* Update go.mod/go.sum

* Message acknowledgements

* Ack tweaks

* Try to resume transaction re-sends

* Try to resume transaction re-sends

* Update to matrix-org/gomatrixserverlib@91dadfb

* Remove internal.PartitionStorer from components that don't consume keychanges

* Try to reduce re-allocations a bit in resolveConflictsV2

* Tweak delivery options on RS input

* Publish send-to-device messages into correct JetStream subject

* Async and sync roomserver input

* Update dendrite-config.yaml

* Remove roomserver tests for now (they need rewriting)

* Remove roomserver test again (was merged back in)

* Update documentation

* Docker updates

* More Docker updates

* Update Docker readme again

* Fix lint issues

* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

* Go 1.16 instead of Go 1.13 for upgrade tests and Complement

* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

This reverts commit 368675283f.

* Don't report any errors on `/send` to see what fun that creates

* Fix panics on closed channel sends

* Enforce state key matches sender

* Do the same for leave

* Various tweaks to make tests happier

Squashed commit of the following:

commit 13f9028e7a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:47:14 2022 +0000

    Do the same for leave

commit e6be7f05c3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 15:33:42 2022 +0000

    Enforce state key matches sender

commit 85ede6d64b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 14:07:04 2022 +0000

    Fix panics on closed channel sends

commit 9755494a98
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:38:22 2022 +0000

    Don't report any errors on `/send` to see what fun that creates

commit 3bb4f87b5d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 13:00:26 2022 +0000

    Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"

    This reverts commit 368675283f.

commit fe2673ed7b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 12:09:34 2022 +0000

    Go 1.16 instead of Go 1.13 for upgrade tests and Complement

commit 368675283f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 11:51:45 2022 +0000

    Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that

commit b028dfc085
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Tue Jan 4 10:29:08 2022 +0000

    Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)

* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork

* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers

* Fix consumer component name in  federation API

* Add comment explaining where streams are defined

* Tweaks to roomserver input with comments

* Finish that sentence that I apparently forgot to finish in INSTALL.md

* Bump version number of config to 2

* Add comments around asynchronous sends to roomserver in processEventWithMissingState

* More useful error message when the config version does not match

* Set version in generate-config

* Fix version in config.Defaults

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-01-05 17:44:49 +00:00
Neil Alexander df6a60f35e
Don't get stuck on mutexes: 2021-12-15 15:45:53 +00:00
Neil Alexander 2203dd9d8a
Sorta transplanted the code over 2021-12-15 13:59:58 +00:00
Neil Alexander bb5ee2ecc4
Remove retrieveMissingAuthEvents from federation API 2021-12-14 11:40:02 +00:00
Neil Alexander 323a6fb54f
Resume federation sends (#2039)
* Resume federation sends

* Review comments

* Fix build error
2021-11-08 09:24:16 +00:00
Neil Alexander fbd1a0ab13
Update to matrix-org/gomatrixserverlib@5e02b64 2021-11-02 10:13:38 +00:00
Ryan W a624eab309
- Removed double imports (#1989)
- Lower cased error messages

Signed-off-by: Ryan Whittington <twentybitdev@gmail.com>

Co-authored-by: kegsay <kegan@matrix.org>
2021-09-08 17:31:03 +01:00
kegsay 7dc8fb1fe7
Add more logs (#2005)
* Add more logs

To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.

* Fix query
2021-09-07 15:07:14 +01:00
Neil Alexander ff21675c5b
Cross-signing fixes, notifications via sync, federation (#1974)
* Initial work on signing key update EDUs

* Fix build

* Produce/consume EDUs

* Producer logging

* Only produce key change notifications for local users

* Better naming

* Try to notify sync

* Enable feature

* Use key change topic

* Don't bother verifying signatures, validate key lengths if we can, notifier fixes

* Copyright notices

* Remove tests from whitelist until matrix-org/sytest#1117

* Some review comment fixes

* Update to matrix-org/gomatrixserverlib@f9416ac

* Remove unneeded parameter
2021-08-17 13:44:30 +01:00
Neil Alexander c8408a6387
Add more optimised code path for checking if we're in a room (#1909)
* Add more optimised code path for checking if we're in a room

* Fix database queries

* Fix federation API test

* Fix logging

* Review comments

* Make separate API call for room membership
2021-07-09 16:36:45 +01:00
Neil Alexander f2974721d5
Fix concurrent map reads/writes on t.hadEvents (#1902)
* Fix concurrent map reads/writes on t.hadEvents

* Add hadEvent function
2021-07-07 18:55:44 +01:00
Neil Alexander bcd3ef38d0
Track expiry rate on pduCountTotal 2021-07-05 13:47:37 +01:00
Neil Alexander 99d8e1c107
Federation API fixes (#1899)
* Ensure worker has work before starting goroutine

* Revert "Remove processEventWithMissingStateMutex"

This reverts commit 7f02eab47d.

* Use request context when processing transactions

* Keep goroutine count down by not starting work for things where the caller gave up

* Remove mutex, start workers at correct time
2021-07-05 12:14:31 +01:00
Neil Alexander 7f02eab47d
Remove processEventWithMissingStateMutex 2021-07-05 09:14:24 +01:00
Neil Alexander 57320897cb
Federation API workers for /send to reduce memory usage (#1897)
* Try to process rooms concurrently in FS /send

* Clean up

* Use request context so that dead things don't linger for so long

* Remove mutex

* Free up pdus slice so only references remaining are in channel

* Revert "Remove mutex"

This reverts commit 8558075e8c.

* Process EDUs in parallel

* Try refactoring /send concurrency

* Fix waitgroup

* Release on waitgroup

* Respond to transaction

* Reduce CPU usage, fix unit tests

* Tweaks

* Move into one file
2021-07-02 12:33:27 +01:00
Neil Alexander 2647f6e9c5
Fix concurrent map read/write on haveEvents (#1893) 2021-06-30 12:32:20 +01:00
Neil Alexander b7a2d369c0
Change how servers are selected for missing auth/prev events (#1892)
* Change how servers are selected for missing auth/prev events

* Shuffle order

* Move ServersInRoomProvider into api package
2021-06-30 12:05:58 +01:00
Neil Alexander 0e69212206
Give up on loops when the context expires (#1891) 2021-06-30 10:39:47 +01:00
Neil Alexander 3afb161352
Reduce memory usage in federation /send endpoint (#1890)
* More aggressive event caching

* Deduplicate /state results

* Deduplicate more

* Ensure we use the correct list of events when excluding repeated state

* Fixes

* Ensure we track all events we already knew about properly
2021-06-30 10:01:56 +01:00
Neil Alexander e2b6a90d90
Put gmectx back to 5 minutes 2021-06-29 10:22:26 +01:00
Neil Alexander f645646ca9
Restore the getServers RS query (needs optimisation) 2021-06-29 09:37:28 +01:00
Neil Alexander 4417f24678
Protect processEventWithMissingState with per-room mutex, to prevent mass CPU burn/RAM usage
Squashed commit of the following:

commit 7fad77c10e3c1c78feddb37351812b209d9c0f25
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 15:06:52 2021 +0100

    Fix processEventWithMissingStateMutexes

commit 138cddcac7b8373a8e1816a232f84a7bda6adcdf
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:59:44 2021 +0100

    Use internal.MutexByRoom

commit 6e6f026cfad31da391ad261cfec16d41dff1b15b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:50:18 2021 +0100

    Try to slow things down per room

commit b97d406dff2e11769a9202fbf58b138a541ca449
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:41:27 2021 +0100

    Try to slow things down

commit 8866120ebf880b4fd8a456937f69903e233c19a2
Merge: 9f2de8a2 4a37b19a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:40:33 2021 +0100

    Merge branch 'neilalexander/rsinputfifo' into neilalexander/rsinputfifo2

commit 4a37b19a8f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:34:54 2021 +0100

    Add comments

commit f9ab3f4b81
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:31:21 2021 +0100

    Tweaks

commit 9f2de8a29cadec4c785d9c2e4e74c1138305f759
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:15:59 2021 +0100

    Ask origin only for missing things for now

commit 8fd878c75a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 11:18:11 2021 +0100

    Make sure someone wakes up

commit b63f699f1b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 11:12:58 2021 +0100

    Use a FIFO queue instead of a channel to reduce backpressure
2021-06-28 15:11:59 +01:00
Neil Alexander 080ae6a829
Move room mutex in federation API (#1830)
* Move room mutex in federation API to surround resolveStatesAndCheck

* Guard processEventWithMissingState instead

* Revert "Guard processEventWithMissingState instead"

This reverts commit 0ce88036aa.
2021-04-13 11:13:07 +01:00
Kegsay b769d5a25e
Optimise memory usage when calling /g_m_e (#1819)
* Optimise memory usage when calling /g_m_e

* cache more events

* refactor handling of device list update pokes

* Sigh
2021-04-08 13:50:39 +01:00
Kegsay f8d3a762c4
Add a per-room mutex to federationapi when processing transactions (#1810)
* Add a per-room mutex to federationapi when processing transactions

This has numerous benefits:
 - Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
   in a state resolution being performed immediately, without checking if we were already doing this in
   a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
 - Prevents memory usage from growing too large as a result and potentially OOMing.

And costs:
 - High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
   though this has always been an issue as roomserver has a per-room mutex already.

* Fix unit tests

* Correct mutex lock ordering
2021-03-30 10:01:32 +01:00
Kegsay af41f6d454
Add Sentry support (#1803)
* Add Sentry support

* Use HTTP Sentry properly maybe

* Capture panics

* Log fed Sentry stuff correctly

* British english linter
2021-03-24 10:25:24 +00:00
Kegsay 802f1c96f8
Add more metrics (#1802)
* Add more metrics

* Linting
2021-03-23 15:22:00 +00:00
Kegsay a1b7e4ef3f
log less for failed key querys, add counters for incoming pdus/edus (#1801)
* log less for failed key querys, add counters for incoming pdus/edus

* use labels

* Blacklist flakey test

* Fix metrics
2021-03-23 11:33:36 +00:00
Neil Alexander d15836e260
Increase gocyclo complexity to 25 (and remove all but 2 golint directives related to it) (#1783) 2021-03-03 14:35:57 +00:00
Neil Alexander 5d74a1757f
Don't query for servers so often in /send (#1766)
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events

* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks

* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254

* Fix resolve-state

* Initialise t.servers on first use
2021-02-16 17:12:17 +00:00
Neil Alexander 05324b6861
Send/state tweaks (#1681)
* Check missing event count

* Don't use request context for /send
2021-01-04 13:47:48 +00:00
Neil Alexander b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
Neil Alexander 265cf5e835
Protect txnReq.newEvents with mutex (#1587)
* Protect txnReq.newEvents and txnReq.haveEvents with mutex

* Missing defer

* Remove t.haveEventsMutex
2020-11-18 11:31:58 +00:00
Neil Alexander 20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
S7evinK bcb89ada5e
Implement read receipts (#1528)
* fix conversion from int to string yields a string of one rune, not a string of digits

* Add receipts table to syncapi

* Use StreamingToken as the since value

* Add required method to testEDUProducer

* Make receipt json creation "easier" to read

* Add receipts api to the eduserver

* Add receipts endpoint

* Add eduserver kafka consumer

* Add missing kafka config

* Add passing tests to whitelist

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Fix copy & paste error

* Fix column count error

* Make outbound federation receipts pass

* Make "Inbound federation rejects receipts from wrong remote" pass

* Don't use errors package

* - Add TODO for batching requests
- Rename variable

* Return a better error message

* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright

* Better creation/usage of JoinResponse

* Query all joined rooms instead of just one

* Update gomatrixserverlib

* Add sqlite3 migration

* Add postgres migration

* Ensure required sequence exists before running migrations

* Clarification on comment

* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}

* Remove dead code
Use key for timestamp

* Fix postgres query...

* Remove single purpose struct

* Use key/value directly

* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.

* Actually update the id, so it is correctly send in syncs

* Set receipt on request to /read_markers

* Fix issue with receipts getting overwritten

* Use fmt.Errorf instead of pkg/errors

* Revert "Add postgres migration"

This reverts commit 722fe5a046.

* Revert "Add sqlite3 migration"

This reverts commit d113b03f64.

* Fix selectRoomReceipts query

* Make golangci-lint happy

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-09 18:46:11 +00:00
Neil Alexander 6e63df1d9a
KindOld (#1531)
* Add KindOld

* Don't process latest events/memberships for old events

* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events

* Signal to downstream components if an event has become a forward extremity

* Don't exclude from sync

* Soft-fail checks on KindNew

* Don't run the latest events updater at all for KindOld

* Don't make federation sender change after all

* Kind in federation sender join

* Don't send isForwardExtremity

* Fix syncapi

* Update comments

* Fix SendEventWithState

* Update sytest-whitelist

* Generate old output events

* Sync API consumes old room events

* Update comments
2020-10-19 14:59:13 +01:00
Neil Alexander 10f1beb0de
Don't re-run state resolution on a single trusted state snapshot (#1526)
* Don't re-run state resolution on a single trusted state snapshot

* Lint

* Check if backward extremity is create event before checking missing state
2020-10-15 12:08:49 +01:00
Neil Alexander 6f12b8f85c
Ignore typing events where sender doesn't match origin (#1523)
* Ignore typing notifications where the sender doesn't match the origin

* Update sytest-whitelist

* Fix formatting directives
2020-10-14 16:49:25 +01:00
Neil Alexander 7a1fd123de
Improved state handling in /send (#1521)
* Capture errors

* Don't request only state key tuples needed for auth (we end up discarding room state this way)

* QueryStateAfterEvent returns all state when no tuples supplied

* Resolve state

* Comments
2020-10-14 12:39:37 +01:00
Neil Alexander 9d6b77c58a
Try to retrieve missing auth events from multiple servers (#1516)
* Recursively fetch auth events if needed

* Fix processEvent call

* Ask more servers in lookupEvent

* Don't panic!

* Panic at the Disco

* Find servers more aggressively

* Add getServers

* Fix number of servers to 5, don't bail making RespState if auth events missing

* Fix panic

* Ignore missing state events too

* Report number of servers correctly

* Don't reuse request context for /send_join

* Update federation API tests

* Don't recurse processEvents

* Implement getEvents differently
2020-10-13 11:53:20 +01:00
Neil Alexander 8001627cfc
Get missing event tweaks (#1514)
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events

* Not finished refactor

* Fix test

* Remove isInboundTxn

* Remove debug logging
2020-10-12 15:56:15 +01:00
Neil Alexander 4df7e345bb
Only return 500 on /send if a database error occurs (#1503) 2020-10-09 15:06:43 +01:00
Neil Alexander 28454d6fb7
Log origin in /send 2020-10-02 11:38:35 +01:00
Neil Alexander 135b5e264f
Fix panic on verifySigError in fetching missing events 2020-09-30 13:51:54 +01:00