Commit graph

144 commits

Author SHA1 Message Date
Neil Alexander 57320897cb
Federation API workers for /send to reduce memory usage (#1897)
* Try to process rooms concurrently in FS /send

* Clean up

* Use request context so that dead things don't linger for so long

* Remove mutex

* Free up pdus slice so only references remaining are in channel

* Revert "Remove mutex"

This reverts commit 8558075e8c.

* Process EDUs in parallel

* Try refactoring /send concurrency

* Fix waitgroup

* Release on waitgroup

* Respond to transaction

* Reduce CPU usage, fix unit tests

* Tweaks

* Move into one file
2021-07-02 12:33:27 +01:00
Neil Alexander 2647f6e9c5
Fix concurrent map read/write on haveEvents (#1893) 2021-06-30 12:32:20 +01:00
Neil Alexander b7a2d369c0
Change how servers are selected for missing auth/prev events (#1892)
* Change how servers are selected for missing auth/prev events

* Shuffle order

* Move ServersInRoomProvider into api package
2021-06-30 12:05:58 +01:00
Neil Alexander 0e69212206
Give up on loops when the context expires (#1891) 2021-06-30 10:39:47 +01:00
Neil Alexander 3afb161352
Reduce memory usage in federation /send endpoint (#1890)
* More aggressive event caching

* Deduplicate /state results

* Deduplicate more

* Ensure we use the correct list of events when excluding repeated state

* Fixes

* Ensure we track all events we already knew about properly
2021-06-30 10:01:56 +01:00
Neil Alexander e2b6a90d90
Put gmectx back to 5 minutes 2021-06-29 10:22:26 +01:00
Neil Alexander f645646ca9
Restore the getServers RS query (needs optimisation) 2021-06-29 09:37:28 +01:00
Neil Alexander 4417f24678
Protect processEventWithMissingState with per-room mutex, to prevent mass CPU burn/RAM usage
Squashed commit of the following:

commit 7fad77c10e3c1c78feddb37351812b209d9c0f25
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 15:06:52 2021 +0100

    Fix processEventWithMissingStateMutexes

commit 138cddcac7b8373a8e1816a232f84a7bda6adcdf
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:59:44 2021 +0100

    Use internal.MutexByRoom

commit 6e6f026cfad31da391ad261cfec16d41dff1b15b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:50:18 2021 +0100

    Try to slow things down per room

commit b97d406dff2e11769a9202fbf58b138a541ca449
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:41:27 2021 +0100

    Try to slow things down

commit 8866120ebf880b4fd8a456937f69903e233c19a2
Merge: 9f2de8a2 4a37b19a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:40:33 2021 +0100

    Merge branch 'neilalexander/rsinputfifo' into neilalexander/rsinputfifo2

commit 4a37b19a8f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:34:54 2021 +0100

    Add comments

commit f9ab3f4b81
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:31:21 2021 +0100

    Tweaks

commit 9f2de8a29cadec4c785d9c2e4e74c1138305f759
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 13:15:59 2021 +0100

    Ask origin only for missing things for now

commit 8fd878c75a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 11:18:11 2021 +0100

    Make sure someone wakes up

commit b63f699f1b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date:   Mon Jun 28 11:12:58 2021 +0100

    Use a FIFO queue instead of a channel to reduce backpressure
2021-06-28 15:11:59 +01:00
Neil Alexander 080ae6a829
Move room mutex in federation API (#1830)
* Move room mutex in federation API to surround resolveStatesAndCheck

* Guard processEventWithMissingState instead

* Revert "Guard processEventWithMissingState instead"

This reverts commit 0ce88036aa.
2021-04-13 11:13:07 +01:00
Kegsay b769d5a25e
Optimise memory usage when calling /g_m_e (#1819)
* Optimise memory usage when calling /g_m_e

* cache more events

* refactor handling of device list update pokes

* Sigh
2021-04-08 13:50:39 +01:00
Bruce MacDonald d27607af78
Implement OpenID module (#599) (#1812)
* Implement OpenID module (#599)

- Unrelated: change Riot references to Element in client API routing

Signed-off-by: Bruce MacDonald <contact@bruce-macdonald.com>

* OpenID module tweaks (#599)

- specify expiry is ms rather than vague ts
- add OpenID token lifetime to configuration
- use Go naming conventions for the path params
- store plaintext token rather than hash
- remove openid table sqllite mutex

* Add default OpenID token lifetime (#599)

* Update dendrite-config.yaml

Co-authored-by: Kegsay <kegsay@gmail.com>
Co-authored-by: Kegsay <kegan@matrix.org>
2021-04-07 13:26:20 +01:00
Kegsay f8d3a762c4
Add a per-room mutex to federationapi when processing transactions (#1810)
* Add a per-room mutex to federationapi when processing transactions

This has numerous benefits:
 - Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
   in a state resolution being performed immediately, without checking if we were already doing this in
   a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
 - Prevents memory usage from growing too large as a result and potentially OOMing.

And costs:
 - High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
   though this has always been an issue as roomserver has a per-room mutex already.

* Fix unit tests

* Correct mutex lock ordering
2021-03-30 10:01:32 +01:00
Kegsay af41f6d454
Add Sentry support (#1803)
* Add Sentry support

* Use HTTP Sentry properly maybe

* Capture panics

* Log fed Sentry stuff correctly

* British english linter
2021-03-24 10:25:24 +00:00
Kegsay 802f1c96f8
Add more metrics (#1802)
* Add more metrics

* Linting
2021-03-23 15:22:00 +00:00
Kegsay a1b7e4ef3f
log less for failed key querys, add counters for incoming pdus/edus (#1801)
* log less for failed key querys, add counters for incoming pdus/edus

* use labels

* Blacklist flakey test

* Fix metrics
2021-03-23 11:33:36 +00:00
Will Hunt 9557ccada4
Fix appsevice alias queries part 2 (#1684)
* Check membership of room

* Use QueryStateAfterEventsResponse

* Fix complexity

* Add field ShouldHitAppservice to GetRoomIDForAlias

* Hit appservice when trying to join a non-existent alias

* remove unused

* Changes that I made a long time ago

* Rename to appserviceJoinedAtEvent

* Check membership in GetMemberships

* Update QueryMembershipsForRoom

* Tweaks in client API

* Update appserviceJoinedAtEvent

* Comments

* Try QueryMembershipForUser instead

* Undo some changes to client API that shouldn't be needed

* More /event tweaks

* Refactor /event bit

* Go back to QueryMembershipsForRoom because appservices are hard

* Fix bugs in onMessage

* Add comments

* More logical naming, clean up a bit

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-03-03 17:00:31 +00:00
Neil Alexander d15836e260
Increase gocyclo complexity to 25 (and remove all but 2 golint directives related to it) (#1783) 2021-03-03 14:35:57 +00:00
Neil Alexander 5d74a1757f
Don't query for servers so often in /send (#1766)
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events

* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks

* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254

* Fix resolve-state

* Initialise t.servers on first use
2021-02-16 17:12:17 +00:00
Kegsay 93942f8ab6
Gate peeking behind msc flags (#1731) 2021-01-22 16:08:47 +00:00
Matthew Hodgson 0571d395b5
Peeking over federation via MSC2444 (#1391)
* a very very WIP first cut of peeking via MSC2753.

doesn't yet compile or work.
needs to actually add the peeking block into the sync response.
checking in now before it gets any bigger, and to gather any initial feedback on the vague shape of it.

* make PeekingDeviceSet private

* add server_name param

* blind stab at adding a `peek` section to /sync

* make it build

* make it launch

* add peeking to getResponseWithPDUsForCompleteSync

* cancel any peeks when we join a room

* spell out how to runoutside of docker if you want speed

* fix SQL

* remove unnecessary txn for SelectPeeks

* fix s/join/peek/ cargocult fail

* HACK: Track goroutine IDs to determine when we write by the wrong thread

To use: set `DENDRITE_TRACE_SQL=1` then grep for `unsafe`

* Track partition offsets and only log unsafe for non-selects

* Put redactions in the writer goroutine

* Update filters on writer goroutine

* wrap peek storage in goid hack

* use exclusive writer, and MarkPeeksAsOld more efficiently

* don't log ascii in binary at sql trace...

* strip out empty roomd deltas

* re-add txn to SelectPeeks

* re-add accidentally deleted field

* reject peeks for non-worldreadable rooms

* move perform_peek

* fix package

* correctly refactor perform_peek

* WIP of implementing MSC2444

* typo

* Revert "Merge branch 'kegan/HACK-goid-sqlite-db-is-locked' into matthew/peeking"

This reverts commit 3cebd8dbfb, reversing
changes made to ed4b3a58a7.

* (almost) make it build

* clean up bad merge

* support SendEventWithState with optional event

* fix build & lint

* fix build & lint

* reinstate federated peeks in the roomserver (doh)

* fix sql thinko

* todo for authenticating state returned by /peek

* support returning current state from QueryStateAndAuthChain

* handle SS /peek

* reimplement SS /peek to prod the RS to tell the FS about the peek

* rename RemotePeeks as OutboundPeeks

* rename remote_peeks_table as outbound_peeks_table

* add perform_handle_remote_peek.go

* flesh out federation doc

* add inbound peeks table and hook it up

* rename ambiguous RemotePeek as InboundPeek

* rename FSAPI's PerformPeek as PerformOutboundPeek

* setup inbound peeks db correctly

* fix api.SendEventWithState with no event

* track latestevent on /peek

* go fmt

* document the peek send stream race better

* fix SendEventWithRewrite not to bail if handed a non-state event

* add fixme

* switch SS /peek to use SendEventWithRewrite

* fix comment

* use reverse topo ordering to find latest extrem

* support postgres for federated peeking

* go fmt

* back out bogus go.mod change

* Fix performOutboundPeekUsingServer

* Fix getAuthChain -> GetAuthChain

* Fix build issues

* Fix build again

* Fix getAuthChain -> GetAuthChain

* Don't repeat outbound peeks for the same room ID to the same servers

* Fix lint

* Don't omitempty to appease sytest

Co-authored-by: Kegan Dougal <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-01-22 14:55:08 +00:00
Neil Alexander 05324b6861
Send/state tweaks (#1681)
* Check missing event count

* Don't use request context for /send
2021-01-04 13:47:48 +00:00
Kegsay b507312d4c
MSC2836 threading: part 2 (#1596)
* Update GMSL

* Add MSC2836EventRelationships to fedsender

* Call MSC2836EventRelationships in reqCtx

* auth remote servers

* Extract room ID and servers from previous events; refactor a bit

* initial cut of federated threading

* Use the right client/fed struct in the response

* Add QueryAuthChain for use with MSC2836

* Add auth chain to federated response

* Fix pointers

* under CI: more logging and enable mscs, nil fix

* Handle direction: up

* Actually send message events to the roomserver..

* Add children and children_hash to unsigned, with tests

* Add logic for exploring threads and tracking children; missing storage functions

* Implement storage functions for children

* Add fetchUnknownEvent

* Do federated hits for include_children if we have unexplored children

* Use /ev_rel rather than /event as the former includes child metadata

* Remove cross-room threading impl

* Enable MSC2836 in the p2p demo

* Namespace mscs db

* Enable msc2836 for ygg

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-12-04 14:11:01 +00:00
Neil Alexander be7d8595be
Peeking updates (#1607)
* Add unpeek

* Don't allow peeks into encrypted rooms

* Fix send tests

* Update consumers
2020-12-03 11:11:46 +00:00
Neil Alexander b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
Neil Alexander 265cf5e835
Protect txnReq.newEvents with mutex (#1587)
* Protect txnReq.newEvents and txnReq.haveEvents with mutex

* Missing defer

* Remove t.haveEventsMutex
2020-11-18 11:31:58 +00:00
Neil Alexander 20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
S7evinK bcb89ada5e
Implement read receipts (#1528)
* fix conversion from int to string yields a string of one rune, not a string of digits

* Add receipts table to syncapi

* Use StreamingToken as the since value

* Add required method to testEDUProducer

* Make receipt json creation "easier" to read

* Add receipts api to the eduserver

* Add receipts endpoint

* Add eduserver kafka consumer

* Add missing kafka config

* Add passing tests to whitelist

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Fix copy & paste error

* Fix column count error

* Make outbound federation receipts pass

* Make "Inbound federation rejects receipts from wrong remote" pass

* Don't use errors package

* - Add TODO for batching requests
- Rename variable

* Return a better error message

* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright

* Better creation/usage of JoinResponse

* Query all joined rooms instead of just one

* Update gomatrixserverlib

* Add sqlite3 migration

* Add postgres migration

* Ensure required sequence exists before running migrations

* Clarification on comment

* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}

* Remove dead code
Use key for timestamp

* Fix postgres query...

* Remove single purpose struct

* Use key/value directly

* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.

* Actually update the id, so it is correctly send in syncs

* Set receipt on request to /read_markers

* Fix issue with receipts getting overwritten

* Use fmt.Errorf instead of pkg/errors

* Revert "Add postgres migration"

This reverts commit 722fe5a046.

* Revert "Add sqlite3 migration"

This reverts commit d113b03f64.

* Fix selectRoomReceipts query

* Make golangci-lint happy

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-09 18:46:11 +00:00
S7evinK eccd0d2c1b
Implement forgetting about rooms (#1572)
* Add basic storage methods

* Add internal api handler

* Add check for forgotten room

* Add /rooms/{roomID}/forget endpoint

* Add missing rsAPI method

* Remove unused parameters

* Add passing tests

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Add missing file

* Add postgres migration

* Add sqlite migration

* Use Forgetter to forget room

* Remove empty line

* Update HTTP status codes

It looks like the spec calls for these to be 400, rather than 403: https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-rooms-roomid-forget

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-05 10:19:23 +00:00
Neil Alexander 6e63df1d9a
KindOld (#1531)
* Add KindOld

* Don't process latest events/memberships for old events

* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events

* Signal to downstream components if an event has become a forward extremity

* Don't exclude from sync

* Soft-fail checks on KindNew

* Don't run the latest events updater at all for KindOld

* Don't make federation sender change after all

* Kind in federation sender join

* Don't send isForwardExtremity

* Fix syncapi

* Update comments

* Fix SendEventWithState

* Update sytest-whitelist

* Generate old output events

* Sync API consumes old room events

* Update comments
2020-10-19 14:59:13 +01:00
Neil Alexander 10f1beb0de
Don't re-run state resolution on a single trusted state snapshot (#1526)
* Don't re-run state resolution on a single trusted state snapshot

* Lint

* Check if backward extremity is create event before checking missing state
2020-10-15 12:08:49 +01:00
Neil Alexander 6f12b8f85c
Ignore typing events where sender doesn't match origin (#1523)
* Ignore typing notifications where the sender doesn't match the origin

* Update sytest-whitelist

* Fix formatting directives
2020-10-14 16:49:25 +01:00
Neil Alexander 7a1fd123de
Improved state handling in /send (#1521)
* Capture errors

* Don't request only state key tuples needed for auth (we end up discarding room state this way)

* QueryStateAfterEvent returns all state when no tuples supplied

* Resolve state

* Comments
2020-10-14 12:39:37 +01:00
Neil Alexander 9d6b77c58a
Try to retrieve missing auth events from multiple servers (#1516)
* Recursively fetch auth events if needed

* Fix processEvent call

* Ask more servers in lookupEvent

* Don't panic!

* Panic at the Disco

* Find servers more aggressively

* Add getServers

* Fix number of servers to 5, don't bail making RespState if auth events missing

* Fix panic

* Ignore missing state events too

* Report number of servers correctly

* Don't reuse request context for /send_join

* Update federation API tests

* Don't recurse processEvents

* Implement getEvents differently
2020-10-13 11:53:20 +01:00
Neil Alexander 8001627cfc
Get missing event tweaks (#1514)
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events

* Not finished refactor

* Fix test

* Remove isInboundTxn

* Remove debug logging
2020-10-12 15:56:15 +01:00
Neil Alexander 4df7e345bb
Only return 500 on /send if a database error occurs (#1503) 2020-10-09 15:06:43 +01:00
Kegsay 1eaf7aa27e
Use [] not null when there are no devices (#1480) 2020-10-06 11:05:15 +01:00
Neil Alexander 28454d6fb7
Log origin in /send 2020-10-02 11:38:35 +01:00
S7evinK 3e01db0049
Fix golangci-lint issues (#1464)
* Fix S1039: unnecessary use of fmt.Sprintf

* Fix S1036: unnecessary guard around map access

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
2020-10-01 20:00:56 +01:00
Neil Alexander 135b5e264f
Fix panic on verifySigError in fetching missing events 2020-09-30 13:51:54 +01:00
Neil Alexander f290e92a34
Remove TLS fingerprints, improve perspective unmarshal handling (#1452)
* Add prefer_direct_fetch option

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update gomatrixserverlib

* Don't deal in TLS fingerprints anymore
2020-09-29 17:08:18 +01:00
Neil Alexander 43cdba9a69
Ignore depth in federation API (#1451) 2020-09-29 14:07:59 +01:00
Neil Alexander b0d5d1cc9f
Fix old verify keys 2020-09-29 14:00:12 +01:00
Neil Alexander 738b829a23
Fetch missing auth events, implement QueryMissingAuthPrevEvents, try other servers in room for /event and /get_missing_events (#1450)
* Try to ask other servers in the room for missing events if the origin won't provide them

* Logging

* More logging

* Implement QueryMissingAuthPrevEvents

* Try to get missing auth events badly

* Use processEvent

* Logging

* Update QueryMissingAuthPrevEvents

* Try to find missing auth events

* Patchy fix for test

* Logging tweaks

* Send auth events as outliers

* Update check in QueryMissingAuthPrevEvents

* Error responses

* More return codes

* Don't return error on reject/soft-fail since it was ultimately handled

* More tweaks

* More error tweaks
2020-09-29 13:40:29 +01:00
Neil Alexander ce318f53bc
Use workers when fetching events from /state_ids, use /state only if significant portion of events missing (#1447)
* Don't fall back to /state on incoming /send

* Event workers for /state_ids, use /state only if significant percentage of events are missing
2020-09-28 11:32:59 +01:00
Neil Alexander 40dd16a6e6
Don't fall back to /state on incoming /send (#1446) 2020-09-28 10:03:18 +01:00
Neil Alexander 145db37d89
Allow configuring old verify keys (#1443)
* Allow configuring old verify keys

* Update sample config

* Update sample config

* Fix config population

* Key ID formatting validity of old_verify_keys

* Update comment
2020-09-25 10:58:53 +01:00
Neil Alexander 6fbf89a166
Return the correct error codes for v6 invite JSON violations (#1440)
* Return the correct error codes for v6 invite JSON violations

* Update sytest-whitelist
2020-09-24 17:16:59 +01:00
Neil Alexander 3013ade84f
Reject make_join for empty rooms (#1439)
* Sanity-check room version on RS event input

* Update gomatrixserverlib

* Reject make_join when no room members are left

* Revert some changes from wrong branch

* Distinguish between room not existing and room being abandoned on this server

* nolint
2020-09-24 16:18:13 +01:00
Neil Alexander a14b29b526
Initial notary support (#1436)
* Initial work on notary support

* Somewhat working (but not properly filtered) notary support, other tweaks

* Update gomatrixserverlib
2020-09-22 14:40:54 +01:00
Kegsay 18231f25b4
Implement rejected events (#1426)
* WIP Event rejection

* Still send back errors for rejected events

Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.

* Implement rejected events

Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.

* Update test to match reality

* Modify InputRoomEvents to no longer return an error

Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.

* Remove redundant returns; linting

* Update blacklist
2020-09-16 13:00:52 +01:00