Commit graph

6 commits

Author SHA1 Message Date
Neil Alexander eb352a5f6b
Full roomserver input transactional isolation (#2141)
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input

* Better transaction management

* Tweak order

* Handle cases where the room does not exist

* Other fixes

* More tweaks

* Fill some gaps

* Fill in the gaps

* good lord it gets worse

* Don't roll back transactions when events rejected

* Pass through errors properly

* Fix bugs

* Fix incorrect error check

* Don't panic on nil txns

* Tweaks

* Hopefully fix panics for good in SQLite this time

* Fix rollback

* Minor bug fixes with latest event updater

* Some review comments

* Revert "Some review comments"

This reverts commit 0caf8cf53e.

* Fix a couple of bugs

* Clearer commit and rollback results

* Remove unnecessary prepares
2022-02-04 10:39:34 +00:00
Neil Alexander 893aa3b141
More logging tweaks
2022-01-31 16:01:54 +00:00
Neil Alexander d21f3eace0
Roomserver fixes (#2133)
* Improve server selection somewhat

* Remove things from the map when we're done

* Be less panicky about auth event signatures in case they are not fatal after all

* Accept HasState in all cases

* Send join asynchronously

* Revert "Send join asynchronously"

This reverts commit 5b685bfcd0.

* Joins and leaves use background context
2022-01-31 14:36:59 +00:00
Neil Alexander ba1a9b98b7
Tweak some logging (#2130)
* Modify some log levels

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@336334f

* Update gomatrixserverlib to matrix-org/gomatrixserverlib@cde7ac8

* Demote warning about key change producer

* Add more useful roomserver logging

* Further tweaking
2022-01-31 10:48:28 +00:00
Neil Alexander a271fde8f5
Only limit context for fetching missing auth/prev events (#2131)
2022-01-31 10:39:33 +00:00
Neil Alexander a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00