* Add more logs
To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.
* Fix query
* Add more optimised code path for checking if we're in a room
* Fix database queries
* Fix federation API test
* Fix logging
* Review comments
* Make separate API call for room membership
* Ensure worker has work before starting goroutine
* Revert "Remove processEventWithMissingStateMutex"
This reverts commit 7f02eab47d.
* Use request context when processing transactions
* Keep goroutine count down by not starting work for things where the caller gave up
* Remove mutex, start workers at correct time
* Try to process rooms concurrently in FS /send
* Clean up
* Use request context so that dead things don't linger for so long
* Remove mutex
* Free up pdus slice so only references remaining are in channel
* Revert "Remove mutex"
This reverts commit 8558075e8c.
* Process EDUs in parallel
* Try refactoring /send concurrency
* Fix waitgroup
* Release on waitgroup
* Respond to transaction
* Reduce CPU usage, fix unit tests
* Tweaks
* Move into one file
* More aggressive event caching
* Deduplicate /state results
* Deduplicate more
* Ensure we use the correct list of events when excluding repeated state
* Fixes
* Ensure we track all events we already knew about properly
Squashed commit of the following:
commit 7fad77c10e3c1c78feddb37351812b209d9c0f25
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 15:06:52 2021 +0100
Fix processEventWithMissingStateMutexes
commit 138cddcac7b8373a8e1816a232f84a7bda6adcdf
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:59:44 2021 +0100
Use internal.MutexByRoom
commit 6e6f026cfad31da391ad261cfec16d41dff1b15b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:50:18 2021 +0100
Try to slow things down per room
commit b97d406dff2e11769a9202fbf58b138a541ca449
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:41:27 2021 +0100
Try to slow things down
commit 8866120ebf880b4fd8a456937f69903e233c19a2
Merge: 9f2de8a2 4a37b19a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:40:33 2021 +0100
Merge branch 'neilalexander/rsinputfifo' into neilalexander/rsinputfifo2
commit 4a37b19a8f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:34:54 2021 +0100
Add comments
commit f9ab3f4b81
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:31:21 2021 +0100
Tweaks
commit 9f2de8a29cadec4c785d9c2e4e74c1138305f759
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 13:15:59 2021 +0100
Ask origin only for missing things for now
commit 8fd878c75a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 11:18:11 2021 +0100
Make sure someone wakes up
commit b63f699f1b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Mon Jun 28 11:12:58 2021 +0100
Use a FIFO queue instead of a channel to reduce backpressure
* Add a per-room mutex to federationapi when processing transactions
This has numerous benefits:
- Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
in a state resolution being performed immediately, without checking if we were already doing this in
a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
- Prevents memory usage from growing too large as a result and potentially OOMing.
And costs:
- High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
though this has always been an issue as roomserver has a per-room mutex already.
* Fix unit tests
* Correct mutex lock ordering
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events
* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks
* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254
* Fix resolve-state
* Initialise t.servers on first use
* fix conversion from int to string yields a string of one rune, not a string of digits
* Add receipts table to syncapi
* Use StreamingToken as the since value
* Add required method to testEDUProducer
* Make receipt json creation "easier" to read
* Add receipts api to the eduserver
* Add receipts endpoint
* Add eduserver kafka consumer
* Add missing kafka config
* Add passing tests to whitelist
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Fix copy & paste error
* Fix column count error
* Make outbound federation receipts pass
* Make "Inbound federation rejects receipts from wrong remote" pass
* Don't use errors package
* - Add TODO for batching requests
- Rename variable
* Return a better error message
* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright
* Better creation/usage of JoinResponse
* Query all joined rooms instead of just one
* Update gomatrixserverlib
* Add sqlite3 migration
* Add postgres migration
* Ensure required sequence exists before running migrations
* Clarification on comment
* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}
* Remove dead code
Use key for timestamp
* Fix postgres query...
* Remove single purpose struct
* Use key/value directly
* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.
* Actually update the id, so it is correctly send in syncs
* Set receipt on request to /read_markers
* Fix issue with receipts getting overwritten
* Use fmt.Errorf instead of pkg/errors
* Revert "Add postgres migration"
This reverts commit 722fe5a046.
* Revert "Add sqlite3 migration"
This reverts commit d113b03f64.
* Fix selectRoomReceipts query
* Make golangci-lint happy
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add KindOld
* Don't process latest events/memberships for old events
* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events
* Signal to downstream components if an event has become a forward extremity
* Don't exclude from sync
* Soft-fail checks on KindNew
* Don't run the latest events updater at all for KindOld
* Don't make federation sender change after all
* Kind in federation sender join
* Don't send isForwardExtremity
* Fix syncapi
* Update comments
* Fix SendEventWithState
* Update sytest-whitelist
* Generate old output events
* Sync API consumes old room events
* Update comments
* Capture errors
* Don't request only state key tuples needed for auth (we end up discarding room state this way)
* QueryStateAfterEvent returns all state when no tuples supplied
* Resolve state
* Comments
* Recursively fetch auth events if needed
* Fix processEvent call
* Ask more servers in lookupEvent
* Don't panic!
* Panic at the Disco
* Find servers more aggressively
* Add getServers
* Fix number of servers to 5, don't bail making RespState if auth events missing
* Fix panic
* Ignore missing state events too
* Report number of servers correctly
* Don't reuse request context for /send_join
* Update federation API tests
* Don't recurse processEvents
* Implement getEvents differently
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events
* Not finished refactor
* Fix test
* Remove isInboundTxn
* Remove debug logging
* Try to ask other servers in the room for missing events if the origin won't provide them
* Logging
* More logging
* Implement QueryMissingAuthPrevEvents
* Try to get missing auth events badly
* Use processEvent
* Logging
* Update QueryMissingAuthPrevEvents
* Try to find missing auth events
* Patchy fix for test
* Logging tweaks
* Send auth events as outliers
* Update check in QueryMissingAuthPrevEvents
* Error responses
* More return codes
* Don't return error on reject/soft-fail since it was ultimately handled
* More tweaks
* More error tweaks