Commit graph

79 commits

Author SHA1 Message Date
Neil Alexander 119cde3766
De-race types.RoomInfo (#2600) 2022-08-01 15:29:19 +01:00
Neil Alexander 05c83923e3
Optimise checking other servers allowed to see events (#2596)
* Try optimising checking if server is allowed to see event

* Fix error

* Handle case where snapshot NID is 0

* Fix query

* Update SQL

* Clean up `CheckServerAllowedToSeeEvent`

* Not supported on SQLite

* Maybe placate the unit tests

* Review comments
2022-08-01 14:11:00 +01:00
Till 081f5e7226
Update database migrations, remove goose (#2264)
* Add new db migration

* Update migrations
Remove goose

* Add possibility to test direct upgrades

* Try to fix WASM test

* Add checks for specific migrations

* Remove AddMigration
Use WithTransaction
Add Dendrite version to table

* Fix linter issues

* Update tests

* Update comments, outdent if

* Namespace migrations

* Add direct upgrade tests, skipping over one version

* Split migrations

* Update go version in CI

* Fix copy&paste mistake

* Use contexts in migrations

Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2022-07-25 10:39:22 +01:00
Neil Alexander c7d978274d
Try to fix HTTP 500s on /members (#2581) 2022-07-22 19:43:48 +01:00
Neil Alexander f0c8a03649
Membership updater refactoring (#2541)
* Membership updater refactoring

* Pass in membership state

* Use membership check rather than referring to state directly

* Delete irrelevant membership states

* We don't need the leave event after all

* Tweaks

* Put a log entry in that I might stand a chance of finding

* Be less panicky

* Tweak invite handling

* Don't freak if we can't find the event NID

* Use event NID from `types.Event`

* Clean up

* Better invite handling

* Placate the almighty linter

* Blacklist a Sytest which is otherwise fine under Complement for reasons I don't understand

* Fix the sytest after all (thanks @S7evinK for the spot)
2022-07-22 14:44:04 +01:00
Till 5087b36af0
Fix QuerySharedUsers for the SyncAPI keychange consumer (#2554)
* Make more use of base.BaseDendrite

* Fix QuerySharedUsers if no UserIDs are supplied
2022-07-05 14:50:56 +02:00
Till 05607d6b87
Add roomserver tests (3/4) (#2447)
* Add Room Aliases tests

* Add Rooms table test

* Move StateKeyTuplerSorter to the types package

* Add StateBlock tests
Some optimizations

* Add State Snapshot tests
Some optimization

* Return []int64 and convert to pq.Int64Array for postgres

* Move []types.EventNID back to rows.Next()

* Update tests, rename SelectRoomIDs
2022-05-16 19:33:16 +02:00
Till 6db08b2874
Add roomserver tests (2/?) (#2445)
* Add invite table tests; move variable declarations

* Add Membership table tests

* Move variable declarations

* Add PrevEvents table tests

* Add Published table test

* Add Redactions tests
Fix bug in SQLite markRedactionValidatedSQL

* PR comments, better readability for invite tests
2022-05-10 14:41:12 +02:00
Till f69ebc6af2
Add roomserver tests (1/?) (#2434)
* Add EventJSONTable tests

* Add eventJSON tests

* Add EventStateKeysTable tests

* Add EventTypesTable tests

* Add Events Table tests
Move variable declaration outside loops
Switch to testify/assert for tests

* Move variable declaration outside loop

* Remove random data

* Fix issue where the EventReferenceSHA256 is not set

* Add more tests

* Revert "Fix issue where the EventReferenceSHA256 is not set"

This reverts commit 8ae34c4e5f.

* Update GMSL

* Add tests for duplicate entries

* Test what happens if we select non-existing NIDs

* Add test for non-existing eventType

* Really update GMSL
2022-05-09 15:30:32 +02:00
Neil Alexander 4ad5f9c982
Global database connection pool (for monolith mode) (#2411)
* Allow monolith components to share a single database pool

* Don't yell about missing connection strings

* Rename field

* Setup tweaks

* Fix panic

* Improve configuration checks

* Update config

* Fix lint errors

* Update comments
2022-05-03 16:35:06 +01:00
Neil Alexander 4e64c270db
Various bug fixes and tweaks around invites and membership 2022-03-17 17:05:21 +00:00
Neil Alexander 530f05885d
Limit JoinedUsersSetInRooms to interested users (#2234)
* Limit database work in `JoinedUsersSetInRooms` to changed user IDs only

* Comments

* Fix variadic params for SQLite, update comments
2022-03-01 13:01:38 +00:00
Neil Alexander aa6bbf484a
Return ErrRoomNoExists if insufficient state is available for a buildEvent to succeed when joining a room (#2210)
This may help cases like #2206, since it should prompt us to try a federated join again instead.
2022-02-21 16:22:29 +00:00
Neil Alexander a02dd7721d
Reset invalid state snapshots for events during state storage refactor migration (#2209)
This should help with #2204. We can't do this for rooms, only events.
2022-02-21 15:25:54 +00:00
Neil Alexander 7dfc7c3d70
Don't re-send sent events in add_state_events (#2195)
* Only add events to `add_state_events` that haven't already been sent to the roomserver output before

* Filter on event NIDs instead, hopefully bring joy to SQLite

* UnsentFilter, review comments
2022-02-17 13:53:48 +00:00
Neil Alexander 5106cc807c
Ensure only one transaction is used for RS input per room (#2178)
* Ensure the input API only uses a single transaction

* Remove more of the dead query API call

* Tidy up

* Fix tests hopefully

* Don't do unnecessary work for rooms that don't exist

* Improve error, fix another case where transaction wasn't used properly

* Add a unit test for checking single transaction on RS input API

* Fix logic oops when deciding whether to use a transaction in storeEvent
2022-02-11 17:40:14 +00:00
Neil Alexander 37cbe263ce
Fix transaction issues in events table in PSQL (#2165)
* Revert "Revert "Fix storage bug in PSQL events table""

This reverts commit cf447dd52a.

* Membership updater to use updater

* Fix membership updater to use transactions properly
2022-02-10 09:30:16 +00:00
Neil Alexander cf447dd52a
Revert "Fix storage bug in PSQL events table"
This reverts commit b4687f2ed2.
2022-02-09 11:41:21 +00:00
Neil Alexander b4687f2ed2
Fix storage bug in PSQL events table 2022-02-09 11:24:49 +00:00
Neil Alexander 0e26662a55
Allow events to be un-rejected (#2159)
* Allow un-rejecting an event later

* SQL

* Only un-reject, don't re-reject

* Clarify ambiguous column reference
2022-02-08 13:45:48 +00:00
Neil Alexander eb352a5f6b
Full roomserver input transactional isolation (#2141)
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input

* Better transaction management

* Tweak order

* Handle cases where the room does not exist

* Other fixes

* More tweaks

* Fill some gaps

* Fill in the gaps

* good lord it gets worse

* Don't roll back transactions when events rejected

* Pass through errors properly

* Fix bugs

* Fix incorrect error check

* Don't panic on nil txns

* Tweaks

* Hopefully fix panics for good in SQLite this time

* Fix rollback

* Minor bug fixes with latest event updater

* Some review comments

* Revert "Some review comments"

This reverts commit 0caf8cf53e.

* Fix a couple of bugs

* Clearer commit and rollback results

* Remove unnecessary prepares
2022-02-04 10:39:34 +00:00
Neil Alexander a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes:

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander 6e93531e94
Don't persist transaction IDs in the roomserver (#2048) 2021-11-22 09:13:12 +00:00
Neil Alexander 403498a85b
Only return non-stub rooms from GetKnownRooms (#2049)
* Only return non-stub rooms from `GetKnownRooms`

This should stop a bunch of errors at startup with invalid server ACLs.

* Fix query
2021-11-18 11:34:19 +00:00
kegsay 7dc8fb1fe7
Add more logs (#2005)
* Add more logs

To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.

* Fix query
2021-09-07 15:07:14 +01:00
kegsay 4cc8b28b7f
Ensure all create events have a snapshot NID of 0 (#1961)
Fixes #1924 for postgres users, though the underlying cause of why
they aren't 0 in the first place is unresolved.
2021-08-04 17:48:23 +01:00
Kegan Dougal ed4097825b Factor out StatementList to sqlutil and use it in userapi
It helps with the boilerplate.
2021-07-28 18:30:04 +01:00
kegsay 16bf94f239
Not finding the snapshot is not fatal (#1940) 2021-07-26 12:30:44 +01:00
Neil Alexander f0f8c7f055
Optimise QueryServerJoinedToRoom (#1933)
* Optimise checking if a server is in a room

* Fix queries

* Fix queries
2021-07-21 13:06:32 +01:00
Neil Alexander c8408a6387
Add more optimised code path for checking if we're in a room (#1909)
* Add more optimised code path for checking if we're in a room

* Fix database queries

* Fix federation API test

* Fix logging

* Review comments

* Make separate API call for room membership
2021-07-09 16:36:45 +01:00
kegsay 3e50bac944
bugfix: order the state blocks so recreating state snapshots works correctly (#1908)
* Logging

* Revert "Logging"

This reverts commit 23ce334182.

* bugfix: order the state blocks so recreating state snapshots works correctly
2021-07-09 10:49:49 +01:00
kegsay 5a09290c32
db migration: handle create events with no state blocks from v0.1.0 (#1904) 2021-07-07 17:07:33 +01:00
kegsay c849e74dfc
db migration: fix #1844 and add additional assertions (#1889)
* db migration: fix #1844 and add additional assertions

- Migration scripts will now check to see if there are any unconverted
  snapshot IDs and fail the migration if there are any. This should
  prevent people from getting a corrupt database in the event the root
  cause is still unknown.
- Add an ORDER BY clause when doing batch queries in the postgres
  migration. LIMIT and OFFSET without ORDER BY are undefined and must
  not be relied upon to produce a deterministic ordering (e.g row order).
  See https://www.postgresql.org/docs/current/queries-limit.html

* Linting

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2021-06-29 11:25:17 +01:00
Neil Alexander 5ce1fe80de
State storage refactor (#1839)
* Hash-deduplicated state storage (and migrations) for PostgreSQL and SQLite

* Refactor droomserver database setup for migrations

* Fix conflict statements

* Update migration names

* Set a boundary for old to new block/snapshot IDs so we don't rewrite them more than once accidentally

* Create sequence if not exists

* Fix boundary queries

* Fix boundary queries

* Use Query

* Break out queries a bit

* More sequence tweaks

* Query parameters are not playing the game

* Injection escaping may not work for CREATE SEQUENCE after all

* Fix snapshot sequence name

* Use boundaried IDs in SQLite too

* Use IFNULL for SQLite

* Use COALESCE in PostgreSQL

* Review comments @Kegsay
2021-04-26 13:25:57 +01:00
Neil Alexander 9057143033
Hit the database far less in Events to find room NIDs and room versions (#1643)
* Hit the database far less to find room NIDs for event NIDs

* Close the rows

* Fix SQLite selectRoomNIDsForEventNIDsSQL

* Give same treatment to room version lookups
2020-12-16 10:33:28 +00:00
Neil Alexander b5aa7ca3ab
Top-level setup package (#1605)
* Move config, setup, mscs into "setup" top-level folder

* oops, forgot the EDU server

* Add setup

* goimports
2020-12-02 17:41:00 +00:00
S7evinK eccd0d2c1b
Implement forgetting about rooms (#1572)
* Add basic storage methods

* Add internal api handler

* Add check for forgotten room

* Add /rooms/{roomID}/forget endpoint

* Add missing rsAPI method

* Remove unused parameters

* Add passing tests

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Add missing file

* Add postgres migration

* Add sqlite migration

* Use Forgetter to forget room

* Remove empty line

* Update HTTP status codes

It looks like the spec calls for these to be 400, rather than 403: https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-rooms-roomid-forget

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-05 10:19:23 +00:00
Neil Alexander 8d9ecb3996
Ignore duplicate redaction entries (#1522) 2020-10-14 15:24:43 +01:00
Sam a6700331ce
Update all usages of tx.Stmt to sqlutil.TxStmt (#1423)
* Replace all usages of txn.Stmt with sqlutil.TxStmt

Signed-off-by: Sam Day <me@samcday.com>

* Fix sign off link in PR template.

Signed-off-by: Sam Day <me@samcday.com>

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-09-24 11:10:14 +01:00
Kegsay 18231f25b4
Implement rejected events (#1426)
* WIP Event rejection

* Still send back errors for rejected events

Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.

* Implement rejected events

Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.

* Update test to match reality

* Modify InputRoomEvents to no longer return an error

Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.

* Remove redundant returns; linting

* Update blacklist
2020-09-16 13:00:52 +01:00
Kegsay ca8dcf46b7
Remove QuerySharedUsers from current state server (#1396)
* Remove QuerySharedUsers from current state server

* Bugfixes
2020-09-04 14:25:01 +01:00
Neil Alexander f1a98e1193
Fix nil txn bug 2020-09-04 10:22:32 +01:00
Kegsay 33b8143a95
Implement more CSS storage functions in roomserver (#1388) 2020-09-03 18:27:02 +01:00
Kegsay b20386123e
Move currentstateserver API to roomserver (#1387)
* Move currentstateserver API to roomserver

Stub out DB functions for now, nothing uses the roomserver version yet.

* Allow it to startup

* Implement some current-state-server storage interface functions

* Add missing package
2020-09-03 17:20:54 +01:00
Kegsay 02a73f29f8
Expand RoomInfo to cover more DB storage functions (#1377)
* Factor more things to RoomInfo

* Factor out remaining bits for RoomInfo

* Linting for now
2020-09-02 10:02:48 +01:00
Kegsay 6d79f04354
Add RoomInfo metadata struct (#1367)
* Add RoomInfo struct

* Remove RoomNID and replace with RoomInfo

* Bugfix and remove another needless query

* nil guard
2020-09-01 12:40:49 +01:00
Neil Alexander c8b873abc8
Roomserver NID caches (#1335)
* Initial work on roomserver NID caches

* Give caches to roomserver storage

* Populate caches

* Fix bugs

* Fix WASM build

* Don't hit cache twice in RoomNIDExcludingStubs

* Store reverse room ID-room NID mapping, consult caches when assigning NIDs
2020-08-25 12:32:29 +01:00
Neil Alexander 9d53351dc2
Component-wide TransactionWriters (#1290)
* Offset updates take place using TransactionWriter

* Refactor TransactionWriter in current state server

* Refactor TransactionWriter in federation sender

* Refactor TransactionWriter in key server

* Refactor TransactionWriter in media API

* Refactor TransactionWriter in server key API

* Refactor TransactionWriter in sync API

* Refactor TransactionWriter in user API

* Fix deadlocking Sync API tests

* Un-deadlock device database

* Fix appservice API

* Rename TransactionWriters to Writers

* Move writers up a layer in sync API

* Document sqlutil.Writer interface

* Add note to Writer documentation
2020-08-21 10:42:08 +01:00
Neil Alexander 068a3d3c9f
Roomserver per-room input parallelisation (Postgres) (#1289)
* Per-room input mutex

* GetMembership should use transaction when assigning state key NID

* Actually use writer transactions rather than ignoring them

* Limit per-room mutexes to Postgres

* Flip the check in InputRoomEvents
2020-08-20 16:24:33 +01:00
Neil Alexander b24747b305
Transaction writer changes, move roomserver writers (#1285)
* Updated TransactionWriters, moved locks in roomserver, various other tweaks

* Fix redaction deadlocks

* Fix lint issue

* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter

* Fix us not sending transactions through in latest events updater
2020-08-19 15:38:27 +01:00