This now should actually speed up startup times.
This is because _many_ rooms (like DMs) don't have room ACLs, this means
that we had around 95% pointless DB queries. (as queried on d.m.org)
This PR changes a few things:
- It pulls out the creation of several NIDs from the `StoreEvent`
function to make the functions more reusable
- Uses more caching when using those NIDs to avoid DB round trips
This optimizes history visibility checks by (mostly) avoiding database
hits.
Possibly solves https://github.com/matrix-org/dendrite/issues/2777
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Reprocess outliers that were previously rejected
* Might as well do all events this way
* More useful errors
* Fix queries
* Tweak condition
* Don't wrap errors
* Report more useful error
* Flatten error on `r.Queryer.QueryStateAfterEvents`
* Some more debug logging
* Flatten error in `QueryRestrictedJoinAllowed`
* Revert "Flatten error in `QueryRestrictedJoinAllowed`"
This reverts commit 1238b4184c.
* Tweak `QueryStateAfterEvents`
* Handle MissingStateError too
* Scope to room
* Clean up
* Fix the error
* Only apply rejection check to outliers
* Add Room Aliases tests
* Add Rooms table test
* Move StateKeyTuplerSorter to the types package
* Add StateBlock tests
Some optimizations
* Add State Snapshot tests
Some optimization
* Return []int64 and convert to pq.Int64Array for postgres
* Move []types.EventNID back to rows.Next()
* Update tests, rename SelectRoomIDs
* Add EventJSONTable tests
* Add eventJSON tests
* Add EventStateKeysTable tests
* Add EventTypesTable tests
* Add Events Table tests
Move variable declaration outside loops
Switch to testify/assert for tests
* Move variable declaration outside loop
* Remove random data
* Fix issue where the EventReferenceSHA256 is not set
* Add more tests
* Revert "Fix issue where the EventReferenceSHA256 is not set"
This reverts commit 8ae34c4e5f.
* Update GMSL
* Add tests for duplicate entries
* Test what happens if we select non-existing NIDs
* Add test for non-existing eventType
* Really update GMSL
* Only add events to `add_state_events` that haven't already been sent to the roomserver output before
* Filter on event NIDs instead, hopefully bring joy to SQLite
* UnsentFilter, review comments
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input
* Better transaction management
* Tweak order
* Handle cases where the room does not exist
* Other fixes
* More tweaks
* Fill some gaps
* Fill in the gaps
* good lord it gets worse
* Don't roll back transactions when events rejected
* Pass through errors properly
* Fix bugs
* Fix incorrect error check
* Don't panic on nil txns
* Tweaks
* Hopefully fix panics for good in SQLite this time
* Fix rollback
* Minor bug fixes with latest event updater
* Some review comments
* Revert "Some review comments"
This reverts commit 0caf8cf53e.
* Fix a couple of bugs
* Clearer commit and rollback results
* Remove unnecessary prepares
* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes:
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>
* Do not store 'null' in the database for empty JSON arrays
This can cause issues, though it should be noted that the majority
of the time this will marshal/unmarshal just fine, see
https://play.golang.org/p/Doe2NZUgv7Q
* bugfix: sqlite migration should handle create events as having no 'before' snapshot
The state snapshot for any given event in the roomserver represents the state _before_
the event. For the create event, this is nothing, so the state snapshot nid should be 0.
In some cases this wasn't happening, resulting in a nice mix of possible options including:
- A state snapshot without any state blocks `[]` or `null`.
- A state snapshot with a single state block with a single event, the create event, causing
a circular loop. This is incorrect as it represents the state before the event, not after.
* Add state key check
* Hash-deduplicated state storage (and migrations) for PostgreSQL and SQLite
* Refactor droomserver database setup for migrations
* Fix conflict statements
* Update migration names
* Set a boundary for old to new block/snapshot IDs so we don't rewrite them more than once accidentally
* Create sequence if not exists
* Fix boundary queries
* Fix boundary queries
* Use Query
* Break out queries a bit
* More sequence tweaks
* Query parameters are not playing the game
* Injection escaping may not work for CREATE SEQUENCE after all
* Fix snapshot sequence name
* Use boundaried IDs in SQLite too
* Use IFNULL for SQLite
* Use COALESCE in PostgreSQL
* Review comments @Kegsay
* Hit the database far less to find room NIDs for event NIDs
* Close the rows
* Fix SQLite selectRoomNIDsForEventNIDsSQL
* Give same treatment to room version lookups
* WIP Event rejection
* Still send back errors for rejected events
Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.
* Implement rejected events
Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.
* Update test to match reality
* Modify InputRoomEvents to no longer return an error
Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.
* Remove redundant returns; linting
* Update blacklist
* Per-room input mutex
* GetMembership should use transaction when assigning state key NID
* Actually use writer transactions rather than ignoring them
* Limit per-room mutexes to Postgres
* Flip the check in InputRoomEvents
* Updated TransactionWriters, moved locks in roomserver, various other tweaks
* Fix redaction deadlocks
* Fix lint issue
* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter
* Fix us not sending transactions through in latest events updater
* Use TransactionWriter on other component SQLites
* Fix sync API tests
* Fix panic in media API
* Fix a couple of transactions
* Fix wrong query, add some logging output
* Add debug logging into StoreEvent
* Adjust InsertRoomNID
* Update logging
* Make backfill work for shared history visibility
* fetch missing state on backfill to remember snapshots correctly
* Fix gmsl to not mux in auth events into room state
* Whoops
* Linting
* Room version 2 by default, other wiring updates, update gomatrixserverlib
* Fix nil pointer exception
* Fix some more nil pointer exceptions hopefully
* Update gomatrixserverlib
* Send all room versions when joining, not just stable ones
* Remove room version cquery
* Get room version when getting events from the roomserver database
* Reset default back to room version 2
* Don't generate event IDs unless needed
* Revert "Remove room version cquery"
This reverts commit a170d58733.
* Query room version in federation API, client API as needed
* Improvements to make_join send_join dance
* Make room server producers use headered events
* Lint tweaks
* Update gomatrixserverlib
* Versioned SendJoin
* Query room version in syncapi backfill
* Handle transaction marshalling/unmarshalling within Dendrite
* Sorta fix federation (kinda)
* whoops commit federation API too
* Use NewEventFromTrustedJSON when getting events from the database
* Update gomatrixserverlib
* Strip headers on federationapi endpoints
* Fix bug in clientapi profile room version query
* Update gomatrixserverlib
* Return more useful error if room version query doesn't find the room
* Update gomatrixserverlib
* Update gomatrixserverlib
* Maybe fix federation
* Fix formatting directive
* Update sytest whitelist and blacklist
* Temporarily disable room versions 3 and 4 until gmsl is fixed
* Fix count of EDUs in logging
* Update gomatrixserverlib
* Update gomatrixserverlib
* Update gomatrixserverlib
* Rely on EventBuilder in gmsl to generate the event IDs for us
* Some review comments fixed
* Move function out of common and into gmsl
* Comment in federationsender destinationqueue
* Update gomatrixserverlib
* Use a fork of pq which supports userCurrent on wasm
* Use sqlite3_js driver when running in JS
* Add cmd/dendritejs to pull in sqlite3_js driver for wasm only
* Update to latest go-sqlite-js version
* Replace prometheus with a stub. sigh
* Hard-code a config and don't use opentracing
* Latest go-sqlite3-js version
* Generate a key for now
* Listen for fetch traffic rather than HTTP
* Latest hacks for js
* libp2p support
* More libp2p
* Fork gjson to allow us to enforce auth checks as before
Previously, all events would come down redacted because the hash
checks would fail. They would fail because sjson.DeleteBytes didn't
remove keys not used for hashing. This didn't work because of a build
tag which included a file which no-oped the index returned.
See https://github.com/tidwall/gjson/issues/157
When it's resolved, let's go back to mainline.
* Use gjson@1.6.0 as it fixes https://github.com/tidwall/gjson/issues/157
* Use latest gomatrixserverlib for sig checks
* Fix a bug which could cause exclude_from_sync to not be set
Caused when sending events over federation.
* Use query variadic to make lookups actually work!
* Latest gomatrixserverlib
* Add notes on getting p2p up and running
Partly so I don't forget myself!
* refactor: Move p2p specific stuff to cmd/dendritejs
This is important or else the normal build of dendrite will fail
because the p2p libraries depend on syscall/js which doesn't work
on normal builds.
Also, clean up main.go to read a bit better.
* Update ho-http-js-libp2p to return errors from RoundTrip
* Add an LRU cache around the key DB
We actually need this for P2P because otherwise we can *segfault*
with things like: "runtime: unexpected return pc for runtime.handleEvent"
where the event is a `syscall/js` event, caused by spamming sql.js
caused by "Checking event signatures for 14 events of room state" which
hammers the key DB repeatedly in quick succession.
Using a cache fixes this, though the underlying cause is probably a bug
in the version of Go I'm on (1.13.7)
* breaking: Add Tracing.Enabled to toggle whether we do opentracing
Defaults to false, which is why this is a breaking change. We need
this flag because WASM builds cannot do opentracing.
* Start adding conditional builds for wasm to handle lib/pq
The general idea here is to have the wasm build have a `NewXXXDatabase`
that doesn't import any postgres package and hence we never import
`lib/pq`, which doesn't work under WASM (undefined `userCurrent`).
* Remove lib/pq for wasm for syncapi
* Add conditional building to remaining storage APIs
* Update build script to set env vars correctly for dendritejs
* sqlite bug fixes
* Docs
* Add a no-op main for dendritejs when not building under wasm
* Use the real prometheus, even for WASM
Instead, the dendrite-sw.js must mock out `process.pid` and
`fs.stat` - which must invoke the callback with an error (e.g `EINVAL`)
in order for it to work:
```
global.process = {
pid: 1,
};
global.fs.stat = function(path, cb) {
cb({
code: "EINVAL",
});
}
```
* Linting
* Move current work into single branch
* Initial massaging of clientapi etc (not working yet)
* Interfaces for accounts/devices databases
* Duplicate postgres package for sqlite3 (no changes made to it yet)
* Some keydb, accountdb, devicedb, common partition fixes, some more syncapi tweaking
* Fix accounts DB, device DB
* Update naffka dependency for SQLite
* Naffka SQLite
* Update naffka to latest master
* SQLite support for federationsender
* Mostly not-bad support for SQLite in syncapi (although there are problems where lots of events get classed incorrectly as backward extremities, probably because of IN/ANY clauses that are badly supported)
* Update Dockerfile -> Go 1.13.7, add build-base (as gcc and friends are needed for SQLite)
* Implement GET endpoints for account_data in clientapi
* Nuke filtering for now...
* Revert "Implement GET endpoints for account_data in clientapi"
This reverts commit 4d80dff458.
* Implement GET endpoints for account_data in clientapi (#861)
* Implement GET endpoints for account_data in clientapi
* Fix accountDB parameter
* Remove fmt.Println
* Fix insertAccountData SQLite query
* Fix accountDB storage interfaces
* Add empty push rules into account data on account creation (#862)
* Put SaveAccountData into the right function this time
* Not sure if roomserver is better or worse now
* sqlite work
* Allow empty last sent ID for the first event
* sqlite: room creation works
* Support sending messages
* Nuke fmt.println
* Move QueryVariadic etc into common, other device fixes
* Fix some linter issues
* Fix bugs
* Fix some linting errors
* Fix errcheck lint errors
* Make naffka use postgres as fallback, fix couple of compile errors
* What on earth happened to the /rooms/{roomID}/send/{eventType} routing
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>