Previously we disabled HTTP keepalives to prevent excess idle connections
from consuming system resources. Now that we've fixed some bugs in the
federation API and device list updater, the situation is much better and
we don't open so many remote connections anyway.
Keepalives mean we don't have to perform a TLS handshake (which is quite
expensive) for every request, and reusing an idle connection is much
faster than opening a new one. This can help with response times when
talking to remote federated servers.
This PR also adds a new option to disable keepalives if needed:
```
# Disable HTTP keepalives, which also prevents connection reuse. Dendrite will typically
# keep HTTP connections open to remote hosts for 5 minutes as they can be reused much
# more quickly than opening new connections each time. Disabling keepalives will close
# HTTP connections immediately after a successful request but may result in more CPU and
# memory being used on TLS handshakes for each new connection instead.
disable_http_keepalives: false
```
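For context, this maps onto Go's standard `net/http` transport, where
`DisableKeepAlives` is a real field; the rest of this sketch is
illustrative wiring rather than Dendrite's actual code:

```
package httputil

import (
	"net/http"
	"time"
)

// newClient shows what the config option controls. When keepalives are
// enabled, idle connections are kept around for reuse; the 5-minute idle
// timeout mirrors the behaviour described in the config comment above.
func newClient(disableKeepalives bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DisableKeepAlives: disableKeepalives,
			IdleConnTimeout:   5 * time.Minute,
		},
	}
}
```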
See issue: [#2718](https://github.com/matrix-org/dendrite/issues/2718)
for more details.
The fix assumes that if the number of transaction items is different,
then the txnid should be different:

```
txnid := OriginalServerTS()_len(transactions)
```
The case that it doesn't address is when the txnid generated this way is
the same for two different batches of events which have the same
OriginalServerTS and the same array length.
Another option:

```
txnid := OriginalServerTS()_hash(transactions)
```
Would love to hear other ideas and ways to fix this.
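For illustration, here's a minimal Go sketch of the hash-based option
(`buildTxnID` and its inputs are hypothetical stand-ins for
`OriginalServerTS()` and the transaction contents):

```
package transactions

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// buildTxnID derives the txnid from the batch contents, so two batches
// with the same timestamp and the same length but different events no
// longer collide.
func buildTxnID(originServerTS int64, eventIDs []string) string {
	h := sha256.New()
	for _, id := range eventIDs {
		h.Write([]byte(id))
		h.Write([]byte{0}) // separator so ["ab", "c"] hashes differently from ["a", "bc"]
	}
	return fmt.Sprintf("%d_%s", originServerTS, hex.EncodeToString(h.Sum(nil)))
}
```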
### Pull Request Checklist
* [x] I have added tests for the PR _or_ I have justified why this PR
doesn't need tests.
* [x] Pull request includes a [sign
off](https://github.com/matrix-org/dendrite/blob/main/docs/CONTRIBUTING.md#sign-off)
Signed-off-by: `Tak Wai Wong <tak@hntlabs.com>`
Co-authored-by: Tak Wai Wong <tak@hntlabs.com>
This should hopefully fix an entire class of problems where components
downstream of the roomserver (e.g. the sync API) could lose a whole
bunch of room state after a rewrite operation like a federated join.
The root of the bug is that we set `RewritesState` in the output event,
which instructs downstream components to purge their copy of the room
state, but we then didn't send the entire state snapshot in
`adds_state_event_ids`, so the downstream state ended up being incomplete
as a result.
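As a rough sketch of the contract (the types and names here are
illustrative, not Dendrite's real output API), a downstream consumer
behaves roughly like this:

```
package consumer

// outputEvent is an illustrative stand-in for the roomserver output event.
type outputEvent struct {
	RewritesState     bool     // tells consumers to purge their copy of the room state
	AddsStateEventIDs []string // must be the FULL snapshot whenever RewritesState is set
}

// applyState mimics a downstream component such as the sync API: if the
// old state is purged but the adds list is only partial, the resulting
// state is silently incomplete, which was the bug.
func applyState(state map[string]struct{}, ev outputEvent) map[string]struct{} {
	if ev.RewritesState {
		state = make(map[string]struct{})
	}
	for _, id := range ev.AddsStateEventIDs {
		state[id] = struct{}{}
	}
	return state
}
```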
* Temporary fix for dendrite regression #2718
* Changed a comment to match the Dendrite main PR
* Renamed zion-registration.yaml to zion-appservice.yaml and changed .gitignore to ignore this file
Co-authored-by: Tak Wai Wong <tak@hntlabs.com>
zion hack - always send the notification data on a read receipt

The notifications are stored in two different databases, and the
notification database prunes data, so trying to mark old notifications as
read will fail even though the notification still exists in the other DB.

TODO: revisit when this refactor lands: https://github.com/matrix-org/dendrite/pull/2688/files

It looks like we clean up the notification table after a day:
```
func (s *notificationsStatements) Clean(ctx context.Context, txn *sql.Tx) error {
	_, err := sqlutil.TxStmt(txn, s.cleanNotificationsStmt).ExecContext(
		ctx,
		time.Now().AddDate(0, 0, -1).UnixNano()/int64(time.Millisecond), // keep non-highlights for a day
		time.Now().AddDate(0, -1, 0).UnixNano()/int64(time.Millisecond), // keep highlights for a month
	)
	return err
}
```
But we don't clean up the notifications in the syncAPI table.
When we send a read receipt, we first call `_, err := s.db.SetNotificationsRead(ctx, localpart, roomID, int64(read.Read), true)` and only forward the message on if the table was updated. If a user waits more than a day to send a read receipt, they can't clear their notifications.
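A sketch of the hack under those assumptions (the interface and wrapper
function here are illustrative; only the `SetNotificationsRead` call
shape comes from the code above):

```
package receipts

import "context"

// database is an illustrative slice of the real storage interface.
type database interface {
	SetNotificationsRead(ctx context.Context, localpart, roomID string, pos int64, read bool) (bool, error)
}

// onReadReceipt forwards the receipt unconditionally: gating on the
// "updated" result meant that once the user API pruned a row, the
// matching notification in the sync API could never be cleared.
func onReadReceipt(ctx context.Context, db database, localpart, roomID string, pos int64, forward func() error) error {
	if _, err := db.SetNotificationsRead(ctx, localpart, roomID, pos, true); err != nil {
		return err
	}
	return forward()
}
```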
I was not seeing unread notifications in sync, even when they were written to the DB.

Notifications are in their own stream, but the code was trying to tack them onto the joined-room stream. If the offsets “happened” to line up, you might get a count here or there, but they would be totally wrong (jumping from 1 to 0 to 2, etc.).

To fix this, put them in their own top-level object and handle them on the client.
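For illustration, the response shape might look like this (field and type
names are hypothetical; the point is that the counts advance on their own
stream position):

```
package syncapi

// notificationData carries per-room counts.
type notificationData struct {
	UnreadNotificationCount int64 `json:"unread_notification_count"`
	HighlightCount          int64 `json:"highlight_count"`
}

// response puts the counts in their own top-level object keyed by room
// ID instead of folding them into the joined-room stream.
type response struct {
	NextBatch        string                      `json:"next_batch"`
	NotificationData map[string]notificationData `json:"notification_data"`
	// rooms, presence, etc. elided
}
```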
Signed-off-by: Austin Ellis <austin@hntlabs.com>
Previously `LoadMembershipAtEvent` would fail if the state before one of
the events was not known, e.g. because it was an outlier. This modifies
it so that it gracefully handles not knowing the state and returns no
memberships instead, so that history visibility doesn't freak out and
kill `/sync` requests dead.
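Roughly, the graceful path looks like this (a simplified sketch with
hypothetical types, not the real `LoadMembershipAtEvent` signature):

```
package visibility

// loadMembershipAtEvent looks up memberships per event; unknown state
// (e.g. an outlier) now yields no memberships for that event instead of
// an error that kills the whole /sync request.
func loadMembershipAtEvent(stateBefore map[string][]string, eventIDs []string) map[string][]string {
	result := make(map[string][]string, len(eventIDs))
	for _, id := range eventIDs {
		memberships, ok := stateBefore[id]
		if !ok {
			// Previously a hard failure; now we simply skip it.
			continue
		}
		result[id] = memberships
	}
	return result
}
```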