* Declare new config
* Parse new config
* Read new config
* Don't use trial/our TestCase where it's not needed
Before:
```
$ time trial tests/events/test_utils.py > /dev/null
real 0m2.277s
user 0m2.186s
sys 0m0.083s
```
After:
```
$ time trial tests/events/test_utils.py > /dev/null
real 0m0.566s
user 0m0.508s
sys 0m0.056s
```
* Helper to upsert to event fields
without exceeding size limits.
* Use helper when adding invite/knock state
Now that we allow admins to include events in prejoin room state with
arbitrary state keys, be a good Matrix citizen and ensure they don't
accidentally create an oversized event.
* Changelog
* Move StateFilter tests
should have done this in #14668
* Add extra methods to StateFilter
* Use StateFilter
* Ensure test file enforces typed defs; alphabetise
* Workaround surprising get_current_state_ids
* Whoops, fix mypy
* Enable `--warn-redundant-casts` option in mypy
Doesn't do much but helps me sleep better at night.
* Changelog
* Fix name of the ignore
* Fix one more missed cast
Not sure why I didn't see this one locally, maybe I needed a poetry update
* Remove old comment
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
#11915 introduced the `@cached` `is_interested_in_room` method in
Synapse 1.55.0, which depends upon `get_aliases_for_room`. Add a missing
cache invalidation callback so that the `is_interested_in_room` cache is
invalidated when `get_aliases_for_room` is invalidated.
#13787 made `get_rooms_for_user` `@cached`. Add a missing cache
invalidation callback so that the `is_interested_in_presence` cache is
invalidated when `get_rooms_for_user` is invalidated.
Signed-off-by: Sean Quah <seanq@matrix.org>
Fixes#13655
This change uses ICU (International Components for Unicode) to improve boundary detection in user search.
This change also adds a new dependency on libicu-dev and pkg-config for the Debian packages, which are available in all supported distros.
When Synapse is terminated while running the background update to create
the `receipts_graph` or `receipts_linearized` indexes, the indexes may
be successfully created (or marked as invalid on postgres) while the
background update remains unfinished. When Synapse next starts up, the
background update will fail because the index already exists, or exists
but is invalid on postgres.
Use the existing code to create indices in background updates, since it
handles these edge cases.
Signed-off-by: Sean Quah <seanq@matrix.org>
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
This PR changes http-based image URLs to be https in html templates.
This impacts the Synapse SSO error page, where browsers report mixed
media content warnings.
Also, https://matrix.org/img/vector-logo-email.png is currently broken
but the URL has been updated to be https anyway.
Signed-off-by: Ashish Kumar <ashfame@users.noreply.github.com>
Due to the various fixes to the StreamChangeCache it is not
safe to trust the information in the user directory or room/user
stats tables. Rebuild them as background jobs.
In particular see da77720752 (#14639),
and 6a8310f3df (#14435).
Maybe also be related to fac8a38525
(#14592).
An empty cache does not mean the entity has no changed, if
it is earlier than the earliest known stream position return that
the entity *has* changed since the cache cannot accurately
answer that query.
A batch of changes intended to make it easier to trace to-device messages through the system.
The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)
I've also generally improved the data we send to opentracing for these messages.
The internal methods of the StreamChangeCache were inconsistently
treating the earliest known stream position as valid. It is now treated as
invalid, meaning the cache cannot determine if an entity at the earliest
known stream position has changed or not.
Add logic to ClientRestResource to decide whether to mount servlets
or not based on whether the current process is a worker.
This is clearer to see what a worker runs than the completely separate /
copy & pasted list of servlets being mounted for workers.
StreamChangeCache.get_all_changed_entities can return None to signify
it does not have information at the given stream position. Two callers (related
to device lists and presence) were treating this response the same as an empty
list (i.e. there being no updates).
* Fix one typo on line 3700(and apparently do something to other lines, no idea)
* Update config_documentation.md with more information about how federation_senders and pushers settings can be handled.
Specifically, that the instance map style of config does not require the special other variables that enable and disable functionality and that a single worker CAN be added to the map not only just two or more.
* Extra line here for consistency and appearance.
* Add link to sygnal repo.
* Add deprecation notice to workers.md and point to the newer alternative method of defining this functionality.
* Changelog
* Correct version number of Synapse the deprecation is happening in.
* Update quiet deprecation with simple notice and suggestion.
This should help reduce the number of devices e.g. simple bots the repeatedly login rack up.
We only delete non-e2e devices as they should be safe to delete, whereas if we delete e2e devices for a user we may accidentally break their ability to receive e2e keys for a message.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Support MSC1767's `content.body` behaviour in push rules
* Add the base rules from MSC3933
* Changelog entry
* Flip condition around for finding `m.markup`
* Remove forgotten import
* Add support for MSC3931: Room Version Supports push rule condition
* Create experimental flag for future work, and use it to gate MSC3931
* Changelog entry
* Use `device_one_time_keys_count` to match MSC3202
Rename the `device_one_time_key_counts` key in responses to
`device_one_time_keys_count` to match the name specified by MSC3202.
Also change related variable/class names for consistency.
Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
* Update changelog.d/14565.misc
* Revert name change for `one_time_key_counts` key
as this is a different key altogether from `device_one_time_keys_count`,
which is used for `/sync` instead of appservice transactions.
Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
`setup()` is run under the sentinel context manager, so we wrap the
initial update in a background process. Before this change, Synapse
would log two warnings on startup:
Starting db txn 'count_daily_users' from sentinel context
Starting db connection from sentinel context: metrics will be lost
Signed-off-by: Sean Quah <seanq@matrix.org>
Include the thread_id field when sending read receipts over
federation. This might result in the same user having multiple
read receipts per-room, meaning multiple EDUs must be sent
to encapsulate those receipts.
This restructures the PerDestinationQueue APIs to support
multiple receipt EDUs, queue_read_receipt now becomes linear
time in the number of queued threaded receipts in the room for
the given user, it is expected this is a small number since receipt
EDUs are sent as filler in transactions.
To perform an emulated upsert into a table safely, we must either:
* lock the table,
* be the only writer upserting into the table
* or rely on another unique index being present.
When the 2nd or 3rd cases were applicable, we previously avoided locking
the table as an optimization. However, as seen in #14406, it is easy to
slip up when adding new schema deltas and corrupt the database.
The only time we lock when performing emulated upserts is while waiting
for background updates on postgres. On sqlite, we do no locking at all.
Let's remove the option to skip locking tables, so that we don't shoot
ourselves in the foot again.
Signed-off-by: Sean Quah <seanq@matrix.org>
* GHA workflow to build complement images of key branches.
* Add changelog.d
* GHA workflow to build complement images of key branches.
* Add changelog.d
* Update complement.yml
Remove special casing for michaelk branch.
* Update complement.yml
Should run on master, develop not main, develop
* Rename file to be more obvious
* Merge did not go correctly.
* Setup 5am builds of develop, limit to one run at once.
* Fix crontab---run once at 5AM, not very minute between 5 and 6
* Fix cron syntax again?
* Tweak workflow name
* Allow manual debug runs
* Tweak indentation
Ctrl-Alt-L in PyCharm
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Co-authored-by: David Robertson <davidr@element.io>
This commit adds support for handling a provided avatar picture URL
when logging in via SSO.
Signed-off-by: Ashish Kumar <ashfame@users.noreply.github.com>
Fixes#9357.
This was the last untyped handler from the HomeServer object. Since
it was being treated as Any (and thus unchecked) it was being used
incorrectly in a few places.
When a local device list change is added to
`device_lists_changes_in_room`, the `converted_to_destinations` flag is
set to `FALSE` and the `_handle_new_device_update_async` background
process is started. This background process looks for unconverted rows
in `device_lists_changes_in_room`, copies them to
`device_lists_outbound_pokes` and updates the flag.
To update the `converted_to_destinations` flag, the database performs a
`DELETE` and `INSERT` internally, which fragments the table. To avoid
this, track unconverted rows using a `(stream ID, room ID)` position
instead of the flag.
From now on, the `converted_to_destinations` column indicates rows that
need converting to outbound pokes, but does not indicate whether the
conversion has already taken place.
Closes#14037.
Signed-off-by: Sean Quah <seanq@matrix.org>
Avoid an n+1 query problem and fetch the bundled aggregations for
m.reference relations in a single query instead of a query per event.
This applies similar logic for as was previously done for edits in
8b309adb43 (#11660; threads
in b65acead42 (#11752); and
annotations in 1799a54a54 (#14491).
Avoid an n+1 query problem and fetch the bundled aggregations for
m.annotation relations in a single query instead of a query per event.
This applies similar logic for as was previously done for edits in
8b309adb43 (#11660) and threads
in b65acead42 (#11752).
* Attempt to fix federation-client devscript handling of .well-known
The script was setting the wrong value in the Host header
* Fix TLS verification
Turns out that actually doing TLS verification isn't that hard. Let's enable
it.
* Add tests for StreamIdGenerator
* Drive-by: annotate all defs
* Revert "Revert "Remove slaved id tracker (#14376)" (#14463)"
This reverts commit d63814fd73, which in
turn reverted 36097e88c4. This restores
the latter.
* Fix StreamIdGenerator not handling unpersisted IDs
Spotted by @erikjohnston.
Closes#14456.
* Changelog
Co-authored-by: Nick Mills-Barrett <nick@fizzadar.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.
Also adds some missing type hints which were simple to fill in.
As part of the database migration to support threaded receipts, there is
a possible window in between
`73/08thread_receipts_non_null.sql.postgres` removing the original
unique constraints on `receipts_linearized` and `receipts_graph` and the
`reeipts_linearized_unique_index` and `receipts_graph_unique_index`
background updates from `72/08thread_receipts.sql` completing where
the unique constraints on `receipts_linearized` and `receipts_graph` are
missing. Any emulated upserts on these tables must therefore be
performed with a lock held, otherwise duplicate rows can end up in the
tables when there are concurrent emulated upserts. Fix the missing lock.
Note that emulated upserts no longer happen by default on sqlite, since
the minimum supported version of sqlite supports native upserts by
default now.
Finally, clean up any duplicate receipts that may have crept in before
trying to create the `receipts_graph_unique_index` and
`receipts_linearized_unique_index` unique indexes.
Signed-off-by: Sean Quah <seanq@matrix.org>
We don't filter state usually, so doing so here is a waste of time. This is not much of an issue for clients that enable lazy loading of members, since there will be fewer state events.
This matches the multi instance writer ID generator class which can
both handle advancing the current token over replication and by calling
the database.
This code was factored out to a method, but also left in-place.
Calling this twice in a row makes no sense: the first call will reduce
the size appropriately, but the loop will immediately exit since the
cache size was already reduced.
PostgreSQL may underestimate the number of distinct `room_id`s in
`event_search`, which can cause it to use table scans for queries for
multiple rooms.
Fix this by setting `n_distinct` on the column.
Resolves#14402.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Expose getting SYNAPSE_WORKER_TYPES from external, allowing override of workers requested.
* Add WORKER_TYPES variable option to complement.sh script that passes requested workers into start_for_complement.sh entrypoint.
* Update docs to reflect this new ability.
* Changelog
* Don't rely on soft wrapping to format long strings
Good idea dklimpel. Thanks for catching that.
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
* Small nits just noticed in docs.
* Fixup new line in docs.
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
When this background update did its last batch, it would try to update all the
events that had been inserted since the bgupdate started, which could cause a
table-scan. Make sure we limit the update correctly.
For forward compatibility, Synapse needs to ignore fields it does not
recognise instead of raising an error.
Fixes#14365.
Signed-off-by: Sean Quah <seanq@matrix.org>
Synapse 1.71.0rc2 (2022-11-04)
==============================
Please note that, as announced in the release notes for Synapse 1.69.0, legacy Prometheus metric names are now disabled by default.
They will be removed altogether in Synapse 1.73.0.
If not already done, server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.71/upgrade.html#upgrading-to-v1710) for more details.
Improved Documentation
----------------------
- Document the changes to monthly active user metrics due to deprecation of legacy Prometheus metric names. ([\#14358](https://github.com/matrix-org/synapse/issues/14358), [\#14360](https://github.com/matrix-org/synapse/issues/14360))
Deprecations and Removals
-------------------------
- Disable legacy Prometheus metric names by default. They can still be re-enabled for now, but they will be removed altogether in Synapse 1.73.0. ([\#14353](https://github.com/matrix-org/synapse/issues/14353))
Internal Changes
----------------
- Run unit tests against Python 3.11. ([\#13812](https://github.com/matrix-org/synapse/issues/13812))
4f5d492cd6a9438de03d1b768f4c220cb662ac06
The release branch CI is failing because poetry seems unable to install
wrapt 1.13.3 when run under CPython 3.11. Develop has already bumped
wrapt for 3.11 compatibility. Cherry-pick that commit here to try and
get CI going again.
Run when an issue is labelled with X-Needs-Info only. Add to triage board.
Use itemId which is output by actions/add-to-project to run the mutation to update the field value (i.e. move to the right column).
If configured an OIDC IdP can log a user's session out of
Synapse when they log out of the identity provider.
The IdP sends a request directly to Synapse (and must be
configured with an endpoint) when a user logs out.
* Introduce a test for the old behaviour which we want to restore
* Reintroduce the old behaviour in a simpler way
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Use 1 credit instead of 2 for creating a room: be more lenient than before
Notably, the UI in Element Web was still broken after restoring to prior behaviour.
After discussion, we agreed that it would be sensible to increase the limit.
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
PostgreSQL 14 changed the behavior of `websearch_to_tsquery` to
improve some behaviour.
The tests were hitting those edge-cases about handling of hanging double
quotes. This fixes the tests to take into account the PostgreSQL version.
* Add workers settings to configuration manual
* Update `pusher_instances`
* update url to python logger
* update headlines
* update links after headline change
* remove link from `daemon process`
There is no docs in Synapse for this
* extend example for `federation_sender_instances` and `pusher_instances`
* more infos about stream writers
* add link to DAG
* update `pusher_instances`
* update `worker_listeners`
* update `stream_writers`
* Update `worker_name`
Co-authored-by: David Robertson <davidr@element.io>
1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
Fixes check_avatar_size_and_mime_type() to successfully update avatars on homeservers running on non-default ports which it would mistakenly treat as remote homeserver while validating the avatar's size and mime type.
Signed-off-by: Ashish Kumar ashfame@users.noreply.github.com
Support a unified search query syntax which leverages more of the full-text
search of each database supported by Synapse.
Supports, with the same syntax across Postgresql 11+ and Sqlite:
- quoted "search terms"
- `AND`, `OR`, `-` (negation) operators
- Matching words based on their stem, e.g. searches for "dog" matches
documents containing "dogs".
This is achieved by
- If on postgresql 11+, pass the user input to `websearch_to_tsquery`
- If on sqlite, manually parse the query and transform it into the sqlite-specific
query syntax.
Note that postgresql 10, which is close to end-of-life, falls back to using
`phraseto_tsquery`, which only supports a subset of the features.
Multiple terms separated by a space are implicitly ANDed.
Note that:
1. There is no escaping of full-text syntax that might be supported by the database;
e.g. `NOT`, `NEAR`, `*` in sqlite. This runs the risk that people might discover this
as accidental functionality and depend on something we don't guarantee.
2. English text is assumed for stemming. To support other languages, either the target
language needs to be known at the time of indexing the message (via room metadata,
or otherwise), or a separate index for each language supported could be created.
Sqlite docs: https://www.sqlite.org/fts3.html#full_text_index_queries
Postgres docs: https://www.postgresql.org/docs/11/textsearch-controls.html
This implements a fake OIDC server, which intercepts calls to the HTTP client.
Improves accuracy of tests by covering more internal methods.
One particular example was the ID token validation, which previously mocked.
This uncovered an incorrect dependency: Synapse actually requires at least
authlib 0.15.1, not 0.14.0.
* Return NOT_JSON if decode fails and defer set_timeline_upper_limit call until after check_valid_filter. Fixes#13661. Signed-off-by: Ryan Miguel <miguel.ryanj@gmail.com>.
* Reword changelog
Use a base template to create a cohesive feel across the HTML
templates provided by Synapse.
Adds basic styling to the base template for a more user-friendly
look and feel.
When the last event in a thread is redacted we need to update
the threads table:
* Find the new latest event in the thread and store it into the table; or
* Remove the thread from the table if it is no longer a thread (i.e. all
events in the thread were redacted).
* Show erasure status when listing users in the Admin API
* Use USING when joining erased_users
* Add changelog entry
* Revert "Use USING when joining erased_users"
This reverts commit 30bd2bf106415caadcfdbdd1b234ef2b106cc394.
* Make the erased check work on postgres
* Add a testcase for showing erased user status
* Appease the style linter
* Explicitly convert `erased` to bool to make SQLite consistent with Postgres
This also adds us an easy way in to fix the other accidentally integered columns.
* Move erasure status test to UsersListTestCase
* Include user erased status when fetching user info via the admin API
* Document the erase status in user_admin_api
* Appease the linter and mypy
* Signpost comments in tests
Co-authored-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
Fix MSC3030 `/timestamp_to_event` endpoint returning `outliers` that it has no idea whether are near a gap or not (and therefore unable to determine whether it's actually the closest event). The reason Synapse doesn't know whether an `outlier` is next to a gap is because our gap checks rely on entries in the `event_edges`, `event_forward_extremeties`, and `event_backward_extremities` tables which is [not the case for `outliers`](2c63cdcc3f/docs/development/room-dag-concepts.md (outliers)).
Also fixes MSC3030 Complement `can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint` test flake. Although this acted flakey in Complement, if `sync_partial_state` raced and beat us before `/timestamp_to_event`, then even if we retried the failing `/context` request it wouldn't work until we made this Synapse change. With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
Fix https://github.com/matrix-org/synapse/issues/13944
### Why did this fail before? Why was it flakey?
Sleuthing the server logs on the [CI failure](https://github.com/matrix-org/synapse/actions/runs/3149623842/jobs/5121449357#step:5:5805), it looks like `hs2:/timestamp_to_event` found `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` event locally. Then when we went and asked for it via `/context`, since it's an `outlier`, it was filtered out of the results -> `You don't have permission to access that event.`
This is reproducible when `sync_partial_state` races and persists `$NP6-oU7mIFVyhtKfGvfrEQX949hQX-T-gvuauG6eurU` as an `outlier` before we evaluate `get_event_for_timestamp(...)`. To consistently reproduce locally, just add a delay at the [start of `get_event_for_timestamp(...)`](cb20b885cb/synapse/handlers/room.py (L1470-L1496)) so it always runs after `sync_partial_state` completes.
```py
from twisted.internet import task as twisted_task
d = twisted_task.deferLater(self.hs.get_reactor(), 3.5)
await d
```
In a run where it passes, on `hs2`, `get_event_for_timestamp(...)` finds a different event locally which is next to a gap and we request from a closer one from `hs1` which gets backfilled. And since the backfilled event is not an `outlier`, it's returned as expected during `/context`.
With this PR, Synapse will never return an `outlier` event so that test will always go and ask over federation.
* Don't pin dev-deps in pyproject; use lower bounds
This makes it slightly less tedious to update these things via
successive dependabot updates, by reducing the likelihood of a merge
conflict.
* Changelog
* Changelog
* Fix `track_memory_usage` on poetry-core 1.3.x installations
The same kind of problem as discussed in #14085:
1. we defined an extra with an underscore
2. we look it up at runtime with an underscore
3. but poetry-core 1.3.x. installs it with a dash, causing (2) to fail.
Fix by using a dash everywhere.
* Changelog
Spawned while investigating https://github.com/matrix-org/synapse/issues/13944
This way we might get some more context whenever an `403 Forbidden - body: {"errcode":"M_FORBIDDEN","error":"You don't have permission to access that event."}` error is produced.
`log_config.yaml`
```yaml
loggers:
synapse:
level: INFO
synapse.visibility:
level: DEBUG
```
This should fix a race where the event notification comes in over
replication before the state replication, leaving a window during
which a sync may get an incorrect list of rooms for the user.
While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place.
Related to
- https://github.com/matrix-org/synapse/issues/13622
- https://github.com/matrix-org/synapse/pull/13635
- https://github.com/matrix-org/synapse/issues/13676
Part of https://github.com/matrix-org/synapse/issues/13356
Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures.
With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now.
For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761
To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid.
- `backfill`
- `outgoing-federation-request` `/backfill`
- `_check_sigs_and_hash_and_fetch`
- `_check_sigs_and_hash_and_fetch_one` for each event received over backfill
- ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)`
- `_process_pulled_events`
- `_process_pulled_event` for each validated event
- ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it
- `_get_state_ids_after_missing_prev_event`
- `outgoing-federation-request` `/state_ids`
- ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again
- ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
The root node of a thread (and events related to it) are considered
"part of a thread" when validating receipts. This allows clients which
show the root node in both the main timeline and the threaded timeline
to easily send receipts in either.
Note that threaded notifications are not created for these events, these
events created notifications on the main timeline.
The callers either set a default limit or manually handle a None-limit
later on (by setting a default value).
Update the callers to always instantiate PaginationConfig with a default
limit and then assume the limit is non-None.
Stabilize the threads API (MSC3856) by supporting (only) the v1
path for the endpoint.
This also marks the API as safe for workers since it is a read-only
API.
Implement the /threads endpoint from MSC3856.
This is currently unstable and behind an experimental configuration
flag.
It includes a background update to backfill data, results from
the /threads endpoint will be partial until that finishes.
**Before:**
```
WARNING - POST-11 - Unable to parse JSON: Expecting value: line 1 column 1 (char 0) (b'')
```
**After:**
```
WARNING - POST-11 - Unable to parse JSON from POST /_matrix/client/v3/join/%21ZlmJtelqFroDRJYZaq:hs1?server_name=hs1 response: Expecting value: line 1 column 1 (char 0) (b'')
```
---
It's possible to figure out which endpoint these warnings were coming from before but you had to follow the request ID `POST-11` to the log line that says `Completed request [...]`. Including this key information next to the JSON parsing error makes it much easier to reason whether it matters or not.
```
2022-09-29T08:23:25.7875506Z synapse_main | 2022-09-29 08:21:10,336 - synapse.http.matrixfederationclient - 299 - INFO - POST-11 - {GET-O-13} [hs1] Completed request: 200 OK in 0.53 secs, got 450 bytes - GET matrix://hs1/_matrix/federation/v1/make_join/%21ohtKoQiXlPePSycXwp%3Ahs1/%40charlie%3Ahs2?ver=1&ver=2&ver=3&ver=4&ver=5&ver=6&ver=org.matrix.msc2176&ver=7&ver=8&ver=9&ver=org.matrix.msc3787&ver=10&ver=org.matrix.msc2716v4
```
---
As a note, having no `body` is normal for the `/join` endpoint and it can handle it.
0c853e0970/synapse/rest/client/room.py (L398-L403)
Alternatively we could remove these extra logs but they are probably more usually helpful to figure out what went wrong.
Fixes two related bugs:
* No edit information was bundled for events which aren't `m.room.message`.
* `m.new_content` was not applied for those events.
* Revert to prior build-system requirements
This reverts #14080.
* Use normalised extra name, which poetry-core 1.3 will generate anyway
* Changelog
* Upper bound build-system requirements
* Remove upgrade note; expand changelog entry a little.
* Fix typo in build-system comment
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Attempt to parse any valid information from an oEmbed response
(instead of bailing at the first unexpected data). This should allow
for more partial oEmbed data to be returned, resulting in better /
more URL previews, even if those URL previews are only partial.
Fixes two related bugs:
* The handling of `[null]` for a `room_types` filter was incorrect.
* The ordering of arguments when providing both a network tuple
and room type field was incorrect.
By getting the joined rooms before the current token we avoid any reading
history to confirm a user *was* in a room. We can then use any membership
change events, which we already fetch during sync, to determine the final
list of joined room IDs.
Applies the proper logic for unthreaded and threaded receipts to either
apply to all events in the room or only events in the same thread, respectively.
* Fix building wheels on OSX
Follow-up to #13983. I missed a breaking change in setup-python v4.
Serves me right for rushing to cut through the dependabot spam.
* Changelog
* Merge changelog
When retrieving counts of notifications segment the results based on the
thread ID, but choose whether to return them as individual threads or as
a single summed field by letting the client opt-in via a sync flag.
The summarization code is also updated to be per thread, instead of per
room.
Implements MSC2832 by sending application service access
tokens in the Authorization header.
The access token is also still sent as a query parameter until
the application service ecosystem has fully migrated to using
headers. In the future this could be made opt-in, or removed
completely.
Keep the old behavior (of including the original_event field) for any
requests to the /unstable version of the endpoint, but do not include
the field when the /v1 version is used.
This should avoid new clients from depending on this field, but will
not help with current dependencies.
MSC3316 declares that both /rooms/{roomId}/send and /rooms/{roomId}/state
should accept a ts parameter for appservices. This change expands support
to /state and adds tests.
Instead of running a single large query, run a single query for
user-only lookups and additional queries for batches of user device
lookups.
Resolves#13580.
Signed-off-by: Sean Quah <seanq@matrix.org>
Spawned while working on [`get_users_in_room` mis-uses](https://github.com/matrix-org/synapse/pull/13958#discussion_r984074897) and thinking we could use `get_local_users_in_room` here but we can't.
From first glance, it seemed like this was only using local users from all of the `is_mine_id(user_id)` checks but I see that it does actually use remote users. Just making things a little more clear here what it does and mentions remote users so maybe that will be more obvious in the future.
We move the expensive check of visibility to after calculating push actions, avoiding the expensive check for users who won't get pushed anyway.
I think this should have a big impact on rooms with large numbers of local users that have pushed disabled.
Fixes#13942. Introduced in #13575.
Basically, let's only get the ordered set of hosts out of the DB if we need an ordered set of hosts. Since we split the function up the caching won't be as good, but I think it will still be fine as e.g. multiple backfill requests for the same room will hit the cache.
There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history.
Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635
This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place.
Part of https://github.com/matrix-org/synapse/issues/13356
c.f. #12993 (comment), point 3
This stores all device list updates that we receive while partial joins are ongoing, and processes them once we have the full state.
Note: We don't actually process the device lists in the same ways as if we weren't partially joined. Instead of updating the device list remote cache, we simply notify local users that a change in the remote user's devices has happened. I think this is safe as if the local user requests the keys for the remote user and we don't have them we'll simply fetch them as normal.
This PR begins work on batching up events during the creation of a room. The PR splits out the creation and sending/persisting of the events. The first three events in the creation of the room-creating the room, joining the creator to the room, and the power levels event are sent sequentially, while the subsequent events are created and collected to be sent at the end of the function. This is currently done by appending them to a list and then iterating over the list to send, the next step (after this PR) would be to send and persist the collected events as a batch.
Fix https://github.com/matrix-org/synapse/issues/13856
Fix https://github.com/matrix-org/synapse/issues/13865
> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](https://github.com/matrix-org/complement/pull/214#discussion_r741678240). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to https://github.com/matrix-org/synapse/issues/13625)
>
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
>
> *-- https://github.com/matrix-org/synapse/issues/13856*
### The problem
`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).
Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.
Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
Since MSC3715 has passed FCP, the stable parameter can be used.
This currently falls back to the unstable parameter if the stable
parameter is not provided (and MSC3715 support is enabled in
the configuration).
Since #11482, we're saving sessions IDs from upstream IdPs, but we've been losing them when the user goes through a user mapping session on account registration.
During a `lazy_load_members` `/sync`, we look through auth events in
rooms with partial state to find prior membership events. When such a
membership is not found, an error is logged.
Since the first join event for a user never has a prior membership event
to cite, the error would always be logged when one appeared in the room
timeline.
Avoid logging errors for such events.
Introduced in #13477.
Signed-off-by: Sean Quah <seanq@matrix.org>
This should mean that logs from worker processes are flushed before shutdown.
When a test completes, Complement stops the docker container, which means that
synapse will receive a SIGTERM. Currently, the `complement_fork_starter` exits
immediately (without notifying the worker processes), which means that the
workers never get a chance to flush their logs before the whole container is
vaped. We can fix this by propagating the SIGTERM to the children.
This moves all the invalidations into a single place and de-duplicates
the code involved in invalidating caches for a given event by using
the base class method.
* Lockfile: update canonicaljson 1.6.0 -> 1.6.3
* Fix mypy errors with latest canonicaljson
The change to `_encode_json_bytes` definition wasn't sufficient:
```
synapse/http/server.py:751: error: Incompatible types in assignment (expression has type "Callable[[Arg(object, 'json_object')], bytes]", variable has type "Callable[[Arg(object, 'data')], bytes]") [assignment]
```
Which I think is mypy warning us that the two functions accept different
sets of kwargs. Fair enough!
* Changelog
Part of the work for #12993.
Once #12993 is fully resolved, we expect `/keys/changes` to behave
sensibly when joined to a room with partial state.
Signed-off-by: Sean Quah <seanq@matrix.org>
Use the provided list of servers in the room from the `/send_join`
response, since we will not know which users are in the room. This
isn't sufficient to ensure that all remote servers receive the right
device list updates, since the `/send_join` response may be inaccurate
or we may calculate the membership state of new users in the room
incorrectly.
Signed-off-by: Sean Quah <seanq@matrix.org>
This fixes a bug where the `/relations` API with `dir=f` would
skip the first item of each page (except the first page), causing
incomplete data to be returned to the client.
* Generate separate snapshots for sqlite, postgres and common
* Cleanup postgres dbs in the TRAP
* Say which logical DB we're applying updates to
* Run background updates on the state DB
* Add new option for accepting a SCHEMA_NUMBER
Adds a `thread_id` column to the `event_push_actions`, `event_push_actions_staging`,
and `event_push_summary` tables. This will notifications to be segmented by the thread
in a future pull request. The `thread_id` column stores the root event ID or the special
value `"main"`.
The `thread_id` column for `event_push_actions` and `event_push_summary` is
backfilled with `"main"` for all existing rows. New entries into `event_push_actions`
and `event_push_actions_staging` will get the proper thread ID.
`receipts_linearized` and `receipts_graph` also gain a `thread_id` column, which is similar,
except `NULL` is a special value meaning the receipt is "unthreaded".
See MSC3771 and MSC3773 for where this data will be useful.
Partial indices have been supported since SQLite 3.8, but Synapse
now requires >= 3.27, so we can enable support for them.
This requires rebuilding previous indices which were partial on
PostgreSQL, but not on SQLite.
* Remove incorrect migration file from `state` logical DB
The table `ex_outlier_stream` is part of the `main` logical DB; it
should not have been created in the `state` logical DB. We remove this
migration now as a tidy-up.
Note: we cannot `DROP TABLE IF EXISTS ex_outlier_stream` in a new
migration, because some (most) instances of Synapse host both of these
logical DBs on the same DB cluster.
* Changelog
When a remote user leaves the last room shared with the homeserver, we
have to mark their device list as unsubscribed, otherwise we would hold
on to a stale device list in our cache. Crucially, the device list would
remain cached even after the remote user rejoined the room, which could
lead to E2EE failures until the next change to the remote user's device
list.
Fixes#13651.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Don't accept a trailing slash on the end of /get_missing_events
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Remove checks for membership column in current_state_events
* Add schema script to force through the
`current_state_events_membership` background job
Contributed by Nick @ Beeper (@fizzadar).
Most of the time this function is heavily cached, but when that isn't
the case fetching the counts room by room slows down push delivery on
users with many (thousands) of rooms.
Signed off by Nick @ Beeper.
The problem with many services is that it makes it hard to find which service has the trace you want, see https://github.com/jaegertracing/jaeger-ui/issues/985
Previously, we split traces out into services based on their instance name like `matrix.org client_reader-1`, etc but there are many worker instances of the same `client_reader` so there is a lot to click through.
With this PR, all of the traces are just collected under the worker type like `client_reader`, `event_persister` 😇
Note: A Synapse worker instance name is an opaque string with the number convention only being our own thing for the `matrix.org` deployment. But seems pretty sensible to group things this way.
Update the docstrings for `get_users_in_room` and
`get_current_hosts_in_room` to explain the impact of partial state.
Signed-off-by: Sean Quah <seanq@matrix.org>
Handle malformed user IDs with no colons in `get_current_hosts_in_room`.
It's not currently possible for a malformed user ID to join a room, so
this error would never be hit.
Signed-off-by: Sean Quah <seanq@matrix.org>
Previously, `is_mine_id` would raise an exception when passed an ID with
no colons. Return `False` instead.
Fixes#13040.
Signed-off-by: Sean Quah <seanq@matrix.org>
When backfilling, `_get_state_ids_after_missing_prev_event` calls [`get_metadata_for_events`](26bc26586b/synapse/handlers/federation_event.py (L1133)). For `#matrix:matrix.org`, it's called with 77k `state_events` which means 77 calls to the database and takes 28 seconds.
This is a re-do of 57d334a13d (#13365),
which was backed out in 12abd72497 (#13501).
The `room_id` field represented the parent space for each room
and was made redundant by changes in the API shape where the
`children_state` is now nested underneath each `room`.
The room ID of each child is in the `state_key` field and is still
available.
This avoids doing work that will never be used (since the
resulting unread counts will never be sent in a /sync
response).
The negative of doing this is that unread counts will be
incorrect when the feature is initially enabled.
* Add monthly active users documentation
* changelog
* Tidy up notes
* more tidyup
* Rewrite #1
* link back to mau docs
* fix links
* s/appservice|AS/application service
* further review
* a newline
* Remove bit about shadow banned users.
I think talking about them is confusing, and the current text doesn't imply they get any special treatment.
* Update docs/usage/administration/monthly_active_users.md
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Update docs/usage/administration/monthly_active_users.md
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
The method doesn't actually do any data fetching and the method that
does, `_get_joined_profile_from_event_id`, has its own cache.
Signed off by Nick @ Beeper (@Fizzadar).
The --force flag of dpkg-statoverride has been deprecated (apparently starting
with the dpkg version in Debian buster). It offers --force-all as q quick fix,
but the usage in the Debian postinst script is probably covered by
--force-statoverride-add.
Fixes: #8391
Signed-off-by: Jörg Behrmann <behrmann@physik.fu-berlin.de>
We incorrectly didn't use the returned `Responder` if the client had
disconnected, which meant that the resource used by the Responder
wasn't correctly released.
In particular, this exhausted the thread pools so that *all* requests
timed out.
Media downloaded as part of a URL preview is normally deleted after two days.
However, while a background database migration is running, the process is
stopped. A long-running database migration can therefore cause the media
store to fill up with old preview files.
This logic was added in #2697 to make sure that we didn't try to run the expiry
without an index on `local_media_repository.created_ts`; the original logic that
needs that index was added in #2478 (in `get_url_cache_media_before`, as
amended by 93247a424a), and is still present.
Given that the background update was added before Synapse v1.0.0, just drop
this check and assume the index exists.
Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)).
There are 3 ways we currently calculate hosts that are in the room:
1. `get_current_state` -> `get_domains_from_state`
- Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill`
- This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳
1. `get_current_hosts_in_room`
- Used for other federation things like sending read receipts and typing indicators
1. `get_hosts_in_room_at_events`
- Used when pushing out events over federation to other servers in the `_process_event_queue_loop`
Fix https://github.com/matrix-org/synapse/issues/13626
Part of https://github.com/matrix-org/synapse/issues/13356
Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh)
### Query performance
#### Before
The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s):
```
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)
```
But what about `get_current_hosts_in_room`: When there is 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back.
```sh
$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
type = 'm.room.member'
AND membership = 'join'
AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
count
-------
4130
(1 row)
Time: 13181.598 ms (00:13.182)
synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
count
-------
80814
synapse=# SELECT COUNT(*) from current_state_events;
count
---------
8162847
synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
pg_size_pretty
----------------
4702 MB
```
#### After
I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart.
After a Postgres restart, it takes 330ms and running back to back takes 260ms.
```sh
$ psql synapse
synapse=# \timing on
Timing is on.
synapse=# SELECT
substring(c.state_key FROM '@[^:]*:(.*)$') as host
FROM current_state_events c
/* Get the depth of the event from the events table */
INNER JOIN events AS e USING (event_id)
WHERE
c.type = 'm.room.member'
AND c.membership = 'join'
AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org'
GROUP BY host
ORDER BY min(e.depth) ASC;
Time: 333.800 ms
```
#### Going further
To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up.
Another thing we can do is optimize the query to use a index skip scan:
- https://wiki.postgresql.org/wiki/Loose_indexscan
- Index Skip Scan, https://commitfest.postgresql.org/37/1741/
- https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
If things like the signing key file are missing, let's just try to generate
them on startup.
Again, this is useful for k8s-like deployments where we just want to generate
keys on the first run.
* Update debian packaging to debhelper version 12
Don't call dh_installinit anymore, because it has been deprecated, and use
dh_installsystemd instead of dh_systemd_enable for the same reason.
Signed-off-by: Jörg Behrmann <behrmann@physik.fu-berlin.de>
* Drop preinst script
It was used for reasons of interactions of dh_systemd_start and dh_installinit,
which have both be deprecated
Signed-off-by: Jörg Behrmann <behrmann@physik.fu-berlin.de>
* Drop /etc/default file
It was no longer being installed.
* Remove debian/compat file
This is managed by the control file nowadays
GitHub appears to be deprecating addProjectNextItem by not allowing it to be used alongside projectV2 to get the project ID, so switching to using addProjectV2ItemById instead.
When loading current ids, sort by stream ID so that we don't want to overwrite the `current_position` of an instance to a lower stream ID than we're actually at ([discussion](https://github.com/matrix-org/synapse/pull/13585#discussion_r951795379)). Previously, it sorted alphabetically by instance name which can be `null` and throw errors but more importantly, accomplishes nothing.
Fixes the following startup error which is why I started looking into this area:
```
$ poetry run synapse_homeserver --config-path homeserver.yaml
****************************************************************
Error during initialisation:
'<' not supported between instances of 'NoneType' and 'str'
There may be more information in the logs.
****************************************************************
```
Somehow my database ended up looking like the following, notice the `instance_name` is `null` in the db, and we can't sort `NoneType` things. Another question is why do we see the `instance_name` as `null` sometimes instead of `master` in monolith mode?
```
$ psql synapse
synapse=# SELECT * FROM stream_positions;
stream_name | instance_name | stream_id
-----------------+---------------+-----------
account_data | master | 1242
events | master | 1787
to_device | master | 58
presence_stream | master | 485638
receipts | master | 341
backfill | master | -139106
(6 rows)
synapse=# SELECT instance_name, stream_id FROM receipts_linearized;
instance_name | stream_id
---------------+-----------
| 211
| 3
| 4
| 212
| 213
| 224
| 228
| 164
| 313
| 253
| 38
| 321
| 324
| 189
| 192
| 193
| 194
| 195
| 197
| 198
| 275
| 79
| 339
| 340
| 82
| 341
| 84
| 85
| 91
| 119
```
Use dedicated `get_local_users_in_room` to find local users when calculating `join_authorised_via_users_server` ("the authorising user for joining a restricted room") of a `/make_join` request.
Found while working on https://github.com/matrix-org/synapse/pull/13575#discussion_r953023755 but it's not related.
This speeds things up by ~2x.
The vast majority of the time is now spent in `LruCache` moving things around the linked lists.
We do this via two things:
1. Don't create a deferred per-key during bulk set operations in `DeferredCache`. Instead, only create them if a subsequent caller asks for the key.
2. Add a bulk lookup API to `DeferredCache` rather than use a loop.
Part of #13019
This changes all the permission-related methods to rely on the Requester instead of the UserID. This is a first step towards enabling scoped access tokens at some point, since I expect the Requester to have scope-related informations in it.
It also changes methods which figure out the user/device/appservice out of the access token to return a Requester instead of something else. This avoids having store-related objects in the methods signatures.
Use a state filter or accept partial state in a few places where we
request state, to avoid blocking.
To make lazy-loading `/sync`s work, we need to provide the memberships
of event senders, which are not guaranteed to be in the room state.
Instead we dig through auth events for memberships to present to
clients. The auth events of an event are guaranteed to contain a
passable membership event, otherwise the event would have been rejected.
Note that this only covers the common code paths encountered during
testing. There has been no exhaustive checking of all sync code paths.
Fixes#13146.
Signed-off-by: Sean Quah <seanq@matrix.org>
Broke by #13522
It looks like we have some rules in the DB with a priority class less
than 0 that don't override the base rules. Before these were just
dropped, but #13522 made that a hard error.
This improves load times for push rules:
| Version | Time per user | Time for 1k users |
| -------------------- | ------------- | ----------------- |
| Before | 138 µs | 138ms |
| Now (with custom) | 2.11 µs | 2.11ms |
| Now (without custom) | 49.7 ns | 0.05 ms |
This therefore has a large impact on send times for rooms
with large numbers of local users in the room.
This reverts commit f383b9b3ec. Other PRs
were seeing mypy failures that looked to be related to mypy-zope.
Confusingly, we didn't see this on #13521.
Revert this for now and investigate later.
* Clarifies comments.
* Fixes an erroneous comment (about return type) added in #13455
(ec24813220).
* Clarifies the name of a variable.
* Simplifies logic of pulling out the latest join for the requesting user.
```py
@trace
@tag_args
async def get_oldest_event_ids_with_depth_in_room(...)
...
```
Before this PR, you would see a warning in the logs and the span was not exported:
```
2022-08-03 19:11:59,383 - synapse.logging.opentracing - 835 - ERROR - GET-0 - @trace may not have wrapped EventFederationWorkerStore.get_oldest_event_ids_with_depth_in_room correctly! The function is not async but returned a coroutine.
```
In state res v2, we apply two passes of iterative auth checks. The first
pass replays power events and events in their auth chains, but only
those belonging to the full conflicted set. The source code as written
suggests that we want only those belonging to the auth difference (which
is a smaller set of events).
At runtime we were doing the correct thing anyway, because the only
callsite of `_reverse_topological_power_sort` passes in the
`full_conflicted_set`. So this really is just a rename.
This adds support for the stable identifiers of MSC2285 while
continuing to support the unstable identifiers behind the configuration
flag. These will be removed in a future version.
Fix @tag_args being off-by-one (ahead)
Example:
```
argspec.args=[
'self',
'room_id'
]
args=(
<synapse.storage.databases.main.DataStore object at 0x10d0b8d00>,
'!HBehERstyQBxyJDLfR:my.synapse.server'
)
```
---
The previous logic was also flawed and we can end up in a situation like this:
```
argspec.args=['self', 'dest', 'room_id', 'limit', 'extremities']
args=(<synapse.federation.federation_client.FederationClient object at 0x7f1651c18160>, 'hs1', '!jAEHKIubyIfuLOdfpY:hs1')
```
From this source:
```py
async def backfill(
self, dest: str, room_id: str, limit: int, extremities: Collection[str]
) -> Optional[List[EventBase]]:
```
And this usage:
```py
events = await self._federation_client.backfill(
dest, room_id, limit=limit, extremities=extremities
)
```
which would previously cause this error:
```
synapse_main | 2022-08-04 06:13:12,051 - synapse.handlers.federation - 424 - ERROR - GET-5 - Failed to backfill from hs1 because tuple index out of range
synapse_main | Traceback (most recent call last):
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation.py", line 392, in try_backfill
synapse_main | await self._federation_event_handler.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 593, in backfill
synapse_main | events = await self._federation_client.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 827, in _wrapper
synapse_main | with wrapping_logic(func, *args, **kwargs):
synapse_main | File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
synapse_main | return next(self.gen)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 922, in _wrapping_logic
synapse_main | set_attribute("ARG_" + arg, str(args[i + 1])) # type: ignore[index]
synapse_main | IndexError: tuple index out of range
```
* Adds docstrings and inline comments.
* Formats SQL queries using triple quoted strings.
* Minor formatting changes.
* Avoid fetching `event_push_summary_stream_ordering` multiple times
in the same transactions.
Still maintains local in memory lookup optimisation, but does any external
lookup as part of the deferred that prevents duplicate lookups for the same
event at once. This makes the assumption that fetching from an external
cache is a non-zero load operation.
Part of my continuing quest to make the docker images build quicker: copy nginx and redis in from base docker images, rather than apt installing each time.
* Improved section regarding server admin
Added steps describing how to elevate an existing user to administrator by manipulating a `postgres` database.
Signed-off-by: jejo86 28619134+jejo86@users.noreply.github.com
* Improved section regarding server admin
* Reference database settings
Add instructions to check database settings to find out the database name, instead of listing all available PostgreSQL databases.
* Add suggestions from PR conversation
Replace config filename `homeserver.yaml`. with "config file".
Remove instructions to switch to `postgres` user.
Add instructions how to connect to SQLite database.
* Update changelog.d/13230.doc
Co-authored-by: reivilibre <olivier@librepush.net>
Previously, `_resolve_state_at_missing_prevs` returned the resolved
state before an event and a partial state flag. These were unwieldy to
carry around would only ever be used to build an event context. Build
the event context directly instead.
Signed-off-by: Sean Quah <seanq@matrix.org>
Synapse 1.64.0rc2 (2022-07-29)
==============================
This RC reintroduces support for `account_threepid_delegates.email`, which was removed in 1.64.0rc1. It remains deprecated and will be removed altogether in a future release. ([\#13406](https://github.com/matrix-org/synapse/issues/13406))
The `room_id` field represented the parent space for each room
and was made redundant by changes in the API shape where the
`children_state` is now nested underneath each `room`.
The room ID of each child is in the `state_key` field and is still
available.
Avoid blocking on full state in `_resolve_state_at_missing_prevs` and
return a new flag indicating whether the resolved state is partial.
Thread that flag around so that it makes it into the event context.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
When registering a new account via SSO on iOS, the text field becomes pretty annoying as it autocapitalises and autocorrects your input. This PR fixes that (although I have only tested the raw HTML file on the simulator, I'm not sure how to get the complete setup available for testing in the flow).
Previously, TLS could only be used with STARTTLS.
Add a new option `force_tls`, where TLS is used from the start.
Implicit TLS is recommended over STARTLS,
see https://datatracker.ietf.org/doc/html/rfc8314Fixes#8046.
Signed-off-by: Jan Schär <jan@jschaer.ch>
See #10826 and #10786 for context as to why we had to disable pruning on
those caches.
Now that `get_users_who_share_room_with_user` is called frequently only
for presence, we just need to make calls to it less frequent and then we
can remove the various levels of caching that is going on.
When a room has the partial state flag, we may not have an accurate
`m.room.member` event for event senders in the room's current state, and
so cannot perform soft fail checks correctly. Skip the soft fail check
entirely in this case.
As an alternative, we could block until we have full state, but that
would prevent us from receiving incoming events over federation, which
is undesirable.
Signed-off-by: Sean Quah <seanq@matrix.org>
Add another bash script to the contrib directory. It creates multiple stream writers and also prints out the example configuration for homeserver.yaml.
Signed-off-by: Ville Petteri Huh.
Fix race conditions in the async cache invalidation logic, by separating
the async & local invalidation calls and ensuring any async call i
executed first.
Signed off by Nick @ Beeper (@Fizzadar).
More prep work for asyncronous caching, also makes all process_replication_rows methods consistent (presence handler already is so).
Signed off by Nick @ Beeper (@Fizzadar)
Fix https://github.com/matrix-org/synapse/issues/13016
## New error code and status
### Before
Previously, we returned a `404` for `/thumbnail` which isn't even in the spec.
```json
{
"errcode": "M_NOT_FOUND",
"error": "Not found [b'hs1', b'tefQeZhmVxoiBfuFQUKRzJxc']"
}
```
### After
What does the spec say?
> 400: The request does not make sense to the server, or the server cannot thumbnail the content. For example, the client requested non-integer dimensions or asked for negatively-sized images.
>
> *-- https://spec.matrix.org/v1.1/client-server-api/#get_matrixmediav3thumbnailservernamemediaid*
Now with this PR, we respond with a `400` when we don't have thumbnails to serve and we explain why we might not have any thumbnails.
```json
{
"errcode": "M_UNKNOWN",
"error": "Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)",
}
```
> Cannot find any thumbnails for the requested media ([b'example.com', b'12345']). This might mean the media is not a supported_media_format=(image/jpeg, image/jpg, image/webp, image/gif, image/png) or that thumbnailing failed for some other reason. (Dynamic thumbnails are disabled on this server.)
---
We still respond with a 404 in many other places. But we can iterate on those later and maybe keep some in some specific places after spec updates/clarification: https://github.com/matrix-org/matrix-spec/issues/1122
We can also iterate on the bugs where Synapse doesn't thumbnail when it should in other issues/PRs.
`frozendict` 2.3.2 includes a fix for a memory leak in
`frozendict.__hash__`. This likely has no impact outside of the
deprecated `/initialSync` endpoint, which uses `StreamToken`s,
containing `RoomStreamToken`s, containing `frozendict`s, as cache keys.
Signed-off-by: Sean Quah <seanq@matrix.org>
These columns were added back in Synapse 1.52, and have been populated for new
events since then. It's now (beyond) time to back-populate them for existing
events.
There are two fixes here:
1. A long-standing bug where we incorrectly calculated `delta_ids`; and
2. A bug introduced in #13267 where we got current state incorrect.
When building the docker images for complement testing, copy a preinstalled
complement over from a base image, rather than apt installing it. This avoids
network traffic and is much faster.
Some experimental prep work to enable external event caching based on #9379 & #12955. Doesn't actually move the cache at all, just lays the groundwork for async implemented caches.
Signed off by Nick @ Beeper (@Fizzadar)
* Replace `get_new_events_for_appservice` with `get_all_new_events_stream`
The functions were near identical and this brings the AS worker closer
to the way federation senders work which can allow for multiple workers
to handle AS traffic.
* Pull received TS alongside events when processing the stream
This avoids an extra query -per event- when both federation sender
and appservice pusher process events.
There is a corner in `_check_event_auth` (long known as "the weird corner") where, if we get an event with auth_events which don't match those we were expecting, we attempt to resolve the diffence between our state and the remote's with a state resolution.
This isn't specced, and there's general agreement we shouldn't be doing it.
However, it turns out that the faster-joins code was relying on it, so we need to introduce something similar (but rather simpler) for that.
* Drop support for v1 unbind
Signed-off-by: Jacek Kusnierz <jacek.kusnierz@tum.de>
* Add changelog
Signed-off-by: Jacek Kusnierz <jacek.kusnierz@tum.de>
* Update changelog.d/13240.misc
* Admin API request explanation improved
Pointed out, that the Admin API is not accessible by default from any remote computer, but only from the PC `matrix-synapse` is running on.
Added a full, working example, making sure to include the cURL flag `-X`, which needs to be prepended to `GET`, `POST`, `PUT` etc. and listing the full query string including protocol, IP address and port.
* Admin API request explanation improved
* Apply suggestions from code review
Update changelog. Reword prose.
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
* Drop support for delegating email validation
Delegating email validation to an IS is insecure (since it allows the owner of
the IS to do a password reset on your HS), and has long been deprecated. It
will now cause a config error at startup.
* Update unit test which checks for email verification
Give it an `email` config instead of a threepid delegate
* Remove unused method `requestEmailToken`
* Simplify config handling for email verification
Rather than an enum and a boolean, all we need here is a single bool, which
says whether we are or are not doing email verification.
* update docs
* changelog
* upgrade.md: fix typo
* update version number
this will be in 1.64, not 1.63
* update version number
this one too
This gets rid of another usage of get_appservice_by_req, with all the benefits, including correctly tracking the appservice IP and setting the tracing attributes correctly.
Signed-off-by: Quentin Gliech <quenting@element.io>
Inspired by the room batch handler, this uses previous event inserts to
pre-populate prev events during room creation, reducing the number of
queries required to create a room.
Signed off by Nick @ Beeper (@Fizzadar)
Complement tests: https://github.com/matrix-org/complement/pull/405
This happens when you have some messages imported before the room is created.
Then use MSC3030 to look backwards before the room creation from a remote
federated server. The server won't find anything locally, but will ask over
federation which will have the remote event. The previous logic would
choke on not having the local event assigned.
```
Failed to fetch /timestamp_to_event from hs2 because of exception(UnboundLocalError) local variable 'local_event' referenced before assignment args=("local variable 'local_event' referenced before assignment",)
```
All tests are prefixed with `STALE_` and therefore they are silently
skipped. They were moved to `STALE_` in version `v0.5.0` in commit
2fcce3b3c5 - `Remove stale tests`.
Tests from `RoomEventsStoreTestCase` class are not used for last 8
years, I believe the best would be to remove them entirely.
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Bounce recalculation of current state to the correct event persister and
move recalculation of current state into the event persistence queue, to
avoid concurrent updates to a room's current state.
Also give recalculation of a room's current state a real stream
ordering.
Signed-off-by: Sean Quah <seanq@matrix.org>
Method `_get_state_map_for_room` seems to break in presence of some ill-formed events in the database. Reimplementing this method to use `get_current_state`, which is more robust to such events.
This happened if we encountered a stream ordering in `event_push_actions` that had more rows than the batch size of the delete, as If we don't delete any rows in an iteration then the next time round we get the exact same stream ordering and get stuck.
Whenever we want to persist an event, we first compute an event context,
which includes the state at the event and a flag indicating whether the
state is partial. After a lot of processing, we finally try to store the
event in the database, which can fail for partial state events when the
containing room has been un-partial stated in the meantime.
We detect the race as a foreign key constraint failure in the data store
layer and turn it into a special `PartialStateConflictError` exception,
which makes its way up to the method in which we computed the event
context.
To make things difficult, the exception needs to cross a replication
request: `/fed_send_events` for events coming over federation and
`/send_event` for events from clients. We transport the
`PartialStateConflictError` as a `409 Conflict` over replication and
turn `409`s back into `PartialStateConflictError`s on the worker making
the request.
All client events go through
`EventCreationHandler.handle_new_client_event`, which is called in
*a lot* of places. Instead of trying to update all the code which
creates client events, we turn the `PartialStateConflictError` into a
`429 Too Many Requests` in
`EventCreationHandler.handle_new_client_event` and hope that clients
take it as a hint to retry their request.
On the federation event side, there are 7 places which compute event
contexts. 4 of them use outlier event contexts:
`FederationEventHandler._auth_and_persist_outliers_inner`,
`FederationHandler.do_knock`, `FederationHandler.on_invite_request` and
`FederationHandler.do_remotely_reject_invite`. These events won't have
the partial state flag, so we do not need to do anything for then.
The remaining 3 paths which create events are
`FederationEventHandler.process_remote_join`,
`FederationEventHandler.on_send_membership_event` and
`FederationEventHandler._process_received_pdu`.
We can't experience the race in `process_remote_join`, unless we're
handling an additional join into a partial state room, which currently
blocks, so we make no attempt to handle it correctly.
`on_send_membership_event` is only called by
`FederationServer._on_send_membership_event`, so we catch the
`PartialStateConflictError` there and retry just once.
`_process_received_pdu` is called by `on_receive_pdu` for incoming
events and `_process_pulled_event` for backfill. The latter should never
try to persist partial state events, so we ignore it. We catch the
`PartialStateConflictError` in `on_receive_pdu` and retry just once.
Refering to the graph of code paths in
https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648
may make the above make more sense.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Cast to postgres types when handling postgres db
* Remove unused method
* Easy annotations
* Annotate create_room
* Use `ParamSpec` to annotate looping_call
* Annotate `default_config`
* Track `now` as a float
`time_ms` returns an int like the proper Synapse `Clock`
* Introduce a `Timer` dataclass
* Introduce a Looper type
* Suppress checking of a mock
* tests.utils is typed
* Changelog
* Whoops, import ParamSpec from typing_extensions
* ditch the psycopg2 casts
When we receive an event over federation during a faster join, there is no need
to wait for full state, since we have a whole reconciliation process designed
to take the partial state into account.
* Make _iterate_over_text easier to read by using simple data structures
* Prefer a set of tags to ignore
In my tests, it's 4x faster to check for containment in a set of this size
* Add a stack size limit to _iterate_over_text
* Continue accepting the case where there is no body element
* Use an early return instead for None
Co-authored-by: Richard van der Hoff <richard@matrix.org>
* Extend the auth rule checks for `m.room.create` events
... and move them up to the top of the function. Since the no auth_events are
allowed for m.room.create events, we may as well get the m.room.create event
checks out of the way first.
* Add a test for create events with prev_events
When we fail to persist a federation event, we kick off a task to remove
its push actions in the background, using the current logging context.
Since we don't `await` that task, we may finish our logging context
before the task finishes. There's no reason to not `await` the task, so
let's do that.
Signed-off-by: Sean Quah <seanq@matrix.org>
Pull out `twitter:` meta tags when generating a preview and
use it to augment any `og:` meta tags.
Prefers Open Graph information over Twitter card information.
* Add auth events to events used in tests
* Move some event auth checks out to a different method
Some of the event auth checks apply to an event's auth_events, rather than the
state at the event - which means they can play no part in state
resolution. Move them out to a separate method.
* Rename check_auth_rules_for_event
Now it only checks the state-dependent auth rules, it needs a better name.
Fixes#11887 hopefully.
The core change here is that `event_push_summary` now holds a summary of counts up until a much more recent point, meaning that the range of rows we need to count in `event_push_actions` is much smaller.
This needs two major changes:
1. When we get a receipt we need to recalculate `event_push_summary` rather than just delete it
2. The logic for deleting `event_push_actions` is now divorced from calculating `event_push_summary`.
In future it would be good to calculate `event_push_summary` while we persist a new event (it should just be a case of adding one to the relevant rows in `event_push_summary`), as that will further simplify the get counts logic and remove the need for us to periodically update `event_push_summary` in a background job.
* Remove redundant references to `event_edges.room_id`
We don't need to care about the room_id here, because we are already checking
the event id.
* Clean up the event_edges table
We make a number of changes to `event_edges`:
* We give the `room_id` and `is_state` columns defaults (null and false
respectively) so that we can stop populating them.
* We drop any rows that have `is_state` set true - they should no longer
exist.
* We drop any rows that do not exist in `events` - these should not exist
either.
* We drop the old unique constraint on all the colums, which wasn't much use.
* We create a new unique index on `(event_id, prev_event_id)`.
* We add a foreign key constraint to `events`.
These happen rather differently depending on whether we are on Postgres or
SQLite. For SQLite, we just rebuild the whole table, copying only the rows we
want to keep. For Postgres, we try to do things in the background as much as
possible.
* Stop populating `event_edges.room_id` and `is_state`
We can just rely on the defaults.
* Rename test_fedclient to match its source file
* Require at least one destination to be truthy
* Explicitly validate user ID in profile endpoint GETs
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This simplifies the access token verification logic by removing the `rights`
parameter which was only ever used for the unsubscribe link in email
notifications. The latter has been moved under the `/_synapse` namespace,
since it is not a standard API.
This also makes the email verification link more secure, by embedding the
app_id and pushkey in the macaroon and verifying it. This prevents the user
from tampering the query parameters of that unsubscribe link.
Macaroon generation is refactored:
- Centralised all macaroon generation and verification logic to the
`MacaroonGenerator`
- Moved to `synapse.utils`
- Changed the constructor to require only a `Clock`, hostname, and a secret key
(instead of a full `Homeserver`).
- Added tests for all methods.
Instead, use the `room_version` property of the event we're checking.
The `room_version` was originally added as a parameter somewhere around #4482,
but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
Instead, use the `room_version` property of the event we're validating.
The `room_version` was originally added as a parameter somewhere around #4482,
but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
In practice, when we run the auth rules, all of the events have the right room
version. Let's stop building Room V1 events for these tests and use the right
version.
The `room_id` field was removed from MSC2946 before
it was accepted. It was initially kept for backwards compatibility
and should be removed now that the stable form of the API
is used.
This change only stops Synapse from validating that it is returned,
a future PR will remove returning it as part of the response.
By always using delete_devices and sometimes passing a list
with a single device ID.
Previously these methods had gotten out of sync with each
other and it seems there's little benefit to the single-device
variant.
* Update worker docs to remove group endpoints.
* Removes an unused parameter to `ApplicationService`.
* Break dependency between media repo and groups.
* Avoid copying `m.room.related_groups` state events during room upgrades.
Currently, we try to pull the event corresponding to a sync token from the database. However, when
we fetch redaction events, we check the target of that redaction (because we aren't allowed to send
redactions to clients without validating them). So, if the sync token points to a redaction of an event
that we don't have, we have a problem.
It turns out we don't really need that event, and can just work with its ID and metadata, which
sidesteps the whole problem.
* Raise a dedicated `InvalidEventSignatureError` from `_check_sigs_on_pdu`
* Downgrade logging about redactions to DEBUG
this can be very spammy during a room join, and it's not very useful.
* Raise `InvalidEventSignatureError` from `_check_sigs_and_hash`
... and, more importantly, move the logging out to the callers.
* changelog
Synapse 1.60.0rc2 (2022-05-27)
==============================
This release of Synapse adds a unique index to the `state_group_edges` table, in
order to prevent accidentally introducing duplicate information (for example,
because a database backup was restored multiple times). If your Synapse database
already has duplicate rows in this table, this could fail with an error and
require manual remediation.
Additionally, the signature of the `check_event_for_spam` module callback has changed.
The previous signature has been deprecated and remains working for now. Module authors
should update their modules to use the new signature where possible.
See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600)
for more details.
Features
--------
- Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))
Bugfixes
--------
- Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))
Internal Changes
----------------
- Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
Hopefully this means that exceptions raised due to truncated JSON
get a sensible logging context and stack.
Signed-off-by: Sean Quah <seanq@matrix.org>
* Fix room deletion
ae7858f broke room deletion by attempting to delete the entry from `rooms`
before the tables that reference it.
* faster_joins: remove database rows on purge
* Refactor HTTP response size limits
Rather than passing a separate `max_response_size` down the stack, make it an
attribute of the `parser`.
* Allow bigger responses on `federation/v1/state`
`/state` can return huge responses, so we need to handle that.
Makes it so that groups/communities no longer exist from a user-POV. E.g. we remove:
* All API endpoints (including Client-Server, Server-Server, and admin).
* Documented configuration options (and the experimental flag, which is now unused).
* Special handling during room upgrades.
* The `groups` section of the `/sync` response.
By always returning all requested values from the function
wrapped by cachedList. Otherwise implicit None values get
added into the cache, which are unexpected.
Implements the following behind an experimental configuration flag:
* A new push rule kind for mutually related events.
* A new default push rule (`.m.rule.thread_reply`) under an unstable prefix.
This is missing part of MSC3772:
* The `.m.rule.thread_reply_to_me` push rule, this depends on MSC3664 / #11804.
The main differences are:
- values with delimiters (such as colons) should be quoted, so always
quote the origin, since it could contain a colon followed by a port
number
- should allow more than one space after "X-Matrix"
- quoted values with backslash-escaped characters should be unescaped
- names should be case insensitive
A minor optimization to avoid unnecessary copying/building
identical dictionaries when filtering private read receipts.
Also clarifies comments and cleans-up some tests.
Parse the `m.relates_to` event content field (which describes relations)
in a single place, this is used during:
* Event persistence.
* Validation of the Client-Server API.
* Fetching bundled aggregations.
* Processing of push rules.
Each of these separately implement the logic and each made slightly
different assumptions about what was valid. Some had minor / potential
bugs.
Enable cancellation of `GET /rooms/$room_id/members`,
`GET /rooms/$room_id/state` and
`GET /rooms/$room_id/state/$state_key/*` requests.
Signed-off-by: Sean Quah <seanq@element.io>
`BaseFederationServlet` wraps its endpoints in a bunch of async code
that has not been vetted for compatibility with cancellation.
Fail CI if a `@cancellable` flag is applied to a federation endpoint.
Signed-off-by: Sean Quah <seanq@element.io>
While `ReplicationEndpoint`s register themselves via `JsonResource`,
they pass a method that calls the handler, instead of the handler itself,
to `register_paths`. As a result, `JsonResource` will not correctly pick
up the `@cancellable` flag and we have to apply it ourselves.
Signed-off-by: Sean Quah <seanq@element.io>
Both `RestServlet`s and `BaseFederationServlet`s register their handlers
with `HttpServer.register_paths` / `JsonResource.register_paths`. Update
`JsonResource` to respect the `@cancellable` flag on handlers registered
in this way.
Although `ReplicationEndpoint` also registers itself using
`register_paths`, it does not pass the handler method that would have the
`@cancellable` flag directly, and so needs separate handling.
Signed-off-by: Sean Quah <seanq@element.io>
`DirectServeHtmlResource` and `DirectServeJsonResource` both inherit
from `_AsyncResource`. These classes expect to be subclassed with
`_async_render_*` methods.
This commit has no effect on `JsonResource`, despite inheriting from
`_AsyncResource`. `JsonResource` has its own `_async_render` override
which will need to be updated separately.
Signed-off-by: Sean Quah <seanq@element.io>
Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing.
The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching).
One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used.
Part of #12684
All async request processing goes through `_AsyncResource`, so this is
the only place where a `Deferred` needs to be captured for cancellation.
Unfortunately, the same isn't true for determining whether a request
can be cancelled. Each of `RestServlet`, `BaseFederationServlet`,
`DirectServe{Html,Json}Resource` and `ReplicationEndpoint` have
different wrappers around the method doing the request handling and they
all need to be handled separately.
Signed-off-by: Sean Quah <seanq@element.io>
Also expose the `SynapseRequest` from `FakeChannel` in tests, so that
we can call `Request.connectionLost` to simulate a client disconnecting.
Signed-off-by: Sean Quah <seanq@element.io>
* Move `_condition_checker` into `PushRuleEvaluatorForEvent`.
* Move the condition cache into `PushRuleEvaluatorForEvent`.
* Improve docstrings.
* Inline a method which is only called once.
There's no guarantee that module callbacks will handle cancellation
appropriately. Protect module callbacks with read semantics from
cancellation and avoid swallowing `CancelledError`s that arise.
Other module callbacks, such as the `on_*` callbacks, are presumed to
live on code paths that involve writes and aren't cancellation-friendly.
These module callbacks have been left alone.
Signed-off-by: Sean Quah <seanq@element.io>
* Move `pympler` back into the `all` extras
Undoes a change I made in #12381. I can't fully remember my reasoning,
but this changed the contents of the debian packages in a backwards
incompatible way. We're not aware of anyone who's been bitten by this,
but we still want to fix it.
To the reviewer: please be convinced that the debian packages will still
contain pympler after this change.
* Debian changelog entry to keep the linter happy
Update the "Build docker images" GitHub Actions workflow to use
`docker/metadata-action` to generate docker image tags, instead of a
custom shell script.
Signed-off-by: Henry <97804910+henryclw@users.noreply.github.com>
Fixes a regression from 8b309adb43 (#11660)
and b65acead42 (#11752) where events which
themselves were an edit or an annotation could have bundled aggregations calculated,
which is not allowed.
* Add mau_appservice_trial_days
* Add a test
* Tweaks
* changelog
* Ensure we sync after the delay
* Fix types
* Add config statement
* Fix test
* Reinstate logging that got removed
* Fix feature name
getClientIP was deprecated in Twisted 18.4.0, which also added
getClientAddress. The Synapse minimum version for Twisted is
currently 18.9.0, so all supported versions have the new API.
* Changes hidden read receipts to be a separate receipt type
(instead of a field on `m.read`).
* Updates the `/receipts` endpoint to accept `m.fully_read`.
* `m.login.jwt`, which was never specced and has been deprecated
since Synapse 1.16.0. (`org.matrix.login.jwt` can be used instead.)
* `uk.half-shot.msc2778.login.application_service`, which was
stabilized as part of the Matrix spec v1.2 release.
The `latest_event` field of the bundled aggregations for `m.thread` relations
did not include bundled aggregations itself. This resulted in clients needing to
immediately request the event from the server (and thus making it useless that
the latest event itself was serialized instead of just including an event ID).
I've seen a few errors which can only plausibly be explained by the calculated
event id for an event being different from the ID of the event in the
database. It should be cheap to check this, so let's do so and raise an
exception.
Check we're on the right branch before tagging, and on the right tag before uploading
* Abort if we're on the wrong branch
* Check we have the right tag checked out
* Clarify that `publish` only releases to GitHub
This works by taking a row level lock on the `rooms` table at the start of both transactions, ensuring that they don't run at the same time. In the event persistence transaction we also check that there is an entry still in the `rooms` table.
I can't figure out how to do this in SQLite. I was just going to lock the table, but it seems that we don't support that in SQLite either, so I'm *really* confused as to how we maintain integrity in SQLite when using `lock_table`....
This was originally added when we first added a `MemoryHandler` to the default
log config back in https://github.com/matrix-org/synapse/pull/8040, to ensure
that we didn't explode with an infinite loop if there was an error formatting
the logs.
Since then, we made additional improvements to logging which make this
workaround redundant. In particular:
* we no longer attempt to log un-UTF8-decodable byte sequences, which were the
most likely cause of an error in the first place.
* https://github.com/matrix-org/synapse/pull/8268 ensures that in the unlikely
case that there *is* an error, it won't cause an infinite loop.
* Allow unused ignores in "bleeding edge" CI
Where "bleeding edge" means the Twisted Trunk and Latest Deps jobs.
Follow up from #12531.
Resolves#12574.
* Use `--extras all` in latest deps mypy CI
Twisted trunk job already does this.
Missed in #12531.
* changelog
The status code of requests must always be set, regardless of client
disconnection, otherwise they will always be logged as 200!.
Broken for `respond_with_json` in
f48792eec4.
Broken for `respond_with_json_bytes` in
3e58ce72b4.
Broken for `respond_with_html_bytes` in
ea26e9a98b.
Signed-off-by: Sean Quah <seanq@element.io>
When configuring the return values of mocks, prefer awaitables from
`make_awaitable` over `defer.succeed`. `Deferred`s are only awaitable
once, so it is inappropriate for a mock to return the same `Deferred`
multiple times.
Also update `run_in_background` to support functions that return
arbitrary awaitables.
Signed-off-by: Sean Quah <seanq@element.io>
Over time we've begun to use newer versions of mypy, typeshed, stub
packages---and of course we've improved our own annotations. This makes
some type ignore comments no longer necessary. I have removed them.
There was one exception: a module that imports `select.epoll`. The
ignore is redundant on Linux, but I've kept it ignored for those of us
who work on the source tree using not-Linux. (#11771)
I'm more interested in the config line which enforces this. I want
unused ignores to be reported, because I think it's useful feedback when
annotating to know when you've fixed a problem you had to previously
ignore.
* Installing extras before typechecking
Lacking an easy way to install all extras generically, let's bite the bullet and
make install the hand-maintained `all` extra before typechecking.
Now that https://github.com/matrix-org/backend-meta/pull/6 is merged to
the release/v1 branch.
Synapse 1.58.0rc2 (2022-04-26)
==============================
This release candidate fixes bugs related to Synapse 1.58.0rc1's logic for handling device list updates.
Bugfixes
--------
- Fix a bug introduced in Synapse 1.58.0rc1 where the main process could consume excessive amounts of CPU and memory while handling sentry logging failures. ([\#12554](https://github.com/matrix-org/synapse/issues/12554))
- Fix a bug introduced in Synapse 1.58.0rc1 where opentracing contexts were not correctly sent to whitelisted remote servers with device lists updates. ([\#12555](https://github.com/matrix-org/synapse/issues/12555))
Internal Changes
----------------
- Reduce unnecessary work when handling remote device list updates. ([\#12557](https://github.com/matrix-org/synapse/issues/12557))
Try to avoid an OOM by checking fewer extremities.
Generally this is a big rewrite of _maybe_backfill, to try and fix some of the TODOs and other problems in it. It's best reviewed commit-by-commit.
Multiple calls to `EventsWorkerStore._get_events_from_cache_or_db` can
reuse the same database fetch, which is initiated by the first call.
Ensure that cancelling the first call doesn't cancel the other calls
sharing the same database fetch.
Signed-off-by: Sean Quah <seanq@element.io>
This will mainly be useful when dealing with module callbacks, which are
all typed as returning `Awaitable`s instead of coroutines or
`Deferred`s.
Signed-off-by: Sean Quah <seanq@element.io>
When we join a room via the faster-joins mechanism, we end up with "partial
state" at some points on the event DAG. Many parts of the codebase need to
wait for the full state to load. So, we implement a mechanism to keep track of
which events have partial state, and wait for them to be fully-populated.
MSC2314 has now been closed, so we're backing out its implementation, which
originally happened in #6176.
Unfortunately it's not a direct revert, as that PR mixed in a bunch of
unrelated changes to tests etc.
* Use `poetry` to build venv in debian packages
Co-authored-by: Dan Callahan <danc@element.io>
Co-authored-by: Shay <hillerys@element.io>
* Changelog
* Only pull in from requirements.txt
Addresses the same problem as #12439.
* Include `test` and `all` extras
`poetry export` helpfully silently ignores an unknown extra
Haven't seen this before because it's the only place we export `all` and
`test`. I could have __sworm__ that the syntax `--extra "all test"`
worked for `poetry install`...
* Clean up requirements file on subsequence builds
* Fix shell syntax
Co-authored-by: Dan Callahan <danc@element.io>
Co-authored-by: Shay <hillerys@element.io>
When we run a worker-mode synapse under docker, everything gets logged to stdout. Currently, output from the workers is tacked with a worker name, for example:
```
2022-04-13 15:27:56,810 - worker:frontend_proxy1 - synapse.util.caches.lrucache - 154 - INFO - LruCache._expire_old_entries-0 - Dropped 0 items from caches
```
- note `worker:frontend_proxy1`. No such tag is applied to log lines from the master, which makes somewhat confusing reading.
To fix this, we generate a dedicated log config file for the master in the same way that we do for the workers, and use that.
In trying to use the MSC3026 busy presence status, the user's status
would be set back to 'online' next time they synced. This change makes
it so that syncing does not affect a user's presence status if it
is currently set to 'busy': it must be removed through the presence
API.
The MSC defers to implementations on the behaviour of busy presence,
so this ought to remain compatible with the MSC.
* Run "main" trial tests under poetry
Olddeps and twisted trunk tests are handled in separate PRs.
The PyPy config is a best-effort only; it's completely untested.
Pulled out from #12337.
* Changelog
This was missed when initially stabilising room version 8 and was
left in as a compatibility shim. Most homeservers have upgraded
to a version which expects the proper field name, and the failure
mode is reasonable (a user on an older server may have to attempt
joining the room twice with an obscure error message the first time).
We work through all the events with partial state, updating the state at each
of them. Once it's done, we recalculate the state for the whole room, and then
mark the room as having complete state.
* Add some type hints to datastore
* newsfile
* change `Collection` to `List`
* refactor return type of `select_users_txn`
* correct type hint in `stream.py`
* Remove `Optional` in `select_users_txn`
* remove not needed return type in `__init__`
* Revert change in `get_stream_id_for_event_txn`
* Remove import from `Literal`
* Specify `tls` extra for Twisted dependency.
It was already pulled in for us by `treq`, but we should be explicit
that we do use the `tls` functionality of Twisted directly.
* Mark `idna` as dev-dependency
This doesn't actually change anything, as `Twisted[tls]` will put it in
as a main dependency anyway.
The requirements file generated by `poetry export` isn't correctly processed by `pip install -r requirements.txt`. It contains twisted and treq, both pinned to 22.2.0.
When `pip` installs treq, it notices that `Twisted[tls]` is required. It then tries to acquire the latest twisted release, only to fail (because this hash isn't listed in the requirements file).From e.g. https://github.com/matrix-org/synapse/runs/5977154990?check_suite_focus=true
> ```
> #15 9.204 Collecting Twisted[tls]>=18.7.0
> #15 9.205 ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
> #15 9.205 Twisted[tls]>=18.7.0 from 38622ff95b/Twisted-22.4.0-py3-none-any.whl (sha256)=f9f7a91f94932477a9fc3b169d57f54f96c6e74a23d78d9ce54039a7f48928a2 (from treq==22.2.0->-r /synapse/requirements.txt (line 724))
> #15 ERROR: executor failed running [/bin/sh -c pip install --prefix="/install" --no-warn-script-location -r /synapse/requirements.txt]: exit code: 1
> ```
The underlying pip issue is https://github.com/pypa/pip/issues/9644. A comment notes that one can avoid this behaviour with by `pip install`ing with the `--no-deps` flag. Let us do so.
(At first glance, the problem looks like https://github.com/python-poetry/poetry/issues/5311, but that was a bug in `poetry install`; this is `poetry export`, whose behaviour is fine AFAICS).
Consider the requester's ignored users when calculating the
bundled aggregations.
See #12285 / 4df10d3214
for corresponding changes for the `/relations` endpoint.
Of note:
* No untyped defs in `register_new_matrix_user`
This one might be contraversial. `request_registration` has three
dependency-injection arguments used for testing. I'm removing the
injection of the `requests` module and using `unitest.mock.patch` in the
test cases instead.
Doing `reveal_type(requests)` and `reveal_type(requests.get)` before the
change:
```
synapse/_scripts/register_new_matrix_user.py:45: note: Revealed type is "Any"
synapse/_scripts/register_new_matrix_user.py:46: note: Revealed type is "Any"
```
And after:
```
synapse/_scripts/register_new_matrix_user.py:44: note: Revealed type is "types.ModuleType"
synapse/_scripts/register_new_matrix_user.py:45: note: Revealed type is "def (url: Union[builtins.str, builtins.bytes], params: Union[Union[_typeshed.SupportsItems[Union[builtins.str, builtins.bytes, builtins.int, builtins.float], Union[builtins.str, builtins.bytes, builtins.int, builtins.float, typing.Iterable[Union[builtins.str, builtins.bytes, builtins.int, builtins.float]], None]], Tuple[Union[builtins.str, builtins.bytes, builtins.int, builtins.float], Union[builtins.str, builtins.bytes, builtins.int, builtins.float, typing.Iterable[Union[builtins.str, builtins.bytes, builtins.int, builtins.float]], None]], typing.Iterable[Tuple[Union[builtins.str, builtins.bytes, builtins.int, builtins.float], Union[builtins.str, builtins.bytes, builtins.int, builtins.float, typing.Iterable[Union[builtins.str, builtins.bytes, builtins.int, builtins.float]], None]]], builtins.str, builtins.bytes], None] =, data: Union[Any, None] =, headers: Union[Any, None] =, cookies: Union[Any, None] =, files: Union[Any, None] =, auth: Union[Any, None] =, timeout: Union[Any, None] =, allow_redirects: builtins.bool =, proxies: Union[Any, None] =, hooks: Union[Any, None] =, stream: Union[Any, None] =, verify: Union[Any, None] =, cert: Union[Any, None] =, json: Union[Any, None] =) -> requests.models.Response"
```
* Drive-by comment in `synapse.storage.types`
* No untyped defs in `synapse_port_db`
This was by far the most painful. I'm happy to break this up into
smaller pieces for review if it's not managable as-is.
Fixesmatrix-org/complement#330 (or it will, once we remove the old files).
It's not quite a lift-and-shift: I've also taken the opportunity to get rid of the custom CA that we used to use to sign the TLS certs, which has been superceded by the CA exposed by Complement.
* Pull out query param types to `synapse.http.types`
* Use QueryParams everywhere
* Simplify `encode_query_args`
* Add annotation which would have caught #12410
Principally, `prometheus_client.REGISTRY.register` now requires its argument to
extend `prometheus_client.Collector`.
Additionally, `Gauge.set` is now annotated so that passing `Optional[int]`
causes an error.
Just after a task acquires a contended `Linearizer` lock, it sleeps.
If the task is cancelled during this sleep, we need to release the lock.
Signed-off-by: Sean Quah <seanq@element.io>
`StreamToken.from_string` and `RoomStreamToken.parse` are both async
methods that could be cancelled. These methods must not replace
`CancelledError`s with `SynapseError`s.
Signed-off-by: Sean Quah <seanq@element.io>
Refactor and convert `Linearizer` to async. This makes a `Linearizer`
cancellation bug easier to fix.
Also refactor to use an async context manager, which eliminates an
unlikely footgun where code that doesn't immediately use the context
manager could forget to release the lock.
Signed-off-by: Sean Quah <seanq@element.io>
This is a first step in dealing with #7721.
The idea is basically that rather than calculating the full set of users a device list update needs to be sent to up front, we instead simply record the rooms the user was in at the time of the change. This will allow a few things:
1. we can defer calculating the set of remote servers that need to be poked about the change; and
2. during `/sync` and `/keys/changes` we can avoid also avoid calculating users who share rooms with other users, and instead just look at the rooms that have changed.
However, care needs to be taken to correctly handle server downgrades. As such this PR writes to both `device_lists_changes_in_room` and the `device_lists_outbound_pokes` table synchronously. In a future release we can then bump the database schema compat version to `69` and then we can assume that the new `device_lists_changes_in_room` exists and is handled.
There is a temporary option to disable writing to `device_lists_outbound_pokes` synchronously, allowing us to test the new code path does work (and by implication upgrading to a future release and downgrading to this one will work correctly).
Note: Ideally we'd do the calculation of room to servers on a worker (e.g. the background worker), but currently only master can write to the `device_list_outbound_pokes` table.
There are a bunch of places we call get_success on an immediate value, which is unnecessary. Let's rip them out, and remove the redundant functionality in get_success and friends.
Switching to a sequence means there's no need to track `last_txn` on the
AS state table to generate new TXN IDs. This also means that there is
no longer contention between the AS scheduler and AS handler on updates
to the `application_services_state` table, which will prevent serialization
errors during the complete AS txn transaction.
It seems like calling `_get_state_group_for_events` for an event where the
state is unknown is an error. Accordingly, let's raise an exception rather than
silently returning an empty result.
If we're missing most of the events in the room state, then we may as well call the /state endpoint, instead of individually requesting each and every event.
The intention here is to avoid doing state lookups for outliers in
`/_matrix/federation/v1/event`. Unfortunately that's expanded into something of
a rewrite of `filter_events_for_server`, which ended up trying to do that
operation in a couple of places.
The PaginationChunk class attempted to bundle some properties
together, but really just caused callers to jump through hoops and
hid implementation details.
This endpoint was removed from MSC2675 before it was approved.
It is currently unspecified (even in any MSCs) and therefore subject to
removal. It is not implemented by any known clients.
This also changes the bundled aggregation format for `m.annotation`,
which previously included pagination tokens for the `/aggregations`
endpoint, which are no longer useful.
Document the behaviour of `LoggingTransaction.call_after` and
`LoggingTransaction.call_on_exception` when transactions are retried.
Signed-off-by: Sean Quah <seanq@element.io>
Follow-up to https://github.com/matrix-org/synapse/pull/12083
Since we are now using the new `state_event_ids` parameter to do all of the heavy lifting.
We can remove any spots where we plumbed `auth_event_ids` just for MSC2716 things in
https://github.com/matrix-org/synapse/pull/9247/files.
Removing `auth_event_ids` from following functions:
- `create_and_send_nonmember_event`
- `_local_membership_update`
- `update_membership`
- `update_membership_locked`
When we are processing a `/backfill` request from a remote server, exclude any
outliers from consideration early on. We can't return outliers anyway (since we
don't know the state at the outlier), and filtering them out earlier means that
we won't attempt to calulate the state for them.
This should speed up push rule calculations for rooms with large numbers of local users when the main push rule cache fails.
Co-authored-by: reivilibre <oliverw@matrix.org>
* Make it possible to enable compression for the metrics HTTP resource
This can provide significant bandwidth savings pulling metrics from
synapse instances.
* Add changelog file.
* Fix type hint
We fetch the thread summary in two phases:
1. The summary that is shared by all users (count of messages and latest event).
2. Whether the requesting user has participated in the thread.
There's no use in attempting step 2 for events which did not return a summary
from step 1.
* Formally type the UserProfile in user searches
* export UserProfile in synapse.module_api
* Update docs
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
An error occured if a filter was supplied with `event_fields` which did not include
`unsigned`.
In that case, bundled aggregations are still added as the spec states it is allowed
for servers to add additional fields.
To handle cancellation, we ensure that `after_callback`s and
`exception_callback`s are always run, since the transaction will
complete on another thread regardless of cancellation.
We also wait until everything is done before releasing the
`CancelledError`, so that logging contexts won't get used after they
have been finished.
Signed-off-by: Sean Quah <seanq@element.io>
* Moves the relation pagination tests to a separate class.
* Move the assertion of the response code into the `_send_relation` helper.
* Moves some helpers into the base-class.
These decorators mostly support cancellation already. Add cancellation
tests and fix use of finished logging contexts by delaying cancellation,
as suggested by @erikjohnston.
Signed-off-by: Sean Quah <seanq@element.io>
`delay_cancellation` behaves like `stop_cancellation`, except it
delays `CancelledError`s until the original `Deferred` resolves.
This is handy for unifying cleanup paths and ensuring that uncancelled
coroutines don't use finished logcontexts.
Signed-off-by: Sean Quah <seanq@element.io>
The unstable identifiers are still supported if the experimental configuration
flag is enabled. The unstable identifiers will be removed in a future release.
This is allowed per MSC2675, although the original implementation did
not allow for it and would return an empty chunk / not bundle aggregations.
The main thing to improve is that the various caches get cleared properly
when an event is redacted, and that edits must not leak if the original
event is redacted (as that would presumably leak something similar to
the original event content).
Since the object it returns is a ReplicationCommandHandler.
This is clean-up from adding support to Redis where the command handler
was added as an additional layer of abstraction from the TCP protocol.
* `@cached` can now take an `uncached_args` which is an iterable of names to not use in the cache key.
* Requires `@cached`, @cachedList` and `@lru_cache` to use keyword arguments for clarity.
* Asserts that keyword-only arguments in cached functions are not accepted. (I tested this briefly and I don't believe this works properly.)
This allows for the target process to be down for around a minute
which provides time for restarts during synapse upgrades/config updates.
Closes: #12178
Signed off by Nick Mills-Barrett nick@beeper.com
* Rewrites the demo documentation to be clearer, accurate, and moves it to our documentation tree.
* Improvements to the demo scripts:
* `clean.sh` now runs `stop.sh` first to avoid zombie processes.
* Uses more modern Synapse configuration (and removes some obsolete configuration).
* Consistently use the HTTP ports for server name, etc.
* Remove the `demo/etc` directory and place everything into the `demo/808x` directories.
This field is only to be used in the Server-Server API, and not the
Client-Server API, but was being leaked when a federation response
was used in the /hierarchy API.
It’s just occurred to me that #12088 pulled in the “packaging” package (~=21.3). I pulled in the newest version I had at the time.
I only use it for packaging.requirements.Requirements. Which was added in packaging 16.1: https://github.com/pypa/packaging/releases/tag/16.1https://pkgs.org/download/python3-packaging suggests that the oldest version we care about is 17.1 in Ubuntu Bionic. So I think with this bound we're hunky dory.
If we locally generate a rejection for an invite received over federation, it
is stored as an outlier (because we probably don't have the state for the
room). However, currently we still generate a state group for it (even though
the state in that state group will be nonsense).
By setting the `outlier` param on `create_event`, we avoid the nonsensical
state.
I've argued in #11537 that poetry and tox don't cooperate well at the
moment. (See also #12119.) Therefore I'm pruning away bits of tox to make the transition to poetry easier. This change removes the commands for coverage.
We don't use coverage in anger at the moment. It shouldn't be too hard to add coverage as a dev-dependency and reintroduce this if we really want it.
* Fix incorrect argument in test case
* Add copyright header
* Docstring and __all__
* Exclude dev depenencies
* Use changelog from #12088
* Include version in error messages
This will hopefully distinguish between the version of the source code
and the version of the distribution package that is installed.
* Linter script is your friend
* Move the `snapcraft` configuration to `contrib`.
We're happy for people to package this as a snap image if it's useful,
but we don't support or maintain it. I'd like to move the config to
`contrib` to reflect this state of affairs.
* Changelog
* Remove unused mocks from `test_typing`
It's not clear what these do. `get_user_by_access_token` has the wrong
signature, including the return type. Tests all pass without these. I
think we should nuke them.
* Changelog
* Fixup imports
* Add type hints to `tests/rest/client`
* newsfile
* fix imports
* add `test_account.py`
* Remove one type hint in `test_report_event.py`
* change `on_create_room` to `async`
* update new functions in `test_third_party_rules.py`
* Add `test_filter.py`
* add `test_rooms.py`
* change to `assertEquals` to `assertEqual`
* lint
* Two scripts are basically entry_points already
* Move and rename scripts/* to synapse/_scripts/*.py
* Delete sync_room_to_group.pl
* Expose entry points in setup.py
* Update linter script and config
* Fixup scripts & docs mentioning scripts that moved
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Pull runtime dep checks into their own module
* Reimplement `check_requirements` using `importlib`
I've tried to make this clearer. We start by working out which of
Synapse's requirements we need to be installed here and now. I was
surprised that there wasn't an easier way to see which packages were
installed by a given extra.
I've pulled out the error messages into functions that deal with "is
this for an extra or not". And I've rearranged the loop over two
different sets of requirements into one loop with a "must be instaled"
flag.
I hope you agree that this is clearer.
* Test cases
When we get a partial_state response from send_join, store information in the
database about it:
* store a record about the room as a whole having partial state, and stash the
list of member servers too.
* flag the join event itself as having partial state
* also, for any new events whose prev-events are partial-stated, note that
they will *also* be partial-stated.
We don't yet make any attempt to interpret this data, so API calls (and a bunch
of other things) are just going to get incorrect data.
* Fix 'Unhandled error in Deferred'
Fixes a CRITICAL "Unhandled error in Deferred" log message which happened when
a function wrapped with `@cachedList` failed
* Minor optimisation to cachedListDescriptor
we can avoid re-using `missing`, which saves looking up entries in
`deferreds_map`, and means we don't need to copy it.
* Improve type annotation on CachedListDescriptor
* fix incorrect unwrapFirstError import
this was being imported from the wrong place
* Refactor `concurrently_execute` to use `yieldable_gather_results`
* Improve exception handling in `yieldable_gather_results`
Try to avoid swallowing so many stack traces.
* mark unwrapFirstError deprecated
* changelog
...and various code supporting it.
The /spaces endpoint was from an old version of MSC2946 and included
both a Client-Server and Server-Server API. Note that the unstable
/hierarchy endpoint (from the final version of MSC2946) is not yet
removed.
* Fix `PushRuleEvaluator` to work on frozendicts
frozendicts do not (necessarily) inherit from dict, so this needs to handle
them correctly.
* Fix event filtering for frozen events
Looks like this one was introduced by #11194.
Before this fix, a legitimate 404 from a federation endpoint (e.g. due
to an unknown room) would be treated as an unknown endpoint. This
could cause unnecessary federation traffic.
Don't attempt to add non-string `value`s to `event_search` and add a
background update to clear out bad rows from `event_search` when
using sqlite.
Signed-off-by: Sean Quah <seanq@element.io>
The complement.sh script relies on the name of the ref matching the name
of the unpacked folder. The branch redirect from renaming the default
branch breaks that assumption.
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
* Remove `trial` section from setup.cfg
This was added in the initial commit from 2014. I can't see that it does
anything. Maybe it's there so that you can run `trial` without any extra
args, but if I do that then I just get the `--help` message.
* Move flake8's config to its own file
This is an endpoint that we have server-side support for, but no client-side support. It's going to be useful for resyncing partial-stated rooms, so let's introduce it.
msc3706 proposes changing the `/send_join` response:
> Any events returned within `state` can be omitted from `auth_chain`.
Currently, we rely on `m.room.create` being returned in `auth_chain`, but since
the `m.room.create` event must necessarily be part of the state, the above
change will break this.
In short, let's look for `m.room.create` in `state` rather than `auth_chain`.
For users with large accounts it is inefficient to calculate the set of
users they share a room with (and takes a lot of space in the cache).
Instead we can look at users whose devices have changed since the last
sync and check if they share a room with the syncing user.
When the server leaves a room the `get_rooms_for_user` cache is not
correctly invalidated for the remote users in the room. This means that
subsequent calls to `get_rooms_for_user` for the remote users would
incorrectly include the room (it shouldn't be included because the
server no longer knows anything about the room).
The driver for this is to stop Complement complaining about it, but as far as I can tell it was pointless and needed to go away anyway.
I'm a bit unclear about what exactly VOLUME does, but I think what it means is that, if you don't override it with an explicit -v argument, then docker run will create a temporary volume, and copy things into it. The temporary volume is then deleted when the container finishes.
That only sounds useful if your image has something to copy into it (otherwise you may as well just use the default root filesystem), and our image notably doesn't copy anything into /data.
So... this wasn't doing anything, except annoying Complement?
Splits the search code into a few logical functions instead of a single
unreadable function.
There are also a few additional changes for readability.
After refactoring it was clear to see there were some unused and
unnecessary variables, which were simplified.
If the latest event in a thread was edited than the original
event content was included in bundled aggregation for
threads instead of the edited event content.
* Make `get_auth_chain_ids` return a Set
It has a set internally, and a set is often useful where it gets used, so let's
avoid converting to an intermediate list.
* Minor refactors in `on_send_join_request`
A little bit of non-functional groundwork
* Implement MSC3706: partial state in /send_join response
This should reduce database usage when fetching bundled aggregations
as the number of individual queries (and round trips to the database) are
reduced.
If ther are more than 100 to-device messages pending for a device
`/sync` will only return the first 100, however the next batch token was
incorrectly calculated and so all other pending messages would be
dropped.
This is due to `txn.rowcount` only returning the number of rows that
*changed*, rather than the number *selected* in SQLite.
If we prepopulate the test homeserver with a key for a remote homeserver, we
can make federation requests to it without having to stub out the
authenticator. This has two advantages:
* means that what we are testing is closer to reality (ie, we now have
complete tests for the incoming-request-authorisation flow)
* some tests require that other objects be signed by the remote server (eg,
the event in `/send_join`), and doing that would require a whole separate
set of mocking out. It's much simpler just to use real keys.
This implements an allow list for content types for which Synapse will attempt URL preview. If a URL resolves to a resource with a content type which isn't in the list, the download will terminate immediately.
This makes sense given that Synapse would never successfully generate a URL preview for such files in the first place, and helps prevent issues with streaming media servers, such as #8302.
Signed-off-by: Denis Kasak dkasak@termina.org.uk
This should reduce database usage when fetching bundled aggregations
as the number of individual queries (and round trips to the database) are
reduced.
Part of the Tchap Synapse mainlining.
This allows modules to implement extra logic to figure out whether a given 3PID can be added to the local homeserver. In the Tchap use case, this will allow a Synapse module to interface with the custom endpoint /internal_info.
Since #11811 there has been general Complement flakiness around networking.
It seems like tests are hitting the wrong containers. In an effort to diagnose
the cause of this, as well as reduce its impact on this project, set the
parallelsim to 1 (no parallelism) when running tests.
If this fixes the flakiness then this indicates the cause and I can diagnose
this further. If this doesn't fix the flakiness then that implies some kind
of test pollution which also helps to diagnose this further.
The idea here is to set the parent span for incoming federation requests to the
*outgoing* span on the other end. That means that you can see (most of) the
full end-to-end flow when you have a process that includes federation requests.
However, in order not to lose information, we still want a link to the
`incoming-federation-request` span from the servlet, so we have to create
another span to do exactly that.
`start_active_span` was inconsistent as to whether it would activate the span
immediately, or wait for `scope.__enter__` to happen (it depended on whether
the current logcontext already had an associated scope). The inconsistency was
rather confusing if you were hoping to set up a couple of separate spans before
activating either.
Looking at the other implementations of opentracing `ScopeManager`s, the
intention is that it *should* be activated immediately, as the name
implies. Indeed, the idea is that you don't have to use the scope as a
contextmanager at all - you can just call `.close` on the result. Hence, our
cleanup has to happen in `.close` rather than `.__exit__`.
So, the main change here is to ensure that `start_active_span` does activate
the span, and that `scope.close()` does close the scope.
We also add some tests, which requires a `tracer` param so that we don't have
to rely on the global variable in unit tests.
The get_users_in_room and get_users_in_room_with_profiles
are now only invalidated when the membership of a room changes,
instead of during any state change in the room.
* Fix losing incoming EDUs if debug logging enabled
Fixes#11889. Homeservers should only be affected if the
`synapse.8631_debug` logger was enabled for DEBUG mode.
I am not sure if this merits a bugfix release: I think the logging can
be disabled in config if anyone is affected? But it is still pretty bad.
Only allow files which file size and content types match configured
limits to be set as avatar.
Most of the inspiration from the non-test code comes from matrix-org/synapse-dinsic#19
This is in the context of mainlining the Tchap fork of Synapse. Currently in Tchap usernames are derived from the user's email address (extracted from the UIA results, more specifically the m.login.email.identity step).
This change also exports the check_username method from the registration handler as part of the module API, so that a module can check if the username it's trying to generate is correct and doesn't conflict with an existing one, and fallback gracefully if not.
Co-authored-by: David Robertson <davidr@element.io>
This is some odds and ends found during the review of #11791
and while continuing to work in this code:
* Return attrs classes instead of dictionaries from some methods
to improve type safety.
* Call `get_bundled_aggregations` fewer times.
* Adds a missing assertion in the tests.
* Do not return empty bundled aggregations for an event (preferring
to not include the bundle at all, as the docstring states).
This is mostly motivated by the tchap use case, where usernames are automatically generated from the user's email address (in a way that allows figuring out the email address from the username). Therefore, it's an issue if we respond to requests on /register and /register/available with M_USER_IN_USE, because it can potentially leak email addresses (which include the user's real name and place of work).
This commit adds a flag to inhibit the M_USER_IN_USE errors that are raised both by /register/available, and when providing a username early into the registration process. This error will still be raised if the user completes the registration process but the username conflicts. This is particularly useful when using modules (https://github.com/matrix-org/synapse/pull/11790 adds a module callback to set the username of users at registration) or SSO, since they can ensure the username is unique.
More context is available in the PR that introduced this behaviour to synapse-dinsic: matrix-org/synapse-dinsic#48 - as well as the issue in the matrix-dinsic repo: matrix-org/matrix-dinsic#476
Similar to #11817.
In `_create_power_level_validator` we
- retrieve `validator`. This is a class implementing the
`jsonschema.protocols.Validator` interface. In other words,
`validator: Type[jsonschema.protocols.Validator]`.
- we then create an second validator class by modifying the original
`validator`. We return that class, which is also of type
`Type[jsonschema.protocols.Validator]`.
So the original annotation was incorrect: it claimed we were returning
an instance of jsonSchema.Draft7Validator, not the class (or a subclass)
itself. (Strictly speaking this is incorrect, because `POWER_LEVELS_SCHEMA`
isn't pinned to a particular version of JSON Schema. But there are other
complications with the type stubs if you try to fix this; I felt like
the change herein was a decent compromise that better expresses intent).
(I suspect/hope the typeshed project would welcome an effort to improve
the jsonschema stubs. Let's see if I get some spare time.)
* add check that gc.freeze is available before calling
* newsfragment
* lint
* Update comment
Co-authored-by: Dan Callahan <danc@element.io>
Co-authored-by: Dan Callahan <danc@element.io>
* CI: run Complement on the VM, not inside Docker
This requires https://github.com/matrix-org/complement/pull/289
We now run Complement on the VM instead of inside a Docker container.
This is to allow Complement to bind to any high-numbered port when it
starts up its own federation servers. We want to do this to allow for
more concurrency when running complement tests. Previously, Complement
only ever bound to `:8448` when running its own federation server. This
prevented multiple federation tests running at the same time as they would
fight each other on the port. This did however allow Complement to run
in Docker, as the host could just port forward `:8448` to allow homeserver
containers to communicate to Complement. Now that we are using random
ports however, we cannot use Docker to run Complement. This ends up
being a good thing because:
- Running Complement tests locally is closer to how they run in CI.
- Allows the `CI` env var to be removed in Complement.
- Slightly speeds up runs as we don't need to pull down the Complement
image prior to running tests. This assumes GHA caches actions sensibly.
* Changelog
* Full stop
* Update .github/workflows/tests.yml
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Review comments
* Update .github/workflows/tests.yml
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Docs: add missing PR submission process how-tos
The documentation says that in order to submit a pull request you have to run the linter and links to [Run the linters](https://matrix-org.github.io/synapse/latest/development/contributing_guide.html#run-the-linters). IMO "Run the linters" should explain that development dependencies are a pre-requisite.
I also included `pip install wheel` which I had to run inside my virtual environment on ubuntu before I `pip install -e ".[all,dev]"` would succeed.
It had already accounted for 1.50.2 (ordered chronologically rather than
sem-ver-ically); it just seems this wasn't merged into master when we
released 1.50.2.
PyNaCl's recent 1.5.0 release on PyPi includes arm64 wheels, which means our
arm64 docker images now build in a sensible amount of time, so we can skip the
amd64-only build.
Synapse 1.51.0rc2 (2022-01-24)
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))
* remove reference in comments to python3.6
* upgrade tox python env in script
* bump python version in example for completeness
* upgrade python version requirement in setup doc
* upgrade necessary python version in __init__.py
* upgrade python version in setup.py
* newsfragment
* drops refs to bionic and replace with focal
* bump refs to postgres 9.6 to 10
* fix hanging ci
* try installing tzdata first
* revert change made in b979f336
* ignore new random mypy error while debugging other error
* fix lint error for temporary workaround
* revert change to install list
* try passing env var
* export debian frontend var?
* move line and add comment
* bump pillow dependency
* bump lxml depenency
* install libjpeg-dev for pillow
* bump automat version to one compatible with py3.8
* add libwebp for pillow
* bump twisted trunk python version
* change suffix of newsfragment
* remove redundant python 3.7 checks
* lint
Debug for #8631.
I'm having a hard time tracking down what's going wrong in that issue.
In the reported example, I could see server A sending federation traffic
to server B and all was well. Yet B reports out-of-sync device updates
from A.
I couldn't see what was _in_ the events being sent from A to B. So I
have added some crude logging to track
- when we have updates to send to a remote HS
- the edus we actually accumulate to send
- when a federation transaction includes a device list update edu
- when such an EDU is received
This is a bit of a sledgehammer.
By scraping Open Graph information from the HTML even
when an autodiscovery endpoint is found. The results are
then combined to capture as much information as possible
from the page.
I've never found this terribly useful. I think it was added in the early days
of Synapse, without much thought as to what would actually be useful to log,
and has just been cargo-culted ever since.
Rather, it tends to clutter up debug logs with useless information.
The existing implementation of the `python_twisted_reactor_tick_time` metric is pretty useless, because it *only*
measures the time taken to execute timed calls and callbacks from threads. That neglects everything that
happens off the back of I/O, which is obviously quite a lot for us.
To improve this, I've hooked into a different place in the reactor - in particular, where it calls `epoll`. That call is
the only place it should wait for something to happen - the rest of the loop *should* be quick.
I've also removed `python_twisted_reactor_pending_calls`, because I don't believe anyone ever looks at it, and
it's a nuisance to populate.
Always add state.room_id after the configurable ORDER BY. Otherwise,
for any sort, certain pages can contain results from
other pages. (Especially when sorting by creator, since there may
be many rooms by the same creator)
* Document different order direction of numerical fields
"joined_members", "joined_local_members", "version" and "state_events"
are ordered in descending direction by default (dir=f). Added a note
in tests to explain the differences in ordering.
Signed-off-by: Daniël Sonck <daniel@sonck.nl>
documentation claims that you can use the %(app)s variable in password_reset and email_validation subjects, but if you do you end up with an error 500
Co-authored-by: br4nnigan <10244835+br4nnigan@users.noreply.github.com>
Rather than hooking into the reactor loop, just add a timed task that runs every 100 ms to do the garbage collection.
Part 1 of a quest to simplify the reactor monkey-patching.
Currently when puppeting another user, the user doing the puppeting is
tracked for client IPs and MAU (if configured).
When tracking MAU is important, it becomes necessary to be possible to
also track the client IPs and MAU of puppeted users. As an example a
client that manages user creation and creation of tokens via the Synapse
admin API, passing those tokens for the client to use.
This PR adds optional configuration to enable tracking of puppeted users
into monthly active users. The default behaviour stays the same.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
* Deal with mypy errors w/ type-hinted pynacl 1.5.0
Fixes#11644.
I really don't like that we're monkey patching pynacl SignedKey
instances with alg and version objects. But I'm too scared to make the
changes necessary right now.
(Ideally I would replace `signedjson.types.SingingKey` with a runtime class which
wraps or inherits from `nacl.signing.SigningKey`.) C.f. https://github.com/matrix-org/python-signedjson/issues/16
* Deal with mypy errors w/ type-hinted pynacl 1.5.0
Fixes#11644.
I really don't like that we're monkey patching pynacl SignedKey
instances with alg and version objects. But I'm too scared to make the
changes necessary right now.
(Ideally I would replace `signedjson.types.SingingKey` with a runtime class which
wraps or inherits from `nacl.signing.SigningKey`.) C.f. https://github.com/matrix-org/python-signedjson/issues/16
By returning all of the m.space.child state of the space, not just
the first 50. The number of rooms returned is still capped at 50.
For the federation API this implies that the requesting server will
need to individually query for any other rooms it is not joined to.
* Optionally use an on-disk sqlite db in tests
When debugging a test it is sometimes useful to inspect the state of the
DB. This is not easy when the db is in-memory: one cannot attach the
sqlite CLI to another process's DB.
With this change, if SYNAPSE_TEST_PERSIST_SQLITE_DB is set, we use
`_trial_temp/test.db` as our sqlite database. One can then use
`sqlite3 _trial_temp/test.db` and query to your heart's content.
The DB is destroyed and recreated between different test cases.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This makes the serialization of events synchronous (and it no
longer access the database), but we must manually calculate and
provide the bundled aggregations.
Overall this should cause no change in behavior, but is prep work
for other improvements.
Fixes minor discrepancies between the /hierarchy endpoint described
in MSC2946 and the implementation.
Note that the changes impact the stable and unstable /hierarchy and
unstable /spaces endpoints for both client and federation APIs.
* `_auth_and_persist_outliers`: mark persisted events as outliers
Mark any events that get persisted via `_auth_and_persist_outliers` as, well,
outliers.
Currently this will be a no-op as everything will already be flagged as an
outlier, but I'm going to change that.
* `process_remote_join`: stop flagging as outlier
The events are now flagged as outliers later on, by `_auth_and_persist_outliers`.
* `send_join`: remove `outlier=True`
The events created here are returned in the result of `send_join` to
`FederationHandler.do_invite_join`. From there they are passed into
`FederationEventHandler.process_remote_join`, which passes them to
`_auth_and_persist_outliers`... which sets the `outlier` flag.
* `get_event_auth`: remove `outlier=True`
stop flagging the events returned by `get_event_auth` as outliers. This method
is only called by `_get_remote_auth_chain_for_event`, which passes the results
into `_auth_and_persist_outliers`, which will flag them as outliers.
* `_get_remote_auth_chain_for_event`: remove `outlier=True`
we pass all the events into `_auth_and_persist_outliers`, which will now flag
the events as outliers.
* `_check_sigs_and_hash_and_fetch`: remove unused `outlier` parameter
This param is now never set to True, so we can remove it.
* `_check_sigs_and_hash_and_fetch_one`: remove unused `outlier` param
This is no longer set anywhere, so we can remove it.
* `get_pdu`: remove unused `outlier` parameter
... and chase it down into `get_pdu_from_destination_raw`.
* `event_from_pdu_json`: remove redundant `outlier` param
This is never set to `True`, so can be removed.
* changelog
* update docstring
* Fix AssertionErrors after purging events
If you purged a bunch of events from your database, and then restarted synapse
without receiving more events, then you would get a bunch of AssertionErrors on
restart.
This fixes the situation by rewinding the stream processors.
* `check-newsfragment`: ignore deleted newsfiles
Events returned by `backfill` should not be flagged as outliers.
Fixes:
```
AssertionError: null
File "synapse/handlers/federation.py", line 313, in try_backfill
dom, room_id, limit=100, extremities=extremities
File "synapse/handlers/federation_event.py", line 517, in backfill
await self._process_pulled_events(dest, events, backfilled=True)
File "synapse/handlers/federation_event.py", line 642, in _process_pulled_events
await self._process_pulled_event(origin, ev, backfilled=backfilled)
File "synapse/handlers/federation_event.py", line 669, in _process_pulled_event
assert not event.internal_metadata.is_outlier()
```
See https://sentry.matrix.org/sentry/synapse-matrixorg/issues/231992Fixes#8894.
* Push `get_room_{min,max_stream_ordering}` into StreamStore
Both implementations of this are identical, so we may as well push it down and
get rid of the abstract base class nonsense.
* Remove redundant `StreamStore` class
This is empty now
* Remove redundant `get_current_events_token`
This was an exact duplicate of `get_room_max_stream_ordering`, so let's get rid
of it.
* newsfile
* remove python 3.6 and postgres 9.6 from github workflow
* remove python 3.6 env from tox
* newsfragment
* correct postgres version
* add py310 to tox env list
* Wrap `auth.get_user_by_req` in an opentracing span
give `get_user_by_req` its own opentracing span, since it can result in a
non-trivial number of sub-spans which it is useful to group together.
This requires a bit of reorganisation because it also sets some tags (and may
force tracing) on the servlet span.
* Emit opentracing span for encoding json responses
This can be a significant time sink.
* Rename all sync spans with a prefix
* Write an opentracing span for encoding sync response
* opentracing span to group generate_room_entries
* opentracing spans within sync.encode_response
* changelog
* Use the `trace` decorator instead of context managers
This adds some opentracing annotations to ResponseCache, to make it easier to see what's going on; in particular, it adds a link back to the initial trace which is actually doing the work of generating the response.
* remove `start_active_span_from_request`
Instead, pull out a separate function, `span_context_from_request`, to extract
the parent span, which we can then pass into `start_active_span` as
normal. This seems to be clearer all round.
* Remove redundant tags from `incoming-federation-request`
These are all wrapped up inside a parent span generated in AsyncResource, so
there's no point duplicating all the tags that are set there.
* Leave request spans open until the request completes
It may take some time for the response to be encoded into JSON, and that JSON
to be streamed back to the client, and really we want that inside the top-level
span, so let's hand responsibility for closure to the SynapseRequest.
* opentracing logs for HTTP request events
* changelog
* Disable aggregation bundling on `/sync` responses
A partial revert of #11478. This turns out to have had a significant CPU impact
on initial-sync handling. For now, let's disable it, until we find a more
efficient way of achieving this.
* Fix tests.
Co-authored-by: Patrick Cloke <patrickc@matrix.org>
A couple of safety-checks to hopefully stop people doing what I just did, and create a storage
function which only works the first time it is called (and not when it is re-run due to a database
concurrency error or similar).
* Splits the logic for parsing HTML from the resource handling code.
* Fix a circular import in the oEmbed code (which uses the HTML parsing code).
* Renames some of the HTML parsing methods to:
* Make it clear which methods are "internal" to the module.
* Clarify what the methods do.
Create a new dict helper method `simple_insert_many_values_txn`, which takes
raw row values, rather than {key=>value} dicts. This saves us a bunch of dict
munging, and makes it easier to use generators rather than creating
intermediate lists and dicts.
Revert "Sort internal changes in changelog"
Revert "Update CHANGES.md"
Revert "1.49.0rc1"
Revert "Revert "Move `glob_to_regex` and `re_word_boundary` to `matrix-python-common` (#11505) (#11527)"
Revert "Refactors in `_generate_sync_entry_for_rooms` (#11515)"
Revert "Correctly register shutdown handler for presence workers (#11518)"
Revert "Fix `ModuleApi.looping_background_call` for non-async functions (#11524)"
Revert "Fix 'delete room' admin api to work on incomplete rooms (#11523)"
Revert "Correctly ignore invites from ignored users (#11511)"
Revert "Fix the test breakage introduced by #11435 as a result of concurrent PRs (#11522)"
Revert "Stabilise support for MSC2918 refresh tokens as they have now been merged into the Matrix specification. (#11435)"
Revert "Save the OIDC session ID (sid) with the device on login (#11482)"
Revert "Add admin API to get some information about federation status (#11407)"
Revert "Include bundled aggregations in /sync and related fixes (#11478)"
Revert "Move `glob_to_regex` and `re_word_boundary` to `matrix-python-common` (#11505)"
Revert "Update backward extremity docs to make it clear that it does not indicate whether we have fetched an events' `prev_events` (#11469)"
Revert "Support configuring the lifetime of non-refreshable access tokens separately to refreshable access tokens. (#11445)"
Revert "Add type hints to `synapse/tests/rest/admin` (#11501)"
Revert "Revert accidental commits to develop."
Revert "Newsfile"
Revert "Give `tests.server.setup_test_homeserver` (nominally!) the same behaviour"
Revert "Move `tests.utils.setup_test_homeserver` to `tests.server`"
Revert "Convert one of the `setup_test_homeserver`s to `make_test_homeserver_synchronous`"
Revert "Disambiguate queries on `state_key` (#11497)"
Revert "Comments on the /sync tentacles (#11494)"
Revert "Clean up tests.storage.test_appservice (#11492)"
Revert "Clean up `tests.storage.test_main` to remove use of legacy code. (#11493)"
Revert "Clean up `tests.test_visibility` to remove legacy code. (#11495)"
Revert "Minor cleanup on recently ported doc pages (#11466)"
Revert "Add most of the missing type hints to `synapse.federation`. (#11483)"
Revert "Avoid waiting for zombie processes in `synctl stop` (#11490)"
Revert "Fix media repository failing when media store path contains symlinks (#11446)"
Revert "Add type annotations to `tests.storage.test_appservice`. (#11488)"
Revert "`scripts-dev/sign_json`: support for signing events (#11486)"
Revert "Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445)"
Revert "Port wiki pages to documentation website (#11402)"
Revert "Add a license header and comment. (#11479)"
Revert "Clean-up get_version_string (#11468)"
Revert "Link background update controller docs to summary (#11475)"
Revert "Additional type hints for config module. (#11465)"
Revert "Register the login redirect endpoint for v3. (#11451)"
Revert "Update openid.md"
Revert "Remove mention of OIDC certification from Dex (#11470)"
Revert "Add a note about huge pages to our Postgres doc (#11467)"
Revert "Don't start Synapse master process if `worker_app` is set (#11416)"
Revert "Expose worker & homeserver as entrypoints in `setup.py` (#11449)"
Revert "Bundle relations of relations into the `/relations` result. (#11284)"
Revert "Fix `LruCache` corruption bug with a `size_callback` that can return 0 (#11454)"
Revert "Eliminate a few `Any`s in `LruCache` type hints (#11453)"
Revert "Remove unnecessary `json.dumps` from `tests.rest.admin` (#11461)"
Revert "Merge branch 'master' into develop"
This reverts commit 26b5d2320f.
This reverts commit bce4220f38.
This reverts commit 966b5d0fa0.
This reverts commit 088d748f2c.
This reverts commit 14d593f72d.
This reverts commit 2a3ec6facf.
This reverts commit eccc49d755.
This reverts commit b1ecd19c5d.
This reverts commit 9c55dedc8c.
This reverts commit 2d42e586a8.
This reverts commit 2f053f3f82.
This reverts commit a15a893df8.
This reverts commit 8b4b153c9e.
This reverts commit 494ebd7347.
This reverts commit a77c369897.
This reverts commit 4eb77965cd.
This reverts commit 637df95de6.
This reverts commit e5f426cd54.
This reverts commit 8cd68b8102.
This reverts commit 6cae125e20.
This reverts commit 7be88fbf48.
This reverts commit b3fd99b74a.
This reverts commit f7ec6e7d9e.
This reverts commit 5640992d17.
This reverts commit d26808dd85.
This reverts commit f91624a595.
This reverts commit 16d39a5490.
This reverts commit 8a4c296987.
This reverts commit 49e1356ee3.
This reverts commit d2279f471b.
This reverts commit b50e39df57.
This reverts commit 858d80bf0f.
This reverts commit 435f044807.
This reverts commit f61462e1be.
This reverts commit a6f1a3abec.
This reverts commit 84dc50e160.
This reverts commit ed635d3285.
This reverts commit 7b62791e00.
This reverts commit 153194c771.
This reverts commit f44d729d4c.
This reverts commit a265fbd397.
This reverts commit b9fef1a7cd.
This reverts commit b0eb64ff7b.
This reverts commit f1795463bf.
This reverts commit 70cbb1a5e3.
This reverts commit 42bf020463.
This reverts commit 379f2650cf.
This reverts commit 7ff22d6da4.
This reverts commit 5a0b652d36.
This reverts commit 432a174bc1.
This reverts commit b14f8a1baf, reversing
changes made to e713855dca.
* Move sync_token up to the top
* Pull out _get_ignored_users
* Try to signpost the body of `_generate_sync_entry_for_rooms`
* Pull out _calculate_user_changes
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
After #10847, `looping_background_call` would print an error in the logs
every time a non-async function was called. Since the error would be
caught and ignored immediately, there were no other side effects.
Due to updates to MSC2675 this includes a few fixes:
* Include bundled aggregations for /sync.
* Do not include bundled aggregations for /initialSync and /events.
* Do not bundle aggregations for state events.
* Clarifies comments and variable names.
by calling into `make_test_homeserver_synchronous`.
The function *could* have been inlined at this point but the function is big enough
and it felt fine to leave it as is.
At least there isn't a confusing name clash anymore!
It had no users.
We have just taken the identity of a previous function but don't provide the same
behaviour, so we need to fix this in the next commit...
This mainly consists of docstrings and inline comments. There are one or two type annotations and variable renames thrown in while I was here.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* move wiki pages to synapse/docs and add a few titles where necessary
* update SUMMARY.md with added pages
* add changelog
* move incorrectly located newsfragment
* update changelog number
* snake case added files and update summary.md accordingly
* update issue/pr links
* update relative links to docs
* update changelog to indicate that we moved wiki pages to the docs and state reasoning
* requested changes to admin_faq.md
* requested changes to database_maintenance_tools.md
* requested changes to understanding_synapse_through_graphana_graphs.md
* add changelog
* fix leftover merge errata
* fix unwanted changes from merge
* use two spaces between entries
* outdent code blocks
MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030
Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Federation API endpoint:
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
"event_id": ...
"origin_server_ts": ...
}
```
Co-authored-by: Erik Johnston <erik@matrix.org>
* move wiki pages to synapse/docs and add a few titles where necessary
* update SUMMARY.md with added pages
* add changelog
* move incorrectly located newsfragment
* update changelog number
* snake case added files and update summary.md accordingly
* update issue/pr links
* update relative links to docs
* update changelog to indicate that we moved wiki pages to the docs and state reasoning
* revert unintentional change to CHANGES.md
* add link
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Update CHANGES.md
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add check to catch syanpse master process starting when workers are configured
* add test to verify that starting master process with worker config raises error
* newsfragment
* specify config.worker.worker_app in check
* update test
* report specific config option that triggered the error
Co-authored-by: reivilibre <oliverw@matrix.org>
* clarify error message
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: reivilibre <oliverw@matrix.org>
When all entries in an `LruCache` have a size of 0 according to the
provided `size_callback`, and `drop_from_cache` is called on a cache
node, the node would be unlinked from the LRU linked list but remain in
the cache dictionary. An assertion would be later be tripped due to the
inconsistency.
Avoid unintentionally calling `__len__` and use a strict `is None`
check instead when unwrapping the weak reference.
Part of https://github.com/matrix-org/synapse/issues/11300
Call stack:
- `_persist_events_and_state_updates` (added `use_negative_stream_ordering`)
- `_persist_events_txn`
- `_update_room_depths_txn` (added `update_room_forward_stream_ordering`)
- `_update_metadata_tables_txn`
- `_store_room_members_txn` (added `inhibit_local_membership_updates`)
Using keyword-only arguments (`*`) to reduce the mistakes from `backfilled` being left as a positional argument somewhere and being interpreted wrong by our new arguments.
Since e81fa92648, Synapse depends on
the use_float flag which has been introduced in ijson 3.1 and
is not available in 3.0. This is known to cause runtime errors
with send_join.
Signed-off-by: Daniel Molkentin <danimo@infra.run>
Co-authored-by: Daniel Molkentin <danimo@infra.run>
The previous fix for the ongoing event fetches counter
(8eec25a1d9) was both insufficient and
incorrect.
When the database is unreachable, `_do_fetch` never gets run and so
`_event_fetch_ongoing` is never decremented.
The previous fix also moved the `_event_fetch_ongoing` decrement outside
of the `_event_fetch_lock` which allowed race conditions to corrupt the
counter.
This change makes mypy complain if the constants are ever reassigned,
and, more usefully, makes mypy type them as `Literal`s instead of `str`s,
allowing code of the following form to pass mypy:
```py
def do_something(membership: Literal["join", "leave"], ...): ...
do_something(Membership.JOIN, ...)
```
* remove background update code related to deprecated config flag
* changelog entry
* update changelog
* Delete 11394.removal
Duplicate, wrong number
* add no-op background update and change newfragment so it will be consolidated with associated work
* remove unused code
* Remove code associated with deprecated flag from legacy docker dynamic config file
Co-authored-by: reivilibre <oliverw@matrix.org>
Synapse 1.47.1 (2021-11-23)
===========================
This release fixes a security issue in the media store, affecting all prior releases of Synapse. Server administrators are encouraged to update Synapse as soon as possible. We are not aware of these vulnerabilities being exploited in the wild.
Server administrators who are unable to update Synapse may use the workarounds described in the linked GitHub Security Advisory below.
Security advisory
-----------------
The following issue is fixed in 1.47.1.
- **[GHSA-3hfw-x7gx-437c](https://github.com/matrix-org/synapse/security/advisories/GHSA-3hfw-x7gx-437c) / [CVE-2021-41281](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-41281): Path traversal when downloading remote media.**
Synapse instances with the media repository enabled can be tricked into downloading a file from a remote server into an arbitrary directory, potentially outside the media store directory.
The last two directories and file name of the path are chosen randomly by Synapse and cannot be controlled by an attacker, which limits the impact.
Homeservers with the media repository disabled are unaffected. Homeservers configured with a federation whitelist are also unaffected.
Fixed by [91f2bd090](https://github.com/matrix-org/synapse/commit/91f2bd090).
Instead of only known relation types. This also reworks the background
update for thread relations to crawl events and search for any relation
type, not just threaded relations.
If `room_list_publication_rules` was configured with a rule with a
non-wildcard alias and a room was created with an alias then an
internal server error would have been thrown.
This fixes the error and properly applies the publication rules
during room creation.
Fixes a bug introduced in #11129: objects signed by the local server, but with
keys other than the current one, could not be successfully verified.
We need to check the key id in the signature, and track down the right key.
* remove code legacy code related to deprecated config flag "trust_identity_server_for_password_resets" from synapse/config/emailconfig.py
* remove legacy code supporting depreciated config flag "trust_identity_server_for_password_resets" from synapse/config/registration.py
* remove legacy code supporting depreciated config flag "trust_identity_server_for_password_resets" from synapse/handlers/identity.py
* add tests to ensure config error is thrown and synapse refuses to start when depreciated config flag is found
* add changelog
* slightly change behavior to only check for deprecated flag if set to 'true'
* Update changelog.d/11333.misc
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: reivilibre <oliverw@matrix.org>
Adds validation to the Client-Server API to ensure that
the potential thread head does not relate to another event
already. This results in not allowing a thread to "fork" into
other threads.
If the target event is unknown for some reason (maybe it isn't
visible to your homeserver), but is the target of other events
it is assumed that the thread can be created from it. Otherwise,
it is rejected as an unknown event.
Otherwise I get this beautiful stacktrace:
```
python3 -m synapse.app.homeserver --config-path /etc/matrix/homeserver.yaml
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/synapse/synapse/app/homeserver.py", line 455, in <module>
main()
File "/root/synapse/synapse/app/homeserver.py", line 445, in main
hs = setup(sys.argv[1:])
File "/root/synapse/synapse/app/homeserver.py", line 345, in setup
config = HomeServerConfig.load_or_generate_config(
File "/root/synapse/synapse/config/_base.py", line 671, in load_or_generate_config
config_dict = read_config_files(config_files)
File "/root/synapse/synapse/config/_base.py", line 717, in read_config_files
yaml_config = yaml.safe_load(file_stream)
File "/root/synapse/env/lib/python3.8/site-packages/yaml/__init__.py", line 125, in safe_load
return load(stream, SafeLoader)
File "/root/synapse/env/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load
return loader.get_single_data()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/root/synapse/env/lib/python3.8/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/root/synapse/env/lib/python3.8/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/root/synapse/env/lib/python3.8/site-packages/yaml/composer.py", line 82, in compose_node
node = self.compose_sequence_node(anchor)
File "/root/synapse/env/lib/python3.8/site-packages/yaml/composer.py", line 110, in compose_sequence_node
while not self.check_event(SequenceEndEvent):
File "/root/synapse/env/lib/python3.8/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/parser.py", line 379, in parse_block_sequence_first_entry
return self.parse_block_sequence_entry()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/parser.py", line 384, in parse_block_sequence_entry
if not self.check_token(BlockEntryToken, BlockEndToken):
File "/root/synapse/env/lib/python3.8/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/scanner.py", line 227, in fetch_more_tokens
return self.fetch_alias()
File "/root/synapse/env/lib/python3.8/site-packages/yaml/scanner.py", line 610, in fetch_alias
self.tokens.append(self.scan_anchor(AliasToken))
File "/root/synapse/env/lib/python3.8/site-packages/yaml/scanner.py", line 922, in scan_anchor
raise ScannerError("while scanning an %s" % name, start_mark,
yaml.scanner.ScannerError: while scanning an alias
in "/etc/matrix/homeserver.yaml", line 614, column 5
expected alphabetic or numeric character, but found '.'
in "/etc/matrix/homeserver.yaml", line 614, column 6
```
Signed-off-by: Nicolai Søborg <git@xn--sb-lka.org>
It already seems to pass mypy. I wonder what changed, given that it was
on the exclusion list. So this commit consists of me ensuring
`--disallow-untyped-defs` passes and a minor fixup to a function that
returned either `True` or `None`.
* Add support for the stable version of MSC2778
Signed-off-by: Tulir Asokan <tulir@maunium.net>
* Expect m.login.application_service in login and password provider tests
Signed-off-by: Tulir Asokan <tulir@maunium.net>
* remove unused tables room_stats_historical and user_stats_historical
* update changelog number
* Bump schema compat version comment
* make linter happy
* Update comment to give more info
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: reivilibre <oliverw@matrix.org>
* Prefer `HTTPStatus` over plain `int`
This is an Opinion that no-one has seemed to object to yet.
* `--disallow-untyped-defs` for `tests.rest.client.test_directory`
* Improve synapse's annotations for deleting aliases
* Test case for deleting a room alias
* Changelog
* change display names/avatar URLS to None if they contain null bytes
* add changelog
* add POC test, requested changes
* add a saner test and remove old one
* update test to verify that display name has been changed to None
* make test less fragile
* Make DataStore inherit from EventForwardExtremitiesStore before CacheInvalidationWorkerStore
the former implicitly inherits from the latter, so they should be
ordered like this when used.
* Annotate HomeserverTestCase.servlets
* Correct annotation of federation_auth_origin
* Use AnyStr custom_headers instead of a Union
This allows (str, str) and (bytes, bytes).
This disallows (str, bytes) and (bytes, str)
* DomainSpecificString.SIGIL is a ClassVar
Synapse 1.47.0rc2 (2021-11-10)
==============================
This fixes an issue with publishing the Debian packages for 1.47.0rc1.
It is otherwise identical to 1.47.0rc1.
* Make lock better handle process being killed
If the process gets killed and restarted (so that it didn't have a
chance to drop its locks gracefully) then there may still be locks in
the DB that are for the same instance that haven't yet timed out but are
safe to delete.
We handle this case by a) checking if the current instance already has
taken out the lock, and b) if not then ignoring locks that are for the
same instance.
* Periodically check for old staged events
This is to protect against other instances dying and their locks timing
out.
* Remove unused Vagrant scripts
* Change package Architecture to any
* Preinstall the wheel package when building venvs.
Addresses the following warnings during Debian builds:
Using legacy 'setup.py install' for jaeger-client, since package 'wheel' is not installed.
Using legacy 'setup.py install' for matrix-synapse-ldap3, since package 'wheel' is not installed.
Using legacy 'setup.py install' for opentracing, since package 'wheel' is not installed.
Using legacy 'setup.py install' for psycopg2, since package 'wheel' is not installed.
Using legacy 'setup.py install' for systemd-python, since package 'wheel' is not installed.
Using legacy 'setup.py install' for pympler, since package 'wheel' is not installed.
Using legacy 'setup.py install' for threadloop, since package 'wheel' is not installed.
Using legacy 'setup.py install' for thrift, since package 'wheel' is not installed.
* Allow /etc/default/matrix-synapse to be missing
Per the systemd.exec manpage, prefixing an EnvironmentFile with "-":
> indicates that if the file does not exist, it will not be read and no
> error or warning message is logged.
Signed-off-by: Dan Callahan <danc@element.io>
When an event fetcher aborts due to an exception, `_event_fetch_ongoing`
must be decremented, otherwise the event fetcher would never be
replaced. If enough event fetchers were to fail, no more events would be
fetched and requests would get stuck waiting for events.
* add code to handle missing content-type header and a test to verify that it works
* add handling for missing content-type in the /upload endpoint as well
* slightly refactor test code to put private method in approriate place
* handle possible null value for content-type when pulling from the local db
* add changelog
* refactor test and add code to handle missing content-type in cached remote media
* requested changes
* Update changelog.d/11200.bugfix
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Docker image: avoid changing user during `generate`
The intention was always that the config files get written as the initial user
(normally root) - only the data directory needs to be writable by Synapse. This
got changed in https://github.com/matrix-org/synapse/pull/5970, but that seems
to have been a mistake.
* Avoid changing user if no explicit UID is given
* changelog
* Labeled a lot more code blocks with the appropriate type
* Fixed a couple of minor typos (missing/extraneous commas)
Signed-off-by: Sumner Evans <me@sumnerevans.com>
* add tests for fetching key locally
* add logic to check if origin server is same as host and fetch verify key locally rather than over federation
* add changelog
* slight refactor, add docstring, change changelog entry
* Make changelog entry one line
* remove verify_json_locally and push locality check to process_request, add function process_request_locally
* remove leftover code reference
* refactor to add common call to 'verify_json and associated handling code
* add type hint to process_json
* add some docstrings + very slight refactor
* Teach MyPy that the sentinel context is False
This means that if `ctx: LoggingContextOrSentinel`
then `bool(ctx)` narrows us to `ctx:LoggingContext`, which is a really
neat find!
* Annotate RequestMetrics
- Raise errors for sentry if we use the sentinel context
- Ensure we don't raise an error and carry on, but not recording stats
- Include stack trace in the error case to lower Sean's blood pressure
* Make mypy pass for synapse.http.request_metrics
* Make synapse.http.connectproxyclient pass mypy
Co-authored-by: reivilibre <oliverw@matrix.org>
Users admin API can now also modify user
type in addition to allowing it to be
set on user creation.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
This is the final piece of the jigsaw for #9595. As with other changes before this one (eg #10771), we need to make sure that we auth the auth events in the right order, and actually check that their predecessors haven't been rejected.
To do this I've reused the existing code we use when persisting outliers elsewhere.
I've removed the code for attempting to fetch missing auth_events - the events should have been present in the send_join response, so the likely reason they are missing is that we couldn't verify them, so requesting them again is unlikely to help. Instead, we simply drop any state which relies on those auth events, as we do at a backwards-extremity. See also matrix-org/complement#216 for a test for this.
Expressions don't expand in single quotes, use double quotes for that.
https://github.com/koalaman/shellcheck/wiki/SC2016
This specifically warned about the '$aregis...' part of the sed script.
Which is a relatively obscure use of sed.
Splitting this into two commands makes its intent more obvious and
avoids contravening Shellcheck's lints.
Signed-off-by: Dan Callahan <danc@element.io>
SC2089: Quotes/backslashes will be treated literally. Use an array.
https://github.com/koalaman/shellcheck/wiki/SC2089
SC2090: Quotes/backslashes in this variable will not be respected.
https://github.com/koalaman/shellcheck/wiki/SC2090
Putting literal JSON in a variable mistakenly triggers these warnings.
Instead of adding ignore directives, this can be avoided by inlining the
JSON data into the curl invocation.
Since the variable is only used in this one location, inlining is fine.
Signed-off-by: Dan Callahan <danc@element.io>
`synapse.config.__main__` has the possibility to read a config item. This can be used to conveniently also validate the config is valid before trying to start Synapse.
The "read" command broke in https://github.com/matrix-org/synapse/pull/10916 as it now requires passing in "server.server_name" for example.
Also made the read command optional so one can just call this with just the confirm file reference and get a "Config parses OK" if things are ok.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
* We only need to fetch users in private rooms
* Filter out `user_id` at the top
* Discard excluded users in the top loop
We weren't doing this in the "First, if they're our user" branch so this
is a bugfix.
* The caller must check that `user_id` is included
This is in the docstring. There are two call sites:
- one in `_handle_room_publicity_change`, which explicitly checks before calling;
- and another in `_handle_room_membership_event`, which returns early if
the user is excluded.
So this change is safe.
* Test joining a private room with an excluded user
* Tweak an existing test
* Changelog
* test docstring
* lint
If we find ourselves dealing with rejected events, we proably want to know
about it. Let's include it in the stringification of the event so that it gets
logged.
Currently, when we receive an event whose auth_events differ from those we expect, we state-resolve between the two state sets, and check that the event passes auth based on the resolved state.
This means that it's possible for us to accept events which don't pass auth at their declared auth_events (or where the auth events themselves were rejected), leading to problems down the line like #10083.
This change means we will:
* ignore any events where we cannot find the auth events
* reject any events whose auth events were rejected
* reject any events which do not pass auth at their declared auth_events.
Together with a whole raft of previous work, this is a partial fix to #9595.
Fixes#6643.
Based on #11009.
This fixes a bug where we would accept an event whose `auth_events` include
rejected events, if the rejected event was shadowed by another `auth_event`
with same `(type, state_key)`.
The approach is to pass a list of auth events into
`check_auth_rules_for_event` instead of a dict, which of course means updating
the call sites.
This is an extension of #10956.
Instead of triggering `__exit__` manually on the replication handler's
logging context, use it as a context manager so that there is an
`__enter__` call to balance the `__exit__`.
Found while working on the Gitter backfill script and noticed
it only happened after we sent 7 batches, https://gitlab.com/gitterHQ/webapp/-/merge_requests/2229#note_665906390
When there are more than 5 backward extremities for a given depth,
backfill will throw an error because we sliced the extremity list
to 5 but then try to iterate over the full list. This causes
us to look for state that we never fetched and we get a `KeyError`.
Before when calling `/messages` when there are more than 5 backward extremities:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 258, in _async_render_wrapper
callback_return = await self._async_render(request)
File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 446, in _async_render
callback_return = await raw_callback_return
File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 580, in on_GET
msgs = await self.pagination_handler.get_messages(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/pagination.py", line 396, in get_messages
await self.hs.get_federation_handler().maybe_backfill(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 133, in maybe_backfill
return await self._maybe_backfill_inner(room_id, current_depth, limit)
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 386, in _maybe_backfill_inner
likely_extremeties_domains = get_domains_from_state(states[e_id])
KeyError: '$zpFflMEBtZdgcMQWTakaVItTLMjLFdKcRWUPHbbSZJl'
```
Resolve and share `state_groups` for all historical events in batch. This also helps for showing the appropriate avatar/displayname in Element and will work whenever `/messages` has one of the historical messages as the first message in the batch.
This does have the flaw where if you just insert a single historical event somewhere, it probably won't resolve the state correctly from `/messages` or `/context` since it will grab a non historical event above or below with resolved state which never included the historical state back then. For the same reasions, this also does not work in Element between the transition from actual messages to historical messages. In the Gitter case, this isn't really a problem since all of the historical messages are in one big lump at the beginning of the room.
For a future iteration, might be good to look at `/messages` and `/context` to additionally add the `state` for any historical messages in that batch.
---
How are the `state_groups` shared? To illustrate the `state_group` sharing, see this example:
**Before** (new `state_group` for every event 😬, very inefficient):
```
# Tests from https://github.com/matrix-org/complement/pull/206
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestBackfillingHistory/parallel/should_resolve_member_state_events_for_historical_events
create_new_client_event m.room.member event=$_JXfwUDIWS6xKGG4SmZXjSFrizhARM7QblhATVWWUcA state_group=None
create_new_client_event org.matrix.msc2716.insertion event=$1ZBfmBKEjg94d-vGYymKrVYeghwBOuGJ3wubU1-I9y0 state_group=9
create_new_client_event org.matrix.msc2716.insertion event=$Mq2JvRetTyclPuozRI682SAjYp3GqRuPc8_cH5-ezPY state_group=10
create_new_client_event m.room.message event=$MfmY4rBQkxrIp8jVwVMTJ4PKnxSigpG9E2cn7S0AtTo state_group=11
create_new_client_event m.room.message event=$uYOv6V8wiF7xHwOMt-60d1AoOIbqLgrDLz6ZIQDdWUI state_group=12
create_new_client_event m.room.message event=$PAbkJRMxb0bX4A6av463faiAhxkE3FEObM1xB4D0UG4 state_group=13
create_new_client_event org.matrix.msc2716.batch event=$Oy_S7AWN7rJQe_MYwGPEy6RtbYklrI-tAhmfiLrCaKI state_group=14
```
**After** (all events in batch sharing `state_group=10`) (the base insertion event has `state_group=8` which matches the `prev_event` we're inserting next to):
```
# Tests from https://github.com/matrix-org/complement/pull/206
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestBackfillingHistory/parallel/should_resolve_member_state_events_for_historical_events
create_new_client_event m.room.member event=$PWomJ8PwENYEYuVNoG30gqtybuQQSZ55eldBUSs0i0U state_group=None
create_new_client_event org.matrix.msc2716.insertion event=$e_mCU7Eah9ABF6nQU7lu4E1RxIWccNF05AKaTT5m3lw state_group=9
create_new_client_event org.matrix.msc2716.insertion event=$ui7A3_GdXIcJq0C8GpyrF8X7B3DTjMd_WGCjogax7xU state_group=10
create_new_client_event m.room.message event=$EnTIM5rEGVezQJiYl62uFBl6kJ7B-sMxWqe2D_4FX1I state_group=10
create_new_client_event m.room.message event=$LGx5jGONnBPuNhAuZqHeEoXChd9ryVkuTZatGisOPjk state_group=10
create_new_client_event m.room.message event=$wW0zwoN50lbLu1KoKbybVMxLbKUj7GV_olozIc5i3M0 state_group=10
create_new_client_event org.matrix.msc2716.batch event=$5ZB6dtzqFBCEuMRgpkU201Qhx3WtXZGTz_YgldL6JrQ state_group=10
```
* Pull out `_handle_room_membership_event`
* Discard excluded users early
* Rearrange logic so the change is membership is effectively switched over. See PR for rationale.
The following scenarios would halt the user directory updater:
- user joins room
- user leaves room
- user present in room which switches from private to public, or vice versa.
for two classes of users:
- appservice senders
- users missing from the user table.
If this happened, the user directory would be stuck, unable to make forward progress.
Exclude both cases from the user directory, so that we ignore them.
Co-authored-by: Eric Eastwood <erice@element.io>
Co-authored-by: reivilibre <oliverw@matrix.org>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
The race allowed the current position to advance too far when stream IDs
are still being persisted.
This happened when it received a new stream ID from a remote write
between a new stream ID being allocated and it being added to the set of
unpersisted stream IDs.
Fixes#9424.
This reverts #11019 and structures the code a bit more like it was before #10985.
The global cache state must be reset before running the tests since other test
cases might have configured caching (and thus touched the global state).
Make `get_last_client_by_ip` return the same dictionary structure
regardless of whether the data has been persisted to the database.
This change will allow slightly cleaner type hints to be applied later
on.
This commit fixes two bugs to do with decorators not instrumenting
`ReplicationEndpoint`'s `send_request` correctly. There are two
decorators on `send_request`: Prometheus' `Gauge.track_inprogress()`
and Synapse's `opentracing.trace`.
`Gauge.track_inprogress()` does not have any support for async
functions when used as a decorator. Since async functions behave like
regular functions that return coroutines, only the creation of the
coroutine was covered by the metric and none of the actual body of
`send_request`.
`Gauge.track_inprogress()` returns a regular, non-async function
wrapping `send_request`, which is the source of the next bug.
The `opentracing.trace` decorator would normally handle async functions
correctly, but since the wrapped `send_request` is a non-async function,
the decorator ends up suffering from the same issue as
`Gauge.track_inprogress()`: the opentracing span only measures the
creation of the coroutine and none of the actual function body.
Using `Gauge.track_inprogress()` as a context manager instead of a
decorator resolves both bugs.
Updating mypy past version 0.9 means that third-party stubs are no-longer distributed with typeshed. See http://mypy-lang.blogspot.com/2021/06/mypy-0900-released.html for details.
We therefore pull in stub packages in setup.py
Additionally, some modules that we were previously ignoring import failures for now have stubs. So let's use them.
The rest of this change consists of fixups to make the newer mypy + stubs pass CI.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This splits apart `handle_new_user` into a function which adds an entry to the `user_directory` and a function which updates the room sharing tables. I plan to continue doing more of this kind of refactoring to clarify the implementation.
The shared ratelimit function was replaced with a dedicated
RequestRatelimiter class (accessible from the HomeServer
object).
Other properties were copied to each sub-class that inherited
from BaseHandler.
Use `PreserveLoggingContext()` to ensure that logging contexts are not
lost when exiting a read/write lock.
When exiting a read/write lock, callbacks on a `Deferred` are triggered
as a signal to any waiting coroutines. Any waiting coroutine that
becomes runnable is likely to follow the Synapse logging context rules
and will restore its own logging context, then either run to completion
or await another `Deferred`, resetting the logging context in the
process.
This removes the magic allowing accessing configurable
variables directly from the config object. It is now required
that a specific configuration class is used (e.g. `config.foo`
must be replaced with `config.server.foo`).
Fix a long-standing bug where a batch of user directory changes would be
silently dropped if the server left a room early in the batch.
* Pull out `wait_for_background_update` in tests
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
The following modules now pass `disallow_untyped_defs`:
* synapse.util.caches.cached_call
* synapse.util.caches.lrucache
* synapse.util.caches.response_cache
* synapse.util.caches.stream_change_cache
* synapse.util.caches.ttlcache pass
* synapse.util.daemonize
* synapse.util.patch_inline_callbacks pass `no-untyped-defs`
* synapse.util.versionstring
Additional typing in synapse.util.metrics. Didn't get this to pass `no-untyped-defs`, think I'll need to watch #10847
There are two steps to rebuilding the user directory:
1. a scan over rooms, followed by
2. a scan over local users.
The former reads avatars and display names from the `room_memberships`
table and therefore contains potentially private avatars and
display names. The latter reads from the the `profiles` table which only
contains public data; moreover it will overwrite any private profiles
that the rooms scan may have written to the user directory. This means
that the rebuild could leak private user while the rebuild was in
progress, only to later cover up the leaks once the rebuild had completed.
This change skips over local users when writing user_directory rows
when scanning rooms. Doing so means that it'll take longer for a rebuild
to make local users searchable, which is unfortunate. I think a future
PR can improve this by swapping the order of the two steps above. (And
indeed there's more to do here, e.g. copying from `profiles` without
going via Python.)
Small tidy-ups while I'm here:
* Remove duplicated code from test_initial. This was meant to be pulled into `purge_and_rebuild_user_dir`.
* Move `is_public` before updating sharing tables. No functional change; it's still before the first read of `is_public`.
* Don't bother creating a set from dict keys. Slightly nicer and makes the code simpler.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
We correctly allowed using the MSC2716 batch endpoint for
the room creator in existing room versions but accidentally didn't track
the events because of a logic flaw.
This prevented you from connecting subsequent chunks together because it would
throw the unknown batch ID error.
We only want to process MSC2716 events when:
- The room version supports MSC2716
- Any room where the homeserver has the `msc2716_enabled` experimental feature enabled and the event is from the room creator
`_check_event_auth` is only called in two places, and only one of those sets
`send_on_behalf_of`. Warming the cache isn't really part of auth anyway, so
moving it out makes a lot more sense.
There's little point in doing a fancy state reconciliation dance if the event
itself is invalid.
Likewise, there's no point checking it again in `_check_for_soft_fail`.
* add test
* add function to remove user from monthly active table in deactivate code
* add function to remove user from monthly active table
* add changelog entry
* update changelog number
* requested changes
* update docstring on new function
* fix lint error
* Update synapse/storage/databases/main/monthly_active_users.py
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Synapse 1.44.0rc3 (2021-10-04)
==============================
Bugfixes
--------
- Fix a bug introduced in Synapse v1.40.0 where changing a user's display name or avatar in a restricted room would cause an authentication error. ([\#10933](https://github.com/matrix-org/synapse/issues/10933))
- Fix `/admin/whois/{user_id}` endpoint, which was broken in v1.44.0rc1. ([\#10968](https://github.com/matrix-org/synapse/issues/10968))
* Introduce `should_include_local_users_in_dir`
We exclude three kinds of local users from the user_directory tables. At
present we don't consistently exclude all three in the same places. This
commit introduces a new function to gather those exclusion conditions
together. Because we have to handle local and remote users in different
ways, I've made that function only consider the case of remote users.
It's the caller's responsibility to make the local versus remote
distinction clear and correct.
A test fixup is required. The test now hits a path which makes db
queries against the users table. The expected rows were missing, because
we were using a dummy user that hadn't actually been registered.
We also add new test cases to covert the exclusion logic.
----
By my reading this makes these changes:
* When an app service user registers or changes their profile, they will
_not_ be added to the user directory. (Previously only support and
deactivated users were excluded). This is consistent with the logic that
rebuilds the user directory. See also [the discussion
here](https://github.com/matrix-org/synapse/pull/10914#discussion_r716859548).
* When rebuilding the directory, exclude support and disabled users from
room sharing tables. Previously only appservice users were excluded.
* Exclude all three categories of local users when rebuilding the
directory. Previously `_populate_user_directory_process_users` didn't do
any exclusion.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
This fixes a "Event not signed by authorising server" error when
transition room member from join -> join, e.g. when updating a
display name or avatar URL for restricted rooms.
This fixes a "Event not signed by authorising server" error when
transition room member from join -> join, e.g. when updating a
display name or avatar URL for restricted rooms.
This follows a correction made in twisted/twisted#1664 and should fix our Twisted Trial CI job.
Until that change is in a twisted release, we'll have to ignore the type
of the `host` argument. I've raised #10899 to remind us to review the
issue in a few months' time.
Fix event context for outlier causing failures in all of the MSC2716
Complement tests.
The `EventContext.for_outlier` refactor happened in
https://github.com/matrix-org/synapse/pull/10883
and this spot was left out.
* Pull out GetUserDirectoryTables helper
* Don't rebuild the dir in tests that don't need it
In #10796 I changed registering a user to add directory entries under.
This means we don't have to force a directory regbuild in to tests of
the user directory search.
* Move test_initial to tests/storage
* Add type hints to both test_user_directory files
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Broadly, the existing `event_auth.check` function has two parts:
* a validation section: checks that the event isn't too big, that it has the rught signatures, etc.
This bit is independent of the rest of the state in the room, and so need only be done once
for each event.
* an auth section: ensures that the event is allowed, given the rest of the state in the room.
This gets done multiple times, against various sets of room state, because it forms part of
the state res algorithm.
Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think
that makes everything hard to follow. Instead, we split the function in two and call each part
separately where it is needed.
Before Synapse 1.31 (#9411), we relied on `outlier` being stored in the
`internal_metadata` column. We can now assume nobody will roll back their
deployment that far and drop the legacy support.
* Inline `_check_event_auth` for outliers
When we are persisting an outlier, most of `_check_event_auth` is redundant:
* `_update_auth_events_and_context_for_auth` does nothing, because the
`input_auth_events` are (now) exactly the event's auth_events,
which means that `missing_auth` is empty.
* we don't care about soft-fail, kicking guest users or `send_on_behalf_of`
for outliers
... so the only thing that matters is the auth itself, so let's just do that.
* `_auth_and_persist_fetched_events_inner`: de-async `prep`
`prep` no longer calls any `async` methods, so let's make it synchronous.
* Simplify `_check_event_auth`
We no longer need to support outliers here, which makes things rather simpler.
* changelog
* lint
Currently we use `JsonEncoder.iterencode` to write JSON responses, which ensures that we don't block the main reactor thread when encoding huge objects. The downside to this is that `iterencode` falls back to using a pure Python encoder that is *much* less efficient and can easily burn a lot of CPU for huge responses. To fix this, while still ensuring we don't block the reactor loop, we encode the JSON on a threadpool using the standard `JsonEncoder.encode` functions, which is backed by a C library.
Doing so, however, requires `respond_with_json` to have access to the reactor, which it previously didn't. There are two ways of doing this:
1. threading through the reactor object, which is a bit fiddly as e.g. `DirectServeJsonResource` doesn't currently take a reactor, but is exposed to modules and so is a PITA to change; or
2. expose the reactor in `SynapseRequest`, which requires updating a bunch of servlet types.
I went with the latter as that is just a mechanical change, and I think makes sense as a request already has a reactor associated with it (via its http channel).
This is in the context of creating new module callbacks that modules in https://github.com/matrix-org/synapse-dinsic can use, in an effort to reconcile the spam checker API in synapse-dinsic with the one in mainline.
This adds a callback that's fairly similar to user_may_create_room except it also allows processing based on the invites sent at room creation.
- Use sytest:bionic. Sytest:latest is two years old (do we want
CI to push out latest at all?) and comes with Python 3.5, which we
explictly no longer support. The script now runs under PostgreSQL 10
as a result.
- Advertise script in the docs
- Move pg testing script to scripts-dev directory
- Write to host as the script's exector, not root
A few changes to make it speedier to re-run the tests:
- Create blank DB in the container, not the script, so we don't have to
`initdb` each time
- Use a named volume to persist the tox environment, so we don't have to
fetch and install a bunch of packages from PyPI each time
Co-authored-by: reivilibre <olivier@librepush.net>
* Factor more stuff out of `_get_events_and_persist`
It turns out that the event-sorting algorithm in `_get_events_and_persist` is
also useful in other circumstances. Here we move the current
`_auth_and_persist_fetched_events` to `_auth_and_persist_fetched_events_inner`,
and then factor the sorting part out to `_auth_and_persist_fetched_events`.
* `_get_remote_auth_chain_for_event`: remove redundant `outlier` assignment
`get_event_auth` returns events with the outlier flag already set, so this is
redundant (though we need to update a test where `get_event_auth` is mocked).
* `_get_remote_auth_chain_for_event`: move existing-event tests earlier
Move a couple of tests outside the loop. This is a bit inefficient for now, but
a future commit will make it better. It should be functionally identical.
* `_get_remote_auth_chain_for_event`: use `_auth_and_persist_fetched_events`
We can use the same codepath for persisting the events fetched as part of an
auth chain as for those fetched individually by `_get_events_and_persist` for
building the state at a backwards extremity.
* `_get_remote_auth_chain_for_event`: use a dict for efficiency
`_auth_and_persist_fetched_events` sorts the events itself, so we no longer
need to care about maintaining the ordering from `get_event_auth` (and no
longer need to sort by depth in `get_event_auth`).
That means that we can use a map, making it easier to filter out events we
already have, etc.
* changelog
* `_auth_and_persist_fetched_events`: improve docstring
Combine the two loops over the list of events, and hence get rid of
`_NewEventInfo`. Also pass the event back alongside the context, so that it's
easier to process the result.
If the MAU count had been reached, Synapse incorrectly blocked appservice users even though they've been explicitly configured not to be tracked (the default). This was due to bypassing the relevant if as it was chained behind another earlier hit if as an elif.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
* Improve typing in user_directory files
This makes the user_directory.py in storage pass most of mypy's
checks (including `no-untyped-defs`). Unfortunately that file is in the
tangled web of Store class inheritance so doesn't pass mypy at the moment.
The handlers directory has already been mypyed.
Co-authored-by: reivilibre <olivier@librepush.net>
This change adds a check for row existence before accessing row element, this should fix issue #10669
Signed-off-by: Vasya Boytsov vasiliy.boytsov@phystech.edu
* Reload auth events from db after fetching and persisting
In `_update_auth_events_and_context_for_auth`, when we fetch the remote auth
tree and persist the returned events: load the missing events from the database
rather than using the copies we got from the remote server.
This is mostly in preparation for additional refactors, but does have an
advantage in that if we later get around to checking the rejected status, we'll
be able to make use of it.
* Factor out `_get_remote_auth_chain_for_event` from `_update_auth_events_and_context_for_auth`
* changelog
This avoids the overhead of searching through the various
configuration classes by directly referencing the class that
the attributes are in.
It also improves type hints since mypy can now resolve the
types of the configuration variables.
Constructing an EventContext for an outlier is actually really simple, and
there's no sense in going via an `async` method in the `StateHandler`.
This also means that we can resolve a bunch of FIXMEs.
* add test to check if null code points are being inserted
* add logic to detect and replace null code points before insertion into db
* lints
* add license to test
* change approach to null substitution
* add type hint for SearchEntry
* Add changelog entry
Signed-off-by: H.Shay <shaysquared@gmail.com>
* updated changelog
* update chanelog message
* remove duplicate changelog
* Update synapse/storage/databases/main/events.py remove extra space
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* rename and move test file, update tests, delete old test file
* fix typo in comments
* update _find_highlights_in_postgres to replace null byte with space
* replace null byte in sqlite search insertion
* beef up and reorganize test for this pr
* update changelog
* add type hints and update docstring
* check db engine directly vs using env variable
* refactor tests to be less repetetive
* move rplace logic into seperate function
* requested changes
* Fix typo.
* Update synapse/storage/databases/main/search.py
Co-authored-by: reivilibre <olivier@librepush.net>
* Update changelog.d/10820.misc
Co-authored-by: Aaron Raimist <aaron@raim.ist>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: reivilibre <olivier@librepush.net>
Co-authored-by: Aaron Raimist <aaron@raim.ist>
The invalidation was missing in `_claim_e2e_one_time_key_returning`,
which is used on SQLite 3.24+ and Postgres. This could break e2ee if
nothing else happened to invalidate the caches before the keys ran out.
Signed-off-by: Tulir Asokan <tulir@beeper.com>
* Improved titles (fall back to the author name if there's not title) and include the site name.
* Handle photo/video payloads.
* Include the original URL in the Open Graph response.
* Fix the expiration time (by properly converting from seconds to milliseconds).
The deprecated /initialSync endpoint maintains a cache of responses,
using parameter values as part of the cache key. When a `from` or `to`
parameter is specified, it gets converted into a `StreamToken`, which
contains a `RoomStreamToken` and forms part of the cache key.
`RoomStreamToken`s need to be made hashable for this to work.
I meant to do this before, in #10591, but because I'm stupid I forgot to do it
for V2 and V3 events.
I've factored the common code out to `EventBase` to save us having two copies
of it.
This means that for `FrozenEvent` we replace `self.get("event_id", None)` with
`self.event_id`, which I think is safe. `get()` is an alias for
`self._dict.get()`, whereas `event_id()` is an `@property` method which looks
up `self._event_id`, which is populated during construction from the same
dict. We don't seem to rely on the fallback, because if the `event_id` key is
absent from the dict then construction of the `EventBase` object will
fail.
Long story short, the only way this could change behaviour is if
`event_dict["event_id"]` is changed *after* the `EventBase` object is
constructed without updating the `_event_id` field, or vice versa - either of
which would be very problematic anyway and the behavior of `str(event)` is the
least of our worries.
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).
This also makes more minor refactorings:
* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
previews.
* Fixes tests to preview an oEmbed image.
* add tests for checking if room search works with non-ascii char
* change encoding on parse_string to UTF-8
* lints
* properly encode search term
* lints
* add changelog file
* update changelog number
* set changelog entry filetype to .bugfix
* Revert "set changelog entry filetype to .bugfix"
This reverts commit be8e5a314251438ec4ec7dbc59ba32162c93e550.
* update changelog message and file type
* change parse_string default encoding back to ascii and update room search admin api calll to parse string
* refactor tests
* Update tests/rest/admin/test_room.py
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
It's a simplification, but one that'll help make the user directory logic easier
to follow with the other changes upcoming. It's not strictly required for those
changes, but this will help simplify the resulting logic that listens for
`m.room.member` events and generally make the logic easier to follow.
This means the config option `search_all_users` ends up controlling the
search query only, and not the data we store. The cost of doing so is an
extra row in the `user_directory` and `user_directory_search` tables for
each local user which
- belongs to no public rooms
- belongs to no private rooms of size ≥ 2
I think the cost of this will be marginal (since they'll already have entries
in `users` and `profiles` anyway).
As a small upside, a homeserver whose directory was built with this
change can toggle `search_all_users` without having to rebuild their
directory.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Adds missing type hints to methods in the synapse.handlers
module and requires all methods to have type hints there.
This also removes the unused construct_auth_difference method
from the FederationHandler.
We added a bunch of spans in #10704, but this ended up adding a lot of
redundant spans for rooms where nothing changed, so instead we only
start the span if there might be something interesting going on.
In `MatrixFederationHttpClient._send_request()`, we make a HTTP request
using an `Agent`, wrap that request in a timeout and await the resulting
`Deferred`. On its own, the `Agent` performing the HTTP request
correctly stashes and restores the logging context while waiting.
The addition of the timeout introduces a path where the logging context
is not restored when execution resumes.
To address this, we wrap the timeout `Deferred` in a
`make_deferred_yieldable()` to stash the logging context and restore it
on completion of the `await`. However this is not sufficient, since by
the time we construct the timeout `Deferred`, the `Agent` has already
stashed and cleared the logging context when using
`make_deferred_yieldable()` to produce its `Deferred` for the request.
Hence, we wrap the `Agent` request in a `run_in_background()` to "fork"
and preserve the logging context so that we can stash and restore it
when `await`ing the timeout `Deferred`.
This approach is similar to the one used with `defer.gatherResults`.
Note that the code is still not fully correct. When a timeout occurs,
the request remains running in the background (existing behavior which
is nothing to do with the new call to `run_in_background`) and may
re-start the logging context after it has finished.
I had one of these error messages yesterday and assumed it was an
invalid auth token (because that was an HTTP query parameter in the
test) I was working on. In fact, it was an invalid next batch token for
syncing.
We've already batched up the events previously, and assume in other
places in the events.py file that we have. Removing this makes it easier
to adjust the batch sizes in one place.
Hint to clients via the room capabilities API (MSC3244) that
room version 9 should be preferred for creating a room with
restricted join rules (instead of room version 8).
* Split up the documentation in several files rather than one huge one
* Add examples for each callback category
* Other niceties like fixing https://github.com/matrix-org/synapse/issues/10632
* Add titles to callbacks so they're easier to find in the navigation panels and link to
When releasing 1.42.0 with @Azrenbeth and talking with @clokep yesterday I realised doing the dch incantations related to releasing Synapse wasn't trivial on eg a macOS system, so this is a script to run in a Debian container to make things a bit easier.
This adds the format to the request arguments / URL to
ensure that JSON data is returned (which is all that
Synapse supports).
This also adds additional error checking / filtering to the
configuration file to ignore XML-only providers.
I think I have finally teased apart the codepaths which handle outliers, and those that handle non-outliers.
Let's add some assertions to demonstrate my newfound knowledge.
If we're persisting an event E which has auth_events A1, A2, then we ought to make sure that we correctly auth
and persist A1 and A2, before we blindly accept E.
This PR does part of that - it persists the auth events first - but it does not fully solve the problem, because we
still don't check that the auth events weren't rejected.
The full event content cannot be trusted from this API (as no auth
chain, etc.) is processed over federation. Returning the full event
content was a bug as MSC2946 specifies that only the stripped
state should be returned.
This also avoids calculating aggregations / annotations which go
unused.
Synapse 1.42.0rc2 (2021-09-06)
==============================
This version of Synapse removes deprecated room-management admin APIs, removes out-of-date
email pushers, and improves error handling for fallback templates for user-interactive
authentication. For more information on these points, server administrators are
encouraged to read [the upgrade notes](docs/upgrade.md#upgrading-to-v1420).
Features
--------
- Support room version 9 from [MSC3375](https://github.com/matrix-org/matrix-doc/pull/3375). ([\#10747](https://github.com/matrix-org/synapse/issues/10747))
Internal Changes
----------------
- Print a warning when using one of the deprecated `template_dir` settings. ([\#10768](https://github.com/matrix-org/synapse/issues/10768))
The deprecation itself happened in #10596 which shipped with Synapse v1.41.0. However, it doesn't seem fair to suddenly drop support for these settings in ~4-6w without being more vocal about said deprecation.
This is part of my ongoing war against BaseHandler. I've moved kick_guest_users into RoomMemberHandler (since it calls out to that handler anyway), and split maybe_kick_guest_users into the two places it is called.
* Allow room creator to send MSC2716 related events in existing room versions
Discussed at https://github.com/matrix-org/matrix-doc/pull/2716/#discussion_r682474869
Restoring `get_create_event_for_room_txn` from,
44bb3f0cf5
* Add changelog
* Stop people from trying to redact MSC2716 events in unsupported room versions
* Populate rooms.creator column for easy lookup
> From some [out of band discussion](https://matrix.to/#/!UytJQHLQYfvYWsGrGY:jki.re/$p2fKESoFst038x6pOOmsY0C49S2gLKMr0jhNMz_JJz0?via=jki.re&via=matrix.org), my plan is to use `rooms.creator`. But currently, we don't fill in `creator` for remote rooms when a user is invited to a room for example. So we need to add some code to fill in `creator` wherever we add to the `rooms` table. And also add a background update to fill in the rows missing `creator` (we can use the same logic that `get_create_event_for_room_txn` is doing by looking in the state events to get the `creator`).
>
> https://github.com/matrix-org/synapse/pull/10566#issuecomment-901616642
* Remove and switch away from get_create_event_for_room_txn
* Fix no create event being found because no state events persisted yet
* Fix and add tests for rooms creator bg update
* Populate rooms.creator field for easy lookup
Part of https://github.com/matrix-org/synapse/pull/10566
- Fill in creator whenever we insert into the rooms table
- Add background update to backfill any missing creator values
* Add changelog
* Fix usage
* Remove extra delta already included in #10697
* Don't worry about setting creator for invite
* Only iterate over rows missing the creator
See https://github.com/matrix-org/synapse/pull/10697#discussion_r695940898
* Use constant to fetch room creator field
See https://github.com/matrix-org/synapse/pull/10697#discussion_r696803029
* More protection from other random types
See https://github.com/matrix-org/synapse/pull/10697#discussion_r696806853
* Move new background update to end of list
See https://github.com/matrix-org/synapse/pull/10697#discussion_r696814181
* Fix query casing
* Fix ambiguity iterating over cursor instead of list
Fix `psycopg2.ProgrammingError: no results to fetch` error
when tests run with Postgres.
```
SYNAPSE_POSTGRES=1 SYNAPSE_TEST_LOG_LEVEL=INFO python -m twisted.trial tests.storage.databases.main.test_room
```
---
We use `txn.fetchall` because it will return the results as a
list or an empty list when there are no results.
Docs:
> `cursor` objects are iterable, so, instead of calling explicitly fetchone() in a loop, the object itself can be used:
>
> https://www.psycopg.org/docs/cursor.html#cursor-iterable
And I'm guessing iterating over a raw cursor does something weird when there are no results.
---
Test CI failure: https://github.com/matrix-org/synapse/pull/10697/checks?check_run_id=3468916530
```
tests.test_visibility.FilterEventsForServerTestCase.test_large_room
===============================================================================
[FAIL]
Traceback (most recent call last):
File "/home/runner/work/synapse/synapse/tests/storage/databases/main/test_room.py", line 85, in test_background_populate_rooms_creator_column
self.get_success(
File "/home/runner/work/synapse/synapse/tests/unittest.py", line 500, in get_success
return self.successResultOf(d)
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/trial/_synctest.py", line 700, in successResultOf
self.fail(
twisted.trial.unittest.FailTest: Success result expected on <Deferred at 0x7f4022f3eb50 current result: None>, found failure result instead:
Traceback (most recent call last):
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 701, in errback
self._startRunCallbacks(fail)
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 764, in _startRunCallbacks
self._runCallbacks()
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 1751, in gotResult
current_context.run(_inlineCallbacks, r, gen, status)
--- <exception caught here> ---
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks
result = current_context.run(
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/runner/work/synapse/synapse/synapse/storage/background_updates.py", line 224, in do_next_background_update
await self._do_background_update(desired_duration_ms)
File "/home/runner/work/synapse/synapse/synapse/storage/background_updates.py", line 261, in _do_background_update
items_updated = await update_handler(progress, batch_size)
File "/home/runner/work/synapse/synapse/synapse/storage/databases/main/room.py", line 1399, in _background_populate_rooms_creator_column
end = await self.db_pool.runInteraction(
File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 686, in runInteraction
result = await self.runWithConnection(
File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 791, in runWithConnection
return await make_deferred_yieldable(
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "/home/runner/work/synapse/synapse/tests/server.py", line 425, in <lambda>
d.addCallback(lambda x: function(*args, **kwargs))
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
compat.reraise(excValue, excTraceback)
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
return function(*args, **kwargs)
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/python/compat.py", line 404, in reraise
raise exception.with_traceback(traceback)
File "/home/runner/work/synapse/synapse/.tox/py/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
result = func(conn, *args, **kw)
File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 786, in inner_func
return func(db_conn, *args, **kwargs)
File "/home/runner/work/synapse/synapse/synapse/storage/database.py", line 554, in new_transaction
r = func(cursor, *args, **kwargs)
File "/home/runner/work/synapse/synapse/synapse/storage/databases/main/room.py", line 1375, in _background_populate_rooms_creator_column_txn
for room_id, event_json in txn:
psycopg2.ProgrammingError: no results to fetch
```
* Move code not under the MSC2716 room version underneath an experimental config option
See https://github.com/matrix-org/synapse/pull/10566#issuecomment-906437909
* Add ordering to rooms creator background update
See https://github.com/matrix-org/synapse/pull/10697#discussion_r696815277
* Add comment to better document constant
See https://github.com/matrix-org/synapse/pull/10697#discussion_r699674458
* Use constant field
This updates the ordering of the returned events from the spaces
summary API to that defined in MSC2946 (which updates MSC1772).
Previously a step was skipped causing ordering to be inconsistent with
clients.
Judging by the template, this was intended ages ago, but we never
actually passed an avatar URL to the template. So let's provide one.
Closes#1546.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Use `gc.freeze()` on exit to exclude all existing objects from the final GC.
In testing, this sped up shutdown by up to a few seconds.
`gc.freeze()` runs in constant time, so there is little chance of performance
regression.
Signed-off-by: Sean Quah <seanq@element.io>
Point to the book where possible, and use hyperlinks to github to refer to files not included in the book.
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add some tests to characterise the problem
Some failing. Current states:
RoomsMemberListTestCase
test_get_member_list ...
[OK]
test_get_member_list_mixed_memberships ...
[OK]
test_get_member_list_no_permission ...
[OK]
test_get_member_list_no_permission_former_member ...
[OK]
test_get_member_list_no_permission_former_member_with_at_token ...
[FAIL]
test_get_member_list_no_room ...
[OK]
test_get_member_list_no_permission_with_at_token ...
[FAIL]
* Correct the tests
* Check user is/was member before divulging room membership
* Pull out only the 1 membership event we want.
* Update tests/rest/client/v1/test_rooms.py
Co-authored-by: Erik Johnston <erik@matrix.org>
* Fixup tests (following apply review suggestion)
Co-authored-by: Erik Johnston <erik@matrix.org>
Turns out that the functionality added in #10546 to skip TLS was incompatible
with older Twisted versions, so we need to be a bit more inventive.
Also, add a test to (hopefully) not break this in future. Sadly, testing TLS is
really hard.
- Removed page summaries from CONTRIBUTING and installation pages as
this information was already in the table of contents on the right hand side
- Fixed some broken links in CONTRIBUTING
- Added margin-right tag for when table of contents is being shown
(otherwise the text in the page sometimes overlaps with it)
The code to deduplicate repeated fetches of the same set of events was
N^2 (over the number of events requested), which could lead to a process
being completely wedged.
The main fix is to deduplicate the returned deferreds so we only await
on a deferred once rather than many times. Seperately, when handling the
returned events from the defrered we only add the events we care about
to the event map to be returned (so that we don't pay the price of
inserting extraneous events into the dict).
Given that backfill and get_missing_events are basically the same thing, it's somewhat crazy that we have entirely separate code paths for them. This makes backfill use the existing get_missing_events code, and then clears up all the unused code.
When a user deletes an email from their account it will
now also remove all pushers for that email and that user
(even if these pushers were created by a different client)
* Fix the titles in the OIDC documentation
Having them as links broke the table-of-contents rendering in mdbook.
Plus there's no reason for only some of the provider titles to be links.
* Changelog
* Add link to google idp docs
Setting `update_existing: true` in the `create-an-issue` GitHub Action
will avoid opening duplicate issues if an open issue already exists with
an identical title.
If no open issues match the title, then a new issue will be created.
This helps avoid spamming our issue tracker should there be a failure
when testing against Twisted's trunk.
This PR also pins the SHA of the `create-an-issue` action to mitigate
the risk of a malicious actor gaining access to JasonEtco's account.
See GitHub's page on security hardening third party actions for more:
https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actions
Signed-off-by: Dan Callahan <danc@element.io>
This creates a GHA workflow which runs at 8am every day, and runs mypy, trial and sytest against Twisted's current trunk. If any of the jobs fail, it opens an issue.
* Validate device_keys for C-S /keys/query requests
Closes#10354
A small, not particularly critical fix. I'm interested in seeing if we
can find a more systematic approach though. #8445 is the place for any discussion.
Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.
* drop room pdu linearizer sooner
No point holding onto it while we recheck the db
* move out `missing_prevs` calculation
we're going to need `missing_prevs` whatever we do, so we may as well calculate
it eagerly and just update it if it gets outdated.
* Add another `if missing_prevs` condition
this should be a no-op, since all the code inside the block already checks `if
missing_prevs`
* reorder if conditions
This shouldn't change the logic at all.
* Push down `min_depth` read
No point reading it from the database unless we're going to use it.
* Collect the sent_to_us_directly code together
Move the remaining `sent_to_us_directly` code inside the `if
sent_to_us_directly` block.
* Properly separate the `not sent_to_us_directly` branch
Since the only way this second block is now reachable is if we
*didn't* go into the `sent_to_us_directly` branch, we can replace it with a
simple `else`.
* changelog
Several configuration sections are using separate settings for custom template directories, which can be confusing. This PR adds a new top-level configuration for a custom template directory which is then used for every module. The only exception is the consent templates, since the consent template directory require a specific hierarchy, so it's probably better that it stays separate from everything else.
If the new /hierarchy API does not exist on all destinations,
fallback to querying the /spaces API and translating the results.
This is a backwards compatibility hack since not all of the
federated homeservers will update at the same time.
Marking things as outliers to inhibit pushes is a sledgehammer to crack a
nut. Move the test further down the stack so that we just inhibit the thing we
want.
* Include outlier status in `str(event)`
In places where we log event objects, knowing whether or not you're dealing
with an outlier is super useful.
* Remove duplicated logging in get_missing_events
When we process events received from get_missing_events, we log them twice
(once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce
the duplication by removing the logging in `on_receive_pdu`, and ensuring the
call sites do sensible logging.
* log in `on_receive_pdu` when we already have the event
* Log which prev_events we are missing
* changelog
As opposed to only allowing the summary of spaces which the user is
already in or has world-readable visibility.
This makes the logic consistent with whether a space/room is returned
as part of a space and whether a space summary can start at a space.
If a room which the requesting user was invited to was queried over
federation it will now properly appear in the spaces summary (instead
of being stripped out by the requesting server).
* Keep event fields that maintain the historical event structure intact
Fix https://github.com/matrix-org/synapse/issues/10521
* Add changelog
* Bump room version
* Better changelog text
* Fix up room version after develop merge
Instead of wrapping the JSON into an object, this creates concrete
instances for Transaction and Edu. This allows for improved type
hints and simplified code.
* drop old-room hack
pretty sure we don't need this any more.
* Remove incorrect comment about modifying `context`
It doesn't look like the supplied context is ever modified.
* Stop `_auth_and_persist_event` modifying its parameters
This is only called in three places. Two of them don't pass `auth_events`, and
the third doesn't use the dict after passing it in, so this should be non-functional.
* Stop `_check_event_auth` modifying its parameters
`_check_event_auth` is only called in three places. `on_send_membership_event`
doesn't pass an `auth_events`, and `prep` and `_auth_and_persist_event` do not
use the map after passing it in.
* Stop `_update_auth_events_and_context_for_auth` modifying its parameters
Return the updated auth event dict, rather than modifying the parameter.
This is only called from `_check_event_auth`.
* Improve documentation on `_auth_and_persist_event`
Rename `auth_events` parameter to better reflect what it contains.
* Improve documentation on `_NewEventInfo`
* Improve documentation on `_check_event_auth`
rename `auth_events` parameter to better describe what it contains
* changelog
This adds 'allowed_room_ids' (in addition to 'allowed_spaces', for backwards
compatibility) to the federation response of the spaces summary.
A future PR will remove the 'allowed_spaces' flag.
If there are no services providing a protocol, omit it completely
instead of returning an empty dictionary.
This fixes a long-standing spec compliance bug.
Synapse 1.40.0rc2 (2021-08-04)
==============================
Bugfixes
--------
- Fix the `PeriodicallyFlushingMemoryHandler` inhibiting application shutdown because of its background thread. ([\#10517](https://github.com/matrix-org/synapse/issues/10517))
- Fix a bug introduced in Synapse v1.40.0rc1 that could cause Synapse to respond with an error when clients would update read receipts. ([\#10531](https://github.com/matrix-org/synapse/issues/10531))
Internal Changes
----------------
- Fix release script to open the correct URL for the release. ([\#10516](https://github.com/matrix-org/synapse/issues/10516))
The room type is per MSC3288 to allow the identity-server to
change invitation wording based on whether the invitation is to
a room or a space.
The prefixed key will be replaced once MSC3288 is accepted
into the spec.
* Make historical messages available to federated servers
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
Follow-up to https://github.com/matrix-org/synapse/pull/9247
* Debug message not available on federation
* Add base starting insertion point when no chunk ID is provided
* Fix messages from multiple senders in historical chunk
Follow-up to https://github.com/matrix-org/synapse/pull/9247
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
---
Previously, Synapse would throw a 403,
`Cannot force another user to join.`,
because we were trying to use `?user_id` from a single virtual user
which did not match with messages from other users in the chunk.
* Remove debug lines
* Messing with selecting insertion event extremeties
* Move db schema change to new version
* Add more better comments
* Make a fake requester with just what we need
See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080
* Store insertion events in table
* Make base insertion event float off on its own
See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889
Conflicts:
synapse/rest/client/v1/room.py
* Validate that the app service can actually control the given user
See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455
Conflicts:
synapse/rest/client/v1/room.py
* Add some better comments on what we're trying to check for
* Continue debugging
* Share validation logic
* Add inserted historical messages to /backfill response
* Remove debug sql queries
* Some marker event implemntation trials
* Clean up PR
* Rename insertion_event_id to just event_id
* Add some better sql comments
* More accurate description
* Add changelog
* Make it clear what MSC the change is part of
* Add more detail on which insertion event came through
* Address review and improve sql queries
* Only use event_id as unique constraint
* Fix test case where insertion event is already in the normal DAG
* Remove debug changes
* Add support for MSC2716 marker events
* Process markers when we receive it over federation
* WIP: make hs2 backfill historical messages after marker event
* hs2 to better ask for insertion event extremity
But running into the `sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group`
error
* Add insertion_event_extremities table
* Switch to chunk events so we can auth via power_levels
Previously, we were using `content.chunk_id` to connect one
chunk to another. But these events can be from any `sender`
and we can't tell who should be able to send historical events.
We know we only want the application service to do it but these
events have the sender of a real historical message, not the
application service user ID as the sender. Other federated homeservers
also have no indicator which senders are an application service on
the originating homeserver.
So we want to auth all of the MSC2716 events via power_levels
and have them be sent by the application service with proper
PL levels in the room.
* Switch to chunk events for federation
* Add unstable room version to support new historical PL
* Messy: Fix undefined state_group for federated historical events
```
2021-07-13 02:27:57,810 - synapse.handlers.federation - 1248 - ERROR - GET-4 - Failed to backfill from hs1 because NOT NULL constraint failed: event_to_state_groups.state_group
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1216, in try_backfill
await self.backfill(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1035, in backfill
await self._auth_and_persist_event(dest, event, context, backfilled=True)
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2222, in _auth_and_persist_event
await self._run_push_actions_and_persist_event(event, context, backfilled)
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2244, in _run_push_actions_and_persist_event
await self.persist_events_and_notify(
File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 3290, in persist_events_and_notify
events, max_stream_token = await self.storage.persistence.persist_events(
File "/usr/local/lib/python3.8/site-packages/synapse/logging/opentracing.py", line 774, in _trace_inner
return await func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 320, in persist_events
ret_vals = await yieldable_gather_results(enqueue, partitioned.items())
File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 237, in handle_queue_loop
ret = await self._per_item_callback(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch
await self.persist_events_store._persist_events_and_state_updates(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 176, in _persist_events_and_state_updates
await self.db_pool.runInteraction(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 681, in runInteraction
result = await self.runWithConnection(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 770, in runWithConnection
return await make_deferred_yieldable(
File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext
return func(*args, **kw)
File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
compat.reraise(excValue, excTraceback)
File "/usr/local/lib/python3.8/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
return function(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/twisted/python/compat.py", line 403, in reraise
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
result = func(conn, *args, **kw)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 765, in inner_func
return func(db_conn, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 549, in new_transaction
r = func(cursor, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/logging/utils.py", line 69, in wrapped
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 385, in _persist_events_txn
self._store_event_state_mappings_txn(txn, events_and_contexts)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 2065, in _store_event_state_mappings_txn
self.db_pool.simple_insert_many_txn(
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 923, in simple_insert_many_txn
txn.execute_batch(sql, vals)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 280, in execute_batch
self.executemany(sql, args)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 300, in executemany
self._do_execute(self.txn.executemany, sql, *args)
File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 330, in _do_execute
return func(sql, *args)
sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group
```
* Revert "Messy: Fix undefined state_group for federated historical events"
This reverts commit 187ab28611546321e02770944c86f30ee2bc742a.
* Fix federated events being rejected for no state_groups
Add fix from https://github.com/matrix-org/synapse/pull/10439
until it merges.
* Adapting to experimental room version
* Some log cleanup
* Add better comments around extremity fetching code and why
* Rename to be more accurate to what the function returns
* Add changelog
* Ignore rejected events
* Use simplified upsert
* Add Erik's explanation of extra event checks
See https://github.com/matrix-org/synapse/pull/10498#discussion_r680880332
* Clarify that the depth is not directly correlated to the backwards extremity that we return
See https://github.com/matrix-org/synapse/pull/10498#discussion_r681725404
* lock only matters for sqlite
See https://github.com/matrix-org/synapse/pull/10498#discussion_r681728061
* Move new SQL changes to its own delta file
* Clean up upsert docstring
* Bump database schema version (62)
Makes it easier to fetch user details in for example spam checker modules, without needing to use api._store or figure out database interactions.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Per issue #9812 using `url_preview_ip_range_blacklist` with a proxy via `HTTPS_PROXY` or `HTTP_PROXY` environment variables has some inconsistent bahavior than mentioned. This PR changes the following:
- Changes the Sample Config file to include a note mentioning that `url_preview_ip_range_blacklist` and `ip_range_blacklist` is ignored when using a proxy
- Changes some logic in synapse/config/repository.py to send a warning when both `*ip_range_blacklist` configs and a proxy environment variable are set and but no longer throws an error.
Signed-off-by: Kento Okamoto <kentokamoto@protonmail.com>
Setting the value will help PostgreSQL free up memory by recycling
the connections in the connection pool.
Signed-off-by: Toni Spets <toni.spets@iki.fi>
If the federation client receives an M_UNABLE_TO_AUTHORISE_JOIN or
M_UNABLE_TO_GRANT_JOIN response it will attempt another server
before giving up completely.
Reproducible on a federated homeserver when there is a membership auth event as a floating outlier. Then when we try to backfill one of that persons messages, it has missing membership auth to fetch which caused us to mistakenly replace the `context` for the message with that of the floating membership `outlier` event. Since `outliers` have no `state` or `state_group`, the error bubbles up when we continue down the persisting route: `sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group`
Call stack:
```
backfill
_auth_and_persist_event
_check_event_auth
_update_auth_events_and_context_for_auth
```
* Make historical messages available to federated servers
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
Follow-up to https://github.com/matrix-org/synapse/pull/9247
* Debug message not available on federation
* Add base starting insertion point when no chunk ID is provided
* Fix messages from multiple senders in historical chunk
Follow-up to https://github.com/matrix-org/synapse/pull/9247
Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716
---
Previously, Synapse would throw a 403,
`Cannot force another user to join.`,
because we were trying to use `?user_id` from a single virtual user
which did not match with messages from other users in the chunk.
* Remove debug lines
* Messing with selecting insertion event extremeties
* Move db schema change to new version
* Add more better comments
* Make a fake requester with just what we need
See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080
* Store insertion events in table
* Make base insertion event float off on its own
See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889
Conflicts:
synapse/rest/client/v1/room.py
* Validate that the app service can actually control the given user
See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455
Conflicts:
synapse/rest/client/v1/room.py
* Add some better comments on what we're trying to check for
* Continue debugging
* Share validation logic
* Add inserted historical messages to /backfill response
* Remove debug sql queries
* Some marker event implemntation trials
* Clean up PR
* Rename insertion_event_id to just event_id
* Add some better sql comments
* More accurate description
* Add changelog
* Make it clear what MSC the change is part of
* Add more detail on which insertion event came through
* Address review and improve sql queries
* Only use event_id as unique constraint
* Fix test case where insertion event is already in the normal DAG
* Remove debug changes
* Switch to chunk events so we can auth via power_levels
Previously, we were using `content.chunk_id` to connect one
chunk to another. But these events can be from any `sender`
and we can't tell who should be able to send historical events.
We know we only want the application service to do it but these
events have the sender of a real historical message, not the
application service user ID as the sender. Other federated homeservers
also have no indicator which senders are an application service on
the originating homeserver.
So we want to auth all of the MSC2716 events via power_levels
and have them be sent by the application service with proper
PL levels in the room.
* Switch to chunk events for federation
* Add unstable room version to support new historical PL
* Fix federated events being rejected for no state_groups
Add fix from https://github.com/matrix-org/synapse/pull/10439
until it merges.
* Only connect base insertion event to prev_event_ids
Per discussion with @erikjohnston,
https://matrix.to/#/!UytJQHLQYfvYWsGrGY:jki.re/$12bTUiObDFdHLAYtT7E-BvYRp3k_xv8w0dUQHibasJk?via=jki.re&via=matrix.org
* Make it possible to get the room_version with txn
* Allow but ignore historical events in unsupported room version
See https://github.com/matrix-org/synapse/pull/10245#discussion_r675592489
We can't reject historical events on unsupported room versions because homeservers without knowledge of MSC2716 or the new room version don't reject historical events either.
Since we can't rely on the auth check here to stop historical events on unsupported room versions, I've added some additional checks in the processing/persisting code (`synapse/storage/databases/main/events.py` -> `_handle_insertion_event` and `_handle_chunk_event`). I've had to do some refactoring so there is method to fetch the room version by `txn`.
* Move to unique index syntax
See https://github.com/matrix-org/synapse/pull/10245#discussion_r675638509
* High-level document how the insertion->chunk lookup works
* Remove create_event fallback for room_versions
See https://github.com/matrix-org/synapse/pull/10245/files#r677641879
* Use updated method name
Mostly this involves decorating a few Deferred declarations with extra type hints. We wrap the types in quotes to avoid runtime errors when running against older versions of Twisted that don't have generics on Deferred.
IE11 doesn't support Content-Security-Policy but it has support for
a non-standard X-Content-Security-Policy header, which only supports the
sandbox directive. This prevents script execution, so it at least offers
some protection against media repo-based attacks.
Signed-off-by: Denis Kasak <dkasak@termina.org.uk>
* Fix no-access-token bug in deactivation tests
* Support MSC2033: Device ID on whoami
* Test for appservices too
MSC: https://github.com/matrix-org/matrix-doc/pull/2033
The MSC has passed FCP, which means stable endpoints can be used.
Synapse 1.38.1 (2021-07-22)
===========================
Bugfixes
--------
- Always include `device_one_time_keys_count` key in `/sync` response to work around a bug in Element Android that broke encryption for new devices. ([\#10457](https://github.com/matrix-org/synapse/issues/10457))
Synapse 1.39.0rc2 (2021-07-22)
==============================
Bugfixes
--------
- Always include `device_one_time_keys_count` key in `/sync` response to work around a bug in Element Android that broke encryption for new devices. ([\#10457](https://github.com/matrix-org/synapse/issues/10457))
Internal Changes
----------------
- Move docker image build to Github Actions. ([\#10416](https://github.com/matrix-org/synapse/issues/10416))
Synapse 1.38.1 (2021-07-22)
===========================
Bugfixes
--------
- Always include `device_one_time_keys_count` key in `/sync` response to work around a bug in Element Android that broke encryption for new devices. ([\#10457](https://github.com/matrix-org/synapse/issues/10457))
Now that we have `simple_upsert` that should be used in preference to
trying to insert and looking for an exception. The main benefit is that
we ERROR message don't get written to postgres logs.
We also have tidy up the return value on `simple_upsert`, rather than
having a tri-state of inserted/not-inserted/unknown.
* switch from `types.CoroutineType` to `typing.Coroutine`
these should be identical semantically, and since `defer.ensureDeferred` is
defined to take a `typing.Coroutine`, will keep mypy happy
* Fix some annotations on inlineCallbacks functions
* changelog
Improves type hints for:
* parse_{boolean,integer}
* parse_{boolean,integer}_from_args
* parse_json_{value,object}_from_request
And fixes any incorrect calls that resulted from unknown types.
Previously, we were using `content.chunk_id` to connect one
chunk to another. But these events can be from any `sender`
and we can't tell who should be able to send historical events.
We know we only want the application service to do it but these
events have the sender of a real historical message, not the
application service user ID as the sender. Other federated homeservers
also have no indicator which senders are an application service on
the originating homeserver.
So we want to auth all of the MSC2716 events via power_levels
and have them be sent by the application service with proper
PL levels in the room.
This adds an API for third-party plugin modules to implement account validity, so they can provide this feature instead of Synapse. The module implementing the current behaviour for this feature can be found at https://github.com/matrix-org/synapse-email-account-validity.
To allow for a smooth transition between the current feature and the new module, hooks have been added to the existing account validity endpoints to allow their behaviours to be overridden by a module.
The postgres statistics collector sometimes massively underestimates the
number of distinct state groups are in the `state_groups_state`, which
can cause postgres to use table scans for queries for multiple state
groups.
We fix this by manually setting `n_distinct` on the column.
Our documentation has a history of using a document's name as a way to link to it, such as "See [workers.md]() for details". This makes sense when you're traversing a directory of files, but less sense when the files are abstracted away - as they are on the documentation website.
This PR changes the links to various documentation pages to something that fits better into the surrounding sentence, as you would when making any hyperlink on the web.
This is to help with performance, where trying to connect to thousands
of hosts at once can consume a lot of CPU (due to TLS etc).
Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
A few things here:
* Build the debs for single distro for each PR, so that we can see if it breaks. Do the same for develop. Building all the debs ties up the GHA workers for ages.
* Stop building the debs for release branches. Again, it takes ages, and I don't think anyone is actually going to stop and look at them. We'll know they are working when we make an RC.
* Change the configs so that if we manually cancel a workflow, it actually does something.
Previously only world-readable rooms were shown. This means that
rooms which are public, knockable, or invite-only with a pending invitation,
are included in a space summary. It also applies the same logic to
the experimental room version from MSC3083 -- if a user has access
to the proper allowed rooms then it is shown in the spaces summary.
This change is made per MSC3173 allowing stripped state of a room to
be shown to any potential room joiner.
* Upsert redactions in case they already exists
Occasionally, in combination with retention, redactions aren't deleted
from the database whenever they are due for deletion. The server will
eventually try to backfill the deleted events and trip over the already
existing redaction events.
Switching to an UPSERT for those events allows us to recover from there
situations. The retention code still needs fixing but that is outside of
my current comfort zone on this code base.
This is related to #8707 where the error was discussed already.
Signed-off-by: Andreas Rammhold <andreas@rammhold.de>
* Also purge redactions when purging events
Previously redacints where left behind leading to backfilling issues
when the server stumbled across the already existing yet to be
backfilled redactions.
This issues has been discussed in #8707.
Signed-off-by: Andreas Rammhold <andreas@rammhold.de>
* Add base starting insertion point when no chunk ID is provided
This is so we can have the marker event point to this initial
insertion event and be able to traverse the events in the first chunk.
* Use fake time in tests in _get_start_of_day.
* Change the inequality of last_seen in user_daily_visits
Co-authored-by: Erik Johnston <erik@matrix.org>
Because modules might send extra state events when processing an event (e.g. matrix-org/synapse-dinsic#100), and in some cases these extra events might get dropped if we don't recalculate the initial event's auth.
this was a typo introduced in #10282. We don't want to end up doing the
`replace_stream_ordering_column` update after anything that comes up in
migration 60/03.
The presence router docs include some sample homeserver config. At some point we changed the name of the [config option](859dc05b36/docs/sample_config.yaml (L104-L113)), but forgot to update the docs.
I've also added `presence.enabled: true` to the example, as that's the new way to enable presence (the `presence_enabled` option has been deprecated).
Fixes#9490
This will break a couple of SyTest that are expecting failures to be added to the response of a federation /send, which obviously doesn't happen now that things are asynchronous.
Two drawbacks:
Currently there is no logic to handle any events left in the staging area after restart, and so they'll only be handled on the next incoming event in that room. That can be fixed separately.
We now only process one event per room at a time. This can be fixed up further down the line.
* Move background update names out to a separate class
`EventsBackgroundUpdatesStore` gets inherited and we don't really want to
further pollute the namespace.
* Migrate stream_ordering to a bigint
* changelog
Currently when a new build of the docs is created, an `index.html` file does not exist. Typically this would be generated from a`docs/README.md` file - which we have - however we're currently using [docs/README.md](394673055d/docs/README.md) to explain the docs and point to the website. It is not part of the content of the website. So we end up not having an `index.html` file, which will result in a 404 page if one tries to navigate to `https://matrix-org.github.io/synapse/<docs_version>/index.html`.
This isn't a really problem for the default version of the documentation (currently `develop`), as [navigating to the top-level root](https://matrix-org.github.io/synapse/) of the website (without specifying a version) will [redirect](a77e6925f2/index.html (L2)) you to the Welcome and Overview page of the `develop` docs version.
However, ideally once we add a GUI for switching between versions, we'll want to send the user to `matrix-org.github.io/synapse/<version>/index.html`, which currently isn't generated.
This PR modifies the CI that builds the docs to simply copy the rendered [Welcome & Overview page](https://matrix-org.github.io/synapse/develop/welcome_and_overview.html) to `index.html`.
The idea here is to stop people sending things that aren't joins/leaves/knocks through these endpoints: previously you could send anything you liked through them. I wasn't able to find any security holes from doing so, but it doesn't sound like a good thing.
This implements refresh tokens, as defined by MSC2918
This MSC has been implemented client side in Hydrogen Web: vector-im/hydrogen-web#235
The basics of the MSC works: requesting refresh tokens on login, having the access tokens expire, and using the refresh token to get a new one.
Signed-off-by: Quentin Gliech <quentingliech@gmail.com>
This PR:
* Converts UPGRADE.rst to markdown and moves the contents into the `docs/` directory.
* Updates the contents of UPGRADE.rst to point to the website instead.
* Updates links around the codebase that point to UPGRADE.rst.
`pandoc` + some manual editing was used to convert from RST to md.
* rename major/minor into the right semver terminology minor/patch (since this was something that got me very confused the first couple of times I've used the script)
* name the release branch based on the new version, not the previous one
Required some fixes due to merge conflicts with #6739, but nothing too hairy. The first commit is the same as the original (after merge conflict resolution) then two more for compatibility with the latest sync code.
If a room is remote and we don't have a user in it, always try to join it. It might fail if the room is invite-only, but we don't have a user to invite with, so at this point it's the best we can do.
Fixes#10233 (at least to some extent)
* Drop Origin & Accept from Access-Control-Allow-Headers value
This change drops the Origin and Accept header names from the value of the
Access-Control-Allow-Headers response header sent by Synapse. Per the CORS
protocol, it’s not necessary or useful to include those header names.
Details:
Per-spec at https://fetch.spec.whatwg.org/#forbidden-header-name, Origin
is a “forbidden header name” set by the browser and that frontend
JavaScript code is never allowed to set.
So the value of Access-Control-Allow-Headers isn’t relevant to Origin or
in general to other headers set by the browser itself — the browser
never ever consults the Access-Control-Allow-Headers value to confirm
that it’s OK for the request to include an Origin header.
And per-spec at https://fetch.spec.whatwg.org/#cors-safelisted-request-header,
Accept is a “CORS-safelisted request-header”, which means that browsers
allow requests to contain the Accept header regardless of whether the
Access-Control-Allow-Headers value contains "Accept".
So it’s unnecessary for the Access-Control-Allow-Headers to explicitly
include Accept. Browsers will not perform a CORS preflight for requests
containing an Accept request header.
Related: https://github.com/matrix-org/matrix-doc/pull/3225
Signed-off-by: Michael[tm] Smith <mike@w3.org>
Implemented config option sso.update_profile_information to keep user's display name in sync with the SSO displayname.
Signed-off-by: Johannes Kanefendt <johannes.kanefendt@krzn.de>
We were repeatedly looking up a config option in a loop (using the
unclassed config style), which is expensive enough that it can cause
large CPU usage.
An accidental mis-ordering of operations during #6739 technically allowed an incoming knock event over federation in before checking it against any configured Third Party Access Rules modules.
This PR corrects that by performing the TPAR check *before* persisting the event.
This PR will run a new "Deploy release-specific documentation" job whenever a push to a branch name matching `release-v*` occurs. Doing so will create/add to a folder named `vX.Y` on the `gh-pages` branch. Doing so will allow us to build up `major.minor` releases of the docs as we release Synapse.
This is especially useful for having a mechanism for keeping around documentation of old/removed features (for those running older versions of Synapse), without needing to clutter the latest copy of the docs.
After a [discussion](https://matrix.to/#/!XaqDhxuTIlvldquJaV:matrix.org/$rKmkBmQle8OwTlGcoyu0BkcWXdnHW3_oap8BMgclwIY?via=matrix.org&via=vector.modular.im&via=envs.net) in #synapse-dev, we wanted to use tags to trigger the documentation deployments, which I agreed with. However, I soon realised that the bash-foo required to turn a tag of `v1.2.3rc1` into `1.2` was a lot more complex than the branch's `release-v1.2`. So, I've gone with the latter for simplicity.
In the future we'll have some UI on the website to switch between versions, but for now you can simply just change 'develop' to 'v1.2' in the URL.
This could cause a minor data leak if someone defined a non-restricted join rule
with an allow key or used a restricted join rule in an older room version, but this is
unlikely.
Additionally this starts adding unit tests to the spaces summary handler.
This PR adds a common configuration section for all modules (see docs). These modules are then loaded at startup by the homeserver. Modules register their hooks and web resources using the new `register_[...]_callbacks` and `register_web_resource` methods of the module API.
Fixes https://github.com/matrix-org/synapse/issues/10030.
We were expecting milliseconds where we should have provided a value in seconds.
The impact of this bug isn't too bad. The code is intended to count the number of remote servers that the homeserver can see and report that as a metric. This metric is supposed to run initially 1 second after server startup, and every 60s as well. Instead, it ran 1,000 seconds after server startup, and every 60s after startup.
This fix allows for the correct metrics to be collected immediately, as well as preventing a random collection 1,000s in the future after startup.
Dangerous actions means deactivating an account, modifying an account
password, or adding a 3PID.
Other actions (deleting devices, uploading keys) can re-use the same UI
auth session if ui_auth.session_timeout is configured.
* Trace event persistence
When we persist a batch of events, set the parent opentracing span to the that
from the request, so that we can trace all the way in.
* changelog
* When we force tracing, set a baggage item
... so that we can check again later.
* Link in both directions between persist_events spans
* Room version 7 for knocking.
* Stable prefixes and endpoints (both client and federation) for knocking.
* Removes the experimental configuration flag.
Add 'federation_ip_range_whitelist'. This allows backwards-compatibility, If 'federation_ip_range_blacklist' is set. Otherwise 'ip_range_whitelist' will be used for federation servers.
Signed-off-by: Michael Kutzner 1mikure@gmail.com
This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug.
The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
Synapse 1.36.0rc2 (2021-06-11)
==============================
Bugfixes
--------
- Fix a bug which caused presence updates to stop working some time after a restart, when using a presence writer worker. Broke in v1.33.0. ([\#10149](https://github.com/matrix-org/synapse/issues/10149))
- Fix a bug when using federation sender worker where it would send out more presence updates than necessary, leading to high resource usage. Broke in v1.33.0. ([\#10163](https://github.com/matrix-org/synapse/issues/10163))
- Fix a bug where Synapse could send the same presence update to a remote twice. ([\#10165](https://github.com/matrix-org/synapse/issues/10165))
This is essentially an implementation of the proposal made at https://hackmd.io/@richvdh/BJYXQMQHO, though the details have ended up looking slightly different.
This implements similar behavior to sytest where a matching branch is used,
if one exists. This is useful when needing to modify both application code
and tests at the same time. The following rules are used to find a matching
complement branch:
1. Search for the branch name of the pull request. (E.g. feature/foo.)
2. Search for the base branch of the pull request. (E.g. develop or release-vX.Y.)
3. Search for the reference branch of the commit. (E.g. master or release-vX.Y.)
4. Fallback to 'master', the default complement branch name.
Spawned from missing messages we were seeing on `matrix.org` from a
federated Gtiter bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770.
The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066
where the message and join event race and the message is `soft_failed` before the
`join` event reaches the remote federated server.
Less soft_failed events = better and usually this should only trigger for events
where people are doing bad things and trying to fuzz and fake everything.
This PR updates the build tags that we perform Complement runs with to match our [buildkite pipeline](618b3e90bc/synapse/pipeline.yml (L570)), as well as adding `msc2403` (as it will be required once #9359 is merged). Build tags are what we use to determine which tests to run in Complement (really it determines which test files are compiled into the final binary).
I haven't put in a comment about updating the buildkite side here, as we've decided to migrate fully to GitHub Actions anyhow.
With the prior format, 1.33.0 / 1.33.1 / 1.33.2 got separate branches:
release-v1.33.0
release-v1.33.1
release-v1.33.2
Under the new model, all three would share a common branch:
release-v1.33
As before, RCs and actual releases exist as tags on these branches.
This better reflects our support model, e.g., that the "1.33" series had
a formal release followed by two patches / updates.
Signed-off-by: Dan Callahan <danc@element.io>
Fixes#1834.
`get_new_events_for_appservice` internally calls `get_events_as_list`, which will filter out any rejected events. If all returned events are filtered out, `_notify_interested_services` will return without updating the last handled stream position. If there are 100 consecutive such events, processing will halt altogether.
Breaking the loop is now done by checking whether we're up-to-date with `current_max` in the loop condition, instead of relying on an empty `events` list.
Signed-off-by: Willem Mulder <14mRh4X0r@gmail.com>
If backfilling is slow then the client may time out and retry, causing
Synapse to start a new `/backfill` before the existing backfill has
finished, duplicating work.
This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - actual actual transaction.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse it looks really interesting, and I hope the overhead can be offset just by turning down the sampling frequency and finding other ways of tracing requests of interest (eg, the `force_tracing_for_users` setting).
The existing tracing reports an error each time there is a timeout, which isn't
really representative.
Additionally, we log things about the way `wait_for_events` works
(eg, the result of the callback) to the *parent* span, which is confusing.
So that they render nicely in mdbook (see #10086), and so that we no longer have a mix of structured text languages in our documentation (excluding files outside of `docs/`).
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
Synapse 1.35.0rc2 (2021-05-27)
==============================
Bugfixes
--------
- Fix a bug introduced in v1.35.0rc1 when calling the spaces summary API via a GET request. ([\#10079](https://github.com/matrix-org/synapse/issues/10079))
* Make `invalidate` and `invalidate_many` do the same thing
... so that we can do either over the invalidation replication stream, and also
because they always confused me a bit.
* Kill off `invalidate_many`
* changelog
`keylen` seems to be a thing that is frequently incorrectly set, and we don't really need it.
The only time it was used was to figure out if we had removed a subtree in `del_multi`, which we can do better by changing `TreeCache.pop` to return a different type (`TreeCacheNode`).
Commits should be independently reviewable.
* Fix /upload 500'ing when presented a very large image
Catch DecompressionBombError and re-raise as ThumbnailErrors
* Set PIL's MAX_IMAGE_PIXELS to match homeserver.yaml
to get it to bomb out quicker, to load less into memory
in the case of super large images
* Add changelog entry for 10029
https://github.com/matrix-org/synapse/issues/9962 uncovered that we accidentally removed all but one of the presence updates that we store in the database when persisting multiple updates. This could cause users' presence state to be stale.
The bug was fixed in #10014, and this PR just adds a test that failed on the old code, and was used to initially verify the bug.
The test attempts to insert some presence into the database in a batch using `PresenceStore.update_presence`, and then simply pulls it out again.
Fixes: https://github.com/matrix-org/synapse/issues/9962
This is a fix for above problem.
I fixed it by swaping the order of insertion of new records and deletion of old ones. This ensures that we don't delete fresh database records as we do deletes before inserts.
Signed-off-by: Marek Matys <themarcq@gmail.com>
Also add support for giving a callback to generate the JSON object to
verify. This should reduce memory usage, as we no longer have the event
in memory in dict form (which has a large memory footprint) for extend
periods of time.
Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
To be more consistent with similar code. The check now automatically
raises an AuthError instead of passing back a boolean. It also absorbs
some shared logic between callers.
- use a tuple rather than a list for the iterable that is passed into the
wrapped function, for performance
- test that we can pass an iterable and that keys are correctly deduped.
It's not obvious that instances of SQLBaseStore each need their own
instances of random.SystemRandom(); let's just use random directly.
Introduced by 52839886d6
Signed-off-by: Dan Callahan <danc@element.io>
Functionally identical, but more obviously cryptographically secure.
...Explicit is better than implicit?
Avoids needing to know that SystemRandom() implies a CSPRNG, and
complies with the big scary red box on the documentation for random:
> Warning:
> The pseudo-random generators of this module should not be used for
> security purposes. For security or cryptographic uses, see the
> secrets module.
https://docs.python.org/3/library/random.html
Signed-off-by: Dan Callahan <danc@element.io>
This should help ensure that equivalent results are achieved between
homeservers querying for the summary of a space.
This implements modified MSC1772 rules, according to MSC2946.
The different is that the origin_server_ts of the m.room.create event
is not used as a tie-breaker since this might not be known if the
homeserver is not part of the room.
Per changes in MSC2946, the C-S and S-S APIs for spaces summary
should use GET requests.
Until this is stable, the POST endpoints still exist.
This does not switch federation requests to use the GET version yet
since it is newly added and already deployed servers might not support
it. When switching to the stable endpoint we should switch to GET
requests.
MSC1772 specifies the m.room.create event should be sent as part
of the invite_state. This was done optionally behind an experimental
flag, but is now done by default due to MSC1772 being approved.
Now that cross signing exists there is much less of a need for other people to look at devices and verify them individually. This PR adds a config option to allow you to prevent device display names from being shared with other servers.
Signed-off-by: Aaron Raimist <aaron@raim.ist>
* tests for push rule pattern matching
* tests for acl pattern matching
* factor out common `re.escape`
* Factor out common re.compile
* Factor out common anchoring code
* add word_boundary support to `glob_to_regex`
* Use `glob_to_regex` in push rule evaluator
NB that this drops support for character classes. I don't think anyone ever
used them.
* Improve efficiency of globs with multiple wildcards
The idea here is that we compress multiple `*` globs into a single `.*`. We
also need to consider `?`, since `*?*` is as hard to implement efficiently as
`**`.
* add assertion on regex pattern
* Fix mypy
* Simplify glob_to_regex
* Inline the glob_to_regex helper function
Signed-off-by: Dan Callahan <danc@element.io>
* Moar comments
Signed-off-by: Dan Callahan <danc@element.io>
Co-authored-by: Dan Callahan <danc@element.io>
We were pulling the full auth chain for the room out of the DB each time
we backfilled, which can be *huge* for large rooms and is totally
unnecessary.
The hope here is that by moving all the schema files into synapse/storage/schema, it gets a bit easier for newcomers to navigate.
It certainly got easier for me to write a helpful README. There's more to do on that front, but I'll follow up with other PRs for that.
This is an update based on changes to MSC2946. The origin_server_ts
of the m.room.create event is copied into the creation_ts field for each
room returned from the spaces summary.
Synapse can be quite memory intensive, and unless care is taken to tune
the GC thresholds it can end up thrashing, causing noticable performance
problems for large servers. We fix this by limiting how often we GC a
given generation, regardless of current counts/thresholds.
This does not help with the reverse problem where the thresholds are set
too high, but that should only happen in situations where they've been
manually configured.
Adds a `gc_min_seconds_between` config option to override the defaults.
Fixes#9890.
* Add healthcheck startup delay by 5secs and reduced interval check to 15s
to reduce waiting time for docker aware edge routers bringing an
instance online
This leaves out all optional keys from /sync. This should be fine for all clients tested against conduit already, but it may break some clients, as such we should check, that at least most of them don't break horribly and maybe back out some of the individual changes. (We can probably always leave out groups for example, while the others may cause more issues.)
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Support the delete of a room through DELETE request and mark
previous request as deprecated through documentation.
Signed-off-by: Thibault Ferrante <thibault.ferrante@pm.me>
This fixes a regression where the logging context for runWithConnection
was reported as runWithConnection instead of the connection name,
e.g. "POST-XYZ".
I went through and removed a bunch of cruft that was lying around for compatibility with old Python versions. This PR also will now prevent Synapse from starting unless you're running Python 3.6+.
This ensures that something like an auth error (403) will be
returned to the requester instead of attempting to try more
servers, which will likely result in the same error, and then
passing back a generic 400 error.
First of all, a fixup to `FakeChannel` which is needed to make it work with the default HTTP channel implementation.
Secondly, it looks like we no longer need `_PushHTTPChannel`, because as of #8013, the producer that gets attached to the `HTTPChannel` is now an `IPushProducer`. This is good, because it means we can remove a whole load of test-specific boilerplate which causes variation between tests and production.
Applied a (slightly modified) patch from https://github.com/matrix-org/synapse/issues/9574.
As far as I understand this would allow the cookie set during the OIDC flow to work on deployments using public baseurls that do not sit at the URL path root.
When receiving a /send_join request for a room with join rules set to 'restricted',
check if the user is a member of the spaces defined in the 'allow' key of the join rules.
This only applies to an experimental room version, as defined in MSC3083.
Synapse 1.32.2 (2021-04-22)
===========================
This release includes a fix for a regression introduced in 1.32.0.
Bugfixes
--------
- Fix a regression in Synapse 1.32.0 and 1.32.1 which caused `LoggingContext` errors in plugins. ([\#9857](https://github.com/matrix-org/synapse/issues/9857))
1.32.0 also introduced an incompatibility with Synapse modules that make use of `synapse.logging.context.LoggingContext`, such as [synapse-s3-storage-provider](https://github.com/matrix-org/synapse-s3-storage-provider).
This PR adds a note to the 1.32.0 changelog and upgrade notes about it.
Synapse 1.32.1 (2021-04-21)
===========================
This release fixes [a regression](https://github.com/matrix-org/synapse/issues/9853) in Synapse 1.32.0 that caused connected Prometheus instances to become unstable. If you ran Synapse 1.32.0 with Prometheus metrics, first upgrade to Synapse 1.32.1 and follow [these instructions](https://github.com/matrix-org/synapse/pull/9854#issuecomment-823472183) to clean up any excess writeahead logs.
Bugfixes
--------
- Fix a regression in Synapse 1.32.0 which caused Synapse to report large numbers of Prometheus time series, potentially overwhelming Prometheus instances. ([\#9854](https://github.com/matrix-org/synapse/issues/9854))
As far as I can tell our logging contexts are meant to log the request ID, or sometimes the request ID followed by a suffix (this is generally stored in the name field of LoggingContext). There's also code to log the name@memory location, but I'm not sure this is ever used.
This simplifies the code paths to require every logging context to have a name and use that in logging. For sub-contexts (created via nested_logging_contexts, defer_to_threadpool, Measure) we use the current context's str (which becomes their name or the string "sentinel") and then potentially modify that (e.g. add a suffix).
This attempts to be a direct port of https://github.com/matrix-org/synapse-dinsic/pull/74 to mainline. There was some fiddling required to deal with the changes that have been made to mainline since (mainly dealing with the split of `RegistrationWorkerStore` from `RegistrationStore`, and the changes made to `self.make_request` in test code).
When receiving a /send_join request for a room with join rules set to 'restricted',
check if the user is a member of the spaces defined in the 'allow' key of the join
rules.
This only applies to an experimental room version, as defined in MSC3083.
This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened per every event, into one call for an entire batch (100 max).
Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
This PR adds a Dockerfile and some supporting files to the `docker/` directory. The Dockerfile's intention is to spin up a container with:
* A Synapse main process.
* Any desired worker processes, defined by a `SYNAPSE_WORKERS` environment variable supplied at runtime.
* A redis for worker communication.
* A nginx for routing traffic.
* A supervisord to start all worker processes and monitor them if any go down.
Note that **this is not currently intended to be used in production**. If you'd like to use Synapse workers with Docker, instead make use of the official image, with one worker per container. The purpose of this dockerfile is currently to allow testing Synapse in worker mode with the [Complement](https://github.com/matrix-org/complement/) test suite.
`configure_workers_and_start.py` is where most of the magic happens in this PR. It reads from environment variables (documented in the file) and creates all necessary config files for the processes. It is the entrypoint of the Dockerfile, and thus is run any time the docker container is spun up, recreating all config files in case you want to use a different set of workers. One can specify which workers they'd like to use by setting the `SYNAPSE_WORKERS` environment variable (as a comma-separated list of arbitrary worker names) or by setting it to `*` for all worker processes. We will be using the latter in CI.
Huge thanks to @MatMaul for helping get this all working 🎉 This PR is paired with its equivalent on the Complement side: https://github.com/matrix-org/complement/pull/62.
Note, for the purpose of testing this PR before it's merged: You'll need to (re)build the base Synapse docker image for everything to work (`matrixdotorg/synapse:latest`). Then build the worker-based docker image on top (`matrixdotorg/synapse:workers`).
Context is in https://github.com/matrix-org/synapse/issues/9764#issuecomment-818615894.
I struggled to find a more official link for this. The problem occurs when using WSL1 instead of WSL2, which some Windows platforms (at least Server 2019) still don't have. Docker have updated their documentation to paint a much happier picture now given WSL2's support.
The last sentence here can probably be removed once WSL1 is no longer around... though that will likely not be for a very long time.
This change ensures that the appservice registration behaviour follows the spec. We decided to do this for Dendrite, so it made sense to also make a PR for synapse to correct the behaviour.
Related: #8334
Deprecated in: #9429 - Synapse 1.28.0 (2021-02-25)
`GET /_synapse/admin/v1/users/<user_id>` has no
- unit tests
- documentation
API in v2 is available (#5925 - 12/2019, v1.7.0).
API is misleading. It expects `user_id` and returns a list of all users.
Signed-off-by: Dirk Klimpel dirk@klimpel.org
We pull all destinations requiring catchup from the DB in batches.
However, if all those destinations get filtered out (due to the
federation sender being sharded), then the `last_processed` destination
doesn't get updated, and we keep requesting the same set repeatedly.
They don't make any sense on the intermediate builder image. The final
images needs them to be of use for anyone.
Signed-off-by: Johannes Wienke <languitar@semipol.de>
When joining a room with join rules set to 'restricted', check if the
user is a member of the spaces defined in the 'allow' key of the join rules.
This only applies to an experimental room version, as defined in MSC3083.
This PR modifies `GaugeBucketCollector` to only report data once it has been updated, rather than initially reporting a value of 0. Fixes zero values being reported for some metrics on startup until a background job to update the metric's value runs later.
At the moment, if you'd like to share presence between local or remote users, those users must be sharing a room together. This isn't always the most convenient or useful situation though.
This PR adds a module to Synapse that will allow deployments to set up extra logic on where presence updates should be routed. The module must implement two methods, `get_users_for_states` and `get_interested_users`. These methods are given presence updates or user IDs and must return information that Synapse will use to grant passing presence updates around.
A method is additionally added to `ModuleApi` which allows triggering a set of users to receive the current, online presence information for all users they are considered interested in. This is the equivalent of that user receiving presence information during an initial sync.
The goal of this module is to be fairly generic and useful for a variety of applications, with hard requirements being:
* Sending state for a specific set or all known users to a defined set of local and remote users.
* The ability to trigger an initial sync for specific users, so they receive all current state.
The `remote_media_cache_thumbnails_media_origin_media_id_thumbna_key`
constraint is superceded by
`remote_media_repository_thumbn_media_origin_id_width_height_met` (which adds
`thumbnail_method` to the unique key).
PR #7124 made an attempt to remove the old constraint, but got the name wrong,
so it didn't work. Here we update the bg update and rerun it.
Fixes#8649.
The regex should be terminated so that subdomain matches of another
domain are not accepted. Just ensuring that someone doesn't shoot
themselves in the foot by copying our example.
Signed-off-by: Denis Kasak <dkasak@termina.org.uk>
This PR rewrites the original complement.sh script with a number of improvements:
* We can now use a local checkout of Complement (configurable with `COMPLEMENT_DIR`), though the default behaviour still downloads the master branch.
* You can now specify a regex of test names to run, or just run all tests.
* We now use the Synapse test blacklist tag (so all tests will pass).
`room_invite_state_types` was inconvenient as a configuration setting, because
anyone that ever set it would not receive any new types that were added to the
defaults. Here, we deprecate the old setting, and replace it with a couple of
new settings under `room_prejoin_state`.
This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited.
We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disabled all ratelimits.
Fixes#9663
I've reiterated the advice about using `oidc` to migrate, since I've seen a few
people caught by this.
I've also removed a couple of the examples as they are duplicating the OIDC
documentation, and I think they might be leading people astray.
If you have the wrong version of `cryptography` installed, synapse suggests:
```
To install run:
pip install --upgrade --force 'cryptography>=3.4.7;python_version>='3.6''
```
However, the use of ' inside '...' doesn't work, so when you run this, you get
an error.
Make pip install faster in Docker build for [Complement](https://github.com/matrix-org/complement) testing.
If files have changed in a `COPY` command, Docker will invalidate all of the layers below. So I changed the order of operations to install all dependencies before we `COPY synapse /synapse/synapse/`. This allows Docker to use our cached layer of dependencies even when we change the source of Synapse and speed up builds dramatically! `53.5s` -> `3.7s` builds 🤘
As an alternative, I did try using BuildKit caches but this still took 30 seconds overall on that step. 15 seconds to gather the dependencies from the cache and another 15 seconds to `Installing collected packages`.
Fix https://github.com/matrix-org/synapse/issues/9364
Running `dmypy run` will do a `mypy` check while spinning up a daemon
that makes rerunning `dmypy run` a lot faster.
`dmypy` doesn't support `follow_imports = silent` and has
`local_partial_types` enabled, so this PR enables those options and
fixes the issues that were newly raised. Note that `local_partial_types`
will be enabled by default in upcoming mypy releases.
Make it clearer in the source install step that the platform specific
prerequisites must be installed first.
Signed-off-by: Serban Constantin <serban.constantin@gmail.com>
Split off from https://github.com/matrix-org/synapse/pull/9491
Adds a storage method for getting the current presence of all local users, optionally excluding those that are offline. This will be used by the code in #9491 when a PresenceRouter module informs Synapse that a given user should have `"ALL"` user presence updates routed to them. Specifically, it is used here: b588f16e39/synapse/handlers/presence.py (L1131-L1133)
Note that there is a `get_all_presence_updates` function just above. That function is intended to walk up the table through stream IDs, and is primarily used by the presence replication stream. I could possibly make use of it in the PresenceRouter-related code, but it would be a bit of a bodge.
Builds on the work done in #9643 to add a federation API for space summaries.
There's a bit of refactoring of the existing client-server code first, to avoid too much duplication.
Currently federation catchup will send the last *local* event that we
failed to send to the remote. This can cause issues for large rooms
where lots of servers have sent events while the remote server was down,
as when it comes back up again it'll be flooded with events from various
points in the DAG.
Instead, let's make it so that all the servers send the most recent
events, even if its not theirs. The remote should deduplicate the
events, so there shouldn't be much overhead in doing this.
Alternatively, the servers could only send local events if they were
also extremities and hope that the other server will send the event
over, but that is a bit risky.
This bug was discovered by DINUM. We were modifying `serialized_event["content"]`, which - if you've got `USE_FROZEN_DICTS` turned on or are [using a third party rules module](17cd48fe51/synapse/events/third_party_rules.py (L73-L76)) - will raise a 500 if you try to a edit a reply to a message.
`serialized_event["content"]` could be set to the edit event's content, instead of a copy of it, which is bad as we attempt to modify it. Instead, we also end up modifying the original event's content. DINUM uses a third party rules module, which meant the event's content got frozen and thus an exception was raised.
To be clear, the problem is not that the event's content was frozen. In fact doing so helped us uncover the fact we weren't copying event content correctly.
We had two functions named `get_forward_extremities_for_room` and
`get_forward_extremeties_for_room` that took different paramters. We
rename one of them to avoid confusion.
* Populate `internal_metadata.outlier` based on `events` table
Rather than relying on `outlier` being in the `internal_metadata` column,
populate it based on the `events.outlier` column.
* Move `outlier` out of InternalMetadata._dict
Ultimately, this will allow us to stop writing it to the database. For now, we
have to grandfather it back in so as to maintain compatibility with older
versions of Synapse.
Instead of if the user does not have a password hash. This allows a SSO
user to add a password to their account, but only if the local password
database is configured.
Fixes https://github.com/matrix-org/synapse/issues/9572
When a SSO user logs in for the first time, we create a local Matrix user for them. This goes through the register_user flow, which ends up triggering the spam checker. Spam checker modules don't currently have any way to differentiate between a user trying to sign up initially, versus an SSO user (whom has presumably already been approved elsewhere) trying to log in for the first time.
This PR passes `auth_provider_id` as an argument to the `check_registration_for_spam` function. This argument will contain an ID of an SSO provider (`"saml"`, `"cas"`, etc.) if one was used, else `None`.
Federation catch up mode is very inefficient if the number of events
that the remote server has missed is small, since handling gaps can be
very expensive, c.f. #9492.
Instead of going into catch up mode whenever we see an error, we instead
do so only if we've backed off from trying the remote for more than an
hour (the assumption being that in such a case it is more than a
transient failure).
Background: When we receive incoming federation traffic, and notice that we are missing prev_events from
the incoming traffic, first we do a `/get_missing_events` request, and then if we still have missing prev_events,
we set up new backwards-extremities. To do that, we need to make a `/state_ids` request to ask the remote
server for the state at those prev_events, and then we may need to then ask the remote server for any events
in that state which we don't already have, as well as the auth events for those missing state events, so that we
can auth them.
This PR attempts to optimise the processing of that state request. The `state_ids` API returns a list of the state
events, as well as a list of all the auth events for *all* of those state events. The optimisation comes from the
observation that we are currently loading all of those auth events into memory at the start of the operation, but
we almost certainly aren't going to need *all* of the auth events. Rather, we can check that we have them, and
leave the actual load into memory for later. (Ideally the federation API would tell us which auth events we're
actually going to need, but it doesn't.)
The effect of this is to reduce the number of events that I need to load for an event in Matrix HQ from about
60000 to about 22000, which means it can stay in my in-memory cache, whereas previously the sheer number
of events meant that all 60K events had to be loaded from db for each request, due to the amount of cache
churn. (NB I've already tripled the size of the cache from its default of 10K).
Unfortunately I've ended up basically C&Ping `_get_state_for_room` and `_get_events_from_store_or_dest` into
a new method, because `_get_state_for_room` is also called during backfill, which expects the auth events to be
returned, so the same tricks don't work. That said, I don't really know why that codepath is completely different
(ultimately we're doing the same thing in setting up a new backwards extremity) so I've left a TODO suggesting
that we clean it up.
We either need to pass the auth provider over the replication api, or make sure
we report the auth provider on the worker that received the request. I've gone
with the latter.
Earlier [I was convinced](https://github.com/matrix-org/synapse/issues/9565) that we didn't have an Admin API for listing media uploaded by a user. Foolishly I was looking under the Media Admin API documentation, instead of the User Admin API documentation.
I thought it'd be helpful to link to the latter so others don't hit the same dead end :)
The hashes are from commits due to auto-formatting, e.g. running black.
git can be configured to use this automatically by running the following:
git config blame.ignoreRevsFile .git-blame-ignore-revs
I noticed that I'd occasionally have `scripts-dev/lint.sh` fail when messing about with config options in my PR. The script calls `scripts-dev/config-lint.sh`, which attempts some validation on the sample config.
It does this by using `sed` to edit the sample_config, and then seeing if the file changed using `git diff`.
The problem is: if you changed the sample_config as part of your commit, this script will error regardless.
This PR attempts to change the check so that existing, unstaged changes to the sample_config will not cause the script to report an invalid file.
Unfortunately this doesn't test re-joining the room since
that requires having another homeserver to query over
federation, which isn't easily doable in unit tests.
This great big stack of commits is a a whole load of hoop-jumping to make it easier to store additional values in login tokens, and then to actually store the SSO Identity Provider in the login token. (Making use of that data will follow in a subsequent PR.)
Turns out matrix.org has an event that has duplicate auth events (which really isn't supposed to happen, but here we are). This caused the background update to fail due to `UniqueViolation`.
It landed in schema version 58 after 59 had been created, causing some
servers to not run it. The main effect of was that not all rooms had
their chain cover calculated correctly. After the BG updates complete
the chain covers will get fixed when a new state event in the affected
rooms is received.
In #75, bytecode was disabled (from a bit of FUD back in `python<2.4` days, according to dev chat), I think it's safe enough to enable it again.
Added in `__pycache__/` and `.pyc`/`.pyd` to `.gitignore`, to extra-insure compiled files don't get committed.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
### Changes proposed in this PR
- Add support for the `no_proxy` and `NO_PROXY` environment variables
- Internally rely on urllib's [`proxy_bypass_environment`](bdb941be42/Lib/urllib/request.py (L2519))
- Extract env variables using urllib's `getproxies`/[`getproxies_environment`](bdb941be42/Lib/urllib/request.py (L2488)) which supports lowercase + uppercase, preferring lowercase, except for `HTTP_PROXY` in a CGI environment
This does contain behaviour changes for consumers so making sure these are called out:
- `no_proxy`/`NO_PROXY` is now respected
- lowercase `https_proxy` is now allowed and taken over `HTTPS_PROXY`
Related to #9306 which also uses `ProxyAgent`
Signed-off-by: Timothy Leung tim95@hotmail.co.uk
This fixes#8518 by adding a conditional check on `SyncResult` in a function when `prev_stream_token == current_stream_token`, as a sanity check. In `CachedResponse.set.<remove>()`, the result is immediately popped from the cache if the conditional function returns "false".
This prevents the caching of a timed-out `SyncResult` (that has `next_key` as the stream key that produced that `SyncResult`). The cache is prevented from returning a `SyncResult` that makes the client request the same stream key over and over again, effectively making it stuck in a loop of requesting and getting a response immediately for as long as the cache keeps those values.
Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
* Split ShardedWorkerHandlingConfig
This is so that we have a type level understanding of when it is safe to
call `get_instance(..)` (as opposed to `should_handle(..)`).
* Remove special cases in ShardedWorkerHandlingConfig.
`ShardedWorkerHandlingConfig` tried to handle the various different ways
it was possible to configure federation senders and pushers. This led to
special cases that weren't hit during testing.
To fix this the handling of the different cases is moved from there and
`generic_worker` into the worker config class. This allows us to have
the logic in one place and allows the rest of the code to ignore the
different cases.
The idea here is to stop people forgetting to call `check_consistency`. Folks can still just pass in `None` to the new args in `build_sequence_generator`, but hopefully they won't.
This PR remove the cache for the `get_shared_rooms_for_users` storage method (the db method driving the experimental "what rooms do I share with this user?" feature: [MSC2666](https://github.com/matrix-org/matrix-doc/pull/2666)). Currently subsequent requests to the endpoint will return the same result, even if your shared rooms with that user have changed.
The cache was added in https://github.com/matrix-org/synapse/pull/7785, but we forgot to ensure it was invalidated appropriately.
Upon attempting to invalidate it, I found that the cache had to be entirely invalidated whenever a user (remote or local) joined or left a room. This didn't make for a very useful cache, especially for a function that may or may not be called very often. Thus, I've opted to remove it instead of invalidating it.
This PR attempts to eliminate unnecessary presence sending work when your local server joins a room, or when a remote server joins a room your server is participating in by processing state deltas in chunks rather than individually.
---
When your server joins a room for the first time, it requests the historical state as well. This chunk of new state is passed to the presence handler which, after filtering that state down to only membership joins, will send presence updates to homeservers for each join processed.
It turns out that we were being a bit naive and processing each event individually, and sending out presence updates for every one of those joins. Even if many different joins were users on the same server (hello IRC bridges), we'd send presence to that same homeserver for every remote user join we saw.
This PR attempts to deduplicate all of that by processing the entire batch of state deltas at once, instead of only doing each join individually. We process the joins and note down which servers need which presence:
* If it was a local user join, send that user's latest presence to all servers in the room
* If it was a remote user join, send the presence for all local users in the room to that homeserver
We deduplicate by inserting all of those pending updates into a dictionary of the form:
```
{
server_name1: {presence_update1, ...},
server_name2: {presence_update1, presence_update2, ...}
}
```
Only after building this dict do we then start sending out presence updates.
This PR adds a homeserver config option, `user_directory.prefer_local_users`, that when enabled will show local users higher in user directory search results than remote users. This option is off by default.
Note that turning this on doesn't necessarily mean that remote users will always be put below local users, but they should be assuming all other ranking factors (search query match, profile information present etc) are identical.
This is useful for, say, University networks that are openly federating, but want to prioritise local students and staff in the user directory over other random users.
Add off-by-default configuration settings to:
- disable putting an invitee's profile info in invite events
- disable profile lookup via federation
Signed-off-by: Andrew Ferrazzutti <fair@miscworks.net>
This reduces the memory usage of previewing media files which
end up larger than the `max_spider_size` by avoiding buffering
content internally in treq.
It also checks the `Content-Length` header in additional places
instead of streaming the content to check the body length.
This is a small bug that I noticed while working on #8956.
We have a for-loop which attempts to strip all presence changes for each user except for the final one, as we don't really care about older presence:
9e19c6aab4/synapse/handlers/presence.py (L368-L371)
`new_states_dict` stores this stripped copy of latest presence state for each user, before it is... put into a new variable `new_state`, which is just overridden by the subsequent for loop.
I believe this was instead meant to override `new_states`. Without doing so, it effectively meant:
1. The for loop had no effect.
2. We were still processing old presence state for users.
- Update black version to the latest
- Run black auto formatting over the codebase
- Run autoformatting according to [`docs/code_style.md
`](80d6dc9783/docs/code_style.md)
- Update `code_style.md` docs around installing black to use the correct version
Synapse 1.27.0rc2 (2021-02-11)
==============================
Features
--------
- Further improvements to the user experience of registration via single sign-on. ([\#9297](https://github.com/matrix-org/synapse/issues/9297))
Bugfixes
--------
- Fix ratelimiting introduced in v1.27.0rc1 for invites to respect the `ratelimit` flag on application services. ([\#9302](https://github.com/matrix-org/synapse/issues/9302))
- Do not automatically calculate `public_baseurl` since it can be wrong in some situations. Reverts behaviour introduced in v1.26.0. ([\#9313](https://github.com/matrix-org/synapse/issues/9313))
Improved Documentation
----------------------
- Clarify the sample configuration for changes made to the template loading code. ([\#9310](https://github.com/matrix-org/synapse/issues/9310))
Remove conflicting sqlite tables that throw sqlite3.OperationalError: object name reserved for internal use: event_search_content when running the twisted unit tests.
Fix#8996
Fixes some exceptions if the room state isn't quite as expected.
If the expected state events aren't found, try to find them in the
historical room state. If they still aren't found, fallback to a reasonable,
although ugly, value.
This could arguably replace the existing admin API for `/members`, however that is out of scope of this change.
This sort of endpoint is ideal for moderation use cases as well as other applications, such as needing to retrieve various bits of information about a room to perform a task (like syncing power levels between two places). This endpoint exposes nothing more than an admin would be able to access with a `select *` query on their database.
* Fixes a case where no summary text was returned.
* The use of messages_from_person vs. messages_from_person_and_others
was tweaked to depend on whether there was 1 sender or multiple senders,
not based on if there was 1 room or multiple rooms.
Context, Fixes: https://github.com/matrix-org/synapse/issues/9263
In the past to fix an issue with old Riots re-requesting threepid validation tokens, we raised a `LoginError` during UIA instead of `InteractiveAuthIncompleteError`. This is now breaking the way Tchap logs in - which isn't standard, but also isn't disallowed by the spec.
An easy fix is just to remove the 4 year old workaround.
There's some prelimiary work here to pull out the construction of a jinja environment to a separate function.
I wanted to load the template at display time rather than load time, so that it's easy to update on the fly. Honestly, I think we should do this with all our templates: the risk of ending up with malformed templates is far outweighed by the improved turnaround time for an admin trying to update them.
Fixes#8966.
* Factor out build_synapse_client_resource_tree
Start a function which will mount resources common to all workers.
* Move sso init into build_synapse_client_resource_tree
... so that we don't have to do it for each worker
* Fix SSO-login-via-a-worker
Expose the SSO login endpoints on workers, like the documentation says.
* Update workers config for new endpoints
Add documentation for endpoints recently added (#8942, #9017, #9262)
* remove submit_token from workers endpoints list
this *doesn't* work on workers (yet).
* changelog
* Add a comment about the odd path for SAML2Resource
There are going to be a couple of paths to get to the final step of SSO reg, and I want the URL in the browser to consistent. So, let's move the final step onto a separate path, which we redirect to.
* synapse.app.base: only call gc.freeze() on CPython
gc.freeze() is an implementation detail of CPython garbage collector,
and notably does not exist on PyPy.
Rather than playing whack-a-mole and skipping the call when under PyPy,
simply restrict it to CPython because the whole gc module is
implementation-defined.
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Adds note about updating dh-virtualenv once we drop support for Xenial.
We can't update now, because it needs debhelper 12, while Xenial only
backports 10.
Signed-off-by: Dan Callahan <danc@element.io>
We've decided to add a 'brand' field to help clients decide how to style the
buttons.
Also, fix up the allowed characters for idp_id, while I'm in the area.
If a Synapse module's config block were empty in YAML, thus being translated to a `Nonetype` in Python, then some modules could fail as that None ends up getting passed to their `parse_config` method. Modules are expected to accept a `dict` instead.
This PR ensures that if the user does end up specifying an empty config block (such as what [the default oidc config in the sample config](5310808d3b/docs/sample_config.yaml (L1816-L1845)) states) then `None` is not passed to the module. An empty dict is passed instead.
This code assumes that no existing modules are relying on receiving a `None` config block, but I'd really hope that they aren't.
Treat unknown encodings (according to lxml) as UTF-8
when generating a preview for HTML documents. This
isn't fully accurate, but will hopefully give a reasonable
title and summary.
This new version no longer has the problem of adding/removing a blank line in `.pyi` files, which black disagrees with. This would cause `isort` to slightly modify `.pyi` files, before `black` would subsequently modify back directly afterwards.
Relevant `isort` issue: https://github.com/pycqa/isort/issues/1284
This is done by creating a custom `RedisFactory` subclass that
periodically pings all connections in its pool.
We also ensure that the `replyTimeout` param is non-null, so that we
timeout waiting for the reply to those pings (and thus triggering a
reconnect).
This expands the current shadow-banning feature to be usable via
the admin API and adds documentation for it.
A shadow-banned users receives successful responses to their
client-server API requests, but the events are not propagated into rooms.
Shadow-banning a user should be used as a tool of last resort and may lead
to confusing or broken behaviour for the client.
If no thumbnail of the requested type exists, return a 404 instead
of erroring. This doesn't quite match the spec (which does not define
what happens if no thumbnail can be found), but is consistent with
what Synapse already does.
The lists of source directories to lint between `tox.ini` and `lint.sh` became out of sync. This PR tightens them up and adds some comments reminding any future readers to keep the list in sync.
We have seen a failure mode here where if there are many in flight
unfinished IDs then marking an ID as finished takes a lot of CPU (as
calling deque.remove iterates over the list)
Introduced in #9104
This wasn't picked up by the tests as this is all fine the first time you run Synapse (after upgrading), but then when you restart the wrong value is pulled from `stream_positions`.
* Factor out a common TestHtmlParser
Looks like I'm doing this in a few different places.
* Improve OIDC login test
Complete the OIDC login flow, rather than giving up halfway through.
* Ensure that OIDC login works with multiple OIDC providers
* Fix bugs in handling clientRedirectUrl
- don't drop duplicate query-params, or params with no value
- allow utf-8 in query-params
0dd2649c1 (#9112) changed the signature of `auth_via_oidc`. Meanwhile,
26d10331e (#9091) introduced a new test which relied on the old signature of
`auth_via_oidc`. The two branches were never tested together until they landed
in develop.
We do this by allowing a single iteration to process multiple rooms at a
time, as there are often a lot of really tiny rooms, which can massively
slow things down.
This is the final step for supporting multiple OIDC providers concurrently.
First of all, we reorganise the config so that you can specify a list of OIDC providers, instead of a single one. Before:
oidc_config:
enabled: true
issuer: "https://oidc_provider"
# etc
After:
oidc_providers:
- idp_id: prov1
issuer: "https://oidc_provider"
- idp_id: prov2
issuer: "https://another_oidc_provider"
The old format is still grandfathered in.
With that done, it's then simply a matter of having OidcHandler instantiate a new OidcProvider for each configured provider.
Protecting media stops it from being quarantined when
e.g. all media in a room is quarantined. This is useful
for sticker packs and other media that is uploaded by
server administrators, but used by many people.
`distutils` is pretty much deprecated these days, and replaced with
`setuptools`. It's also annoying because it's you can't `pip install` it, and
it's hard to figure out which debian package we should depend on to make sure
it's there.
Since we only use it for a tiny function anyway, let's just vendor said
function into our codebase.
* make the OIDC bits of the test work at a higher level - via the REST api instead of poking the OIDCHandler directly.
* Move it to test_login.py, where I think it fits better.
Again in preparation for handling more than one OIDC provider, add a new caveat to the macaroon used as an OIDC session cookie, which remembers which OIDC provider we are talking to. In future, when we get a callback, we'll need it to make sure we talk to the right IdP.
As part of this, I'm adding an idp_id and idp_name field to the OIDC configuration object. They aren't yet documented, and we'll just use the old values by default.
The idea here is that we will have an instance of OidcProvider for each
configured IdP, with OidcHandler just doing the marshalling of them.
For now it's still hardcoded with a single provider.
A reactor was being passed instead of a whitelist for the BlacklistingAgentWrapper
used by the WellyKnownResolver. This coulld cause exceptions when attempting to
connect to IP addresses that are blacklisted, but in reality this did not have any
observable affect since this code is not used for IP literals.
If a user tries to do UI Auth via SSO, but uses the wrong account on the SSO
IdP, try to give them a better error.
Previously, the UIA would claim to be successful, but then the operation in
question would simply fail with "auth fail". Instead, serve up an error page
which explains the failure.
This checks that the domain given to `DomainSpecificString.is_valid` (e.g.
`UserID`, `RoomAlias`, etc.) is of a valid form. Previously some validation
was done on the localpart (e.g. the sigil), but not the domain portion.
Some light refactoring of OidcHandler, in preparation for bigger things:
* remove inheritance from deprecated BaseHandler
* add an object to hold the things that go into a session cookie
* factor out a separate class for manipulating said cookies
If we have integrations with multiple identity providers, when the user does a UI Auth, we need to redirect them to the right one.
There are a few steps to this. First of all we actually need to store the userid of the user we are trying to validate in the UIA session, since the /auth/sso/fallback/web request is unauthenticated.
Then, once we get the /auth/sso/fallback/web request, we can fish the user id out of the session, and use it to look up the external id mappings, and hence pick an SSO provider for them.
Debian package builds were failing for two reasons:
1. Python versions prior to 3.7 throw exceptions when attempting to print
Unicode characters under a "C" locale. (#9076)
2. We depended on `dh-systemd` which no longer exists in Debian Bullseye, but
is necessary in Ubuntu Xenial. (#9073)
Setting `LANG="C.UTF-8"` in the build environment fixes the first issue.
See also: https://bugs.python.org/issue19846
The second issue is a bit trickier. The dh-systemd package was merged into
debhelper version 9.20160709 and a transitional package left in its wake.
The transitional dh-systemd package was removed in Debian Bullseye.
However, Ubuntu Xenial ships an older debhelper, and still needs dh-systemd.
Thus, builds were failing on Bullseye since we depended on a package which had
ceased existing, but we couldn't remove it from the debian/control file and our
build scripts because we still needed it for Ubuntu Xenial.
We can fix the debian/control issue by listing dh-systemd as an alternative to
the newer versions of debhelper. Since dh-systemd declares that it depends on
debhelper, Ubuntu Xenial will select its older dh-systemd which will in turn
pull in its older debhelper, resulting in no change from the status quo. All
other supported releases will satisfy the debhelper dependency constraint and
skip the dh-systemd alternative.
Build scripts were fixed by unconditionally attempting to install dh-systemd on
all releases and suppressing failures.
Once we drop support for Ubuntu Xenial, we can revert most of this commit and
rely on the version constraint on debhelper in debian/control.
Fixes#9076Fixes#9073
Signed-off-by: Dan Callahan <danc@element.io>
SynapseRequest is in danger of becoming a bit of a dumping-ground for "useful stuff relating to Requests",
which isn't really its intention (its purpose is to override render, finished and connectionLost to set up the
LoggingContext and write the right entries to the request log).
Putting utility functions inside SynapseRequest means that lots of our code ends up requiring a
SynapseRequest when there is nothing synapse-specific about the Request at all, and any old
twisted.web.iweb.IRequest will do. This increases code coupling and makes testing more difficult.
In short: move get_user_agent out to a utility function.
You can't continue using a transaction once an exception has been
raised, so catching and dropping the error here is pointless and just
causes more errors.
I'm not even sure what this was supposed to do, but the fact it has python2isms
and nobody has noticed suggests it's not terribly important.
It doesn't seem to have been used since ff23e5ba37.
GET /_synapse/admin/v1/rooms/<identifier>/forward_extremities now gets forward extremities for a room, returning count and the list of extremities.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
* Implement CasHandler.handle_redirect_request
... to make it match OidcHandler and SamlHandler
* Clean up interface for OidcHandler.handle_redirect_request
Make it accept `client_redirect_url=None`.
* Clean up interface for `SamlHandler.handle_redirect_request`
... bring it into line with CAS and OIDC by making it take a Request parameter,
move the magic for `client_redirect_url` for UIA into the handler, and fix the
return type to be a `str` rather than a `bytes`.
* Define a common protocol for SSO auth provider impls
* Give SsoIdentityProvider an ID and register them
* Combine the SSO Redirect servlets
Now that the SsoHandler knows about the identity providers, we can combine the
various *RedirectServlets into a single implementation which delegates to the
right IdP.
* changelog
The `RoomDirectoryFederationTests` tests were not being run unless explicitly called as an `__init__.py` file was not present in `tests/federation/transport/`. Thus the folder was not a python module, and `trial` did not look inside for any test cases to run. This was found while working on #6739.
This PR adds a `__init__.py` and also fixes the test in a couple ways:
- Switch to subclassing `unittest.FederatingHomeserverTestCase` instead, which sets up federation endpoints for us.
- Supply a `federation_auth_origin` to `make_request` in order to more act like the request is coming from another server, instead of just an unauthenicated client requesting a federation endpoint.
I found that the second point makes no difference to the test passing, but felt like the right thing to do if we're testing over federation.
This adds an admin API that allows a server admin to get power in a room if a local user has power in a room. Will also invite the user if they're not in the room and its a private room. Can specify another user (rather than the admin user) to be granted power.
Co-authored-by: Matthew Hodgson <matthew@matrix.org>
This had two effects 1) it'd give the wrong answer and b) would iterate
*all* power levels in the auth chain of each event. The latter of which
can be *very* expensive for certain types of IRC bridge rooms that have
large numbers of power level changes.
The final part (for now) of my work to implement a username picker in synapse itself. The idea is that we allow
`UsernameMappingProvider`s to return `localpart=None`, in which case, rather than redirecting the browser
back to the client, we redirect to a username-picker resource, which allows the user to enter a username.
We *then* complete the SSO flow (including doing the client permission checks).
The static resources for the username picker itself (in
https://github.com/matrix-org/synapse/tree/rav/username_picker/synapse/res/username_picker)
are essentially lifted wholesale from
https://github.com/matrix-org/matrix-synapse-saml-mozilla/tree/master/matrix_synapse_saml_mozilla/res.
As the comment says, we might want to think about making them customisable, but that can be a follow-up.
Fixes#8876.
Fixes a bug that deactivated users appear in the directory when their profile information was updated.
To change profile information of deactivated users is neccesary for example you will remove displayname or avatar.
But they should not appear in directory. They are deactivated.
Co-authored-by: Erik Johnston <erikj@jki.re>
This is another part of my work towards fixing #8876. It moves some of the logic currently in the SAML and OIDC handlers - in particular the call to `AuthHandler.complete_sso_login` down into the `SsoHandler`.
The two are equivalent, but really we want to check the HTTP result that got
returned to the channel, not the code that the Request object *intended* to
return to the channel.
* move simple_async_mock to test_utils
... so that it can be re-used
* Remove references to `SamlHandler._map_saml_response_to_user` from tests
This method is going away, so we can no longer use it as a test point. Instead,
factor out a higher-level method which takes a SAML object, and verify correct
behaviour by mocking out `AuthHandler.complete_sso_login`.
* changelog
* Remove references to handler._auth_handler
(and replace them with hs.get_auth_handler)
* Factor out a utility function for building Requests
* Remove mocks of `OidcHandler._map_userinfo_to_user`
This method is going away, so mocking it out is no longer a valid approach.
Instead, we mock out lower-level methods (eg _remote_id_from_userinfo), or
simply allow the regular implementation to proceed and update the expectations
accordingly.
* Remove references to `OidcHandler._map_userinfo_to_user` from tests
This method is going away, so we can no longer use it as a test point. Instead
we build mock "callback" requests which we pass into `handle_oidc_callback`,
and verify correct behaviour by mocking out `AuthHandler.complete_sso_login`.
* Factor out _call_attribute_mapper and _register_mapped_user
This is mostly an attempt to simplify `get_mxid_from_sso`.
* Move mapping_lock down into SsoHandler.
It seems that letting CircleCI use its default docker version (17.09.0-ce,
apparently) did not interact well with multiarch builds: in particular, we saw
weird effects where running an amd64 build at the same time as an arm64 build
caused the arm64 builds to fail with:
Error while loading /usr/sbin/dpkg-deb: No such file or directory
Synapse 1.23.1 (2020-12-09)
===========================
Due to the two security issues highlighted below, server administrators are
encouraged to update Synapse. We are not aware of these vulnerabilities being
exploited in the wild.
Security advisory
-----------------
The following issues are fixed in v1.23.1 and v1.24.0.
- There is a denial of service attack
([CVE-2020-26257](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-26257))
against the federation APIs in which future events will not be correctly sent
to other servers over federation. This affects all servers that participate in
open federation. (Fixed in [#8776](https://github.com/matrix-org/synapse/pull/8776)).
- Synapse may be affected by OpenSSL
[CVE-2020-1971](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-1971).
Synapse administrators should ensure that they have the latest versions of
the cryptography Python package installed.
To upgrade Synapse along with the cryptography package:
* Administrators using the [`matrix.org` Docker
image](https://hub.docker.com/r/matrixdotorg/synapse/) or the [Debian/Ubuntu
packages from
`matrix.org`](https://github.com/matrix-org/synapse/blob/master/INSTALL.md#matrixorg-packages)
should ensure that they have version 1.24.0 or 1.23.1 installed: these images include
the updated packages.
* Administrators who have [installed Synapse from
source](https://github.com/matrix-org/synapse/blob/master/INSTALL.md#installing-from-source)
should upgrade the cryptography package within their virtualenv by running:
```sh
<path_to_virtualenv>/bin/pip install 'cryptography>=3.3'
```
* Administrators who have installed Synapse from distribution packages should
consult the information from their distributions.
Bugfixes
--------
- Fix a bug in some federation APIs which could lead to unexpected behaviour if different parameters were set in the URI and the request body. ([\#8776](https://github.com/matrix-org/synapse/issues/8776))
Internal Changes
----------------
- Add a maximum version for pysaml2 on Python 3.5. ([\#8898](https://github.com/matrix-org/synapse/issues/8898))
* Consistently use room_id from federation request body
Some federation APIs have a redundant `room_id` path param (see
https://github.com/matrix-org/matrix-doc/issues/2330). We should make sure we
consistently use either the path param or the body param, and the body param is
easier.
* Kill off some references to "context"
Once upon a time, "rooms" were known as "contexts". I think this kills of the
last references to "contexts".
The idea is that the parse_config method of extension modules can raise either a ConfigError or a JsonValidationError,
and it will be magically turned into a legible error message. There's a few components to it:
* Separating the "path" and the "message" parts of a ConfigError, so that we can fiddle with the path bit to turn it
into an absolute path.
* Generally improving the way ConfigErrors get printed.
* Passing in the config path to load_module so that it can wrap any exceptions that get caught appropriately.
* SsoHandler: remove inheritance from BaseHandler
* Simplify the flow for SSO UIA
We don't need to do all the magic for mapping users when we are doing UIA, so
let's factor that out.
Synapse 1.24.0rc2 (2020-12-04)
==============================
Bugfixes
--------
- Fix a regression in v1.24.0rc1 which failed to allow SAML mapping providers which were unable to redirect users to an additional page. ([\#8878](https://github.com/matrix-org/synapse/issues/8878))
Internal Changes
----------------
- Add support for the `prometheus_client` newer than 0.9.0. Contributed by Jordan Bancino. ([\#8875](https://github.com/matrix-org/synapse/issues/8875))
This removes the version pin of the `prometheus_client` dependency, in direct response to #8831. If merged, this will close#8831
As far as I can tell, no other changes are needed, but as I'm no synapse expert, I'm relying heavily on CI and maintainer reviews for this. My very primitive test of synapse with prometheus_client v0.9.0 on my home server didn't bring up any issues, so we'll see what happens.
Signed-off-by: Jordan Bancino
Replaces the `federation_ip_range_blacklist` configuration setting with an
`ip_range_blacklist` setting with wider scope. It now applies to:
* Federation
* Identity servers
* Push notifications
* Checking key validitity for third-party invite events
The old `federation_ip_range_blacklist` setting is still honored if present, but
with reduced scope (it only applies to federation and identity servers).
We do state res with unpersisted events when calculating the new current state of the room, so that should be the only thing impacted. I don't think this is tooooo big of a deal as:
1. the next time a state event happens in the room the current state should correct itself;
2. in the common case all the unpersisted events' auth events will be pulled in by other state, so will still return the correct result (or one which is sufficiently close to not affect the result); and
3. we mostly use the state at an event to do important operations, which isn't affected by this.
Rather than using a single JsonResource, construct a resource tree, as we do in
the prod code, and allow testcases to add extra resources by overriding
`create_resource_dict`.
The official dashboard uses data from these rules, but they were never added to the synapse-v2.rules. They are mentioned in this issue: https://github.com/matrix-org/synapse/issues/7917#issuecomment-661330409, but never got added to the rules.
Adding them results in all graphs in the "Event persist rate" section to function as intended.
Signed-off-by: Johanna Dorothea Reichmann <transcaffeine@finallycoffee.eu>
This was broken in #8801 when abstracting code shared with OIDC.
After this change both SAML and OIDC have a concept of
grandfathering users, but with different implementations.
This PR adds a `room_version` argument to the `RestHelper`'s `create_room_as` function for tests. I plan to use this for testing knocking, which currently uses an unstable room version.
The spec requires synapse to support `identifier` dicts for `m.login.password`
user-interactive auth, which it did not (instead, it required an undocumented
`user` parameter.)
To fix this properly, we need to pull the code that interprets `identifier`
into `AuthHandler.validate_login` so that it can be called from the UIA code.
Fixes#5665.
It's important that we make sure our background updates happen in a defined
order, to avoid disasters like #6923.
Add an ordering to all of the background updates that have landed since #7190.
This applies even if the feature is disabled at the server level with `allow_per_room_profiles`.
The server notice not being a real user it doesn't have an user profile.
This PR adds a new config option to the `push` section of the homeserver config, `group_unread_count_by_room`. By default Synapse will group push notifications by room (so if you have 1000 unread messages, if they lie in 55 rooms, you'll see an unread count on your phone of 55).
However, it is also useful to be able to send out the true count of unread messages if desired. If `group_unread_count_by_room` is set to `false`, then with the above example, one would see an unread count of 1000 (email anyone?).
This PR grew out of #6739, and adds typing to some method arguments
You'll notice that there are a lot of `# type: ignores` in here. This is due to the base methods not matching the overloads here. This is necessary to stop mypy complaining, but a better solution is #8828.
We can get a SIGHUP at any point, including times where we are not in a
sane state. By deferring calling the handlers until the next reactor
tick we ensure that we don't get unexpected conflicts, e.g. trying to
flush logs from the signal handler while the code was in the process of
writing a log entry.
Fixes#8769.
When server URL provided to register_new_matrix_user includes path
component (e.g. "http://localhost:8008/"), the command fails with
"ERROR! Received 400 Bad Request". Stripping trailing slash from the
server_url command argument makes sure combined endpoint URL remains
valid.
Signed-off-by: Dmitry Borodaenko angdraug@debian.org
This is another PR that grew out of #6739.
The existing code for checking whether a user is currently invited to a room when they want to leave the room looks like the following:
f737368a26/synapse/handlers/room_member.py (L518-L540)
It calls `get_invite_for_local_user_in_room`, which will actually query *all* rooms the user has been invited to, before iterating over them and matching via the room ID. It will then return a tuple of a lot of information which we pull the event ID out of.
I need to do a similar check for knocking, but this code wasn't very efficient. I then tried to write a different implementation using `StateHandler.get_current_state` but this actually didn't work as we haven't *joined* the room yet - we've only been invited to it. That means that only certain tables in Synapse have our desired `invite` membership state. One of those tables is `local_current_membership`.
So I wrote a store method that just queries that table instead
Synctl did not check if a worker thread was already running when using
`synctl start` and would naively start a fresh copy. This would
sometimes lead to cases where many duplicate copies of a single worker
would run.
This fix adds a pid check when starting worker threads and synctl will
now refuse to start individual workers if they're already running.
When using `add_header` nginx will literally add a header. If a
`content-type` header is already configured (for example through a
server wide default), this means we end up with 2 content-type headers,
like so:
```
content-type: text/html
content-type: application/json
access-control-allow-origin: *
```
That doesn't make sense. Instead, we want the content type of that
block to only be `application/json` which we can achieve using
`default_type` instead.
Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
Checks that the localpart returned by mapping providers for SAML and
OIDC are valid before registering new users.
Extends the OIDC tests for existing users and invalid data.
* Consistently use room_id from federation request body
Some federation APIs have a redundant `room_id` path param (see
https://github.com/matrix-org/matrix-doc/issues/2330). We should make sure we
consistently use either the path param or the body param, and the body param is
easier.
* Kill off some references to "context"
Once upon a time, "rooms" were known as "contexts". I think this kills of the
last references to "contexts".
* Make this line debug (it's noisy)
* Don't include from_key for presence if we are at 0
* Limit read receipts for all rooms to 100
* changelog.d/8744.bugfix
* Allow from_key to be None
* Update 8744.bugfix
* The from_key is superflous
* Update comment
remove the stubbing out of `request.process`, so that `requestReceived` also renders the request via the appropriate resource.
Replace render() with a stub for now.
`_locally_reject_invite` generates an out-of-band membership event which can be passed to clients, but not other homeservers.
This is used when we fail to reject an invite over federation. If this happens, we instead just generate a leave event locally and send it down /sync, allowing clients to reject invites even if we can't reach the remote homeserver.
A similar flow needs to be put in place for rescinding knocks. If we're unable to contact any remote server from the room we've tried to knock on, we'd still like to generate and store the leave event locally. Hence the need to reuse, and thus generalise, this method.
Separated from #6739.
The root resource isn't necessarily a JsonResource, so rename this method
accordingly, and update a couple of test classes to use the method rather than
directly manipulating self.resource.
Where we want to render a request against a specific Resource, call the global
make_request() function rather than the one in HomeserverTestCase, allowing us
to pass in an appropriate `Site`.
There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room and its version for which we aren't joined to yet, but we can reference when ingesting events about.
This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet.
There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit.
Separated out from #6739.
The main use case is to see how many requests are being made, and how
many are second/third/etc attempts. If there are large number of retries
then that likely indicates a delivery problem.
If the script fails (or is CTRL-C'ed) between porting some of the events table and copying of the sequences then the port script will immediately die if run again due to the postgres DB having inconsistencies between sequences and tables.
The fix is to move the porting of sequences to before porting the tables, so that there is never a period where the Postgres DB is inconsistent. To do that we need to change how we port the sequences so that it calculates the values from the SQLite DB rather than the Postgres DB.
Fixes#8619
This should hopefully speed up `get_auth_chain_difference` a bit in the case of repeated state res on the same rooms.
`get_auth_chain_difference` does a breadth first walk of the auth graphs by repeatedly looking up events' auth events. Different state resolutions on the same room will end up doing a lot of the same event to auth events lookups, so by caching them we should speed things up in cases of repeated state resolutions on the same room.
`adbapi.ConnectionPool` let's you turn on auto reconnect of DB connections. This is off by default.
As far as I can tell if its not enabled dead connections never get removed from the pool.
Maybe helps #8574
* Check support room has only two users
* Create 8728.bugfix
* Update synapse/server_notices/server_notices_manager.py
Co-authored-by: Erik Johnston <erik@matrix.org>
Co-authored-by: Erik Johnston <erik@matrix.org>
If SSO login is used (e.g. SAML) in a multi worker setup, it should be mentioned that currently all SAML logins must run on the same worker, see https://github.com/matrix-org/synapse/issues/7530
Also, if you are using different ports (for example 443 and 8448) in a reverse proxy for client and federation, the path `/_matrix/media` on the client and federation port must point to the listener of the `media_repository` worker, otherwise you'll get a 404 on the federation port for the path `/_matrix/media`, if a remote server is trying to get the media object on federation port, see https://github.com/matrix-org/synapse/issues/8695
This PR adds some documentation that:
* Describes who the audience for the `docs/`, `docs/dev/` and `docs/admin/` directories are, as well as Synapse's wiki page.
* Stresses that we'd like all documentation to be down in markdown.
I idly noticed that these lists were out of sync with each other, causing us to miss a table in a test case (`local_invites`). Let's consolidate this list instead to prevent this from happening in the future.
This PR fixes two things:
* Corrects the copy/paste error of telling the client their displayname is wrong when they are submitting an `avatar_url`.
* Returns a `M_INVALID_PARAM` instead of `M_UNKNOWN` for non-str type parameters.
Reported by @t3chguy.
* Tie together matches_user_in_member_list and get_users_in_room
* changelog
* Remove type to fix mypy
* Add `on_invalidate` to the function signature in the hopes that may make things work well
* Remove **kwargs
* Update 8676.bugfix
* Tie together matches_user_in_member_list and get_users_in_room
* changelog
* Remove type to fix mypy
* Add `on_invalidate` to the function signature in the hopes that may make things work well
* Remove **kwargs
* Update 8676.bugfix
We do it this way round so that only the "owner" can delete the access token (i.e. `/logout/all` by the "owner" also deletes that token, but `/logout/all` by the "target user" doesn't).
A future PR will add an API for creating such a token.
When the target user and authenticated entity are different the `Processed request` log line will be logged with a: `{@admin:server as @bob:server} ...`. I'm not convinced by that format (especially since it adds spaces in there, making it harder to use `cut -d ' '` to chop off the start of log lines). Suggestions welcome.
Cached functions accept an `on_invalidate` function, which we failed to add to the type signature. It's rarely used in the files that we have typed, which is why we haven't noticed it before.
otherwise non-state events get written as `<FrozenEvent ... state_key='None'>`
which is indistinguishable from state events with the actual state_key `None`.
This modifies the configuration of structured logging to be usable from
the standard Python logging configuration.
This also separates the formatting of logs from the transport allowing
JSON logs to files or standard logs to sockets.
Not being able to serialise `frozendicts` is fragile, and it's annoying to have
to think about which serialiser you want. There's no real downside to
supporting frozendicts, so let's just have one json encoder.
I was trying to make it so that we didn't have to start a background task when handling RDATA, but that is a bigger job (due to all the code in `generic_worker`). However I still think not pulling the event from the DB may help reduce some DB usage due to replication, even if most workers will simply go and pull that event from the DB later anyway.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This allows trailing commas in multi-line arg lists.
Minor, but we might as well keep our formatting current with regard to
our minimum supported Python version.
Signed-off-by: Dan Callahan <danc@element.io>
The test runner isn't present in the `[all]` set of extras, so the
previous instructions did not work without also installing `[test]`.
Note that this does not include the `[lint]` extras, since those do not
install on all supported Python versions (specifically, isort 5.x
requires Python 3.6, while we still support 3.5). Instructions for that
are included in our pull request template, so we should be fine there.
I've also dropped the `--no-use-pep517` arg to `pip install` since it
seems to have been added to address a temporary regression in pip 19.1
which was fixed in pip 19.1.1 the following month.
Lastly, updated the example output of the test suite to set more
realistic expectations around run time.
Signed-off-by: Dan Callahan <danc@element.io>
This is a requirement for [knocking](https://github.com/matrix-org/synapse/pull/6739), and is abstracting some code that was originally used by the invite flow. I'm separating it out into this PR as it's a fairly contained change.
For a bit of context: when you invite a user to a room, you send them [stripped state events](https://matrix.org/docs/spec/server_server/unstable#put-matrix-federation-v2-invite-roomid-eventid) as part of `invite_room_state`. This is so that their client can display useful information such as the room name and avatar. The same requirement applies to knocking, as it would be nice for clients to be able to display a list of rooms you've knocked on - room name and avatar included.
The reason we're sending membership events down as well is in the case that you are invited to a room that does not have an avatar or name set. In that case, the client should use the displayname/avatar of the inviter. That information is located in the inviter's membership event.
This is optional as knocks don't really have any user in the room to link up to. When you knock on a room, your knock is sent by you and inserted into the room. It wouldn't *really* make sense to show the avatar of a random user - plus it'd be a data leak. So I've opted not to send membership events to the client here. The UX on the client for when you knock on a room without a name/avatar is a separate problem.
In essence this is just moving some inline code to a reusable store method.
it seems to be possible that only one of them ends up to be cached.
when this was the case, the missing one was not fetched via federation,
and clients then failed to validate cross-signed devices.
Signed-off-by: Jonas Jelten <jj@sft.lol>
Split admin API for reported events in detail und list view.
API was introduced with #8217 in synapse v.1.21.0.
It makes the list (`GET /_synapse/admin/v1/event_reports`) less complex and provides a better overview.
The details can be queried with: `GET /_synapse/admin/v1/event_reports/<report_id>`.
It is similar to room and users API.
It is a kind of regression in `GET /_synapse/admin/v1/event_reports`. `event_json` was removed. But the api was introduced one version before and it is an admin API (not under spec).
Signed-off-by: Dirk Klimpel dirk@klimpel.org
* Fix user_daily_visits to not have duplicate rows for UA.
Fixes#8641.
* Newsfile
* Fix typo.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
#8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously.
This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary.
I don't really know how much this will help.
I noticed in https://github.com/matrix-org/synapse/issues/8575 that the `end_error` variable in `synapse_port_db` is set to an `Exception`, even though later we expect it to be a `str`.
This PR simply casts an exception raised to a string. I'm doing this instead of having `end_error` be of type exception as we explicitly set `end_error` to a str here:
d25eb8f370/scripts/synapse_port_db (L542-L547)
This whole file could probably use some heavy refactoring, but until then at least this fix will prevent exception contents from being hidden from us and users.
* Add `DeferredCache.get_immediate` method
A bunch of things that are currently calling `DeferredCache.get` are only
really interested in the result if it's completed. We can optimise and simplify
this case.
* Remove unused 'default' parameter to DeferredCache.get()
* another get_immediate instance
This implements a more standard API for instantiating a homeserver and
moves some of the dependency injection into the test suite.
More concretely this stops using `setattr` on all `kwargs` passed to `HomeServer`.
This PR makes several changes to the `./scripts-dev/lint.sh` script, which lints the codebase with a number of tools:
* Adds usage information, with `-h` flag to show it. Otherwise it will show when providing an unknown flag.
* Adds option `-d` which will check both staged and unstaged files that have changed since the last commit and add them to the list of files to lint.
- Note that only files without an extension, or with a `.py` extension will be allowed. This prevents editing bash scripts causing the linters to break on non-python files.
* Improves the print-out of which files/directories are being linted.
We asserted that the IDs returned by postgres sequence was greater than
any we had seen, however this is technically racey as we may update the
current positions out of order.
We now assert that the sequences are correct on startup, so the
assertion is no longer really required, so we remove them.
Autocommit means that we don't wrap the functions in transactions, and instead get executed directly. Introduced in #8456. This will help:
1. reduce the number of `could not serialize access due to concurrent delete` errors that we see (though there are a few functions that often cause serialization errors that we don't fix here);
2. improve the DB performance, as it no longer needs to deal with the overhead of `REPEATABLE READ` isolation levels; and
3. improve wall clock speed of these functions, as we no longer need to send `BEGIN` and `COMMIT` to the DB.
Some notes about the differences between autocommit mode and our default `REPEATABLE READ` transactions:
1. Currently `autocommit` only applies when using PostgreSQL, and is ignored when using SQLite (due to silliness with [Twisted DB classes](https://twistedmatrix.com/trac/ticket/9998)).
2. Autocommit functions may get retried on error, which means they can get applied *twice* (or more) to the DB (since they are not in a transaction the previous call would not get rolled back). This means that the functions need to be idempotent (or otherwise not care about being called multiple times). Read queries, simple deletes, and updates/upserts that replace rows (rather than generating new values from existing rows) are all idempotent.
3. Autocommit functions no longer get executed in [`REPEATABLE READ`](https://www.postgresql.org/docs/current/transaction-iso.html) isolation level, and so data can change queries, which is fine for single statement queries.
We asserted that the IDs returned by postgres sequence was greater than
any we had seen, however this is technically racey as we may update the
current positions out of order.
We now assert that the sequences are correct on startup, so the
assertion is no longer really required, so we remove them.
* Fix outbound federaion with multiple event persisters.
We incorrectly notified federation senders that the minimum persisted
stream position had advanced when we got an `RDATA` from an event
persister.
Notifying of federation senders already correctly happens in the
notifier, so we just delete the offending line.
* Change some interfaces to use RoomStreamToken.
By enforcing use of `RoomStreamTokens` we make it less likely that
people pass in random ints that they got from somewhere random.
After https://github.com/matrix-org/synapse/pull/8377, the deb packages no longer indirectly installed the `"test"` dependencies, causing debian packages to fail to build while carrying out the unit tests.
This PR installs `test` dependencies explicitly when building debian packages.
Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.
This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
This could, very occasionally, cause:
```
tests.test_visibility.FilterEventsForServerTestCase.test_large_room
===============================================================================
[ERROR]
Traceback (most recent call last):
File "/src/tests/rest/media/v1/test_media_storage.py", line 86, in test_ensure_media_is_in_local_cache
self.wait_on_thread(x)
File "/src/tests/unittest.py", line 296, in wait_on_thread
self.reactor.advance(0.01)
File "/src/.tox/py35/lib/python3.5/site-packages/twisted/internet/task.py", line 826, in advance
self._sortCalls()
File "/src/.tox/py35/lib/python3.5/site-packages/twisted/internet/task.py", line 787, in _sortCalls
self.calls.sort(key=lambda a: a.getTime())
builtins.ValueError: list modified during sort
tests.rest.media.v1.test_media_storage.MediaStorageTests.test_ensure_media_is_in_local_cache
```
This PR allows Synapse modules making use of the `ModuleApi` to create and send non-membership events into a room. This can useful to have modules send messages, or change power levels in a room etc. Note that they must send event through a user that's already in the room.
The non-membership event limitation is currently arbitrary, as it's another chunk of work and not necessary at the moment.
Synapse 1.21.0rc3 (2020-10-08)
==============================
Bugfixes
--------
- Fix duplication of events on high traffic servers, caused by PostgreSQL `could not serialize access due to concurrent update` errors. ([\#8456](https://github.com/matrix-org/synapse/issues/8456))
Internal Changes
----------------
- Add Groovy Gorilla to the list of distributions we build `.deb`s for. ([\#8475](https://github.com/matrix-org/synapse/issues/8475))
Added shields directing to synapse-dev room, showing license, latest version on PyPi and supported Python versions.
I've moved substitution definitions to the bottom to improve readability.
Signed-off-by: Mateusz Przybyłowicz <uamfhq@gmail.com>
We call `_update_stream_positions_table_txn` a lot, which is an UPSERT
that can conflict in `REPEATABLE READ` isolation level. Instead of doing
a transaction consisting of a single query we may as well run it outside
of a transaction.
We call `_update_stream_positions_table_txn` a lot, which is an UPSERT
that can conflict in `REPEATABLE READ` isolation level. Instead of doing
a transaction consisting of a single query we may as well run it outside
of a transaction.
Currently when using multiple event persisters we (in the worst case) don't tell clients about events until all event persisters have persisted new events after the original event. This is a suboptimal, especially if one of the event persisters goes down.
To handle this, we encode the position of each event persister in the room tokens so that we can send events to clients immediately. To reduce the size of the token we do two things:
1. We create a unique immutable persistent mapping between instance names and a generated small integer ID, which we can encode in the tokens instead of the instance name; and
2. We encode the "persisted upto position" of the room token and then only explicitly include instances that have positions strictly greater than that.
The new tokens look something like: `m3478~1.3488~2.3489`, where the first number is the min position, and the subsequent `-` separated pairs are the instance ID to positions map. (We use `.` and `~` as separators as they're URL safe and not already used by `StreamToken`).
It seems most of these blacklisted tests do actually pass most of the time.
I'm of the opinion that having them blacklisted here means there is very little incentive for us to deflake any flaky tests, and meanwhile any value in those tests is completely lost.
Lots of different module apis is not easy to maintain.
Rather than adding yet another ModuleApi(hs, hs.get_auth_handler()) incantation, first add an hs.get_module_api() method and use it where possible.
https://github.com/matrix-org/synapse/tree/develop/docs/sphinx doesn't seem to really be utilised or changed recently since the initial commit. I like the idea of exportable documentation of the codebase, but at the moment after running through the build instructions the generated website wasn't very useful...
* Optimise and test state fetching for 3p event rules
Getting all the events at once is much more efficient than getting them
individually
* Test that 3p event rules can modify events
PR #8292 tried to maintain backwards compat with modules which don't provide a
`check_visibility_can_be_modified` method, but the tests weren't being run,
and the check didn't work.
This PR allows `ThirdPartyEventRules` modules to view, manipulate and block changes to the state of whether a room is published in the public rooms directory.
While the idea of whether a room is in the public rooms list is not kept within an event in the room, `ThirdPartyEventRules` generally deal with controlling which modifications can happen to a room. Public rooms fits within that idea, even if its toggle state isn't controlled through a state event.
There's no need for it to be in the dict as well as the events table. Instead,
we store it in a separate attribute in the EventInternalMetadata object, and
populate that on load.
This means that we can rely on it being correctly populated for any event which
has been persited to the database.
This is so we can tell what is going on when things are taking a while to start up.
The main change here is to ensure that transactions that are created during startup get correctly logged like normal transactions.
#7124 changed the behaviour of remote thumbnails so that the thumbnailing method was included in the filename of the thumbnail. To support existing files, it included a fallback so that we would check the old filename if the new filename didn't exist.
Unfortunately, it didn't apply this logic to storage providers, so any thumbnails stored on such a storage provider was broken.
For negative streams we have to negate the internal stream ID before
querying the DB.
The effect of this bug was to query far too many rows, slowing start up
time, but we would correctly filter the results afterwards so there was
no ill effect.
This converts a few more of our inline HTML templates to Jinja. This is somewhat part of #7280 and should make it a bit easier to customize these in the future.
The idea is that in future tokens will encode a mapping of instance to position. However, we don't want to include the full instance name in the string representation, so instead we'll have a mapping between instance name and an immutable integer ID in the DB that we can use instead. We'll then do the lookup when we serialize/deserialize the token (we could alternatively pass around an `Instance` type that includes both the name and ID, but that turns out to be a lot more invasive).
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.
I've replaced it with something that is hopefully a bit easier to use.
(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
Our hacked-up `_exposition.py` was stripping out some samples it shouldn't
have been. Put them back in, to more closely match the upstream
`exposition.py`.
* Don't check whether a 3pid is allowed to register during password reset
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
* Changelog
* Fix table scan of events on worker startup.
This happened because we assumed "new" writers had an initial stream
position of 0, so the replication code tried to fetch all events written
by the instance between 0 and the current position.
Instead, set the initial position of new writers to the current
persisted up to position, on the assumption that new writers won't have
written anything before that point.
* Consider old writers coming back as "new".
Otherwise we'd try and fetch entries between the old stale token and the
current position, even though it won't have written any rows.
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
This PR adds a script that:
* Builds the local Synapse checkout using our existing `docker/Dockerfile` image.
* Downloads [Complement](https://github.com/matrix-org/complement/)'s source code.
* Builds the [Synapse.Dockerfile](https://github.com/matrix-org/complement/blob/master/dockerfiles/Synapse.Dockerfile) using the above dockerfile as a base.
* Builds and runs Complement against it.
This set up differs slightly from [that of the dendrite repo](https://github.com/matrix-org/dendrite/blob/master/build/scripts/complement.sh) (`complement.sh`, `Complement.Dockerfile`), which instead stores a separate, but slightly modified, dockerfile in Dendrite's repo rather than running the one stored in Complement's repo. That synapse equivalent to that dockerfile (`Synapse.Dockerfile`) in Complement's repo is just based on top of `matrixdotorg/synapse:latest`, which we opt to build here locally.
Thus copying over the files from Complement's repo wouldn't change any functionality, and would result in two instances of the same files. So just using the dockerfile in Complement's repo was decided upon instead.
Broken in https://github.com/matrix-org/synapse/pull/8275 and has yet to be put in a release. Fixes https://github.com/matrix-org/synapse/issues/8418.
`next_link` is an optional parameter. However, we were checking whether the `next_link` param was valid, even if it wasn't provided. In that case, `next_link` was `None`, which would clearly not be a valid URL.
This would prevent password reset and other operations if `next_link` was not provided, and the `next_link_domain_whitelist` config option was set.
* Remove `on_timeout_cancel` from `timeout_deferred`
The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.
* Fix handling of connection timeouts in outgoing http requests
Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.
To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).
Also add a description to RequestTimedOutError so that we can see which stage
it failed at.
* Fix incorrect handling of timeouts reading federation responses
This was trapping the wrong sort of TimeoutError, so was never being hit.
The effect was relatively minor, but we should fix this so that it does the
expected thing.
* Fix inconsistent handling of `timeout` param between methods
`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
This table was created in #8034 (1.20.0). It references
`ui_auth_sessions`, which is ignored, so this one should be too.
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
* Fix test_verify_json_objects_for_server_awaits_previous_requests
It turns out that this wasn't really testing what it thought it was testing
(in particular, `check_context` was turning failures into success, which was
making the tests pass even though it wasn't clear they should have been.
It was also somewhat overcomplex - we can test what it was trying to test
without mocking out perspectives servers.
* Fix warnings about finished logcontexts in the keyring
We need to make sure that we finish the key fetching magic before we run the
verifying code, to ensure that we don't mess up our logcontexts.
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>
This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.
We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
#8037 changed the default `autoescape` option when rendering Jinja2 templates from `False` to `True`. This caused some bugs, noticeably around redirect URLs being escaped in SAML2 auth confirmation templates, causing those URLs to break for users.
This change returns the previous behaviour as it stood. We may want to look at each template individually and see whether autoescaping is a good idea at some point, but for now lets just fix the breakage.
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
I'd like to get a better insight into what we are doing with respect to state
res. The list of state groups we are resolving across should be short (if it
isn't, that's a massive problem in itself), so it should be fine to log it in
ite entiretly.
I've done some grepping and found approximately zero cases in which the
"shortcut" code delivered the result, so I've ripped that out too.
When updating the `room_stats_state` table, we try to check for null bytes slipping in to the content for state events. It turns out we had added `guest_access` as a field to room_stats_state without including it in the null byte check.
Lo and behold, a null byte in a `m.room.guest_access` event then breaks `room_stats_state` updates.
This PR adds the check for `guest_access`.
This change adds a note and a few lines of configuration settings for Apache users to disable ModSecurity for Synapse's virtual hosts. With ModSecurity enabled and running with its default settings, Matrix clients are unable to send chat messages through the Synapse installation. With this change, ModSecurity can be disabled only for the Synapse virtual hosts.
When updating room_stats_state, we try to check for null bytes slipping
in to the
content for state events. It turns out we had added guest_access as a
field to
room_stats_state without including it in the null byte check.
Lo and behold, a null byte in a m.room.guest_access event then breaks
room_stats_state
updates.
This PR adds the check for guest_access. A further PR will improve this
function so that this hopefully does not happen again in future.
Fixes: #8359
Trying to reactivate a user with the admin API (`PUT /_synapse/admin/v2/users/<user_name>`) causes an internal server error.
Seems to be a regression in #8033.
* Create a new function to verify that the length of a device name is
under a certain threshold.
* Refactor old code and tests to use said function.
* Verify device name length during registration of device
* Add a test for the above
Signed-off-by: Dionysis Grigoropoulos <dgrig@erethon.com>
1.19.3
Synapse 1.19.3 (2020-09-18)
===========================
Bugfixes
--------
- Partially mitigate bug where newly joined servers couldn't get past
events in a room when there is a malformed event.
([\#8350](https://github.com/matrix-org/synapse/issues/8350))
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Fix _set_destination_retry_timings
This came about because the code assumed that retry_interval
could not be NULL — which has been challenged by catch-up.
Add ability for ASes to /login using the `uk.half-shot.msc2778.login.application_service` login `type`.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
This is a bit of a hack, as `_check_sigs_and_hash_and_fetch` is intended
for attempting to pull an event from the database/(re)pull it from the
server that originally sent the event if checking the signature of the
event fails.
During backfill we *know* that we won't have the event in our database,
however it is still useful to be able to query the original sending
server as the server we're backfilling from may be acting maliciously.
The main benefit and reason for this change however is that
`_check_sigs_and_hash_and_fetch` will drop an event during backfill if
it cannot be successfully validated, whereas the current code will
simply fail the backfill request - resulting in the client's /messages
request silently being dropped.
This is a quick patch to fix backfilling rooms that contain malformed
events. A better implementation in planned in future.
Instead of just using the most recent extremities let's pick the
ones that will give us results that the pagination request cares about,
i.e. pick extremities only if they have a smaller depth than the
pagination token.
This is useful when we fail to backfill an extremity, as we no longer
get stuck requesting that same extremity repeatedly.
Synapse 1.20.0rc4 (2020-09-16)
==============================
Synapse 1.20.0rc4 is identical to 1.20.0rc3, with the addition of the security fix that was included in 1.19.2.
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
Synapse 1.20.0rc3 (2020-09-11)
==============================
Bugfixes
--------
- Fix a bug introduced in v1.20.0rc1 where the wrong exception was raised when invalid JSON data is encountered. ([\#8291](https://github.com/matrix-org/synapse/issues/8291))
Some Linux distros have begun disabling TLSv1.0 and TLSv1.1 by default
for security reasons, for example in Fedora 33 onwards:
https://fedoraproject.org/wiki/Changes/StrongCryptoSettings2
Use TLSv1.2 for the fake TLS servers created in the test suite, to avoid
failures due to OpenSSL disallowing TLSv1.0:
<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines',
'ssl_choose_client_version', 'unsupported protocol')]>
Signed-off-by: Dan Callaghan <djc@djc.id.au>
This PR adds a information about forwarding `/_synapse/client` endpoints through your reverse proxy. The first of these endpoints are introduced in https://github.com/matrix-org/synapse/pull/8004.
The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it.
This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens.
The valid operations here are:
1. Is a position before or after a token
2. Fetching all events between two tokens
3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C.
Future PR will change the token type to a dedicated type.
This PR adds a confirmation step to resetting your user password between clicking the link in your email and your password actually being reset.
This is to better align our password reset flow with the industry standard of requiring a confirmation from the user after email validation.
If a file cannot be thumbnailed for some reason (e.g. the file is empty), then
catch the exception and convert it to a reasonable error message for the client.
`pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed.
I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.
This fixes an issue where different methods (crop/scale) overwrite each other.
This first tries the new path. If that fails and we are looking for a
remote thumbnail, it tries the old path. If that still isn't found, it
continues as normal.
This should probably be removed in the future, after some of the newer
thumbnails were generated with the new path on most deployments. Then
the overhead should be minimal if the other thumbnails need to be
regenerated.
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
The intention here is to change `StreamToken.room_key` to be a `RoomStreamToken` in a future PR, but that is a big enough change without this refactoring too.
This is a config option ported over from DINUM's Sydent: https://github.com/matrix-org/sydent/pull/285
They've switched to validating 3PIDs via Synapse rather than Sydent, and would like to retain this functionality.
This original purpose for this change is phishing prevention. This solution could also potentially be replaced by a similar one to https://github.com/matrix-org/synapse/pull/8004, but across all `*/submit_token` endpoint.
This option may still be useful to enterprise even with that safeguard in place though, if they want to be absolutely sure that their employees don't follow links to other domains.
This removes `SourcePaginationConfig` and `get_pagination_rows`. The reasoning behind this is that these generic classes/functions erased the types of the IDs it used (i.e. instead of passing around `StreamToken` it'd pass in e.g. `token.room_key`, which don't have uniform types).
By importing from canonicaljson the simplejson module was still being used
in some situations. After this change the std lib json is consistenty used
throughout Synapse.
Fixes https://github.com/matrix-org/synapse/issues/8238
Alongside the delta file, some changes were also necessary to the codebase to remove references to the now defunct `populate_stats_process_rooms_2` background job. Thankfully the latter doesn't seem to have made it into any documentation yet :)
The version 1.3.0 has a bug with unicode charecters:
```
>>> from canonicaljson import encode_pretty_printed_json
>>> encode_pretty_printed_json({'a': 'à'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/erdnaxeli/.pyenv/versions/3.6.7/lib/python3.6/site-packages/canonicaljson.py", line 96, in encode_pretty_printed_json
return _pretty_encoder.encode(json_object).encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 12: ordinal not in range(128)
```
Signed-off-by: Alexandre Morignot <erdnaxeli@cervoi.se>
Co-authored-by: Alexandre Morignot <erdnaxeli@cervoi.se>
* Fixup `ALTER TABLE` database queries
Make the new columns nullable, because doing otherwise can wedge a
server with a big database, as setting a default value rewrites the
table.
* Switch back to using the notifications count in the push badge
Clients are likely to be confused if we send a push but the badge count
is the unread messages one, and not the notifications one.
* Changelog
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Add shared_rooms api
* Add changelog
* Add .
* Wrap response in {"rooms": }
* linting
* Add unstable_features key
* Remove options from isort that aren't part of 5.x
`-y` and `-rc` are now default behaviour and no longer exist.
`dont-skip` is no longer required
https://timothycrosley.github.io/isort/CHANGELOG/#500-penny-july-4-2020
* Update imports to make isort happy
* Add changelog
* Update tox.ini file with correct invocation
* fix linting again for isort
* Vendor prefix unstable API
* Fix to match spec
* import Codes
* import Codes
* Use FORBIDDEN
* Update changelog.d/7785.feature
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Implement get_shared_rooms_for_users
* a comma
* trailing whitespace
* Handle the easy feedback
* Switch to using runInteraction
* Add tests
* Feedback
* Seperate unstable endpoint from v2
* Add upgrade node
* a line
* Fix style by adding a blank line at EOF.
* Update synapse/storage/databases/main/user_directory.py
Co-authored-by: Tulir Asokan <tulir@maunium.net>
* Update synapse/storage/databases/main/user_directory.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Update UPGRADE.rst
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Fix UPGRADE/CHANGELOG unstable paths
unstable unstable unstable
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Tulir Asokan <tulir@maunium.net>
* Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore`
this seems a better fit for it.
This commit simply moves the existing code: no other changes at all.
* Rename `get_devices_with_keys_by_user`
to better reflect what it does.
* get_device_stream_token abstract method
To avoid referencing fields which are declared in the derived classes, make
`get_device_stream_token` abstract, and define that in the classes which define
`_device_list_id_gen`.
... and to show that it does something slightly different to
`_get_e2e_device_keys_txn`.
`include_all_devices` and `include_deleted_devices` were never used (and
`include_deleted_devices` was broken, since that would cause `None`s in the
result which were not handled in the loop below.
Add some typing too.
This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
This is split out from https://github.com/matrix-org/synapse/pull/7438, which had gotten rather large.
`LoginRestServlet` has a couple helper methods, `login_submission_legacy_convert` and `login_id_thirdparty_from_phone`. They're primarily used for converting legacy user login submissions to "identifier" dicts ([see spec](https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-login)). Identifying information such as usernames or 3PID information used to be top-level in the login body. They're now supposed to be put inside an [identifier](https://matrix.org/docs/spec/client_server/r0.6.1#identifier-types) parameter instead.
#7438's purpose is to allow using the new identifier parameter during User-Interactive Authentication, which is currently handled in AuthHandler. That's why I've moved these helper methods there. I also moved the refactoring of these method from #7438 as they're relevant.
This ensures systemctl start matrix-synapse returns only after synapse
is actually started, which is very useful for automated deployments.
Fixes#5761
Signed-off-by: Dexter Chua <dec41@srcf.net>
#8174 removed the `is_guest` parameter from `get_room_data`, at the same time that #8157 was merged using it, colliding together to break unit tests on develop.
This PR removes the `is_guest` parameter from the call in the broken test.
Uses the same changelog as #8174.
Small cleanup PR.
* Removed the unused `is_guest` argument
* Added a safeguard to a (currently) impossible code path, fixing static checking at the same time.
Synapse 1.19.1rc1 (2020-08-25)
==============================
Bugfixes
--------
- Fix a bug introduced in v1.19.0 where appservices with ratelimiting disabled would still be ratelimited when joining rooms. ([\#8139](https://github.com/matrix-org/synapse/issues/8139))
- Fix a bug introduced in v1.19.0 that would cause e.g. profile updates to fail due to incorrect application of rate limits on join requests. ([\#8153](https://github.com/matrix-org/synapse/issues/8153))
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
* Don't raise session_id errors on submit_token if request_token_inhibit_3pid_errors is set
* Changelog
* Also wait some time before responding to /requestToken
* Incorporate review
* Update synapse/storage/databases/main/registration.py
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Incorporate review
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
Add new method ratelimiter.can_requester_do_action and ensure that appservices are exempt from being ratelimited.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as its only used
for the `caches` stream which nothing subscribes to outside of
replication).
If we got an error persisting an event, we would try to remove the push actions
asynchronously, which would lead to a 'Re-starting finished log context'
warning.
I don't think there's any need for this to be asynchronous.
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.
Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.
(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
c.f. #8021
A lot of the code here is to change the `Completed 200 OK` logging to include the request URI so that we can drop the `Sending request...` log line.
Some notes:
1. We won't log retries, which may be confusing considering the time taken log line includes retries and sleeps.
2. The `_send_request_with_optional_trailing_slash` will always be logged *without* the forward slash, even if it succeeded only with the forward slash.
* Change default log config to buffer by default.
This batches up writes to the filesystem, which is more efficient for
disk I/O. This means that it can take some time for logs to get written
to disk. Note that ERROR logs (and above) immediately flush the buffer.
This only effects new installs, as we only write the log config if
started with `--generate-config` (in the same way we do for generating
signing keys).
* Default to keeping last 4 days of logs.
This hopefully reduces the amount of logs kept for new servers. Keeping
the last 1GB of logs is likely overkill for new servers, but equally may
not be enough for busy ones.
Instead, we keep the last four days worth of logs, enough so that admins
can investigate any problems that happened over e.g. a long weekend.
Duplicating function signatures between server.py and server.pyi is
silly. This commit changes that by changing all `build_*` methods to
`get_*` methods and changing the `_make_dependency_method` to work work
as a descriptor that caches the produced value.
There are some changes in other files that were made to fix the typing
in server.py.
This PR:
* Reduces the amount of noise in the `check-newsfragment` CI output by hiding the dependency installation output by default.
* Prints a link to the changelog/debian changelog section of the contributing guide if an error is found.
This has long been something I've wanted to do. Basically the `Daemonize` code
is both too flexible and not flexible enough, in that it offers a bunch of
features that we don't use (changing UID, closing FDs in the child, logging to
syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err
to a file instead of /dev/null; having the parent not exit until the child is
running).
As a first step, I've lifted the Daemonize code and removed the bits we don't
use. This should be a non-functional change. Fixing everything else will come
later.
We've [decided](https://github.com/matrix-org/synapse/issues/5253#issuecomment-665976308) to remove the signature check for v1 lookups.
The signature check has been removed in v2 lookups. v1 lookups are currently deprecated. As mentioned in the above linked issue, this verification was causing deployments for the vector.im and matrix.org IS deployments, and this change is the simplest solution, without being unjustified.
Implementations are encouraged to use the v2 lookup API as it has [increased privacy benefits](https://github.com/matrix-org/matrix-doc/pull/2134).
`StatsHandler` handles updates to the `current_state_delta_stream`, and updates room stats such as the amount of state events, joined users, etc.
However, it counts every new join membership as a new user entering a room (and that user being in another room), whereas it's possible for a user's membership status to go from join -> join, for instance when they change their per-room profile information.
This PR adds a check for join->join membership transitions, and bails out early, as none of the further checks are necessary at that point.
Due to this bug, membership stats in many rooms have ended up being wildly larger than their true values. I am not sure if we also want to include a migration step which recalculates these statistics (possibly using the `_populate_stats_process_rooms` bg update).
Bug introduced in the initial implementation https://github.com/matrix-org/synapse/pull/4338.
Thanks to some slightly overzealous cleanup in the
`delete_old_current_state_events`, it's possible to end up with no
`event_forward_extremities` in a room where we have outstanding local
invites. The user would then get a "no create event in auth events" when trying
to reject the invite.
We can hack around it by using the dangling invite as the prev event.
If there are *no* files with CRLF line endings, then the xargs exits with a
non-zero exit code (as expected), but then, since that is the last thing to
happen in the script, the script as a whole exits non-zero, making the whole
thing fail.
using `if/then/fi` instead of `&& (...)` means that the script exits with a
zero exit code.
IIRC this doesn't break tests because its only hit on reconnection, or something.
Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB), the problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
Handling of incoming typing stream updates from replication was not
hooked up on master, effecting set ups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connect drops then the typing worker will issue a
`POSITION typing`, triggering master process to try and stream *all*
typing updates from position 0.
Fixes#7907
Converts tests/rest/admin/test_room.py to have unix file endings after they were accidentally changed in #7613.
Keeping the same changelog as #7613 as it hasn't gone out in a release yet.
If we send out an event which refers to `prev_events` which other servers in
the federation are missing, then (after a round or two of backfill attempts),
they will end up asking us for `/state_ids` at a particular point in the DAG.
As per https://github.com/matrix-org/synapse/issues/7893, this is quite
expensive, and we tend to see lots of very similar requests around the same
time.
We can therefore handle this much more efficiently by using a cache, which (a)
ensures that if we see the same request from multiple servers (or even the same
server, multiple times), then they share the result, and (b) any other servers
that miss the initial excitement can also benefit from the work.
[It's interesting to note that `/state` has a cache for exactly this
reason. `/state` is now essentially unused and replaced with `/state_ids`, but
evidently when we replaced it we forgot to add a cache to the new endpoint.]
For inbound federation requests, if a given remote server makes too many
requests at once, we start stacking them up rather than processing them
immediatedly.
However, that means that there is a fair chance that the requesting server will
disconnect before we start processing the request. In that case, if it was a
read-only request (ie, a GET request), there is absolutely no point in
building a response (and some requests are quite expensive to handle).
Even in the case of a POST request, one of two things will happen:
* Most likely, the requesting server will retry the request and we'll get the
information anyway.
* Even if it doesn't, the requesting server has to assume that we didn't get
the memo, and act accordingly.
In short, we're better off aborting the request at this point rather than
ploughing on with what might be a quite expensive request.
Run `isort`, `flake8` and `black` over the `contrib/` directory and `synctl` script. The latter was already being done in CI, but now the linting script does it too.
Fixes https://github.com/matrix-org/synapse/issues/7910
The [postgres setup docs](https://github.com/matrix-org/synapse/blob/develop/docs/postgres.md#set-up-database) recommend setting up your database with user `synapse_user`.
However, uncommenting the postgres defaults in the sample config leave you with user `synapse`.
This PR switches the sample config to recommend `synapse_user`. Took a me a second to figure this out, so assume this will beneficial to others.
As mentioned in #7397, switching to a debian base should help with multi-arch work to save time on compiling. This is unashamedly based on #6373, but without the extra functionality. Switch python version back to generic 3.7 to always pull the latest. Essentially, keeping this as small as possible. The image is bigger though unfortunately.
It serves no purpose and updating everytime we write to the device inbox
stream means all such transactions will conflict, causing lots of
transaction failures and retries.
I'm pretty sure there's no technical reason these have to be distinct server blocks, so collapse into one and go with the more terse location block.
Signed-off-by: Luke W Faraone <luke@faraone.cc>
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444
It was correct at the time of our friend Jorik writing it (checking
git blame), but the world has moved now and it is no longer a
generator.
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
When considering rooms to clean up in `delete_old_current_state_events`, skip
rooms which we are creating, which otherwise look a bit like rooms we have
left.
Fixes#7834.
As far as I can tell from the sentry logs, the only time this has actually done
anything in the last two years is when we had two master workers running at
once, and even then, it made a bit of a mess of it (see
https://github.com/matrix-org/synapse/issues/7845#issuecomment-658238739).
Generally I feel like this code is doing more harm than good.
I'm tempted to remove this section entirely, but it's helpful for admins who are trying to figure out why their Synapse is crashing on start with ACME errors.
Signed-off-by: Luke W Faraone <luke@faraone.cc>
The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
When fetching the state of a room over federation we receive the event
IDs of the state and auth chain. We then fetch those events that we
don't already have.
However, we used a function that recursively fetched any missing auth
events for the fetched events, which can lead to a lot of recursion if
the server is missing most of the auth chain. This work is entirely
pointless because would have queued up the missing events in the auth
chain to be fetched already.
Let's just diable the recursion, since it only gets called from one
place anyway.
Fixes#2181.
The basic premise is that, when we
fail to reject an invite via the remote server, we can generate our own
out-of-band leave event and persist it as an outlier, so that we have something
to send to the client.
* Starting with apt 1.6, https support has moved into the main package and apt-transport-https has become a transitional dummy package.
Signed-off-by: Dirk Heinrichs <dirk.heinrichs@altum.de>
This table is no longer used, so we may as well stop populating it. Removing it
would prevent people rolling back to older releases of Synapse, so that can
happen in a future release.
* Fix spec compliance; tweaks without values are valid
(default to True, which is only concretely specified for
`highlight`, but it seems only reasonable to generalise)
* Changelog for 7766.
* Add documentation to `tweaks_for_actions`
May as well tidy up when I'm here.
* Add a test for `tweaks_for_actions`
Fixes https://github.com/matrix-org/synapse/issues/7641
The package was pinned to <0.8.0 without an obvious reasoning with
7ad1d7635
in https://github.com/matrix-org/synapse/pull/5636
while the version selection looks to just try to exclude an arbitrary
next minor version number that might introduce API breaking changes.
Selecting the next minor number might be a good conservative selection.
Downstream distributions already reported success patching out the version
requirements.
This also fixes the integration of upgraded packages into openSUSE packages,
e.g. for openSUSE Tumbleweed which already ships prometheus_client >= 0.8 .
Signed-off-by: Oliver Kurz <okurz@suse.de>
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
Synapse 1.15.2 (2020-07-02)
===========================
Due to the two security issues highlighted below, server administrators are
encouraged to update Synapse. We are not aware of these vulnerabilities being
exploited in the wild.
Security advisory
-----------------
* A malicious homeserver could force Synapse to reset the state in a room to a
small subset of the correct state. This affects all Synapse deployments which
federate with untrusted servers. ([96e9afe6](96e9afe625))
* HTML pages served via Synapse were vulnerable to clickjacking attacks. This
predominantly affects homeservers with single-sign-on enabled, but all server
administrators are encouraged to upgrade. ([ea26e9a9](ea26e9a98b))
This was reported by [Quentin Gliech](https://sandhose.fr/).
- Remove the requirement for a specific version of Python
- Move dep comment to a separate line, Tox 3.7.0 like trailing ones
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
State res v2 across large data sets can be very CPU intensive, and if
all the relevant events are in the cache the algorithm will run from
start to finish within a single reactor tick. This can result in
blocking the reactor tick for several seconds, which can have major
repercussions on other requests.
To fix this we simply add the occaisonal `sleep(0)` during iterations to
yield execution until the next reactor tick. The aim is to only do this
for large data sets so that we don't impact otherwise quick resolutions.=
HTTP requires the response to contain a Content-Length header unless chunked encoding is being used.
Prometheus metrics endpoint did not set this, causing software such as prometheus-proxy to not be able to scrape synapse for metrics.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Older versions of `parameterized` package have no `parameterized_class` decorator. This decorator is used in tests.
Signed-off-by: Oleg Girko <ol@infoserver.lv>
* Always return an unread_count in get_unread_event_push_actions_by_room_for_user
* Don't always expect unread_count to be there so we don't take out sync entirely if something goes wrong
This requires a new config option to specify which media repo should be
responsible for running background jobs to e.g. clear out expired URL
preview caches.
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
This ended up being a bit more invasive than I'd hoped for (not helped by
generic_worker duplicating some of the code from homeserver), but hopefully
it's an improvement.
The idea is that, rather than storing unstructured `dict`s in the config for
the listener configurations, we instead parse it into a structured
`ListenerConfig` object.
Fixes https://github.com/matrix-org/synapse/issues/7683
Broke in: #7649
We had a `yield` acting on a coroutine. To be fair this one is a bit difficult to notice as there's a function in the middle that just passes the coroutine along.
The spec [states](https://matrix.org/docs/spec/client_server/r0.6.1#phone-number) that `m.id.phone` requires the field `country` and `phone`.
In Synapse, we've been enforcing `country` and `number`.
I am not currently sure whether this affects any client implementations.
This issue was introduced in #1994.
Fixes https://github.com/matrix-org/synapse/issues/2431
Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used.
Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637
Based on #7637
* Ensure account data stream IDs are unique.
The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.
The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.
* Fix bug in account data replication stream.
If the same stream ID was used in both global and room account data then
the getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).
Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.
Fixes#7617
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
Calls `self.get_success` on all deferred methods instead of abusing `self.pump()`. This has the benefit of working with coroutines, as well as checking that method execution completed successfully.
There are also a few small cleanups that I made in the process.
* Expose `return_html_error`, and allow it to take a Jinja2 template instead of a raw string
* Clean up exception handling in SAML2ResponseResource
* use the existing code in `return_html_error` instead of re-implementing it
(giving it a jinja2 template rather than inventing a new form of template)
* do the exception-catching in the REST layer rather than in the handler
layer, to make sure we catch all exceptions.
It looks like `user_device_resync` was ignoring cross-signing keys from the results received from the remote server. This patch fixes this, by processing these keys using the same process `_handle_signing_key_updates` does (and effectively factor that part out of that function).
The query keeps showing up in my slow query log.
This changes the plan under the top-level Sort node from
```
WindowAgg (cost=280335.88..292963.15 rows=561212 width=80) (actual time=138.651..160.562 rows=27112 loops=1)
-> Sort (cost=280335.88..281738.91 rows=561212 width=84) (actual time=138.597..140.622 rows=27112 loops=1)
Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group
Sort Method: quicksort Memory: 4581kB
-> Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=21.548..47.657 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=21.526..21.535 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=21.280..21.493 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.005..0.991 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
to
```
Nested Loop (cost=2.83..226745.22 rows=561212 width=84) (actual time=24.194..52.834 rows=27112 loops=1)
-> HashAggregate (cost=2.27..3.28 rows=101 width=8) (actual time=24.130..24.138 rows=20 loops=1)
Group Key: state.state_group
-> CTE Scan on state (cost=0.00..2.02 rows=101 width=8) (actual time=23.887..24.113 rows=20 loops=1)
-> Index Scan using state_groups_state_type_idx on state_groups_state (cost=0.56..2189.40 rows=5557 width=84) (actual time=0.016..1.159 rows=1356 loops=20)
Index Cond: (state_group = state.state_group)
```
This cuts the execution time from ~190ms to ~130ms, i.e. a reduction
of ~30%.
The full plans are visualised at https://explain.depesz.com/s/WpbT and
https://explain.depesz.com/s/KlEk
Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Without this patch, if an error happens which isn't caught by `user_device_resync`, then `_maybe_retry_device_resync` would fail, without retrying the next users in the iteration. This patch fixes this so that it now only logs an error in this case.
Synapse was added to the ports tree in Nov, 2019 by Renaud Allard (https://marc.info/?l=openbsd-ports&m=157417848805329).
With the release of OpenBSD 6.7 on May 22, 2020 a pre-compiled binary is available as well.
'client_auth_method' commented out value was erronously 'client_auth_basic',
when code and docstring says it should be 'client_secret_basic'.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
The bg update never managed to complete, because it kept being interrupted by
transactions which want to take a lock.
Just doing it in the foreground isn't that bad, and is a good deal simpler.
A couple of changes of significance:
* remove the `_last_ack < federation_position` condition, so that
updates will still be correctly processed after restart
* Correctly wire up send_federation_ack to the right class.
we can use `make_in_list_sql_clause` rather than doing our own half-baked
equivalent, which has the benefit of working just fine with empty lists.
(This has quite a lot of tests, so I think it's pretty safe)
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
Instead of doing a complicated dance of deleting and moving aliases one
by one, which sends a canonical alias update into the old room for each
one, lets do it all in one go.
This also changes the function to move *all* local alias events to the new
room, however that happens later on anyway.
PyPy's gc.get_stats() returns an object containing detailed allocator statistics
which could be beneficial to collect as metrics.
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry to sync it.
https://github.com/matrix-org/synapse/pull/6776 introduced some code infrastructure to mark device lists as stale/out of sync.
This commit uses that code infrastructure to mark device lists as out of sync if processing an incoming device list update makes the device handler realise that the device list is out of sync, but we can't resync right now.
It also adds a looping call to retry all failed resync every 30s. This shouldn't cause too much spam in the logs as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`.
Fixes#7418
We don't really make any promises about returning accurate presence data when
presence is disabled, so we may as well just return a static response, rather
than making the master handle a request.
`_is_server_still_joined` will throw if it is given state updates with non-user ID state keys with local user leaves. This is actually rarely a problem since local leaves almost always get persisted by themselves.
(I discovered this on a branch that was otherwise broken, so I haven't seen this in the wild)
* If an error occurs when stopping a process synctl now logs a warning.
* During a restart, synctl will avoid attempting to start Synapse if an error
occurs during stopping Synapse.
Make sure that the AccountDataStream presents complete updates, in the right
order.
This is much the same fix as #7337 and #7358, but applied to a different stream.
This is required as both event persistence and the background update needs access to this function. It should be perfectly safe for two workers to write to that table at the same time.
This allows us to have the logic on both master and workers, which is necessary to move event persistence off master.
We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.
The specific headers that are passed using this new configuration format
are Host and X-Forwarded-For, which should be all that's required.
Note that for production another matcher should be added in the first
section to properly handle the base_url lookup:
reverse_proxy /.well-known/matrix/* http://localhost:8008
Signed-off-by: Jeff Peeler <jpeeler@gmail.com>
Synapse 1.13.0rc2 (2020-05-14)
==============================
Bugfixes
--------
- Fix a long-standing bug which could cause messages not to be sent over federation, when state events with state keys matching user IDs (such as custom user statuses) were received. ([\#7376](https://github.com/matrix-org/synapse/issues/7376))
- Restore compatibility with non-compliant clients during the user interactive authentication process, fixing a problem introduced in v1.13.0rc1. ([\#7483](https://github.com/matrix-org/synapse/issues/7483))
Internal Changes
----------------
- Fix linting errors in new version of Flake8. ([\#7470](https://github.com/matrix-org/synapse/issues/7470))
Fix a bug where the `get_joined_users` cache could be corrupted by custom
status events (or other state events with a state_key matching the user ID).
The bug was introduced by #2229, but has largely gone unnoticed since then.
Fixes#7099, #7373.
The aim here is to get to a stage where we have a `PersistEventStore` that holds all the write methods used during event persistence, so that we can take that class out of the `DataStore` mixin and instansiate it separately. This will allow us to instansiate it on processes other than master, while also ensuring it is only available on processes that are configured to write to events stream.
This is a bit of an architectural change, where we end up with multiple classes per data store (rather than one per data store we have now). We end up having:
1. Storage classes that provide high level APIs that can talk to multiple data stores.
2. Data store modules that consist of classes that must point at the same database instance.
3. Classes in a data store that can be instantiated on processes depending on config.
Before all streams were only written to from master, so only master needed to respond to `REPLICATE` commands.
Before all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from cache invalidation stream if an instance is restarted, however all the caches would be empty in that case so it wasn't a problem.
Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
For the record, the reason we need this is as follows:
each RDATA command comes down the redis pipe as a subscription message. txredisapi as written needs at least three reactor ticks to read each subscription message from the tcp buffer. Hence, once the process gets loaded, it starts getting behind, and eventually redis knifes the connection. it then takes ages for the master to work its way through the backlog, before it reconnects again, during which any commands from any workers are dropped.
An update of check-manifest shone some light on some issues with MANIFEST.in, specifically that we didn't ignore/prune the contrib directory, and that we were using prune instead of exclude for files. This fixes both issues.
Fixes#7403
We forgot to set the password on the subscriber connection, as well as
not calling super methods for overridden connectionMade/connectionLost
functions.
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
Continuation of #7379
Adds a section in the README telling people to go to #synapse:matrix.org instead of using github issues. I'm not entirely sure about placing it above the install section but then people are likely to first seek support when installing (if something goes boom), and it's probably better to have it as high as possible anyway so people actually see it.
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
This PR moves the "support is in #synapse:matrix.org" in the bug report template outside of the comment as some people seem to ignore what's in the comments, and phrase it a bit more like the support request template. It also adds a default issue template that says the same thing. It's also adding a notice about the security disclosure to both the default template and the bug report one.
It also adds a badge to the top of the README with an alt text saying about the same message if the badge doesn't load (e.g. if matrix.org is slow).
Fixes#6826
By persisting the user interactive authentication sessions to the database, this fixes
situations where a user hits different works throughout their auth session and also
allows sessions to persist through restarts of Synapse.
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.
Fixes#7334.
* Factor out functions for injecting events into database
I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.
* Rework TestReplicationDataHandler
This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.
* Fix AssertionErrors being thrown by EventsStream
Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
Specifically some tests for the typing stream, which means we test streams that fetch missing updates via HTTP (rather than via the DB).
We also shuffle things around a bit so that we create two separate `HomeServer` objects, rather than trying to insert a slaved store into places.
Note: `test_typing.py` is heavily inspired by `test_receipts.py`
It doesn't seem to be documented anywhere and means that you suddenly start losing metrics without any obvious reason when you go from monolith to workers (e.g. #7312).
If the admin adds a `.yaml` file that's either empty or doesn't parse into a dict to a config directory (e.g. `conf.d` for debs installs), stuff like https://github.com/matrix-org/synapse/issues/7322 would happen. This PR checks that the file is correctly parsed into a dict, or ignores it with a warning if it parses into any other type (including `None` for empty files).
Fixes https://github.com/matrix-org/synapse/issues/7322
Figuring out how to correctly limit updates from this stream without dropping
entries is far more complicated than just counting the number of rows being
returned. We need to consider each query separately and, if any one query hits
the limit, truncate the results from the others.
I think this also fixes some potentially long-standing bugs where events or
state changes could get missed if we hit the limit on either query.
Synapse v1.12.4
Features:
* Always send users their own device updates. (#7160)
* Add support for handling GET requests for account_data on a worker. (#7311)
Bugfixes:
* Fix a bug that prevented cross-signing with users on worker-mode synapses. (#7255)
* Do not treat display names as globs in push rules. (#7271)
* Fix a bug with cross-signing devices belonging to remote users who did not share a
room with any user on the local homeserver. (#7289)
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
Adds a request_token_inhibit_errors configuration flag (disabled by
default) which, if enabled, change the behaviour of all /requestToken
endpoints so that they return a 200 and a fake sid if the 3PID was/was
not found associated with an account (depending on the endpoint),
instead of an error.
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
First some background: StreamChangeCache is used to keep track of what "entities" have
changed since a given stream ID. So for example, we might use it to keep track of when the last
to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2].
Now, it turns out that StreamChangeCache didn't support more than one thing being changed at
a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send
to-device messages to more than one user at a time.
As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped
ok with having multiple things changing on the same stream ID, and it seems we never actually
use the methods which don't work on the stream change caches where we allow multiple
changes at the same stream ID. But that feels horribly fragile, hence: let's update
StreamChangeCache to properly support this, and add some typing and some more tests while
we're at it.
[1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301
[2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
Splitting based on the response code means we can avoid double logging here and identical information from line 164 while still logging at info if we don't get a good response and need to retry.
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes#7206.
Add changelog
Save retrieved keys to the db
lint
Fix and de-brittle remote result dict processing
Use query_user_devices instead, assume only master, self_signing key types
Make changelog more useful
Remove very specific exception handling
Wrap get_verify_key_from_cross_signing_key in a try/except
Note that _get_e2e_cross_signing_verify_key can raise a SynapseError
lint
Add comment explaining why this is useful
Only fetch master and self_signing key types
Fix log statements, docstrings
Remove extraneous items from remote query try/except
lint
Factor key retrieval out into a separate function
Send device updates, modeled after SigningKeyEduUpdater._handle_signing_key_updates
Update method docstring
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(think this was introduced in #7024)
I don't really remember why this was so complicated; I think it dates
back to the time when we had to instantiate the Config classes before
we could call `add_arguments` - ie before #5597. In any case, I don't
think there's a good reason for it any more, and the impact of it
being complicated is that `--help` doesn't work correctly.
We pass --daemonize on the commandline, which (since at least #4853) overrides
whatever the config file, so there is no need for it to be set in the config
file.
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
Fixes#6815
Before figuring out whether we should alert a user on MAU, we call get_notice_room_for_user to get some info on the existing server notices room for this user. This function, if the room doesn't exist, creates it and invites the user in it. This means that, if we decide later that no server notice is needed, the user gets invited in a room with no message in it. This happens at every restart of the server, since the room ID returned by get_notice_room_for_user is cached.
This PR fixes that by moving the inviting bit to a dedicated function, that's only called when the server actually needs to send a notice to the user. A potential issue with this approach is that the room that's created by get_notice_room_for_user doesn't match how that same function looks for an existing room (i.e. it creates a room that doesn't have an invite or a join for the current user in it, so it could lead to a new room being created each time a user syncs), but I'm not sure this is a problem given it's cached until the server restarts, so that function won't run very often.
It also renames get_notice_room_for_user into get_or_create_notice_room_for_user to make what it does clearer.
Synapse 1.12.3 (2020-04-03)
===========================
- Remove the the pin to Pillow 7.0 which was introduced in Synapse 1.12.2, and
correctly fix the issue with building the Debian packages. ([\#7212](https://github.com/matrix-org/synapse/issues/7212))
Synapse 1.12.2 (2020-04-02)
===========================
This release fixes [an
issue](https://github.com/matrix-org/synapse/issues/7208) with building the
debian packages.
No other significant changes since 1.12.1.
Occasionally we could get a federation device list update transaction which
looked like:
```
[
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D2', 'prev_id': [], 'stream_id': 12, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D1', 'prev_id': [12], 'stream_id': 11, 'deleted': True}},
{'edu_type': 'm.device_list_update', 'content': {'user_id': '@user:test', 'device_id': 'D3', 'prev_id': [11], 'stream_id': 13, 'deleted': True}}
]
```
Having `stream_ids` which are lower than `prev_ids` looks odd. It might work
(I'm not actually sure), but in any case it doesn't seem like a reasonable
thing to expect other implementations to support.
* master:
1.12.1
Note where bugs were introduced
1.12.1rc1
Newsfile
Rewrite changelog
Add changelog
Only import sqlite3 when type checking
Fix another instance
Only setdefault for signatures if device has key_json
Fix starting workers when federation sending not split out.
Attempt to clarify Python version requirements (#7161)
Improve the UX of the login fallback when using SSO (#7152)
Update the wording of the config comment
Lint
Changelog
Regenerate sample config
Whitelist the login fallback by default for SSO
Synapse 1.12.1 (2020-04-02)
===========================
No significant changes since 1.12.1rc1.
Synapse 1.12.1rc1 (2020-03-31)
==============================
Bugfixes
--------
- Fix starting workers when federation sending not split out. ([\#7133](https://github.com/matrix-org/synapse/issues/7133)). Introduced in v1.12.0.
- Avoid importing `sqlite3` when using the postgres backend. Contributed by David Vo. ([\#7155](https://github.com/matrix-org/synapse/issues/7155)). Introduced in v1.12.0rc1.
- Fix a bug which could cause outbound federation traffic to stop working if a client uploaded an incorrect e2e device signature. ([\#7177](https://github.com/matrix-org/synapse/issues/7177)). Introduced in v1.11.0.
* tag 'v1.12.1':
1.12.1
Note where bugs were introduced
1.12.1rc1
Newsfile
Rewrite changelog
Add changelog
Only import sqlite3 when type checking
Fix another instance
Only setdefault for signatures if device has key_json
Fix starting workers when federation sending not split out.
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
If there was an exception setting up one of the attributes of the Homeserver
god object, then future attempts to fetch that attribute would raise a
confusing "Cyclic dependency" error. Let's make sure that we clear the
`building` flag so that we just get the original exception.
Ref: #7169
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.
This really only effects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
In particular, we depend on `typing.TYPE_CHECKING`, which is only present in
3.5.2.
It turns out that Ubuntu Xenial, despite having a package called `python 3
(3.5.1-3)`, actually has python 3.5.2, so I think this is fine.
That fallback sets the redirect URL to itself (so it can process the login
token then return gracefully to the client). This would make it pointless to
ask the user for confirmation, since the URL the confirmation page would be
showing wouldn't be the client's.
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
* change debian package from python3-virtualenv to virtualenv
The virtualenv package is needed for the virtualenv command. The
virtualenv package depends on python3-virtualenv (at least since
debian jessie) so there is no need to specify python3-virtualenv
additionally.
Signed-off-by: Vieno Hakkerinen <vieno@hakkerinen.eu>
* Add changelog
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
* Don't show the login forms if we're currently logging in with a
password or a token.
* Submit directly the SSO login form, showing only a spinner to the
user, in order to eliminate from the clunkiness of SSO through this
fallback.
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Pull Sentinel out of LoggingContext
... and drop a few unnecessary references to it
* Factor out LoggingContext.current_context
move `current_context` and `set_context` out to top-level functions.
Mostly this means that I can more easily trace what's actually referring to
LoggingContext, but I think it's generally neater.
* move copy-to-parent into `stop`
this really just makes `start` and `stop` more symetric. It also means that it
behaves correctly if you manually `set_log_context` rather than using the
context manager.
* Replace `LoggingContext.alive` with `finished`
Turn `alive` into `finished` and make it a bit better defined.
* Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
* Change device lists stream to have one row per id.
This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.
* Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
* Newsfile
* Remove handling of multiple rows per ID
* Fix worker handling
* Comments from review
This should be safe to do on all workers/masters because it is guarded by
a config option which will ensure it is only actually done on the worker
assigned as a pusher.
It was originally implemented by pulling the full auth chain of all
state sets out of the database and doing set comparison. However, that
can take a lot work if the state and auth chains are large.
Instead, lets try and fetch the auth chains at the same time and
calculate the difference on the fly, allowing us to bail early if all
the auth chains converge. Assuming that the auth chains do converge more
often than not, this should improve performance. Hopefully.
Fixes#7065
This is basically the same as https://github.com/matrix-org/synapse/pull/6847 except it tries to populate events from `state_events` rather than `current_state_events`, since the latter might have been cleared from the state of some rooms too early, leaving them with a `NULL` room version.
If an error happened while processing a SAML AuthN response, or a client
ends up doing a `GET` request to `/authn_response`, then render a
customisable error page rather than a confusing error.
Fixes#7054
I also had a look at the rest of the functions in
`EventPushActionsStore` and in the push notifications send code and it
looks to me like there shouldn't be any other method with this issue in
this part of the codebase.
This is a bit fiddly because it all has to be done on one fell swoop:
* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
This currently causes presence notify code to log exceptions when there
is no state changes to process. This doesn't actually cause any problems
as we'd simply do nothing anyway.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
Instead lets just warn if the worker has a media listener configured but
has the media repository disabled.
Previously non media repository workers would just ignore the media
listener.
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
This is intended as a precursor to storing room versions when we receive an
invite over federation, but has the happy side-effect of fixing #3374 at last.
In short: change the store_room with try/except to a proper upsert which
updates the right columns.
* Give `notif_template_html`, `notif_template_text` default values (fixes#6960)
* Don't complain if `smtp_host` and `smtp_port` are unset, since they have sensible defaults (fixes#6961)
* Set the example for `enable_notifs` to `True`, for consistency and because it's more useful
* Raise errors as ConfigError rather than RuntimeError for nicer formatting
The state res v2 algorithm only cares about the difference between auth
chains, so we can pass in the known common state to the `get_auth_chain`
storage function so that it can ignore those events.
* Increase DB/CPU perf of `_is_server_still_joined` check.
For rooms with large amount of state a single user leaving could cause
us to go and load a lot of membership events and then pull out
membership state in a large number of batches.
* Newsfile
* Update synapse/storage/persist_events.py
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Fix adding if too soon
* Update docstring
* Review comments
* Woops typo
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
... and set it everywhere it's called.
while we're here, rename it for consistency with `check_user_in_room` (and to
help check that I haven't missed any instances)
* Reject device display names that are too long.
Too long is currently defined as 100 characters in length.
* Add a regression test for rejecting a too long device display name.
Synapse 1.10.0rc3 (2020-02-10)
==============================
Features
--------
- Filter out m.room.aliases from the CS API to mitigate abuse while a better solution is specced. ([\#6878](https://github.com/matrix-org/synapse/issues/6878))
Internal Changes
----------------
- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](https://github.com/matrix-org/synapse/issues/6880))
We're in the middle of properly mitigating spam caused by malicious aliases being added to a room. However, until this work fully lands, we temporarily filter out all m.room.aliases events from /sync and /messages on the CS API, to remove abusive aliases. This is considered acceptable as m.room.aliases events were never a reliable record of the given alias->id mapping and were purely informational, and in their current state do more harm than good.
A lot of the things we log at INFO are now a bit superfluous, so lets
make them DEBUG logs to reduce the amount we log by default.
Co-Authored-By: Brendan Abolivier <babolivier@matrix.org>
Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
We were looking at the wrong event type (`m.room.encryption` vs
`m.room.encrypted`).
Also fixup the duplicate `EvenTypes` entries.
Introduced in #6776.
I messed this up a bit in #6805, but fortunately we weren't actually doing
anything with the room_version so it didn't matter that it was a str not a RoomVersion.
When a server leaves a room it may stop sharing a room with remote
users, and thus not get any updates to their device lists. So we need to
check for this case and delete those device lists from the cache.
We don't need to do this if we stop sharing a room because the remote
user leaves the room, because we track that case via looking at
membership changes.
If we detect that the remote users' keys may have changed then we should
attempt to resync against the remote server rather than using the
(potentially) stale local cache.
We were sending device updates down both the federation stream and
device streams. This mean there was a race if the federation sender
worker processed the federation stream first, as when the sender checked
if there were new device updates the slaved ID generator hadn't been
updated with the new stream IDs and so returned nothing.
This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.
Otherwise its just stale data, which may get deleted later anyway so
can't be relied on. It's also a bit of a shotgun if we're trying to get
the current state of a room we're not in.
These are easier to work with than the strings and we normally have one around.
This fixes `FederationHander._persist_auth_tree` which was passing a
RoomVersion object into event_auth.check instead of a string.
* Add note that user_dir requires disabling user dir
updates from the main synapse process.
* Add note that federation_reader should have
the federation listener resource.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
There are quite a few places that we assume that a redaction event has a
corresponding `redacts` key, which is not always the case. So lets
cheekily make it so that event.redacts just returns None instead.
The old statement returned `None` for such a `password_config` (like the one
created on first run), thus retrieval of the `pepper` key failed with
`AttributeError`.
Fixes#5315
Signed-off-by: Ivan Vilata i Balaguer <ivan@selidor.net>
* Raise an exception if there are pending background updates
So we return with a non-0 code
* Changelog
* Port synapse_port_db to async/await
* Port update_database to async/await
* Add version string to mocked homeservers
* Remove unused imports
* Convert overseen bits to async/await
* Fixup logging contexts
* Fix imports
* Add a way to print an error without raising an exception
* Incorporate review
Turns out that figuring out a remote user id for the SAML user isn't quite as obvious as it seems. Factor it out to the SamlMappingProvider so that it's easy to control.
Generally try to make this more comprehensible, and make it match the
conventions.
I've removed the documentation for all the settings which allow you to change
the names of the template files, because I can't really see why they are
useful.
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
When figuring out which topological token to start a purge job at, we
need to do the following:
1. Figure out a timestamp before which events will be purged
2. Select the first stream ordering after that timestamp
3. Select info about the first event after that stream ordering
4. Build a topological token from that info
In some situations (e.g. quiet rooms with a short max_lifetime), there
might not be an event after the stream ordering at step 3, therefore we
abort the purge with the error `No event found`. To mitigate that, this
patch fetches the first event _before_ the stream ordering, instead of
after.
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
AdditionalResource really doesn't add any value, and it gets in the way for
resources which want to support child resources or the like. So, if the
resource object already implements the IResource interface, don't bother
wrapping it.
Add some useful things, such as error types and logcontext handling, to the
API.
Make `hs` a private member to dissuade people from using it (hopefully
they aren't already).
Add a couple of new methods (`record_user_external_id` and
`generate_short_term_login_token`).
This was ill-advised. We can't modify verify_keys here, because the response
object has already been signed by the requested key.
Furthermore, it's somewhat unnecessary because existing versions of Synapse
(which get upset that the notary key isn't present in verify_keys) will fall
back to a direct fetch via `/key/v2/server`.
Also: more tests for fetching keys via perspectives: it would be nice if we actually tested when our fetcher can't talk to our notary impl.
The mount in the form of ./matrix-config:/etc overwrites the contents of the container /etc folder. Since all valid ca certificates are stored in /etc, the synapse.push.httppusher, for example, cannot validate the certificate from matrix.org.
* Kill off redundant SynapseRequestFactory
We already get the Site via the Channel, so there's no need for a dedicated
RequestFactory: we can just use the right constructor.
* Workaround for error when fetching notary's own key
As a notary server, when we return our own keys, include all of our signing
keys in verify_keys.
This is a workaround for #6596.
Synapse 1.7.2 (2019-12-20)
==========================
This release fixes some regressions introduced in Synapse 1.7.0 and 1.7.1.
Bugfixes
--------
- Fix a regression introduced in Synapse 1.7.1 which caused errors when attempting to backfill rooms over federation. ([\#6576](https://github.com/matrix-org/synapse/issues/6576))
- Fix a bug introduced in Synapse 1.7.0 which caused an error on startup when upgrading from versions before 1.3.0. ([\#6578](https://github.com/matrix-org/synapse/issues/6578))
If acme was enabled, the sdnotify startup hook would never be run because we
would try to add it to a hook which had already fired.
There's no need to delay it: we can sdnotify as soon as we've started the
listeners.
* Remove redundant python2 support code
`str.decode()` doesn't exist on python3, so presumably this code was doing
nothing
* Filter out pushers with corrupt data
When we get a row with unparsable json, drop the row, rather than returning a
row with null `data`, which will then cause an explosion later on.
* Improve logging when we can't start a pusher
Log the ID to help us understand the problem
* Make email pusher setup more robust
We know we'll have a `data` member, since that comes from the database. What we
*don't* know is if that is a dict, and if that has a `brand` member, and if
that member is a string.
Previously we tried to be clever and filter out some unnecessary event
IDs to keep the auth chain small, but that had some annoying
interactions with state res v2 so we stop doing that for now.
Previously we tried to be clever and filter out some unnecessary event
IDs to keep the auth chain small, but that had some annoying
interactions with state res v2 so we stop doing that for now.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
Sometimes the filtering function can return a pruned version of an event (on top of either the event itself or an empty list), if it thinks the user should be able to see that there's an event there but not the content of that event. Therefore, the previous logic of 'if filtered is empty then we can use the event we retrieved from the database' is flawed, and we should use the event returned by the filtering function.
When we request the state/auth_events to populate a backwards extremity (on
backfill or in the case of missing events in a transaction push), we should
check that the returned events are in the right room rather than blindly using
them in the room state or auth chain.
Given that _get_events_from_store_or_dest takes a room_id, it seems clear that
it should be sanity-checking the room_id of the requested events, so let's do
it there.
Make it return the state *after* the requested event, rather than the one
before it. This is a bit easier and requires fewer calls to
get_events_from_store_or_dest.
PaginationHandler.get_messages is only called by RoomMessageListRestServlet,
which is async.
Chase the code path down from there:
- FederationHandler.maybe_backfill (and nested try_backfill)
- FederationHandler.backfill
`Measure` incorrectly assumed that it was the only thing being done by the parent `LoggingContext`. For instance, during a "renew group attestations" operation, hundreds of `outbound_request` calls could take place in parallel, all using the same `LoggingContext`. This would mean that any resources used during *any* of those calls would be reported against *all* of them, producing wildly inaccurate results.
Instead, we now give each `Measure` block its own `LoggingContext` (using the parent `LoggingContext` mechanism to ensure that the log lines look correct and that the metrics are ultimately propogated to the top level for reporting against requests/backgrond tasks).
have_events was a map from event_id to rejection reason (or None) for events
which are in our local database. It was used as filter on the list of
event_ids being passed into get_events_as_list. However, since
get_events_as_list will ignore any event_ids that are unknown or rejected, we
can equivalently just leave it to get_events_as_list to do the filtering.
That means that we don't have to keep `have_events` up-to-date, and can use
`have_seen_events` instead of `get_seen_events_with_rejection` in the one place
we do need it.
Implement part [MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228). The parts that differ are:
* the feature is hidden behind a configuration flag (`enable_ephemeral_messages`)
* self-destruction doesn't happen for state events
* only implement support for the `m.self_destruct_after` field (not the `m.self_destruct` one)
* doesn't send synthetic redactions to clients because for this specific case we consider the clients to be able to destroy an event themselves, instead we just censor it (by pruning its JSON) in the database
Purge jobs don't delete the latest event in a room in order to keep the forward extremity and not break the room. On the other hand, get_state_events, when given an at_token argument calls filter_events_for_client to know if the user can see the event that matches that (sync) token. That function uses the retention policies of the events it's given to filter out those that are too old from a client's view.
Some clients, such as Riot, when loading a room, request the list of members for the latest sync token it knows about, and get confused to the point of refusing to send any message if the server tells it that it can't get that information. This can happen very easily with the message retention feature turned on and a room with low activity so that the last event sent becomes too old according to the room's retention policy.
An easy and clean fix for that issue is to discard the room's retention policies when retrieving state.
Synapse 1.6.0rc2 (2019-11-25)
=============================
Bugfixes
--------
- Fix a bug which could cause the background database update hander for event labels to get stuck in a loop raising exceptions. ([\#6407](https://github.com/matrix-org/synapse/issues/6407))
Doing a SELECT DISTINCT when paginating is quite expensive, because it requires the engine to do sorting on the entire events table. However, we only need to run it if we're filtering on 2+ labels, so this PR is changing the request so that DISTINCT is only used then.
We were doing this in a number of places which meant that some login
code paths incremented the counter multiple times.
It was also applying ratelimiting to UIA endpoints, which was probably
not intentional.
In particular, some custom auth modules were calling
`check_user_exists`, which incremented the counters, meaning that people
would fail to login sometimes.
Fixes a bug where rejected events were persisted with the wrong state group.
Also fixes an occasional internal-server-error when receiving events over
federation which are rejected and (possibly because they are
backwards-extremities) have no prev_group.
Fixes#6289.
* Raise an exception if accessing state for rejected events
Add some sanity checks on accessing state_group etc for
rejected events.
* Skip calculating push actions for rejected events
It didn't actually cause any bugs, because rejected events get filtered out at
various later points, but there's not point in trying to calculate the push
actions for a rejected event.
When the `/keys/query` API is hit on client_reader worker Synapse may
decide that it needs to resync some remote deivces. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
While the current version of the spec doesn't say much about how this endpoint uses filters (see https://github.com/matrix-org/matrix-doc/issues/2338), the current implementation is that some fields of an EventFilter apply (the ones that are used when running the SQL query) and others don't (the ones that are used by the filter itself) because we don't call event_filter.filter(...). This seems counter-intuitive and probably not what we want so this commit fixes it.
The intention here is to make it clearer which fields we can expect to be
populated when: notably, that the _event_type etc aren't used for the
synchronous impl of EventContext.
* Add lint dependencies black, flake8 and isort
These are required when running the `lint.sh` dev scripts.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
* Add contributer docs for using the providers linters script
Add also to the pull request template to avoid build failures due
to people not knowing that linters need running.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
* Fix mention of linter errors correction
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add mention for installing linter dependencies
Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Remove linters from python dependencies as per PR review
Signed-off-by: Jason Robinson <jasonr@matrix.org>
The `http_proxy` and `HTTPS_PROXY` env vars can be set to a `host[:port]` value which should point to a proxy.
The address of the proxy should be excluded from IP blacklists such as the `url_preview_ip_range_blacklist`.
The proxy will then be used for
* push
* url previews
* phone-home stats
* recaptcha validation
* CAS auth validation
It will *not* be used for:
* Application Services
* Identity servers
* Outbound federation
* In worker configurations, connections from workers to masters
Fixes#4198.
* Offer the homeserver instance to the spam checker
* Newsfile
* Linting
* Expose a Spam Checker API instead of passing the homeserver object
* Alter changelog
* s/hs/api
This makes it easier to use in an async/await world.
Also fixes a bug where cache descriptors would occaisonally return a raw
value rather than a deferred.
* Don't use a virtualenv
* Generate the server's signing key to allow it to start
* Add signing key paths to CI configuration files
* Use a Python script to create the postgresql database
* Improve logging
This adds:
* a test sqlite database
* a configuration file for the sqlite database
* a configuration file for a postgresql database (using the credentials in `.buildkite/docker-compose.pyXX.pgXX.yaml`)
as well as a new script named `.buildkite/scripts/test_synapse_port_db.sh` that:
1. installs Synapse
2. updates the test sqlite database to the latest schema and runs background updates on it
3. creates an empty postgresql database
4. run the `synapse_port_db` script to migrate the test sqlite database to the empty postgresql database (with coverage)
Step `2` is done via a new script located at `scripts-dev/update_database`.
The test sqlite database is extracted from a SyTest run, so that it can be considered as an actual homeserver's database with actual data in it.
Make `synapse_port_db` correctly create indexes in the PostgreSQL database, by having it run the background updates on the database before migrating the data.
To ensure we're migrating the right data, also block the port if the SQLite3 database still has pending or ongoing background updates.
Fixes#4877
The synapse demo was a bit flakey in terms of supporting federation. It turns out that if your computer resolved `localhost` to `::1` instead of `127.0.0.1`, the built-in federation blacklist specified in `start.sh` would still block it, since it contained an entry for `::/127`. Removing this no longer prevents Synapse from contacting `::1`, federation works again on these boxes.
* Delete format_tap.py
This python implementation of a tap formatting library for buildkite has been
replaced with a perl implementation as part of the matrix-org/sytest repo,
which is specific to sytest's language, not that of any one homeserver's.
Turns out that loggers that are instantiated before the config is loaded get
turned off.
Also bring the logging config that is generated by --generate-config into line.
Fixes#6194.
Synapse 1.4.1 (2019-10-18)
==========================
No changes since 1.4.1rc1.
Synapse 1.4.1rc1 (2019-10-17)
=============================
Bugfixes
--------
- Fix bug where redacted events were sometimes incorrectly censored in the database, breaking APIs that attempted to fetch such events. ([\#6185](https://github.com/matrix-org/synapse/issues/6185), [5b0e9948](5b0e9948ea))
* Fix presence timeouts when synchrotron restarts.
Handling timeouts would fail if there was an external process that had
timed out, e.g. a synchrotron restarting. This was due to a couple of
variable name typoes.
Fixes#3715.
Synapse 1.4.1rc1 (2019-10-17)
=============================
Bugfixes
--------
- Fix bug where redacted events were sometimes incorrectly censored in the database, breaking APIs that attempted to fetch such events. ([\#6185](https://github.com/matrix-org/synapse/issues/6185), [5b0e9948](5b0e9948ea))
This fixed the weirdness of 400 vs 404 as http status code in the case
the filter id is not known by the server.
As e.g. matrix-js-sdk expects 404 to catch this situation this leads
to unwanted behaviour.
Hopefully this will fix the occasional failures we were seeing in the room directory.
The problem was that events are not necessarily persisted (and `current_state_delta_stream` updated) in the same order as their stream_id. So for instance current_state_delta 9 might be persisted *before* current_state_delta 8. Then, when the room stats saw stream_id 9, it assumed it had done everything up to 9, and never came back to do stream_id 8.
We can solve this easily by only processing up to the stream_id where we know all events have been persisted.
It turns out that _local_membership_update doesn't run when you join a new, remote room. It only runs if you're joining a room that your server already knows about. This would explain #4703 and #5295 and why the transfer would work in testing and some rooms, but not others. This would especially hit single-user homeservers.
The check has been moved to right after the room has been joined, and works much more reliably. (Though it may still be a bit awkward of a place).
More often than not passing bytes to `txn.execute` is a bug (where we
meant to pass a string) that just happens to work if `BYTEA_OUTPUT` is
set to `ESCAPE`. However, this is a bit of a footgun so we want to
instead error when this happens, and force using `bytearray` if we
actually want to use bytes.
* Fix /federation/v1/state for recent room versions
Turns out this endpoint was completely broken for v3 rooms. Hopefully this
re-signing code is irrelevant nowadays anyway.
Pillow will use nearest neighbour as the resampling algorithm if the
source image is either 1-bit or a color palette using 8 bits. If we
convert to RGB before scaling, we'll probably get a better result.
Replace `client_secret` query parameter values with `<redacted>` in the logs. Prevents a scenario where a MITM of server traffic can horde 3pids on their account.
We incorrectly used `room_id` as to bound the result set, even though we
order by `joined_members, room_id`, leading to incorrect results after
pagination.
Copy push rules during a room upgrade from the old room to the new room, instead of deleting them from the old room.
For instance, we've defined upgrading of a room multiple times to be possible, and push rules won't be transferred on the second upgrade if they're deleted during the first.
Also fix some missing yields that probably broke things quite a bit.
While this is not documented in the spec (but should be), Riot (and other clients) revoke 3PID invites by sending a m.room.third_party_invite event with an empty ({}) content to the room's state.
When the invited 3PID gets associated with a MXID, the identity server (which doesn't know about revocations) sends down to the MXID's homeserver all of the undelivered invites it has for this 3PID. The homeserver then tries to talk to the inviting homeserver in order to exchange these invite for m.room.member events.
When one of the invite is revoked, the inviting homeserver responds with a 500 error because it tries to extract a 'display_name' property from the content, which is empty. This might cause the invited server to consider that the server is down and not try to exchange other, valid invites (or at least delay it).
This fix handles the case of revoked invites by avoiding trying to fetch a 'display_name' from the original invite's content, and letting the m.room.member event fail the auth rules (because, since the original invite's content is empty, it doesn't have public keys), which results in sending a 403 with the correct error message to the invited server.
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.
To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
Joining against `events` and ordering by `stream_ordering` is
inefficient as it forced scanning the entirety of the redactions table.
This isn't the case if we use `redactions.received_ts` column as we can
then use an index.
Currently we don't set `have_censored` column if we don't have the
target event of a redaction, which means we repeatedly attempt to censor
the same non-existant event.
When we persist non-redacted events we unset the `have_censored` column
for any redactions that target said event.
Doing a password reset via SMS has never worked, and in any case is a silly
idea because msisdn recycling is a thing.
See also matrix-org/matrix-doc#2303.
Pull the checkers out to their own classes, rather than having them lost in a
massive 1000-line class which does everything.
This is also preparation for some more intelligent advertising of flows, as per #6100
We don't actually care about what happens in `get_user_by_req` and
having it as a separate span means that the entity tag isn't added to
the servlet spans, making it harder to search.
because, frankly, it looked like it was written by an axe-murderer.
This should be a non-functional change, except that where `m.login.dummy` was
previously advertised *before* `m.login.terms`, it will now be advertised
afterwards. AFAICT that should have no effect, and will be more consistent with
the flows that involve passing a 3pid.
Second part of solving #6076Fixes#6076
We return a submit_url parameter on calls to POST */msisdn/requestToken so that clients know where to submit token information to.
Uses a SimpleHttpClient instance equipped with the federation_ip_range_blacklist list for requests to identity servers provided by user input. Does not use a blacklist when contacting identity servers specified by account_threepid_delegates. The homeserver trusts the latter and we don't want to prevent homeserver admins from specifying delegates that are on internal IP addresses.
Fixes#5935
As MSC2263 states, m.require_identity_server must be set to false when it does not require an identity server to be provided by the client for the purposes of email registration or password reset.
Adds an m.require_identity_server flag to /versionss unstable_flags section. This will advertise that Synapse no longer needs id_server as a parameter.
This is a) simpler than querying user_ips directly and b) means we can
purge older entries from user_ips without losing the required info.
The storage functions now no longer return the access_token, since it
was unused.
Implements MSC2290. This PR adds two new endpoints, /unstable/account/3pid/add and /unstable/account/3pid/bind. Depending on the progress of that MSC the unstable prefix may go away.
This PR also removes the blacklist on some 3PID tests which occurs in #6042, as the corresponding Sytest PR changes them to use the new endpoints.
Finally, it also modifies the account deactivation code such that it doesn't just try to deactivate 3PIDs that were bound to the user's account, but any 3PIDs that were bound through the homeserver on that user's account.
Fixes#6066
This register endpoint should be disabled if registration is disabled, otherwise we're giving anyone the ability to check if a username exists on a server when we don't need to be.
Error code is 403 (Forbidden) as that's the same returned by /register when registration is disabled.
In ancient times Synapse would only send emails when it was notifying a user about a message they received...
Now it can do all sorts of neat things!
Change the logging so it's not just about notifications.
Remove trailing slash ability from the password reset submit_token endpoint. Since we provide the link in an email, and have never sent it with a trailing slash, there's no point for us to accept them on the endpoint.
The validation links sent via email had their query parameters inserted without any URL-encoding. Surprisingly this didn't seem to cause any issues, but if a user were to put a `/` in their client_secret it could lead to problems.
* Allow passing SYNAPSE_WORKER envvar
* changelog.d
* Document SYNAPSE_WORKER.
Attempting to imply that you don't need to change this default
unless you're in worker mode.
Also aware that there's a bigger problem of attempting to document
a complete working configuration of workers using docker, as we
currently only document to use `synctl` for worker mode, and synctl
doesn't work that way in docker.
Fixes a bug where the default attribute maps were prioritised over
user-specified ones, resulting in incorrect mappings.
The problem is that if you call SPConfig.load() multiple times, it adds new
attribute mappers to a list. So by calling it with the default config first,
and then the user-specified config, we would always get the default mappers
before the user-specified mappers.
To solve this, let's merge the config dicts first, and then pass them to
SPConfig.
* make it clear that if you installed from a package manager, you should use
that to upgrade
* Document the new way of getting the server version (cf #4878)
* Write some words about downgrading.
This is a partial revert of #5893. The problem is that if we drop these tables
in the same release as removing the code that writes to them, it prevents users
users from being able to roll back to a previous release.
So let's leave the tables in place for now, and remember to drop them in a
subsequent release.
(Note that these tables haven't been *read* for *years*, so any missing rows
resulting from a temporary upgrade to vNext won't cause a problem.)
Removes the POST method from `/password_reset/<medium>/submit_token/` as it's only used by phone number verification which Synapse does not support yet.
This is a potential solution to https://github.com/vector-im/riot-web/issues/3374
and https://github.com/vector-im/riot-web/issues/5953
as raised by Mozilla at https://github.com/vector-im/riot-web/issues/10868.
This lets you define a push rule action which increases the badge count (unread notification)
count on a given room, but doesn't actually send a push for that notification via email or HTTP.
We might want to define this as the default behaviour for group chats in future
to solve https://github.com/vector-im/riot-web/issues/3268 at last.
This is implemented as a string action rather than a tweak because:
* Other pushers don't care about the tweak, given they won't ever get pushed
* The DB can store the tweak more efficiently using the existing `notify` table.
* It avoids breaking the default_notif/highlight_action optimisations.
Clients which generate their own notifs (e.g. desktop notifs from Riot/Web
would need to be aware of the new push action) to uphold it.
An alternative way to do this would be to maintain a `msg_count` alongside
`highlight_count` and `notification_count` in `unread_notifications` in sync responses.
However, doing this by counting the rows in `events` since the `stream_position`
of the user's last read receipt turns out to be painfully slow (~200ms), perhaps
due to the size of the events table. So instead, we use the highly optimised
existing event_push_actions (and event_push_actions_staging) table to maintain
the counts - using the code paths which already exist for tracking unread
notification counts efficiently. These queries are typically ~3ms or so.
The biggest issues I see here are:
* We're slightly repurposing the `notif` field on `event_push_actions` to
track whether a given action actually sent a `push` or not. This doesn't
seem unreasonable, but it's slightly naughty given that previously the
field explicitly tracked whether `notify` was true for the action (and
as a result, it was uselessly always set to 1 in the DB).
* We're going to put more load on the `event_push_actions` table for all the
random group chats which people had previously muted. In practice i don't
think there are many of these though.
* There isn't an MSC for this yet (although this comment could become one).
3PID invites require making a request to an identity server to check that the invited 3PID has an Matrix ID linked, and if so, what it is.
These requests are being made on behalf of a user. The user will supply an identity server and an access token for that identity server. The homeserver will then forward this request with the access token (using an `Authorization` header) and, if the given identity server doesn't support v2 endpoints, will fall back to v1 (which doesn't require any access tokens).
Requires: ~~#5976~~
Broke in #5971
Basically the bug is that if get_current_state_deltas returns no new updates and we then take the max pos, its possible that we miss an update that happens in between the two calls. (e.g. get_current_state_deltas looks up to stream pos 5, then an event persists and so getting the max stream pos returns 6, meaning that next time we check for things with a stream pos bigger than 6)
We want to assign unique mxids to saml users based on an incrementing
suffix. For that to work, we need to record the allocated mxid in a separate
table.
This allows support users to be created even on MAU limits via
the admin API. Support users are excluded from MAU after creation,
so it makes sense to exclude them in creation - except if the
whole host is in disabled state.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
Some small fixes to `room_member.py` found while doing other PRs.
1. Add requester to the base `_remote_reject_invite` method.
2. `send_membership_event`'s docstring was out of date and took in a `remote_room_hosts` arg that was not used and no calling function provided.
Another small fixup noticed during work on a larger PR. The `origin` field of `add_display_name_to_third_party_invite` is not used and likely was just carried over from the `on_PUT` method of `FederationThirdPartyInviteExchangeServlet` which, like all other servlets, provides an `origin` argument.
Since it's not used anywhere in the handler function though, we should remove it from the function arguments.
Previously if the first registered user was a "support" or "bot" user,
when the first real user registers, the auto-join rooms were not
created.
Fix to exclude non-real (ie users with a special user type) users
when counting how many users there are to determine whether we should
auto-create a room.
Signed-off-by: Jason Robinson <jasonr@matrix.org>
This is a combination of a few different PRs, finally all being merged into `develop`:
* #5875
* #5876
* #5868 (This one added the `/versions` flag but the flag itself was actually [backed out](891afb57cb (diff-e591d42d30690ffb79f63bb726200891)) in #5969. What's left is just giving /versions access to the config file, which could be useful in the future)
* #5835
* #5969
* #5940
Clients should not actually use the new registration functionality until https://github.com/matrix-org/synapse/pull/5972 is merged.
UPGRADE.rst, changelog entries and config file changes should all be reviewed closely before this PR is merged.
* Ensure the list media admin API is always available
This API is required for some external media repo implementations to operate (mostly for doing quarantine operations on a room).
* changelog
Remove all the "double return" statements which were a result of us removing all the instances of
```
defer.returnValue(...)
return
```
statements when we switched to python3 fully.
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
Template config files
* Imagine a system composed entirely of x, y, z etc and the basic operations..
Wait George, why XOR? Why not just neq?
George: Eh, I didn't think of that..
Co-Authored-By: Erik Johnston <erik@matrix.org>
Some of the caches on worker processes were not being correctly invalidated
when a room's state was changed in a way that did not affect the membership
list of the room.
We need to make sure we send out cache invalidations even when no memberships
are changing.
Propagate opentracing contexts across workers
Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
extract methods. - useful when injecting into carriers
locally. Otherwise we'd always have to include our
own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
from a request to a servlet
This type of registration was probably never used. It only includes the
user name in the HMAC but not the password.
Shared secret registration is still available via
client/r0/admin/register.
Signed-off-by: Manuel Stahl <manuel.stahl@awesome-technologies.de>
There's no point doing a raise_from here, because the exception is always
logged at warn with no stacktrace in the caller. Instead, let's try to give
better messages to reduce confusion.
In particular, this means that we won't log 'Failed to connect to remote
server' when we don't even attempt to connect to the remote server due to
blacklisting.
Get rid of the labyrinthine `recoverer_fn` code, and clean up the startup code
(it seemed to be previously inexplicably split between
`ApplicationServiceScheduler.start` and `_Recoverer.start`).
Add some docstrings too.
Hopefully, this will fix a stack overflow when recovering an appservice.
The recursion here leads to a huge chain of deferred callbacks, which then
overflows the stack when the chain completes. `inlineCallbacks` makes a better
job of this if we use iteration instead.
Clean up the code a bit too, while we're there.
Get rid of the labyrinthine `recoverer_fn` code, and clean up the startup code
(it seemed to be previously inexplicably split between
`ApplicationServiceScheduler.start` and `_Recoverer.start`).
Add some docstrings too.
Add authenticated_entity and servlet_names tags.
Functionally:
- Add a tag for authenticated_entity
- Add a tag for servlet_names
Stylistically:
Moved to importing methods directly from opentracing.
Fixes#5833
The emailconfig code was attempting to pull incorrect config file names. This corrects that, while also marking a difference between a config file variable that's a filepath versus a str containing HTML.
This refactors MatrixFederationAgent to move the SRV lookup into the
endpoint code, this has two benefits:
1. Its easier to retry different host/ports in the same way as
HostnameEndpoint.
2. We avoid SRV lookups if we have a free connection in the pool
If we have recently seen a valid well-known for a domain we want to
retry on (non-final) errors a few times, to handle temporary blips in
networking/etc.
is cached and so does not always return a `Deferred`.
`await` does not silently pass-through non-Deferreds like `yield` used to.
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
This gives a bit of a grace period where we can attempt to refetch a
remote `well-known`, while still using the cached result if that fails.
Hopefully this will make the well-known resolution a bit more torelant
of failures, rather than it immediately treating failures as "no result"
and caching that for an hour.
* allow devices to be marked as "hidden"
This is a prerequisite for cross-signing, as it allows us to create other things
that live within the device namespace, so they can be used for signatures.
It costs both us and the remote server for us to fetch the well known
for every single request we send, so we add a minimum cache period. This
is set to 5m so that we still honour the basic premise of "refetch
frequently".
When persisting events we calculate new stream orderings up front.
Before we notify about an event all events with lower stream orderings
must have finished being persisted.
This PR moves the assignment of stream ordering till *after* calculated
the new current state and split the batch of events into separate chunks
for persistence. This means that if it takes a long time to calculate
new current state then it will not block events in other rooms being
notified about.
This should help reduce some global pauses in the events stream which
can last for tens of seconds (if not longer), caused by some
particularly expensive state resolutions.
This hopefully addresses #5407 by gracefully handling an empty but
limited TimelineBatch. We also add some logging to figure out how this
is happening.
This is intended as an amendment to #5674 as using M_UNKNOWN as the errcode makes it hard for clients to differentiate between an invalid password and a deactivated user (the problem we were trying to solve in the first place).
M_UNKNOWN was originally chosen as it was presumed than an MSC would have to be carried out to add a new code, but as Synapse often is the testing bed for new MSC implementations, it makes sense to try it out first in the wild and then add it into the spec if it is successful. Thus this PR return a new M_USER_DEACTIVATED code when a deactivated user attempts to login.
The double posting is really annoying, and I don't think anyone is
actually reading them. The commit statuses should give a good summary
and will link to a full report.
The `expire_access_token` didn't do what it sounded like it should do. What it
actually did was make Synapse enforce the 'time' caveat on macaroons used as
access tokens, but since our access token macaroons never contained such a
caveat, it was always a no-op.
(The code to add 'time' caveats was removed back in v0.18.5, in #1656)
* Fix debian packages for sid being called buster.
I don't know why the sid images return buster as its codename in
`lsb_release` but it does, so lets just grab the codename from the
distro we pass into dockerfile
* Newsfile
Annoyingly, `current_state_events` table can include rejected events,
in which case the membership column will be null. To work around this
lets just always filter out null membership for now.
There was some inconsistent behaviour in the caching layer around how
exceptions were handled - particularly synchronously-thrown ones.
This seems to be most easily handled by pushing the creation of
ObservableDeferreds down from CacheDescriptor to the Cache.
This is a prerequisite for cross-signing, as it allows us to create other things
that live within the device namespace, so they can be used for signatures.
`None` is not a valid event id, so queuing up a database fetch for it seems
like a silly thing to do.
I considered making `get_event` return `None` if `event_id is None`, but then
its interaction with `allow_none` seemed uninituitive, and strong typing ftw.
This will allow us to efficiently filter out rooms that have been
forgotten in other queries without having to join against the
`room_memberships` table.
* Add decerators for tracing functions
* Use the new clean contexts
* Context and edu utils
* Move opentracing setters
* Move whitelisting
* Sectioning comments
* Better args wrapper
* Docstrings
Co-Authored-By: Erik Johnston <erik@matrix.org>
* Remove unused methods.
* Don't use global
* One tracing decorator to rule them all.
* Refactor Keyring._start_key_lookups
There's an awful lot of deferreds and dictionaries flying around here. The
whole thing can be made much simpler and achieve the same effect.
* Add a delay to key lookup lock release to fix stack overflow
A tactical call_later here should fix#5723
* changelog
The version of a module isn't going to change over the lifetime of the
process (assuming no funky hot reloading is going on, which it isn't),
so let's just cache the result to avoid spawning lots of git
subprocesses.
Fixes#5672.
* Opentracing survival guide
* Update decorator names in doc
* Doc cleanup
These are all alterations as a result of comments in #5703, it
includes mostly typos and clarifications. The most interesting
changes are:
- Split developer and user docs into two sections
- Add a high level description of OpenTracing
* newsfile
* Move contributer specific info to docstring.
* Sample config.
* Trailing whitespace.
* Update 5703.misc
* Apply suggestions from code review
Mostly just rewording parts of the docs for clarity.
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Clean up config settings and dead code.
This is mostly about cleaning up the config format, to bring it into line with our conventions. In particular:
* There should be a blank line after `## Section ##' headings
* There should be a blank line between each config setting
* There should be a `#`-only line between a comment and the setting it describes
* We don't really do the `# #` style commenting-out of whole sections if we can help it
* rename `tracer_enabled` to `enabled`
While we're here, do more config parsing upfront, which makes it easier to use
later on.
Also removes redundant code from LogContextScopeManager.
Also changes the changelog fragment to a `feature` - it's exciting!
* Convert BaseFederationServlet._wrap to async
Empirically, this fixes some lost stacktraces. It should be safe because the
wrapped function is called from JsonResource._async_render, which is already
async.
* Convert the rest of synapse.federation.transport.server to async
We may as well do the whole file while we're here.
* changelog
* flake8
This is basically a contrived way of adding a `Recommends` on `libpq5`, to fix#5653.
The way this is supposed to happen in debhelper is to run
`dh_shlibdeps`, which in turn runs `dpkg-shlibdeps`, which spits things out
into `debian/<package>.substvars` whence they can later be included by
`control`.
Previously, we had disabled `dh_shlibdeps`, mostly because `dpkg-shlibdeps`
gets confused about PIL's interdependent objects, but that's not really the
right thing to do and there is another way to work around that.
Since we don't always use postgres, we don't necessarily want a hard Depends on
libpq5, so I've actually ended up adding an explicit invocation of
`dpkg-shlibdeps` for `psycopg2`.
I've also updated the build-depends list for the package, which was missing a
couple of entries.
We can now use `_get_events_from_cache_or_db` rather than going right back to
the database, which means that (a) we can benefit from caching, and (b) it
opens the way forward to more extensive checks on the original event.
We now always require the original event to exist before we will serve up a
redaction.
Ensures that redactions are correctly authenticated for recent room versions.
There are a few things going on here:
* `_fetch_event_rows` is updated to return a dict rather than a list of rows.
* Rather than returning multiple copies of an event which was redacted
multiple times, it returns the redactions as a list within the dict.
* It also returns the actual rejection reason, rather than merely the fact
that it was rejected, so that we don't have to query the table again in
`_get_event_from_row`.
* The redaction handling is factored out of `_get_event_from_row`, and now
checks if any of the redactions are valid.
A couple of changes here:
* get rid of a redundant `allow_rejected` condition - we should already have filtered out any rejected
events before we get to that point in the code, and the redundancy is confusing. Instead, let's stick in
an assertion just to make double-sure we aren't leaking rejected events by mistake.
* factor out a `_get_events_from_cache_or_db` method, which is going to be important for a
forthcoming fix to redactions.
Alpine Linux 3.8 is still supported, but it seems like
it's quite outdated now.
While Python should be the same on both, all other libraries, etc.,
are much newer in Alpine 3.9 and 3.10.
Signed-off-by: Slavi Pantaleev <slavi@devture.com>
First of all, let's get rid of `TOKEN_NOT_FOUND_HTTP_STATUS`. It was a hack we
did at one point when it was possible to return either a 403 or a 401 if the
creds were missing. We always return a 401 in these cases now (thankfully), so
it's not needed.
Let's also stop abusing `AuthError` for these cases. Honestly they have nothing
that relates them to the other places that `AuthError` is used, other than the
fact that they are loosely under the 'Auth' banner. It makes no sense for them
to share exception classes.
Instead, let's add a couple of new exception classes: `InvalidClientTokenError`
and `MissingClientTokenError`, for the `M_UNKNOWN_TOKEN` and `M_MISSING_TOKEN`
cases respectively - and an `InvalidClientCredentialsError` base class for the
two of them.
* Configure and initialise tracer
Includes config options for the tracer and sets up JaegerClient.
* Scope manager using LogContexts
We piggy-back our tracer scopes by using log context.
The current log context gives us the current scope. If new scope is
created we create a stack of scopes in the context.
* jaeger is a dependency now
* Carrier inject and extraction for Twisted Headers
* Trace federation requests on the way in and out.
The span is created in _started_processing and closed in
_finished_processing because we need a meaningful log context.
* Create logcontext for new scope.
Instead of having a stack of scopes in a logcontext we create a new
context for a new scope if the current logcontext already has a scope.
* Remove scope from logcontext if logcontext is top level
* Disable tracer if not configured
* typo
* Remove dependence on jaeger internals
* bools
* Set service name
* :Explicitely state that the tracer is disabled
* Black is the new black
* Newsfile
* Code style
* Use the new config setup.
* Generate config.
* Copyright
* Rename config to opentracing
* Remove user whitelisting
* Empty whitelist by default
* User ConfigError instead of RuntimeError
* Use isinstance
* Use tag constants for opentracing.
* Remove debug comment and no need to explicitely record error
* Two errors a "s(c)entry"
* Docstrings!
* Remove debugging brainslip
* Homeserver Whitlisting
* Better opentracing config comment
* linting
* Inclue worker name in service_name
* Make opentracing an optional dependency
* Neater config retreival
* Clean up dummy tags
* Instantiate tracing as object instead of global class
* Inlcude opentracing as a homeserver member.
* Thread opentracing to the request level
* Reference opetnracing through hs
* Instantiate dummy opentracin g for tests.
* About to revert, just keeping the unfinished changes just in case
* Revert back to global state, commit number:
9ce4a3d9067bf9889b86c360c05ac88618b85c4f
* Use class level methods in tracerutils
* Start and stop requests spans in a place where we
have access to the authenticated entity
* Seen it, isort it
* Make sure to close the active span.
* I'm getting black and blue from this.
* Logger formatting
Co-Authored-By: Erik Johnston <erik@matrix.org>
* Outdated comment
* Import opentracing at the top
* Return a contextmanager
* Start tracing client requests from the servlet
* Return noop context manager if not tracing
* Explicitely say that these are federation requests
* Include servlet name in client requests
* Use context manager
* Move opentracing to logging/
* Seen it, isort it again!
* Ignore twisted return exceptions on context exit
* Escape the scope
* Scopes should be entered to make them useful.
* Nicer decorator names
* Just one init, init?
* Don't need to close something that isn't open
* Docs make you smarter
this is only used in one place, so it's clearer if we inline it and reduce the
API surface.
Also, fixes a buglet where we would create an access token even if we were
about to block the user (we would never return the AT, so the user could never
use it, but it was still created and added to the db.)
A fix for PR #5626, which returned the original event content as part of a call to /relations.
Only problem was that we were attempting to aggregate the relations on top of it when we did so. We now set bundle_aggregations to False in the get_event call.
We also do this when pulling the relation events as well, because edits of edits are not something we'd like to support here.
FederationDeniedError is a subclass of SynapseError, which is a subclass of
CodeMessageException, so if e is a FederationDeniedError, then this check for
FederationDeniedError will never be reached since it will be caught by the
check for CodeMessageException above. The check for CodeMessageException does
almost the same thing as this check (since FederationDeniedError initialises
with code=403 and msg="Federation denied with %s."), so may as well just keep
allowing it to handle this case.
When asking for the relations of an event, include the original event in the response. This will mostly be used for efficiently showing edit history, but could be useful in other circumstances.
Nothing uses this now, so we can remove the dead code, and clean up the
API.
Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
When a user creates an account and the 'require_auth_for_profile_requests' config flag is set, and a client that performed the registration wants to lookup the newly-created profile, the request will be denied because the user doesn't share a room with themselves yet.
This has never been documented, and I'm not sure it's ever been used outside
sytest.
It's quite a lot of poorly-maintained code, so I'd like to get rid of it.
For now I haven't removed the database table; I suggest we leave that for a
future clearout.
- Put the default window_size back to 1000ms (broken by #5181)
- Make the `rc_federation` config actually do something
- fix an off-by-one error in the 'concurrent' limit
- Avoid creating an unused `_PerHostRatelimiter` object for every single
incoming request
The runtime errors that dealt with local email password resets talked about config options that users may not even have in their config file yet (if upgrading). Instead, the cryptic errors are now replaced with hopefully much more helpful ones.
* SAML2 Improvements and redirect stuff
Signed-off-by: Alexander Trost <galexrt@googlemail.com>
* Code cleanups and simplifications.
Also: share the saml client between redirect and response handlers.
* changelog
* Revert redundant changes to static js
* Move all the saml stuff out to a centralised handler
* Add support for tracking SAML2 sessions.
This allows us to correctly handle `allow_unsolicited: False`.
* update sample config
* cleanups
* update sample config
* rename BaseSSORedirectServlet for consistency
* Address review comments
This would cause emails being sent, but Synapse responding with a 429 when creating the event. The client would then retry, and with bad timing the same scenario would happen again. Some testing I did ended up sending me 10 emails for one single invite because of this.
This is mostly a documentation change, but also adds a default value for
SYNAPSE_CONFIG_PATH, so that running from the generated config is the default,
and will Just Work provided your config is in the right place.
When running under docker, we want to use docker's own logging stuff rather
than losing the logs somewhere on the container's filesystem, so let's use log
configs that spit logs out to stdout instead.
We don't want to generate any missing configs when running from a precanned
config.
(There's a strong argument that we don't want to do this at all, since
generating a new signing key on each invocation sounds disasterous, but I don't
fancy unpicking that for now.)
When a client asks for users whose devices have changed since a token we
used to pull *all* users from the database since the token, which could
easily be thousands of rows for old tokens.
This PR changes this to only check for changes for users the client is
actually interested in.
Fixes#5553
Closes#4583
Does slightly less than #5045, which prevented a room from being upgraded multiple times, one after another. This PR still allows that, but just prevents two from happening at the same time.
Mostly just to mitigate the fact that servers are slow and it can take a moment for the room upgrade to actually complete. We don't want people sending another request to upgrade the room when really they just thought the first didn't go through.
Updates the v1.0.0 changelog to provide more information to those upgrading to v1.0 on why they may receive errors or find their password reset abilities have now been disabled.
Helps address #5444
If no `from` param is specified we calculate and use the "current
token" that inlcuded typing, presence, etc. These are unused during
pagination and are not available on workers, so we simply don't
calculate them.
* group the arguments together into a group
* add new names "--generate-missing-config" and "--config-directory" for
existing cmdline options "--generate-keys" and "--keys-dir", which better
reflect their purposes.
Fixes https://github.com/matrix-org/synapse/issues/5431
`jinja2` was being imported even when it wasn't strictly necessary. This made it required to run Synapse, even if the functionality that required it wasn't enabled. This was causing new Synapse installations to crash on startup.
Email modules are now required.
There is a README.txt which always sets off this warning, which is a bit
alarming when you first start synapse. I don't think we need to warn about
this.
If, for some reason, presence updates take a while to persist then it
can trigger clients to tightloop calling `/sync` due to the presence
handler returning updates but not advancing the stream token.
Fixes#5503.
Fixes intermittent errors observed on Apple hardware which were caused by
time.clock() appearing to go backwards when called from different threads.
Also fixes a bug where database activity times were logged as 1/1000 of their
correct ratio due to confusion between milliseconds and seconds.
I had to add quite a lot of logging to diagnose a problem with 3pid
invites - we only logged the one failure which isn't all that
informative.
NB. I'm not convinced the logic of this loop is right: I think it
should just accept a single valid signature from a trusted source
rather than fail if *any* signature is invalid. Also it should
probably not skip the rest of middle loop if a check fails? However,
I'm deliberately not changing the logic here.
Adds new config option `cleanup_extremities_with_dummy_events` which
periodically sends dummy events to rooms with more than 10 extremities.
THIS IS REALLY EXPERIMENTAL.
Configure the demo servers to use untrusted
tls certs so that they communicate with each other.
This configuration makes them very unsafe so I've added warnings about
it in the readme.
Fixes that when a user exchanges a 3PID invite for a proper invite over
federation it does not include the `invite_room_state` key.
This was due to synapse incorrectly sending out two invite requests.
* remove 2.7 from CI and publishing
* fill out classifiers and also make it not be installed on 3.5
* some minor bumps so that the old deps work on python 3.5
Moves the warning about password resets being disabled to the point where a user actually tries to reset their password. Is this an appropriate place for it to happen?
Also removed the disabling of msisdn password resets when you don't have an email config, as that just doesn't make sense.
Also change the error a user receives upon disabled passwords to specify that only email-based password reset is disabled.
If we try and send a transaction with lots of EDUs and we run out of
space, we call get_new_device_msgs_for_remote with a limit of 0, which
then failed.
Some keys are stored in the synapse database with a null valid_until_ms
which caused an exception to be thrown when using that key. We fix this
by treating nulls as zeroes, i.e. they keys will match verification
requests with a minimum_valid_until_ms of zero (i.e. don't validate ts)
but will not match requests with a non-zero minimum_valid_until_ms.
Fixes#5391.
It's not really a problem to trust notary responses signed by the old key so
long as we are also doing TLS validation.
This commit adds a check to the config parsing code at startup to check that
we do not have the insecure matrix.org key without tls validation, and refuses
to start without it.
This allows us to remove the rather alarming-looking warning which happens at
runtime.
When we try and calculate a description for a room for with no name but
multiple other users we threw an exception (due to trying to subscript
result of `dict.values()`).
Sometimes the build agents get lost or die (error codes -1 and 2). Retry automatically a maximum of 2 times if this happens.
Error code reference:
* -1: Agent was lost
* 0: Build successful
* 1: There was an error in your code
* 2: The build stopped abruptly
* 255: The build was cancelled
Sends password reset emails from the homeserver instead of proxying to the identity server. This is now the default behaviour for security reasons. If you wish to continue proxying password reset requests to the identity server you must now enable the email.trust_identity_server_for_password_resets option.
This PR is a culmination of 3 smaller PRs which have each been separately reviewed:
* #5308
* #5345
* #5368
There are a few changes going on here:
* We make checking the signature on a key server response optional: if no
verify_keys are specified, we trust to TLS to validate the connection.
* We change the default config so that it does not require responses to be
signed by the old key.
* We replace the old 'perspectives' config with 'trusted_key_servers', which
is also formatted slightly differently.
* We emit a warning to the logs every time we trust a key server response
signed by the old key.
* Fix background updates to handle redactions/rejections
In background updates based on current state delta stream we need to
handle that we may not have all the events (or at least that
`get_events` may raise an exception).
Also:
* rename VerifyKeyRequest->VerifyJsonRequest
* calculate key_ids on VerifyJsonRequest construction
* refactor things to pass around VerifyJsonRequests instead of 4-tuples
FederationClient.get_pdu is called in a loop to fetch a batch of PDUs. A
failure to fetch one should not result in a failure of the whole batch. Add the
missing `continue`.
We have too many things called get_event, and it's hard to figure out what we
mean. Also remove some unused params from the signature, and add some logging.
It takes at least 20 minutes to work through the long_retries schedule (11
attempts, each with a 60 second timeout, and 60 seconds between each request),
so if the notary server isn't returning within the timeout, we'll just end up
blocking whatever request is happening for 20 minutes.
Ain't nobody got time for that.
When handling incoming federation requests, make sure that we have an
up-to-date copy of the signing key.
We do not yet enforce the validity period for event signatures.
When processing an incoming event over federation, we may try and
resolve any unexpected differences in auth events. This is a
non-essential process and so should not stop the processing of the event
if it fails (e.g. due to the remote disappearing or not implementing the
necessary endpoints).
Fixes#3330
We have to do this by re-inserting a background update and recreating
tables, as the tables only get created during a background update and
will later be deleted.
We also make sure that we remove any entries that should have been
removed but weren't due to a race that has been fixed in a previous
commit.
The verify_request deferred already returns a suitable SynapseError, so I don't
really know what we expect to achieve by doing more wrapping, other than log
spam.
Fixes#4278.
When we receive a soft failed event we, correctly, *do not* update the
forward extremity table with the event. However, if we later receive an
event that references the soft failed event we then need to remove the
soft failed events prev events from the forward extremities table,
otherwise we just build up forward extremities.
Fixes#5269
This fixes a bug which were causing the "event_format" field to be
ignored in the filter of requests to the `/messages` endpoint of the
CS API.
Signed-off-by: Eisha Chen-yen-su <chenyensu0@gmail.com>
When we receive a soft failed event we, correctly, *do not* update the
forward extremity table with the event. However, if we later receive an
event that references the soft failed event we then need to remove the
soft failed events prev events from the forward extremities table,
otherwise we just build up forward extremities.
Fixes#5269
When enabling the account validity feature, Synapse will look at startup for registered account without an expiration date, and will set one equals to 'now + validity_period' for them. On large servers, it can mean that a large number of users will have the same expiration date, which means that they will all be sent a renewal email at the same time, which isn't ideal.
In order to mitigate this, this PR allows server admins to define a 'max_delta' so that the expiration date is a random value in the [now + validity_period ; now + validity_period + max_delta] range. This allows renewal emails to be progressively sent over a configured period instead of being sent all in one big batch.
The list of server names was redundant, since it was equivalent to the keys on
the server_to_deferred map. This reduces the number of large lists being passed
around, and has the benefit of deduplicating the entries in `wait_on`.
Replaces DEFAULT_ROOM_VERSION constant with a method that first checks the config, then returns a hardcoded value if the option is not present.
That hardcoded value is now located in the server.py config file.
Rather than have three methods which have to have the same interface,
factor out a separate interface which is provided by three implementations.
I find it easier to grok the code this way.
This is a first step to checking that the key is valid at the required moment.
The idea here is that, rather than passing VerifyKey objects in and out of the
storage layer, we instead pass FetchKeyResult objects, which simply wrap the
VerifyKey and add a valid_until_ts field.
* Pass time_added_ms into process_v2_response
* Simplify process_v2_response
We can merge old_verify_keys into verify_keys, and reduce the number of dicts
flying around.
Storing server keys hammered the database a bit. This replaces the
implementation which stored a single key, with one which can do many updates at
once.
I was staring at this function trying to figure out wtf it was actually
doing. This is (hopefully) a non-functional refactor which makes it a bit
clearer.
If we remove support for a particular room version, we should behave more
gracefully. This should make client requests fail with a 400 rather than a 500,
and will ignore individiual PDUs in a federation transaction, rather than the
whole transaction.
This reverts commit ce5bcefc60.
This caused:
```
Traceback (most recent call last):
File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/synapse/src/synapse/app/client_reader.py", line 32, in <module>
from synapse.replication.slave.storage import SlavedProfileStore
ImportError: cannot import name 'SlavedProfileStore' from 'synapse.replication.slave.storage' (/home/synapse/src/synapse/replication/slave/storage/__init__.py)
error starting synapse.app.client_reader('/home/synapse/config/workers/client_reader.yaml') (exit code: 1); see above for logs
```
If account validity is enabled in the server's configuration, this job will run at startup as a background job and will stick an expiration date to any registered account missing one.
We checked that 3pids were not already in use before we checked if
we were going to return the account previously registered in the
same UI auth session, in which case the 3pids will definitely
be in use.
https://github.com/vector-im/riot-web/issues/9586
Prevents a SynapseError being raised inside of a IResolutionReceiver and instead opts to just return 0 results. This thus means that we have to lump a failed lookup and a blacklisted lookup together with the same error message, but the substitute should be generic enough to cover both cases.
It's more natural for the user if the bit that takes them away
from the registration flow comes last. Adding the dummy stage allows
us to do the stages in this order without the ambiguity.
This allows the client to complete the email last which is more
natual for the user. Without this stage, if the client would
complete the recaptcha (and terms, if enabled) stages and then the
registration request would complete because you've now completed a
flow, even if you were intending to complete the flow that's the
same except has email auth at the end.
Adding a dummy auth stage to the recaptcha-only flow means it's
always unambiguous which flow the client was trying to complete.
Longer term we should think about changing the protocol so the
client explicitly says which flow it's trying to complete.
vector-im/riot-web#9586
This allows the client to complete the email last which is more
natual for the user. Without this stage, if the client would
complete the recaptcha (and terms, if enabled) stages and then the
registration request would complete because you've now completed a
flow, even if you were intending to complete the flow that's the
same except has email auth at the end.
Adding a dummy auth stage to the recaptcha-only flow means it's
always unambiguous which flow the client was trying to complete.
Longer term we should think about changing the protocol so the
client explicitly says which flow it's trying to complete.
https://github.com/vector-im/riot-web/issues/9586
* Add AllowEncodedSlashes to apache
Add `AllowEncodedSlashes On` to apache config to support encoding for v3 rooms. "The AllowEncodedSlashes setting is not inherited by virtual hosts, and virtual hosts are used in many default Apache configurations, such as the one in Ubuntu. The workaround is to add the AllowEncodedSlashes setting inside a <VirtualHost> container (/etc/apache2/sites-available/default in Ubuntu)." Source: https://stackoverflow.com/questions/4390436/need-to-allow-encoded-slashes-on-apache
* change allowencodedslashes to nodecode
This commit adds two config options:
* `restrict_public_rooms_to_local_users`
Requires auth to fetch the public rooms directory through the CS API and disables fetching it through the federation API.
* `require_auth_for_profile_requests`
When set to `true`, requires that requests to `/profile` over the CS API are authenticated, and only returns the user's profile if the requester shares a room with the profile's owner, as per MSC1301.
MSC1301 also specifies a behaviour for federation (only returning the profile if the server asking for it shares a room with the profile's owner), but that's currently really non-trivial to do in a not too expensive way. Next step is writing down a MSC that allows a HS to specify which user sent the profile query. In this implementation, Synapse won't send a profile query over federation if it doesn't believe it already shares a room with the profile's owner, though.
Groups have been intentionally omitted from this commit.
This endpoint isn't much use for its intended purpose if you first need to get
yourself an admin's auth token.
I've restricted it to the `/_synapse/admin` path to make it a bit easier to
lock down for those concerned about exposing this information. I don't imagine
anyone is using it in anger currently.
Synapse 0.99.3.2 (2019-05-03)
=============================
Internal Changes
----------------
- Ensure that we have `urllib3` <1.25, to resolve incompatibility with `requests`. ([\#5135](https://github.com/matrix-org/synapse/issues/5135))
psycopg 2.8 is now out, which means that the C library gets built from source,
so we now need libpq-dev when building.
Turns out the need for this package is already documented in
docs/postgres.rst.
Using systemd-python allows for logging to the systemd journal,
as is documented in: `synapse/contrib/systemd/log_config.yaml`.
Signed-off-by: Silke Hofstra <silke@slxh.eu>
Currently if a call to `/get_missing_events` fails we log an exception
and stop processing the top level event we received over federation.
Instead let's try and handle it sensibly given it is a somewhat expected
failure mode.
We need to drop tables in the correct order due to foreign table
constraints (on `application_services`), otherwise the DROP TABLE
command will fail.
Introduced in #4992.
There's no point in collecting a merged dict of keys: it is sufficient to
consider just the new keys which have been fetched by the most recent
key_fetch_fns.
Make this just return the key dict, rather than a single-entry dict mapping the
server name to the key dict. It's easy for the caller to get the server name
from from the response object anyway.
**IF YOU HAVE SUPPORT QUESTIONS ABOUT RUNNING OR CONFIGURING YOUR OWN HOME SERVER**, please ask in **[#synapse:matrix.org](https://matrix.to/#/#synapse:matrix.org)** (using a matrix.org account if necessary).
If you want to report a security issue, please see https://matrix.org/security-disclosure-policy/
This is a bug report form. By following the instructions below and completing the sections with your information, you will help the us to get all the necessary data to fix your issue.
You can also preview your report before submitting it.
- type:textarea
id:description
attributes:
label:Description
description:Describe the problem that you are experiencing
validations:
required:true
- type:textarea
id:reproduction_steps
attributes:
label:Steps to reproduce
description:|
Describe the series of steps that leads you to the problem.
Describe how what happens differs from what you expected.
placeholder:Tell us what you see!
value:|
- list the steps
- that reproduce the bug
- using hyphens as bullet points
validations:
required:true
- type:markdown
attributes:
value:|
---
**IMPORTANT**: please answer the following questions, to help us narrow down the problem.
- type:input
id:homeserver
attributes:
label:Homeserver
description:Which homeserver was this issue identified on? (matrix.org, another homeserver, etc)
validations:
required:true
- type:input
id:version
attributes:
label:Synapse Version
description:|
What version of Synapse is this homeserver running?
You can find the Synapse version by visiting https://yourserver.example.com/_matrix/federation/v1/version
<!-- Please read CONTRIBUTING.rst before submitting your pull request -->
<!-- Please read https://matrix-org.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request -->
* [ ] Pull request is based on the develop branch
* [ ] Pull request includes a [changelog file](https://github.com/matrix-org/synapse/blob/master/CONTRIBUTING.rst#changelog)
* [ ] Pull request includes a [sign off](https://github.com/matrix-org/synapse/blob/master/CONTRIBUTING.rst#sign-off)
* [ ] Pull request includes a [changelog file](https://matrix-org.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
* [ ] Pull request includes a [sign off](https://matrix-org.github.io/synapse/latest/development/contributing_guide.html#sign-off)
* [ ] [Code style](https://matrix-org.github.io/synapse/latest/code_style.html) is correct
(run the [linters](https://matrix-org.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
[**#matrix:matrix.org**](https://matrix.to/#/#matrix:matrix.org) is the official support room for Matrix, and can be accessed by any client from https://matrix.org/docs/projects/try-matrix-now.html
It can also be access via IRC bridge at irc://irc.freenode.net/matrix or on the web here: https://webchat.freenode.net/?channels=matrix
[**#synapse:matrix.org**](https://matrix.to/#/#synapse:matrix.org) is the official support room for
Synapse, and can be accessed by any client from https://matrix.org/docs/projects/try-matrix-now.html.
Please ask for support there, rather than filing github issues.
- # "pip" is the correct setting for poetry, per https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem
**The single biggest thing you need to know is: please base your changes on
the develop branch - /not/ master.**
We use the master branch to track the most recent release, so that folks who
blindly clone the repo and automatically check out master get something that
works. Develop is the unstable branch where all the development actually
happens: the workflow is that contributors should fork the develop branch to
make a 'feature' branch for a particular contribution, and then make a pull
request to merge this back into the matrix.org 'official' develop branch. We
use github's pull request workflow to review the contribution, and either ask
you to make any refinements needed or merge it and make them ourselves. The
changes will then land on master when we next do a release.
We use `CircleCI <https://circleci.com/gh/matrix-org>`_ and `Travis CI
<https://travis-ci.org/matrix-org/synapse>`_ for continuous integration. All
pull requests to synapse get automatically tested by Travis and CircleCI.
If your change breaks the build, this will be shown in GitHub, so please
keep an eye on the pull request for feedback.
To run unit tests in a local development environment, you can use:
-``tox -e py27`` (requires tox to be installed by ``pip install tox``) for
SQLite-backed Synapse on Python 2.7.
-``tox -e py35`` for SQLite-backed Synapse on Python 3.5.
-``tox -e py36`` for SQLite-backed Synapse on Python 3.6.
-``tox -e py27-postgres`` for PostgreSQL-backed Synapse on Python 2.7
(requires a running local PostgreSQL with access to create databases).
-``./test_postgresql.sh`` for PostgreSQL-backed Synapse on Python 2.7
(requires Docker). Entirely self-contained, recommended if you don't want to
set up PostgreSQL yourself.
Docker images are available for running the integration tests (SyTest) locally,
see the `documentation in the SyTest repo
<https://github.com/matrix-org/sytest/blob/develop/docker/README.md>`_ for more
information.
Code style
~~~~~~~~~~
All Matrix projects have a well-defined code-style - and sometimes we've even
got as far as documenting it... For instance, synapse's code style doc lives
at https://github.com/matrix-org/synapse/tree/master/docs/code_style.rst.
Please ensure your changes match the cosmetic style of the existing project,
and **never** mix cosmetic and functional changes in the same commit, as it
makes it horribly hard to review otherwise.
Changelog
~~~~~~~~~
All changes, even minor ones, need a corresponding changelog / newsfragment
entry. These are managed by Towncrier
(https://github.com/hawkowl/towncrier).
To create a changelog entry, make a new file in the ``changelog.d``
file named in the format of ``PRnumber.type``. The type can be
one of ``feature``, ``bugfix``, ``removal`` (also used for
deprecations), or ``misc`` (for internal-only changes).
The content of the file is your changelog entry, which can contain Markdown
formatting. The entry should end with a full stop ('.') for consistency.
Adding credits to the changelog is encouraged, we value your
contributions and would like to have you shouted out in the release notes!
For example, a fix in PR #1234 would have its changelog entry in
``changelog.d/1234.bugfix``, and contain content like "The security levels of
Florbs are now validated when recieved over federation. Contributed by Jane
Matrix.".
Debian changelog
----------------
Changes which affect the debian packaging files (in ``debian``) are an
exception.
In this case, you will need to add an entry to the debian changelog for the
next release. For this, run the following command::
dch
This will make up a new version number (if there isn't already an unreleased
version in flight), and open an editor where you can add a new changelog entry.
(Our release process will ensure that the version number and maintainer name is
corrected for the release.)
If your change affects both the debian packaging *and* files outside the debian
directory, you will need both a regular newsfragment *and* an entry in the
debian changelog. (Though typically such changes should be submitted as two
separate pull requests.)
Attribution
~~~~~~~~~~~
Everyone who contributes anything to Matrix is welcome to be listed in the
AUTHORS.rst file for the project in question. Please feel free to include a
change to AUTHORS.rst in your pull request to list yourself and a short
description of the area(s) you've worked on. Also, we sometimes have swag to
give away to contributors - if you feel that Matrix-branded apparel is missing
from your life, please mail us your shipping address to matrix at matrix.org and
we'll try to fix it :)
Sign off
~~~~~~~~
In order to have a concrete record that your contribution is intentional
and you agree to license it under the same terms as the project's license, we've adopted the
same lightweight approach that the Linux Kernel
`submitting patches process <https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin>`_, Docker
(https://github.com/docker/docker/blob/master/CONTRIBUTING.md), and many other
projects use: the DCO (Developer Certificate of Origin:
http://developercertificate.org/). This is a simple declaration that you wrote
the contribution or otherwise have the right to contribute it to Matrix::
Developer Certificate of Origin
Version 1.1
Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
If you agree to this for your contribution, then all that's needed is to
include the line in your commit or pull request comment::
Signed-off-by: Your Name <your@email.example.org>
We accept contributions under a legally identifiable name, such as
your name on government documentation or common-law names (names
claimed by legitimate usage or repute). Unfortunately, we cannot
accept anonymous contributions at this time.
Git allows you to add this signoff automatically when using the ``-s``
flag to ``git commit``, which uses the name and email set in your
``user.name`` and ``user.email`` git configs.
Conclusion
~~~~~~~~~~
That's it! Matrix is a very open and collaborative project as you might expect
given our obsession with open communication. If we're going to successfully
matrix together all the fragmented communication technologies out there we are
reliant on contributions and collaboration from the community to do so. So
please get involved - and we hope you have as much fun hacking on Matrix as we
doas pkg_add python libffi py-pip py-setuptools sqlite3 py-virtualenv \
libxslt jpeg
```
There is currently no port for OpenBSD. Additionally, OpenBSD's security
settings require a slightly more difficult installation process.
XXX: I suspect this is out of date.
1. Create a new directory in `/usr/local` called `_synapse`. Also, create a
new user called `_synapse` and set that directory as the new user's home.
This is required because, by default, OpenBSD only allows binaries which need
write and execute permissions on the same memory space to be run from
`/usr/local`.
2.`su` to the new `_synapse` user and change to their home directory.
3. Create a new virtualenv: `virtualenv -p python2.7 ~/.synapse`
4. Source the virtualenv configuration located at
`/usr/local/_synapse/.synapse/bin/activate`. This is done in `ksh` by
using the `.` command, rather than `bash`'s `source`.
5. Optionally, use `pip` to install `lxml`, which Synapse needs to parse
webpages for their titles.
6. Use `pip` to install this repository: `pip install matrix-synapse`
7. Optionally, change `_synapse`'s shell to `/bin/false` to reduce the
chance of a compromised Synapse server being used to take over your box.
After this, you may proceed with the rest of the install directions.
#### Windows
If you wish to run or develop Synapse on Windows, the Windows Subsystem For
Linux provides a Linux environment on Windows 10 which is capable of using the
Debian, Fedora, or source installation methods. More information about WSL can
be found at https://docs.microsoft.com/en-us/windows/wsl/install-win10 for
Windows 10 and https://docs.microsoft.com/en-us/windows/wsl/install-on-server
for Windows Server.
### Troubleshooting Installation
XXX a bunch of this is no longer relevant.
Synapse requires pip 8 or later, so if your OS provides too old a version you
may need to manually upgrade it::
sudo pip install --upgrade pip
Installing may fail with `Could not find any downloads that satisfy the requirement pymacaroons-pynacl (from matrix-synapse==0.12.0)`.
You can fix this by manually upgrading pip and virtualenv::
sudo pip install --upgrade virtualenv
You can next rerun `virtualenv -p python3 synapse` to update the virtual env.
Installing may fail during installing virtualenv with `InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.`
You can fix this by manually installing ndg-httpsclient::
pip install --upgrade ndg-httpsclient
Installing may fail with `mock requires setuptools>=17.1. Aborting installation`.
You can fix this by upgrading setuptools::
pip install --upgrade setuptools
If pip crashes mid-installation for reason (e.g. lost terminal), pip may
refuse to run until you remove the temporary installation directory it
created. To reset the installation::
rm -rf /tmp/pip_install_matrix
pip seems to leak *lots* of memory during installation. For instance, a Linux
host with 512MB of RAM may run out of memory whilst installing Twisted. If this
happens, you will have to individually install the dependencies which are
failing, e.g.::
pip install twisted
## Prebuilt packages
As an alternative to installing from source, prebuilt packages are available
for a number of platforms.
### Docker images and Ansible playbooks
There is an offical synapse image available at
https://hub.docker.com/r/matrixdotorg/synapse which can be used with
the docker-compose file available at [contrib/docker](contrib/docker). Further information on
this including configuration options is available in the README on
hub.docker.com.
Alternatively, Andreas Peters (previously Silvio Fricke) has contributed a
Dockerfile to automate a synapse server in a single Docker image, at
[1] End-to-end encryption is currently in beta: `blog post <https://matrix.org/blog/2016/11/21/matrixs-olm-end-to-end-encryption-security-assessment-released-and-implemented-cross-platform-on-riot-at-last>`_.
Synapse Installation
====================
The Synapse documentation describes `how to install Synapse <https://matrix-org.github.io/synapse/latest/setup/installation.html>`_. We recommend using
`Docker images <https://matrix-org.github.io/synapse/latest/setup/installation.html#docker-images-and-ansible-playbooks>`_ or `Debian packages from Matrix.org
Remove old, incorrect minimum postgres version note and replace with a link to the [Dependency Deprecation Policy](https://matrix-org.github.io/synapse/v1.73/deprecation_policy.html).
Suppress a spurious warning when `POST /rooms/<room_id>/<membership>/`, `POST /join/<room_id_or_alias`, or the unspecced `PUT /join/<room_id_or_alias>/<txn_id>` receive an empty HTTP request body.
Return spec-compliant JSON errors when unknown endpoints are requested.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.