Skip to content

chroma: gate candidates on the musicbrainz plugin being enabled#6522

Open
AnonymousAAArdvark wants to merge 5 commits intobeetbox:masterfrom
AnonymousAAArdvark:fix/chroma-gate-musicbrainz-6212
Open

chroma: gate candidates on the musicbrainz plugin being enabled#6522
AnonymousAAArdvark wants to merge 5 commits intobeetbox:masterfrom
AnonymousAAArdvark:fix/chroma-gate-musicbrainz-6212

Conversation

@AnonymousAAArdvark
Copy link
Copy Markdown

Summary

Fixes #6212. The chroma plugin used Acoustid fingerprinting to look
up MusicBrainz release and recording IDs, then unconditionally resolved
those IDs via MusicBrainz — even when the user had not enabled the
musicbrainz plugin. The result: users with (e.g.) only
chroma + spotify enabled would still see MusicBrainz-sourced
album and track candidates appear in the autotagger, defeating their
choice to opt out of MusicBrainz as a metadata source.

This PR replaces the direct MusicBrainzPlugin() instantiation with a
lookup through metadata_plugins.get_metadata_source(\"musicbrainz\").
When the musicbrainz plugin is not loaded, both candidates and
item_candidates short-circuit and return empty. Acoustid fingerprinting
itself is unaffected — the acoustid_id and acoustid_fingerprint
item fields are still populated exactly as before.

Note that the original issue thread mentions only the album-level
candidates path, but the same bug exists in item_candidates (the
singleton/track path). Both are fixed.

Bonus fix

The previous cached_property mb = MusicBrainzPlugin() pattern created a
second MusicBrainzPlugin instance outside the plugin registry. This
silently bypassed beetsplug/mbpseudo.py, which registers a replacement
instance for the musicbrainz plugin — meaning chroma-triggered lookups were
never routed through the pseudo-release-aware code. Routing through
get_metadata_source fixes this as a side effect: chroma now uses the
same shared instance as the rest of beets.

Alternative considered

The issue reporter's maintainer response suggested a richer approach:
extract cross-reference IDs (Spotify, Discogs, etc.) from the MusicBrainz
release response and route them to whichever metadata plugins are
enabled. That's a good direction, but (a) it only applies to
candidates — MusicBrainz recording responses don't carry cross-source
track IDs, so item_candidates needs the gating fix regardless; and
(b) it would still want get_metadata_source as its foundation.
This PR is intentionally scoped as the minimal gating fix; I'd be happy
to follow up with the cross-reference routing as a separate PR if the
maintainers want it.

Test plan

  • New unit tests in test/plugins/test_chroma.py covering all four
    cross-product cases: candidates / item_candidates × with /
    without the musicbrainz plugin loaded. The without musicbrainz
    cases directly pin the issue-6212 regression.
  • Existing chroma tests (chromasearch_*) still pass.
  • Full focused regression subset passes (144 tests across
    test/plugins/test_chroma.py, test/plugins/test_musicbrainz.py,
    test/test_metadata_plugins.py, test/autotag/test_match.py,
    test/autotag/test_distance.py).
  • ruff check + ruff format --check clean on changed files.
  • mypy beetsplug/chroma.py clean.
  • sphinx-lint docs/plugins/chroma.rst docs/changelog.rst clean.
  • Changelog entry added under Unreleased → Bug fixes.
  • Note added to docs/plugins/chroma.rst explaining that chroma
    requires musicbrainz to produce candidates.

Disclosure: this contribution was made while working on an open-source
assignment for EECS 481 (Software Engineering) at the University of
Michigan.

The chroma plugin uses Acoustid fingerprinting, which returns MusicBrainz
release and recording IDs. It then unconditionally resolved those IDs
through MusicBrainz even when the user had not enabled the musicbrainz
plugin, so MusicBrainz-sourced candidates still appeared during tagging
regardless of the user's plugin configuration.

Instead of instantiating ``MusicBrainzPlugin`` directly as a
``cached_property``, look up the loaded instance through
``metadata_plugins.get_metadata_source("musicbrainz")``. When that
returns ``None`` (the musicbrainz plugin is not loaded), short-circuit
both ``candidates`` and ``item_candidates`` to return empty so no
MusicBrainz-sourced candidates are produced. Acoustid fingerprinting and
the ``acoustid_id`` / ``acoustid_fingerprint`` item fields are
unaffected.

This also removes a latent bug: the previous direct instantiation
bypassed the plugin registry, so any plugin that swapped the musicbrainz
plugin at runtime (for example, ``mbpseudo``) was silently ignored on
the chroma path.

Fixes beetbox#6212
@AnonymousAAArdvark AnonymousAAArdvark requested a review from a team as a code owner April 10, 2026 23:26
@github-actions github-actions bot added chroma chroma plugin musicbrainz musicbrainz plugin labels Apr 10, 2026
@AnonymousAAArdvark
Copy link
Copy Markdown
Author

AnonymousAAArdvark commented Apr 10, 2026

Quick note on the CI state:

Check docs: real failure docstrfmt --preserve-adornments wanted my changelog entry wrapped slightly differently than I had it. Fixed in 14b26f0.
Run tests (windows-latest, 3.10–3.13): these failed at the choco install mp3gain step with an HTTP 499 from community.chocolatey.org/api/v2/ (example job log). Looks like a transient Chocolatey infrastructure hiccup.

@semohr semohr self-assigned this Apr 11, 2026
@amogus07
Copy link
Copy Markdown
Contributor

Bonus fix

The previous cached_property mb = MusicBrainzPlugin() pattern created a second MusicBrainzPlugin instance outside the plugin registry. This silently bypassed beetsplug/mbpseudo.py, which registers a replacement instance for the musicbrainz plugin — meaning chroma-triggered lookups were never routed through the pseudo-release-aware code. Routing through get_metadata_source fixes this as a side effect: chroma now uses the same shared instance as the rest of beets.

would this fix #6441?

@semohr
Copy link
Copy Markdown
Contributor

semohr commented Apr 11, 2026

As it is yes it would fix the issue with mbpseudo.

As I said in the other comments tho it feels a bit like a hacky and lazy way out. Musibrainz does give identifiers for other metadata sources which we could use to resolve to the proper (enabled) metadata plugin.

But maybe these are two different issues to tackle.

…rces

In response to review feedback on beetbox#6212, rework the fix so chroma does
not simply gate on the musicbrainz plugin being enabled. Instead:

1. Chroma always queries MusicBrainz to resolve acoustid release IDs
   (since that is the only identifier acoustid returns), preferring
   the loaded ``musicbrainz`` plugin instance from the metadata-source
   registry and falling back to a private, unregistered instance when
   the user has not enabled ``musicbrainz``.

2. The new ``extra_external_sources`` keyword argument on
   ``MusicBrainzPlugin.album_for_id`` / ``album_info`` lets chroma
   force-extract cross-reference IDs from ``url-relations`` for
   exactly the external sources whose metadata-source plugins are
   loaded (Discogs, Bandcamp, Spotify, Deezer, Tidal), without
   requiring the user to globally opt into ``musicbrainz.external_ids``.

3. For each resolved MusicBrainz ``AlbumInfo``, chroma yields the MB
   candidate (when the musicbrainz plugin is loaded) and dispatches
   to each loaded external metadata plugin via its ``album_for_id``
   when the corresponding ``{source}_album_id`` attribute is set on
   the ``AlbumInfo``. A user running ``chroma`` with ``spotify`` but
   without ``musicbrainz`` now receives Spotify-sourced candidates
   from acoustid matches instead of nothing.

``item_candidates`` (the singleton track path) still requires the
musicbrainz plugin because MusicBrainz recording responses do not
carry cross-source track identifiers; this is documented in
``docs/plugins/chroma.rst``.

Tests updated to cover all four cross-product cases:
no-compatible-sources, musicbrainz-only, external-only, and
musicbrainz+external.
@AnonymousAAArdvark
Copy link
Copy Markdown
Author

Thanks for the review @semohr, I've reworked the PR to do the cross-reference routing instead of gating. Latest commit bc956db:

What chroma does now:

  1. Always queries MusicBrainz to resolve the acoustid release ID (prefers the loaded musicbrainz plugin from the registry so mbpseudo still takes effect; falls back to a private, transient MusicBrainzPlugin instance when the user hasn't enabled musicbrainz).
  2. Passes a new extra_external_sources keyword argument to MusicBrainzPlugin.album_for_id / album_info so chroma can force-extract cross-reference IDs for exactly the external sources whose metadata plugins are loaded — without asking the user to globally opt into musicbrainz.external_ids.
  3. For each resolved MB AlbumInfo, yields the MB candidate (only when the musicbrainz plugin itself is loaded) and dispatches to each loaded external metadata plugin via album_for_id(info.{source}_album_id) for the Discogs / Bandcamp / Spotify / Deezer / Tidal cross-references in url-relations.

So:

  • chroma + spotify (no musicbrainz) → Spotify candidates for releases MB has a Spotify url-relation for. Addresses your AC/DC example directly.
  • chroma + musicbrainz → MB candidates, unchanged from today.
  • chroma + musicbrainz + spotify → both kinds of candidates.
  • chroma alone (no MB, no cross-ref targets) → empty, no network call made.

Test coverage is in test/plugins/test_chroma.py. Four classes covering all four cross-product cases (no compatible sources / MB only / external only / both), plus asserts that extra_external_sources is passed correctly and that external plugins are only called when their cross-ref ID is present on the MB AlbumInfo. Full suite (2234 tests) still passes locally; ruff / docstrfmt / sphinx-lint clean.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 78.68852% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.80%. Comparing base (72b1118) to head (f0934e2).
⚠️ Report is 32 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
beetsplug/chroma.py 77.58% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6522      +/-   ##
==========================================
+ Coverage   70.55%   70.80%   +0.25%     
==========================================
  Files         148      150       +2     
  Lines       18820    18965     +145     
  Branches     3066     3090      +24     
==========================================
+ Hits        13279    13429     +150     
+ Misses       4894     4877      -17     
- Partials      647      659      +12     
Files with missing lines Coverage Δ
beetsplug/mbpseudo.py 79.87% <100.00%> (ø)
beetsplug/chroma.py 49.83% <77.58%> (+13.29%) ⬆️

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The previous commit added a new keyword-only ``extra_external_sources``
parameter to ``MusicBrainzPlugin.album_info`` so :doc:`plugins/chroma`
can force extraction of cross-reference IDs for enabled metadata
source plugins. ``MusicBrainzPseudoReleasePlugin.album_info`` overrides
that method and needs the matching signature (mypy --override) and
should forward the flag through both its own ``super().album_info``
calls so the cross-reference routing also works on releases that get
redirected to a pseudo release.
@semohr
Copy link
Copy Markdown
Contributor

semohr commented Apr 11, 2026

Neat, this is more or less how I imagined it. Thanks for the initiative 🙏

As the changes touch musicbrainz now I would like to ping @snejus. Adding a additional argument to album_for_id seems like a bad idea to me. Any ideas for a less intrusive workaround?

Im gona review this properly once we are okay with the high level decicion on how to integrate musicbrainz.

@cached_property
def mb(self) -> MusicBrainzPlugin:
return MusicBrainzPlugin()
def _musicbrainz_client(self) -> MusicBrainzPlugin:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this a cached property. Should simplify this slightly.

@amogus07
Copy link
Copy Markdown
Contributor

also related: #6524

@snejus
Copy link
Copy Markdown
Member

snejus commented Apr 12, 2026

  • Passes a new extra_external_sources keyword argument to MusicBrainzPlugin.album_for_id / album_info so chroma can force-extract cross-reference IDs for exactly the external sources whose metadata plugins are loaded — without asking the user to globally opt into musicbrainz.external_ids.

  • For each resolved MB AlbumInfo, yields the MB candidate (only when the musicbrainz plugin itself is loaded) and dispatches to each loaded external metadata plugin via album_for_id(info.{source}_album_id) for the Discogs / Bandcamp / Spotify / Deezer / Tidal cross-references in url-relations.

This to me seems to be well outside the scope of chroma plugin's responsibility. These external source IDs are always available when we query musicbrainz, why should chroma be responsible for handling them?

It seems like a large additional feature that should exist somewhere else - can we keep this focused to the first point / fix related to mbpseudo?

Would this mean that chroma without musicbrainz enabled would essentially be pointless? If that's the case, I would still want to see handling of external sources in a separate PR.

@semohr
Copy link
Copy Markdown
Contributor

semohr commented Apr 12, 2026

The idea here was to implement #6212 which predates mbpseudo. The mbpseudo fix is just a sideeffect.

The idea of the PR was to implement the features of forwarding the candisates lookups to only enabled datasources, not to fix the mbpseudo issue. I would like to keep it scoped like that.

@semohr
Copy link
Copy Markdown
Contributor

semohr commented Apr 12, 2026

Would this mean that chroma without musicbrainz enabled would essentially be pointless? If that's the case, I would still want to see handling of external sources in a separate PR.

Chroma (before) only ever produced musicbrainz candidates. Imo it would be consistent to only import/show candidates of the enabled data sources.

@snejus
Copy link
Copy Markdown
Member

snejus commented Apr 12, 2026

The idea here was to implement #6212 which predates mbpseudo. The mbpseudo fix is just a sideeffect.

Regarding the issue scope, it seems to me that the expected behaviour would be to not return anything when musicbrainz plugin is disabled? Happy that it fixes the mbpseudo issue as a side effect!

Routing lookups to the rest of data source plugins is a good idea, but best to be moved to a separate PR.

@semohr
Copy link
Copy Markdown
Contributor

semohr commented Apr 12, 2026

Regarding the issue scope, it seems to me that the expected behaviour would be to not return anything when musicbrainz plugin is disabled?

This might break some edge-case workflows (as usual 😄), but as long as we document the behavior clearly, it should be acceptable. (Imo also the expected behavior.)


My suggestion would be to:

  • Open a separate PR to explicitly disable candidates if mb is not enabled (and fix the mbpseudo issue)
  • Then rebase this PR on top of it, so we don’t lose the progress already made here (especially around the forwarding feature)

@AnonymousAAArdvark would you be up for taking this on as well? Specifically, cherry-pick the fixes for the mbpseudo issue from this PR and restrict candidates to only those coming from MusicBrainz
If not, we can always pick it up.

@AnonymousAAArdvark
Copy link
Copy Markdown
Author

Done — opened #6532 with just the gating fix (gate candidates / item_candidates on musicbrainz being loaded via the metadata-source registry + mbpseudo side-effect fix). Clean branch off current master, no changes to musicbrainz.py or mbpseudo.py.

I'll keep this PR open for the cross-reference routing follow-up. Once #6532 lands I can rebase this branch on top.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chroma chroma plugin musicbrainz musicbrainz plugin

Projects

None yet

Development

Successfully merging this pull request may close these issues.

musicbrainz metadata source is used when the musicbrainz plugin is not activated

4 participants