refine validating of wikimedia_commons links#12182

Open
tyrasd wants to merge 8 commits into develop from validate-wikimedia
Conversation

@tyrasd
Member

@tyrasd tyrasd commented Apr 8, 2026

follow-up to #11499 / #12036:

  • show preview of tag upgrades of the validation fix (also for normal website updates)
  • suggest simplifying full URL to compact wikimedia_commons tag value
  • improved reference text for image -> wikimedia_commons warning and change to suggestion validation type
  • fix validation of a few edge cases (e.g. image tag with semicolon separated values, and only a part matches the wikimedia commons syntax; or when wikimedia_commons file name contains % character)
  • //edit: also: do not suggest to fix URLs with a domain name without a TLD (see below unexpected fix on the left, and on the right the new behaviour)
    [screenshots: previous fix suggestion -> new behaviour]
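
A minimal sketch of the "simplify full URL to compact value" idea from the list above. This is not the code from this PR: `compactCommonsValue` is a hypothetical helper, and the decision to keep a literal `%` untouched when it is not a valid escape is an assumption about how the `%` edge case could be handled.

```javascript
// Hypothetical helper (not iD's implementation): turn a full Wikimedia
// Commons URL into the compact "File:..." / "Category:..." tag value.
function compactCommonsValue(value) {
  let url;
  try {
    url = new URL(value);
  } catch (e) {
    return null; // not an absolute URL, nothing to simplify
  }
  if (url.hostname !== 'commons.wikimedia.org') return null;

  // url.pathname already excludes the query string and the #fragment
  const match = /^\/wiki\/((?:File|Category):.+)$/.exec(url.pathname);
  if (!match) return null;

  let title = match[1];
  try {
    title = decodeURIComponent(title);
  } catch (e) {
    // Assumption: a "%" that is not part of a valid escape sequence is a
    // literal character of the file name, so the raw text is kept.
  }
  return title;
}

console.log(compactCommonsValue(
  'https://commons.wikimedia.org/wiki/File:OpenStreetMap-Editor_iD_Logo.svg#mw-jump-to-license'
)); // File:OpenStreetMap-Editor_iD_Logo.svg
```

Values that are already compact (e.g. `File:foo.png`) return `null` here, so no fix would be offered for them.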

* suggest simplifying full URL in wikimedia_commons tag
* show preview of tag upgrade of the validation fix (also for normal website updates)
* fix validation of a few edge cases (e.g. image tag with semicolon separated values, and some match the wikimedia commons syntax)
@tyrasd tyrasd added the validation An issue with the validation or Q/A code label Apr 8, 2026
Comment thread data/core.yaml Outdated
@hlfan

This comment was marked as resolved.

@tyrasd

This comment was marked as resolved.

@hlfan

This comment was marked as resolved.

@tyrasd
Member Author

tyrasd commented Apr 10, 2026

PS: @hlfan do you remember the reasoning behind the "All split parts valid, but whole value still invalid" branch added with #11499? At least now, after tweaking the isValidURL check, it is not covered by any of our test cases (anymore?) and I cannot think of any way to trigger it. //edit: I removed the branch now in f44ea35, but maybe there is a reason to still have it?

also fixes a crash in the auto-fix when the URL consists of multiple parts and only one is invalid
also fixes a bug where the tag diff was not shown when there are multiple parts in the URL and it is auto-fixable
e.g. `website=none` should not be "fixed" to `website=https://none`
@hlfan
Contributor

hlfan commented Apr 10, 2026

I think that originated from #11438 (comment). Perhaps @1ec5 can comment on that.

@hlfan
Contributor

hlfan commented Apr 10, 2026

Maybe there could be a separate message if both the image and the wikimedia_commons keys have valid Wikimedia Commons references because some mappers can't decide what to pick.

@tyrasd
Member Author

tyrasd commented Apr 10, 2026

I think that originated from #11438 (comment). Perhaps @1ec5 can comment on that.

I see. However, this must then have been buggy from the start, as !invalidParts.length would not catch the case of a URL like https://example.com/foo;bar (where there is one invalid part).

I guess we could think about how to actually properly handle those cases:

  • https://example.com/foo;bar is currently not auto-fixable as the second part does not look like a valid URL (missing a TLD)
  • but https://example.com/foo;bar.png would be interpreted like a two-URL case where the fix would look like https://example.com/foo;https://bar.png. I hope the newly added tag-diff preview will make it obvious that the proposed fix is not to be applied in such cases.
  • technically, we could add an extra fix for such cases, e.g. https://example.com/foo;bar.png -> https://example.com/foo%3Bbar.png
  • but then it would also show up for https://example.com;example.net as https://example.com%3Bexample.net, which might be confusing if not worded well
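
The cases in this list can be sketched roughly as follows. This is only an illustration under stated assumptions, not the validator's actual code: `isValidURL` here is a simplified stand-in for the real check, and `looksLikeDomainWithTLD`, `proposeFix` and `encodeSemicolons` are hypothetical names.

```javascript
// Simplified stand-in for the real isValidURL check (assumption).
function isValidURL(part) {
  try {
    const url = new URL(part);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch (e) {
    return false;
  }
}

// Hypothetical: a scheme-less part is only repairable if its host has a TLD,
// e.g. "bar.png" or "example.net/x", but not "bar" or "none".
function looksLikeDomainWithTLD(part) {
  return /^[^\s\/:]+\.[a-z]{2,}(\/\S*)?$/i.test(part);
}

// Treat ";" as a value separator and prefix repairable parts with https://.
// Returns null when nothing changes or when any part cannot be repaired.
function proposeFix(value) {
  const fixed = [];
  for (const part of value.split(';').map(s => s.trim())) {
    if (isValidURL(part)) {
      fixed.push(part);
    } else if (looksLikeDomainWithTLD(part)) {
      fixed.push('https://' + part);
    } else {
      return null;
    }
  }
  const result = fixed.join(';');
  return result === value ? null : result;
}

// The alternative fix: treat ";" as part of a single URL and escape it.
function encodeSemicolons(value) {
  return value.replace(/;/g, '%3B');
}

console.log(proposeFix('https://example.com/foo;bar'));     // null
console.log(proposeFix('https://example.com/foo;bar.png')); // https://example.com/foo;https://bar.png
console.log(encodeSemicolons('https://example.com/foo;bar.png')); // https://example.com/foo%3Bbar.png
```

Both fixes are purely syntactic, so neither can tell whether the semicolon was meant as a separator or as part of the path; that is why the tag-diff preview matters here.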

@tyrasd
Member Author

tyrasd commented Apr 10, 2026

Maybe there could be a separate message if both the image and the wikimedia_commons keys have valid Wikimedia Commons references because some mappers can't decide what to pick.

Do you mean like in the following example?

wikimedia_commons=File:foo.png
image=File:bar.png

to suggest

wikimedia_commons=File:foo.png;File:bar.png

But as far as I understand, the semicolon-is-ambiguous issue is even more pronounced for the wikimedia_commons tag, so semicolons should be avoided there altogether, shouldn't they?

@hlfan

This comment was marked as resolved.

@matkoniecz
Contributor

wait, is the consensus to fix image=File:bar.png ones?

last time I looked into it (wanted to make a bot edit) there was, sadly, no clear consensus

@matkoniecz
Contributor

(and if there is consensus it would make far more sense to run a bot edit rather than asking people to do bot job manually)

@tyrasd
Member Author

tyrasd commented Apr 11, 2026

wait, is the consensus to fix image=File:bar.png ones?

last time I looked into it (wanted to make a bot edit) there was, sadly, no clear consensus

Ooh, I did not know that the image tag also allows wikimedia File:* values:

There are at least three common styles of formatting the link:

  • a Wikimedia Commons filename (formatted as File:image.jpg) - although this is commonly tagged as wikimedia_commons=*
  • […]

Looks like we should remove that validation until there is consensus about this in the community. Or at least tweak the wording of the message and use the less aggressive "blue" validation type.

//edit: PS:

(and if there is consensus it would make far more sense to run a bot edit rather than asking people to do bot job manually)

IMHO there is still merit, as this validation would let mappers discover that the tagging should use one tag over the other. In a "magic" bot edit this is not really the case.

@matkoniecz
Contributor

Ooh, I did not know that the image tag also allows wikimedia File:* values:

well, there is definitely no consensus for treating it as a valid value

but last time I checked there was no consensus to fix it either

¯\_(ツ)_/¯

@tyrasd
Member Author

tyrasd commented Apr 14, 2026

I changed the type of the image to wikimedia_commons validation to a suggestion and tweaked the wording slightly such that keeping it remains acceptable, see screenshot in original post.


it('should propose to remove URL from Wikimedia Commons tag', function() {
var entity = createPointWithTags({
'wikimedia_commons': 'https://commons.wikimedia.org/wiki/File:OpenStreetMap-Editor_iD_Logo.svg#mw-jump-to-license'
