Skip to content

Fix NSRange length calculations using String.count instead of UTF-16 count#25341

Open
jkmassel wants to merge 10 commits intotrunkfrom
jkmassel/fix-nsrange-string-count
Open

Fix NSRange length calculations using String.count instead of UTF-16 count#25341
jkmassel wants to merge 10 commits intotrunkfrom
jkmassel/fix-nsrange-string-count

Conversation

@jkmassel
Copy link
Contributor

@jkmassel jkmassel commented Mar 3, 2026

Summary

  • Fix ~20 occurrences across 10 files where String.count (grapheme clusters) was used instead of String.utf16.count or NSAttributedString.length (UTF-16 code units) when constructing NSRange values
  • Fix two off-by-one errors in NSAttributedString+StyledHTML.swift and DestructiveAlertHelper.swift where count - 1 incorrectly skipped the last character

NSRange, NSRegularExpression, and NSMutableAttributedString all operate on UTF-16 code units internally. Using Swift's String.count produces incorrect ranges for strings containing emoji, accented characters, or CJK text — potentially causing wrong text attributes, missed regex matches, or crashes.

Files changed

File Fix
RichContentFormatter.swift 13 occurrences: .count.utf16.count
NonceRetrieval.swift 1 occurrence: .count.utf16.count
FormattableNoticonRange.swift 2 occurrences: .count.utf16.count
FilterTabBar.swift .string.count.length
OverviewCell.swift 2x .string.count.length
StatsTotalInsightsCell.swift 3x .count.utf16.count
BloggingRemindersScheduleFormatter.swift .count.utf16.count
ShareExtractor.swift .count.utf16.count
NSAttributedString+StyledHTML.swift .string.count - 1.length (off-by-one fix)
DestructiveAlertHelper.swift .string.count - 1.length (off-by-one fix)

Test plan

  • Verify stats views render correctly with post titles containing emoji
  • Verify rich content formatting works with multi-byte characters
  • Verify blogging reminders text renders correctly
  • Verify notification icon insertion works with special characters
  • Verify share extension URL detection works with multi-byte text

🤖 Generated with Claude Code

…count

NSRange operates on UTF-16 code units, but Swift's String.count returns
grapheme clusters. For strings with emoji, accented characters, or CJK
text, these values differ — causing wrong attributes, missed regex
matches, or potential crashes.

This fixes ~20 occurrences across 10 files by using .utf16.count or
.length (for NSAttributedString) instead of .count when constructing
NSRange values. Also fixes two off-by-one errors where `count - 1`
incorrectly skipped the last character.
@dangermattic
Copy link
Collaborator

dangermattic commented Mar 3, 2026

2 Warnings
⚠️ View files have been modified, but no screenshot or video is included in the pull request. Consider adding some for clarity.
⚠️ This PR is larger than 500 lines of changes. Please consider splitting it into smaller PRs for easier and faster reviews.

Generated by 🚫 Danger

@wpmobilebot
Copy link
Contributor

wpmobilebot commented Mar 3, 2026

App Icon📲 You can test the changes from this Pull Request in WordPress by scanning the QR code below to install the corresponding build.
App NameWordPress
ConfigurationRelease-Alpha
Build Number31305
VersionPR #25341
Bundle IDorg.wordpress.alpha
Commitc365360
Installation URL5l5j1ie3aqqi8
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

@wpmobilebot
Copy link
Contributor

wpmobilebot commented Mar 3, 2026

App Icon📲 You can test the changes from this Pull Request in Jetpack by scanning the QR code below to install the corresponding build.
App NameJetpack
ConfigurationRelease-Alpha
Build Number31305
VersionPR #25341
Bundle IDcom.jetpack.alpha
Commitc365360
Installation URL592j21r9b55g0
Automatticians: You can use our internal self-serve MC tool to give yourself access to those builds if needed.

Add 5 test cases to RichContentFormatterTests that use the family
emoji (👨‍👩‍👧‍👦), which is 1 grapheme cluster but 11 UTF-16 code units.
These tests place HTML tags after emoji content, so the tags fall
outside the NSRange when String.count is used instead of utf16.count.

The tests would fail (or crash) with the old .count code and pass
with the new .utf16.count code, proving the fix is necessary.

Also fixes a related crash in removeTrailingBreakTags where
match.range.location (a UTF-16 offset) was incorrectly used with
String.index(startIndex, offsetBy:) (which expects grapheme cluster
offsets). Replaced with Range(match.range, in:) for correct
UTF-16 to String.Index conversion.
@wpmobilebot
Copy link
Contributor

wpmobilebot commented Mar 3, 2026

🤖 Build Failure Analysis

This build has failures. Claude has analyzed them - check the build annotations for details.

jkmassel and others added 6 commits March 3, 2026 13:38
Introduce NSMutableAttributedString.applyAttribute(_:value:) and
applyAttributes(_:) to eliminate manual NSRange construction when
applying attributes to an entire string. This prevents the
String.count vs utf16.count footgun that causes bugs with emoji
and other multi-code-unit characters.

Migrate all full-range addAttribute/addAttributes call sites across
the codebase to use the new helpers. Also add a SwiftLint custom rule
to flag future uses of the old pattern.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace manual index arithmetic in `range(from:)` and `nsRange(from:)`
with Foundation's built-in `Range(nsRange, in:)` and `NSRange(range, in:)`,
and fix `endOfStringNSRange()` and String+RegEx helpers to use `utf16.count`.
Also remove redundant accessibility element entries that duplicate items
already contained in their tableView, and add comprehensive tests for
range conversion with emoji.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
`extractRSDURLFromHTML` used `html.count` (grapheme clusters) to build
the NSRange for the regex search, which is too short when the HTML
contains emoji or other multi-code-unit characters before the RSD link.
Use `html.utf16.count` instead, matching what NSRegularExpression expects.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…safety

Change `pingbackReadMoreGroup` from `private` to `internal` and fix its
range construction to use `text.utf16.count` so localized translations
with multi-byte characters produce a correct NSRange. Add a unit test
that asserts the link range covers the full text.

Also document why `AddressTableViewCell.processName` is safe: domain
names are ASCII-only (internationalized domains use Punycode).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Directly constructs a NotificationContentRange with emoji + CJK text
and verifies the range length matches NSAttributedString.length (UTF-16)
and that applying it as a link attribute doesn't crash.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ling

Replace manual NSRange construction using `String.count` with
`applyAttributes` helper in `JetpackThreatContext.attributedString(with:)`.
The `contentsStr` can contain multi-byte characters from scanned files,
making the previous `String.count`-based ranges incorrect for
NSAttributedString (which uses UTF-16). Make the extension internal so
it can be tested, and add tests with emoji and CJK content.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jkmassel jkmassel added this to the 26.8 milestone Mar 3, 2026
@jkmassel jkmassel self-assigned this Mar 3, 2026
@jkmassel jkmassel requested review from crazytonyli and kean March 4, 2026 04:34
jkmassel and others added 2 commits March 3, 2026 21:38
Verify that applyAttribute, applyAttributes, and applyForegroundColor
correctly cover the full string length when the string contains
multi-byte characters like emoji.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Tests that apply() returns the correct UTF-16 code unit count (12)
rather than grapheme cluster count (2) when the noticon value is a
multi-byte emoji like 👨‍👩‍👧‍👦.
@jkmassel jkmassel force-pushed the jkmassel/fix-nsrange-string-count branch from ff05917 to c365360 Compare March 4, 2026 04:39
@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 4, 2026

private func scrapNonceFromNewPost(html: String) -> String? {
guard let regex = try? NSRegularExpression(pattern: "apiFetch.createNonceMiddleware\\(\\s*['\"](?<nonce>\\w+)['\"]\\s*\\)", options: []),
let match = regex.firstMatch(in: html, options: [], range: NSRange(location: 0, length: html.count)) else {
let match = regex.firstMatch(in: html, options: [], range: NSRange(location: 0, length: html.utf16.count)) else {
Copy link
Contributor

@kean kean Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are repetitive, so I focused on this first scenario. I generated a set of unit tests for it, verified the original issue and the fact that the change fixes it.

(nit) An idiomatic way to create NSRange from String is with a specialized init method that abstracts the underlying representation away:

NSRange(html.startIndex..., in: html)

(nit) Since it's String in and String out, the line 55 just below requires another conversion from NSRange back to Swift.Range. It might be worth taking it a step further and replacing it with Regex that will also add some compile-time guarantees for the expression.


return lowerBound ..< upperBound
guard let range = Range(nsRange, in: self) else {
preconditionFailure("Invalid NSRange \(nsRange) for string of UTF-16 length \(utf16.count)")
Copy link
Contributor

@kean kean Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might result in a crash in production. The ideal change would be to eliminate String+Conversion.swift as it seems to contain code that is now provided by the system or could be replaced with Regex.

///
public func applyAttributes(_ attrs: [NSAttributedString.Key: Any]) {
// swiftlint:disable:next full_range_attributed_string_attribute
addAttributes(attrs, range: NSRange(location: 0, length: length))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) It doesn't seem to be one of the documented rules and there don't seem to be any custom rules in the .yml` file. It might be the case SwiftLint is behind. I'm not familiar with this rule though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's a new one – source is here

///
func configureForAccessibility() {
view.accessibilityElements = [
codeField as Any,
Copy link
Contributor

@kean kean Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem to be unrelated to the PR.

let attributedValue: NSMutableAttributedString = NSMutableAttributedString(string: valueToConfirm)
let paragraphStyle = NSMutableParagraphStyle()
paragraphStyle.lineBreakMode = .byCharWrapping
attributedValue.addAttribute(.paragraphStyle, value: paragraphStyle, range: NSMakeRange(0, attributedValue.string.count - 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original range seems to exclude the last character – dot, perhaps?


let completeDomainName = NSMutableAttributedString(string: name, attributes: TextStyleAttributes.defaults)
// Domain names are ASCII-only (internationalized domains use Punycode),
// so .count == .utf16.count here and this is safe.
Copy link
Contributor

@kean kean Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it safer to not rely on the fact that domains are ASCII? Should it also use the new setAttributes convenience method that makes the range optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is AI-generated, but it's also correct and I hadn't thought of it!

I don't foresee this changing in our lifetimes tbh, but if you feel strongly we can change the approach. My goal was to avoid touching any code that wouldn't fix a potential bug which is why I left this one alone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

/// which is error-prone when mixing `String.count` (grapheme clusters) with
/// `NSRange` (UTF-16 code units).
///
public func applyAttribute(_ key: NSAttributedString.Key, value: Any) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I'd suggest naming it addAttribute to match the original method. The apply is ambiguous – is it set or is it add?

Copy link
Contributor

@kean kean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-reviewed the changes with the exception of unit tests. Everything looked perfect, except for some optional nits I left that could be addressed separately.

The only change I would recommend to make before merging is rename applyAttributes because it's ambiguous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants