Fix NSRange length calculations using String.count instead of UTF-16 count#25341
Fix NSRange length calculations using String.count instead of UTF-16 count#25341
Conversation
…count NSRange operates on UTF-16 code units, but Swift's String.count returns grapheme clusters. For strings with emoji, accented characters, or CJK text, these values differ — causing wrong attributes, missed regex matches, or potential crashes. This fixes ~20 occurrences across 10 files by using .utf16.count or .length (for NSAttributedString) instead of .count when constructing NSRange values. Also fixes two off-by-one errors where `count - 1` incorrectly skipped the last character.
Generated by 🚫 Danger |
|
| App Name | WordPress | |
| Configuration | Release-Alpha | |
| Build Number | 31305 | |
| Version | PR #25341 | |
| Bundle ID | org.wordpress.alpha | |
| Commit | c365360 | |
| Installation URL | 5l5j1ie3aqqi8 |
|
| App Name | Jetpack | |
| Configuration | Release-Alpha | |
| Build Number | 31305 | |
| Version | PR #25341 | |
| Bundle ID | com.jetpack.alpha | |
| Commit | c365360 | |
| Installation URL | 592j21r9b55g0 |
Add 5 test cases to RichContentFormatterTests that use the family emoji (👨👩👧👦), which is 1 grapheme cluster but 11 UTF-16 code units. These tests place HTML tags after emoji content, so the tags fall outside the NSRange when String.count is used instead of utf16.count. The tests would fail (or crash) with the old .count code and pass with the new .utf16.count code, proving the fix is necessary. Also fixes a related crash in removeTrailingBreakTags where match.range.location (a UTF-16 offset) was incorrectly used with String.index(startIndex, offsetBy:) (which expects grapheme cluster offsets). Replaced with Range(match.range, in:) for correct UTF-16 to String.Index conversion.
🤖 Build Failure AnalysisThis build has failures. Claude has analyzed them - check the build annotations for details. |
Introduce NSMutableAttributedString.applyAttribute(_:value:) and applyAttributes(_:) to eliminate manual NSRange construction when applying attributes to an entire string. This prevents the String.count vs utf16.count footgun that causes bugs with emoji and other multi-code-unit characters. Migrate all full-range addAttribute/addAttributes call sites across the codebase to use the new helpers. Also add a SwiftLint custom rule to flag future uses of the old pattern. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace manual index arithmetic in `range(from:)` and `nsRange(from:)` with Foundation's built-in `Range(nsRange, in:)` and `NSRange(range, in:)`, and fix `endOfStringNSRange()` and String+RegEx helpers to use `utf16.count`. Also remove redundant accessibility element entries that duplicate items already contained in their tableView, and add comprehensive tests for range conversion with emoji. Co-Authored-By: Claude Opus 4.6 <[email protected]>
`extractRSDURLFromHTML` used `html.count` (grapheme clusters) to build the NSRange for the regex search, which is too short when the HTML contains emoji or other multi-code-unit characters before the RSD link. Use `html.utf16.count` instead, matching what NSRegularExpression expects. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…safety Change `pingbackReadMoreGroup` from `private` to `internal` and fix its range construction to use `text.utf16.count` so localized translations with multi-byte characters produce a correct NSRange. Add a unit test that asserts the link range covers the full text. Also document why `AddressTableViewCell.processName` is safe: domain names are ASCII-only (internationalized domains use Punycode). Co-Authored-By: Claude Opus 4.6 <[email protected]>
Directly constructs a NotificationContentRange with emoji + CJK text and verifies the range length matches NSAttributedString.length (UTF-16) and that applying it as a link attribute doesn't crash. Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ling Replace manual NSRange construction using `String.count` with `applyAttributes` helper in `JetpackThreatContext.attributedString(with:)`. The `contentsStr` can contain multi-byte characters from scanned files, making the previous `String.count`-based ranges incorrect for NSAttributedString (which uses UTF-16). Make the extension internal so it can be tested, and add tests with emoji and CJK content. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Verify that applyAttribute, applyAttributes, and applyForegroundColor correctly cover the full string length when the string contains multi-byte characters like emoji. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Tests that apply() returns the correct UTF-16 code unit count (12) rather than grapheme cluster count (2) when the noticon value is a multi-byte emoji like 👨👩👧👦.
ff05917 to
c365360
Compare
|
| private func scrapNonceFromNewPost(html: String) -> String? { | ||
| guard let regex = try? NSRegularExpression(pattern: "apiFetch.createNonceMiddleware\\(\\s*['\"](?<nonce>\\w+)['\"]\\s*\\)", options: []), | ||
| let match = regex.firstMatch(in: html, options: [], range: NSRange(location: 0, length: html.count)) else { | ||
| let match = regex.firstMatch(in: html, options: [], range: NSRange(location: 0, length: html.utf16.count)) else { |
There was a problem hiding this comment.
The changes are repetitive, so I focused on this first scenario. I generated a set of unit tests for it, verified the original issue and the fact that the change fixes it.
(nit) An idiomatic way to create NSRange from String is with a specialized init method that abstracts the underlying representation away:
NSRange(html.startIndex..., in: html)
(nit) Since it's String in and String out, the line 55 just below requires another conversion from NSRange back to Swift.Range. It might be worth taking it a step further and replacing it with Regex that will also add some compile-time guarantees for the expression.
|
|
||
| return lowerBound ..< upperBound | ||
| guard let range = Range(nsRange, in: self) else { | ||
| preconditionFailure("Invalid NSRange \(nsRange) for string of UTF-16 length \(utf16.count)") |
There was a problem hiding this comment.
It might result in a crash in production. The ideal change would be to eliminate String+Conversion.swift as it seems to contain code that is now provided by the system or could be replaced with Regex.
| /// | ||
| public func applyAttributes(_ attrs: [NSAttributedString.Key: Any]) { | ||
| // swiftlint:disable:next full_range_attributed_string_attribute | ||
| addAttributes(attrs, range: NSRange(location: 0, length: length)) |
There was a problem hiding this comment.
(nit) It doesn't seem to be one of the documented rules and there don't seem to be any custom rules in the .yml` file. It might be the case SwiftLint is behind. I'm not familiar with this rule though.
| /// | ||
| func configureForAccessibility() { | ||
| view.accessibilityElements = [ | ||
| codeField as Any, |
There was a problem hiding this comment.
These changes seem to be unrelated to the PR.
| let attributedValue: NSMutableAttributedString = NSMutableAttributedString(string: valueToConfirm) | ||
| let paragraphStyle = NSMutableParagraphStyle() | ||
| paragraphStyle.lineBreakMode = .byCharWrapping | ||
| attributedValue.addAttribute(.paragraphStyle, value: paragraphStyle, range: NSMakeRange(0, attributedValue.string.count - 1)) |
There was a problem hiding this comment.
The original range seems to exclude the last character – dot, perhaps?
|
|
||
| let completeDomainName = NSMutableAttributedString(string: name, attributes: TextStyleAttributes.defaults) | ||
| // Domain names are ASCII-only (internationalized domains use Punycode), | ||
| // so .count == .utf16.count here and this is safe. |
There was a problem hiding this comment.
Isn't it safer to not rely on the fact that domains are ASCII? Should it also use the new setAttributes convenience method that makes the range optional?
There was a problem hiding this comment.
It is AI-generated, but it's also correct and I hadn't thought of it!
I don't foresee this changing in our lifetimes tbh, but if you feel strongly we can change the approach. My goal was to avoid touching any code that wouldn't fix a potential bug which is why I left this one alone.
| /// which is error-prone when mixing `String.count` (grapheme clusters) with | ||
| /// `NSRange` (UTF-16 code units). | ||
| /// | ||
| public func applyAttribute(_ key: NSAttributedString.Key, value: Any) { |
There was a problem hiding this comment.
(nit) I'd suggest naming it addAttribute to match the original method. The apply is ambiguous – is it set or is it add?
kean
left a comment
There was a problem hiding this comment.
I re-reviewed the changes with the exception of unit tests. Everything looked perfect, except for some optional nits I left that could be addressed separately.
The only change I would recommend to make before merging is rename applyAttributes because it's ambiguous.





Summary
String.count(grapheme clusters) was used instead ofString.utf16.countorNSAttributedString.length(UTF-16 code units) when constructingNSRangevaluesNSAttributedString+StyledHTML.swiftandDestructiveAlertHelper.swiftwherecount - 1incorrectly skipped the last characterNSRange,NSRegularExpression, andNSMutableAttributedStringall operate on UTF-16 code units internally. Using Swift'sString.countproduces incorrect ranges for strings containing emoji, accented characters, or CJK text — potentially causing wrong text attributes, missed regex matches, or crashes.Files changed
RichContentFormatter.swift.count→.utf16.countNonceRetrieval.swift.count→.utf16.countFormattableNoticonRange.swift.count→.utf16.countFilterTabBar.swift.string.count→.lengthOverviewCell.swift.string.count→.lengthStatsTotalInsightsCell.swift.count→.utf16.countBloggingRemindersScheduleFormatter.swift.count→.utf16.countShareExtractor.swift.count→.utf16.countNSAttributedString+StyledHTML.swift.string.count - 1→.length(off-by-one fix)DestructiveAlertHelper.swift.string.count - 1→.length(off-by-one fix)Test plan
🤖 Generated with Claude Code