Ensure FormatRawStringLiteral produces valid swift syntax for special characters #3256

Open
itsdevandy wants to merge 1 commit into swiftlang:main from itsdevandy:bugfix/2465-FormatRawStringLiteral-ensure-valid-swift-syntax

itsdevandy commented Feb 1, 2026

Summary

Currently, the FormatRawStringLiteral refactoring can produce invalid Swift code when moving from extended delimiters to a regular string literal. For example:

  • #"""# becomes """ (Unmatched multi-line literal)
  • #"He says "Hi""# becomes "He says "Hi"" (Syntax error)
  • #"C:\Users"# becomes "C:\Users" (Invalid escape sequence)

This PR adjusts the logic so that the number of # symbols it keeps is always sufficient to represent the content safely. The minimum number of # delimiters is now 1.
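
As a rough illustration of what "sufficient" means here, the sketch below picks the smallest delimiter count for a single-line literal by checking whether a quote or a backslash followed by that many # characters appears in the content. `minimumPounds(forRawContent:currentPounds:)` is a hypothetical helper, not the code in this PR. It is a simplification that ignores multi-line literals, and unlike the change described above it lets the count drop to zero when nothing in the content needs protecting.

```swift
import Foundation

/// Hypothetical helper (an assumption for illustration, not the PR's code).
/// Returns the smallest number of `#` delimiters, between 0 and
/// `currentPounds`, under which `content` keeps its meaning.
/// `content` is the source text between the quotes of a single-line literal.
func minimumPounds(forRawContent content: String, currentPounds: Int) -> Int {
  for candidate in 0..<currentPounds {
    let pounds = String(repeating: "#", count: candidate)
    // A quote followed by `candidate` pounds would end the literal early.
    let wouldTerminate = content.contains("\"" + pounds)
    // A backslash followed by `candidate` pounds would start an escape.
    let wouldEscape = content.contains("\\" + pounds)
    if !wouldTerminate && !wouldEscape {
      return candidate
    }
  }
  // Nothing smaller is safe, so keep the delimiters the literal already has.
  return currentPounds
}

print(minimumPounds(forRawContent: "hello", currentPounds: 1))             // 0
print(minimumPounds(forRawContent: #"He says "Hi""#, currentPounds: 1))    // 1
print(minimumPounds(forRawContent: #"C:\Users"#, currentPounds: 1))        // 1
```

Because an escape written for the current delimiter count always begins with a backslash followed by fewer pounds as well, a literal that already uses escapes or interpolation keeps its current count under this check.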

Alternative: Auto-escaping when converting to a regular string literal

I have also explored an implementation that converts these to regular string literals by injecting escape characters (e.g. #"He says "Hi""# -> "He says \"Hi\"").

I can use this alternative, or a different approach if the preference is to favor standard literals over delimited ones. Open to suggestions!
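
For reference, the auto-escaping alternative could look roughly like the sketch below. `escapedForRegularLiteral(_:)` is a hypothetical name, and the sketch assumes the raw content contains no escapes or interpolations written with the pound form, since those would need to be rewritten separately.

```swift
/// Hypothetical helper (an assumption for illustration): rewrites the content
/// of a raw literal so it can be placed between plain double quotes.
/// Assumes the content has no pound-form escapes or interpolations.
func escapedForRegularLiteral(_ rawContent: String) -> String {
  var result = ""
  for character in rawContent {
    switch character {
    case "\\":
      result += "\\\\"  // A literal backslash becomes an escaped backslash.
    case "\"":
      result += "\\\""  // A literal quote becomes an escaped quote.
    default:
      result.append(character)
    }
  }
  return result
}

let rewritten = "\"" + escapedForRegularLiteral(#"He says "Hi""#) + "\""
print(rewritten)  // prints "He says \"Hi\""
```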


Test Plan:

  • Modified unit tests for new cases
  • Remaining existing tests pass
  • Files formatted using swift format -ipr

Linked Issue

Closes swiftlang/sourcekit-lsp#2465

rickhohler commented Feb 2, 2026

Hey @itsdevandy! 👋

I'm just a guest here looking to contribute, and I actually started working on this same issue today before realizing you had already been working it! I've closed my PR to defer to yours.

While I was exploring the issue, I did notice one small edge case: checking max(1, ...) might accidentally prevent standard strings like #"hello"# from being simplified down to "hello", which I think is the goal of the "minimal" refactoring.

I wrote a little snippet to handle the tricky #""" case while keeping that minimization behavior. Just thought I'd share it here in case it saves you any time!

    // Logic to safely check implementation
    var shouldRemoveHashes = maximumHashes == 0
    let quote = lit.openingQuote.text

    if shouldRemoveHashes {
      if quote == "\"" {
        // Check for the #"""..."""# case which parses as quote=""" and content wrapped in quotes.
        for segment in lit.segments {
          if case .stringSegment(let s) = segment {
            if s.content.text.hasPrefix("\"") && s.content.text.hasSuffix("\"") {
              shouldRemoveHashes = false
              break
            }
          }
        }
      } else if quote == "\"\"\"" {
        // Check for single-line multiline strings
        let containsNewline =
          lit.openingQuote.trailingTrivia.description.contains("\n")
          || lit.segments.contains(where: { $0.description.contains("\n") })
          || lit.closingQuote.leadingTrivia.description.contains("\n")

        if !containsNewline {
          shouldRemoveHashes = false
        }
      }
    }

Thanks. Rick.

itsdevandy (Author) commented

Thanks @rickhohler, appreciate it! Sorry for the delayed response. I’ve started with a basic implementation for now, but I’m wondering how far we should take the refactoring.

If we want to be comprehensive, we’d need to account for cases like:

  1. Nested quotes: #"He says "Hi""# -> "He says "Hi""
  2. Backslashes: #"C:\Users"# -> "C:\Users"
  3. Interpolations in extended delimiter strings (e.g. #"6 times 7 is \#(6 * 7)."#)
  4. Escape sequences in extended delimiter strings (e.g. ###"Line1\###nLine2"###)
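
To make the four cases concrete, the snippet below shows the value each extended-delimiter literal represents, which is the value any rewritten literal would have to preserve. Cases 3 and 4 are written with the backslash-pound spelling that Swift requires for interpolation and escapes inside extended delimiters. The variable names are only illustrative.

```swift
// 1. Nested quotes: the inner quotes are plain characters.
let nested = #"He says "Hi""#
assert(nested == "He says \"Hi\"")

// 2. Backslashes: the backslash is a plain character, not an escape.
let path = #"C:\Users"#
assert(path == "C:\\Users")

// 3. Interpolation: needs a backslash followed by one pound.
let product = #"6 times 7 is \#(6 * 7)."#
assert(product == "6 times 7 is 42.")

// 4. Escape sequence: needs a backslash followed by three pounds.
let lines = ###"Line1\###nLine2"###
assert(lines == "Line1\nLine2")
```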

I’m happy to expand the scope to include these and use your suggested logic, but I wanted to check in first.

@ahoppen , do you have a preference on the scope here? Thanks.

ahoppen (Member) commented Feb 9, 2026

> do you have a preference on the scope here? Thanks

All of these cases should be handled. One simple way to check whether we can remove all pounds, without re-implementing logic from the lexer, might be to parse each string literal segment again without the surrounding # and check if it parses without errors (check Syntax.hasError) and if it still has the same representedLiteralValue.
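
A minimal sketch of that idea, assuming a recent swift-syntax release is available as a package dependency. It re-parses the whole literal rather than each segment, which is a simplification of the suggestion, and `canDropAllPounds(from:)` is a hypothetical helper rather than anything in sourcekit-lsp. Literals with interpolation are conservatively rejected because `representedLiteralValue` is nil for them.

```swift
import SwiftParser
import SwiftSyntax

/// Hypothetical helper (an assumption for illustration): reports whether the
/// literal still parses cleanly and has the same value once its `#`
/// delimiters are removed.
func canDropAllPounds(from literal: StringLiteralExprSyntax) -> Bool {
  // nil when the literal contains interpolation; treat that as "keep pounds".
  guard let originalValue = literal.representedLiteralValue else {
    return false
  }

  // Rebuild the source text of the literal without its pound delimiters.
  let candidate = literal
    .with(\.openingPounds, nil)
    .with(\.closingPounds, nil)
    .trimmedDescription

  // Re-parse the candidate and require exactly one error-free string literal.
  let reparsed = Parser.parse(source: candidate)
  guard !reparsed.hasError,
    reparsed.statements.count == 1,
    let item = reparsed.statements.first?.item,
    let reparsedLiteral = Syntax(item).as(StringLiteralExprSyntax.self)
  else {
    return false
  }

  return reparsedLiteral.representedLiteralValue == originalValue
}
```

The statement-count check is there in case the de-pounded text splits into several expressions on one line, as "He says "Hi"" would.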

itsdevandy (Author) commented

> parse each string literal segment again without the surrounding # and check if it parses without errors (check Syntax.hasError) and if it still has the same representedLiteralValue

Thanks, that makes sense. I'm working on implementing it this way.

Development

Successfully merging this pull request may close these issues.

Refactoring "Convert string literal to minimal number of '#'s" produces incorrect source code
