Commit 845ef48

fix: ignore punctuation at end of URL
Problem:
URLs in help docs may be followed by "." or ",", but it's usually not
intended as part of the URL. Examples from neovim/neovim#36597:

    https://luarocks.org,
    https://neovim.io/doc/,

Solution:
- Treat "." as a word.
- Assume that `)].,` at the end of a URL is not part of the URL.

Now NESTED parens work:

    (https://neovim.io/doc/user/vimfn.html#get()-blob)

but it's not possible to support a trailing closing paren ")":

    (https://neovim.io/doc/user/api.html#nvim_input())

workaround: URL-encode the trailing paren:

    (https://neovim.io/doc/user/api.html#nvim_input%28%29)

URL cannot contain a closing bracket `]` anywhere in the URL.
(Workaround: URL-encode the bracket.) This is a tradeoff so that
markdown hyperlinks work:

    [https://example.com](https://example.com)

Bonus(?): now the inline code in this example is recognized:

    `foo`.bar
1 parent 5cb043a commit 845ef48
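
The behavior described in the commit message can be approximated outside tree-sitter with plain JavaScript regexes. This is only a sketch: the two patterns are copied from the `url_word` rule before and after this commit, `firstUrl` is a hypothetical helper, and a JavaScript regex search is assumed to be a reasonable stand-in for the tree-sitter lexer on these inputs.

```javascript
// Patterns copied from the grammar.js change in this commit.
const oldUrlWord = /https?:[^\n\t)\] ]+/;
const newUrlWord = /https?:\/\/[^\n\t\] ]*[^\n\t )\].,]/;

// Hypothetical helper for illustration: first match of `pattern` in `text`.
function firstUrl(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

// Examples taken from the commit message.
const samples = [
  'https://luarocks.org,',
  '(https://neovim.io/doc/user/vimfn.html#get()-blob)',
  '(https://neovim.io/doc/user/api.html#nvim_input())',
  '(https://neovim.io/doc/user/api.html#nvim_input%28%29)',
  '[https://example.com](https://example.com)',
];

for (const s of samples) {
  console.log(`${s}\n  old: ${firstUrl(oldUrlWord, s)}\n  new: ${firstUrl(newUrlWord, s)}`);
}
```

On these inputs the new pattern drops the trailing comma, keeps the nested `get()-blob` fragment intact, and still cuts `nvim_input())` short at `nvim_input(`, which is the limitation the URL-encoding workaround addresses.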

12 files changed

+99
-12
lines changed

.editorconfig

Lines changed: 4 additions & 1 deletion
@@ -4,7 +4,10 @@ root = true
 charset = utf-8
 end_of_line = lf
 insert_final_newline = true
-trim_trailing_whitespace = true
+
+[*.txt]
+# Some test files have intentional whitespace at EOL.
+trim_trailing_whitespace = false
 
 [*.{json,toml,yml,gyp}]
 indent_style = space

README.md

Lines changed: 8 additions & 4 deletions
@@ -32,6 +32,7 @@ Overview
 nature; parsing the contents would require loading a "child" language
 (injection). See [#2](https://github.com/neovim/tree-sitter-vimdoc/issues/2).
 - the terminating `<` (and any following whitespace) is discarded (anonymous).
+- `url` intentionally does not capture `.,)` at the end of the URL. See also [Known issues](#known-issues).
 - `h1` = "Heading 1": `======` followed by text and optional `*tags*`.
 - `h2` = "Heading 2": `------` followed by text and optional `*tags*`.
 - `h3` = "Heading 3": UPPERCASE WORDS, followed by optional `*tags*`, followed
@@ -45,8 +46,11 @@ Known issues
 - Spec requires that `codeblock` delimiter ">" must be preceded by a space
 (" >"), not a tab. But currently the grammar doesn't enforce this. Example:
 `:help lcs-tab`.
-- `url` doesn't handle _surrounding_ parens. E.g. `(https://example.com/#yay)` yields `word`
-- `url` doesn't handle _nested_ parens. E.g. `(https://example.com/(foo)#yay)`
+- `url` cannot contain a closing bracket `]` anywhere in the URL. (Workaround:
+URL-encode the bracket.) This is a tradeoff so that markdown hyperlinks work:
+```
+[https://example.com](https://example.com)
+```
 - `column_heading` currently only recognizes tilde `~` preceded by space (i.e.
 `foo ~` not `foo~`). This covers 99% of :help files.
 - `column_heading` children should be plaintext, but currently are parsed as `$._atom`.
@@ -55,8 +59,8 @@ Known issues
 TODO
 ----
 
-- `tag_heading` : line(s) containing only tags, typically implies a "heading"
-before a block.
+- `h4` ("tag heading") : a line containing only tags, or ending with a tag, is
+a "h4" heading.
 
 Release
 -------
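
The URL-encoding workaround mentioned in the README and the commit message could be automated when generating help files. The following is a minimal sketch under stated assumptions: `encodeForHelpDoc` is a hypothetical helper (it is not part of tree-sitter-vimdoc), and `urlWord` is a copy of the updated `url_word` pattern, used here only to check that the encoded URL would be captured whole.

```javascript
// Pattern copied from the updated `url_word` rule in grammar.js.
const urlWord = /https?:\/\/[^\n\t\] ]*[^\n\t )\].,]/;

// Hypothetical helper: percent-encode the characters that the `url` node
// cannot carry, so that the whole link survives parsing.
function encodeForHelpDoc(url) {
  return url
    .replace(/\]/g, '%5D') // "]" is not allowed anywhere in the URL
    .replace(/\)+$/, (run) => '%29'.repeat(run.length)); // trailing ")" would be dropped
}

const raw = 'https://neovim.io/doc/user/api.html#nvim_input()';
const safe = encodeForHelpDoc(raw);

// Wrap the URL in parens, as a help doc might, and check what gets matched.
const captured = `(${safe})`.match(urlWord)[0];
console.log(safe);
console.log(captured === safe);
```

The commit message's own example also encodes the opening paren (`%28`); this sketch leaves `(` alone, on the assumption that only the trailing `)` and any `]` need encoding.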

grammar.js

Lines changed: 5 additions & 2 deletions
@@ -41,7 +41,7 @@ module.exports = grammar({
 $._atom_common,
 ),
 word: ($) => choice(
-token(prec(-1, /[^,(\[\n\t ]+/)),
+token(prec(-1, /[^.,(\[\n\t ]+/)),
 $._word_common,
 ),
 
@@ -89,11 +89,14 @@ module.exports = grammar({
 /\{\{+[0-9]*/,
 
 '(',
+')',
 '[',
+']',
 '~',
 // NOT codeblock: random ">" in middle of the motherflippin text.
 '>',
 ',',
+'.',
 ),
 
 note: () => choice(
@@ -223,7 +226,7 @@ module.exports = grammar({
 '*', '*'),
 
 // URL without surrounding (), [], etc.
-url_word: () => /https?:[^\n\t)\] ]+/,
+url_word: () => /https?:\/\/[^\n\t\] ]*[^\n\t )\].,]/,
 url: ($) => choice(
 // seq('(', field('text', prec.left(alias($.url_word, $.word))), token.immediate(')')),
 // seq('[', field('text', prec.left(alias($.url_word, $.word))), token.immediate(']')),
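
The `word` change above (adding `.` to the excluded characters and to the literal tokens) is probably why the test corpora in this commit gain extra `(word)` nodes. A rough sketch of the effect, assuming a text can be approximated as runs of word characters plus single punctuation tokens; the real lexer has many more rules, and `roughWords` is a hypothetical helper:

```javascript
// Word character classes copied from the `word` rule before and after this
// commit, each paired with the single-character tokens it excludes.
const oldTokens = /[^,(\[\n\t ]+|[,(\[]/g;
const newTokens = /[^.,(\[\n\t ]+|[.,(\[]/g;

// Hypothetical approximation of how a line splits into word-like tokens.
function roughWords(pattern, text) {
  return text.match(pattern) || [];
}

const line = 'See foo.bar.';
console.log(roughWords(oldTokens, line)); // [ 'See', 'foo.bar.' ]
console.log(roughWords(newTokens, line)); // [ 'See', 'foo', '.', 'bar', '.' ]
```

Each `.` now becomes its own token, so a sentence that ends with a period yields one more `word` than before.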

test/corpus/arguments.txt

Lines changed: 1 addition & 0 deletions
@@ -207,6 +207,7 @@ EXTERNAL *netrw-externapp* {{{2
 (h1
 (delimiter)
 (heading
+(word)
 (word)
 (word))
 (tag

test/corpus/codeblock.txt

Lines changed: 5 additions & 0 deletions
@@ -82,6 +82,8 @@ text
 (line)))))
 (block
 (line
+(word)
+(word)
 (word)
 (word)))
 (block
@@ -90,6 +92,7 @@ text
 (word)
 (taglink
 (word))
+(word)
 (word))
 (line
 (codeblock
@@ -336,6 +339,7 @@ To test for a non-empty string, use empty(): >
 (word)
 (word)
 (word)
+(word)
 (word))
 (line
 (word)
@@ -359,6 +363,7 @@ To test for a non-empty string, use empty(): >
 (word)
 (word)
 (word)
+(word)
 (codeblock
 (code
 (line))))))

test/corpus/codespan.txt

Lines changed: 2 additions & 1 deletion
@@ -44,6 +44,7 @@ an error`.
 (word)
 (MISSING "`")))
 (line
+(word)
 (word)
 (word))))
 
@@ -79,4 +80,4 @@ g'{mark} g`{mark}
 (word)))
 (line
 (word)
-(word))))
+(word))))

test/corpus/heading3-column_heading.txt

Lines changed: 10 additions & 0 deletions
@@ -229,17 +229,27 @@ ABC not-h3
 (word))
 (codespan
 (word))
+(word)
 (word))
 (line
 (codespan
 (word))
+(word)
 (word))
 (line
 (word)
 (word)
 (word)
+(word)
 (word))
 (line
+(word)
+(word)
+(word)
+(codespan
+(word))
+(word)
+(word)
 (word))))
 
 ================================================================================

test/corpus/line_block.txt

Lines changed: 10 additions & 0 deletions
@@ -103,6 +103,8 @@ li continues
 (block
 (line_li
 (line
+(word)
+(word)
 (word)
 (word))
 (line))
@@ -121,6 +123,7 @@ li continues
 (word)
 (word)
 (word)
+(word)
 (word))
 (line
 (optionlink
@@ -130,6 +133,9 @@ li continues
 (word))))
 (line_li
 (line
+(word)
+(word)
+(word)
 (word)
 (word)
 (word)
@@ -199,13 +205,17 @@ listitem with codeblock
 (block
 (line_li
 (line
+(word)
+(word)
 (word)
 (word))
 (codeblock
 (code
 (line))))
 (line_li
 (line
+(word)
+(word)
 (word)
 (word))
 (line))

test/corpus/optionlink.txt

Lines changed: 4 additions & 0 deletions
@@ -75,6 +75,7 @@ Regular / :help /[
 (help_file
 (block
 (line
+(word)
 (word)
 (word))
 (line
@@ -96,6 +97,8 @@ Regular / :help /[
 (word)
 (word)
 (word)
+(word)
+(word)
 (ERROR
 (word))
 (word)
@@ -162,6 +165,7 @@ foo '"\ '. Notice
 (word)
 (word)
 (word)
+(word)
 (word))
 (line
 (taglink

test/corpus/taglink.txt

Lines changed: 3 additions & 0 deletions
@@ -148,6 +148,9 @@ Note: ":autocmd" can...
 (line
 (note)
 (word)
+(word)
+(word)
+(word)
 (word))
 (line
 (word)
