An `<a href>` link whose target is hidden behind "some other text", e.g. `<a href="...">some other text</a>`, does not seem to be picked up in the plain-text analysis from Tika. It does look like it can be picked up in Tika's "structured" output. That said, the side effects of using this other method are unclear, e.g. could other structural links be surfaced, such as XML schema links and so forth?
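To make the concern concrete, here is a minimal sketch using only the Python standard library. It assumes the structured output is XHTML roughly like the hand-written `SAMPLE` string below, which is not real Tika output. The names `LinkCollector`, `extract_links`, and the example URL are hypothetical. It compares two approaches: reading only the `href` of `<a>` elements, and a naive URL scan over the whole markup, which would also surface structural URLs such as the namespace declaration.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> elements only."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(xhtml):
    """Return the links carried by <a href> elements in the markup."""
    parser = LinkCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.links


# Hand-written stand-in for structured output (an assumption, not real Tika output).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>sample</title></head>
<body><p>See <a href="http://example.com/report">some other text</a> for details.</p></body>
</html>"""

# Approach 1: only <a href> values, so the namespace URL is ignored.
links = extract_links(SAMPLE)

# Approach 2: naive scan of the whole markup, which also matches the xmlns URL.
naive = re.findall(r"https?://[^\s\"'<>]+", SAMPLE)

print(links)
print(naive)
```

If the real structured output behaves like this sample, restricting extraction to `<a href>` attributes would keep schema and namespace URLs out of the results, whereas scanning the raw markup would not. Whether that holds for every file format is still an open question.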
I will create a sample using Google Docs.
Kelly P has also mentioned link-rot on Antistatic, and their slides file looks like a good candidate to analyze, both for fun and for this particular issue: here.