A set of Apache Drill UDFs for working with Internet Domain Names
There is one UDF:
suffix_extract(domain-string): Given a valid internet domain name (FQDN or otherwise) this will return a map with fields for thetld,assigned,subdomain, andhostnamefor all those that are present.
It relies on the crawler-commons Java library.
Retrieve the dependencies and build the UDF:
make deps
make udf
To automatically install it locally, ensure DRILL_HOME is set (the Makefile has a default of /usr/local/drill) and:
make install
Assuming you're running in standalone mode, you can then do:
make restart
You can manually copy:
target/drill-domain-tools-1.0.jartarget/drill-domain-tools-1.0-sources.jardeps/crawler-commons-0.10.jar
(after a successful build) to your $DRILL_HOME/jars/3rdparty directory and manually restart Drill as well.
Using the following query:
SELECT
a.dom AS dom,
a.rec.hostname AS host,
a.rec.assigned AS assigned,
a.rec.tld AS tld,
a.rec.subdomain AS subdomain
FROM
(SELECT dom, suffix_extract(dom) AS rec
FROM
(SELECT 'somehost.subnet.example.co.uk' AS dom
FROM (VALUES((1))))) a;
Here's the output:
$ drill-conf
apache drill 1.14.0-SNAPSHOT
"this isn't your grandfather's sql"
0: jdbc:drill:> SELECT
. . . . . . . > a.dom AS dom,
. . . . . . . > a.rec.hostname AS host,
. . . . . . . > a.rec.assigned AS assigned,
. . . . . . . > a.rec.tld AS tld,
. . . . . . . > a.rec.subdomain AS subdomain
. . . . . . . > FROM
. . . . . . . > (SELECT dom, suffix_extract(dom) AS rec
. . . . . . . > FROM
. . . . . . . > (SELECT 'somehost.subnet.example.co.uk' AS dom
. . . . . . . > FROM (VALUES((1))))) a;
+--------------------------------+-----------+----------------+--------+------------+
| dom | host | assigned | tld | subdomain |
+--------------------------------+-----------+----------------+--------+------------+
| somehost.subnet.example.co.uk | somehost | example.co.uk | co.uk | subnet |
+--------------------------------+-----------+----------------+--------+------------+
1 row selected (0.182 seconds)