Can't Resolve Path on Network Drive #6616

@RainmakerP

lance.dataset() fails on Windows when dataset resides on a network share with spaces in the share name

Environment

   
pylance: 4.0.1
lancedb: 0.30.2
pyarrow: 20.0.0
Python: 3.12.3 (CPython, Windows, 64-bit)
OS: Windows 11 (Build 26100)
Storage: SMB network share mounted as a Windows drive letter

Description

lance.dataset() and lancedb's open_table() both fail when the Lance dataset resides on a Windows-mapped network drive whose underlying SMB share name contains a space.

The root cause is that Lance's Rust LocalFileSystem resolves any Windows path (drive letter, subst, symlink) down to its underlying UNC path — e.g. \\192.168.x.x\My Share\... — then attempts to construct a file:// URL from it. Because the share name contains a space, the resulting URL (file:///My%20Share/...) cannot be converted back to a valid filesystem path, and Lance errors out.
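To make the URL shapes concrete, here is a minimal sketch using only the Python standard library. It does not reproduce Lance's Rust code; it only illustrates the difference between the RFC 8089 form of a UNC path (server in the host component) and the host-less form seen in the error message. The server name `server` is a placeholder.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, urlsplit

# Placeholder UNC path standing in for what Z:\data\my-dataset.lance resolves to.
unc = PureWindowsPath(r"\\server\My Share\data\my-dataset.lance")

# RFC 8089 form: the server becomes the URL host, the share is the first path segment.
uri = "file:" + quote(unc.as_posix())
parts = urlsplit(uri)
print(uri)           # file://server/My%20Share/data/my-dataset.lance
print(parts.netloc)  # server

# Dropping the host gives a URL shaped like the one in the error message.
# Nothing in it identifies a server or a drive any more.
host_stripped = "file://" + parts.path
print(host_stripped)  # file:///My%20Share/data/my-dataset.lance
```

Once the host is gone, the first path segment (`My Share`) is neither a drive letter nor a server, which is consistent with the "Unable to convert URL ... to filesystem path" failure.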


Error

LanceError(IO): Generic LocalFileSystem error: Unable to convert URL
"file:///My%20Share/data/my-dataset.lance/_versions/..."
to filesystem path,
lance-io\src\object_store.rs:716

Reproduction

Any Windows machine where:

  • A network share is mounted as a drive letter (e.g. Z:)
  • The SMB share name contains a space (e.g. \\server\My Share)
  • A Lance dataset exists somewhere under that share

import lance
ds = lance.dataset(r"Z:\data\my-dataset.lance")
# raises LanceError(IO): Unable to convert URL "file:///My%20Share/..."

What was tried

Every approach below was tested. All fail for the same underlying reason: Windows resolves any path abstraction back to the UNC path before Lance's Rust layer sees it.

1. Drive letter path (baseline)

lance.dataset(r"Z:\data\my-dataset.lance")

Result: Fails. Lance resolves Z: → UNC → URL encoding fails.


2. subst virtual drive

subst V: Z:\data
lance.dataset(r"V:\my-dataset.lance")

Result: Fails. subst does not prevent UNC resolution. pathlib.Path(r"V:\...").resolve() returns the full UNC path.
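A quick way to check what a given path resolves to before handing it to Lance is sketched below. The helper names `is_unc` and `report` are hypothetical, and the resolution behaviour depends on the Windows setup, so treat the output as a diagnostic rather than a guarantee of what Lance sees.

```python
from pathlib import Path, PureWindowsPath


def is_unc(path) -> bool:
    # A UNC path has a drive of the form \\server\share instead of a letter like Z:
    return PureWindowsPath(str(path)).drive.startswith("\\\\")


def report(path: str) -> None:
    # Path.resolve() follows drive mappings and links where the OS exposes them.
    resolved = Path(path).resolve()
    print(f"{path!r} -> {str(resolved)!r} (UNC: {is_unc(resolved)})")


# Example call on the affected machine (paths taken from the attempts in this report):
# report(r"V:\my-dataset.lance")
# report(r"C:\LocalLink\my-dataset.lance")
```

If `report` prints `UNC: True` for a path, that path is expected to hit the same failure as the raw UNC string.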


3. NTFS symbolic link (mklink /D)

mklink /D C:\LocalLink "Z:\data"
lance.dataset(r"C:\LocalLink\my-dataset.lance")

Result: Fails. NTFS symlinks on Windows are resolved to their UNC target at the kernel level. Lance sees the UNC path.


4. RFC-8089 UNC file URI (with host)

lance.dataset("file://192.168.x.x/My%20Share/data/my-dataset.lance")

Result: Fails. Lance strips the host component and produces file:///My%20Share/..., which it cannot convert back to a path.


5. Raw UNC path string

lance.dataset(r"\\192.168.x.x\My Share\data\my-dataset.lance")

Result: Fails with same URL conversion error.


6. Direct file:// URI with percent-encoded drive path

lance.dataset("file:///Z:/data/my-dataset.lance")
lance.dataset("file:///Z:/data/my%20dataset.lance")

Result: Fails. Lance still resolves the path to UNC before URL construction.


7. PyArrow SubTreeFileSystem passed to lance.dataset()

import pyarrow.fs as pafs
fs = pafs.SubTreeFileSystem(r"\\server\My Share\data", pafs.LocalFileSystem())
lance.dataset("my-dataset.lance", filesystem=fs)

Result: lance.dataset() does not accept a filesystem keyword argument in 4.0.1.


8. lancedb.connect() + open_table()

import lancedb
db = lancedb.connect(r"C:\LocalLink")       # succeeds — lists tables correctly
db.open_table("my dataset")                 # fails

Result: lancedb.connect() succeeds and list_tables() returns correct table names. However, open_table() panics in Rust with InvalidTableName because the table names contain spaces — spaces are rejected by the name validator even though the underlying directory names contain them. Even when symlinked names without spaces are used, the underlying open_table() call still resolves to UNC and hits the same URL error.


9. lancedb.connect() with file:// URI

db = lancedb.connect("file:///C:/LocalLink")
db.open_table("mydataset")

Result: Connect succeeds, open_table() fails with the UNC URL error.


10. PyArrow SubTreeFileSystem passed as storage_options to lancedb

# fs is the SubTreeFileSystem created in attempt 7
db = lancedb.connect(r"\\server\My Share\data", storage_options={"filesystem": fs})

Result: Fails with "'SubTreeFileSystem' object cannot be converted to 'PyString'". storage_options does not accept filesystem objects.


Workaround

Copying the dataset to a true local NTFS path (no network, no symlink) and opening from there:

import os
import shutil
import tempfile

import lance

SRC = r"Z:\data\my-dataset.lance"
TEMP = tempfile.mkdtemp(dir=r"C:\Temp")  # C:\Temp must already exist
DST = os.path.join(TEMP, "dataset.lance")

shutil.copytree(SRC, DST)  # copy the dataset directory off the share
ds = lance.dataset(DST)  # works perfectly
print(ds.count_rows())
shutil.rmtree(TEMP)  # remove the temporary copy

This confirms Lance reads the format correctly — the failure is exclusively in path/URL resolution for network-backed paths.
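For repeated use, the same workaround can be wrapped in a context manager so the temporary copy is removed even if opening the dataset raises. This is a sketch only: `local_copy` is a hypothetical helper name, and it copies the whole dataset, so it is only practical for datasets that fit on local disk.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def local_copy(src, tmp_root=None):
    """Copy the directory `src` into a fresh temp dir and yield the copy's path."""
    tmp = Path(tempfile.mkdtemp(dir=tmp_root))
    try:
        dst = tmp / Path(src).name
        shutil.copytree(src, dst)
        yield dst
    finally:
        # Best-effort cleanup; files still held open on Windows are skipped.
        shutil.rmtree(tmp, ignore_errors=True)


# Intended use on the affected machine:
# import lance
# with local_copy(r"Z:\data\my-dataset.lance") as path:
#     ds = lance.dataset(str(path))
#     print(ds.count_rows())
```

The dataset object should only be used inside the `with` block, since the files it points to are deleted on exit.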


Expected behaviour

lance.dataset() should be able to open a dataset from any path accessible to the Python process, including Windows-mapped network drives whose UNC share names contain spaces.

At minimum, either:

  • The LocalFileSystem URL conversion should correctly handle UNC paths with spaces using the proper RFC-8089 file://host/share/path format, or
  • lance.dataset() should accept a pyarrow.fs.FileSystem object so callers can supply their own filesystem abstraction (as pyarrow.dataset.dataset() does), bypassing Lance's internal path resolution entirely.
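As an illustration of the first option, the sketch below shows the reverse conversion in Python: turning a file URL back into a Windows path while keeping the host, so that file://host/share/path maps to \\host\share\path. The function name `file_uri_to_windows_path` is hypothetical and this is not Lance's implementation; it only shows that the mapping is well defined once the host component is preserved.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit


def file_uri_to_windows_path(uri: str) -> PureWindowsPath:
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {uri!r}")
    path = unquote(parts.path)  # decode %20 and other percent-escapes
    if parts.netloc and parts.netloc != "localhost":
        # Host present: rebuild the UNC form \\host\share\rest
        return PureWindowsPath("//" + parts.netloc + path)
    # No host: expect a drive-letter path such as /Z:/data/...
    return PureWindowsPath(path.lstrip("/"))


print(file_uri_to_windows_path("file://server/My%20Share/data/my-dataset.lance"))
print(file_uri_to_windows_path("file:///Z:/data/my-dataset.lance"))
```

The space in the share name is not a problem in this direction: it is decoded from `%20` like any other character. The information that matters is the host, which must survive the round trip.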

Notes

  • The manifests do not contain hardcoded absolute paths — fragment filenames are stored as relative names only. The path issue is entirely in Lance's runtime resolution, not the stored data.
  • lancedb.connect() successfully lists tables from the same path, suggesting the directory-listing layer handles UNC paths correctly. Only the dataset-open layer fails.
  • The InvalidTableName validator in lancedb rejecting spaces is a separate but related friction point for datasets whose directory names contain spaces.
