lance.dataset() fails on Windows when dataset resides on a network share with spaces in the share name
Environment
| | |
|---|---|
| pylance | 4.0.1 |
| lancedb | 0.30.2 |
| pyarrow | 20.0.0 |
| Python | 3.12.3 (CPython, Windows, 64-bit) |
| OS | Windows 11 (Build 26100) |
| Storage | SMB network share mounted as a Windows drive letter |
Description
lance.dataset() and lancedb's open_table() both fail when the Lance dataset resides on a Windows-mapped network drive whose underlying SMB share name contains a space.
The root cause is that Lance's Rust LocalFileSystem resolves any Windows path (drive letter, subst, symlink) down to its underlying UNC path — e.g. \\192.168.x.x\My Share\... — then attempts to construct a file:// URL from it. Because the share name contains a space, the resulting URL (file:///My%20Share/...) cannot be converted back to a valid filesystem path, and Lance errors out.
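The host-dropping behaviour described above can be illustrated outside Lance. The following is a minimal sketch using only the Python standard library, not Lance's actual Rust code: the helper names `unc_to_file_url` and `file_url_to_unc` are hypothetical, and `server` is a placeholder host. It suggests that a UNC path survives a round trip when the host stays in the URL authority, and that a URL with an empty authority leaves nothing to rebuild the UNC path from.

```python
from urllib.parse import quote, unquote, urlsplit


def unc_to_file_url(unc_path: str) -> str:
    """Encode a UNC path as a file URL, keeping the host in the authority."""
    if not unc_path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc_path!r}")
    parts = unc_path.lstrip("\\").split("\\")
    host, segments = parts[0], parts[1:]
    # Percent-encode each segment separately so a space becomes %20.
    return "file://" + host + "/" + "/".join(quote(s) for s in segments)


def file_url_to_unc(url: str) -> str:
    """Decode a file URL back to a UNC path; requires a non-empty host."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    if not parts.netloc:
        # file:///My%20Share/... has no host, so no UNC path can be rebuilt.
        raise ValueError(f"no host in {url!r}; cannot rebuild a UNC path")
    return "\\\\" + parts.netloc + unquote(parts.path).replace("/", "\\")
```

With these helpers, `\\server\My Share\data\my-dataset.lance` maps to `file://server/My%20Share/data/my-dataset.lance` and back, while `file:///My%20Share/data/my-dataset.lance` is rejected because its host is empty.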
Error
LanceError(IO): Generic LocalFileSystem error: Unable to convert URL
"file:///My%20Share/data/my-dataset.lance/_versions/..."
to filesystem path,
lance-io\src\object_store.rs:716
Reproduction
Any Windows machine where:
- A network share is mounted as a drive letter (e.g. Z:)
- The SMB share name contains a space (e.g. \\server\My Share)
- A Lance dataset exists somewhere under that share
import lance
ds = lance.dataset(r"Z:\data\my-dataset.lance")
# raises LanceError(IO): Unable to convert URL "file:///My%20Share/..."
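To check ahead of time whether a path falls into the reproduction case above, the resolved path can be inspected for a UNC share name containing a space. This is a minimal sketch, assuming the path has already been resolved to its UNC form (for example with `os.path.realpath` on the affected machine); `unc_share` and `is_affected` are hypothetical helper names, not part of Lance.

```python
from pathlib import PureWindowsPath


def unc_share(resolved_path: str):
    """Return (host, share) for a UNC path, or None for a drive-letter path."""
    drive = PureWindowsPath(resolved_path).drive
    if not drive.startswith("\\\\"):
        return None  # e.g. "C:" or an unresolved mapped drive such as "Z:"
    host, _, share = drive[2:].partition("\\")
    return host, share


def is_affected(resolved_path: str) -> bool:
    """True when the path sits on a UNC share whose name contains a space."""
    info = unc_share(resolved_path)
    return info is not None and " " in info[1]


# On the affected machine (assumption: realpath yields the UNC form):
#   import os
#   is_affected(os.path.realpath(r"Z:\data\my-dataset.lance"))
```

An unresolved `Z:` path reports as unaffected, so the resolution step matters. Extended-length forms such as `\\?\UNC\...` are not handled by this sketch.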
What was tried
Every approach below was tested. Attempts 7 and 10 are rejected by the API before any path is opened. All the others fail for the same underlying reason: Windows resolves any path abstraction back to the UNC path before Lance's Rust layer sees it.
1. Drive letter path (baseline)
lance.dataset(r"Z:\data\my-dataset.lance")
Result: Fails. Lance resolves Z: → UNC → URL encoding fails.
2. subst virtual drive
lance.dataset(r"V:\my-dataset.lance")
Result: Fails. subst does not prevent UNC resolution. pathlib.Path(r"V:\...").resolve() returns the full UNC path.
3. NTFS symbolic link (mklink /D)
mklink /D C:\LocalLink "Z:\data"
lance.dataset(r"C:\LocalLink\my-dataset.lance")
Result: Fails. NTFS symlinks on Windows are resolved to their UNC target at the kernel level. Lance sees the UNC path.
4. RFC-8089 UNC file URI (with host)
lance.dataset("file://192.168.x.x/My%20Share/data/my-dataset.lance")
Result: Fails. Lance strips the host component and produces file:///My%20Share/..., which it cannot convert back to a path.
5. Raw UNC path string
lance.dataset(r"\\192.168.x.x\My Share\data\my-dataset.lance")
Result: Fails with the same URL conversion error.
6. Direct file:// URI with percent-encoded drive path
lance.dataset("file:///Z:/data/my-dataset.lance")
lance.dataset("file:///Z:/data/my%20dataset.lance")
Result: Fails. Lance still resolves the path to UNC before URL construction.
7. PyArrow SubTreeFileSystem passed to lance.dataset()
import pyarrow.fs as pafs
fs = pafs.SubTreeFileSystem(r"\\server\My Share\data", pafs.LocalFileSystem())
lance.dataset("my-dataset.lance", filesystem=fs)
Result: lance.dataset() does not accept a filesystem keyword argument in 4.0.1.
8. lancedb.connect() + open_table()
import lancedb
db = lancedb.connect(r"C:\LocalLink") # succeeds — lists tables correctly
db.open_table("my dataset") # fails
Result: lancedb.connect() succeeds and list_tables() returns correct table names. However, open_table() panics in Rust with InvalidTableName because the table names contain spaces — spaces are rejected by the name validator even though the underlying directory names contain them. Even when symlinked names without spaces are used, the underlying open_table() call still resolves to UNC and hits the same URL error.
9. lancedb.connect() with file:// URI
db = lancedb.connect("file:///C:/LocalLink")
db.open_table("mydataset")
Result: Connect succeeds, open_table() fails with the UNC URL error.
10. PyArrow SubTreeFileSystem passed as storage_options to lancedb
# fs is the SubTreeFileSystem created in attempt 7
db = lancedb.connect(r"\\server\My Share\data", storage_options={"filesystem": fs})
Result: 'SubTreeFileSystem' object cannot be converted to 'PyString' — storage_options does not accept filesystem objects.
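The path-based attempts above can be re-run in one sweep when testing a new Lance release. The following is a small sketch of such a harness; `try_candidates` is a hypothetical helper, and the opener is passed in as a parameter so the sweep itself does not depend on Lance. One caveat, stated as an assumption: a panic raised from the Rust side may not derive from `Exception`, in which case it would propagate instead of being recorded.

```python
from typing import Callable, Dict, Iterable


def try_candidates(
    opener: Callable[[str], object], candidates: Iterable[str]
) -> Dict[str, str]:
    """Call opener on each candidate and record 'ok' or the error text."""
    results: Dict[str, str] = {}
    for uri in candidates:
        try:
            opener(uri)
            results[uri] = "ok"
        except Exception as exc:  # record the failure and keep sweeping
            results[uri] = f"{type(exc).__name__}: {exc}"
    return results


# Example sweep on the affected machine, using paths from the attempts above:
#   import lance
#   report = try_candidates(lance.dataset, [
#       r"Z:\data\my-dataset.lance",
#       r"V:\my-dataset.lance",
#       r"C:\LocalLink\my-dataset.lance",
#       "file:///Z:/data/my-dataset.lance",
#   ])
#   for uri, outcome in report.items():
#       print(uri, "->", outcome)
```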
Workaround
Copying the dataset to a true local NTFS path (no network, no symlink) and opening it from there works:
import os
import shutil
import tempfile
import lance
SRC = r"Z:\data\my-dataset.lance"
TEMP = tempfile.mkdtemp(dir=r"C:\Temp")  # C:\Temp must already exist
DST = os.path.join(TEMP, "dataset.lance")
shutil.copytree(SRC, DST)  # copy the dataset off the network share
ds = lance.dataset(DST) # works perfectly
print(ds.count_rows())
shutil.rmtree(TEMP)  # remove the local copy
This confirms Lance reads the format correctly — the failure is exclusively in path/URL resolution for network-backed paths.
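For repeated use, the copy-then-open workaround can be wrapped so the local copy is removed even if reading fails. This is a minimal sketch, not a Lance API: `local_copy` and its `scratch_root` parameter are hypothetical names, and the copy is a read-only snapshot, so anything written to it never reaches the share.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def local_copy(src, scratch_root=None):
    """Copy a dataset directory to local scratch space and yield the copy's path.

    The scratch directory is deleted when the with-block exits, including
    when the block raises.
    """
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        dst = Path(tmp) / Path(src).name
        shutil.copytree(src, dst)
        yield dst


# Usage on the affected machine (scratch_root must be an existing local dir):
#   import lance
#   with local_copy(r"Z:\data\my-dataset.lance", scratch_root=r"C:\Temp") as p:
#       ds = lance.dataset(str(p))
#       print(ds.count_rows())
```

On Windows, cleanup may fail while file handles are still open, so the dataset object should be released before the block exits.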
Expected behaviour
lance.dataset() should be able to open a dataset from any path accessible to the Python process, including Windows-mapped network drives whose UNC share names contain spaces.
At minimum, either:
- The LocalFileSystem URL conversion should correctly handle UNC paths with spaces using the proper RFC-8089 file://host/share/path format, or
- lance.dataset() should accept a pyarrow.fs.FileSystem object so callers can supply their own filesystem abstraction (as pyarrow.dataset.dataset() does), bypassing Lance's internal path resolution entirely.
Notes
- The manifests do not contain hardcoded absolute paths — fragment filenames are stored as relative names only. The path issue is entirely in Lance's runtime resolution, not the stored data.
- lancedb.connect() successfully lists tables from the same path, suggesting the directory-listing layer handles UNC paths correctly. Only the dataset-open layer fails.
- The InvalidTableName validator in lancedb rejecting spaces is a separate but related friction point for datasets whose directory names contain spaces.