| Connector | Data Client | Best For |
|---|---|---|
BaseDatasourceConnector |
BaseDataClient |
Small-to-medium datasets that fit in memory |
BaseStreamingDatasourceConnector |
BaseStreamingDataClient |
Large datasets with sync/paginated APIs |
BaseAsyncStreamingDatasourceConnector |
BaseAsyncStreamingDataClient |
Large datasets with async APIs (aiohttp, httpx async) |
Use when:
- All data fits comfortably in memory
- Your API returns all data in one call (or a small number of calls)
- You're indexing wikis, knowledge bases, documentation sites, or file systems with moderate content
Avoid when:
- The dataset is too large to fit in memory
- Individual documents are very large (> 10MB each)
- Memory usage is a concern
Use when:
- Data is too large to load all at once
- Your source API is paginated
- You want to process data incrementally to limit memory usage
- You're in a memory-constrained environment
Avoid when:
- Your dataset fits comfortably in memory (use
BaseDatasourceConnectorinstead for simplicity)
Use when:
- Your data source provides async APIs (e.g.,
aiohttp,httpxasync client) - You want non-blocking I/O during data retrieval
- You're already working in an async codebase
- You need to make concurrent requests to your source system
Avoid when:
- Your source API only has synchronous clients (use
BaseStreamingDatasourceConnectorinstead) - You don't need async I/O benefits
All connector types support forced restart uploads via force_restart=True:
connector.index_data(mode=IndexingMode.FULL, force_restart=True)Or for async connectors:
await connector.index_data_async(mode=IndexingMode.FULL, force_restart=True)- Aborting and restarting a failed or interrupted upload
- Ensuring a clean upload state by discarding partial uploads
- Recovering from upload errors or inconsistent states
- Generates a new
upload_idto ensure clean separation from previous uploads - Sets
forceRestartUpload=Trueon the first batch only - Continues with normal batch processing for subsequent batches
This feature is available on BaseDatasourceConnector, BaseStreamingDatasourceConnector, BaseAsyncStreamingDatasourceConnector, and BasePeopleConnector.