
Commit 9ccda2f

Merge pull request #12 from gleanwork/docs/restructure-readme-for-onboarding
Restructure README for fast onboarding
2 parents 8304cec + 78206b6

13 files changed

Lines changed: 639 additions & 426 deletions

.markdown-coderc.json

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,6 +1,5 @@
 {
   "snippetRoot": "./snippets",
-  "markdownGlob": "README.md",
+  "markdownGlob": "{README.md,docs/**/*.md}",
   "includeExtensions": [".ts", ".js", ".py"]
 }
-
```

README.md

Lines changed: 57 additions & 330 deletions
Large diffs are not rendered by default.

docs/advanced.md

Lines changed: 78 additions & 0 deletions
# Advanced Usage

## Choosing a Connector Type

| Connector | Data Client | Best For |
|---|---|---|
| `BaseDatasourceConnector` | `BaseDataClient` | Small-to-medium datasets that fit in memory |
| `BaseStreamingDatasourceConnector` | `BaseStreamingDataClient` | Large datasets with sync/paginated APIs |
| `BaseAsyncStreamingDatasourceConnector` | `BaseAsyncStreamingDataClient` | Large datasets with async APIs (aiohttp, httpx async) |
### BaseDatasourceConnector

**Use when:**

- All data fits comfortably in memory
- Your API returns all data in one call (or a small number of calls)
- You're indexing wikis, knowledge bases, documentation sites, or file systems with moderate content

**Avoid when:**

- The dataset is too large to fit in memory
- Individual documents are very large (> 10 MB each)
- Memory usage is a concern
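For illustration, the matching data client is a single fetch call. This is a sketch, not the SDK's exact API: the import path, endpoint, and field names are assumptions; `BaseDataClient` and `get_source_data()` are the names used in these docs.

```python
from typing import Sequence, TypedDict

import requests

# Assumed import path; adjust to the SDK's actual package layout.
from glean_indexing_sdk import BaseDataClient


class WikiPage(TypedDict):
    id: str
    title: str
    body: str


class WikiDataClient(BaseDataClient[WikiPage]):
    def get_source_data(self) -> Sequence[WikiPage]:
        # One request returns the whole dataset; fine while it fits in memory.
        response = requests.get("https://wiki.example.com/api/pages", timeout=30)
        response.raise_for_status()
        return response.json()["pages"]
```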
### BaseStreamingDatasourceConnector

**Use when:**

- Data is too large to load all at once
- Your source API is paginated
- You want to process data incrementally to limit memory usage
- You're in a memory-constrained environment

**Avoid when:**

- Your dataset fits comfortably in memory (use `BaseDatasourceConnector` instead for simplicity)
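A paginated source maps naturally onto a generator, so memory stays bounded by one page at a time. Again a sketch under the same assumptions (import path and method signature inferred from the data-client hierarchy in docs/architecture.md):

```python
from typing import Generator, TypedDict

import requests

# Assumed import path; adjust to the SDK's actual package layout.
from glean_indexing_sdk import BaseStreamingDataClient


class Ticket(TypedDict):
    id: str
    subject: str


class TicketDataClient(BaseStreamingDataClient[Ticket]):
    def get_source_data(self) -> Generator[Ticket, None, None]:
        # Walk the paginated API, yielding one record at a time so only
        # the current page is ever held in memory.
        page = 1
        while True:
            resp = requests.get(
                "https://support.example.com/api/tickets",
                params={"page": page},
                timeout=30,
            )
            resp.raise_for_status()
            tickets = resp.json()["tickets"]
            if not tickets:
                return
            yield from tickets
            page += 1
```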
### BaseAsyncStreamingDatasourceConnector

**Use when:**

- Your data source provides async APIs (e.g., `aiohttp`, `httpx` async client)
- You want non-blocking I/O during data retrieval
- You're already working in an async codebase
- You need to make concurrent requests to your source system

**Avoid when:**

- Your source API only has synchronous clients (use `BaseStreamingDatasourceConnector` instead)
- You don't need async I/O benefits
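The async variant is the same shape with an async generator, here sketched with `httpx` (import path and signature are assumptions, as above):

```python
from typing import AsyncGenerator, TypedDict

import httpx

# Assumed import path; adjust to the SDK's actual package layout.
from glean_indexing_sdk import BaseAsyncStreamingDataClient


class Ticket(TypedDict):
    id: str
    subject: str


class AsyncTicketDataClient(BaseAsyncStreamingDataClient[Ticket]):
    async def get_source_data(self) -> AsyncGenerator[Ticket, None]:
        # Non-blocking pagination with httpx's async client.
        async with httpx.AsyncClient() as client:
            page = 1
            while True:
                resp = await client.get(
                    "https://support.example.com/api/tickets",
                    params={"page": page},
                )
                resp.raise_for_status()
                tickets = resp.json()["tickets"]
                if not tickets:
                    return
                for ticket in tickets:
                    yield ticket
                page += 1
```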
## Forced Restart Uploads

All connector types support forced restart uploads via `force_restart=True`:

```python
connector.index_data(mode=IndexingMode.FULL, force_restart=True)
```

Or for async connectors:

```python
await connector.index_data_async(mode=IndexingMode.FULL, force_restart=True)
```
### When to Use

- Aborting and restarting a failed or interrupted upload
- Ensuring a clean upload state by discarding partial uploads
- Recovering from upload errors or inconsistent states
### How It Works

1. Generates a new `upload_id` to ensure clean separation from previous uploads
2. Sets `forceRestartUpload=True` on the **first batch only**
3. Continues with normal batch processing for subsequent batches

This feature is available on `BaseDatasourceConnector`, `BaseStreamingDatasourceConnector`, `BaseAsyncStreamingDatasourceConnector`, and `BasePeopleConnector`.
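For example, one plausible recovery pattern, following the snippets above (a sketch: it assumes `index_data()` raises an exception on a failed upload, which these docs don't specify):

```python
from glean_indexing_sdk import IndexingMode  # assumed import path

try:
    connector.index_data(mode=IndexingMode.FULL)
except Exception:
    # Retry from a clean slate: a new upload_id is generated and
    # forceRestartUpload is set on the first batch of the retry.
    connector.index_data(mode=IndexingMode.FULL, force_restart=True)
```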

docs/architecture.md

Lines changed: 59 additions & 0 deletions
# Architecture Overview

The Glean Indexing SDK follows a simple, predictable pattern for all connector types. Understanding this flow will help you implement any connector quickly.

## Data Flow

```mermaid
sequenceDiagram
    participant User
    participant Connector as "Connector<br/>(BaseDatasourceConnector<br/>or BasePeopleConnector)"
    participant DataClient as "DataClient<br/>(BaseDataClient<br/>or StreamingDataClient)"
    participant External as "External System<br/>(API/Database)"
    participant Glean as "Glean API"

    User->>+Connector: 1. connector.index_data()<br/>or connector.index_people()
    Connector->>+DataClient: 2. get_source_data()
    DataClient->>+External: 3. Fetch data
    External-->>-DataClient: Raw source data
    DataClient-->>-Connector: Typed source data
    Connector->>Connector: 4. transform() or<br/>transform_people()
    Note over Connector: Transform to<br/>DocumentDefinition or<br/>EmployeeInfoDefinition
    Connector->>+Glean: 5. Batch upload documents<br/>or employee data
    Glean-->>-Connector: Upload response
    Connector-->>-User: Indexing complete
```
## Key Components

1. **DataClient** — Fetches raw data from your external system (API, database, files, etc.)
2. **Connector** — Transforms your data into Glean's format and handles the upload process
## Connector Hierarchy

```
BaseConnector (abstract)
├── BaseDatasourceConnector[T] — documents that fit in memory
│   ├── BaseStreamingDatasourceConnector[T] — large/paginated datasets (sync generator)
│   └── BaseAsyncStreamingDatasourceConnector[T] — large datasets with async I/O
└── BasePeopleConnector — employee/identity indexing
```
## Data Client Hierarchy

```
BaseDataClient[T]               — fetches all data at once, returns Sequence[T]
BaseStreamingDataClient[T]      — yields data incrementally via Generator[T]
BaseAsyncStreamingDataClient[T] — yields data incrementally via AsyncGenerator[T]
```
## Implementation Pattern

Every connector follows the same four steps:

1. **Define your data type** — a `TypedDict` describing your source data
2. **Create a data client** — extends the appropriate `BaseDataClient` variant to fetch from your source
3. **Create a connector** — extends the appropriate `BaseDatasourceConnector` variant, sets `configuration`, and implements `transform()`
4. **Run it** — call `index_data()` (or `index_data_async()` for async connectors)
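As a compressed sketch of those four steps (class and method names follow this page; the import path, constructor shape, `configuration` contents, and `DocumentDefinition` fields are assumptions, not the SDK's exact API):

```python
from typing import Sequence, TypedDict

# Assumed import path; adjust to the SDK's actual package layout.
from glean_indexing_sdk import (
    BaseDataClient,
    BaseDatasourceConnector,
    DocumentDefinition,
    IndexingMode,
)


# 1. Define your data type.
class Article(TypedDict):
    id: str
    title: str
    body: str


# 2. Create a data client that fetches from your source.
class ArticleDataClient(BaseDataClient[Article]):
    def get_source_data(self) -> Sequence[Article]:
        return [{"id": "1", "title": "Hello", "body": "World"}]  # stand-in fetch


# 3. Create a connector that transforms records into Glean's format.
class ArticleConnector(BaseDatasourceConnector[Article]):
    configuration = {"name": "articles"}  # shape of the datasource config is assumed

    def transform(self, data: Sequence[Article]) -> list[DocumentDefinition]:
        # DocumentDefinition field names here are illustrative, not exact.
        return [
            DocumentDefinition(id=a["id"], title=a["title"], body=a["body"])
            for a in data
        ]


# 4. Run it (constructor shape assumed).
connector = ArticleConnector(data_client=ArticleDataClient())
connector.index_data(mode=IndexingMode.FULL)
```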
See the [Quickstart](../README.md) for a complete working example.
