
Commit 50b9433

Merge pull request #20 from jaleman-vdr-wikimedia/main
Add integration suite, sample.env, and update example docs
2 parents: 60864f4 + 365a007


10 files changed (+586, -94 lines)


example/batches/README.md

Lines changed: 73 additions & 0 deletions
@@ -16,7 +16,28 @@ The articles included in the batches follow [this](https://gitlab.wikimedia.org/
 
 The batches metadata follow [this](https://gitlab.wikimedia.org/repos/wme/wikimedia-enterprise/-/blob/main/general/schema/snapshot.go) schema.
 
+## Prerequisites
 
+Before running this script, you must have your environment set up.
+
+1. **Environment Variables:** The script requires user credentials to authenticate with the API. Ensure the following variables are set in the `.env` file:
+
+```bash
+WME_USERNAME="your_username"
+WME_PASSWORD="your_password"
+```
+
+2. **Python Dependencies:** You must have the required packages, like `httpx` and the SDK's modules, available in your Python environment.
+
+## How to Run
+
+This script is designed to be run inside the SDK's **virtual environment**. Once the environment is active, execute the script from the project root:
+
+```bash
+python -m example.batches.batches
+```
+
+## Use Cases
 
 i) Get metadata of all the available batches for a day and hour.
 
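The prerequisites above expect `WME_USERNAME` and `WME_PASSWORD` in a `.env` file. This commit does not show how the SDK reads that file (the `python-dotenv` package is a common choice). As a dependency-free illustration only, here is a minimal sketch of a loader for the simple `KEY="value"` lines used above. `load_env_file` and `require_credentials` are hypothetical helpers, not SDK functions, and they ignore advanced `.env` syntax such as multi-line values or variable expansion.

```python
import os
import tempfile
from typing import Dict, Tuple


def load_env_file(path: str, override: bool = False) -> Dict[str, str]:
    """Parse simple KEY="value" lines and export them to os.environ.

    Hypothetical helper: handles comments, blank lines, an optional leading
    'export ', and one pair of surrounding quotes. Nothing more.
    """
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            # Drop one matching pair of surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            values[key] = value
            if override or key not in os.environ:
                os.environ[key] = value
    return values


def require_credentials() -> Tuple[str, str]:
    """Fail early with a clear message if either credential is missing."""
    missing = [k for k in ("WME_USERNAME", "WME_PASSWORD") if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return os.environ["WME_USERNAME"], os.environ["WME_PASSWORD"]


if __name__ == "__main__":
    # Demo with a throwaway file that mirrors the README's placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, ".env")
        with open(sample, "w", encoding="utf-8") as fh:
            fh.write('WME_USERNAME="your_username"\nWME_PASSWORD="your_password"\n')
        load_env_file(sample, override=True)
        print(require_credentials())
```

Failing early on missing credentials gives a clearer error than an authentication failure later in the run.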
@@ -170,3 +191,55 @@ with header:
   "Range": "bytes=21-36"
 }
 ```
+
+## Expected Output
+
+The script will log its progress for each of the five use cases. A successful run will look similar to this:
+
+```
+INFO:__main__:Successfully authenticated.
+INFO:__main__:--- Targeting batches for 2025-10-26 10:00 UTC ---
+
+INFO:__main__:--- i) Get metadata for all available batches ---
+INFO:__main__:Found 150 total batches.
+INFO:__main__:Metadata for the first batch:
+INFO:__main__:{
+  "identifier": "arwiki_namespace_0",
+  "version": "...",
+  ...
+}
+
+INFO:__main__:--- ii) Get metadata for 'en' (English) batches ---
+INFO:__main__:Found 12 'en' batches.
+INFO:__main__:Metadata for the first 'en' batch:
+INFO:__main__:{
+  "identifier": "enwiki_namespace_0",
+  ...
+}
+
+INFO:__main__:--- iii) Get metadata for a single batch (enwiki_namespace_0) ---
+INFO:__main__:Metadata for 'enwiki_namespace_0':
+INFO:__main__:{
+  "identifier": "enwiki_namespace_0",
+  "version": "...",
+  ...
+}
+
+INFO:__main__:--- iv) Get HEAD metadata for a single batch (enwiki_namespace_0) ---
+INFO:__main__:Headers for 'enwiki_namespace_0':
+INFO:__main__:{
+  "ETag": "...",
+  "Content-Type": "application/gzip",
+  "Content-Length": 123456789
+}
+INFO:__main__:Content-Length from HEAD: 123456789 bytes
+
+INFO:__main__:--- v) Download and read a batch (enwiki_namespace_0) ---
+INFO:__main__:Downloading 'enwiki_namespace_0' into an in-memory buffer...
+INFO:__main__:Downloaded 117.74 MB in 8.34 s
+INFO:__main__:Processing the downloaded archive...
+INFO:__main__:Successfully processed 25148 articles from the batch.
+INFO:__main__:First 5 article identifiers: ['Q1', 'Q2', 'Q3', 'Q4', 'Q5']
+INFO:__main__:Shutting down helper and revoking tokens...
+INFO:__main__:Exiting.
+```
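Use case iv reads `Content-Length` from a HEAD request, and the ranged-download use case earlier in this README sends a `Range` header such as `bytes=21-36`. HTTP byte ranges are inclusive at both ends, so that header asks for 16 bytes. The sketch below, assuming you want to split a batch into fixed-size parts (for resumable or parallel downloads), turns a `Content-Length` into a list of `Range` values. `byte_ranges` and `range_length` are hypothetical helpers, not part of the SDK. It also checks that the sample log's `117.74 MB` matches `123456789` bytes divided by 1024², which suggests the log reports mebibytes.

```python
from typing import List


def byte_ranges(content_length: int, chunk_size: int) -> List[str]:
    """Split [0, content_length) into inclusive HTTP Range header values."""
    if content_length <= 0 or chunk_size <= 0:
        raise ValueError("content_length and chunk_size must be positive")
    ranges = []
    for start in range(0, content_length, chunk_size):
        # HTTP byte ranges are inclusive, so the last byte is start + size - 1.
        end = min(start + chunk_size, content_length) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


def range_length(header_value: str) -> int:
    """Number of bytes requested by a single 'bytes=start-end' range."""
    start, end = header_value.split("=", 1)[1].split("-")
    return int(end) - int(start) + 1


if __name__ == "__main__":
    print(range_length("bytes=21-36"))   # 16
    print(byte_ranges(100, 40))          # ['bytes=0-39', 'bytes=40-79', 'bytes=80-99']
    print(round(123456789 / 1024**2, 2)) # 117.74
```

Each returned value can be sent as the `Range` header of a separate GET. Concatenating the responses in order rebuilds the full archive.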

example/batches/batches.py

Lines changed: 4 additions & 19 deletions
@@ -11,37 +11,22 @@
 thread-safe token management and revocation.
 """
 
-import sys
-import os
 import logging
 import json
 import io
 import time
 from datetime import datetime, timedelta, timezone
 
-try:
-    PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
-    sys.path.insert(0, PROJECT_ROOT)
-except NameError:
-    PROJECT_ROOT = os.path.abspath('.')
-    sys.path.insert(0, PROJECT_ROOT)
-
 # --- Import our custom modules ---
-try:
-    from modules.auth.auth_client import AuthClient
-    from modules.auth.helper import Helper
-    from modules.api.api_client import Client, Request, Filter
-    from modules.api.exceptions import APIRequestError, APIStatusError, APIDataError
-except ImportError as e:
-    print("Error: Failed to import modules. Make sure you are running from the project root.")
-    print("Details: %s", e)
-    sys.exit(1)
+from modules.auth.auth_client import AuthClient
+from modules.auth.helper import Helper
+from modules.api.api_client import Client, Request, Filter
+from modules.api.exceptions import APIRequestError, APIStatusError, APIDataError
 
 # --- Setup logging ---
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
 
-
 def main():
     """Runs the main demonstration of the Batches API."""
     helper = None
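Use case v downloads a batch into an in-memory buffer (the script imports `io` and `json`) and then reads articles out of it. The sample output shows the batch served as `Content-Type: application/gzip`. The sketch below assumes that the payload is a gzip-compressed tar archive of NDJSON files, one JSON article per line, each with an `identifier` field. Those format details and the `read_articles` / `build_sample_archive` helpers are assumptions for illustration, not the script's actual code. Only the standard library is used.

```python
import io
import json
import tarfile
from typing import Any, Dict, Iterable, Iterator


def read_articles(archive_bytes: bytes) -> Iterator[Dict[str, Any]]:
    """Yield one dict per NDJSON line from a .tar.gz held in memory."""
    buffer = io.BytesIO(archive_bytes)
    # "r:gz" decompresses gzip and reads the tar stream; nothing touches disk.
    with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            fh = archive.extractfile(member)
            if fh is None:
                continue
            for raw_line in fh:
                line = raw_line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)


def build_sample_archive(articles: Iterable[Dict[str, Any]]) -> bytes:
    """Create a tiny .tar.gz with one NDJSON file, for trying read_articles."""
    payload = "\n".join(json.dumps(a) for a in articles).encode("utf-8")
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="sample_0.ndjson")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return out.getvalue()


if __name__ == "__main__":
    data = build_sample_archive({"identifier": i, "name": f"Article {i}"} for i in range(1, 8))
    articles = list(read_articles(data))
    print(f"Processed {len(articles)} articles")
    print("First 5 identifiers:", [a["identifier"] for a in articles[:5]])
```

Because `read_articles` is a generator, a caller can stop early or stream records into further processing without materializing the whole batch as a list.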

example/client_config/README.md

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+# Example: Custom API Client Configuration
+
+This script demonstrates how to configure the `Client` from the WME API SDK with custom HTTP settings. It specifically shows how to set a custom **timeout**, **max\_retries**, and **User-Agent** string.
+
+The main demonstration sets an extremely low timeout (`0.1s`) and then catches the `APIRequestError` (wrapping an `httpx.TimeoutException`) that this deliberately triggers.
+
+## Prerequisites
+
+Before running this script, you must have your environment set up.
+
+1. **Environment Variables:** The script requires user credentials to authenticate with the API. Ensure the following variables are set in the `.env` file:
+
+```bash
+WME_USERNAME="your_username"
+WME_PASSWORD="your_password"
+```
+
+2. **Python Dependencies:** You must have the required packages, like `httpx` and the SDK's modules, available in your Python environment.
+
+## How to Run
+
+This script is designed to be run inside the SDK's **virtual environment**. Once the environment is active, execute the script from the project root:
+
+```bash
+python -m example.client_config.clientconfig
+```
+
+## Expected Output
+
+The script is considered **successful** when it logs that it *caught the expected timeout error*. This confirms that the custom `timeout=0.1` setting was applied.
+
+You should see output similar to this:
+
+```
+INFO:__main__:Setting up authentication...
+INFO:__main__:
+Initializing API Client with custom settings...
+INFO:__main__:Initialized Client with: timeout=0.1, max_retires=2, user_agent='clientconfig Script'
+INFO:__main__:Successfully authenticated custom client!
+
+INFO:__main__:--- Demonstrating Custom Timeout ---
+
+INFO:__main__:Attempting an API call expected to timeout (timeout=0.1s)...
+INFO:__main__:Success! Caught expected timeout error: Request Error: An error occurred while requesting POST https://api.enterprise.wikimedia.com/v2/codes (Details: Read timed out)
+INFO:__main__:Shutting down helper and revoking tokens...
+INFO:__main__:Exiting.
+```
+
+If you instead see `Error: The API call succeeded unexpectedly...`, the request completed in under 0.1 seconds. This is unlikely, but possible on a very fast, low-latency connection.
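The point of this example is that a very small timeout makes a request fail fast, and that the SDK surfaces the failure as an `APIRequestError` wrapping the underlying `httpx` exception. The same wrap-and-catch pattern can be shown with the standard library alone. In this sketch, `http.client` stands in for `httpx`, `RequestError` is a hypothetical stand-in for the SDK's `APIRequestError`, and a local socket that accepts connections but never replies ensures that the timeout fires without any network access or credentials. Retries (`max_retries`) are left out.

```python
import http.client
import socket


class RequestError(Exception):
    """Hypothetical stand-in for the SDK's APIRequestError."""


def get_with_timeout(host: str, port: int, path: str, timeout: float) -> bytes:
    """Issue a GET and wrap low-level timeouts in RequestError."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"User-Agent": "clientconfig Script"})
        return conn.getresponse().read()
    except socket.timeout as exc:  # an alias of TimeoutError on Python 3.10+
        # Chain the original exception so callers can inspect __cause__.
        raise RequestError(f"Request Error: GET {path} timed out ({exc})") from exc
    finally:
        conn.close()


def demo_timeout() -> bool:
    """Return True if the expected timeout was caught and wrapped."""
    # A listening socket that never calls accept() or writes anything: the
    # kernel completes the TCP handshake, but no HTTP response ever arrives.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as silent_server:
        silent_server.bind(("127.0.0.1", 0))
        silent_server.listen(1)
        port = silent_server.getsockname()[1]
        try:
            get_with_timeout("127.0.0.1", port, "/v2/codes", timeout=0.1)
        except RequestError as err:
            print(f"Success! Caught expected timeout error: {err}")
            return isinstance(err.__cause__, socket.timeout)
        print("Error: The API call succeeded unexpectedly")
        return False


if __name__ == "__main__":
    demo_timeout()
```

Chaining with `raise ... from exc` keeps the transport-level error available on `__cause__`, so callers can catch one library-specific exception type and still inspect the underlying failure.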

example/client_config/clientconfig.py

Lines changed: 4 additions & 22 deletions
@@ -4,32 +4,14 @@
 and user_agent settings.
 """
 
-import sys
-import os
 import logging
 import httpx
 
-# --- Add project root to sys.path ---
-try:
-    PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
-    if PROJECT_ROOT not in sys.path:
-        sys.path.insert(0, PROJECT_ROOT)
-except NameError:
-    PROJECT_ROOT = os.path.abspath('.')
-    if PROJECT_ROOT not in sys.path:
-        sys.path.insert(0, PROJECT_ROOT)
-
 # --- Import our custom modules ---
-try:
-    from modules.auth.auth_client import AuthClient
-    from modules.auth.helper import Helper
-    from modules.api.api_client import Client, Request
-    from modules.api.exceptions import APIRequestError, APIStatusError, APIDataError
-except ImportError as e:
-    print("Error: Failed to import modules. Make sure you are running from the project root")
-    print(" or that '{PROJECT_ROOT}' is correct.")
-    print("Details: {e}")
-    sys.exit(1)
+from modules.auth.auth_client import AuthClient
+from modules.auth.helper import Helper
+from modules.api.api_client import Client, Request
+from modules.api.exceptions import APIRequestError, APIStatusError, APIDataError
 
 # --- Setup logging ---
 logging.basicConfig(level=logging.INFO)
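Both scripts in this commit drop the `sys.path.insert(0, PROJECT_ROOT)` workaround and import from `modules` directly. That works because `python -m package.module` puts the current working directory at the front of `sys.path`, so running `python -m example.client_config.clientconfig` from the project root makes the top-level `modules` package importable. Running the file by path instead puts the script's own directory first, and the import fails. Python 3.11's safe-path mode (`-P` or `PYTHONSAFEPATH`) disables this behavior. The sketch below reproduces it in a throwaway directory. The `example/demo.py` and `modules/helper.py` names are hypothetical stand-ins for the real layout.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def write(path: Path, text: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")


def run_layout_demo() -> Tuple[str, int]:
    """Return (stdout of `python -m`, exit code of running the file by path)."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        # Hypothetical mini-project mirroring the SDK layout.
        write(base / "modules" / "__init__.py", "")
        write(base / "modules" / "helper.py", "VALUE = 42\n")
        write(base / "example" / "__init__.py", "")
        write(base / "example" / "demo.py",
              "from modules.helper import VALUE\nprint(VALUE)\n")

        env = dict(os.environ)
        # Make the result depend only on how the script is launched.
        env.pop("PYTHONPATH", None)
        env.pop("PYTHONSAFEPATH", None)

        as_module = subprocess.run(
            [sys.executable, "-m", "example.demo"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        by_path = subprocess.run(
            [sys.executable, os.path.join("example", "demo.py")],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return as_module.stdout.strip(), by_path.returncode


if __name__ == "__main__":
    output, code = run_layout_demo()
    print(f"python -m example.demo printed {output!r}")
    print(f"python example/demo.py exited with code {code}")
```

The `-m` run prints `42`. The by-path run exits with a non-zero code because `modules` cannot be found from inside `example/`.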

example/metadata/README.md

Lines changed: 133 additions & 2 deletions
@@ -6,13 +6,36 @@ Refer to the documentation [here](https://enterprise.wikimedia.com/docs/metadata
 Get metadata on available project codes (types).
 Allows filtering and field selection.
 
-i) Without any parameters. Returns all the project codes.
+## Prerequisites
+
+Before running this script, you must have your environment set up.
+
+1. **Environment Variables:** The script requires user credentials to authenticate with the API. Ensure the following variables are set in the `.env` file:
+
+```bash
+WME_USERNAME="your_username"
+WME_PASSWORD="your_password"
+```
+
+2. **Python Dependencies:** You must have the required packages, like `httpx` and the SDK's modules, available in your Python environment.
+
+## How to Run
+
+This script is designed to be run inside the SDK's **virtual environment**. Once the environment is active, execute the script from the project root:
+
+```bash
+python -m example.metadata.metadata
+```
+
+## Use Cases
+
+i) Without any parameters. Returns all the project codes.
 
 ```bash
 GET https://api.enterprise.wikimedia.com/v2/codes
 ```
 
-Response:
+Response:
 ```json
 [
 {
@@ -10306,3 +10329,111 @@ Response:
 }
 ]
 ```
+## Expected Output
+
+The script will log its progress as it steps through each example. A successful run will look similar to this:
+
+```
+INFO:__main__:Setting up authentication...
+INFO:__main__:Succesfully authenticated!
+
+INFO:__main__:Starting Metadata API examples...
+
+INFO:__main__:--- Project Codes ---
+
+INFO:__main__: --- Use Case 1: Get all codes ---
+
+INFO:__main__:1) Get all project codes:
+INFO:__main__:Found 15 project codes
+INFO:__main__:First code details:
+INFO:__main__:{
+  "identifier": "commons",
+  "name": "Wikimedia Commons",
+  "url": "https://commons.wikimedia.org"
+}
+
+INFO:__main__:2) Get only the 'identifier' field for all codes:
+INFO:__main__:Identifiers found:
+INFO:__main__:[
+  {
+    "identifier": "commons"
+  },
+  {
+    "identifier": "mediawiki"
+  },
+  ...
+]
+
+INFO:__main__:3) Filter for code 'wiki' and select 'identifier':
+INFO:__main__:Filtered result:
+INFO:__main__:[
+  {
+    "identifier": "wiki"
+  }
+]
+
+INFO:__main__:4) Get details for specific code 'wiktionary':
+INFO:__main__:Wiktionary details:
+INFO:__main__:{
+  "identifier": "wiktionary",
+  "name": "Wiktionary",
+  "url": "https://www.wiktionary.org"
+}
+...
+INFO:__main__:--- Languages ---
+
+INFO:__main__:1) Get all supported languages:
+INFO:__main__:Found 321 languages.
+INFO:__main__:Details for English ('en'):
+INFO:__main__:{
+  "identifier": "en",
+  "name": "English",
+  "direction": "ltr"
+}
+INFO:__main__:Details for Arabic ('ar'):
+INFO:__main__:{
+  "identifier": "ar",
+  "name": "العربية",
+  "direction": "rtl"
+}
+...
+INFO:__main__:--- Projects ---
+
+INFO:__main__:1) Get metadata for all supported projects:
+INFO:__main__:Found 943 projects.
+INFO:__main__:Details for English Wikipedia ('enwiki'):
+INFO:__main__:{
+  "identifier": "enwiki",
+  "name": "Wikipedia",
+  "url": "https://en.wikipedia.org",
+  ...
+}
+...
+INFO:__main__:--- Namespaces ---
+
+INFO:__main__:1) Get metadata for all supported namespaces:
+INFO:__main__:Found 28 namespaces.
+INFO:__main__:Namespace details:
+INFO:__main__:[
+  {
+    "identifier": 0,
+    "name": "Article"
+  },
+  {
+    "identifier": 1,
+    "name": "Article talk"
+  },
+  ...
+]
+
+INFO:__main__:2) Get details for specific namespace ID 0 (Articles):
+INFO:__main__:Namespace 0 details:
+INFO:__main__:{
+  "identifier": 0,
+  "name": "Article"
+}
+
+INFO:__main__:--- Metadata API examples complete!
+INFO:__main__:Shutting down helper and revoking tokens...
+INFO:__main__:Exiting!
+```
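The metadata examples combine two request options: field selection (example 2 returns only `identifier`) and filtering (example 3 keeps only the `wiki` code). The sketch below mirrors those semantics locally on a few records adapted from the sample output above. This can help when you want to check what a given request should return. The request-body shape (`fields` plus `filters` made of `field`/`value` pairs) is an assumption about the WME v2 API, and `apply_request` and `get_path` are hypothetical helpers, not part of the SDK.

```python
import json
from typing import Any, Dict, List, Optional


def get_path(record: Dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted field name such as 'in_language.identifier'."""
    value: Any = record
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def apply_request(records: List[Dict[str, Any]], body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep records matching every filter, then project to the requested fields."""
    filters: List[Dict[str, Any]] = body.get("filters", [])
    fields: Optional[List[str]] = body.get("fields")
    matched = [
        r for r in records
        if all(get_path(r, f["field"]) == f["value"] for f in filters)
    ]
    if not fields:
        return matched
    # Projection here handles top-level fields only.
    return [{k: r[k] for k in fields if k in r} for r in matched]


# A few records adapted from the sample output (hypothetical, not live data).
CODES = [
    {"identifier": "commons", "name": "Wikimedia Commons", "url": "https://commons.wikimedia.org"},
    {"identifier": "mediawiki"},
    {"identifier": "wiki"},
    {"identifier": "wiktionary", "name": "Wiktionary", "url": "https://www.wiktionary.org"},
]

if __name__ == "__main__":
    body = {"fields": ["identifier"], "filters": [{"field": "identifier", "value": "wiki"}]}
    print(json.dumps(body))
    print(apply_request(CODES, body))  # [{'identifier': 'wiki'}]
```

Filters are applied before projection, so a request can filter on a field it does not ask to receive.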
