# Wikidata Textifier

**Wikidata Textifier** is an API that transforms Wikidata entities into compact outputs for LLM and GenAI use cases.
It resolves missing labels for properties and claim values using the Wikidata Action API and caches labels to reduce repeated lookups.

- Live API: [wd-textify.wmcloud.org](https://wd-textify.wmcloud.org/)
- API docs: [wd-textify.wmcloud.org/docs](https://wd-textify.wmcloud.org/docs)

## Features

- Textify Wikidata entities as `json`, `text`, or `triplet`.
- Resolve labels for linked entities and properties.
- Cache labels in MariaDB to speed up repeated requests.
- Support multilingual output with a fallback language.
- Avoid SPARQL by using the Wikidata Action API and EntityData endpoints.

## Output Formats

- `json`: Structured representation of the entity's claims, optionally including qualifiers and references.
- `text`: Readable summary of the entity's label, description, aliases, and claims.
- `triplet`: One line per triplet, with labels and IDs, suited to graph-style traversal.

## API

### `GET /`

#### Query parameters

| Name | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Comma-separated Wikidata IDs (for example, `Q42` or `Q42,Q2`). |
| `pid` | string | No | Comma-separated property IDs used to filter claims (for example, `P31,P279`). |
| `lang` | string | No | Preferred language code (default: `en`). |
| `fallback_lang` | string | No | Language code used when a label is unavailable in `lang` (default: `en`). |
| `format` | string | No | Output format: `json`, `text`, or `triplet` (default: `json`). |
| `external_ids` | bool | No | Include claims with the `external-id` datatype (default: `true`). |
| `all_ranks` | bool | No | Include statements of every rank. When `false`, return preferred statements and fall back to normal-rank ones when none are preferred (default: `false`). |
| `qualifiers` | bool | No | Include qualifiers in claim values (default: `true`). |
| `references` | bool | No | Include references in claim values (default: `false`). |

#### Example requests

```bash
curl "https://wd-textify.wmcloud.org/?id=Q42"
curl "https://wd-textify.wmcloud.org/?id=Q42&format=text&lang=en"
curl "https://wd-textify.wmcloud.org/?id=Q42,Q2&pid=P31,P279&format=triplet"
```
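
The example requests can be wrapped in a small shell helper so that parameters are passed as `KEY=VALUE` pairs. The sketch below is hypothetical: `wd_textify_url` and `WD_TEXTIFY_BASE` are illustrative names, not part of the API. It only assembles the query string and rejects `format` values other than `json`, `text`, and `triplet`, and it assumes the values need no URL-encoding, which holds for entity IDs, language codes, and `true`/`false` flags.

```bash
# Hypothetical helper (not part of the API): builds a GET / request URL
# for the Wikidata Textifier. Values are NOT URL-encoded; this assumes
# plain IDs, language codes, and true/false flags.

# Override WD_TEXTIFY_BASE to target a different deployment.
WD_TEXTIFY_BASE="${WD_TEXTIFY_BASE:-https://wd-textify.wmcloud.org/}"

# Usage: wd_textify_url IDS [KEY=VALUE ...]
#   IDS is a comma-separated list such as Q42 or Q42,Q2.
#   Each KEY=VALUE pair is appended as a query parameter.
# Prints the URL on stdout; returns 1 if `format` is not json, text, or triplet.
wd_textify_url() {
  local url="${WD_TEXTIFY_BASE}?id=$1"
  shift
  local pair
  for pair in "$@"; do
    case "$pair" in
      format=json|format=text|format=triplet) ;;
      format=*)
        echo "unsupported format: ${pair#format=}" >&2
        return 1
        ;;
    esac
    url+="&${pair}"
  done
  printf '%s\n' "$url"
}
```

For example, `curl "$(wd_textify_url Q42 format=text lang=de fallback_lang=en)"` requests a German text summary with English as the fallback language. For values that do need encoding, curl's `-G` option combined with `--data-urlencode` can build the query string instead.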