
Reproducing the Results

We provide most of the models mentioned in the final results (Tables 6 and 7 in the paper). We used fairseq to implement and train the models. You can install the dependencies by executing:

pip install -r requirements.txt

Additionally, we used the luigi library for all the tasks related to preprocessing and postprocessing the data. Finally, we provide an evaluation script to compute the scores.

Dual-Source Models

Dual-source models (the vanilla dual-source transformer and dual-source RoBERTa) share the same preprocessing pipeline but use different vocabularies. We train a new SentencePiece model on all training data for the dual-source transformer and reuse the original RoBERTa vocabulary for dual-source RoBERTa.

The input to these models consists of two parts: the article and the requested properties. The properties are separated by the ### token. For instance:

William Costello Kennedy, PC (August 27, 1868 -- January 17, 1923) was a Canadian politician. Born in Ottawa, Ontario, he was first elected to the Canadian House of Commons in the riding of Essex North in the 1917 federal election as a Laurier-Liberal. He was re-elected as a Liberal in 1921. From 1921 until his death, he was the Minister of Railways and Canals in the government of William Lyon Mackenzie King.
family name ### position held ### date of birth ### member of political party ### instance of ### date of death ### occupation ### sex or gender ### given name ### country of citizenship

The first line is a Wikipedia article, and the second contains the requested properties.
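As a sketch, such an input can be assembled as follows. The helper function below is hypothetical and not part of the repository; the article and property names are taken from the example above:

```python
# Hypothetical helper (not part of the repository): build the two-line
# model input from an article and a list of requested properties.
def build_input(article: str, properties: list[str]) -> str:
    # Properties are joined with the " ### " separator on a single line
    # placed after the article text.
    return article + "\n" + " ### ".join(properties)

example = build_input(
    "William Costello Kennedy, PC (August 27, 1868 -- January 17, 1923) "
    "was a Canadian politician.",
    ["family name", "date of birth", "occupation"],
)
print(example)
```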

Evaluation of Models

We published three models trained on WikiReading Recycled:

  • T5,
  • vanilla dual-source transformer,
  • dual-source RoBERTa.

Moreover, we prepared a complete pipeline to reproduce our results. To evaluate a model, run:

PYTHONPATH=. luigi --local-scheduler --module tutorial_scripts EvaluateModelTask --model MODEL --split SPLIT

where MODEL is one of T5, DUAL_SOURCE_TRANSFORMER, or DUAL_ROBERTA_TRANSFORMER, and SPLIT is one of dev-0, test-A, or test-B.

For example:

PYTHONPATH=. luigi --local-scheduler --module tutorial_scripts EvaluateModelTask --model DUAL_ROBERTA_TRANSFORMER --split test-B

will evaluate the dual-source RoBERTa model on test-B.

You may restrict the evaluation to a diagnostic subset by adding the --subset SUBSET option, where SUBSET is one of unseen, rare, categorical, relational, exact-match, or long-articles. For example:

PYTHONPATH=. luigi --local-scheduler --module tutorial_scripts EvaluateModelTask --model DUAL_ROBERTA_TRANSFORMER --split test-B --subset rare
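To evaluate one model on every diagnostic subset, the commands can be generated with a small script. This is only a sketch: the model, split, and subset names are those listed above, and the script merely prints the commands rather than running them:

```python
# Sketch: generate one EvaluateModelTask command per diagnostic subset.
# Subset names are taken from the documentation above.
SUBSETS = ["unseen", "rare", "categorical",
           "relational", "exact-match", "long-articles"]

def evaluate_command(model: str, split: str, subset: str) -> str:
    # Same invocation as in the examples above, with --subset appended.
    return (
        "PYTHONPATH=. luigi --local-scheduler --module tutorial_scripts "
        f"EvaluateModelTask --model {model} --split {split} --subset {subset}"
    )

commands = [evaluate_command("DUAL_ROBERTA_TRANSFORMER", "test-B", s)
            for s in SUBSETS]
for cmd in commands:
    print(cmd)
```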

The results are written to results/DUAL_ROBERTA_TRANSFORMER/test-B.rare.

The schema of directories is as follows:

  • dataset: the WikiReading Recycled dataset directory
  • models: the models
  • processed: the preprocessed data
  • binarized: the binary version of preprocessed files (required by fairseq)
  • outputs: the models' raw predictions and their post-processed versions
  • results: the final results
  • fairseq_modules: implementation of the models in fairseq
  • tutorial_scripts: the pipeline implementation

In case of any problems, contact Tomasz Dwojak or report an issue.