Skip to content

Commit 6d2996a

Browse files
Added note about train_version
1 parent e2296bc commit 6d2996a

File tree

1 file changed

+1
-2
lines changed

1 file changed

+1
-2
lines changed

README.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -512,8 +512,7 @@ The steps involved are:
512512

513513
1. Create a directory under `coreferee/lang/` with the same structure as the existing language-specific directories; it is probably easiest to copy one of them.
514514

515-
2. The file `config.cfg` lists the spaCy models for which you wish to generate Coreferee models. You will need to specify a [separate vectors model](#model-performance) for any of the spaCy models that lack vectors or context-dependent tensors of their own — see the English `config.cfg` for an example. Each config entry specifies a minimum (`from_version`) and maximum (`to_version`) spaCy model version number that the generated Coreferee model will support. During development, both numbers will normally refer to a single version number. Later, when an updated spaCy model version is brought out, testing will be required to see whether the existing Coreferee model still supports the new spaCy model version. If so, the maximum version number can be increased; if not, a new config entry will be necessary to accommodate the new Coreferee model that will then be required.
516-
515+
2. The file `config.cfg` lists the spaCy models for which you wish to generate Coreferee models. You will need to specify a [separate vectors model](#model-performance) for any of the spaCy models that lack vectors or context-dependent tensors of their own — see the English `config.cfg` for an example. Each config entry specifies a minimum (`from_version`) and maximum (`to_version`) spaCy model version number that the generated Coreferee model will support, as well as the spaCy model version number with which the Coreferee model is trained (`train_version`). During development, all three numbers will normally refer to a single version number. Later, when an updated spaCy model version is brought out, testing will be required to see whether the existing Coreferee model still supports the new spaCy model version. If so, the maximum version number can be increased; if not, a new config entry will be necessary to accommodate the new Coreferee model that will then be required.
517516
3. The file `rules.py` in the main code directory contains an abstract class `RulesAnalyzer` that must be implemented by a class `LanguageSpecificRulesAnalyzer` within a file called `language_specific_rules.py` in each language-specific directory. The abstract class `RulesAnalyzer` contains docstrings that specify for each abstract property and method the contract to which implementing classes should adhere. Looking at the existing language-specific rules is also likely to be helpful. The method `is_potential_anaphor()` is normally the most work to create: here it is probably worth looking at the existing English method for languages with natural gender or at the existing German method for languages with grammatical gender. (Polish has an unusually complex gender system, so the Polish example is unlikely to be helpful even as a basis for working with other Slavonic languages.)
518517

519518
4. There are some situations where word lists can be helpful. If a list is placed in a file `<name>.dat` within the `data` directory under a language-specific directory, the contents will be automatically made available within the `LanguageSpecificRulesAnalyzer` for the language in question as a variable `self.<name>` that contains a list where each entry corresponds to a line from the file; comments with `#` are supported. If you use a word list, please ensure it can be published under the MIT license and give appropriate attribution within the language-specific directory in the `LICENSE` and, where appropriate, in a `COPYING` file.

0 commit comments

Comments
 (0)