
Filtering CUI/TUI returned entities? #516

@ddofer

When doing NER/NEL against UMLS, is there any way to configure the nlp pipeline to exclude candidates using a predefined list of CUIs or TUIs? For example, to exclude any detected CUI whose TUI is T079 (Temporal Concept)?

Currently I'm doing it by post-hoc filtering, which is inelegant, inefficient, and doesn't help remove noisy detections. That is, if the linker returns only the first detected candidates from a text, then post-hoc filtering by TUI means I miss the relevant entities that never made it into the returned list.
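To make the ordering problem concrete, here is a toy sketch. It does not call scispacy; the candidate triples, the placeholder CUIs, and the scores are all made up for illustration, and the two function names are hypothetical. It only shows why filtering after the top-k cut can lose a relevant candidate that filtering before the cut would keep.

```python
from typing import List, Set, Tuple

# (cui, tui, score) triples. All values are invented for illustration only.
Candidate = Tuple[str, str, float]

candidates: List[Candidate] = [
    ("C_TIME_1", "T079", 0.97),   # temporal concept, unwanted
    ("C_TIME_2", "T079", 0.95),   # temporal concept, unwanted
    ("C_DISEASE", "T047", 0.90),  # the candidate we actually care about
]
EXCLUDE_TUIS: Set[str] = {"T079"}
MAX_PER_MENTION = 2  # plays the role of a per-mention candidate cap


def truncate_then_filter(cands: List[Candidate], exclude: Set[str], k: int) -> List[Candidate]:
    """Post-hoc filtering: keep the top-k by score, then drop excluded TUIs."""
    top_k = sorted(cands, key=lambda c: c[2], reverse=True)[:k]
    return [c for c in top_k if c[1] not in exclude]


def filter_then_truncate(cands: List[Candidate], exclude: Set[str], k: int) -> List[Candidate]:
    """Desired behaviour: drop excluded TUIs first, then keep the top-k."""
    kept = [c for c in cands if c[1] not in exclude]
    return sorted(kept, key=lambda c: c[2], reverse=True)[:k]


post_hoc = truncate_then_filter(candidates, EXCLUDE_TUIS, MAX_PER_MENTION)
desired = filter_then_truncate(candidates, EXCLUDE_TUIS, MAX_PER_MENTION)
print(post_hoc)  # []
print(desired)   # [('C_DISEASE', 'T047', 0.9)]
```

With post-hoc filtering, both slots are taken by temporal concepts and nothing survives; filtering first leaves room for the disease candidate.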

Current code extract:

```python
from itertools import compress

# `nlp` (a loaded scispacy model) and `icu_feature_terms` are defined elsewhere.
nlp.add_pipe(
    "scispacy_linker",
    config={
        "resolve_abbreviations": True,
        "linker_name": "umls",
        "max_entities_per_mention": 4,  # 5
        "threshold": 0.87,  # default is 0.8, paper mentions 0.99 as thresh
    },
)
# ...

EXCLUDE_TUIS_LIST = ["T079", "T093"]  # UMLS semantic types (TUIs) to exclude

novel_cols_candidates_names = []
no_entities_list = []

novel_candidate_cuis = []
novel_candidate_cuis_nomenclatures = []
TUIs_list = []

linker = nlp.get_pipe("scispacy_linker")  # fetch once, outside the loop

for f in icu_feature_terms["name"]:
    print(f)
    doc = nlp(f)

    if len(doc.ents) > 0:
        for j, entity in enumerate(doc.ents):
            print(f"Entity #{j}:{entity}")

            # kb_ents is a list of (cui, score) pairs
            list_feature_cuis = [i[0] for i in entity._.kb_ents]
            # canonical name of each candidate (field 1 of the KB entity)
            list_cuis_nomenclatures = [linker.kb.cui_to_entity[c][1] for c in list_feature_cuis]

            # TUI filter: keep a candidate only if its first TUI is not excluded
            tui_filter_mask = [
                linker.kb.cui_to_entity[c][3][0] not in EXCLUDE_TUIS_LIST
                for c in list_feature_cuis
            ]
            list_feature_cuis = list(compress(list_feature_cuis, tui_filter_mask))
            list_cuis_nomenclatures = list(compress(list_cuis_nomenclatures, tui_filter_mask))

            # Append one row per surviving candidate so all lists stay aligned
            num_candidates = len(list_feature_cuis)
            novel_cols_candidates_names.extend([f] * num_candidates)
            novel_candidate_cuis.extend(list_feature_cuis)
            novel_candidate_cuis_nomenclatures.extend(list_cuis_nomenclatures)
            TUIs_list.extend(linker.kb.cui_to_entity[c][3][0] for c in list_feature_cuis)
    else:
        no_entities_list.append(f)
        print(f"No Entity candidates for {f}")
```
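The extract above only checks the first TUI of each candidate. A possible refinement, sketched here under assumptions, is a small helper that drops a candidate if any of its TUIs is excluded. The name `filter_kb_ents_by_tui` is hypothetical, and `cui_to_types` is a plain dict standing in for a lookup built from the linker's knowledge base; the toy CUIs and scores are invented. This is still post-hoc filtering, so it does not solve the truncation problem described earlier.

```python
from typing import Dict, Iterable, List, Tuple


def filter_kb_ents_by_tui(
    kb_ents: Iterable[Tuple[str, float]],
    cui_to_types: Dict[str, List[str]],
    exclude_tuis: Iterable[str],
) -> List[Tuple[str, float]]:
    """Keep (cui, score) pairs whose TUIs share nothing with exclude_tuis.

    Hypothetical helper: cui_to_types maps a CUI to its list of TUIs.
    CUIs missing from the mapping are kept, since nothing excludes them.
    """
    excluded = set(exclude_tuis)
    return [
        (cui, score)
        for cui, score in kb_ents
        if excluded.isdisjoint(cui_to_types.get(cui, []))
    ]


# Toy data, invented for illustration only.
toy_types = {"C_A": ["T047", "T079"], "C_B": ["T047"], "C_C": ["T093"]}
toy_kb_ents = [("C_A", 0.95), ("C_B", 0.91), ("C_C", 0.88)]

filtered = filter_kb_ents_by_tui(toy_kb_ents, toy_types, ["T079", "T093"])
print(filtered)  # [('C_B', 0.91)]
```

Here `C_A` is dropped even though its first TUI is allowed, because its second TUI is on the exclusion list.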
