Skip to content

Improve / change products_single_atoms filter #3

@avaucher

Description

@avaucher

Imported from internal issue number 24 (Dec 2020).


Looking into the cases why 2000 reactions from [a project] were marked by this filter,

        if self.products_single_atoms(reaction):
            valid = False
            reasons.append("products_single_atoms")

it appears that actually most of them (maybe 90%) are actually rather "No product". This is mainly reactions that have the same starting material and product.

I would therefore:

  • Add a "no product" filter
  • Reformulate the products_single_atoms filter so that it does not return True in this case.

Thinking a bit further, I am not sure if we really need this "single atom" filter. The remaining 10% cases are reactions in which we had, in the beginning, sth like X>>X.Cl or X>>X.[NH4+]. We need to consider:

  • There are also cases in which the ion has more than one heavy atom (triflate, ...)
  • Mainly it's also about renaming the filter. I'd say that "single atoms" are the symptom, not the actual reason why we remove the reaction.

Similarly, products_subset_of_reactants currently returns True if there are no products - if we add the "no product" filter, this one is not needed anymore.

What may be to discuss is the following:
Currently, it looks to me like the molecules present on both sides are removed before entering the function with the filters - which leads to the problem(s) above. If I see it correctly, this is not necessary: products_subset_of_reactants would mark the reaction as invalid anyway and so it would be filtered afterwards anyway.
[@others] anything I am missing there?


Decision:

update implementations of product_single_atom and products_subset_of_reactants so that they are not tagged when the products are empty.

Add a column in csv, sth like rxn_before_smiles_processing

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions