-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Imported from internal issue number 24 (Dec 2020).
Looking into the cases why 2000 reactions from [a project] were marked by this filter,
if self.products_single_atoms(reaction):
valid = False
reasons.append("products_single_atoms")it appears that actually most of them (maybe 90%) are actually rather "No product". This is mainly reactions that have the same starting material and product.
I would therefore:
- Add a "no product" filter
- Reformulate the products_single_atoms filter so that it does not return True in this case.
Thinking a bit further, I am not sure if we really need this "single atom" filter. The remaining 10% cases are reactions in which we had, in the beginning, sth like X>>X.Cl or X>>X.[NH4+]. We need to consider:
- There are also cases in which the ion has more than one heavy atom (triflate, ...)
- Mainly it's also about renaming the filter. I'd say that "single atoms" are the symptom, not the actual reason why we remove the reaction.
Similarly, products_subset_of_reactants currently returns True if there are no products - if we add the "no product" filter, this one is not needed anymore.
What may be to discuss is the following:
Currently, it looks to me like the molecules present on both sides are removed before entering the function with the filters - which leads to the problem(s) above. If I see it correctly, this is not necessary: products_subset_of_reactants would mark the reaction as invalid anyway and so it would be filtered afterwards anyway.
[@others] anything I am missing there?
Decision:
update implementations of product_single_atom and products_subset_of_reactants so that they are not tagged when the products are empty.
Add a column in csv, sth like rxn_before_smiles_processing