
bad change to mmCIF chain vs segment? #1902

@jamesmkrieger


mmCIF files often split different kinds of entities into separate chains and segments, so they have more divisions in the hierarchical view than PDB files.

For example, with 4ake, we get 4 entries in the hierarchical view instead of 2. The unite_chains option merges these back and gives behaviour similar to ChimeraX. The issue is what happens when we don't use this option, where the aim is behaviour closer to PyMOL.

In the released version (tag v2.4.1), we get something similar to PyMOL:

In [21]: ag = prody.parseMMCIF('4ake')

In [22]: list(ag.getHierView())
Out[22]: 
[<Chain: A from Segment A from 4ake (214 residues, 1656 atoms)>,
 <Chain: B from Segment B from 4ake (214 residues, 1656 atoms)>,
 <Chain: A from Segment C from 4ake (72 residues, 72 atoms)>,
 <Chain: B from Segment D from 4ake (75 residues, 75 atoms)>]

In the current ProDy master, the chain IDs and segment names are switched relative to v2.4.1, which is probably an issue:

In [2]: ag = prody.parseMMCIF('4ake')

In [3]: list(ag.getHierView())
Out[3]: 
[<Chain: A from Segment A from 4ake (214 residues, 1656 atoms)>,
 <Chain: B from Segment B from 4ake (214 residues, 1656 atoms)>,
 <Chain: C from Segment A from 4ake (72 residues, 72 atoms)>,
 <Chain: D from Segment B from 4ake (75 residues, 75 atoms)>]
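To make the swap easier to check across versions, the chain and segment labels can be pulled out as plain tuples. This is a minimal sketch: `chain_segment_pairs` is a hypothetical helper name, and it assumes only that iterating the hierarchical view yields chains (as in the sessions above) and that each chain exposes `getChid()` and `getSegname()`.

```python
def chain_segment_pairs(atoms):
    """Return a (chain ID, segment name) tuple for each chain in the hierarchical view."""
    # Iterating the hierarchical view yields Chain objects, as list(ag.getHierView()) does above.
    return [(chain.getChid(), chain.getSegname()) for chain in atoms.getHierView()]


# Usage (needs ProDy and access to the 4ake mmCIF file, so it is left commented out):
# import prody
# ag = prody.parseMMCIF('4ake')
# print(chain_segment_pairs(ag))
```

Going by the outputs above, v2.4.1 should give the pairs (A, A), (B, B), (A, C), (B, D), and master should give (A, A), (B, B), (C, A), (D, B).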

There also seems to be a difference related to biomol assemblies: v2.4.1 raises an error when biomol=True is combined with unite_chains=True, whereas master does not. I think master gives the right result, but I'm not certain. Here is the example for 1ake:

v2.4.1

In [28]: ag = prody.parseMMCIF('1ake', biomol=True)

In [29]: ag
Out[29]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [30]: [bm.numChains() for bm in ag]
Out[30]: [1, 1]

In [31]: [list(bm.getHierView()) for bm in ag]
Out[31]: 
[[<Chain: A from Segment 1 from 1ake biomolecule 1 (456 residues, 1954 atoms)>],
 [<Chain: B from Segment 1 from 1ake biomolecule 2 (352 residues, 1850 atoms)>]]

In [32]: ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)

File ~/software/scipion3/software/em/prody-2.4.1/ProDy/prody/proteins/ciffile.py:125, in parseMMCIF(pdb, **kwargs)
    123 cif.close()
    124 if unite_chains:
--> 125     result.setSegnames(result.getChids())
    126 return result

AttributeError: 'list' object has no attribute 'setSegnames'

In [33]: ag = prody.parseMMCIF('1ake', biomol=True)

In [34]: [list(bm.protein.getHierView()) for bm in ag]
Out[34]: 
[[<Chain: A from Segment 1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>],
 [<Chain: B from Segment 1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>]]
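The AttributeError in the v2.4.1 session arises because biomol=True makes the result a list of AtomGroups, while the unite_chains step calls setSegnames on it as if it were a single AtomGroup. A minimal sketch of a list-aware version of that step follows. `unite_chains_in_result` is a hypothetical helper, not the code in master, which evidently does something different since it produces names like A1 and B1.

```python
def unite_chains_in_result(result):
    """Copy chain IDs into segment names for one AtomGroup or a list of AtomGroups."""
    if result is None:
        return None
    # Assumption: with biomol=True the parser returns a plain list, as in the session above.
    groups = result if isinstance(result, list) else [result]
    for group in groups:
        # Same call as line 125 of ciffile.py in the traceback, applied per AtomGroup.
        group.setSegnames(group.getChids())
    return result
```

The function returns the same object it was given, so it could stand in for the two lines shown in the traceback without changing what the caller receives.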

master

In [10]: ag = prody.parseMMCIF('1ake', biomol=True, unite_chains=True)

In [11]: ag
Out[11]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [12]: [list(bm.getHierView()) for bm in ag]
Out[12]: 
[[<Chain: A1 from Segment A1 from 1ake biomolecule 1 (456 residues, 1954 atoms)>],
 [<Chain: B1 from Segment B1 from 1ake biomolecule 2 (352 residues, 1850 atoms)>]]

In [13]: ag = prody.parseMMCIF('1ake', biomol=True)

In [14]: ag
Out[14]: 
[<AtomGroup: 1ake biomolecule 1 (1954 atoms)>,
 <AtomGroup: 1ake biomolecule 2 (1850 atoms)>]

In [17]: [list(bm.getHierView()) for bm in ag]
Out[17]: 
[[<Chain: A from Segment A1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>,
  <Chain: C from Segment A1 from 1ake biomolecule 1 (1 residues, 57 atoms)>,
  <Chain: E from Segment A1 from 1ake biomolecule 1 (241 residues, 241 atoms)>],
 [<Chain: B from Segment B1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>,
  <Chain: D from Segment B1 from 1ake biomolecule 2 (1 residues, 57 atoms)>,
  <Chain: F from Segment B1 from 1ake biomolecule 2 (137 residues, 137 atoms)>]]

In [18]: [list(bm.protein.getHierView()) for bm in ag]
Out[18]: 
[[<Chain: A from Segment A1 from 1ake biomolecule 1 (214 residues, 1656 atoms)>],
 [<Chain: B from Segment B1 from 1ake biomolecule 2 (214 residues, 1656 atoms)>]]
