
Infinite loop in Genome::genomeGenerate following I/O error #1991

@rowanworth

Hi! I'm observing a STAR process on our system (Linux, CentOS 7) that appears to have stalled for ~4.5 days. The log shows:

2023-11-08T17:37:49+0800:       STAR version: 2.7.10a   compiled: 2023-10-13T16:49:32+0800
2023-11-08T17:37:49+0800: Nov 08 17:37:49 ..... started STAR run
2023-11-08T17:37:49+0800: Nov 08 17:37:49 ... starting to generate Genome files
2023-11-08T17:38:28+0800: Nov 08 17:38:28 ..... processing annotations GTF
2023-11-08T17:38:45+0800: Nov 08 17:38:45 ... starting to sort Suffix Array. This may take a long time...
2023-11-08T17:38:58+0800: Nov 08 17:38:58 ... sorting Suffix Array chunks and saving them to disk...
2023-11-08T17:49:22+0800: Nov 08 17:49:22 ... loading chunks from disk, packing SA...

The current time is 2023-11-13T11:30:15+0800.

I have the following observations:

  1. strace on the process's main thread for a few minutes shows that no system calls are being made
  2. gstack on the process shows the main thread in something like this:
Thread 1 (Thread 0x2ad51636b800 (LWP 57366)):
#0  0x00002ad51687ae9b in std::istream::sentry::sentry (this=0x7fff5a5dd47f, __in=..., __noskip=<optimized out>)
#1  0x00002ad51687bcdb in std::istream::read (this=0x7fff5a5dd9f0, __s=0x2ad666816010 "", __n=80000000)
#2  0x000000000052532b in fstreamReadBig(std::basic_ifstream<char, std::char_traits<char> >&, char*, unsigned long long) ()
#3  0x000000000051b97f in Genome::genomeGenerate() ()
#4  0x0000000000425d61 in main ()

Obviously the inner frames change depending on exactly when I sample the stack, but the outer two frames, main() -> genomeGenerate(), are always present.

Anyway, there's only one call to fstreamReadBig() from genomeGenerate(): around line 313 of source/Genome_genomeGenerate.cpp we have this loop:

            while (! saChunkFile.eof()) {//read blocks from each file
                uint chunkBytesN=fstreamReadBig(saChunkFile,(char*) saIn,SA_CHUNK_BLOCK_SIZE*sizeof(saIn[0]));
                for (uint ii=0;ii<chunkBytesN/sizeof(saIn[0]);ii++) {
                    SA.writePacked( packedInd+ii, (saIn[ii]<nGenome) ? saIn[ii] : ( (saIn[ii]-nGenome) | N2bit ) );

                    #ifdef genenomeGenerate_SA_textOutput
                        SAtxtStream << saIn[ii] << "\n";
                    #endif
                };
                packedInd += chunkBytesN/sizeof(saIn[0]);
            };

Since fstreamReadBig() calls std::istream::read() but I never observe a read() syscall actually performing any I/O, I conclude that the istream is skipping the syscall because of its own internal state. Probably istream::good() is returning false, so each read() returns immediately without extracting anything (and chunkBytesN stays 0), but istream::eof() is also false, which prevents the outer loop from ever terminating.

Attaching to the process with gdb confirms my hypothesis about the stream state:

#1  0x00002ad51687bcdb in std::istream::read (this=0x7fff5a5dd9f0, __s=0x2ad666816010 "", __n=80000000)
(gdb) print this->good()
$1 = false
(gdb) print this->eof()
$2 = false
(gdb) print this->bad()
$3 = false
(gdb) print this->fail()
$4 = true

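Obviously STAR can't do anything about the underlying I/O failure, but tweaking the outer loop condition (e.g. also stopping once the stream is no longer good()) would avoid the infinite loop and give STAR a chance to report the error.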
