Description
Hi! I'm observing a STAR process on our system (Linux, CentOS 7) which appears to have been stalled for ~4.5 days. The log shows:
2023-11-08T17:37:49+0800: STAR version: 2.7.10a compiled: 2023-10-13T16:49:32+0800
2023-11-08T17:37:49+0800: Nov 08 17:37:49 ..... started STAR run
2023-11-08T17:37:49+0800: Nov 08 17:37:49 ... starting to generate Genome files
2023-11-08T17:38:28+0800: Nov 08 17:38:28 ..... processing annotations GTF
2023-11-08T17:38:45+0800: Nov 08 17:38:45 ... starting to sort Suffix Array. This may take a long time...
2023-11-08T17:38:58+0800: Nov 08 17:38:58 ... sorting Suffix Array chunks and saving them to disk...
2023-11-08T17:49:22+0800: Nov 08 17:49:22 ... loading chunks from disk, packing SA...
2023-11-13T11:30:15+0800 is the current time.
I have the following observations:
- strace on the process main thread for a few minutes shows that no system calls are being made
- gstack on the process shows something like this for the main thread:
Thread 1 (Thread 0x2ad51636b800 (LWP 57366)):
#0 0x00002ad51687ae9b in std::istream::sentry::sentry (this=0x7fff5a5dd47f, __in=..., __noskip=<optimized out>)
#1 0x00002ad51687bcdb in std::istream::read (this=0x7fff5a5dd9f0, __s=0x2ad666816010 "", __n=80000000)
#2 0x000000000052532b in fstreamReadBig(std::basic_ifstream<char, std::char_traits<char> >&, char*, unsigned long long) ()
#3 0x000000000051b97f in Genome::genomeGenerate() ()
#4 0x0000000000425d61 in main ()
Obviously this changes depending on exactly when I sample the stack, but the outer two frames main() -> genomeGenerate() are always present.
Anyway there's only one call to fstreamReadBig() from genomeGenerate() -- around line 313 of source/Genome_genomeGenerate.cpp we have this loop:
    while (! saChunkFile.eof()) {//read blocks from each file
        uint chunkBytesN=fstreamReadBig(saChunkFile,(char*) saIn,SA_CHUNK_BLOCK_SIZE*sizeof(saIn[0]));
        for (uint ii=0;ii<chunkBytesN/sizeof(saIn[0]);ii++) {
            SA.writePacked( packedInd+ii, (saIn[ii]<nGenome) ? saIn[ii] : ( (saIn[ii]-nGenome) | N2bit ) );
            #ifdef genenomeGenerate_SA_textOutput
                SAtxtStream << saIn[ii] << "\n";
            #endif
        };
        packedInd += chunkBytesN/sizeof(saIn[0]);
    };
Based on the fact that fstreamReadBig() calls std::istream::read() but I never observe a read() syscall actually performing any I/O, I conclude that the istream is skipping the syscall because of its own internal state. Most likely istream::good() is returning false, so read() returns immediately without reading anything, while istream::eof() is also false, which prevents the outer loop from ever terminating.
Attaching to the process with gdb confirms my hypothesis about the stream state:
#1 0x00002ad51687bcdb in std::istream::read (this=0x7fff5a5dd9f0, __s=0x2ad666816010 "", __n=80000000)
(gdb) print this->good()
$1 = false
(gdb) print this->eof()
$2 = false
(gdb) print this->bad()
$3 = false
(gdb) print this->fail()
$4 = true
Obviously you can't do anything about the I/O failure, but tweaking the outer loop condition would avoid the infinite loop.