This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/4294] New: rwlock hangs under stress load


fedora core 6 x86_64 smp installation.  updated to kernel.org 2.6.20.3 kernel
configured as fully preemptible, including preemptible big kernel lock.
running on a 4 x dual-core AMD 880 system with 8 GB of ram.

real-time process with 4 reader threads and 1 writer thread of higher real-time
SCHED_FIFO priority than reader threads.  all 5 threads use "cpu affinity"
setting to each obtain a processor to themselves.  attempted to set rwlock
attributes to "writer preferred", although i'm not certain it worked ("flags"
still appear to be zero in pthread_rwlock_t structure, but maybe i'm looking at
the wrong thing).  approximately 15000 write locks and 4x15000 read locks per
second under full load.
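
One way to check whether the "writer preferred" setting took effect is to read it back from the attribute object instead of inspecting the fields of pthread_rwlock_t. The following is a minimal sketch, not the reporter's code: init_rwlock_with_kind is a hypothetical helper name, and it relies on the GNU extensions pthread_rwlockattr_setkind_np and pthread_rwlockattr_getkind_np. Current Linux man pages describe PTHREAD_RWLOCK_PREFER_WRITER_NP as ignored by glibc and recommend PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP instead; whether that also holds for glibc 2.4 has not been verified here.

```c
#define _GNU_SOURCE
#include <pthread.h>

/* Hypothetical helper (not from the report): initialise *lock with the
   requested kind and report back the kind stored in the attribute object.
   Returns 0 on success, or the error code of the first call that failed. */
int init_rwlock_with_kind(pthread_rwlock_t *lock, int wanted, int *stored)
{
    pthread_rwlockattr_t attr;
    int rc = pthread_rwlockattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request the kind, then read it back so the caller can compare. */
    rc = pthread_rwlockattr_setkind_np(&attr, wanted);
    if (rc == 0)
        rc = pthread_rwlockattr_getkind_np(&attr, stored);
    if (rc == 0)
        rc = pthread_rwlock_init(lock, &attr);

    /* The attribute object is no longer needed once the lock exists. */
    pthread_rwlockattr_destroy(&attr);
    return rc;
}
```

A caller would pass PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP as wanted and compare it with the value returned in stored. This only confirms what the attribute object holds; it does not prove how the lock schedules readers and writers afterwards.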

process will eventually get "stuck", with reader threads returning "RESOURCE
TEMPORARILY UNAVAILABLE" (EAGAIN) status forever after.  the amount of time it
takes to get "stuck" is highly variable (can be minutes or hours).

the EAGAIN return status would appear to indicate a counter overflow condition,
but i believe it's actually the opposite (an underflow), in a roundabout manner
of speaking.  i don't know much about assembly language code, so i took the "C"
code equivalents (instead of the x86_64 assembly functions) of the
pthread_rwlock_rdlock, pthread_rwlock_wrlock, and pthread_rwlock_unlock
functions and incorporated them into my process, hoping to duplicate the
symptoms while being able to insert some printf statements that might shed
some light on the problem.

i was able to duplicate the situation, and what appears to be happening is that
two of the reader threads are simultaneously incrementing the __nr_readers
counter in the pthread_rwlock_t structure, so essentially one of the increments
is "missed".  for example, say the __nr_readers counter starts at zero; both
threads increment the counter simultaneously, and it ends up at one where it
should have ended up at two.  then when the two "unlock" calls each decrement
the counter, it goes to zero and then to -1 (or 4294967295 as an unsigned
32-bit integer).  the next time a rdlock is issued, it thinks the counter is
about to roll over, and returns the EAGAIN status.
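
The arithmetic of that sequence can be replayed deterministically in a single thread. The sketch below is a simplified model, not the glibc source: model_rwlock, model_rdlock, model_unlock and replay_lost_increment are hypothetical names, and the overflow check is assumed to behave the way the report describes (refuse with EAGAIN when the incremented counter would wrap to zero).

```c
#include <errno.h>
#include <limits.h>

/* Simplified stand-in for the reader count in pthread_rwlock_t. */
struct model_rwlock {
    unsigned int nr_readers;
};

/* Reader lock: refuse with EAGAIN if the increment wraps to zero. */
int model_rdlock(struct model_rwlock *l)
{
    if (++l->nr_readers == 0) {
        --l->nr_readers;          /* undo; the counter goes back to UINT_MAX */
        return EAGAIN;
    }
    return 0;
}

/* Reader unlock: plain decrement, with no underflow check. */
void model_unlock(struct model_rwlock *l)
{
    --l->nr_readers;
}

/* Replays the lost update by hand and returns the result of the next
   rdlock attempt. */
int replay_lost_increment(struct model_rwlock *l)
{
    l->nr_readers = 0;

    /* Both "threads" read the counter before either one writes it back. */
    unsigned int seen_by_a = l->nr_readers;
    unsigned int seen_by_b = l->nr_readers;
    l->nr_readers = seen_by_a + 1;    /* counter is 1 */
    l->nr_readers = seen_by_b + 1;    /* still 1: one increment is lost */

    model_unlock(l);                  /* 1 -> 0 */
    model_unlock(l);                  /* 0 -> UINT_MAX (wraps around) */

    return model_rdlock(l);           /* UINT_MAX + 1 wraps to 0 -> EAGAIN */
}
```

Because the failed rdlock restores the counter to UINT_MAX, every later rdlock in this model fails the same way, which matches the "forever after" symptom described above.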

i thought the low level lock should prevent two threads from simultaneously
incrementing or decrementing those counters, but for some reason that doesn't
seem to be the case?  so perhaps the problem is really with the lll_mutex_lock
rather than the rwlock itself, i'm not really sure?

sorry, this is my first bug report, and i didn't know what to fill in for the
host, target, build, triplets, but hopefully i've provided enough information
otherwise.  if not, feel free to e-mail me at Matthew.L.Dunkle@nasa.gov if you
need additional information.

i know this might not be easy to reproduce, especially considering the equipment
i am working with and so forth, but i appreciate whatever efforts you can make.
in the meantime, i am going to attempt to use something else, maybe a plain
vanilla mutex, to see if i can get it working in a different manner.  thank you.

-- 
           Summary: rwlock hangs under stress load
           Product: glibc
           Version: 2.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: Matthew dot L dot Dunkle at nasa dot gov
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=4294


