Weird problem with mutex and lock error

Discussion:

(too old to reply)

Agents Marlow

2011-09-12 03:05:13 UTC

Hello,

I am getting a nasty error condition from a mutex where it calls
std::terminate. I hope someone here will be able to explain why. The
error message is:

terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error>

' what(): boost::lock_error

I read somewhere that the fix for this is to use a recursive mutex. I
don't understand why a recursive mutex should be required in my case.

Here is my situation: I have a parent thread that owns an object that
contains a concurrent queue. This object is set up from info in a
config file. Let us call this object a config object. Under certain
conditions the parent thread re-reads the config file and recreates
the config object. The object is managed using a scoped_ptr so when I
need to blow the old object away I create a new config object then do
a reset to that object pointer. Here is the concurrent queue object
that the config object contains:

template<typename Data>
class concurrent_queue
{
private:
std::queue<Data> the_queue;
mutable boost::mutex the_mutex;
boost::condition_variable the_condition_variable;

public:
void push(Data const& data)
{
boost::mutex::scoped_lock lock(the_mutex);
the_queue.push(data);
lock.unlock();
the_condition_variable.notify_one();
}

bool empty() const
{
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.empty();
}

bool try_pop(Data& popped_value)
{
boost::mutex::scoped_lock lock(the_mutex);
if(the_queue.empty())
{
return false;
}

popped_value=the_queue.front();
the_queue.pop();
return true;
}

struct queue_not_empty
{
std::queue<Data>& queue;

queue_not_empty(std::queue<Data>& queue_):
queue(queue_)
{}
bool operator()() const
{
return !queue.empty();
}
};

void wait_and_pop(Data& popped_value)
{
boost::mutex::scoped_lock lock(the_mutex);
the_condition_variable.wait(lock, queue_not_empty(the_queue));
popped_value = the_queue.front();
the_queue.pop();
}

template<typename Duration>
bool timed_wait_and_pop(Data& popped_value, Duration const&
wait_duration)
{
bool got_value;
boost::mutex::scoped_lock lock(the_mutex);
if (the_condition_variable.timed_wait(lock, wait_duration,
queue_not_empty(the_queue)))
{
popped_value = the_queue.front();
the_queue.pop();
got_value = true;
}
else
{
got_value = false;
}

return got_value;
}
};

The queue is used for the parent thread to communicate with various
child threads that it spawns.

I managed to reproduce the error in the debugger and the exception is
thrown from concurrent_queue in the first line of the push method
(where it gets the lock).

I am not sure how this can be since the config object is only supposed
to be destroyed when there are no child threads left. It is as if a
child thread is still trying to write to the queue when the parent is
trying to destroy it.

If my suspicion is right then I don't think it would be right to
correct this error by changing the concurrent queue to use a recursive
mutex. What do people think?

This is a bif of a tricky issue because the parent cannot know that a
child will finish properly. The child threads do a fork and an exec
and it is possible that the exec will not return.

Regards,

Andrew Marlow

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Ulrich Eckhardt

2011-09-12 18:41:48 UTC

Permalink

Post by Agents Marlow
I am getting a nasty error condition from a mutex where it calls
std::terminate. I hope someone here will be able to explain why. The
terminate called after throwing an instance of
'boost::exception_detail::clone_impl
<
boost::exception_detail::error_info_injector<boost::lock_error>

what(): boost::lock_error

std::terminate called after throwing an exception that is not caught, if I'm
guessing right. I'd attach a debugger and see what is causing the problem.

Post by Agents Marlow
I read somewhere that the fix for this is to use a recursive mutex.

No, that suggestion without any analysis of the problem is surely wrong.

Post by Agents Marlow
Here is my situation: I have a parent thread that owns an object that
contains a concurrent queue. This object is set up from info in a
config file. Let us call this object a config object. Under certain
conditions the parent thread re-reads the config file and recreates
the config object. The object is managed using a scoped_ptr so when I
need to blow the old object away I create a new config object then do
a reset to that object pointer.

A scoped_ptr means that the object is not shared, and your statement that
your "parent thread"[1] owns it confirms this. Now, the question is why is
this thing exclusively owned but somehow shared between threads?
[snip]

I've been looking at this, and just a few things:
1. I don't see anything locking the mutex recursively, but you should be
able to trivially verify that using a debugger.
2. As soon as the empty() function returns, the state of the queue could
change, so that function is somewhere between dangerous and useless.
3. I'm not sure if the condition variable is thread-safe, it might be
necessary to lock the associated mutex while signalling it in push().
4. timed_wait_and_pop is made unnecessarily long by introducing a temporary
to hold a returnvalue. Just return false if the timed_wait() call returns
false. YMMV.

Post by Agents Marlow
I am not sure how this can be since the config object is only supposed
to be destroyed when there are no child threads left. It is as if a
child thread is still trying to write to the queue when the parent is
trying to destroy it.

Since you haven't shown any code using that queue, it's impossible to tell.
If your suspicion is right, you do have an ownership problem, see the use of
scoped_ptr above. Using the right smart pointer (default to shared_ptr)
should help remedy this. Try some memory debugger (e.g. valgrind) to see if
the object is used after it was destroyed, too.

Post by Agents Marlow
If my suspicion is right then I don't think it would be right to
correct this error by changing the concurrent queue to use a recursive
mutex. What do people think?

Is the mutex locked recursively and is this intended? If not, using a
recursive mutex is not a solution, at best it is a workaround. BTW: Using a
typedef would have saved you lots of typing and you could change the code to
use a recursive mutex with a single-line change. ;)

Good luck!

Uli

[1] Threads in all systems I know don't share a parent-child relationship.
They are peers, even if one started another.

--
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Andrew

2011-09-13 17:52:31 UTC

Permalink

Post by Ulrich Eckhardt

Post by Agents Marlow
I am getting a nasty error condition from a mutex where it calls
std::terminate. I hope someone here will be able to explain why.

[snip]
1. I don't see anything locking the mutex recursively, but you should be
able to trivially verify that using a debugger.
2. As soon as the empty() function returns, the state of the queue could
change, so that function is somewhere between dangerous and useless.
3. I'm not sure if the condition variable is thread-safe, it might be
necessary to lock the associated mutex while signalling it in push().
4. timed_wait_and_pop is made unnecessarily long by introducing a temporary
to hold a returnvalue. Just return false if the timed_wait() call returns
false. YMMV.

I must confess, I did not write this from scratch. I have been using
java where you get a concurrent queue object for free so I looked
around to see if anyone had written one for C++. I found one here:

http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html

The code I used is from the section called "The Final Code". It is
written by Anthony Williams, who knows a thing or two about
concurrency in C++. I just assumed it was correct, I needed something
off the shelf.

Point 1 (no recursive mutex) is true but I think it is correct. IMO a
concurrent queue like this should not require a recursive mutex.
However, I could be wrong. Do you think such a structure does require
one? Why?

Regards,

Andrew Marlow

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Ulrich Eckhardt

2011-09-14 08:47:36 UTC

Permalink

Post by Andrew
Point 1 (no recursive mutex) is true but I think it is correct. IMO a
concurrent queue like this should not require a recursive mutex.
However, I could be wrong. Do you think such a structure does require
one? Why?

I personally consider recursive mutexes a hack, when you are too lazy to
separate things. For such a queue, which is "self contained", i.e. doesn't
contain any external references, you just lock the mutex when you begin an
operation and unlock it afterwards. There shouldn't be any recursion
anywhere. However, if your try_pop() function called the empty() function,
you would get this recursive lock, but your code doesn't. In some cases, it
is just convenience/lazyness that prompts these recursive locks and they
don't hurt. This is basically your case, so to answer your question, you
don't need a recursive mutex.

A completely different case is if you have calls from inside that are made
to some external entity that also requires a lock. There, you will probably
want to release your local lock temporarily in order to avoid deadlock. If
the mutex is now locked twice and you only unlock it once, it is still
locked, leading to bugs. In such a scenario, you can't be lazy, and using a
non-recursive mutex (preferably with a debugging diagnostic to catch invalid
recursions) is the better choice. You can't use the recursion anyway, and
shouldn't have to pay for it.

Cheers!

Uli

Chris M. Thomasson

2011-09-13 17:53:07 UTC

Permalink

"Ulrich Eckhardt" <***@dominolaser.com> wrote in message news:f6p0k8-***@satorlaser.homedns.org...
[...]

Post by Ulrich Eckhardt

Post by Agents Marlow
I am getting a nasty error condition from a mutex where it calls
If my suspicion is right then I don't think it would be right to
correct this error by changing the concurrent queue to use a recursive
mutex. What do people think?

Is the mutex locked recursively and is this intended?

AFAICT, the recursive locking of the mutex is most definitely "intended";
how else does one explain the following function's:
____________________________________________________________
bool empty() const
{
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.empty();
}

bool try_pop(Data& popped_value)
{
boost::mutex::scoped_lock lock(the_mutex);
if(the_queue.empty())
{
return false;
}

popped_value=the_queue.front();
the_queue.pop();
return true;
}
____________________________________________________________

AFAICT, `try_pop()' does lock the mutex twice... ;^o

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

2011-09-14 08:44:32 UTC

Permalink

Post by Chris M. Thomasson
AFAICT, the recursive locking of the mutex is most definitely "intended";
____________________________________________________________
bool empty() const
{
boost::mutex::scoped_lock lock(the_mutex);
return the_queue.empty();
}
bool try_pop(Data& popped_value)
{
boost::mutex::scoped_lock lock(the_mutex);
if(the_queue.empty())
{
return false;
}
popped_value=the_queue.front();
the_queue.pop();
return true;
}
____________________________________________________________
AFAICT, `try_pop()' does lock the mutex twice... ;^o

I don't see it. try_pop calls the_queue.empty() and not this->empty();
the_queue.empty() does not involve any locking.

I don't see any concurrent_queue implementation bug except for the
rather useless empty-function and an unfortunate naming of the
condition variable. I'd prefer to use "not_empty" or something like
this as the condition variable's name so that it actually reflects the
condition that is signalled / waited for.

The error is probably somewhere else.

Cheers!
SG

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Andrew

2011-09-15 20:47:14 UTC

Permalink

Post by SG

Post by Chris M. Thomasson
AFAICT, `try_pop()' does lock the mutex twice... ;^o

I don't see it. try_pop calls the_queue.empty() and not this->empty();
the_queue.empty() does not involve any locking.
The error is probably somewhere else.

It was. I eventually tracked it down to the queue getting deleted
whilst a thread was still trying to use it. I now wait for all the
threads to finish.

I am still thinking about what to do when they don't finish. It is
possible (but rare) for processes to hang on unix and become
unkillable. I think I probably need to terminate the process without
cleaning up the queue if I timeout waiting for my threads to finish.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Ulrich Eckhardt

2011-09-14 08:47:56 UTC

Permalink

It does check if the queue is empty, but it doesn't use its memberfunction
for that but directly accesses the internal container. So no, it doesn't.

Or am I missing something?

Uli

Chris M. Thomasson

2011-09-15 20:44:31 UTC

Permalink

Post by Ulrich Eckhardt

Post by Chris M. Thomasson
AFAICT, the recursive locking of the mutex is most definitely "intended";
____________________________________________________________

[...]

Post by Ulrich Eckhardt

Post by Chris M. Thomasson
____________________________________________________________
AFAICT, `try_pop()' does lock the mutex twice... ;^o

It does check if the queue is empty, but it doesn't use its memberfunction
for that but directly accesses the internal container. So no, it doesn't.
Or am I missing something?

You are missing nothing Ulrich; I made a stupid mistake...

Sorry about that non-sense!

;^o

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]