I may be mistaken here, but I wrote a similar library myself and know a thing or two about how key-based locking needs to be done. I believe I see multiple race conditions in this code. The reference counters are not incremented and decremented within locks (#2 already mentions this, but the suggestion there is still not enough).
So what could happen?
First of all, two concurrent calls for the same key could both execute _waiters++ at the same time, which could result in the value being incremented by only 1 instead of 2. Should this happen, the entry will be removed from the dictionary whilst it is still in use, and a new thread for the same key could then enter and run concurrently for that key, which is exactly what this library is trying to avoid.
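To illustrate the lost update, here is a minimal sketch. The class and field names (WaiterCountDemo, _plainWaiters, _atomicWaiters) are made up for the demo and are not taken from this library. It increments one counter with ++ and another with Interlocked.Increment from several threads at once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaiterCountDemo
{
    // Hypothetical stand-ins for the library's _waiters field.
    private static int _plainWaiters;
    private static int _atomicWaiters;

    public static (int Plain, int Atomic) Run(int workers, int incrementsPerWorker)
    {
        _plainWaiters = 0;
        _atomicWaiters = 0;

        Parallel.For(0, workers, _ =>
        {
            for (var i = 0; i < incrementsPerWorker; i++)
            {
                // Read, add, write back: two threads can read the same
                // value and both write value + 1, losing one increment.
                _plainWaiters++;

                // A single atomic read-modify-write, so no increment is lost.
                Interlocked.Increment(ref _atomicWaiters);
            }
        });

        return (_plainWaiters, _atomicWaiters);
    }
}
```

The atomic counter always ends at workers * incrementsPerWorker, while the plain one can end lower. Note that an atomic increment only addresses this first problem. It does nothing for the race between the decrement and the removal.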
Secondly, if thread A decrements _waiters from 1 to 0 and starts doing a TryRemove, a thread B can still get the entry from the dictionary and increment that 0 to a 1, blissfully unaware that the key is being removed by thread A. So thread A removes the item from the dictionary while thread B is still processing for a key that is no longer in the dictionary. Once again, a thread C can then enter, find nothing in the dictionary for the same key as B, and process that key in parallel with B, again against what this library promises.
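One way to close both races is to make the lookup, the counter change and the removal a single critical section. Below is a rough sketch of that idea, not the actual code of this library or of mine. KeyedLocker, Entry, Releaser and _sync are hypothetical names, and I use a synchronous Wait to keep it short (an async version would await WaitAsync outside the lock in the same way).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class KeyedLocker<TKey>
{
    private sealed class Entry
    {
        public readonly SemaphoreSlim Semaphore = new SemaphoreSlim(1, 1);
        public int Waiters; // only read or written while holding _sync
    }

    private readonly object _sync = new object();
    private readonly Dictionary<TKey, Entry> _entries = new Dictionary<TKey, Entry>();

    // Number of keys currently tracked; exposed for diagnostics.
    public int Count { get { lock (_sync) { return _entries.Count; } } }

    public IDisposable Lock(TKey key)
    {
        Entry entry;
        lock (_sync)
        {
            // Get-or-add and increment happen atomically with respect to
            // the decrement-and-remove in Release.
            if (!_entries.TryGetValue(key, out entry))
            {
                entry = new Entry();
                _entries.Add(key, entry);
            }
            entry.Waiters++;
        }

        entry.Semaphore.Wait(); // wait for the key outside the global lock
        return new Releaser(this, key, entry);
    }

    private void Release(TKey key, Entry entry)
    {
        entry.Semaphore.Release();
        lock (_sync)
        {
            // Nobody can pick up this entry between the decrement and the
            // removal, because both happen under the same lock as the lookup.
            if (--entry.Waiters == 0)
            {
                _entries.Remove(key);
            }
        }
    }

    private sealed class Releaser : IDisposable
    {
        private readonly KeyedLocker<TKey> _owner;
        private readonly TKey _key;
        private readonly Entry _entry;
        private int _disposed;

        public Releaser(KeyedLocker<TKey> owner, TKey key, Entry entry)
        {
            _owner = owner;
            _key = key;
            _entry = entry;
        }

        public void Dispose()
        {
            // Guard against double disposal releasing the semaphore twice.
            if (Interlocked.Exchange(ref _disposed, 1) == 0)
            {
                _owner.Release(_key, _entry);
            }
        }
    }
}
```

Usage would look like using (locker.Lock("some-key")) { ... }. The trade-off is a short global lock on every acquire and release. It is held only for the dictionary bookkeeping, never while waiting for the key itself.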