ParaSail uses work stealing to schedule the picothreads that make up a ParaSail program. With work stealing, there are a relatively small number of heavy-weight worker processes, roughly one per physical core/processor, each serving their own queue of picothreads (in a LIFO manner), and periodically stealing a picothread from some other worker process (using FIFO, so as to pick up a picothread that has been languishing on the other worker's queue). See the following blog entry for more discussion of work stealing:
http://parasail-programming-language.blogspot.com/2010/11/virtual-machine-for-parasail-with.html
What this means is that a worker's queue is referenced mostly by only one process, namely the owner of the queue.
A similar situation arises in the region-based storage management used in ParaSail. A region is created when a new scope is entered, and most of the allocation and deallocation within a region is done by the worker that created the scope. But due to work stealing, some other worker might be executing a picothread that is expanding or shrinking an object associated with the region, so some synchronization is necessary in this case.
So what sort of synchronization should be used for these situations where most of the access to a resource arises from an owning worker process, but some of the access arises from other non-owning workers? We could use a traditional lock-based mutex all of the time, but this slows down the common case where all the access comes from the owner. We could use a general N-way lock-free synchronization, but this generally requires some kind of atomic compare-and-swap and involves busy waiting. Atomic compare-and-swap is not always available in a portable fashion at the high-level language level, and busy waiting presumes that there are never more worker processes than there are available physical processors/cores, so the current holder of the lock-free resource is actually making progress while other workers are busy-waiting.
So for ParaSail we are adopting a middle ground between fully lock-based synchronization and N-way lock-free synchronization, which recognizes the asymmetric nature of the problem, namely that one process, the owner, will be performing most of the references. With the adopted solution, we only need atomic load and store, rather than atomic compare-and-swap, and there is never any busy waiting, so we can run on top of, for example, a time-slicing operating system, where some worker processes might be preempted.
So what is the adopted solution? For a given resource, we have two flags which are atomic variables, one mutex, and a queue, named as follows:
How do we know this approach does not leave a nonowner waiting on the queue forever? We know the owner rechecks the nonowner-wants-resource flag after clearing the owner-wants-resource flag, so the owner will never miss the possibility that a nonowner queued itself while the owner was manipulating the resource.
So what does this approach accomplish? We see that the owner only uses a lock-based mutex when it bumps into a nonowner that is simultaneously manipulating the resource. On the other hand, a nonowner always uses a lock-based mutex, and in addition it uses a queue if it happens to bump into the owner simultaneously manipulating the resource. As mentioned above, this approach also avoids the need for atomic compare-and-swap, as well as avoiding the need for busy waiting.
http://parasail-programming-language.blogspot.com/2010/11/virtual-machine-for-parasail-with.html
What this means is that a worker's queue is referenced mostly by only one process, namely the owner of the queue.
A similar situation arises in the region-based storage management used in ParaSail. A region is created when a new scope is entered, and most of the allocation and deallocation within a region is done by the worker that created the scope. But due to work stealing, some other worker might be executing a picothread that is expanding or shrinking an object associated with the region, so some synchronization is necessary in this case.
So what sort of synchronization should be used for these situations where most of the access to a resource arises from an owning worker process, but some of the access arises from other non-owning workers? We could use a traditional lock-based mutex all of the time, but this slows down the common case where all the access comes from the owner. We could use a general N-way lock-free synchronization, but this generally requires some kind of atomic compare-and-swap and involves busy waiting. Atomic compare-and-swap is not always available in a portable fashion at the high-level language level, and busy waiting presumes that there are never more worker processes than there are available physical processors/cores, so the current holder of the lock-free resource is actually making progress while other workers are busy-waiting.
So for ParaSail we are adopting a middle ground between fully lock-based synchronization and N-way lock-free synchronization, which recognizes the asymmetric nature of the problem, namely that one process, the owner, will be performing most of the references. With the adopted solution, we only need atomic load and store, rather than atomic compare-and-swap, and there is never any busy waiting, so we can run on top of, for example, a time-slicing operating system, where some worker processes might be preempted.
So what is the adopted solution? For a given resource, we have two flags which are atomic variables, one mutex, and a queue, named as follows:
- Owner-wants-resource flag
- Nonowner-wants-resource flag
- Resource mutex
- Nonowner-waiting queue
- Owner sets the owner-wants-resource flag atomically;
- It then checks the nonowner-wants-resource flag:
- If nonowner-wants-resource flag is set:
- Owner calls the mutex lock operation;
- Owner manipulates the resource;
- Owner clears the owner-wants-resource flag;
- <
<Check_Queue>> Owner then checks the nonowner-waiting queue: - If the queue is empty, it clears the nonowner-wants-resource flag;
- If the queue is not empty, it wakes up one of the waiting nonowners;
- Owner calls the mutex unlock operation (note that this might be combined with the above waking up of one of the nonowners -- e.g. using a lock handoff).
- If nonowner-wants-resource flag is not set:
- Owner manipulates the resource;
- Owner clears the owner-wants-resource flag.
- Owner rechecks the nonowner-wants-resource flag:
- If nonowner-wants-resource flag is now set:
- Owner calls the mutex lock operation;
- Owner does the <<Check_Queue>
> operation (see above); - Owner calls the mutex unlock operation (note that this might be combined with the waking up of one of the nonowners by Check_Queue -- e.g. using a lock handoff).
- Nonowner calls the mutex lock operation;
- Nonowner sets the nonowner-wants-resource flag atomically;
- Nonowner checks the owner-wants-resource flag;
- While owner-wants-resource flag is set:
- Nonowner adds itself to the nonowner-waiting queue;
- When woken up, reacquire the lock (or get lock automatically via a handoff from the process that woke us up);
- Nonowner manipulates the resource;
- Nonowner does the <<Check_Queue>
> operation (see above); - Nonowner calls the mutex unlock operation (note that this might be combined with the waking up of another nonowner by Check_Queue -- e.g. using a lock handoff).
How do we know this approach does not leave a nonowner waiting on the queue forever? We know the owner rechecks the nonowner-wants-resource flag after clearing the owner-wants-resource flag, so the owner will never miss the possibility that a nonowner queued itself while the owner was manipulating the resource.
So what does this approach accomplish? We see that the owner only uses a lock-based mutex when it bumps into a nonowner that is simultaneously manipulating the resource. On the other hand, a nonowner always uses a lock-based mutex, and in addition it uses a queue if it happens to bump into the owner simultaneously manipulating the resource. As mentioned above, this approach also avoids the need for atomic compare-and-swap, as well as avoiding the need for busy waiting.