It took me a lot of time to make this decision (apart from long hours of thinking how to implement it actually). But I finally made it. I refactored nGENE to make use of multi-threading/multi-core features in a more serious manner and thus speed up CPU part a bit. In this lengthy post I will try to describe in details approach I used which is heavily inspired by this post from Bitsquid creators and which can be called hierarchical task management system (HTMS for short :) doesn't it sound professional?).
What pushed me to made such a decision was thinking about the nGENE. I found out that there are numerous parallelization opportunities. In scene graph each of the nodes at a given hierarchy level is completely independent from its siblings. That is to say we first have to update root node but then we can update all its children in parallel - there is no need to do it sequentially. Same stays true for any given node. But that's not everything. Animation controllers are also independent from each other. Same is true for quad/oct tree nodes (eg. visibility tests) and many other things. GUI/UI update can be performed at any time in parallel to such tasks as scene management or physics update - they don't have to wait for it. That would be a waste not to make use of it.
I decided to go with the hierarchical task management as the name implies where beginning of one task can be dependent on completing several different ones. Furthermore, each of the tasks can have subtasks which have to be completed in order to complete their parent (mentioned scene graph update can be one big task with updates of all of its nodes being subtasks) and thus proceed with the next task in the flow. Therefore think of it as of the graph or even better - a flow.
I also keep a pool of worker threads, each of them can perform any task ready for processing. Number of worker threads can be specified by user but by default it equals to a number of logical processors - 1 (this 1 is for main thread). If they have nothing to do they are put to sleep and awaken as soon as a new task arrives.
Each of the tasks can have priority assigned to it. When there are idle worker threads they can perform tasks with the highest priorities in the open list first. When the task is finished it is removed from this active (or open) list and the next one available for processing (i.e. all its dependencies are resolved) is added to it.
There is also main thread in which rendering and task scheduling is performed.
Moreover, each of the tasks can have affinity specified if it is required to run a task on a given processor.
Such an approach is good for several reasons. One of them is obvious - a lot of things are run in parallel what can result in higher performance (and my tests prove it so far). Besides it eliminates the need for explicit wait/signal calls. Also the flow is quite intuitive so I didn't need to make many changes - I just turned a few things into tasks. And that was it.
This task scheduling theme will be available in nGENE in a few days with the next SVN commit. It will also contain a lot of bug fixes and minor improvements.