Saturday, 5 June 2010

Parallel task management

It took me a lot of time to make this decision (apart from long hours of thinking how to implement it actually). But I finally made it. I refactored nGENE to make use of multi-threading/multi-core features in a more serious manner and thus speed up CPU part a bit. In this lengthy post I will try to describe in details approach I used which is heavily inspired by this post from Bitsquid creators and which can be called hierarchical task management system (HTMS for short :) doesn't it sound professional?).

What pushed me to made such a decision was thinking about the nGENE. I found out that there are numerous parallelization opportunities. In scene graph each of the nodes at a given hierarchy level is completely independent from its siblings. That is to say we first have to update root node but then we can update all its children in parallel - there is no need to do it sequentially. Same stays true for any given node. But that's not everything. Animation controllers are also independent from each other. Same is true for quad/oct tree nodes (eg. visibility tests) and many other things. GUI/UI update can be performed at any time in parallel to such tasks as scene management or physics update - they don't have to wait for it. That would be a waste not to make use of it.

I decided to go with the hierarchical task management as the name implies where beginning of one task can be dependent on completing several different ones. Furthermore, each of the tasks can have subtasks which have to be completed in order to complete their parent (mentioned scene graph update can be one big task with updates of all of its nodes being subtasks) and thus proceed with the next task in the flow. Therefore think of it as of the graph or even better - a flow.

I also keep a pool of worker threads, each of them can perform any task ready for processing. Number of worker threads can be specified by user but by default it equals to a number of logical processors - 1 (this 1 is for main thread). If they have nothing to do they are put to sleep and awaken as soon as a new task arrives.

Each of the tasks can have priority assigned to it. When there are idle worker threads they can perform tasks with the highest priorities in the open list first. When the task is finished it is removed from this active (or open) list and the next one available for processing (i.e. all its dependencies are resolved) is added to it.

There is also main thread in which rendering and task scheduling is performed.

Moreover, each of the tasks can have affinity specified if it is required to run a task on a given processor.

Such an approach is good for several reasons. One of them is obvious - a lot of things are run in parallel what can result in higher performance (and my tests prove it so far). Besides it eliminates the need for explicit wait/signal calls. Also the flow is quite intuitive so I didn't need to make many changes - I just turned a few things into tasks. And that was it.

This task scheduling theme will be available in nGENE in a few days with the next SVN commit. It will also contain a lot of bug fixes and minor improvements.

3 comments:

  1. I'm glad to hear that you've been able to get this implemented as you were planning :-)

    What are the performance gains that you've measured so far?

    In terms of dependencies between "modules" like physics->animation (one affects the other), you keep them in the same thread for the update?

    Another thing, how do you do as far as scaling for n cpus? I mean, nGene now knows how many cpus are available but how do you schedule the paralelization with that information?

    Keep it up! Great work :-)

    ReplyDelete
  2. "What are the performance gains that you've measured so far?"

    It's really hard to tell any precise number here. Some demos I tested were CPU heavy, some GPU (for that it didn't help much of course ;) ). In one of the demos I got around 40% (on a dual-core processor so pretty good result) performance gain. But still there are a few things which aren't tasks yet and will be turned into them.

    "In terms of dependencies between "modules" like physics->animation (one affects the other), you keep them in the same thread for the update?"

    Not necessarily. These are both tasks. One have to be finished before going with the other one, however (they are dependent as you say). But as they both contain numerous subtasks I still have all worker threads busy most of the time.

    As for the scaling I use very simple approach for now (maybe will do something more robust but at first I wanted to make it all working). At present there are n - 1 worker threads so at any given time as many as n - 1 tasks can be performed.

    ReplyDelete
  3. I'd love to read next part of this story describing in more details (some code?) about how do you create and run tasks (allocating new Task(...) each time?) and how do you pass input/output data.

    ReplyDelete