Soft graph coloring is a generalization of traditional graph coloring: the objective is to assign a color to each node in an undirected graph so that the number of edges that connect nodes of the same color is minimized. (In traditional graph coloring, the objective is to ensure that no edge connects nodes of the same color.)
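The quantity being minimized can be made concrete with a minimal sketch. The function name `count_conflicts` and the edge-list representation are assumptions made for illustration; they are not taken from the project's own code.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def count_conflicts(edges: Iterable[Edge], color: Dict[Hashable, int]) -> int:
    """Return the number of edges whose two endpoints share a color."""
    return sum(1 for u, v in edges if color[u] == color[v])


# A triangle cannot be colored with two colors without a conflict,
# but three colors suffice.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
two_coloring = {"a": 0, "b": 1, "c": 0}
three_coloring = {"a": 0, "b": 1, "c": 2}

print(count_conflicts(triangle, two_coloring))    # 1
print(count_conflicts(triangle, three_coloring))  # 0
```

A traditional colorer would reject the two-color assignment outright; a soft colorer accepts it and scores it by its single conflict.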
Soft graph coloring can be viewed as an abstraction/simplification of resource allocation problems (involving resources that are mutually exclusive). For example:
Kestrel's project deals primarily with large, distributed networks of resources that use radio communication. Communication is considered to be expensive: it tends to have relatively high latency and can be detrimental in a physical environment that includes adversaries who are trying to determine the location of resources. Moreover, the tasks that the resource network is servicing have real-time aspects, so the resource allocation algorithm must be real-time; the resources may be unreliable, so the algorithm must be robust. Finally, the network may be overloaded (too many tasks), but the resource allocation algorithm must continue to behave well (it cannot just fall down or start consuming all of the CPU time trying to find a perfect resource allocation when none exists).
More detail on the relationship between resource management and graph coloring is given in the section on the ANTS challenge problem. Here, it is taken for granted that the coloring algorithms must be real-time, scalable, decentralized and fault-tolerant, and must incur low communication costs. Moreover, they must behave well when they have too few colors to eliminate all conflicts.
The essential objective of a soft graph colorer is to find an assignment of colors to nodes that minimizes the number of color conflicts, that is, the number of edges that connect nodes of the same color. To date, we have mainly investigated coloring with a fixed number of colors; the case where the number of colors is variable and must be minimized will be investigated soon.
The algorithms that we have looked at are decentralized, anytime, local-repair algorithms:
There are two major design decisions to be made: what sort of local repairs are to be made and how to coordinate repairs.
At the other extreme, a total order could be imposed on the nodes to ensure that only one changes color at any given time. This obviously is not a scalable solution.
To avoid thrashing while still maintaining a high degree of parallelism, we use stochastic, synchronous updating: the nodes are all synchronized and at each time step, they determine an activation probability and may change their color only if a randomly generated probability falls below the activation level. The activation level can be adjusted to strike a balance between coordinating neighbors' updates and maintaining parallelism.
(An asynchronous variant of this approach also seems to be promising, based on very preliminary investigations.)
Most of our investigation has focused on a very simple method for determining the activation level: in the Fixed Probability method, each node sets a uniform, constant activation level. Values for this constant of around 0.5 seem to work well, although it is expected that the optimal value would depend on the complexity of the graph (e.g., the mean degree). Also, highly irregular graphs (in which the degree varies significantly from node to node) may benefit from non-uniform activation levels. These are topics of active investigation.
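The stochastic, synchronous updating described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the local repair used here (switch to the color least used by the neighbors, keeping the current color on ties) is an assumed choice, and the names `best_color` and `fixed_probability_step` are hypothetical.

```python
import random
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def count_conflicts(graph: Graph, color: Dict[int, int]) -> int:
    """Edges with same-colored endpoints; each undirected edge counted once."""
    return sum(1 for u in graph for v in graph[u] if u < v and color[u] == color[v])


def best_color(node: int, graph: Graph, color: Dict[int, int], num_colors: int) -> int:
    """Assumed local repair: pick the color least used by the node's neighbors.

    The current color is kept whenever it is already among the least used.
    """
    usage = [0] * num_colors
    for nb in graph[node]:
        usage[color[nb]] += 1
    least = min(usage)
    if usage[color[node]] == least:
        return color[node]
    return usage.index(least)


def fixed_probability_step(graph: Graph, color: Dict[int, int], num_colors: int,
                           activation: float, rng: random.Random) -> int:
    """Perform one synchronous step in place; return how many nodes changed color.

    Every node decides from the same snapshot of the previous step's colors,
    and repairs only if its random draw falls below the activation level.
    """
    new_color = dict(color)
    for node in graph:
        if rng.random() < activation:
            new_color[node] = best_color(node, graph, color, num_colors)
    changed = sum(1 for n in graph if new_color[n] != color[n])
    color.update(new_color)
    return changed


# Usage: a 12-node ring, all nodes initially the same color, 2 colors available.
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
coloring = {i: 0 for i in ring}
rng = random.Random(0)
print("conflicts before:", count_conflicts(ring, coloring))
for _ in range(50):
    fixed_probability_step(ring, coloring, 2, activation=0.5, rng=rng)
print("conflicts after:", count_conflicts(ring, coloring))
```

With an activation level of 1.0, two adjacent same-colored nodes both switch at once and remain in conflict, which is the thrashing that a moderate activation level is meant to avoid.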
Decentralized, anytime, local-repair graph coloring can be seen in action in this animation: AVI or MPEG format. Nodes and edges are displayed brightly only when they are conflicts, so the progress of the colorer can be gauged by how few nodes and edges remain easily visible.
To assess how well the soft graph colorer performs in practice, we developed a simulator and measured the number of color conflicts over time for various graphs. We also measured the number of color changes that occurred, since each color change incurs communication costs (for the node to inform its neighbors of the change). Details of the experiments are given below; here is a summary:
For moderate activation levels, the Fixed Probability algorithm converges to high-quality (though not necessarily optimal) colorings. Thrashing occurs at high activation levels. The long-term behavior represents what happens in a resource network when the coordination mechanism is given time to work without interference from changing tasks or resources; the results indicate that the Fixed Probability algorithm will produce high-quality solutions.
The diagram below shows typical behavior of the Fixed Probability algorithm for various activation levels. Each curve shows the number of conflicts against the number of synchronous steps for a fixed activation level. The number of conflicts is normalized by dividing by the total number of edges (to eliminate dependence of the metric on the size of the graph) and by multiplying by the number of colors (because coloring with more colors is inherently easier). Using the normalized scale, a random assignment of colors has an expected value of 100%.
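The 100% figure follows directly from the normalization. The symbols below are introduced here for illustration only: C is the number of conflicts, E the edge set, k the number of colors, and c(u) the color of node u. The derivation assumes each node's color is drawn independently and uniformly from the k colors, and that the graph has no self-loops.

```latex
\hat{C} = \frac{k\,C}{|E|}, \qquad
C = \sum_{(u,v)\in E} \mathbf{1}\bigl[c(u) = c(v)\bigr]

\Pr\bigl[c(u) = c(v)\bigr] = \frac{1}{k}
\;\Longrightarrow\;
\mathbb{E}[C] = \frac{|E|}{k}
\;\Longrightarrow\;
\mathbb{E}\bigl[\hat{C}\bigr] = \frac{k}{|E|}\cdot\frac{|E|}{k} = 1 = 100\%
```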
The effect of the activation level can be seen in this animation: AVI or MPEG format. Copies of the same graph are 4-colored using different activation probabilities; clockwise from top left:
Even for over-constrained colorings (where the number of colors is insufficient to color the graph with zero conflicts) the Fixed Probability colorer is well behaved and performs well for moderate activation levels.
For example, the two diagrams to the right show the mean number of conflicts and color changes over the first 50 steps versus the number of colors for various activation levels.¹ The means are measured because these represent how well the resources would be able to accomplish tasks while their schedules are being improved, and the cost incurred to achieve the improvement.
The graphs being colored can all be colored with zero conflicts using 4 colors, so these diagrams show the performance when the coloring is over-constrained (2 & 3 colors), critically constrained (4 colors) and under-constrained (5 & 6 colors).
In the conflicts diagram, for two colors, the theoretical lowest value that could be achieved for conflicts is 50%; the colorer achieves a value of around 58% after 50 steps and a mean of 65% (for moderate activation values); in the long term, it achieves a value of around 52%.
The communication diagram shows the mean fraction of nodes that change color (per step); this is proportional to the communication costs. For moderate activation levels, the rate of color change is low, less than 10%.
A straightforward analysis of the Fixed Probability colorer shows that its per-node, per-step computational, storage and communication costs are independent of the number of nodes. The costs do depend on the topology of the graph (roughly speaking, the mean degree) but we expect to deal only with graphs of (approximately) constant mean degree since these are the type that arise in large resource networks.
Experimentally, it was found that the performance of the Fixed Probability algorithm was independent of the size of the graph.
To assess the robustness of the Fixed Probability algorithm, the topology of the graph was adjusted while it was being colored by removing and subsequently reinserting nodes (and their incident edges). This approximately corresponds to resources failing and recovering in a resource network.
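A perturbation of this kind might be sketched as below. The helper names `remove_node` and `reinsert_node` and the set-based adjacency representation are assumptions for illustration. What color a recovered node takes is left to the caller, since the experiments' policy is not stated here.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def remove_node(graph: Graph, node: int) -> Set[int]:
    """Remove a node and its incident edges; return its former neighbors."""
    neighbors = graph.pop(node)
    for nb in neighbors:
        graph[nb].discard(node)
    return neighbors


def reinsert_node(graph: Graph, node: int, neighbors: Set[int]) -> None:
    """Restore a node, reconnecting it only to neighbors still in the graph."""
    present = {nb for nb in neighbors if nb in graph}
    graph[node] = present
    for nb in present:
        graph[nb].add(node)


# Usage: fail and recover the middle node of the path 0 - 1 - 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
saved = remove_node(path, 1)
print(path)   # {0: set(), 2: set()}
reinsert_node(path, 1, saved)
print(path)   # {0: {1}, 2: {1}, 1: {0, 2}}
```

In a simulation, calls like these would be interleaved with coloring steps, so that the colorer has to repair the conflicts introduced each time a node returns.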
It was found that low-level, continuous changes in the topology produced small increases in the number of conflicts. In contrast, high-level, intermittent changes caused transient spikes in the number of conflicts; if the changes were not too frequent, the coloring robustly adapted to the changes and continued to improve over time.
We also investigated the effect of communication noise/failure and found that low levels of noise/failure induced only small increases in the number of conflicts.
¹ The results for short-term improvement and robustness are for a variant of the Fixed Probability colorer that has better performance when under-constrained.