## Design Structure Matrix (DSM) clustering

One way to find structure in a graph is by seeing it as a collection of *clustered nodes*, where two nodes within a single cluster are more connected to each other than two nodes in different clusters. In addition, in a lot of cases, there is a set of nodes that connect to many other nodes in the graph (or in terms of clusters, one cluster connects with many other clusters). Such a set of nodes is named a *bus*.

The DSM clustering tool aims to heuristically find such a bus and clusters. The implemented algorithm is based on [Wilschut et al. (2017)].

### Starting the clustering tool

The clustering tool can be started in the following ways:

In Eclipse, right click a

`.dsm`

file in the*Project Explorer*tab or*Package Explorer*tab and choose**Cluster a DSM…**.Right click an open text editor for a

`.dsm`

file and choose**Cluster a DSM…**.Use the

`dsmclustering`

tool in a ToolDef script.Use the

`dsmclustering`

command line tool.

### Options

Besides the general application options, this application has the following options:

*Input file path*: The absolute or relative file system path to the input DSM file.*Output file path*: The absolute or relative file system path for writing the generated DSM output file. By default, the output file path is the same as the input file path, but with the`.dsm`

extension removed (if it exists), and the`_output.dsm`

extension added. By setting this option, the default is overridden by the given value.*Evaporation factor*: Factor that influences when a node is considered to be part of a cluster. Higher values leads to higher connection requirements between nodes, which leads to fewer nodes in a single cluster and thus more (smaller) clusters. Between`1.0`

and`10.0`

, default value is`2.0`

.*Inflation factor*: Factor that influences how fast large values increase and small values decrease, where the small values are eventually eliminated. Higher values of the factor speed up the process. Between`1.0`

and`4.0`

, default value is`2.0`

.*Bus detection algorithm*: The bus detection algorithm to apply. By default, no bus is detected. See Bus detection algorithms below for more details.*Bus factor*: Factor that influences when a node is considered to be part of the bus. The actual interpretation of this factor depends on the chosen bus detection algorithm. Default value is`2`

. For more information, see Bus detection algorithms below for more details.*Convergence limit*: Allowed remaining numerical error before considering termination of the algorithm. Higher values end the computation sooner at the cost of less precision in the results. Values are between`0`

and`1`

(where`0`

is not achievable, and`1`

is not precise enough). Default value is`1e-4`

.*Step count*: Number of additional nodes to visit each iteration. Between`1`

and`4`

, default value is`2`

. Changing this values is rarely needed.*Output groups*: Whether to output the node numbers in the bus and each cluster. Default is`true`

.

### DSM file format

A Design Structure Matrix (DSM) file is an RFC-4180 compliant CSV file that contains an *N* times *N* matrix of values. Each line of the file contains a row of the matrix. Within the line, values are separated by commas, and leading and trailing white space around each value is discarded. Both integer and real values are supported, such as `0`

, `1`

, `1.5`

, `1e3`

and `1.5e-4`

. Negative values, `NaN`

and infinite values are not allowed. Before the first number at each row there should be a label indicating the name of the element of that row. Optionally, above the first line of data there may be a line of column labels as well. If column labels are present, the top-left cells of the matrix must be empty, and the row and column labels must match. Rows that are shorter than other rows are automatically extended with additional zero values. Zero values may be omitted. Labels that include a comma or space should be surrounded by double quotes, like `"Some text, and more text"`

.

The following example shows a DSM for a two by two adjacency matrix of elements `A`

and `B`

, with column labels:

```
, A , B
A, 1 , 0
B, 0.5, 0.1
```

### Computing a clustered DSM

Since the DSM clustering tool is based on heuristics, and typically much of the input values in the graph are not hard numbers, there are often several valid answers where some of the answers match your expectations better.

It is therefore recommended to experiment with the various factors somewhat to see what other answers are possible, and whether they make sense.

### Bus detection algorithms

Bus detection uses connectivity of the nodes, which is the sum of their in and out degrees.

Currently, the following bus detection algorithm options are available:

*Fix-point algorithm*, named`fix-point`

in the tool. This is the fix-point algorithm as introduced in [Wilschut et al. (2017)]. The algorithm repeatedly adds new nodes to the bus with a connectivity higher than the median connectivity of non-bus nodes multiplied by*bus factor*. The final set of bus nodes is obtained when such new nodes no longer exist. The value of*bus factor*should be between`1`

and`4`

(boundaries included).*Top-k algorithm*, named`top-k`

in the tool. This bus detection algorithm selects the nodes with the highest connectivity, where the number of nodes to select as bus nodes is*bus factor*. The value of*bus factor*should be an integer between`0`

and the number of elements in the DSM. Real numbers are truncated.*No bus*, named`no-bus`

in the tool. No bus detection mechanism is applied, so no bus elements are detected.

### References

[Wilschut et al. (2017)] T. Wilschut, L.F.P. Etman, J.E. Rooda and I.J.B.F. Adan, "Multilevel Flow-Based Markov Clustering for Design Structure Matrices", Journal of Mechanical Design, volume 139, issue 12, 2017, doi:10.1115/1.4037626