Design Structure Matrix (DSM) clustering
One way to find structure in a graph is by seeing it as a collection of clustered nodes, where two nodes within a single cluster are more connected to each other than two nodes in different clusters. In addition, in a lot of cases, there is a set of nodes that connect to many other nodes in the graph (or in terms of clusters, one cluster connects with many other clusters). Such a set of nodes is named a bus.
The DSM clustering tool aims to heuristically find such a bus and clusters. The implemented algorithm is based on [Wilschut et al. (2017)].
Starting the clustering tool
The clustering tool can be started in the following ways:
-
In Eclipse, right click a
.dsm
file in the Project Explorer tab or Package Explorer tab and choose Cluster a DSM…. -
Right click an open text editor for a
.dsm
file and choose Cluster a DSM…. -
Use the
dsmclustering
tool in a ToolDef script. -
Use the
dsmclustering
command line tool.
Options
Besides the general application options, this application has the following options:
-
Input file path: The absolute or relative file system path to the input DSM file.
-
Output file path: The absolute or relative file system path for writing the generated DSM output file. By default, the output file path is the same as the input file path, but with the
.dsm
extension removed (if it exists), and the_output.dsm
extension added. By setting this option, the default is overridden by the given value. -
Evaporation factor: Factor that influences when a node is considered to be part of a cluster. Higher values leads to higher connection requirements between nodes, which leads to fewer nodes in a single cluster and thus more (smaller) clusters. Between
1.0
and10.0
, default value is2.0
. -
Inflation factor: Factor that influences how fast large values increase and small values decrease, where the small values are eventually eliminated. Higher values of the factor speed up the process. Between
1.0
and4.0
, default value is2.0
. -
Bus detection algorithm: The bus detection algorithm to apply. By default, no bus is detected. See Bus detection algorithms below for more details.
-
Bus factor: Factor that influences when a node is considered to be part of the bus. The actual interpretation of this factor depends on the chosen bus detection algorithm. Default value is
2
. For more information, see Bus detection algorithms below for more details. -
Convergence limit: Allowed remaining numerical error before considering termination of the algorithm. Higher values end the computation sooner at the cost of less precision in the results. Values are between
0
and1
(where0
is not achievable, and1
is not precise enough). Default value is1e-4
. -
Step count: Number of additional nodes to visit each iteration. Between
1
and4
, default value is2
. Changing this values is rarely needed. -
Output groups: Whether to output the node numbers in the bus and each cluster. Default is
true
.
DSM file format
A Design Structure Matrix (DSM) file is an RFC-4180 compliant CSV file that contains an N times N matrix of values. Each line of the file contains a row of the matrix. Within the line, values are separated by commas, and leading and trailing white space around each value is discarded. Both integer and real values are supported, such as 0
, 1
, 1.5
, 1e3
and 1.5e-4
. Negative values, NaN
and infinite values are not allowed. Before the first number at each row there should be a label indicating the name of the element of that row. Optionally, above the first line of data there may be a line of column labels as well. If column labels are present, the top-left cells of the matrix must be empty, and the row and column labels must match. Rows that are shorter than other rows are automatically extended with additional zero values. Zero values may be omitted. Labels that include a comma or space should be surrounded by double quotes, like "Some text, and more text"
.
The following example shows a DSM for a two by two adjacency matrix of elements A
and B
, with column labels:
, A , B
A, 1 , 0
B, 0.5, 0.1
Computing a clustered DSM
Since the DSM clustering tool is based on heuristics, and typically much of the input values in the graph are not hard numbers, there are often several valid answers where some of the answers match your expectations better.
It is therefore recommended to experiment with the various factors somewhat to see what other answers are possible, and whether they make sense.
Bus detection algorithms
Bus detection uses connectivity of the nodes, which is the sum of their in and out degrees.
Currently, the following bus detection algorithm options are available:
-
Fix-point algorithm, named
fix-point
in the tool. This is the fix-point algorithm as introduced in [Wilschut et al. (2017)]. The algorithm repeatedly adds new nodes to the bus with a connectivity higher than the median connectivity of non-bus nodes multiplied by bus factor. The final set of bus nodes is obtained when such new nodes no longer exist. The value of bus factor should be between1
and4
(boundaries included). -
Top-k algorithm, named
top-k
in the tool. This bus detection algorithm selects the nodes with the highest connectivity, where the number of nodes to select as bus nodes is bus factor. The value of bus factor should be an integer between0
and the number of elements in the DSM. Real numbers are truncated. -
No bus, named
no-bus
in the tool. No bus detection mechanism is applied, so no bus elements are detected.
References
-
[Wilschut et al. (2017)] T. Wilschut, L.F.P. Etman, J.E. Rooda and I.J.B.F. Adan, "Multilevel Flow-Based Markov Clustering for Design Structure Matrices", Journal of Mechanical Design, volume 139, issue 12, 2017, doi:10.1115/1.4037626