Scripting Excel Spreadsheets using Epsilon¶
Spreadsheets are commonly used in software and systems engineering processes to capture and analyse structured data, and can be sources of valuable information for model-based software engineering activities.
Epsilon provides built-in support for querying and transforming Excel spreadsheets through an Apache POI-based EMC driver. This article discusses how you can configure an Epsilon program to query and modify an Excel spreadsheet, and the video below demonstrates the driver in action.
Citing the Excel EMC driver in a publication?
If you are referring to Epsilon's Excel EMC driver in a publication, please cite this paper instead of the website URL.
Regression in Epsilon 2.2
Due to a regression, the Excel driver is broken in Epsilon 2.2. The driver works well in previous versions (e.g. 2.1) as well as in 2.3. Support for column datatypes and for writing to Excel spreadsheets has improved substantially in 2.3.
Worksheets, Columns and Rows¶
Essentially, in the Excel driver, by default, worksheets are treated as model element types (e.g. Student
, Staff
, Module
and Mark
in the spreadsheet below), columns as their properties (e.g. Mark
has student
, module
and mark
properties), and rows are treated as model elements (i.e. there are two students, two members of staff, three modules and two marks in the spreadsheet below).
A | B | C | D | E | F | |
1 | id | firstname | lastname | age | supervisor | modules |
2 | jt501 | Joe | Thompson | 23 | mt506 | MSD,RQE |
3 | js502 | Jane | Smith | 22 | mt506 | MSD,HCI |
A | B | C | D | E | F | |
1 | id | firstname | lastname | teaches | ||
2 | mt506 | Matthew | Thomas | MSD,RQE | ||
3 | dj503 | Daniel | Jackson | HCI |
A | B | C | D | E | F | |
1 | id | title | term | |||
2 | MSD | Modelling and System Design | Autumn | |||
3 | HCI | Human Computer Interaction | Spring | |||
4 | RQE | Requirements Engineering | Spring |
A | B | C | D | E | F | |
1 | student | module | mark | |||
2 | jt501 | MSD | 62 | |||
3 | js502 | HCI | 74 |
References and Column Types¶
The driver supports specifying additional configuration information (e.g. column data types, references between columns) about a spreadsheet in the form of an external XML document, that can be attached to the spreadsheet in Epsilon's run configuration dialog.
For our example spreadsheet, above, the configuration file below specifies the types of the age
and mark
columns of the spreadsheet, the multiplicity of the teaches
column, as well as references between the Student.supervisor
and Staff.id
, and the Staff.teaches
and Module.id
columns.
<spreadsheet>
<worksheet name="Student">
<column name="age" datatype="integer"/>
<column name="modules" many="true"/>
</worksheet>
<worksheet name="Mark">
<column name="mark" datatype="integer"/>
</worksheet>
<worksheet name="Staff">
<column name="teaches" many="true" delimiter=","/>
</worksheet>
<reference source="Student->supervisor"
target="Staff->id"/>
<reference source="Student->modules"
target="Module->id"/>
<reference source="Staff->teaches"
target="Module->id"/>
<reference source="Mark->module"
target="Module->id"/>
<reference source="Mark->student"
target="Student->id"/>
</spreadsheet>
The format of the XML configuration document is as follows:
Worksheet¶
Each worksheet can have an optional name (if a name is not provided, the name of the worksheet on the spreadsheet is used) and acts as a container for column
elements.
Column¶
Each column
needs to specify at least either its zero-based index
in the context of the worksheet it belongs to, or its name
(if a name is not provided, the cell value in the first row is used, as discussed above). Additionally, a column
can have:
alias
: an alternative name for the column.datatype
: the type of the values in the column. As of Epsilon 2.5.0, the valid options (ignoring case) arestring
,integer
,boolean
,double
, orfloat
. The default datatype isstring
.many
: it can betrue
orfalse
. Iftrue
, then its cells will be considered to contain unbounded collections, separated by thedelimeter
(see below). Columns are single-valued by default.delimeter
: the delimiter used to separate values ifmany
istrue
. The default delimeter is,
.
As of 2.6.0, numeric values are converted to various datatypes as follows:
string
: the textual representation of the cell's value is used, applying its declared format if possible. If the cell format is not supported, a format similar to the Excel General format will be used:#
for whole numbers and#.##########
for decimal numbers (see the Apache POI DateFormatter documentation).integer
,double
, andfloat
: the cell value is computed (using the formula if one is present), and then cast to the appropriate datatype.boolean
: zero values will be treated asfalse
, and non-zero values will be treated astrue
.
Issues with numeric cell values in Epsilon 2.5.0 and earlier
In Epsilon 2.5.0 and earlier releases, the Excel driver only supported the integer
, double
, and float
datatypes for cells with numeric values. string
and boolean
datatypes were ignored, and treated as double
. See issue #89.
Reference¶
In a configuration document we can also specify ID-based references to capture relationships between columns belonging to potentially different worksheets. Each reference has a source
and a target
column, an optional name
(if a name is not specified, the name of the source column is used to navigate the reference), a cardinality (many
attribute), and specifies whether updates to cells of the target column should be propagated automatically (cascadeUpdates
attribute) to the respective cells in the source column to preserve referential integrity.
Querying and Modifying Spreadsheets¶
Having specified the configuration document above, we can now query the spreadsheet with EOL as follows.
// Returns all students supervised by Matthew Thomas
Student.all.select(s|s.supervisor?.lastname = "Thomas");
// Returns the modules taught by Daniel Jackson
Module.all.select(m|
Staff.all.exists(s|
s.firstname="Daniel" and s.teaches.includes(m)));
Creating Rows¶
As discussed above, worksheets are treated as types and rows as their instances. As such, to create a new row in the Student
worksheet, EOL's new
operation can be used.
var student : new Student;
Deleting Rows¶
To delete a row from a worksheet, EOL's delete
operator can be used. When a row is deleted, all the rows that contain cells referring to it through cascade-update references also need to be recursively deleted.
var student = Student.all.selectOne(s|s.id = "js502");
// deletes row 2 of the Student worksheet
// also deletes row 3 of the Mark worksheet
delete student;
Modifying Cell Values¶
If a cell is single-valued, a type-conforming assignment can be used to edit its value. For example, the following listing demonstrates modifying the age and the supervisor of a particular student.
var student : Student = ...;
var supervisor : Staff = ...;
student.age = 24;
student.supervisor = supervisor;
If on the other hand the cell is multi-valued, then its values should be handled as a collection. Adding/removing values from property collections has no effect on the spreadsheet; you need to re-assign values instead.
// Moves a module between two members of staff
var from : Staff = ...;
var to : Staff = ...;
var module : Module = ...;
// Neither of these will work
// from.teaches.remove(module);
// to.teaches.add(module);
// ... but these will
from.teaches = from.teaches.excluding(module);
to.teaches = to.teaches.including(module);
Updating the value of a cell can have side effects to other cells that are linked to it through cascade-update references to preserve referential integrity. For example, updating the value of cell A3 in the Module
worksheet, should trigger appropriate updates in cells D2 and F2 of the Staff
and Student
worksheets respectively.
Validating and Transforming Spreadsheets¶
Of course, we can also validate spreadsheets using EVL, transform them into other models using ETL, and into text using EGL, generate graphical views using Picto etc.
context Staff {
constraint NotOverloaded {
check: self.teaches.size() <= 4
message: "Member of staff" + self.firstname +
" " + self.lastname + " is overloaded"
}
}
Creating Spreadsheets¶
To create a spreadsheet from scratch (e.g. when it is produced by an ETL transformation), we also need to specify an index
for each column in the XML mapping file. Below is an EOL program that creates the spreadsheet above from scratch, and the mapping file for it. The complete example is in Epsilon's Git repo.
// Create the modules
var MSD = new Module(id="MSD",
title="Modelling and System Design", term="Autumn");
var HCI = new Module(id="HCI",
title="Human Computer Interaction", term="Spring");
var RQE = new Module(id="RQE",
title="Requirements Engineering", term="Spring");
// Create the staff
var matthew = new Staff(id="mt506", firstname="Matthew",
lastname="Thomas", teaches=Sequence{MSD, RQE});
var matthew = new Staff(id="dj503", firstname="Daniel",
lastname="Jackson", teaches=Sequence{HCI});
// Create the students
var joe = new Student(id="jt501", firstname="Joe",
lastname="Thompson", age="23", supervisor=matthew, modules=Sequence{MSD, RQE});
var jane = new Student(id="js502", firstname="Jane",
lastname="Smith", age="22", supervisor=matthew, modules=Sequence{MSD, HCI});
// Create the marks
new Mark(student=joe, module=MSD, mark=62);
new Mark(student=jane, module=HCI, mark=74);
<spreadsheet>
<worksheet name="Student">
<column index="0" name="id"/>
<column index="1" name="firstname"/>
<column index="2" name="lastname"/>
<column index="3" name="age" datatype="integer"/>
<column index="4" name="supervisor"/>
<column index="5" name="modules" many="true"/>
</worksheet>
<worksheet name="Mark">
<column index="0" name="student"/>
<column index="1" name="module"/>
<column index="2" name="mark" datatype="integer"/>
</worksheet>
<worksheet name="Staff">
<column index="0" name="id"/>
<column index="1" name="firstname"/>
<column index="2" name="lastname"/>
<column index="3" name="teaches" many="true" delimiter=","/>
</worksheet>
<worksheet name="Module">
<column index="0" name="id"/>
<column index="1" name="title"/>
<column index="2" name="term"/>
</worksheet>
<reference source="Student->supervisor"
target="Staff->id"/>
<reference source="Student->modules"
target="Module->id"/>
<reference source="Staff->teaches"
target="Module->id"/>
<reference source="Mark->module"
target="Module->id"/>
<reference source="Mark->student"
target="Student->id"/>
</spreadsheet>
Working with Formulas¶
To set the value of a cell to a formula, start its value with =
as shown below. The complete example is in Epsilon's Git repo.
var calc : new Calc;
calc.a = 1;
calc.b = 2;
calc.sum = "=A2+B2";
calc.sum.println(); // Prints 3
<spreadsheet>
<worksheet name="Calc">
<column index="0" name="a" datatype="integer"/>
<column index="1" name="b" datatype="integer"/>
<column index="2" name="sum" datatype="integer"/>
</worksheet>
</spreadsheet>
Reflective Access¶
To iterate over all the worksheets, columns and rows of a speadsheet without referring to them by name, we can use the following statements (assuming that our Excel spreadsheet is named M
in the run configuration). Additional methods of interest for this mode of access can be found in the Javadoc of the underlying ExcelModel and SpreadsheetModel classes.
// Iterate over all worksheets
for (w in M.worksheets) {
w.name.println();
// Iterate over all columns
// of the worksheet
for (c in w.header.columns) {
c.name.println("\t");
}
// Iterate over all rows
// of the worksheet
for (r in w.rows) {
r.println("\t");
}
}
Resources¶
- This article shows how to use Excel spreadsheets in ANT/Gradle/Maven builds.