The Movies Example
The movies example is included in the example project Xtend Introductory Examples (src/examples6/Movies.xtend) and is about reading a file with data about movies and doing some analysis on it.
The Data
The movie database is a plain text file (data.csv) with data sets describing movies. Here is an example data set:
Naked Lunch  1991  6.9  16578  Biography  Comedy  Drama  Fantasy
The values are separated by two spaces. The columns are:
- title
- year
- rating
- numberOfVotes
- categories
Let us define a data type Movie representing a data set:
@Data class Movie {
  String title
  int year
  double rating
  long numberOfVotes
  Set<String> categories
}
A movie is a POJO with a strongly typed field for each column in the data sets. The @Data annotation turns the class into an immutable value class; that is, it gets
- a getter method for each field,
- a hashCode()/equals() implementation,
- an implementation of Object.toString(),
- a constructor accepting values for all fields in the declared order.
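The generated members listed above can be exercised with a minimal sketch like the following. The class MovieDemo is a hypothetical name introduced only for illustration, and the import assumes the @Data annotation from org.eclipse.xtend.lib.annotations.

```xtend
import java.util.Set
import org.eclipse.xtend.lib.annotations.Data

@Data class Movie {
  String title
  int year
  double rating
  long numberOfVotes
  Set<String> categories
}

// Hypothetical demo class, not part of the original example project.
class MovieDemo {
  def static void main(String[] args) {
    // The generated constructor takes the fields in declaration order.
    val a = new Movie('Naked Lunch', 1991, 6.9, 16578L,
      #{'Biography', 'Comedy', 'Drama', 'Fantasy'})
    val b = new Movie('Naked Lunch', 1991, 6.9, 16578L,
      #{'Biography', 'Comedy', 'Drama', 'Fantasy'})

    // Property syntax calls the generated getter getTitle().
    println(a.title)
    // In Xtend, == delegates to the generated equals().
    println(a == b)
    // Equal objects must have equal hash codes.
    println(a.hashCode == b.hashCode)
    // Uses the generated toString().
    println(a)
  }
}
```

Because the fields are final and no setters are generated, an instance cannot be modified after construction.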
Parsing The Data
Let us now add another class to the same file and initialize a field called movies with a list of movies. For the initialization we parse the text file and turn the data records into Movie objects:
import java.io.FileReader
import java.util.Set
import static extension com.google.common.io.CharStreams.*
class Movies {
  val movies = new FileReader('data.csv').readLines.map [ line |
    // columns are separated by two spaces
    val segments = line.split('  ').iterator
    return new Movie(
      segments.next,
      Integer.parseInt(segments.next),
      Double.parseDouble(segments.next),
      Long.parseLong(segments.next),
      segments.toSet
    )
  ]
}
A field’s type can be inferred from the expression on the right-hand side. That is called local type inference and is supported everywhere in Xtend. We want the field to be final, so we declare it as a value using the keyword val.
The initialization on the right-hand side first creates a new FileReader. Then the method readLines() is invoked on that instance. But if you have a look at FileReader you will not find such a method. In fact, readLines() is a static method from Google Guava’s CharStreams, which was imported as an extension. Extensions allow us to use this readable syntax.
import static extension com.google.common.io.CharStreams.*
CharStreams.readLines(Reader) returns a List<String> on which we call another extension method, map. This one is defined in the runtime library (ListExtensions.map(…)) and is automatically imported and therefore available on all lists. The map extension expects a function as a parameter. It invokes that function for each value in the list and returns another list containing the results of the function invocations. This mapping is performed lazily, so if you never access the values of the result list, the mapping function is never executed.
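The following sketch illustrates both points: the extension call is just another spelling of the static call, and map does no work until an element is read. ExtensionDemo is a hypothetical class, and a StringReader stands in for the file so the snippet does not depend on data.csv.

```xtend
import com.google.common.io.CharStreams
import java.io.StringReader
import static extension com.google.common.io.CharStreams.*

// Hypothetical demo class, not part of the original example project.
class ExtensionDemo {
  def static void main(String[] args) {
    // Extension syntax and plain static call are equivalent.
    val viaExtension = new StringReader('a\nbb').readLines
    val viaStatic = CharStreams.readLines(new StringReader('a\nbb'))
    println(viaExtension == viaStatic)

    // map returns a lazy view; the lambda has not run yet.
    val lengths = viaExtension.map [ line |
      println('mapping ' + line)
      line.length
    ]
    println('nothing mapped so far')

    // Reading an element triggers the lambda for that element only.
    println(lengths.get(0))
  }
}
```

Since the result is a view, the lambda runs again each time an element is read; copy the result (for example with toList) if the mapping is expensive and the values are needed more than once.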
Function objects are created using lambda expressions (the code in square brackets). Within the lambda we process a single line from the text file and turn it into a movie by splitting the string using two whitespace characters as the separator. On the result of the split operation, the method iterator() is invoked. As you might know, String.split(String) returns a string array (String[]), which Xtend auto-converts to a list when we call Iterable.iterator() on it.
val segments = line.split('  ').iterator
Now we use the iterator to create an instance of Movie from the values it yields. The data type conversion (e.g. String to int) is done by calling static methods from the wrapper types. The rest of the iterator is turned into a set of categories: the extension method IteratorExtensions.toSet(Iterator<T>) is invoked on the iterator to consume its remaining values.
return new Movie(
  segments.next,
  Integer.parseInt(segments.next),
  Double.parseDouble(segments.next),
  Long.parseLong(segments.next),
  segments.toSet
)
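To see the parsing steps in isolation, here is a small sketch that applies them to the sample record from the beginning of this example. ParseDemo is a hypothetical class, and the values are held in local variables instead of being passed to the Movie constructor.

```xtend
// Hypothetical demo class, not part of the original example project.
class ParseDemo {
  def static void main(String[] args) {
    // Fields are separated by two spaces; the title itself contains one.
    val line = 'Naked Lunch  1991  6.9  16578  Biography  Comedy  Drama  Fantasy'
    val segments = line.split('  ').iterator

    // Each call to next consumes one column, in order.
    val title = segments.next
    val year = Integer.parseInt(segments.next)
    val rating = Double.parseDouble(segments.next)
    val numberOfVotes = Long.parseLong(segments.next)
    // toSet drains whatever is left: the categories.
    val categories = segments.toSet

    println(title)            // Naked Lunch
    println(year)             // 1991
    println(rating)
    println(numberOfVotes)    // 16578
    println(categories.size)  // 4
  }
}
```

Note that the order of the next calls matters: the iterator is stateful, so swapping two of them would silently assign the wrong column or fail in one of the parse methods.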
Answering Some Questions
Now that we have parsed the text file into a List<Movie>, we are ready to execute some queries against it. We use JUnit to make the individual queries executable and to confirm their results.
Question 1: What Is The Number Of Action Movies?
@Test def numberOfActionMovies() {
  assertEquals(828,
    movies.filter[ categories.contains('Action') ].size)
}
First the movies are filtered. The lambda expression checks whether the current movie’s categories contain the entry 'Action'. Note that unlike the lambda we used to turn the lines in the file into movies, we have not declared a parameter name this time. We could have written
movies.filter[ movie | movie.categories.contains('Action') ].size
but since we left out the name and the vertical bar, the variable is automatically named it. it is an implicit variable. Its uses are similar to those of the implicit variable this. We can write either
movies.filter[ it.categories.contains('Action') ].size
or, even more compactly,
movies.filter[ categories.contains('Action') ].size
Finally we call size on the resulting iterable, which is an extension method, too. It is defined in the utility class IterableExtensions.
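The three spellings of the lambda can be compared side by side on a plain list of strings. This is a minimal sketch; ImplicitItDemo and the word list are made up for illustration.

```xtend
// Hypothetical demo class, not part of the original example project.
class ImplicitItDemo {
  def static void main(String[] args) {
    val words = #['action', 'drama', 'adventure']

    // Explicit parameter name.
    val explicit = words.filter[ word | word.startsWith('a') ].size
    // Parameter omitted: it is called 'it'.
    val withIt = words.filter[ it.startsWith('a') ].size
    // 'it' can be left out, just like 'this'.
    val implicit = words.filter[ startsWith('a') ].size

    println(explicit)  // 2
    println(withIt)    // 2
    println(implicit)  // 2
  }
}
```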
Question 2: What Is The Year The Best Movie From The 80’s Was Released?
@Test def void yearOfBestMovieFrom80s() {
  assertEquals(1989,
    movies.filter[ (1980..1989).contains(year) ].sortBy[ rating ].last.year)
}
Here we filter for all movies whose year is included in the range from 1980 to 1989 (the 80’s). The .. operator is again an extension defined in IntegerExtensions and returns an instance of IntegerRange. Operator overloading is explained in its own section of the documentation.
The resulting iterable is sorted (IterableExtensions.sortBy) by the rating of the movies. Since it is sorted in ascending order, we take the last movie from the list and return its year.
We could have sorted in descending order and taken the head of the list as well:
movies.filter[ (1980..1989).contains(year) ].sortBy[ -rating ].head.year
Another possible solution would be to reverse the order of the sorted list:
movies.filter[ (1980..1989).contains(year) ].sortBy[ rating ].reverseView.head.year
Note that first sorting and then taking the last or first element is slightly more expensive than needed. We could have used the method reduce instead to find the best movie, which would be more efficient. Maybe you want to try it on your own?
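One possible answer is sketched below. To keep it self-contained it uses a cut-down, hypothetical RatedMovie class and three made-up records instead of the real Movie class and data.csv; only the reduce call is the point.

```xtend
import org.eclipse.xtend.lib.annotations.Data

// Hypothetical, simplified stand-in for Movie.
@Data class RatedMovie {
  String title
  int year
  double rating
}

// Hypothetical demo class with made-up sample data.
class BestMovieDemo {
  def static void main(String[] args) {
    val movies = #[
      new RatedMovie('A', 1982, 7.1),
      new RatedMovie('B', 1989, 8.4),
      new RatedMovie('C', 1995, 9.0)
    ]
    val eighties = movies.filter[ (1980..1989).contains(year) ]
    // Single pass: keep whichever of the two movies has the higher rating.
    val best = eighties.reduce[ a, b | if (a.rating >= b.rating) a else b ]
    println(best.year)  // 1989
  }
}
```

This visits each movie once instead of sorting the whole list. Two caveats: reduce returns null for an empty iterable, and when several movies share the top rating this version keeps the first one, whereas sortBy[ rating ].last yields the last one.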
The calls to movie.year as well as movie.categories in the previous example in fact access the corresponding getter methods.
Question 3: What Is The Sum Of All Votes Of The Top Two Movies?
@Test def void sumOfVotesOfTop2() {
  val long sum = movies.sortBy[ -rating ].take(2).map[ numberOfVotes ].reduce[ a, b | a + b ]
  assertEquals(47_229L, sum)
}
First the movies are sorted by rating, then we take the best two. Next the movies are mapped to their numberOfVotes using the map function. Now we have an Iterable<Long>, which can be reduced to a single Long by adding the values.
You could also use reduce instead of map and reduce. Do you know how?
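One way to approach this is sketched below. It is not necessarily the answer the exercise has in mind: reduce requires the result to have the same type as the elements, so this sketch swaps in its close relative fold from IterableExtensions, which takes an initial value of the result type. VotedMovie and the sample records are hypothetical.

```xtend
import org.eclipse.xtend.lib.annotations.Data

// Hypothetical, simplified stand-in for Movie.
@Data class VotedMovie {
  String title
  double rating
  long numberOfVotes
}

// Hypothetical demo class with made-up sample data.
class SumOfVotesDemo {
  def static void main(String[] args) {
    val movies = #[
      new VotedMovie('A', 7.1, 100L),
      new VotedMovie('B', 9.0, 20L),
      new VotedMovie('C', 8.4, 3L)
    ]
    // Start at 0L and add each movie's votes to the running total.
    val long sum = movies.sortBy[ -rating ].take(2)
      .fold(0L) [ total, movie | total + movie.numberOfVotes ]
    println(sum)  // 23
  }
}
```

Unlike reduce, fold also gives a sensible result (the seed, here 0) when the iterable is empty.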