Proposal

Eclipse Update Site Proposal

Summary
The current packaging of the Eclipse SDK as a set of large zips (one per platform) is a convenient one stop shop for getting up and running with the SDK. Unfortunately, this approach does not scale well. As more and more Eclipse projects (e.g., EMF, GEF, ...) come into common use, there is increasing interest in including these in pre-packaged zips. Similarly, as Eclipse is used in more varied scenarios, so increases the demand for different packagings (smaller and larger). It is not feasible for eclipse.org to create, manage and distribute pre-packaged configurations to meet all the demands. This proposal details a structure which enables Eclipse component providers to contribute their features and plug-ins and for Eclipse consumers to efficiently select, download and install their components of interest.

Last Modified: November 22, 2004

Problem Definition

The base Eclipse function is currently distributed as roughly the following zip files (as of 3.1M3)

RCP Binary (one per target OS/WS) (8 x ~5MB)
RCP SDK (one per target OS/WS) (8 x ~18MB)
Platform Binary (one per target OS/WS) (8 x ~25MB)
Platform SDK (one per target OS/WS) (8 x ~56MB)
JDT Binary (OS/WS independent) (~15MB)
JDT SDK (OS/WS independent) (~27MB)
PDE Binary (OS/WS independent) (~4MB)
PDE SDK (OS/WS independent) (~5MB)
Full Eclipse SDK zips including source and all doc (one per target OS/WS) (8 x ~88MB)
assorted other zips (e.g., examples, webdav, ...)

Eclipse tools and technology project downloads are available separately and come in a variety of flavours, sizes and organizations. People wanting the base SDK are quite comfortable with the current situation. They need only download the single zip and go. If they need to run on two different platforms they download the zip for that platform and ignore the fact that 95% of the second download is the same as the first.

Users looking to combine the base Eclipse with additional plug-ins/features are somewhat less happy. They have to search over several web sites, identifying downloads (zips) of interest and correlating versions manually. In general, the suppliers of these additional zips do not make it easy for users to understand their offerings.

The main issues with the current setup are:

Does not promote the easy provision, acquisition and use of plug-ins from a variety of sources. The basic premise of Eclipse is that plug-in consumers can gather a set of interesting plug-ins, put them together in an Eclipse platform configuration and enjoy an integrated seemless environment. Currently plug-in producers provide their plug-ins in various forms and the lack of any managed acquisition mechanism forces potential users to grovel around for plug-ins and then hope that they got the right thing.
Does not scale as the scope of Eclipse increases. The current approach addresses the provision/acquisition problem by packaging collections of interesting plug-ins together in a single zip. This works for Eclipse since it is at the bottom of the plug-in stack. However, others providers are not so lucky and are challenged with questions like "how many of my prerequisites should I include in my download?". Further, as more and more plug-ins are added to the Eclipse world, the size of the complete zip increases. A zip of a reasonable set of SDK drops from projects on eclipse.org could easily reach 150MB (i.e., double the Eclipse SDK size). Following the current model, there would be one of these uber-zips per target OS/WS.
Does not scale as the Eclipse use scenarios increase. As the range of people using Eclipse broadens, there will be increasing demand for alternative collections of plug-ins. These will contain either more or less than the Eclipse SDK. While we can take a stab and produce some common packagings, we are are not likely to satisfy all/most people.
Makes poor use of existing disk and bandwidth resources. As mentioned earlier, something like 95% of each OS/WS specific SDK zip is identical content. Further, from build to build, not all plug-ins change. The user doc for example tends to change slowly. Near the end of a milestone or release cycle we frequently rebuild to fix isolated problems. Rather than rebuilding just the plug-ins in question and providing these for download, we rebuild the whole stack and repackage an entire set of full zips. Subsequently, interested developers are forced to redownload the bulk of what they already have. This wastes diskspace and bandwidth for both eclipse.org and our users.
Finally, it is somewhat ironic that the update/install mechanism has been designed in part to address exactly these kinds of concerns yet it is one of the few components that does not see regular use by the Eclipse team itself. As a result, the update/install team misses out on the regular feedback enjoyed by the other Eclipse teams. It is seemingly hard to justify putting update/install forward as a useful technology while not using it ourselves.

Scenarios

Scenario 1: minimal install

User starts with a minimal Eclipse base install. This contains just the runtime, some UI elements and an Update manager UI. Based on this they want to build an install containing additional elements from various sources.

download and unzip the minimal install form eclipse.org or a mirror
start the new install
the user is presented with an update manager UI
user discovers or is presented with update sites
user selects features to install (the candidates presented are appropriate to the current environment e.g., OS/WS/etc as well as already installed components)
the selected features are downloaded
the features are installed into the current configuration
the appropriate product/application (one that was just downloaded) is started and the update manager GUI goes away

Variations

After following the above scenario, the user can manage their configuration either using the steps outlined in Scenario 2 or by restarting their configuration with an indication that the update manager UI should be started. This puts them back at step 3 but with all their previously installed plug-ins still installed.

Scenario 2: building on a product install

In this scenario, the user has an Eclipse product installed (possibly just the Eclipse SDK from a downloaded zip file or the configuration from the steps in scenario 1) and wants to add more function. This pretty much follows the normal update/install scenarios.

user installs an eclipse based product (e.g., Eclipse SDK)
at some point, while running the product, the user invokes the update manager UI
user discovers or is presented with update sites
user selects features to install (the candidates presented are appropriate to the current environment e.g., OS/WS/etc as well as already installed components)
the selected features are downloaded
the features are installed into the current configuration
the installed function is now available to the user

Scenario 3: Web install

Note: This is an advanced scenario that may not be supported in the near term.

This scenario is very much like scenario 1. The main difference is in how the initial install is obtained. Rather than manually downloading and unzipping the minimal install, JNLP (Webstart) is used to fetch and run the minimal install.

user browses some website and clicks on a JNLP link
the appropriate jars are downloaded and "installed" using the JNLP mechanism
the minimal Eclipse install is started
go to step 3 of scenario 1

Problem Solution

We propose to use the Eclipse update technology to address the described problems and implement the listed scenarios. Note that this new structure is a complement to the existing large zip packagings of Eclipse. We are not proposing to stop creating the familiar SDK etc zips. (we leave it to someone more brave to propose that!)

Update site

eclipse.org (and others) will provide update sites sporting all the relevant features and plug-ins. Collections of projects can cooperate and share an update site or each supply/support their own update site. Managing your own eliminates the need for some sort of shared site update mechanism. However, the user must be able to navigate these individual sites seamlessly (see the section on Update manager).

There will be one update site per release family. A release family is a set of releases which form a substantial and separable effort. For example, Eclipse 2.0, 2.1, 3.0 and 3.1 are all different release families. The life of release family continues after its initial release (e.g., 3.0) to include maintenance releases (e.g., 3.0.1, 3.0.2, ...). Isolating release families to their own update sites helps maintain a separation between development going on for the next release family and maintenance work on the previous family.

Versioning

Update sites are not possible unless we use the Eclipse version numbering scheme effectively. The main challenge is for plug-ins to have version numbers that accurately relate to their content. The essential element here is to ensure that if two plug-ins have the same id and version, their content is identical. The easiest way to do this is to use the version qualifier (i.e., the forth version element). For normal (not nightly) builds, the qualifier should be the version tag from the map file used for the build which generated them (e.g., 3.1.0.v20041122). The map file defines the content going into the build. If that does not change, then we can assume that the content of the output plug-in will not change. In this way, identical plug-ins will not be duplicated on the update site and consumers will not have to download them if they already have them. For more detail on plugin versioning, see the Plug-in Versioning Proposal.

Features on the other hand are collections of plug-ins. They are also relatively small. Since the content of a feature is indirectly defined by the content of the plug-ins included in the feature, it is hard to detect change. The safe bet here is to use the build stamp as the qualifier for feature version (e.g., 3.0.0.200402131200). This has the benificial effect of clearly identifying the features for a particular build. Further, since the entire feature set is regenerated, the collection of features with the same version number completely defines the entire build output.

Nightly builds do not go into an update site. There is no easy way to correlate the content in a nightly build (out of HEAD) with that of an integration build (using versioned resources). As a result, all plug-ins would be new and the economies of the proposed solution would not be realized. Further, nightly builds are primarily there for the individual teams to ensure they have not adversely affected other components etc. That is, they do not figure prominently in the scenarios driving this proposal.

Plug-in directories currently reflect the id and version of the plug-in. With the addition of the map stamp, the directory names will grow by typically ~10 chars). It is unlikely but this may push some path length limits. The use of the map stamp in a file/dir name may also place additional constraints the set of chars that can be used in a map stamp (i.e, CVS tag).

Sidenote on version qualifiers: The qualifier (fourth segment) is not interpreted. Qualifiers are compared using lexographical sorting. As such, care should be taken to ensure that the qualifier for subsequent builds monotonically increase. For example, if a build type identifier is to be included in the qualifier, it should be appended and the build date used as the main body (e.g., 20040217-I and 20040213-M7). See the complementary proposal on version numbering for more information.

Update manager

The main challenges for the update manager are in making all of this scale and usable.

the UI must support a structured (hierarchical?) view of the available features. This view (and in fact the underlying feature structure) must be able to span update sites/servers.
users must be able to select some set of root features and versions and then click "all required features". Again, this should span update sites/servers
using these capabilities, user can then create their own uber-features/sites which simply reference features/sites. This becomes in effect a favourites list. They select their favourite root, click "all required" and get everything they need.
the update manager mechanism may cache downloaded elements (features and plug-ins) for use in future install operations. That is, if you already have the plug-in/feature, you don't need to download it again. The cache location should be (optionally) set by the user.
ideally the update/install scenarios described will be possible when running offline providing that all required elements have been previously cached. That is, if the user has already installed the features for a build, she should be able to create a new install without touching a server.
as an advanced (i.e., longer term) item, update manager needs a way of managing its cache/store -- users need a way of purging old elements. Given a set of 1000 plugin jars of various versions, determining which to keep and which to delete is not possible without working from higher level feature groupings. In the near term, the user can delete the whole cache.

Interesting side note: using the runtime's new ability to run directly out of jars, it is possible to have many different eclipse configurations sharing the same actual plug-in files from the update manager's cache/store. This will break some assumptions about plug-ins being together in a plug-ins directory etc (the update reconciler may get upset) and it introduces some challenges for cache clean up but it is an interesting possibility.

PDE Build

PDE Build already updates feature.xml files with the version numbers of the plug-ins involved in the build. To support the version numbering scheme outlined above, PDE Build will be updated to

modify the plug-in version number in each plugin.xml (i.e., add the map stamp). The version number of a plugin.xml is updated only if it contains <_map stamp_> as the version qualifier. This allows plug-ins to opt out of this update mechanism.
modify the version number in the feature.xmls (i.e., add the build stamp). The version number of a feature.xml is updated only if it contains <_map stamp_> as the version qualifier. This allows features to opt out of this update mechanism.
(advanced) modify the version number on <import> statements. Some plug-ins and features may specify a perfect match rule in when listing their dependencies. While this is not recommended, it is certainly possible. Unfortunately, there is a conflict if the intention was that plug-in A depend perfectly on the version of plug-in B with which it was built. During the build, B's version number will be updated according to its map stamp. Plug-in A's <import> statement for B needs to be updated to reflect this new version. While this modification is technically possible, it requires more complex analysis of the plugin.xml file. Updating the version of the plug-in itself can be done using simple string replacement. In this case the builder has to
- look at each <import> using perfect match and wanting build-time map stamp substitutoin,
- identify the pluign being imported,
- discover the new version number
- replace the "substitutution desired" flag in the version number with the correct version number without loosing any other formatting, comments etc.

Note: the plugin.xml updating outlined above must be carried out on any manifest.mf files present in the build.

PDE Build will also be updated to output directly to update site format. This is already largely supported but there maybe some modest work required.

A possible longer term advance is to change PDE build to only build the plug-ins which actually changed. Since we know the map stamp of the plug-ins we want in the output, we can first look for those in the update site (or some cache). If present, there is no need to rebuild them. This will have a huge (positive) impact on rebuilds during release cycles.

Summary

The problem outlined above is making it hard to scale Eclipse. It is hard for people to use large numbers of Eclipse components (e.g., EMF, GEF, ...) and we are failing to put forward a good model of how others would build/stock component systems. The ability to do this easily is key to the overall Eclipse component model. The steps proposed here are a good first try at solving the problem. The result will not be perfect but will allow us to understand what is wrong and fix it. Fortunately, the bulk of the technology required exists in Eclipse today. Also, since we are proposing a complementary approach to the current download strategy, we can stage the implementation of this new mechanism both as we have time and as we learn more.

The key steps that should be taken for 3.0 are

create/populate an update site
update PDE build to automate the stamping of plug-ins and features
a modest amount of work on update manager to enable reasonable workflows (i.e., set a stake in the ground)