Proposal
Summary
The data structure org.eclipse.core.runtime.IPath and its canonical implementation, org.eclipse.core.runtime.Path, impose restrictions on segment names that can be stricter than the rules for file names in the underlying file system. When Eclipse paths are used to represent file system paths, these restrictions prevent valid files from being added to the Eclipse workspace. This document describes the current set of restrictions in Eclipse 3.0 and proposes changes to lift them.

Last modified: October 18, 2004
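IPath is an abstract data structure supplied by the org.eclipse.core.runtime plug-in. It consists of an optional device id (such as "C:"), an ordered sequence of string segments, and flags recording whether the path has a leading and/or trailing separator.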
To facilitate conversion between IPath and String instances, IPath reserves the colon (':') character as the device delimiter, and the forward and back slash ('/' and '\') characters as segment delimiters. The API javadoc for IPath.isValidSegment outlines the complete set of restrictions:
Lifting the restriction on paths with leading or trailing whitespace and on paths containing the '\' character is easily achieved by introducing new methods for creating paths. Lifting the restriction on the ':' character is more difficult, since ':' is needed as the device delimiter on operating systems that support a device.
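The solution must accommodate two categories of IPath users: clients that convert paths to and from strings for the local file system, and clients that serialize paths in a platform-neutral form.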
The proposed solution is to introduce two factory methods for creating IPath instances that perform the inverse of the two existing string conversion methods (toOSString and toString):
Path.fromOSString: A factory method that decodes a platform-specific string. For example, this will parse the output of a previous call to IPath.toOSString, or the value returned by java.io.File.getAbsolutePath.
Path.fromPortableString: A factory method that decodes a platform-neutral string, such as the output of a previous call to IPath.toPortableString.
Since changing the behaviour of the existing toString method would cause too much breakage, a new method, toPortableString, will be introduced for creating a platform-neutral string representation of paths. The existing toString method will remain unchanged.
Most clients will use the platform-specific form of paths. The path can be converted to/from a platform-neutral representation when a path needs to be serialized in a portable fashion.
The platform-neutral encoding of paths (IPath.toPortableString) will allow all characters except slash ('/') in segment names, and will include an optional device separated from the segments by a single colon character. Literal colon characters in path segments are escaped by doubling (one colon becomes two colons).
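To make these encoding rules concrete, here is a minimal Java sketch of how a path could be turned into its platform-neutral form. PortableEncodingSketch and the parameters of its toPortableString method are hypothetical names chosen for this illustration, not Eclipse API. The sketch ignores trailing separators and assumes that the device id (such as "C:") ends with its only colon.

```java
import java.util.List;

/**
 * Hypothetical sketch of the platform-neutral path encoding (not Eclipse code).
 * Assumes the device id, if any, ends with its only ':' (e.g. "C:").
 */
final class PortableEncodingSketch {

    private PortableEncodingSketch() {
    }

    /**
     * @param device   device id such as "C:", or null for a device-less path
     * @param absolute true if the path has a leading separator
     * @param segments segment names, which may contain any character except '/'
     */
    static String toPortableString(String device, boolean absolute, List<String> segments) {
        StringBuilder out = new StringBuilder();
        if (device != null) {
            out.append(device); // the device keeps its single ':'
        }
        if (absolute) {
            out.append('/');
        }
        for (int i = 0; i < segments.size(); i++) {
            String segment = segments.get(i);
            if (segment.indexOf('/') >= 0) {
                throw new IllegalArgumentException("'/' is not allowed in a segment: " + segment);
            }
            if (i > 0) {
                out.append('/');
            }
            out.append(segment.replace(":", "::")); // escape literal colons by doubling
        }
        return out.toString();
    }
}
```

With these rules, a relative path with the single segment "foo:bar" encodes as "foo::bar". A path whose segments contain no colons is simply its device followed by the '/'-joined segments.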
The following are some examples of Windows file system paths and the corresponding platform-neutral encoding:
UNC paths, which typically have no device but have a double leading separator, will generally be the same.
This platform-neutral encoding unambiguously encodes all possible paths on all supported platforms. Most importantly, this toPortableString implementation is fully backward compatible with the Eclipse 3.0 implementation of IPath.toString for all paths that can be created in Eclipse 3.0. Because Eclipse 3.0 reserves ':' as the device delimiter, such paths never contain colons in segment names, so no escaping is ever applied to them. This means that clients who previously used toString for serializing paths can move to the new toPortableString/fromPortableString methods without migrating file formats.
The platform-specific factory method, Path.fromOSString, will impose the minimum platform-specific requirements needed to unambiguously parse all possible paths on that platform. The Windows implementation, for example, will interpret everything up to the first ':' as the device, and treat both '/' and '\' as path segment separators. No other rules will be imposed. Thus the existing restriction that prevents path segments from having leading or trailing whitespace will no longer be enforced on any platform.
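The Windows parsing rules just described can be sketched as follows. The class WindowsOsPathSketch, its nested ParsedPath holder and the fromOSString signature are assumptions made for illustration only. The sketch omits UNC handling (a double leading separator is treated like a single one) and does not track trailing separators.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the minimal Windows parsing rules (not Eclipse code):
 * the first ':' ends the device, and '/' and '\' both separate segments.
 */
final class WindowsOsPathSketch {

    static final class ParsedPath {
        final String device;         // e.g. "C:", or null when there is no device
        final boolean absolute;      // true when a separator follows the device
        final List<String> segments; // non-empty segment names, in order

        ParsedPath(String device, boolean absolute, List<String> segments) {
            this.device = device;
            this.absolute = absolute;
            this.segments = segments;
        }
    }

    private WindowsOsPathSketch() {
    }

    static ParsedPath fromOSString(String s) {
        String device = null;
        String rest = s;
        int colon = rest.indexOf(':');
        if (colon >= 0) {
            // Everything up to and including the first ':' is the device.
            device = rest.substring(0, colon + 1);
            rest = rest.substring(colon + 1);
        }
        boolean absolute = !rest.isEmpty() && (rest.charAt(0) == '/' || rest.charAt(0) == '\\');
        List<String> segments = new ArrayList<>();
        // Split on either separator; empty pieces (doubled or trailing separators) are dropped.
        for (String piece : rest.split("[/\\\\]")) {
            if (!piece.isEmpty()) {
                segments.add(piece); // no whitespace trimming or other validation
            }
        }
        return new ParsedPath(device, absolute, segments);
    }
}
```

With this sketch, parsing "C:\dir/ file " yields device "C:" and the segments "dir" and " file ". The leading and trailing spaces are kept because no rule beyond device and separator detection is applied.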
As before, detailed validation of all legal characters and names on each platform will not be enforced. Some clients use technology such as Cygwin or Samba to mount foreign file systems on a platform. In these situations, path name rules for the local file system do not apply. While it is difficult to fully support these users, any additional platform-specific verification performed on paths causes further problems for them. Imposing only the absolute minimum requirements for unambiguously parsing paths allows the majority of users to function without further impacting the corner cases.
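The following sketch shows segment validation that imposes only the minimum requirements. SegmentValidationSketch is hypothetical, and its windows flag stands in for whatever operating-system check a real implementation would use. It rejects only two things: the '/' separator on every platform, and the ':' device separator on Windows.

```java
/**
 * Hypothetical minimal segment validation (not Eclipse code). The windows
 * flag stands in for a real operating-system check.
 */
final class SegmentValidationSketch {

    private SegmentValidationSketch() {
    }

    static boolean isValidSegment(String segment, boolean windows) {
        if (segment.indexOf('/') >= 0) {
            return false; // '/' separates segments on every platform
        }
        if (windows && segment.indexOf(':') >= 0) {
            return false; // ':' is the device separator on Windows
        }
        // Nothing else is checked: leading or trailing whitespace is accepted.
        return true;
    }
}
```

Under this sketch, "foo:bar" is a valid segment on Linux but not on Windows, and " padded " is valid everywhere.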
The following existing methods on IPath and Path are affected:
Path validation (IPath.isValidPath and IPath.isValidSegment): validity will follow the same parsing rules as the new fromOSString factory method. In other words, path validity becomes a platform-specific issue. The specification will change to a more ambiguous wording, stating only that certain characters are reserved on some operating systems. In implementation, validation will only check for the device separator on operating systems that require it. The restriction on leading and trailing spaces in segment names will be removed on all operating systems that allow such paths.

This proposal introduces two factory methods that clearly distinguish platform-neutral and platform-specific encodings of paths. The difficult question is what to do with the old single-argument Path constructor. The two options are:
Option one: deprecate the existing Path constructor, which explicitly states how it handles ':' and '\' characters. The disadvantage is that this will require all callers of the existing Path constructors to migrate to one of the two path factory methods, depending on the origin of the path string being used. Clients that do not migrate to the new factory methods risk errors when trying to construct IPath instances corresponding to file system paths that were previously treated as invalid. For example, the resources plug-in would allow the creation of resources whose names contain the ':' and '\' characters. Other plug-ins that try to create a path corresponding to those resources using the old constructors would then fail. Experiments with this solution showed that plug-ins that failed to migrate to the new factory methods were broken by the unexpected introduction of previously invalid paths. This presents a bleak picture for backward compatibility, even though no API contracts are broken.
Option two: make the existing Path constructor equivalent to the new fromOSString factory method. On the positive side, this introduces very little breakage in practice. The net effect is to remove old restrictions on some operating systems. The only breakage will affect two kinds of clients: those that use a device on all operating systems, and those that need to construct IPath objects representing file system paths from platforms other than the one the current Eclipse instance is running on. For example, a plug-in running on Linux would not be able to use the old constructors to create IPath objects representing files from a remote Windows system.
Implementing both of the above approaches showed that the second option introduces by far the smallest breakage. For example, the first option requires changing almost all of the 600 references to the Path constructors found in the current edition of the Eclipse platform. The second option requires only a small set of localized changes in code that serializes and deserializes paths in a platform-neutral manner. Based on testing the implementation of these two options, this proposal recommends option two.
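Option two can be sketched as a class whose single-argument constructor applies exactly the fromOSString rules of the running platform. OptionTwoPathSketch is a hypothetical name, and detecting Windows through the os.name system property is an assumption of this sketch. A second constructor takes the platform as an explicit flag so that both rule sets can be compared on one machine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of option two (not Eclipse code): the old-style
 * single-argument constructor applies the fromOSString rules of the
 * running platform.
 */
final class OptionTwoPathSketch {
    final String device;         // e.g. "C:", or null when there is no device
    final List<String> segments; // segment names in order

    /** Old-style constructor: same rules as fromOSString on the running platform. */
    OptionTwoPathSketch(String path) {
        // Assumption: Windows is detected from the os.name system property.
        this(path, System.getProperty("os.name", "").startsWith("Windows"));
    }

    /** Variant with an explicit platform flag, so both rule sets can be compared. */
    OptionTwoPathSketch(String path, boolean windows) {
        String dev = null;
        String rest = path;
        if (windows) {
            int colon = rest.indexOf(':');
            if (colon >= 0) {
                dev = rest.substring(0, colon + 1); // device includes its ':'
                rest = rest.substring(colon + 1);
            }
        }
        List<String> parts = new ArrayList<>();
        // Windows accepts '/' and '\' as separators; other platforms only '/'.
        for (String piece : rest.split(windows ? "[/\\\\]" : "/")) {
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        this.device = dev;
        this.segments = Collections.unmodifiableList(parts);
    }

    /** The new factory method; under option two it behaves exactly like the constructor. */
    static OptionTwoPathSketch fromOSString(String path) {
        return new OptionTwoPathSketch(path);
    }
}
```

With Windows rules, the string "C:\work\a.txt" yields device "C:" and the segments "work" and "a.txt". With Linux rules, the same string yields a single device-less segment, which is the cross-platform limitation described for option two.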
The following examples illustrate the behaviour of the various Path constructors and to*String methods.
Given the absolute path with device "C:" and single segment "foo", the following IPath methods will produce:
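The sketch below computes the string forms of that path under the rules described in this proposal. StringFormsSketch and its methods are hypothetical stand-ins for toString, toPortableString and toOSString, not the Eclipse implementation. They assume an absolute path with a non-null device, and the platform separator is passed in explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the string forms of an absolute path with a device
 * (not Eclipse code). Method names mirror the IPath methods discussed here.
 */
final class StringFormsSketch {

    private StringFormsSketch() {
    }

    /** Like Eclipse 3.0 toString: device, then '/'-separated segments, no escaping. */
    static String toStringForm(String device, List<String> segments) {
        return device + "/" + String.join("/", segments);
    }

    /** Like toPortableString: as above, but ':' inside a segment is doubled. */
    static String toPortableString(String device, List<String> segments) {
        List<String> escaped = new ArrayList<>();
        for (String segment : segments) {
            escaped.add(segment.replace(":", "::"));
        }
        return device + "/" + String.join("/", escaped);
    }

    /** Like toOSString: the platform's own separator between device and segments. */
    static String toOSString(String device, List<String> segments, char separator) {
        String sep = String.valueOf(separator);
        return device + sep + String.join(sep, segments);
    }
}
```

For this path, toString and toPortableString both give "C:/foo", and toOSString gives "C:\foo" with the Windows '\' separator. The first two forms differ only when a segment name contains ':'.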
All clients that store absolute IPath objects as platform-neutral strings in serialized form (as produced by IPath.toString in Eclipse 3.0) should switch to the new fromPortableString/toPortableString methods instead of the Path constructor and the toString method. Backward compatibility with files written by Eclipse 3.0 is automatic: neither the file format nor any file format version number needs to change. Examples of files whose path serialization will need to migrate include the workspace .project and .classpath files.
Under this proposal, IPath.toPortableString and Path.fromPortableString are perfect inverses of each other. In other words, the expression

    path.equals(Path.fromPortableString(path.toPortableString()))

will be true for all paths, and the expression

    string.equals(Path.fromPortableString(string).toPortableString())

will be true for all strings that represent canonical paths (strings with duplicate slashes or "." and ".." references will not survive the round trip unchanged). Furthermore, the Eclipse 3.1 implementation of Path.fromPortableString will be the perfect inverse of the Eclipse 3.0 implementation of IPath.toString.
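The round-trip property can be checked with a small encoder/decoder pair. PortableRoundTripSketch is a hypothetical model, not the Eclipse Path class. It tracks only a device, a leading-separator flag and the segment list, assumes the device id ends with its only colon, and does not resolve "." or ".." segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical model of a path with a portable encoder/decoder pair
 * (not Eclipse code). Tracks only device, leading separator and segments.
 */
final class PortableRoundTripSketch {
    final String device;         // e.g. "C:", or null; assumed to end with its only ':'
    final boolean absolute;      // leading separator present
    final List<String> segments; // segment names; may contain ':' but not '/'

    PortableRoundTripSketch(String device, boolean absolute, List<String> segments) {
        this.device = device;
        this.absolute = absolute;
        this.segments = new ArrayList<>(segments);
    }

    String toPortableString() {
        StringBuilder out = new StringBuilder();
        if (device != null) {
            out.append(device);
        }
        if (absolute) {
            out.append('/');
        }
        for (int i = 0; i < segments.size(); i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(segments.get(i).replace(":", "::")); // escape by doubling
        }
        return out.toString();
    }

    static PortableRoundTripSketch fromPortableString(String s) {
        String device = null;
        String rest = s;
        int colon = rest.indexOf(':');
        // A lone ':' ends the device; "::" is an escaped colon inside a segment.
        if (colon >= 0 && (colon + 1 == rest.length() || rest.charAt(colon + 1) != ':')) {
            device = rest.substring(0, colon + 1);
            rest = rest.substring(colon + 1);
        }
        boolean absolute = rest.startsWith("/");
        List<String> segments = new ArrayList<>();
        for (String piece : rest.split("/")) {
            // Empty pieces come from leading or duplicate slashes and are dropped,
            // so non-canonical strings come back in canonical form.
            if (!piece.isEmpty()) {
                segments.add(piece.replace("::", ":")); // undo the doubling
            }
        }
        return new PortableRoundTripSketch(device, absolute, segments);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PortableRoundTripSketch)) {
            return false;
        }
        PortableRoundTripSketch other = (PortableRoundTripSketch) o;
        return Objects.equals(device, other.device)
                && absolute == other.absolute
                && segments.equals(other.segments);
    }

    @Override
    public int hashCode() {
        return Objects.hash(device, absolute, segments);
    }
}
```

Decoding "/a//b" and re-encoding it gives "/a/b". This is an example of a non-canonical string that does not survive the round trip unchanged.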
On Unix, the toOSString and fromOSString methods will be inverses of each other. On Windows, the same can only be said for paths that do not contain colon or backslash characters within segment names (such paths are invalid on Windows anyway). Consider the following example:
    import org.eclipse.core.runtime.IPath;
    import org.eclipse.core.runtime.Path;

    String input = "foo::bar";
    IPath pathOne = Path.fromPortableString(input);
    IPath pathTwo = Path.fromOSString(pathOne.toOSString());
    boolean same = pathOne.equals(pathTwo); // false on Windows

The input string represents a path with no device and a single segment named "foo:bar" (invalid on Windows). When this path is output using toOSString, it is encoded as "foo:bar". fromOSString then interprets this string as a path with device "foo:" and first segment "bar". Similar mangling occurs if you create a path with a segment containing the backslash character:
    import org.eclipse.core.runtime.IPath;
    import org.eclipse.core.runtime.Path;

    String input = "foo\\bar";
    IPath pathOne = Path.fromPortableString(input);
    IPath pathTwo = Path.fromOSString(pathOne.toOSString());
    boolean same = pathOne.equals(pathTwo); // false on Windows

In this case, the input is a path with one segment named "foo\bar". fromOSString interprets this as a path with two segments, "foo" and "bar". In other words, under this proposal you cannot reliably manipulate paths containing backslash or colon characters using toOSString/fromOSString on Windows. This seems to be an acceptable limitation.
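Both failures can be reproduced with a self-contained sketch that applies the minimal Windows rules of this proposal. WindowsRoundTripSketch and its WinPath holder are hypothetical. They model only relative paths with an optional device, and toOSString performs no escaping, matching the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical sketch of Windows toOSString/fromOSString (not Eclipse code),
 * modelling only relative paths with an optional device.
 */
final class WindowsRoundTripSketch {

    static final class WinPath {
        final String device;         // e.g. "C:", or null
        final List<String> segments; // segment names

        WinPath(String device, List<String> segments) {
            this.device = device;
            this.segments = new ArrayList<>(segments);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WinPath)) {
                return false;
            }
            WinPath other = (WinPath) o;
            return Objects.equals(device, other.device) && segments.equals(other.segments);
        }

        @Override
        public int hashCode() {
            return Objects.hash(device, segments);
        }
    }

    private WindowsRoundTripSketch() {
    }

    /** Device followed by backslash-separated segments; nothing is escaped. */
    static String toOSString(WinPath path) {
        String body = String.join("\\", path.segments);
        return path.device == null ? body : path.device + body;
    }

    /** First ':' ends the device; '/' and backslash both separate segments. */
    static WinPath fromOSString(String s) {
        String device = null;
        String rest = s;
        int colon = rest.indexOf(':');
        if (colon >= 0) {
            device = rest.substring(0, colon + 1);
            rest = rest.substring(colon + 1);
        }
        List<String> segments = new ArrayList<>();
        for (String piece : rest.split("[/\\\\]")) {
            if (!piece.isEmpty()) {
                segments.add(piece);
            }
        }
        return new WinPath(device, segments);
    }
}
```

A path whose segments contain neither ':' nor '\' survives the same round trip. For example, device "C:" with segments "a" and "b" becomes "C:a\b" and parses back unchanged, so only paths with colon or backslash characters in segment names are affected.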