Last modified: June 12, 2003
Plan item description: Eclipse 2.1 uses a single global file encoding setting for reading and writing files in the workspace. This is problematic, for example, when Java source files in the workspace use the OS default file encoding while XML files in the workspace use UTF-8. The Platform should support non-uniform file encodings. [Platform Core, Platform UI, Text, Search, Compare, JDT UI, JDT Core] [Theme: User experience] (bug 37933, 5399)
The current situation is as follows:
ResourcesPlugin.getEncoding returns the default encoding for the workspace (the org.eclipse.core.resources.encoding preference value if available, otherwise the value of the file.encoding Java system property).
IFile.getContents/setContents work with byte streams, so no encoding can be applied.
IFile.getEncoding tries to guess the file encoding (looking for the Byte Order Mark), which is not enough. Also, this API has no known client so far. This API method would be deprecated.
ResourcesPlugin.getEncoding (same value for all sources).
The encoding for a resource (as returned by IResource.getCharset - see API changes) will be:
1. the one explicitly set for it (IResource.setCharset - see API changes), if any, or
2. for files, the one discovered by inspecting the file contents, if any, or
3. the one of its parent.
Regarding #2, an extension point would allow file format-aware encoding interpreters to register with the encoding discovery mechanism for specific file types (extensions), or to associate existing encoding interpreters with their own file extensions. Users would be able to associate additional file extensions with the known interpreters (through a preference).
All clients that create character-based streams to read or write the contents of a file resource should pass along the charset string obtained from IFile.getCharset instead of the one provided by ResourcesPlugin.getEncoding. Examples are text editors, the compiler, search, and compare.
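As a minimal sketch of what this means for a client, the following decodes a byte stream with a per-file charset name rather than a global one. The class CharsetAwareRead and the helper readAll are hypothetical names used only for illustration; in a real client the stream would come from IFile.getContents and the charset name from the proposed IFile.getCharset, both of which are replaced by stand-in values here so the example runs without the Platform.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class CharsetAwareRead {
    /** Hypothetical helper: decodes the whole stream using the given charset name. */
    static String readAll(InputStream contents, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(contents, charsetName))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for file.getContents(): "cafe" with an accented e, encoded as UTF-8 (5 bytes).
        byte[] bytes = "caf\u00e9".getBytes("UTF-8");

        // Stand-in for file.getCharset() returning "UTF-8": 4 characters are decoded.
        System.out.println(readAll(new ByteArrayInputStream(bytes), "UTF-8").length());      // prints 4

        // Decoding with a mismatched global encoding yields 5 characters instead.
        System.out.println(readAll(new ByteArrayInputStream(bytes), "ISO-8859-1").length()); // prints 5
    }
}
```

The second call shows the kind of corruption a single workspace-wide encoding can cause when one file type uses a different encoding.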
Also, setting the encoding for a resource would generate a resource change event, but only for the directly affected resource. Clients interested in the effect that a change on a directory had on the files inside it will have to work that out themselves.
public void IResource.setCharset(String charsetName) throws CoreException
Sets the charset name for this resource. May be null, which sets it to default. For the workspace root, it sets the workspace's default encoding preference to the charset's canonical name (or to the default encoding, if null was provided).
public String IResource.getCharset() throws CoreException
Returns the name of the charset for this resource. For files, if none has been defined (with setCharset), returns the default charset. To determine the default charset, it tries to guess it by a) inspecting the file contents (BOM), b) calling the corresponding encoding interpreter (if any). Otherwise, the parent's charset is returned. For the workspace root, a charset corresponding to the workspace's default encoding preference is returned.
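Step a) above, inspecting the contents for a byte order mark, could look roughly like the sketch below. BomSniffer and charsetFromBom are hypothetical names, not part of the proposed API, and only the UTF-8 and UTF-16 marks are recognised; a UTF-32 little-endian mark would be misreported as UTF-16LE by this simplified check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BomSniffer {
    /** Returns the charset name implied by a leading byte order mark, or null if none is found. */
    static String charsetFromBom(InputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        // read() may deliver fewer bytes than requested, so loop until 3 bytes or end of stream.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        return null; // no mark: fall through to the interpreter, then to the parent
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] plain = {'a', 'b', 'c'};
        System.out.println(charsetFromBom(new ByteArrayInputStream(withBom))); // prints UTF-8
        System.out.println(charsetFromBom(new ByteArrayInputStream(plain)));   // prints null
    }
}
```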
public boolean IResource.isDefaultCharset() throws CoreException
Returns true if the currently configured charset was not explicitly set by the user (that is, it has a default value either guessed from the file contents or inherited from the parent).
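The interplay of setCharset, getCharset and isDefaultCharset can be illustrated with a small stand-alone model. ResourceModel is a hypothetical class, not the Platform's IResource; the "detected" field stands in for whatever content inspection would report, and the root's fallback to the file.encoding system property is a simplification of the workspace preference described above.

```java
class ResourceModel {
    private final ResourceModel parent; // null stands for the workspace root
    private final String detected;      // stand-in for the content inspection result, or null
    private String explicit;            // value passed to setCharset, or null

    ResourceModel(ResourceModel parent, String detected) {
        this.parent = parent;
        this.detected = detected;
    }

    /** Passing null clears the explicit setting, so the default applies again. */
    void setCharset(String charsetName) {
        explicit = charsetName;
    }

    String getCharset() {
        if (explicit != null) return explicit;          // 1. explicitly set
        if (detected != null) return detected;          // 2. guessed from the contents
        if (parent != null) return parent.getCharset(); // 3. inherited from the parent
        return System.getProperty("file.encoding");     // root with no setting (simplified)
    }

    boolean isDefaultCharset() {
        return explicit == null;
    }

    public static void main(String[] args) {
        ResourceModel root = new ResourceModel(null, null);
        root.setCharset("ISO-8859-1");
        ResourceModel project = new ResourceModel(root, null);
        ResourceModel javaFile = new ResourceModel(project, null);
        ResourceModel xmlFile = new ResourceModel(project, "UTF-8");

        System.out.println(javaFile.getCharset());       // prints ISO-8859-1
        System.out.println(xmlFile.getCharset());        // prints UTF-8
        System.out.println(xmlFile.isDefaultCharset());  // prints true

        javaFile.setCharset("UTF-16");
        System.out.println(javaFile.getCharset());       // prints UTF-16
        System.out.println(javaFile.isDefaultCharset()); // prints false
    }
}
```

Note that in this model a guessed charset still counts as a default, matching the description of isDefaultCharset.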
public static final int IResourceDelta.ENCODING = 0x100000;
public String IResourceDelta.getNewCharset();
public String IResourceDelta.getOldCharset();
For notifying changes in file encodings. Both methods are only valid when getKind()==CHANGED and (getFlags()&ENCODING)!=0.
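The validity condition amounts to a kind test plus a bit-mask test. The sketch below uses plain integers instead of a real IResourceDelta so that it runs on its own; EncodingDeltaCheck and isEncodingChange are hypothetical names, ENCODING carries the value proposed above, and the CHANGED and CONTENT values are stand-ins assumed for the illustration.

```java
class EncodingDeltaCheck {
    static final int ENCODING = 0x100000; // flag value proposed above
    static final int CHANGED = 0x4;       // assumed stand-in for the "changed" delta kind
    static final int CONTENT = 0x100;     // assumed stand-in for an unrelated flag

    /** True only when it would be valid to ask the delta for its old and new charset. */
    static boolean isEncodingChange(int kind, int flags) {
        return kind == CHANGED && (flags & ENCODING) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isEncodingChange(CHANGED, ENCODING));           // prints true
        System.out.println(isEncodingChange(CHANGED, ENCODING | CONTENT)); // prints true
        System.out.println(isEncodingChange(CHANGED, CONTENT));            // prints false
    }
}
```

A resource change listener would apply the same test to delta.getKind() and delta.getFlags() before calling getOldCharset or getNewCharset.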
public interface IEncodingInterpreter {
    /** Returns null if the charset cannot be determined. */
    public String interpretCharset(java.io.InputStream input);
}
Encoding interpreters will be associated with file types through a new core resources extension point. Users can associate additional file extensions via preferences.
The platform itself would provide implementations for XML and other popular (?) file formats.
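To make the idea concrete, here is one way an interpreter for XML might be written. This is only a sketch under assumptions, not the Platform's implementation: XmlEncodingInterpreter is a hypothetical name, the interface is redeclared locally so the example compiles on its own, only the first 200 bytes are examined, and the prefix is assumed to be ASCII-compatible (so UTF-16 documents are not handled).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local copy of the proposed interface so the sketch is self-contained.
interface IEncodingInterpreter {
    /** Returns null if the charset cannot be determined. */
    String interpretCharset(InputStream input);
}

class XmlEncodingInterpreter implements IEncodingInterpreter {
    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
        "<\\?xml\\s[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    public String interpretCharset(InputStream input) {
        try {
            byte[] head = new byte[200];
            int n = 0;
            // Fill the buffer, tolerating short reads.
            while (n < head.length) {
                int r = input.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            // ISO-8859-1 maps every byte to a character, so decoding cannot fail.
            String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1);
            Matcher m = ENCODING.matcher(prefix);
            return m.find() ? m.group(1) : null;
        } catch (IOException e) {
            return null; // unreadable input: charset cannot be determined
        }
    }

    public static void main(String[] args) {
        IEncodingInterpreter xml = new XmlEncodingInterpreter();
        byte[] declared = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        byte[] undeclared = "<a/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(xml.interpretCharset(new ByteArrayInputStream(declared)));   // prints UTF-8
        System.out.println(xml.interpretCharset(new ByteArrayInputStream(undeclared))); // prints null
    }
}
```

Returning null rather than a guess lets the caller fall back to the parent's charset, in line with the resolution order described for getCharset.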
public int IFile.getEncoding()
public int IFile.ENCODING_* constants
The encoding settings metadata will be stored inside the project's content area so it can be easily shared.