Use the UTF-8, Luke! File Encodings in IntelliJ IDEA

Posted by cdr

Today we would like to answer the most frequent questions about file encodings in the IDE and show you a few tricks that may help you avoid potential pitfalls.

What is the problem with file encodings?

To display text correctly, IntelliJ IDEA needs to know which file encoding to use. Unfortunately, it is not always possible to determine the file encoding without additional information. This is especially true of single-byte encodings, where the same bytes can be mapped to different characters.

However, things look better for UTF-family encodings. The UTF family consists of:

  • Several multi-byte encodings like UTF-16 or UTF-32, which are easily detected by the BOM (Byte Order Mark) at the beginning of the file.
  • The variable-width UTF-8 encoding, which can also be auto-detected, either by an optional BOM or by specific byte combinations.
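BOM-based detection can be sketched in a few lines of Python. This is only an illustration of the idea, not the IDE's actual code; the `detect_bom` function and the `BOMS` table are names made up for this example, while the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# BOM signatures. The UTF-32 LE mark begins with the same two bytes as the
# UTF-16 LE mark, so the UTF-32 entries have to be tested first.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


sample = codecs.BOM_UTF8 + "Grüße".encode("utf-8")
found = detect_bom(sample)
if found is not None:
    encoding, skip = found
    # Skip the BOM bytes before decoding the actual text.
    print(encoding, sample[skip:].decode(encoding))
```

A file without a BOM simply yields `None` here, which is where the later, less certain guessing stages would take over.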

In particular, for the English character subset, a UTF-8 encoded file looks exactly like plain old ASCII text. That’s why UTF-8 is so popular and why it is the preferred encoding.
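A quick way to convince yourself of this is to compare the bytes directly. The snippet below is a small Python illustration; the sample strings are arbitrary.

```python
text = "Use the UTF-8, Luke!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text both encodings produce identical bytes.
print(ascii_bytes == utf8_bytes)

# A non-ASCII character such as "ü" takes more than one byte in UTF-8.
print("ü".encode("utf-8"))
```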

How does the IDE determine encoding for the file?

IntelliJ IDEA uses multi-stage educated guessing, from the most obvious to the most far-fetched.

First, if a BOM is present, use the corresponding UTF-family encoding.

Next, check whether the file type declares the encoding itself and use that. For example, JSP files can specify the encoding right in the text:
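As an illustration, a JSP page directive may carry a `pageEncoding` attribute, for instance `<%@ page pageEncoding="UTF-8" %>`. The Python sketch below shows one simple way such a declaration could be read from the start of a file. The regular expression and the `declared_jsp_encoding` helper are assumptions made for this example and are not how the IDE itself parses JSP.

```python
import re

# Matches pageEncoding="..." inside a JSP page directive: <%@ page ... %>
# (a simplified pattern for illustration, not a full JSP parser).
PAGE_ENCODING = re.compile(
    r"""<%@\s*page\b[^%]*?\bpageEncoding\s*=\s*["']([^"']+)["']"""
)


def declared_jsp_encoding(head: str):
    """Return the encoding named in a JSP page directive, or None."""
    match = PAGE_ENCODING.search(head)
    return match.group(1) if match else None


print(declared_jsp_encoding('<%@ page language="java" pageEncoding="UTF-8" %>'))
```

Because the directive itself consists of ASCII characters, it can usually be read before the real encoding of the file is known.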

Check whether you have specified the encoding explicitly and use that. You can specify the desired encoding for a file, for its containing directory, for the whole project, or for the IDE. IntelliJ IDEA will use the most specific encoding:
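The "most specific wins" rule can be modeled as a lookup that starts at the file and walks up through its parent directories. The following Python sketch is purely illustrative: the paths, the `explicit` table, and the `IDE_ENCODING` fallback are all hypothetical and do not reflect how the IDE stores its settings.

```python
from pathlib import PurePosixPath

# Hypothetical explicit settings: path -> encoding.
explicit = {
    PurePosixPath("/project"): "UTF-8",
    PurePosixPath("/project/legacy"): "windows-1252",
    PurePosixPath("/project/legacy/readme.txt"): "ISO-8859-1",
}
IDE_ENCODING = "US-ASCII"  # assumed IDE-wide fallback for this example


def explicit_encoding(path: PurePosixPath) -> str:
    """Return the setting of the file itself, else of its nearest configured parent."""
    for candidate in (path, *path.parents):
        if candidate in explicit:
            return explicit[candidate]
    return IDE_ENCODING


print(explicit_encoding(PurePosixPath("/project/legacy/old.txt")))
```

In this model a setting on the file beats one on its directory, which in turn beats the project-wide and IDE-wide values.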

Try to figure out the encoding using some hints or heuristics. For example, when Auto-detect UTF-8 is selected, the IDE will analyze the file looking for some byte combinations which are UTF-8-specific.
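One simple heuristic of this kind is to check whether the file contains non-ASCII bytes and whether all of them form valid UTF-8 sequences. The Python sketch below shows the idea; `looks_like_utf8` is a name invented for this example, and the IDE's real detection logic may well differ.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if data has non-ASCII bytes and decodes cleanly as UTF-8."""
    if data.isascii():
        return False  # nothing UTF-8-specific to go on
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("Grüße".encode("utf-8")))       # valid multi-byte sequences
print(looks_like_utf8("Grüße".encode("iso-8859-1")))  # same text, single-byte encoding
```

Text in a single-byte encoding rarely happens to form valid UTF-8 multi-byte sequences, which is why this kind of guess tends to be reliable in practice.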

Finally, use the project-level or, if the project is unavailable, the application-level encoding.
See Settings → File Encoding → Project Encoding → IDE Encoding.

What happens when I try to change the file encoding?

If the two encodings are completely compatible for this text, e.g. when changing English-only text from US-ASCII to UTF-8, IntelliJ IDEA just silently reassigns the encoding.
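"Completely compatible for this text" can be read as: the text produces exactly the same bytes in both encodings, so nothing on disk has to change. A minimal Python sketch of that check follows; the `same_bytes` helper is hypothetical and only illustrates the idea.

```python
def same_bytes(text: str, old: str, new: str) -> bool:
    """True if text is encoded to identical bytes in both encodings."""
    try:
        return text.encode(old) == text.encode(new)
    except UnicodeEncodeError:
        return False  # at least one encoding cannot represent the text


print(same_bytes("Hello", "us-ascii", "utf-8"))
print(same_bytes("Grüße", "iso-8859-1", "utf-8"))
```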

However, if the encodings are sufficiently different, the IDE has to ask you:

  • Whether you want to reload the file from disk in the other encoding.
    In this case IDEA will replace the editor contents with the text from the file, decoded using the new encoding.
  • Or whether you would like to convert the text in the editor and save it to the file using the other encoding.
    Here, IDEA will encode the text in the editor window using the new encoding and overwrite the file.
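The difference between the two options is easy to see at the byte level. In the Python sketch below (the variable names and sample text are made up for illustration), reloading keeps the bytes and changes the text, while converting keeps the text and changes the bytes.

```python
on_disk = "café".encode("utf-8")     # bytes stored in the file
in_editor = on_disk.decode("utf-8")  # text currently shown in the editor

# Reload: the bytes stay the same, the text in the editor changes.
reloaded = on_disk.decode("windows-1252")
print(reloaded)  # cafÃ©

# Convert: the text stays the same, the bytes written to the file change.
converted = in_editor.encode("windows-1252")
print(converted)  # b'caf\xe9'
```

Reloading with the wrong encoding is what produces garbled text like "Ã©" in place of "é".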

Please note these little gray exclamation marks, meaning that that particular conversion/reload can cause information loss.

For example, this happens when you try to reload a UTF-8 encoded file using the US-ASCII encoding, losing the non-English characters in the process.

Or when you try to save German umlauts to a plain-text file whose encoding cannot represent them, such as US-ASCII.

What else can IntelliJ IDEA do for me?

IntelliJ IDEA will warn you when you try to swear in German in an ASCII-only document:

To enable this inspection, go to Settings → Inspections → Lossy Encoding.
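The idea behind such a check can be sketched in Python: try to encode each character with the file's encoding and collect the ones that fail. The `unencodable` function is a hypothetical helper written for this example and is not the inspection's actual implementation.

```python
def unencodable(text: str, encoding: str) -> list:
    """Return the characters of text that the given encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(unencodable("Grüße", "us-ascii"))  # ['ü', 'ß']
print(unencodable("Grüße", "utf-8"))     # []
```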

Likewise, IntelliJ IDEA will try to detect the situation when you load text containing non-ASCII characters using an incompatible encoding:
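One possible way to notice such a mismatch is to decode the bytes with the chosen encoding and count how many of them could not be decoded. The Python sketch below does this by counting U+FFFD replacement characters; `undecodable_count` is an illustrative name, and this is an assumption about how such a check might work rather than a description of the IDE's implementation.

```python
def undecodable_count(data: bytes, encoding: str) -> int:
    """Count the replacement characters produced when decoding data with encoding."""
    return data.decode(encoding, errors="replace").count("\ufffd")


data = "Grüße".encode("utf-8")
print(undecodable_count(data, "us-ascii") > 0)  # the non-ASCII bytes cannot be decoded
print(undecodable_count(data, "utf-8"))         # decodes cleanly
```

Note that this only works for encodings that reject some byte sequences. An encoding such as ISO-8859-1 accepts every byte value, so a wrong guess there produces garbled text without any decoding error.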

What is the ultimate advice you have regarding file encodings?

To avoid any problems with file encodings, we strongly recommend using UTF-8.

That’s all for today and we hope this article was useful for you!

Develop with Pleasure!


18 Responses to Use the UTF-8, Luke! File Encodings in IntelliJ IDEA

  1. Andrey Volkov says:

    March 18, 2013

    IntelliJ IDEA has great file encoding support. However, has the JetBrains team fixed http://youtrack.jetbrains.com/issue/IDEA-77537 (IDE file encodings should be per project and not have a global value)?

  2. Nicolas GANDRIAU says:

    March 18, 2013

    “Please note these little gray exclamation marks, meaning that that particular conversion/reload can cause information loss.”
    I cannot see them 🙁

  3. Geoffrey De Smet says:

    March 19, 2013

    “To enable this inspection, go to Settings → Inspections → Lossy Encoding.” – Does this mean it’s not enabled by default? Why not?

  4. неизвестный бобра желатель says:

    March 19, 2013

    Please fix http://youtrack.jetbrains.com/issue/IDEA-103058 in the release branch.

  5. Alexey Kudravtsev says:

    March 19, 2013

    Andrey: Does Settings|File Encoding|Project Encoding work for you?

  6. Alexey Kudravtsev says:

    March 19, 2013

    Nicolas: please try the latest IDEA 12.1 EAP available at http://confluence.jetbrains.com/display/IDEADEV/IDEA+12+EAP

  7. Andrey Volkov says:

    March 19, 2013

    Alexey: I have not tried Idea 12.1 yet, but it does not work on 12.0.

  8. Daniel Serodio says:

    March 21, 2013

    While we’re at it, can IDEA-102314 (Console encoding is wrong) be fixed in this release?

    http://youtrack.jetbrains.com/issue/IDEA-102314

  9. Markus says:

    February 18, 2014

    It would be cool to be able to convert all files in a directory, e.g. when importing a project that uses a non-UTF-8 encoding.

  10. Alban says:

    February 20, 2014

    I have a weird behavior when I’m using IntelliJ 13 on Windows (cp1252 system encoding). All my project files are UTF-8 encoded but my logs, my webservices results, etc are not properly encoded (lots of é) when I run my application inside IntelliJ.

    Whatever I set in Settings → File Encoding, I have the same (wrong) result.

    But on Linux, it works (as always).

  11. jiaozebo says:

    July 28, 2014

    When compiling a UTF-8 encoded Java source file (in Android Studio, which is based on IntelliJ), I got this error:
    “Error(1,1)illegalcharacter ‘\ufeff’ when compiling on android studio”.
    How can I fix this error?
    Please help me!

    • jiaozebo says:

      July 28, 2014

      I’m working on a Windows 7 system.

      • Canadian Cavalry says:

        October 31, 2016

        I know this is from 2014 but just in case anyone else is having this issue….You need to find the line of code giving you this error and make sure there are no blank spaces before, after or during the line. It happens when you paste code containing a character that the ide does not recognize.

  12. Dan Chheng says:

    October 22, 2015

    Does IntelliJ Idea 14.x support Khmer Unicode?

  13. Tamas Kalman says:

    July 9, 2016

    Emojis are not showing in source code at all.
    Is that normal?
    Other UTF-8 characters seem to be normal.

  14. Daniel says:

    August 9, 2016

    Yeah, the new RubyMine version (2016.2) has this major unpleasant regression: emojis are not handled correctly.
    Very sad.
    Versions 7 and 8 work well with emojis, so I have to use previous versions of the product.

  15. Dave says:

    March 13, 2017

    “swearing in German”: the word “kopiëren” is actually a Dutch word and simply means ‘copy’.

  16. rolando.guo says:

    June 6, 2019

    I use the IDEA “annotate” function to look at the line numbers of a file, and it shows “number of lines annotated by Git is not equal to number of lines in the file, check file encoding and line separators”. But I have checked all of my IDEA encoding settings, and they are all UTF-8.
