There are many beautiful fonts available for Latin characters. There are also quite a few nice fonts for CJK (Chinese/Japanese/Korean) characters. However, it’s still challenging to make them work together and display text pleasantly. Here we show where things can go wrong and how to fix them.

Problem 1: Incomplete coverage

Many fonts are designed only for Latin, or for Latin plus one additional language. For example, UnBatang is a well-known Korean font, but it covers very few Hanzi/Kanji/Hanja, so it's not suitable for displaying Chinese or Japanese.

Problem 2: Character overlap

Even though UnBatang doesn't have enough Kanji coverage, it does cover Hiragana and Katakana. The problem is that another font may cover them as well. For example, IPAGothic covers them because it's a Japanese font. If you use these two fonts together, they will conflict. In this case you want to make sure IPAGothic has higher priority than UnBatang when displaying Hiragana and Katakana, because the latter has only limited support for Japanese.
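Per-character fallback works roughly like this: walk the preferred fonts in order and take the first one that has a glyph for the character. The Python sketch below models that idea. The coverage assigned to each font is a toy assumption (simple Unicode ranges), not the real character map of UnBatang or IPAGothic, and real fontconfig matching considers more than list order.

```python
from typing import Callable, Optional, Sequence, Tuple

# A font is modeled as (name, coverage predicate).
Font = Tuple[str, Callable[[str], bool]]

def is_kana(ch: str) -> bool:
    # Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF)
    return 0x3040 <= ord(ch) <= 0x30FF

def is_hangul_syllable(ch: str) -> bool:
    return 0xAC00 <= ord(ch) <= 0xD7A3

def is_cjk_unified(ch: str) -> bool:
    return 0x4E00 <= ord(ch) <= 0x9FFF

# Toy coverage (assumption): UnBatang = Hangul + kana, IPAGothic = kana + ideographs.
UNBATANG: Font = ("UnBatang", lambda ch: is_hangul_syllable(ch) or is_kana(ch))
IPAGOTHIC: Font = ("IPAGothic", lambda ch: is_kana(ch) or is_cjk_unified(ch))

def pick_font(ch: str, preferred: Sequence[Font]) -> Optional[str]:
    """Return the first font in priority order that covers ch, or None."""
    for name, covers in preferred:
        if covers(ch):
            return name
    return None

for order in ([UNBATANG, IPAGOTHIC], [IPAGOTHIC, UNBATANG]):
    names = [name for name, _ in order]
    print(names, {ch: pick_font(ch, order) for ch in "あア한天"})
```

With UnBatang first, the kana come from UnBatang while the Kanji come from IPAGothic, so Japanese text mixes two styles. With IPAGothic first, all Japanese characters share one style and Hangul still falls through to UnBatang.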

As another example, AR PL UMing CN is a Chinese font, but it also covers kana and a portion (but not all) of the Japanese Kanji. If you want to display Chinese and Japanese at the same time, you want to make sure your Japanese font has higher priority than AR PL UMing CN, so that Japanese Kanji are not rendered in mixed styles.

However, the Japanese font likely covers lots of Kanji, and those Kanji overlap with Chinese Hanzi (see CJK Unified Ideographs). If the Japanese font has priority, those unified ideographs will be displayed in the Japanese manner even when they are used in a Chinese context. An example of a CJK unified ideograph is 天 (U+5929). Its lower horizontal stroke is longer in Chinese, but its upper stroke is longer in Japanese. In general there's no way to tell which glyph should be used without additional mechanisms (such as variation selectors). This contrasts with Western scripts such as Latin, Greek, and Cyrillic, where look-alike letters have separate code points.
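The difference can be checked against the Unicode character database with Python's standard `unicodedata` module. The sketch below shows that 天 is one code point, whether it appears in a Chinese or a Japanese sentence. It also shows that the look-alike capital A in Latin, Greek and Cyrillic gets three different code points. The helper name `describe` is just for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# One code point serves Chinese and Japanese text alike; the string itself
# does not say whether the Chinese or the Japanese glyph should be drawn.
chinese_text = "天"   # as used in a Chinese sentence
japanese_text = "天"  # as used in a Japanese sentence
print(chinese_text == japanese_text, describe(chinese_text))

# Look-alike Latin, Greek and Cyrillic capitals are encoded separately,
# so a renderer can choose a font per script without guessing.
for ch in ("A", "\u0391", "\u0410"):
    print(describe(ch))
```

Because the ideograph is identical at the code-point level, the choice between Chinese and Japanese glyphs has to come from outside the character itself. Examples are the font priority order discussed here, a language tag, or a variation selector.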

If you wonder how many CJK unified ideographs there are, you may read these tables to get an idea. However, keep in mind that the extensions contain even more.
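To get a rough count locally, you can scan the block ranges with Python's `unicodedata`. This is a sketch: the counts depend on the Unicode version bundled with your Python (`unicodedata.unidata_version`). Only the main block and Extensions A and B are listed, and the later extensions are left out for brevity.

```python
import unicodedata

# Inclusive block ranges; later extensions (C, D, E, ...) are omitted here.
BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DBF),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF),
}

def count_unified(lo: int, hi: int) -> int:
    """Count assigned CJK unified ideographs in [lo, hi] per this Python's Unicode data."""
    return sum(
        1
        for cp in range(lo, hi + 1)
        if unicodedata.name(chr(cp), "").startswith("CJK UNIFIED IDEOGRAPH-")
    )

print("Unicode data version:", unicodedata.unidata_version)
for block, (lo, hi) in BLOCKS.items():
    print(f"{block}: {count_unified(lo, hi)}")
```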

You may download the latest Unicode Standard here. You may also download the closely related ISO/IEC 10646 here, which uses the same character codes as Unicode. To understand the difference between Unicode and ISO/IEC 10646, click here.

Problem 3: Incorrect glyph

Take IPAGothic for example. This font is good until you want to display the backslash \: instead of \, it shows the yen sign ¥. This dates back to old Japanese encodings based on JIS X 0201, which put ¥ at byte 0x5C, where ASCII has the backslash. See the description here. In short, this has troubled users and should be seen as a design flaw: Unicode gives ¥ its own code point, U+00A5, so the font should not draw it over the backslash at U+005C.
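The text itself is not damaged. Only the glyph the font draws for U+005C is different. The Python sketch below shows the two distinct code points. It also includes a hypothetical helper, `as_seen_with_yen_glyph`, that merely simulates what a reader sees when such a font renders a Windows-style path.

```python
import unicodedata

BACKSLASH = "\u005c"  # what the text actually contains
YEN = "\u00a5"        # the separate code point Unicode provides for the yen sign

for ch in (BACKSLASH, YEN):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

def as_seen_with_yen_glyph(text: str) -> str:
    """Hypothetical: simulate a font that draws U+005C with a yen glyph (display only)."""
    return text.replace(BACKSLASH, YEN)

path = "C:\\Users\\me"
print(path, "->", as_seen_with_yen_glyph(path))
```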

Problem 4: Monospace is not monospace

This problem has an interesting story. The font in question is Noto Sans CJK KR, released by Google. It can be downloaded here. Unzip the download, install the font file NotoSansMonoCJKkr-Regular.otf, and you get a new font named Noto Sans Mono CJK KR. This must be a monospace font, right? Look at this bug and you will find that this Mono font is actually not mono: Korean characters are narrower than Chinese and Japanese characters. The bug was filed about three years ago and is still marked won't-fix.

Further research shows that this font was co-created by Google and Adobe, and is also released by Adobe under the name Source Han Sans. Its README is here, and it mentions that Korean Hangul letters and syllables are monospaced at 920 units (instead of 1000). So Hangul letters and syllables are only monospaced relative to each other, and it's questionable whether that counts as true monospace. Anyway, the Adobe font name doesn't contain Mono at all, so I don't expect it to be monospace, just a sans-serif. That being said, the name Noto Sans Mono CJK KR is indeed confusing and should be fixed somehow: making the Latin letters half-width doesn't by itself make the entire font monospace.
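A quick calculation shows why 920 units matters. The sketch assumes an application that positions glyphs by their advance widths (as editors and browsers do, unlike a terminal's fixed cell grid), a 1000-unit full width, and half-width Latin at 500 units. The 500-unit figure is an assumption, not a value from the README.

```python
from fractions import Fraction

EM = 1000              # full-width CJK ideograph advance in font units
HANGUL_ADVANCE = 920   # Hangul advance, per the Source Han README
HALF_WIDTH = EM // 2   # assumed half-width Latin advance

def hangul_drift_columns(n_syllables: int) -> Fraction:
    """How many half-width columns n Hangul syllables fall short of n ideographs."""
    shortfall_units = n_syllables * (EM - HANGUL_ADVANCE)
    return Fraction(shortfall_units, HALF_WIDTH)

for n in (1, 5, 10):
    print(n, float(hangul_drift_columns(n)))
```

After ten syllables a Korean line is already 1.6 half-width columns shorter than a Chinese or Japanese line with the same number of characters. No amount of Latin padding can line that up exactly.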

Problem 5: Latin is not half-width

Programmers like monospace fonts. CJK programmers like monospace fonts with half-width Latin characters. Unfortunately, many monospace fonts don't have them. A good example is WenQuanYi Micro Hei: both Latin and CJK characters are monospaced, but the Latin characters are not exactly half as wide as the CJK ones. So lines containing CJK characters won't always line up with other lines. This is bad news.
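The alignment rule that CJK-aware monospace layouts rely on is that a Wide or Fullwidth character (by the Unicode East Asian Width property) takes two columns and everything else takes one. The Python sketch below is a simplified version of that rule, similar in spirit to wcwidth. It ignores combining marks and "Ambiguous" characters. It then compares line lengths in font units. The 600-unit Latin advance in the last line is a hypothetical value, not a measurement of WenQuanYi Micro Hei.

```python
import unicodedata

def is_wide(ch: str) -> bool:
    return unicodedata.east_asian_width(ch) in ("W", "F")

def display_columns(text: str) -> int:
    """Columns in a CJK-aware monospace layout: Wide/Fullwidth = 2, else 1 (simplified)."""
    return sum(2 if is_wide(ch) else 1 for ch in text)

def line_advance(text: str, latin_advance: int, cjk_advance: int) -> int:
    """Total advance in font units if every glyph uses one of two widths."""
    return sum(cjk_advance if is_wide(ch) else latin_advance for ch in text)

lines = ["abcd", "中文", "ab한"]  # all three are 4 columns wide
print([display_columns(s) for s in lines])
# Strict half-width Latin (500 vs 1000): every line ends at the same x.
print([line_advance(s, 500, 1000) for s in lines])
# Hypothetical Latin at 600 units: same column count, different line lengths.
print([line_advance(s, 600, 1000) for s in lines])
```

Column counting only matches what is drawn when the Latin advance is exactly half the CJK advance. That is the property the next paragraph asks for.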

Fortunately, there's an easy fix: override the Latin characters with another font whose Latin glyphs are strictly half-width. I'd definitely recommend Inconsolata. Print all printable Latin-1 characters with it and you will understand why it's awesome. Thank god we only need ONE such font.

Problem 6: Font bug

WenQuanYi Micro Hei is a great font, but it once had a bug that made it unusable for Korean. Because of a wrong advance width, Korean letters were drawn on top of each other and were not recognizable at all. This bug was reported here and here. However, while some Linux distros patched it in time, others are still leaving this issue open as of today. Sigh.
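Why a wrong advance width makes glyphs stack: each glyph is drawn at the running sum of the advances before it. The exact wrong value in that bug isn't stated here, so this Python sketch assumes the extreme case of a zero advance to illustrate the mechanism.

```python
from itertools import accumulate
from typing import List

def pen_positions(advances: List[int]) -> List[int]:
    """x offset of each glyph: the running sum of the advances before it."""
    return list(accumulate(advances, initial=0))[:-1]

print(pen_positions([1000, 1000, 1000]))  # healthy advances: glyphs sit side by side
print(pen_positions([0, 0, 0]))           # zero advances (assumed): glyphs pile up at x = 0
```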

Solution

All these problems show how hard it is for CJK users to get their fonts displayed correctly. Here is my solution.

CJK users need 6 font classes:

  • serif+Latin.
  • serif+CJK.
  • sans-serif+Latin.
  • sans-serif+CJK.
  • monospace+Latin.
  • monospace+CJK.

The idea is to use good fonts, and only good fonts. Fortunately, 3 font families can satisfy all the needs:

  • Droid:

    Droid has both serif and sans-serif variants, so it satisfies both serif+Latin and sans-serif+Latin.

    Additionally, Droid Sans Fallback has pan-CJK coverage, which makes it good for sans-serif+CJK and monospace+CJK.

  • Source Han Serif:

    Source Han Serif is part of the Source Han family and also has pan-CJK coverage, which makes it good for serif+CJK. Like Source Han Sans, it has the narrower-Korean problem. But we only use it as a serif font, so who cares about width?

  • Inconsolata:

    Inconsolata satisfies monospace+Latin. I tried about 100 other fonts and found no better choice: its Latin glyphs are perfectly half-width, and it works nicely with Droid Sans Fallback.

Because font packaging is distro-specific, the exact steps differ between distros.

Arch Linux

  1. Install these fonts:

    pacman -S adobe-source-han-serif-otc-fonts ttf-droid ttf-inconsolata
    
  2. Put this in /etc/fonts/local.conf.

    <?xml version="1.0"?>
    <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
    <fontconfig>
        <alias>
            <family>serif</family>
            <prefer>
                <family>Droid Serif</family>
                <family>Source Han Serif SC</family>
            </prefer>
        </alias>
        <alias>
            <family>sans-serif</family>
            <prefer>
                <family>Droid Sans</family>
                <family>Droid Sans Fallback</family>
            </prefer>
        </alias>
        <alias>
            <family>monospace</family>
            <prefer>
                <family>Inconsolata</family>
                <family>Droid Sans Fallback</family>
            </prefer>
        </alias>
    </fontconfig>
    
  3. Rebuild font information cache:

    fc-cache -vf
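If an alias does not seem to take effect, first check that local.conf is well-formed and lists the families in the order you intended. Then run `fc-match monospace` (and likewise for serif and sans-serif) to see what fontconfig actually picks. The Python sketch below uses the standard `xml.etree.ElementTree` module for the first part. The helper name `preferred_families` is hypothetical. To check your real file, pass it the contents of /etc/fonts/local.conf instead of the trimmed sample.

```python
import xml.etree.ElementTree as ET

def preferred_families(conf_xml: str) -> dict:
    """Map each <alias> generic family to its <prefer> list, in priority order."""
    root = ET.fromstring(conf_xml)  # raises ET.ParseError if the XML is malformed
    result = {}
    for alias in root.iter("alias"):
        generic = alias.findtext("family", default="").strip()
        prefer = alias.find("prefer")
        if prefer is not None:
            result[generic] = [f.text.strip() for f in prefer.findall("family") if f.text]
    return result

# Trimmed sample of the local.conf above.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
    <alias>
        <family>monospace</family>
        <prefer>
            <family>Inconsolata</family>
            <family>Droid Sans Fallback</family>
        </prefer>
    </alias>
</fontconfig>
"""

for generic, families in preferred_families(SAMPLE).items():
    print(f"{generic}: {' > '.join(families)}")
```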
    

Fedora

On Fedora, I recommend Noto fonts over Source Han fonts. The two are essentially the same fonts. However, Noto fonts are packaged as OTCs, while Source Han fonts only come as region-specific subset OTF packages, as the file lists show:

# dnf repoquery -l google-noto-serif-cjk-ttc-fonts
Last metadata expiration check: ...
/etc/fonts/conf.d/65-0-google-noto-serif-cjk-ttc.conf
/usr/share/fontconfig/conf.avail/65-0-google-noto-serif-cjk-ttc.conf
/usr/share/fonts/google-noto-cjk
/usr/share/fonts/google-noto-cjk/.uuid
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-Black.ttc
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-Bold.ttc
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-ExtraLight.ttc
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-Light.ttc
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-Medium.ttc
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-Regular.ttc
/usr/share/fonts/google-noto-cjk/NotoSerifCJK-SemiBold.ttc

# dnf repoquery -l adobe-source-han-serif-cn-fonts
Last metadata expiration check: ...
/etc/fonts/conf.d/65-2-adobe-source-han-serif-cn.conf
/usr/share/fontconfig/conf.avail/65-2-adobe-source-han-serif-cn.conf
/usr/share/fonts/adobe-source-han-serif-cn
/usr/share/fonts/adobe-source-han-serif-cn/.uuid
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-Bold.otf
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-ExtraLight.otf
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-Heavy.otf
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-Light.otf
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-Medium.otf
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-Regular.otf
/usr/share/fonts/adobe-source-han-serif-cn/SourceHanSerifCN-SemiBold.otf
/usr/share/licenses/adobe-source-han-serif-cn-fonts
/usr/share/licenses/adobe-source-han-serif-cn-fonts/LICENSE.txt

You can visit its official repo for explanations about OTCs and region-specific subset OTFs: https://github.com/adobe-fonts/source-han-serif/

The OTCs have full character coverage, while each OTF only includes glyphs for one region's characters. This does make a difference: when we set the CSS property font-family: Source Han Serif CN; on an HTML element with mixed CJK content, the Japanese and Korean parts may not display properly. Likewise, with a Japanese-only (or Korean-only) subset, the content in the other two languages may not display properly. We can fix this by using the OTC fonts: font-family: Noto Serif CJK SC; always works.
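You can tell a collection from a single font by its first bytes. Per the OpenType specification, a collection starts with the tag `ttcf` followed by a big-endian header: uint16 major version, uint16 minor version, uint32 number of fonts. A single CFF-based font starts with `OTTO` instead. This Python sketch reads that header. The helper names are hypothetical.

```python
import struct

def font_count(header: bytes) -> int:
    """Number of fonts described by the first 12 bytes of a font file."""
    tag = header[:4]
    if tag == b"ttcf":
        # Collection header: majorVersion, minorVersion (uint16), numFonts (uint32), big-endian
        _major, _minor, num_fonts = struct.unpack(">HHI", header[4:12])
        return num_fonts
    if tag in (b"OTTO", b"\x00\x01\x00\x00", b"true"):
        return 1  # a single CFF-based (OTTO) or TrueType-based font
    raise ValueError(f"not an OpenType/TrueType file (tag {tag!r})")

def font_count_in_file(path: str) -> int:
    with open(path, "rb") as fh:
        return font_count(fh.read(12))

# Synthetic headers, so the example runs without any fonts installed.
print(font_count(b"ttcf" + struct.pack(">HHI", 2, 0, 10)))
print(font_count(b"OTTO" + bytes(8)))
```

On Fedora, `font_count_in_file("/usr/share/fonts/google-noto-cjk/NotoSerifCJK-Regular.ttc")` should report several fonts in one file. Each SourceHanSerifCN-*.otf should report just one.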

You can open this test HTML page in your own browser (after installing both google-noto-serif-cjk-ttc-fonts and adobe-source-han-serif-cn-fonts):

<meta charset="utf-8">
<pre style="float:left; font-family: Source Han Serif CN">
いろはにほへとちりぬ
일이삼사오육칠팔구십
一二三四五六七八九十
Source Han Serif CN
(Korean is broken)
</pre>
<pre style="float:left; font-family: Noto Serif CJK SC">
いろはにほへとちりぬ
일이삼사오육칠팔구십
一二三四五六七八九十
Noto Serif CJK SC
(all is well)
</pre>

With this in mind, we can follow steps similar to those for Arch Linux:

  1. Install these fonts:

    dnf install \
        google-noto-serif-cjk-ttc-fonts \
        google-noto-sans-cjk-ttc-fonts \
        google-droid-serif-fonts \
        google-droid-sans-fonts \
        google-droid-sans-mono-fonts \
        levien-inconsolata-fonts
    

    We have installed more than necessary, but all are good fonts anyway.

  2. Put this in /etc/fonts/local.conf.

    <?xml version="1.0"?>
    <!DOCTYPE fontconfig SYSTEM "fonts.dtd">
    <fontconfig>
        <alias>
            <family>serif</family>
            <prefer>
                <family>Droid Serif</family>
                <family>Noto Serif CJK SC</family>
            </prefer>
        </alias>
        <alias>
            <family>sans-serif</family>
            <prefer>
                <family>Droid Sans</family>
                <family>Droid Sans Fallback</family>
            </prefer>
        </alias>
        <alias>
            <family>monospace</family>
            <prefer>
                <family>Inconsolata</family>
                <family>Droid Sans Fallback</family>
            </prefer>
        </alias>
    </fontconfig>
    
  3. Rebuild font information cache:

    fc-cache -vf
    

Terminal fonts

While the solutions above work well in GUI applications, terminals may behave differently. For example, WenQuanYi Micro Hei Mono looks much nicer than Inconsolata in my terminal, so I define another font family alias, terminal, like this:

    <alias>
        <family>terminal</family>
        <prefer>
            <family>WenQuanYi Micro Hei Mono</family>
            <family>Noto Sans Mono CJK SC</family>
            <family>Noto Sans Mono CJK TC</family>
            <family>Noto Sans Mono CJK JP</family>
            <family>Noto Sans Mono CJK KR</family>
        </prefer>
    </alias>

Add this inside the <fontconfig> element of /etc/fonts/local.conf, then rebuild the font information cache and set your terminal program to use the font terminal. Now GUI and CLI use different fonts, both with proper fallbacks.

Screenshot

To prove this method works, I took a screenshot displaying all printable Latin-1 characters and some CJK characters:

!"#$%&'()*+,-./01234
56789:;<=>?@ABCDEFGH
IJKLMNOPQRSTUVWXYZ[\
]^_`abcdefghijklmnop
qrstuvwxyz{|}~¡¢£¤¥¦
§¨©ª«¬­®¯°±²³´µ¶·¸¹º
»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎ
ÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâ
ãäåæçèéêëìíîïðñòóôõö
÷øùúûüýþÿ
いろはにほへとちりぬ
일이삼사오육칠팔구십
一二三四五六七八九十
壹贰叁肆伍陆柒捌玖拾
壹貳叄肆伍陸柒捌玖拾
日本东京中国收入经济
日本東京中國收入經濟
日本東京中国収入経済
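You can regenerate the Latin-1 part of this test text for your own screenshot with a few lines of Python. The character selection is inferred from the text above: U+0021 to U+007E plus U+00A1 to U+00FF, which drops the space, the control characters and the no-break space but keeps the soft hyphen. The text is wrapped at 20 characters per line. The function names are illustrative.

```python
from typing import List

def printable_latin1() -> str:
    """Printable Latin-1: U+0021..U+007E and U+00A1..U+00FF."""
    return "".join(chr(cp) for cp in [*range(0x21, 0x7F), *range(0xA1, 0x100)])

def chunk(text: str, width: int = 20) -> List[str]:
    """Split text into lines of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]

print("\n".join(chunk(printable_latin1())))
```

Append the CJK lines above to the output and open it in both a GUI editor and a terminal to compare the two font setups.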

cjk-fonts.jpg

References