latex: highlighted escape from verbatim
verbatim environments are very useful for typesetting code; in latex there are several verbatim environments we can choose from:
verbatim
this is the basic verbatim environment: plain, nothing fancy, no surprises;
\begin{verbatim}
public class HelloWorld {
public static void main(String[] args) {
System.out.println("abc.rmfamily.xyz");
System.out.println("abc.sffamily.xyz");
System.out.println("abc.ttfamily.xyz");
System.out.println("abc.mdseries.xyz");
System.out.println("abc.bfseries.xyz");
System.out.println("abc.upshape.xyz");
System.out.println("abc.itshape.xyz");
System.out.println("abc.slshape.xyz");
System.out.println("abc.scshape.xyz");
System.out.println("abc.large.xyz");
System.out.println("abc.small.xyz");
System.out.println("abc.tiny.xyz");
System.out.println("abc.日本語.xyz");
System.out.println("abc.اللغة العربية.xyz");
System.out.println("abc.cyan.xyz");
System.out.println("abc.magenta.xyz");
System.out.println("abc.yellow.xyz");
System.out.println("abc.black.xyz");
}
}
\end{verbatim}
Verbatim
this is an enhanced verbatim environment from the package fancyvrb; it provides
many more features than the basic verbatim: gobble, line numbering,
commandchars, etc.; among these, commandchars is an important feature that
allows escaping from the verbatim environment; we can use it to embed raw latex
code in the verbatim text and format it, even though this is a verbatim
environment;
\begin{Verbatim}[commandchars=\\\{\}]
public class HelloWorld \{
public static void main(String[] args) \{
System.out.println("abc.{\rmfamily{rmfamily}}.xyz");
System.out.println("abc.{\sffamily{sffamily}}.xyz");
System.out.println("abc.{\ttfamily{ttfamily}}.xyz");
System.out.println("abc.{\mdseries{mdseries}}.xyz");
System.out.println("abc.{\bfseries{bfseries}}.xyz");
System.out.println("abc.{\upshape{upshape}}.xyz");
System.out.println("abc.{\itshape{itshape}}.xyz");
System.out.println("abc.{\slshape{slshape}}.xyz");
System.out.println("abc.{\scshape{scshape}}.xyz");
System.out.println("abc.{\large{large}}.xyz");
System.out.println("abc.{\small{small}}.xyz");
System.out.println("abc.{\tiny{tiny}}.xyz");
System.out.println("abc.{\japanesefont{日本語}}.xyz");
System.out.println("abc.\RL{\arabicfont{اللغة العربية}}.xyz");
System.out.println("abc.{\color{cyan}{cyan}}.xyz");
System.out.println("abc.{\color{magenta}{magenta}}.xyz");
System.out.println("abc.{\color{yellow}{yellow}}.xyz");
System.out.println("abc.{\color{black}{black}}.xyz");
\}
\}
\end{Verbatim}
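as a smaller illustration of commandchars, here is a minimal sketch of a complete document; the java lines are made up for this example, and it assumes only the fancyvrb and xcolor packages; the backslash starts a command and the braces delimit groups, so literal braces have to be written as \{ and \}, as in the listing above;

```latex
\documentclass{article}
\usepackage{fancyvrb} % provides the Verbatim environment
\usepackage{xcolor}   % provides \color
\begin{document}
% commandchars=\\\{\} makes \ { } active again inside the environment
\begin{Verbatim}[commandchars=\\\{\}]
int \textit{x} = 1;            // x is typeset in italic
int {\color{magenta}y} = 2;    // y is typeset in magenta
String[] a = \{"p", "q"\};     // literal braces are written as escaped braces
\end{Verbatim}
\end{document}
```

everything else in the environment, including the java comments, is still printed verbatim;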
minted
what is missing in the above verbatim environments is code highlighting, which
is essential in code typesetting; while other packages offer such
functionality, i recommend minted, which is based on pygments; you know
pygments, right? yes, it is the same pygments; by using minted (and so
pygments) to highlight our code, we can keep the same color scheme wherever
pygments is used, and pygments is ubiquitous;
\begin{minted}{java}
public class HelloWorld {
public static void main(String[] args) {
System.out.println("abc.rmfamily.xyz");
System.out.println("abc.sffamily.xyz");
System.out.println("abc.ttfamily.xyz");
System.out.println("abc.mdseries.xyz");
System.out.println("abc.bfseries.xyz");
System.out.println("abc.upshape.xyz");
System.out.println("abc.itshape.xyz");
System.out.println("abc.slshape.xyz");
System.out.println("abc.scshape.xyz");
System.out.println("abc.large.xyz");
System.out.println("abc.small.xyz");
System.out.println("abc.tiny.xyz");
System.out.println("abc.日本語.xyz");
System.out.println("abc.اللغة العربية.xyz");
System.out.println("abc.cyan.xyz");
System.out.println("abc.magenta.xyz");
System.out.println("abc.yellow.xyz");
System.out.println("abc.black.xyz");
}
}
\end{minted}
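for reference, a minimal sketch of a complete document around such a block; the style name default is only an example, any style known to the installed pygments should work in its place; minted calls pygmentize while compiling, so the compiler has to be run with -shell-escape;

```latex
\documentclass{article}
\usepackage{minted}
% pick a pygments style by name; "default" is just an example choice
\usemintedstyle{default}
\begin{document}
\begin{minted}{java}
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("hello");
    }
}
\end{minted}
\end{document}
```

choosing the same style name here and in other pygments-based tools is what keeps the colors consistent;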
escapeinside
the above code is highlighted, but not displayed at its best; notably, when several natural languages appear in the same verbatim environment, there is no good way to pick the proper font for each of them; we need some kind of language markup;
we can use the option escapeinside to do the markup; this is a pygments option
supported by minted; it takes two characters set by the user; content between
the two characters is treated as an escape and embedded as is in the latex
output; we might try it like this:
\begin{minted}[escapeinside=||]{java}
public class HelloWorld {
public static void main(String[] args) {
System.out.println("abc.|{\rmfamily{|rmfamily|}}|.xyz");
System.out.println("abc.|{\sffamily{|sffamily|}}|.xyz");
System.out.println("abc.|{\ttfamily{|ttfamily|}}|.xyz");
System.out.println("abc.|{\mdseries{|mdseries|}}|.xyz");
System.out.println("abc.|{\bfseries{|bfseries|}}|.xyz");
System.out.println("abc.|{\upshape{|upshape|}}|.xyz");
System.out.println("abc.|{\itshape{|itshape|}}|.xyz");
System.out.println("abc.|{\slshape{|slshape|}}|.xyz");
System.out.println("abc.|{\scshape{|scshape|}}|.xyz");
System.out.println("abc.|{\large{|large|}}|.xyz");
System.out.println("abc.|{\small{|small|}}|.xyz");
System.out.println("abc.|{\tiny{|tiny|}}|.xyz");
System.out.println("abc.|{\japanesefont{|日本語|}}|.xyz");
System.out.println("abc.|\RL{\arabicfont{|اللغة العربية|}}|.xyz");
System.out.println("abc.|{\color{cyan}{cyan}}|.xyz");
System.out.println("abc.|{\color{magenta}{magenta}}|.xyz");
System.out.println("abc.|{\color{yellow}{yellow}}|.xyz");
System.out.println("abc.|{\color{black}{black}}|.xyz");
}
}
\end{minted}
but the result is not as expected: the escapes are not parsed as escapes, but as
verbatim text; the reason is simple: pygments does not allow escapes in strings;
this is mentioned in the minted doc, and a related issue explains the mechanism;
in short, pygments first scans the text, looking for strings and comments; then it looks for escapes only in non-strings and non-comments; so escapes in strings do not work at all;
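the difference can be seen with a small sketch, assuming an unpatched pygments and minted; the two java lines are made up for this example; the same escape is processed in ordinary code but left alone inside a string literal;

```latex
\documentclass{article}
\usepackage{minted}
\begin{document}
\begin{minted}[escapeinside=||]{java}
int |\textit{count}| = 0;          // outside a string: the escape is run as latex
String s = "|\textit{count}|";     // inside a string: the escape is printed literally
\end{minted}
\end{document}
```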
patch
this is why i made a patch; the patch improves the latex formatter so that it
breaks a string into substrings and escapes for output, rather than dumping the
entire string; this is what lets us embed escapes in strings; the patch also
provides a new command line option escapeinstr, a bool controlling escapes in
strings;
we can test using this command:
pygmentize -l java -f latex -O escapeinside="||" -O escapeinstr
since the output is large, i omit it in this article;
to use this option in latex, we also patch minted.sty:
--- minted.sty.orig
+++ minted.sty
@@ -623,6 +623,7 @@
 \minted@def@optcl@switch{texcl}{-P texcomments}
 \minted@def@optcl@switch{texcomments}{-P texcomments}
 \minted@def@optcl@switch{mathescape}{-P mathescape}
+\minted@def@optcl@switch{escapeinstr}{-P escapeinstr}
 \minted@def@optfv@switch{linenos}
 \minted@def@opt{style}
 \minted@def@optfv{frame}
this patch is simple: just one line;
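with both patches applied, the new switch should be usable like the other boolean minted options; the following is only a sketch of such a document, not the actual test file; it assumes the patched pygments and the patched minted.sty, since an unpatched installation does not know escapeinstr;

```latex
% assumption: pygments and minted.sty are patched as described above
\documentclass{article}
\usepackage{xcolor}
\usepackage{minted}
\begin{document}
\begin{minted}[escapeinside=||,escapeinstr]{java}
System.out.println("abc.|{\itshape{|itshape|}}|.xyz");
System.out.println("abc.|{\color{cyan}{cyan}}|.xyz");
\end{minted}
\end{document}
```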
test
i made a test file a.tex, and compiled it with:
xelatex -shell-escape a.tex
note: to make sure the compile is fresh, we had better delete the .aux and
_minted-* cache files before compiling;
the output file is a.pdf;
macro trick
while the result looks good, it may not be obvious why it works; in fact what happens behind the scenes is somewhat tricky;
let us take a line from the test input, say:
System.out.println("abc.|{\bfseries{|bfseries|}}|.xyz");
whose essential part is:
|{\bfseries{|bfseries|}}|
at first glance, we may be expecting a process like this:
|{\bfseries{|bfseries|}}|
{\bfseries{ bfseries }}
but the actual process is like this (using spaces for illustration):
| {\bfseries{ | bfseries | }} |
\PYG{esc}{ {\bfseries{ }\PYG{l+s}{ bfseries }\PYG{esc}{ }} }
{\bfseries{ } bfseries }
the comparison:
{\bfseries{bfseries}} %% expected
{\bfseries{}bfseries} %% actual
this is a macro trick; the braces in the escaped latex code have actually
disrupted the PYG stylers, but the actual result happens to be well formed and
displays the same as expected; also note that, in the actual result, the second
bfseries is a string, so special chars in it are handled automatically by
pygments; this means that even if we write xxxx instead of bfseries, and even
if there are special chars in xxxx, the result still displays as expected;
therefore this trick is robust and makes a “known path” of doing things;
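the reason both forms display the same is that \bfseries is a declaration, not a command with an argument: it stays in effect until the enclosing group ends, and the braces right after it only open an inner group; a minimal sketch to check this, with the PYG stylers left out for simplicity:

```latex
\documentclass{article}
\begin{document}
% expected form: the text sits in the inner group
abc.{\bfseries{bfseries}}.xyz

% actual form: the inner group is empty, but the text is still inside
% the outer group where \bfseries is in effect
abc.{\bfseries{}bfseries}.xyz
\end{document}
```

both lines print the middle word in bold and the rest in normal weight;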
the coloring code is written in a different way, because it needs to override the pygments highlight color, so the colored text is placed entirely inside the escape; this process is straightforward, without tricks:
| {\color{cyan}{cyan}} |
\PYG{esc}{ {\color{cyan}{cyan}} }
{\color{cyan}{cyan}}
this is the same as what we would expect;
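to see why the color case cannot reuse the font trick, here is a small sketch; \stringstyle is a hypothetical stand-in for the pygments string styler (the real PYG macro is more involved), and red is an arbitrary choice for its color;

```latex
\documentclass{article}
\usepackage{xcolor}
% hypothetical stand-in for the string styler: it sets its own color
\newcommand{\stringstyle}[1]{\textcolor{red}{#1}}
\begin{document}
% font-trick form: the styler is applied last, so the text comes out red
{\color{cyan}{}\stringstyle{cyan}}

% color form: the text is part of the escape, no styler is involved,
% so the text comes out cyan
{\color{cyan}{cyan}}
\end{document}
```

the price of the color form is that the text inside the escape is raw latex, so special chars in it are no longer handled by pygments;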