documented latex sources

when we download a latex package from ctan (e.g. xcolor), we often find that it contains no .sty file, only .dtx and .ins files; if we follow the instructions in its readme, we can generate the .sty files from the .dtx and .ins files; this is irrelevant for users who install latex packages via tds archives or os package managers; but latex packages shipped with the os may be outdated, and not every package offers a tds archive (e.g. xcolor), so in those cases we need some knowledge about .dtx and .ins files;

before reading the sections below, i suggest downloading the source of xcolor to experiment with; its readme is a very nice guide for trying things out;

what is dtx

dtx stands for documented latex source; since it is latex source, we can run it through the latex program (or one of its variants, such as pdflatex):

pdflatex xcolor.dtx

this gives us xcolor.pdf, which is the documentation of the xcolor package;

so, processing a .dtx file with latex gives a document; reasonable, but then how does it differ from a regular .tex file?

why use dtx

a package needs code, code documentation and user documentation; dtx makes it possible to combine all three in one file, with additional benefits:

the documentation can be formatted with latex commands;
the code shown in the documentation never needs to be updated separately;
the code can be extracted from the combined file;

writing typeset code documentation mixed with code itself is commonly known as literate programming;

a simple dtx

we can either write a .dtx file from scratch, or convert an existing .sty file; i think the latter is more illustrative, so we now construct a simple example.dtx file from the following example.sty file:

\NeedsTeXFormat{LaTeX2e}[1994/06/01]
\ProvidesPackage{example}[2020/02/02 Example package]
\RequirePackage{lmodern}
\newcommand{\myname}{foo}
\DeclareOption{bar}{
  \renewcommand{\myname}{bar}
}
\ProcessOptions\relax
\newcommand{\showname}{\myname}
\endinput

while there are tools to do this job, here we handcraft the file to keep it as simple as possible; the crafted example.dtx is pasted below; don't be afraid of its size and complexity, since we will analyze it shortly:

% \iffalse meta-comment
% Copyright (C) 2020 Author
% \fi
% \iffalse
%<*driver>
\ProvidesFile{example.dtx}
\documentclass{ltxdoc}
\usepackage{example}
\EnableCrossrefs
\CodelineIndex
\RecordChanges
\begin{document}
  \DocInput{example.dtx}
\end{document}
%</driver>
% \fi
% \title{The \textsf{example} package}
% \author{Author}
% \date{}
%
% \maketitle
%
% \section{User Documentation}
%
% \StopEventually{
%   \PrintChanges
%   \PrintIndex
% }
%
% \section{Code Documentation}
%
% \iffalse
%<*package>
% \fi
%    \begin{macrocode}
\NeedsTeXFormat{LaTeX2e}[1994/06/01]
\ProvidesPackage{example}[2020/02/02 Example package]
\RequirePackage{lmodern}
\newcommand{\myname}{foo}
\DeclareOption{bar}{
  \renewcommand{\myname}{bar}
}
\ProcessOptions\relax
\newcommand{\showname}{\myname}
\endinput
%    \end{macrocode}
% \iffalse
%</package>
% \fi
%
% \Finale
\endinput

this is the accompanying example.ins:

%% Copyright (C) 2020 Author

\input docstrip.tex
\keepsilent
\askforoverwritefalse

\preamble
Copyright (C) 2020 Author
\endpreamble

\usedir{tex/latex/example}

\generate{\file{example.sty}{\from{example.dtx}{package}}}

\obeyspaces
\Msg{*************************************************************}
\Msg{* Done!                                                     *}
\Msg{*************************************************************}

\endbatchfile

now we can play with these files:

to extract example.sty from example.dtx:
```
pdflatex example.ins
```
it should give an example.sty identical to the original except for comments and empty lines;

to generate the package documentation example.pdf:
```
pdflatex example.dtx
pdflatex example.dtx
makeindex -s gind.ist example.idx
pdflatex example.dtx
pdflatex example.dtx
```
it needs multiple commands because of the index; if you don't want the index, just run the first command;
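
to check that the extracted package actually loads, a minimal test document can be used; this is only a sketch: the file name test.tex is my own choice, and it assumes the generated example.sty sits in the same directory (or is installed where latex can find it) and that lmodern is available:

```latex
% test.tex (hypothetical file name, not part of the example above)
\documentclass{article}
% load the extracted package; the bar option redefines \myname
\usepackage[bar]{example}
\begin{document}
% \showname expands to \myname
\showname
\end{document}
```

loading the package without the bar option leaves \myname at its default definition;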

tools behind dtx

there are 2 important tools behind dtx:

docstrip: strips comments and extracts code blocks from dtx;
doc (and ltxdoc): formats user documentation and code documentation;

the complexity of dtx mainly comes from the need to be readable by both tools, while remaining valid latex source at the same time; by the way, if you are really interested, docstrip and doc were written by the same person, so there is little surprise that they work together despite the intricacies;

the dtx trinity

now we delve further into dtx grammar; when analyzing dtx, it is very helpful to understand the dtx trinity:

the dtx is one file, but three views;

the first view

the first view of example.dtx:

\NeedsTeXFormat{LaTeX2e}[1994/06/01]
\ProvidesPackage{example}[2020/02/02 Example package]
\RequirePackage{lmodern}
\newcommand{\myname}{foo}
\DeclareOption{bar}{
  \renewcommand{\myname}{bar}
}
\ProcessOptions\relax
\newcommand{\showname}{\myname}
\endinput

the first view is about docstrip and .ins file;

first of all, we use an .ins file to extract code from a .dtx file; looking at the code, it is obvious that example.ins runs docstrip on example.dtx;

then, it is noticeable that example.dtx contains both commented and uncommented lines; when we run example.ins through latex, the commented lines in example.dtx are removed by docstrip;

but there are some strange comments: <*package> and </package>, and similar ones for driver; these are markers that mark a code block; docstrip understands these markers; in fact, when we ask docstrip to extract code from example.dtx into example.sty using marker package:

\generate{\file{example.sty}{\from{example.dtx}{package}}}

it extracts all uncommented lines between <*package> and </package> (and all uncommented lines outside of any markers) in example.dtx and writes them into example.sty (multiple empty lines are merged into one); actually, docstrip also recognizes single-line markers like <package>, but those are omitted here for brevity; additionally, docstrip stops reading if it hits \endinput;
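
as a rough sketch of how guards look in practice, the fragment below mixes block markers with single-line markers; the option names package and driver are the ones from example.dtx, while the \typeout lines are invented purely for illustration:

```latex
% docstrip removes this line, since it starts with a single %
%% docstrip copies this line, since it starts with a double %
%<*package>
\newcommand{\myname}{foo}
%</package>
%<package>\newcommand{\showname}{\myname}
%<!package>\typeout{kept only when package is not requested}
%<package|driver>\typeout{kept when package or driver is requested}
```

for a single-line marker, docstrip writes the rest of the line (without the marker) when the condition holds, and drops the line otherwise;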

the second view

the second view of example.dtx:

\ProvidesFile{example.dtx}
\documentclass{ltxdoc}
\usepackage{example}
\EnableCrossrefs
\CodelineIndex
\RecordChanges
\begin{document}
  \DocInput{example.dtx}
\end{document}

the second view is about latex source;

as we have said, a .dtx file is a valid latex source; this means, from a latex point of view, only uncommented lines matter (not even docstrip markers); looking at the code, the uncommented lines include a small piece of driver code followed by package source; however, the driver code ends with a line:

\end{document}

this means all lines thereafter are ignored;

so, from a latex point of view, the .dtx file is merely driver code;

but how is the body of the generated .pdf file filled with contents?

the third view

the third view of example.dtx:

\title{The \textsf{example} package}
\author{Author}
\date{}

\maketitle

\section{User Documentation}

\StopEventually{
  \PrintChanges
  \PrintIndex
}

\section{Code Documentation}

\begin{macrocode}
\NeedsTeXFormat{LaTeX2e}[1994/06/01]
\ProvidesPackage{example}[2020/02/02 Example package]
\RequirePackage{lmodern}
\newcommand{\myname}{foo}
\DeclareOption{bar}{
  \renewcommand{\myname}{bar}
}
\ProcessOptions\relax
\newcommand{\showname}{\myname}
\endinput
\end{macrocode}

\Finale
\endinput

the third view is about doc;

the body of the driver code includes a single line:

\DocInput{example.dtx}

here the entire example.dtx file is read again and typeset by doc; the magic (and source of confusion) of the typesetting here is:

commented lines (lines beginning with %) are treated as uncommented;

this explains why we need so many \iffalse guards: contents between \iffalse and \fi are skipped because of the \iffalse, not because of the leading %; if we ignore the contents between these guards, we see that what gets typeset into the body is text plus some macrocode environments; the macrocode environment typesets code verbatim; for it to work correctly, there must be exactly 4 spaces between the leading % and \end{macrocode} in the .dtx file, and we usually do the same for \begin{macrocode};

from sty to dtx

now that we understand what dtx is and how to view a .dtx file, we come to the next question:

how to convert an existing .sty file into a .dtx file?

first of all, there is good news: the accompanying .ins file does not have to change (except for metadata such as the package name); so we can reuse the above example.ins file for another package;
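
for instance, adapting example.ins for another package could look like the sketch below; the package name mypkg is hypothetical, and only the file names, the install directory and the copyright text differ from the original:

```latex
%% mypkg.ins: example.ins adapted for a hypothetical package named mypkg
\input docstrip.tex
\keepsilent
\askforoverwritefalse

\preamble
Copyright (C) 2020 Author
\endpreamble

% install directory inside the tds tree
\usedir{tex/latex/mypkg}

% extract the lines guarded by package from mypkg.dtx into mypkg.sty
\generate{\file{mypkg.sty}{\from{mypkg.dtx}{package}}}

\endbatchfile
```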

in fact, we can take the above example.dtx file as well; what we need to do here is to replace contents between \begin{macrocode} and \end{macrocode} with contents of the new .sty file;

this should work except when the new .sty file has commented lines (lines beginning with %); if so, these lines are typeset verbatim as code, because they are within the macrocode environment; this is usually not what we want;

to fix this problem, 2 things need to be done:

we need to end macrocode before the commented lines and begin another macrocode after them;
we need to change % to % ^^A if we want to keep comments as comments; % is ignored by doc, and ^^A is provided as a comment character instead;
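
a small sketch of the result, assuming the new .sty file had one comment line above each of the two definitions shown; the comment wording is invented for illustration; text after the leading % becomes typeset documentation, while anything after ^^A stays a real comment:

```latex
% the name shown by default ^^A this remark is not typeset
%    \begin{macrocode}
\newcommand{\myname}{foo}
%    \end{macrocode}
% the \texttt{bar} option changes the name
%    \begin{macrocode}
\DeclareOption{bar}{
  \renewcommand{\myname}{bar}
}
%    \end{macrocode}
```

each macrocode chunk keeps the 4 spaces between the leading % and \end{macrocode}, and docstrip still extracts only the uncommented code lines;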

tools

there are some useful tools to make working with dtx easier:

dtxgen: this tool creates a new .dtx file from a builtin template; use it if you want to start writing a new .dtx file;
sty2dtx: this tool converts a .sty file into a .dtx file; it can track macro definitions and put them in special environments in the generated .dtx file; it can also generate a .ins file; empty lines are removed, which could be a source of bugs, so you probably want to double check that the .sty file extracted from the generated .dtx file matches the original; other than this, it is a great tool if you need to create a .dtx file from an existing .sty file;

summary

this is really dtx in a nutshell; we analyzed its basic structure but didn't talk much about specific commands; on the one hand, authors can do much more with user documentation and code documentation, using the various commands provided by the doc package and the ltxdoc class; on the other hand, docstrip has commands to control code extraction which weren't covered here; interested readers should consult the documentation of these packages and the references linked below; latex3 users may want to check l3doc;