Letter case conversion in LaTeX: a hands-on guide

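Key points at a glance

Goals
- Implement the common letter case conversions, for use both in ordinary string processing and in defining biblatex bibliography styles
- All uppercase / all lowercase
- Sentence case (capitalize the first letter of the sentence and leave the rest unchanged, or capitalize the first letter and lowercase the rest)
- Word-initial capitals (capitalize the first letter of every word in the sentence and lowercase the rest)
Environment
- TeX distribution: TeX Live 2017
- Word-initial capitalization methods: (a) commands from mfirstuc; (b) custom macros built on biblatex commands
Caveat:
- In biblatex, the case-changing command must be applied at the correct level of field formatting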

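The most direct reason for looking into case conversion is that some fields have case requirements. I ran into these while customizing a biblatex bibliography style. These requirements could be met when the data are entered in the BibTeX source file. For a style author, however, it is well worth simplifying the user's input through format definitions. The LaTeX kernel does provide case conversion commands, but they convert the whole argument. Sentence-initial and word-initial capitalization are not covered and have to be implemented with macros, although some packages provide such macros. As far as I know, biblatex has long provided a sentence-case command, and the more recent mfirstuc package provides a command for capitalizing each word. While customizing the bibliography format, I first tried the mfirstuc command, but using it in a field format produced an error. I then wrote my own macro, which failed in the same way. After I consulted the biblatex author, it turned out to be a matter of how the commands were used: both the mfirstuc command and the custom macro work when used correctly.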

Below we summarize letter case conversion in LaTeX through hands-on examples:

Converting the whole text

LaTeX provides two basic case conversion commands: \MakeUppercase and \MakeLowercase convert their entire argument to uppercase and lowercase, respectively.

\documentclass{article}
\usepackage{etoolbox,xstring,mfirstuc,textcase}
\usepackage[backend=biber,style=numeric]{biblatex}%
\addbibresource[location=local]{egbib.bib}
\begin{document}
\MakeUppercase{Electronic resources : selection and $x \neq Y$}\par
\MakeLowercase{Electronic resources : selection and $x \neq Y$}\par
\MakeTextUppercase{Electronic resources : selection and $x \neq Y$}\par
\MakeTextLowercase{Electronic resources : selection and $x \neq Y$}
\end{document}

Result:

ELECTRONIC RESOURCES : SELECTION AND X ≠ Y
electronic resources : selection and x ≠ y
ELECTRONIC RESOURCES : SELECTION AND x ≠ Y
electronic resources : selection and x ≠ Y

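Note: the example also uses two alternative commands from the textcase package, \MakeTextUppercase and \MakeTextLowercase.
The difference is that these two textcase commands do not change the case of math in their argument.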
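textcase also provides \NoCaseChange for protecting parts of the argument that are not math. The following is a minimal sketch using only textcase; the sample text is illustrative:

```latex
\documentclass{article}
\usepackage{textcase}
\begin{document}
% Math is skipped by \MakeTextUppercase; \NoCaseChange also shields ordinary text
\MakeTextUppercase{see \NoCaseChange{pdfTeX} and $x \neq y$}\par
% For comparison: the kernel command converts pdfTeX as well
\MakeUppercase{see pdfTeX and $x \neq y$}
\end{document}
```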

In addition, LaTeX3 (expl3) provides case-changing functions, including titlecasing:

\text_lowercase:n {⟨tokens⟩}
\text_lowercase:nn {⟨language⟩} {⟨tokens⟩}
\text_uppercase:n {⟨tokens⟩}
\text_uppercase:nn {⟨language⟩} {⟨tokens⟩}
\text_titlecase:n {⟨tokens⟩}
\text_titlecase:nn {⟨language⟩} {⟨tokens⟩}
\text_titlecase_first:n {⟨tokens⟩}
\text_titlecase_first:nn {⟨language⟩} {⟨tokens⟩}
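These expl3 functions can be wrapped as document-level commands. The sketch below assumes a newer TeX Live than the 2017 release used in this article, roughly 2020-10 or later, where both the l3text functions and \NewDocumentCommand are part of the LaTeX kernel. The command names \TextUpper, \TextLower and \TextFirstUpper are hypothetical:

```latex
\documentclass{article}
% Assumes a LaTeX kernel from about 2020-10 or later, which includes
% the expl3 l3text functions and \NewDocumentCommand
\ExplSyntaxOn
% Hypothetical user-level wrappers around the expl3 case functions
\NewDocumentCommand \TextUpper { m } { \text_uppercase:n {#1} }
\NewDocumentCommand \TextLower { m } { \text_lowercase:n {#1} }
% Intended to change only the first character, leaving the rest as is
\NewDocumentCommand \TextFirstUpper { m } { \text_titlecase_first:n {#1} }
\ExplSyntaxOff
\begin{document}
\TextUpper{Electronic resources : selection}\par
\TextLower{Electronic resources : selection}\par
\TextFirstUpper{electronic resources : Selection}
\end{document}
```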

Capitalizing the first letter of a sentence

Both biblatex and mfirstuc provide commands for capitalizing the first letter of a sentence.

biblatex provides:

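\MakeCapital{text}: similar to \MakeUppercase, but converts only the first printable character of text to uppercase.
\MakeSentenceCase{text}: converts text to sentence case, i.e., capitalizes the first letter of the first word and converts the rest of the string to lowercase.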

mfirstuc provides:

\makefirstuc{text}: similar in function to \MakeCapital.

\documentclass{article}
\usepackage{etoolbox,xstring,mfirstuc,textcase}
\usepackage{amsmath,amsfonts,amssymb,amsthm}
\usepackage[backend=biber,style=numeric]{biblatex}%
\addbibresource[location=local]{egbib.bib}
\begin{document}
\MakeCapital{electronic resources : Selection and $x \neq Y$}\par
\MakeSentenceCase{electronic resources : Selection and $x = y$}\par% $x \neq Y$
\makefirstuc{electronic resources : Selection and $x \neq Y$}
\end{document}

Result:

Electronic resources : Selection and x ≠ Y
Electronic resources : selection and x = y
Electronic resources : Selection and x ≠ Y

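As shown, the first letter changes from lowercase to uppercase, and \MakeSentenceCase also converts the remaining uppercase letters to lowercase. Note that some math symbol commands cause errors inside the argument of \MakeSentenceCase (in this test, \neq, which is why that line uses $x = y$ instead).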
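Like BibTeX, biblatex's \MakeSentenceCase is designed to leave text enclosed in braces unchanged. This is the usual way to keep acronyms or proper nouns capitalized. The following is a minimal sketch; the sample title is illustrative:

```latex
\documentclass{article}
\usepackage[backend=biber,style=numeric]{biblatex}
\begin{document}
% Unbraced words after the first are lowercased
\MakeSentenceCase{An Introduction To TeX And NASA Missions}\par
% Braced groups are meant to be left as written, as in BibTeX's case handling
\MakeSentenceCase{An Introduction To {TeX} And {NASA} Missions}
\end{document}
```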

Capitalizing the first letter of each word

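There are two ways to capitalize the first letter of each word. One is the \capitalisewords command from mfirstuc, and the other is a custom command. Since \capitalisewords is straightforward, it is covered later; we first look at custom commands.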

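First, consider what the command must do. It must accept a sentence (a string), split it into words at the spaces, and capitalize the first letter of each word.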

Working from simple to complex, we start with the simplest case: a sentence of just two words separated by a space. To convert it, we can define a very simple command:

\documentclass{article}
\usepackage{etoolbox,xstring,mfirstuc,textcase}
\usepackage{amsmath,amsfonts,amssymb,amsthm}
\usepackage[backend=biber,style=numeric]{biblatex}%
\addbibresource[location=local]{egbib.bib}
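\begin{document}
\def\strtowds#1 #2{\MakeSentenceCase{#1} \MakeSentenceCase{#2}}
% note: #2 is undelimited, so it takes only the next token (the c of cn);
% the remaining n is typeset unchanged
%\strtowds{abc cn} %error!
%\strtowds{abc}{cn} %error!
\strtowds abc cn
\def\abc{abc cn}
\expandafter\strtowds\abc
%\expandafter\strtowds{abc cn} %error!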
\end{document}

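Here \strtowds splits a space-separated string and uses \MakeSentenceCase to capitalize the first letter of each part. Pay attention to the parameter delimiters and the input stream. When the words follow directly in the input stream, the correct usage is \strtowds abc cn. When the string is stored in a macro, \expandafter is needed to expand that macro into the input stream first. Strictly speaking, #2 is undelimited and therefore grabs only the next token (here the c of cn). The rest of the word is typeset unchanged, which happens to give the right result because only its first letter needs to change.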
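Because #2 of \strtowds is undelimited, it receives only one token. The following is a hedged sketch of a variant that also delimits #2, here with \relax, so that the whole second word is passed to \MakeSentenceCase. The name \twowords is hypothetical:

```latex
\documentclass{article}
\usepackage[backend=biber,style=numeric]{biblatex}
\begin{document}
% Hypothetical variant of \strtowds: #2 is delimited by \relax,
% so it receives the whole second word instead of a single token
\def\twowords#1 #2\relax{\MakeSentenceCase{#1} \MakeSentenceCase{#2}}
\twowords abc cn\relax\par
% A string stored in a macro: expand it first, then append the delimiter
\def\abc{abc cn}
\expandafter\twowords\abc\relax
\end{document}
```

The explicit \relax plays the same role that \par and $ play as end markers in the macros that follow.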

Designing a macro that parses the input stream recursively

Following this idea, we write a macro that first turns its argument into an input stream and then parses that stream:

\def\strtowdudf#1{%
\edef\coefone{#1}%
%\meaning\coefone
\expandafter\stowparse@udf\coefone%
}

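Here \strtowdudf turns its argument into an input stream, which \stowparse@udf then parses. This approach is natural, so what remains is to implement \stowparse@udf. Parsing is driven by the spaces in the string, so a space can serve as the parameter delimiter. The two parts on either side of the space are handled separately: the first part is case-converted and the second part is parsed further. This gives: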

\def\stowparse@udf#1 #2\par{\def\coefone{#1}\def\coeftwo{#2}%
%\par\meaning\coefone \meaning\coeftwo
\ifx\coeftwo\@empty%
\MakeSentenceCase{#1 }\par%
\else%
\MakeSentenceCase{#1 }%
\edef\coeftwos{#2\par}%
\expandafter\stowparse@udf\coeftwos%
%\@empty%\par
\fi%
}

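Testing shows that the macro works when \par is used as the delimiter after the second parameter. \stowparse@udf reads the input stream up to \par and parses it. If the stream contains a space, the part before the space is stored in \coefone and the part after it in \coeftwo. As long as \coeftwo is not empty, \stowparse@udf is applied to it again.

This \par-delimited design, however, depends on the space that TeX inserts at the end of a line, and the parsing has to remove \par from the input stream. As a result, \strtowdudf must be followed by a line break when it is used, otherwise it fails, which shows that the design is flawed. The root of the problem lies in \strtowdudf itself: it does not preprocess its argument into an input stream suitable for parsing.

Consider the point where only one word is left. When abc cn def ghi has been parsed down to ghi, #2\par is ghi\par. Parsing it once more requires a space, without which the space-delimited #1 cannot be matched. It only works here because the line break supplies that space, so using any end delimiter other than \par causes an error. Hence, for \stowparse@udf to parse correctly, its input stream must always contain a space. We therefore redesign \strtowdudf so that the input stream of \stowparse@udf always contains a space, rather than relying on a line break to provide one.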

\documentclass{article}
\usepackage{etoolbox,xstring,mfirstuc,textcase}
\usepackage{amsmath,amsfonts,amssymb,amsthm}
\usepackage[backend=biber,style=numeric]{biblatex}%
\addbibresource[location=local]{egbib.bib}
\makeatletter
\newrobustcmd{\strtowdudfa}[1]{%
\edef\coefone{#1 $}% the space before $ is essential as a delimiter
%\meaning\coefone
\expandafter\stowparsea@udf\coefone%
}
\def\stowparsea@udf#1 #2${%
%\par%\@empty
\def\coefone{#1}\def\coeftwo{#2}%
%\par\meaning\coefone \meaning\coeftwo
\ifx\coeftwo\@empty%
\MakeSentenceCase{#1 }%
\else%
\MakeSentenceCase{#1 }%
\def\coeftwos{#2$}%
\expandafter\stowparsea@udf\coeftwos%
%\@empty%\par
\fi%
}
\makeatother
\begin{document}
\strtowdudfa{abc cn}\par
\strtowdudfa{abc cn def ghi}
\end{document}

Result:

Abc Cn
Abc Cn Def Ghi

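Here \strtowdudfa appends a space to its argument and adds the $ sign as the end delimiter, and \stowparsea@udf parses up to that $. Tests show that this macro fully meets the requirements stated above. Because $ serves as the end marker, the argument itself must not contain $, which rules out inline math.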
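Using $ as the end marker has a side effect: a $ in the argument ends #2 early. The sketch below uses the same recursion with \@nil as the end marker instead, a common choice for internal delimiters in LaTeX code. \capwordsnil and \capwordsnil@parse are hypothetical names, and etoolbox's \ifblank replaces the \ifx test. Note that the string is still split at every space, so inline math containing spaces would still be cut apart:

```latex
\documentclass{article}
\usepackage{etoolbox}
\usepackage[backend=biber,style=numeric]{biblatex}
\makeatletter
% Hypothetical variant of \strtowdudfa: append a space and the end marker \@nil
% (the argument itself must not contain \@nil)
\newcommand\capwordsnil[1]{\capwordsnil@parse#1 \@nil}
% #1: the next word, up to the first space; #2: everything after it
\def\capwordsnil@parse#1 #2\@nil{%
  \MakeSentenceCase{#1}%
  % stop when only spaces (or nothing) remain, otherwise recurse on the rest
  \ifblank{#2}{}{ \capwordsnil@parse#2\@nil}}
\makeatother
\begin{document}
\capwordsnil{abc cn}\par
\capwordsnil{abc cn def ghi}
\end{document}
```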

Designing a macro with the string functions of the xstring package

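In the previous subsection we reached the goal by recursively parsing the input stream. If ease of programming is the priority, the macro can instead be built on xstring, a package dedicated to string handling.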

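xstring provides \StrPosition, which returns the position of a given character in a string, and \StrSplit, which splits a string into two parts at a given position. We start with a simple macro: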

\newcommand\strxparse[1]{%
\StrPosition{#1}{ }[\posspace]% [\posspace]: macro that receives the result
\StrSplit{#1}{\posspace}{\stra}{\strb}%
\MakeSentenceCase{\stra}\MakeSentenceCase{\strb}}
\strxparse{abc edf}

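\strxparse first obtains the position \posspace of the space in its argument and splits the argument into two parts at that position. It then applies \MakeSentenceCase to each part. Because the split is made at the position of the space, the space stays at the end of \stra, which keeps the words separated in the output. Testing shows that \StrPosition and \StrSplit do the job. Handling more words requires a loop, so we first check whether looping is feasible with another macro: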

\newcommand\strxparsea[1]{%
\edef\strtobeparse{#1}%
\StrPosition{\strtobeparse}{ }[\posspace]% [\posspace]: macro that receives the result
\StrSplit{\strtobeparse}{\posspace}{\stra}{\strb}%
\MakeSentenceCase{\stra}\MakeSentenceCase{\strb}}
\strxparsea{abc edf}

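The only difference from the previous macro is that the argument is first stored in \strtobeparse. Since this macro works, the string held in \strtobeparse can be split. A loop then only has to keep redefining \strtobeparse as the part still to be parsed. Below we implement the loop with TeX's \loop command: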

\documentclass{article}
\usepackage{etoolbox,xstring,mfirstuc,textcase}
\usepackage{amsmath,amsfonts,amssymb,amsthm}
\usepackage[backend=biber,style=numeric]{biblatex}%
\addbibresource[location=local]{egbib.bib}
%\newcounter{countA}
\newif\ifemptystr
\newcommand\strxparseb[1]{\edef\strtobeparse{#1}%
%\setcounter{countA}{0}
\loop%
\StrPosition{\strtobeparse}{ }[\posspace]% [\posspace]: macro that receives the result
\StrSplit{\strtobeparse}{\posspace}{\stra}{\strb}%
%\stepcounter{countA}%
%\arabic{countA}\meaning\strtobeparse\meaning\posspace\meaning\stra\meaning\strb\par %\meaning\emptystr
%\ifnumequal{\value{countA}}{3}{\emptystrfalse\MakeSentenceCase{\strb}}{\emptystrtrue\MakeSentenceCase{\stra}}%
\IfStrEq{\stra}{}{\emptystrfalse\MakeSentenceCase{\strb}}{\emptystrtrue\MakeSentenceCase{\stra}}%
\ifemptystr%
\let\strtobeparse=\strb%
%\arabic{countA}\meaning\strtobeparse\par
\repeat}
\begin{document}
\strxparseb{abc cn}\par
\strxparseb{abc cn def ghi}
\end{document}

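Here \strxparseb first stores its argument in \strtobeparse, then finds the position of the first space and splits the string there into \stra and \strb. An empty \stra means the last word has been reached. When \StrPosition does not find the character it returns 0, and splitting at position 0 with \StrSplit yields an empty first part. In that case \emptystrfalse sets the loop condition to false. Otherwise the condition is set to true, the loop continues, and \strtobeparse is set to \strb for the next split. The commented-out lines use a counter to test a fixed number of iterations.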
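The same task can also be written recursively with other xstring commands. The following is a minimal sketch, assuming xstring's \IfSubStr together with \StrBefore and \StrBehind and their optional [name] argument. \capwordsx, \firstword and \restwords are hypothetical names:

```latex
\documentclass{article}
\usepackage{xstring}
\usepackage[backend=biber,style=numeric]{biblatex}
% Hypothetical recursive variant of \strxparseb: instead of \StrPosition and
% \StrSplit, it peels off the first word with \StrBefore and the rest with \StrBehind
\newcommand\capwordsx[1]{%
  \IfSubStr{#1}{ }%
    {% a space remains: handle the first word, then recurse on the rest
     \StrBefore{#1}{ }[\firstword]%
     \StrBehind{#1}{ }[\restwords]%
     \MakeSentenceCase{\firstword} %
     \expandafter\capwordsx\expandafter{\restwords}}%
    {% no space left: this is the last word
     \MakeSentenceCase{#1}}}
\begin{document}
\capwordsx{abc cn}\par
\capwordsx{abc cn def ghi}
\end{document}
```

Recursion replaces the \loop and the \ifemptystr flag. The trade-off is one nested call per word, which is harmless for titles of ordinary length.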

The MWE above produces:

Abc Cn
Abc Cn Def Ghi

Using the commands provided by mfirstuc

mfirstuc's \capitalisewords capitalizes the first letter of each word. The package also lets you specify, with \MFUnocap, words that should not be capitalized. For example:

\documentclass{article}
\usepackage{etoolbox,xstring,mfirstuc,textcase}
\usepackage{amsmath,amsfonts,amssymb,amsthm}
\usepackage[backend=biber,style=numeric]{biblatex}%
\addbibresource[location=local]{egbib.bib}
\begin{document}
\capitalisewords{in the long run, We would like to use all-caps fonts}\par
\MFUnocap{to}
\MFUnocap{the}
\MFUnocap{in}
\capitalisewords{in the long run, We would like to use all-caps fonts}\par
\capitalisewords{abc cn}\par
\capitalisewords{abc cn def ghi}
\end{document}

Result:

In The Long Run, We Would Like To Use All-caps Fonts
In the Long Run, We Would Like to Use All-caps Fonts
Abc Cn
Abc Cn Def Ghi

Applying case conversion commands in a biblatex bibliography style

Customizing a biblatex bibliography style often involves defining field formats, for example:

%\DeclareFieldFormat{booktitle}{\strxparseb{#1}}%\strxparseb\strtowdudfa\capitalisewords\MakeCapital\MakeSentenceCase
\DeclareFieldFormat{booktitle}{\mkbibitalic{#1}}%\mkbibemph

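This applies a format to the booktitle field. Testing the definitions above showed that biblatex's own commands such as \mkbibitalic and \mkbibemph work. The custom \strxparseb and \strtowdudfa, as well as \capitalisewords, all raise an error: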

! Missing \endcsname inserted.
<to be read again>
\protect
l.41 \printbibliography[heading=bibliography]

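At first I assumed that the macros were somehow incompatible with biblatex and planned to look for the answer in biblatex.sty. Before digging in, I decided to ask the biblatex author, who replied quickly: the problem lay in how the field format was used.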

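biblatex builds up complex output through several layers of field formats. Here the booktitle format produces the italics, but the booktitle field itself is printed through a lower-level format, titlecase, for example: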

{\printtext[booktitle]{%
   \printfield[titlecase]{booktitle}%
   \setunit{\subtitlepunct}%
   \printfield[titlecase]{booksubtitle}}%
 \newunit}

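The biblatex author pointed out that formats which operate directly on the field content belong in the titlecase field format. The booktitle format is applied through \printtext[booktitle]{...}, so its #1 is not the title text. Instead, #1 is the code that prints the booktitle and booksubtitle fields, and a macro that parses or case-converts text receives that code instead of the title. That was indeed the answer: in the titlecase format, where #1 is the field value itself, the commands work without problems. biblatex's own \MakeSentenceCase also fails when placed in the booktitle format, which points to the same cause.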

So I redefined it as follows:

\DeclareFieldFormat{bkttlcase}{\strtowdudfa{#1}}%\capitalisewords\strtowdudfa
\renewbibmacro*{booktitle}{%
  \ifboolexpr{
    test {\iffieldundef{booktitle}}
    and
    test {\iffieldundef{booksubtitle}}
  }
    {}
    {\printtext[booktitle]{%
       \printfield[bkttlcase]{booktitle}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase]{booksubtitle}}%
     \newunit}%
  \printfield{booktitleaddon}}

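This achieves the desired word-initial capitalization. However, capitalizing every word of a title is actually uncommon in English publishing, which usually uses one of two other styles. In sentence case, the first letter is capitalized and everything else is lowercase. In title case, nouns, pronouns, adjectives, verbs and adverbs are capitalized, while articles, conjunctions and short prepositions are not. Title case is probably what we need, and capitalizing every word does not achieve it; the real requirement is to capitalize words selectively. This can be done by combining \capitalisewords with \MFUnocap. First list every word that should stay lowercase with \MFUnocap; \capitalisewords then capitalizes the first word of the title and every other word not on the list. This works, although it is not particularly convenient to use.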
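If this approach suits every title-like field, a shorter route may be to redefine the lower-level titlecase format itself instead of renewing the booktitle macro. The standard styles use titlecase when printing fields such as title and booktitle. The following is a minimal sketch, assuming the standard biblatex styles and the egbib.bib database used throughout this article. The list of lowercase words is illustrative, not a complete title-case rule set:

```latex
\documentclass{article}
\usepackage{mfirstuc}
\usepackage[backend=biber,style=numeric]{biblatex}
\addbibresource[location=local]{egbib.bib}
% Illustrative list of words that \capitalisewords should keep in lowercase;
% the first word of a title is still capitalized
\MFUnocap{a}\MFUnocap{an}\MFUnocap{the}
\MFUnocap{and}\MFUnocap{or}\MFUnocap{but}
\MFUnocap{of}\MFUnocap{in}\MFUnocap{on}\MFUnocap{to}
% Applied at the titlecase level, so #1 is the field text itself
\DeclareFieldFormat{titlecase}{\capitalisewords{#1}}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Because titlecase is shared, this affects every field printed through it, not just booktitle. The bkttlcase approach above is the narrower choice when only one field should change.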

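There is, of course, a simpler approach that needs no processing at all. As mentioned earlier, the user sets the case in the .bib source file and the style prints it exactly as entered.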

Summary

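Through these experiments we tested letter case conversion in LaTeX and applied it to customizing a biblatex bibliography style.
Along the way, we not only learned what several packages offer but also gained a deeper understanding of TeX's macro definition mechanism.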

##history:
v1.0 20171112 Basic content completed