<refentry xmlns="http://docbook.org/ns/docbook"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:xi="http://www.w3.org/2001/XInclude"
          xmlns:src="http://nwalsh.com/xmlns/litprog/fragment"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          version="5.0" xml:id="make.index.markup">
<refmeta>
<refentrytitle>make.index.markup</refentrytitle>
<refmiscinfo class="other" otherclass="datatype">boolean</refmiscinfo>
</refmeta>
<refnamediv>
<refname>make.index.markup</refname>
<refpurpose>Generate XML index markup in the index?</refpurpose>
</refnamediv>

<refsynopsisdiv>
<src:fragment xml:id="make.index.markup.frag">
<xsl:param name="make.index.markup" select="0"/>
</src:fragment>
</refsynopsisdiv>

<refsection><info><title>Description</title></info>

<para>This parameter enables a very neat trick for getting properly
merged, collated back-of-the-book indexes. G. Ken Holman suggested
this trick at Extreme Markup Languages 2002, and I'm indebted to him
for it.</para>

<para>Jeni Tennison's excellent code in
<filename>autoidx.xsl</filename> does a great job of merging and
sorting <tag>indexterm</tag>s in the document and building a
back-of-the-book index. However, there's one thing that it cannot
reasonably be expected to do: merge page numbers into ranges. (I would
not have thought that it could collate and suppress duplicate page
numbers, but in fact it appears to manage that task somehow.)</para>

<para>Ken's trick is to produce a document in which the index at the
back of the book is <quote>displayed</quote> in XML.
Because the index
is generated by the FO processor, all of the page numbers have already been resolved.
It's a bit hard to explain, but what it boils down to is that instead of having
an index at the back of the book that looks like this:</para>

<blockquote>
<formalpara><info><title>A</title></info>
<para>ap1, 1, 2, 3</para>
</formalpara>
</blockquote>

<para>you get one that looks like this:</para>

<blockquote>
<programlisting><![CDATA[<indexdiv>A</indexdiv>
<indexentry><primaryie>ap1</primaryie>,
<phrase role="pageno">1</phrase>,
<phrase role="pageno">2</phrase>,
<phrase role="pageno">3</phrase></indexentry>]]></programlisting>
</blockquote>

<para>After building a PDF file with this sort of odd-looking index, you can
extract the text from the PDF file, and the result is a proper index expressed in
XML.</para>

<para>Now you have data that's amenable to processing, and a simple Perl script
(such as <filename>fo/pdf2index</filename>) can
merge page ranges and generate a proper index.</para>

<para>Finally, reformat your original document using this literal index instead of
an automatically generated one, and <quote>bingo</quote>!</para>

</refsection>
</refentry>