make.index.markup.xml 2.8 KB

<refentry xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:src="http://nwalsh.com/xmlns/litprog/fragment"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="5.0" xml:id="make.index.markup">
<refmeta>
<refentrytitle>make.index.markup</refentrytitle>
<refmiscinfo class="other" otherclass="datatype">boolean</refmiscinfo>
</refmeta>
<refnamediv>
<refname>make.index.markup</refname>
<refpurpose>Generate XML index markup in the index?</refpurpose>
</refnamediv>
<refsynopsisdiv>
<src:fragment xml:id="make.index.markup.frag">
<xsl:param name="make.index.markup" select="0"/>
</src:fragment>
</refsynopsisdiv>
<refsection><info><title>Description</title></info>
<para>This parameter enables a very neat trick for getting properly
merged, collated back-of-the-book indexes. G. Ken Holman suggested
this trick at Extreme Markup Languages 2002 and I'm indebted to him
for it.</para>
<para>Jeni Tennison's excellent code in
<filename>autoidx.xsl</filename> does a great job of merging and
sorting <tag>indexterm</tag>s in the document and building a
back-of-the-book index. However, there's one thing that it cannot
reasonably be expected to do: merge page numbers into ranges. (I would
not have thought that it could collate and suppress duplicate page
numbers, but in fact it appears to manage that task somehow.)</para>
<para>Ken's trick is to produce a document in which the index at the
back of the book is <quote>displayed</quote> in XML. Because the index
is generated by the FO processor, all of the page numbers have been resolved.
It's a bit hard to explain, but what it boils down to is that instead of having
an index at the back of the book that looks like this:</para>
<blockquote>
<formalpara><info><title>A</title></info>
<para>ap1, 1, 2, 3</para>
</formalpara>
</blockquote>
<para>you get one that looks like this:</para>
<blockquote>
<programlisting>&lt;indexdiv&gt;A&lt;/indexdiv&gt;
&lt;indexentry&gt;
&lt;primaryie&gt;ap1&lt;/primaryie&gt;,
&lt;phrase role="pageno"&gt;1&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;2&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;3&lt;/phrase&gt;
&lt;/indexentry&gt;</programlisting>
</blockquote>
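<para>As a minimal sketch of how this output might be switched on, a
customization layer can set the parameter to a nonzero value. The import
path below is an assumption and will differ between installations:</para>
<programlisting>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"&gt;
&lt;!-- Assumed location of the stock FO stylesheet; adjust as needed. --&gt;
&lt;xsl:import href="docbook-xsl/fo/docbook.xsl"/&gt;
&lt;!-- Emit the index as literal XML markup instead of a formatted index. --&gt;
&lt;xsl:param name="make.index.markup" select="1"/&gt;
&lt;/xsl:stylesheet&gt;</programlisting>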
<para>After building a PDF file with this sort of odd-looking index, you can
extract the text from the PDF file and the result is a proper index expressed in
XML.</para>
<para>Now you have data that's amenable to processing, and a simple Perl script
(such as <filename>fo/pdf2index</filename>) can
merge page ranges and generate a proper index.</para>
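<para>For the <quote>ap1</quote> entry used as the example in this
description, the merged result might look something like the following
hand-written sketch. It only illustrates pages 1, 2, and 3 collapsing into
a single range; the exact markup that <filename>fo/pdf2index</filename>
emits is not reproduced here:</para>
<programlisting>&lt;index&gt;
&lt;indexdiv&gt;&lt;title&gt;A&lt;/title&gt;
&lt;indexentry&gt;
&lt;primaryie&gt;ap1, 1-3&lt;/primaryie&gt;
&lt;/indexentry&gt;
&lt;/indexdiv&gt;
&lt;/index&gt;</programlisting>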
<para>Finally, reformat your original document using this literal index instead of
an automatically generated one and <quote>bingo</quote>!</para>
</refsection>
</refentry>