<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % general-entities SYSTEM "../general.ent">
%general-entities;
]>
<sect1 id="ch-tools-toolchaintechnotes">
<?dbhtml filename="toolchaintechnotes.html"?>
<title>Toolchain Technical Notes</title>
<para>This section explains some of the rationale and technical details
behind the overall build method. It is not essential to immediately
understand everything in this section. Most of this information will be
clearer after performing an actual build. This section can be referred
to at any time during the process.</para>
<para>The overall goal of <xref linkend="chapter-temporary-tools"/> is to
produce a temporary area that contains a known-good set of tools that can be
isolated from the host system. By using <command>chroot</command>, the
commands in the remaining chapters will be contained within that environment,
ensuring a clean, trouble-free build of the target LFS system. The build
process has been designed to minimize the risks for new readers and to provide
the most educational value at the same time.</para>
<para>The build process is based on the process of
<emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
for building a compiler and its toolchain for a machine different from
the one that is used for the build. This is not strictly needed for LFS,
since the machine where the new system will run is the same as the one
used for the build. But cross-compilation has the great advantage that
anything that is cross-compiled cannot depend on the host environment.</para>
<sect2 id="cross-compile" xreflabel="About Cross-Compilation">
<title>About Cross-Compilation</title>
<para>Cross-compilation involves some concepts that deserve a section of
their own. Although this section may be skipped on a first reading,
coming back to it later is strongly suggested in order to get a full
grasp of the build process.</para>
<para>Let us first define some terms used in this context:</para>
<variablelist>
<varlistentry><term>build</term><listitem>
<para>is the machine where we build programs. Note that this machine
is referred to as the <quote>host</quote> in other
sections.</para></listitem>
</varlistentry>
<varlistentry><term>host</term><listitem>
<para>is the machine/system where the built programs will run. Note
that this use of <quote>host</quote> is not the same as in other
sections.</para></listitem>
</varlistentry>
<varlistentry><term>target</term><listitem>
<para>is only used for compilers. It is the machine the compiler
produces code for. It may be different from both build and
host.</para></listitem>
</varlistentry>
</variablelist>
<para>As an example, let us imagine the following scenario: we may have a
compiler on a slow machine only; let's call the machine A and the compiler
ccA. We may also have a fast machine (B), but with no compiler, and we may
want to produce code for another slow machine (C). Then, to build a
compiler for machine C, we would have three stages:</para>
<informaltable align="center">
<tgroup cols="5">
<colspec colnum="1" align="center"/>
<colspec colnum="2" align="center"/>
<colspec colnum="3" align="center"/>
<colspec colnum="4" align="center"/>
<colspec colnum="5" align="left"/>
<thead>
<row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
<entry>Target</entry><entry>Action</entry></row>
</thead>
<tbody>
<row>
<entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
<entry>build cross-compiler cc1 using ccA on machine A</entry>
</row>
<row>
<entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
<entry>build cross-compiler cc2 using cc1 on machine A</entry>
</row>
<row>
<entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
<entry>build compiler ccC using cc2 on machine B</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>Then, all the other programs needed by machine C can be compiled
using cc2 on the fast machine B. Note that unless B can run programs
produced for C, there is no way to test the built programs until machine
C itself is running. For example, for testing ccC, we may want to add a
fourth stage:</para>
<informaltable align="center">
<tgroup cols="5">
<colspec colnum="1" align="center"/>
<colspec colnum="2" align="center"/>
<colspec colnum="3" align="center"/>
<colspec colnum="4" align="center"/>
<colspec colnum="5" align="left"/>
<thead>
<row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
<entry>Target</entry><entry>Action</entry></row>
</thead>
<tbody>
<row>
<entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
<entry>rebuild and test ccC using itself on machine C</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>In the example above, only cc1 and cc2 are cross-compilers, that is,
they produce code for a machine different from the one they are run on.
The other compilers ccA and ccC produce code for the machine they are run
on. Such compilers are called <emphasis>native</emphasis> compilers.</para>
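<para>As a minimal sketch of this distinction, the shell fragment below
classifies the four compilers of the example by comparing the machine each
one runs on (host) with the machine it produces code for (target). The
function name <command>compiler_kind</command> is made up for illustration
and is not part of any build system.</para>
<screen><![CDATA[
```shell
# Classify a compiler from its host (where it runs) and its target
# (what it produces code for). Hypothetical helper, for illustration only.
compiler_kind() {
    host=$1
    target=$2
    if [ "$host" = "$target" ]; then
        echo native
    else
        echo cross
    fi
}

# The compilers of the example, written as "name host target".
for entry in "ccA A A" "cc1 A B" "cc2 B C" "ccC C C"; do
    set -- $entry
    echo "$1: $(compiler_kind "$2" "$3")"
done
# prints:
# ccA: native
# cc1: cross
# cc2: cross
# ccC: native
```
]]></screen>
<para>The build machine plays no role in the classification: it only says
where the compiler itself was compiled.</para>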
</sect2>
<sect2 id="lfs-cross">
<title>Implementation of Cross-Compilation for LFS</title>
<note>
<para>Almost all build systems use names of the form
cpu-vendor-kernel-os, referred to as the machine triplet. An astute
reader may wonder why a <quote>triplet</quote> refers to a four-component
name. The reason is historical: initially, three-component names were enough
to designate a machine unambiguously, but with new machines and systems
appearing, that proved insufficient. The word <quote>triplet</quote>
remained. A simple way to determine your machine triplet is to run
the <command>config.guess</command>
script that comes with the source for many packages. Unpack the Binutils
sources and run the script: <userinput>./config.guess</userinput> and note
the output. For example, for a 32-bit Intel processor the
output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>.</para>
<para>Also be aware of the name of the platform's dynamic linker, often
referred to as the dynamic loader (not to be confused with the standard
linker <command>ld</command> that is part of Binutils). The dynamic linker
provided by Glibc finds and loads the shared libraries needed by a
program, prepares the program to run, and then runs it. The name of the
dynamic linker for a 32-bit Intel machine will be <filename
class="libraryfile">ld-linux.so.2</filename> (<filename
class="libraryfile">ld-linux-x86-64.so.2</filename> for 64-bit systems). A
sure-fire way to determine the name of the dynamic linker is to inspect a
random binary from the host system by running: <userinput>readelf -l
&lt;name of binary&gt; | grep interpreter</userinput> and noting the
output. The authoritative reference covering all platforms is in the
<filename>shlib-versions</filename> file in the root of the Glibc source
tree.</para>
</note>
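<para>The <command>readelf</command> check mentioned in the note can be
wrapped in a small filter that prints only the path of the dynamic linker.
This is a sketch: the helper name <command>interpreter_of</command> is
hypothetical, and the sample line below is illustrative input in the format
printed by <command>readelf -l</command>, with an assumed
<filename class="directory">/lib64</filename> location, rather than output
captured from a real system.</para>
<screen><![CDATA[
```shell
# Read "readelf -l" output on stdin and print the interpreter path.
# Hypothetical helper, for illustration only.
interpreter_of() {
    sed -n 's/.*Requesting program interpreter: \(.*\)]$/\1/p'
}

# Illustrative input line; the /lib64 directory is an assumption.
echo '      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]' |
    interpreter_of
# prints: /lib64/ld-linux-x86-64.so.2
```
]]></screen>
<para>On a real host the filter would be fed from <command>readelf
-l</command> run on any dynamically linked binary; the result depends on
the host system.</para>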
<para>In order to fake a cross-compilation, the name of the host triplet
is slightly adjusted by changing the <quote>vendor</quote> field in the
<envar>LFS_TGT</envar> variable. We also use the
<parameter>--with-sysroot</parameter> option when building the cross-linker
and cross-compiler, to tell them where to find the needed host files. This
ensures that none of the other programs built in <xref
linkend="chapter-temporary-tools"/> can link to libraries on the build
machine. Only two stages are mandatory, and one more for tests:</para>
<informaltable align="center">
<tgroup cols="5">
<colspec colnum="1" align="center"/>
<colspec colnum="2" align="center"/>
<colspec colnum="3" align="center"/>
<colspec colnum="4" align="center"/>
<colspec colnum="5" align="left"/>
<thead>
<row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
<entry>Target</entry><entry>Action</entry></row>
</thead>
<tbody>
<row>
<entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
<entry>build cross-compiler cc1 using cc-pc on pc</entry>
</row>
<row>
<entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
<entry>build compiler cc-lfs using cc1 on pc</entry>
</row>
<row>
<entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
<entry>rebuild and test cc-lfs using itself on lfs</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>In the above table, <quote>on pc</quote> means the commands are run
on a machine using the already installed distribution. <quote>On
lfs</quote> means the commands are run in a chrooted environment.</para>
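<para>The vendor-field adjustment used to fake the cross-compilation can be
sketched as a plain text substitution on the triplet. The helper
<command>set_vendor</command> is hypothetical and only illustrates the idea;
how <envar>LFS_TGT</envar> is actually assigned is not covered by this
section.</para>
<screen><![CDATA[
```shell
# Replace the vendor field (second component) of a machine triplet.
# Hypothetical helper, for illustration only.
set_vendor() {
    echo "$1" | sed 's/^\([^-]*\)-[^-]*-/\1-'"$2"'-/'
}

set_vendor x86_64-pc-linux-gnu lfs    # prints x86_64-lfs-linux-gnu
set_vendor i686-pc-linux-gnu lfs      # prints i686-lfs-linux-gnu
```
]]></screen>
<para>Because the resulting triplet differs from the one reported for the
build machine, the build systems treat the two as different machines even
though the hardware is the same.</para>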
<para>Now, there is more to cross-compiling: the C language is not
implemented by a compiler alone, since it also defines a standard library.
In this book, the GNU C library, named glibc, is used. This library must
be compiled for the lfs machine, that is, using the cross-compiler cc1.
But the compiler itself uses an internal library implementing complex
instructions not available in the assembler instruction set. This
internal library is named libgcc, and must be linked to the glibc
library to be fully functional. Furthermore, the standard library for
C++ (libstdc++) also needs to be linked to glibc. The solution
to this chicken-and-egg problem is to first build a degraded cc1+libgcc,
lacking some functionality such as threads and exception handling, then
build glibc using this degraded compiler (glibc itself is not
degraded), then build libstdc++. But this last library will lack the
same functionality as libgcc.</para>
<para>This is not the end of the story: the conclusion of the preceding
paragraph is that cc1 is unable to build a fully functional libstdc++, but
this is the only compiler available for building the C/C++ libraries
during stage 2! Of course, the compiler built during stage 2, cc-lfs,
would be able to build those libraries, but (i) the build system of
GCC does not know that it is usable on pc, and (ii) using it on pc
would risk linking to the pc libraries, since cc-lfs is a native
compiler. So a fully functional libstdc++ has to be built later, in the
chroot environment.</para>
</sect2>
<sect2 id="other-details">
<title>Other Procedural Details</title>
<para>The cross-compiler will be installed in a separate <filename
class="directory">$LFS/tools</filename> directory, since it will not
be part of the final system.</para>
<para>Binutils is installed first because the <command>configure</command>
runs of both GCC and Glibc perform various feature tests on the assembler
and linker to determine which software features to enable or disable. This
is more important than one might first realize. An incorrectly configured
GCC or Glibc can result in a subtly broken toolchain, where the impact of
such breakage might not show up until near the end of the build of an
entire distribution. A test suite failure will usually highlight this error
before too much additional work is performed.</para>
<para>Binutils installs its assembler and linker in two locations,
<filename class="directory">$LFS/tools/bin</filename> and <filename
class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
location are hard linked to the other. An important facet of the linker is
its library search order. Detailed information can be obtained from
<command>ld</command> by passing it the <parameter>--verbose</parameter>
flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
will illustrate the current search paths and their order. To see which
files are linked by <command>ld</command>, compile a dummy program and
pass the <parameter>--verbose</parameter> switch to the linker. For
example,
<command>$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</command>
will show all the files successfully opened during the linking.</para>
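<para>The search path line printed by the linker is easier to read with one
directory per line. The filter below is a sketch of one way to do that; the
helper name <command>search_dirs</command> is hypothetical, and the sample
input only imitates the <literal>SEARCH_DIR</literal> format with made-up
paths.</para>
<screen><![CDATA[
```shell
# Read "ld --verbose" output on stdin and print the search directories
# one per line, in search order. Hypothetical helper, for illustration.
search_dirs() {
    grep SEARCH_DIR |
        tr ';' '\n' |
        sed -n 's/.*SEARCH_DIR("=\{0,1\}\([^"]*\)").*/\1/p'
}

# Illustrative input; these paths are made up, not real linker output.
echo 'SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib");' |
    search_dirs
# prints:
# /usr/local/lib
# /lib
# /usr/lib
```
]]></screen>
<para>With the cross-linker, the same filter could be placed after
<command>$LFS_TGT-ld --verbose</command> instead of the
<command>echo</command> line.</para>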
<para>The next package installed is GCC. An example of what can be
seen during its run of <command>configure</command> is:</para>
<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
<para>This is important for the reasons mentioned above. It also
demonstrates that GCC's configure script does not search the PATH
directories to find which tools to use. However, during the actual
operation of <command>gcc</command> itself, the same search paths are not
necessarily used. To find out which standard linker <command>gcc</command>
will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>.</para>
<para>Detailed information can be obtained from <command>gcc</command> by
passing it the <parameter>-v</parameter> command line option while compiling
a dummy program. For example, <command>gcc -v dummy.c</command> will show
detailed information about the preprocessor, compilation, and assembly
stages, including <command>gcc</command>'s include search paths and their
order.</para>
<para>Next installed are sanitized Linux API headers. These allow the
standard C library (Glibc) to interface with features that the Linux
kernel will provide.</para>
<para>The next package installed is Glibc. The most important
considerations for building Glibc are the compiler, binary tools, and
kernel headers. The compiler is generally not an issue since Glibc will
always use the compiler relating to the <parameter>--host</parameter>
parameter passed to its configure script; e.g., in our case, the compiler
will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
headers can be a bit more complicated. Therefore, take no risks and use
the available configure switches to enforce the correct selections. After
the run of <command>configure</command>, check the contents of the
<filename>config.make</filename> file in the <filename
class="directory">build</filename> directory for all important details.
Note the use of <parameter>CC="$LFS_TGT-gcc"</parameter> (with
<envar>$LFS_TGT</envar> expanded) to control which binary tools are used
and the use of the <parameter>-nostdinc</parameter> and
<parameter>-isystem</parameter> flags to control the compiler's include
search path. These items highlight an important aspect of the Glibc
package: it is very self-sufficient in terms of its build machinery
and generally does not rely on toolchain defaults.</para>
<para>As said above, the standard C++ library is compiled next, followed
by all the programs that need themselves to be built. The install step
uses the <envar>DESTDIR</envar> variable to have the programs land in
the LFS filesystem.</para>
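<para>The effect of a <envar>DESTDIR</envar> install can be imitated with a
few lines of shell: the staging root is prepended to the configured prefix,
so files land under the staging tree instead of the running system. This is
only a sketch of the idea under made-up paths; the helper
<command>stage_install</command> is hypothetical and stands in for what a
package's own install rules would do.</para>
<screen><![CDATA[
```shell
# Copy a program to "$destdir$prefix/bin", the way a DESTDIR-aware
# install rule would. Hypothetical helper, for illustration only.
stage_install() {
    destdir=$1
    prefix=$2
    file=$3
    mkdir -p "$destdir$prefix/bin"
    cp "$file" "$destdir$prefix/bin/"
}

# Demonstration in a scratch directory; all paths here are made up.
work=$(mktemp -d)
echo 'echo hello' > "$work/hello"
stage_install "$work/lfs" /usr "$work/hello"
ls "$work/lfs/usr/bin"    # lists: hello
```
]]></screen>
<para>The program is configured for <filename
class="directory">/usr</filename> but is written below the staging root,
which is the property the install step relies on.</para>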
<para>Then the native lfs compiler is built. First comes Binutils Pass 2,
with the same <envar>DESTDIR</envar> install as the other programs, then the
second pass of GCC, omitting libstdc++ and other non-essential libraries.
Due to some weird logic in GCC's configure script,
<envar>CC_FOR_TARGET</envar> ends up as <command>cc</command> when the host
is the same as the target, but is different from the build. This is why
<parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is put explicitly into
the configure options.</para>
<para>Upon entering the chroot environment in <xref
linkend="chapter-building-system"/>, the first task is to install
libstdc++. Then temporary installations of programs needed for the proper
operation of the toolchain are performed. Programs needed for testing
other programs are also built. From this point onwards, the
core toolchain is self-contained and self-hosted. In the remainder of
<xref linkend="chapter-building-system"/>, final versions of all the
packages needed for a fully functional system are built, tested, and
installed.</para>
</sect2>
</sect1>