
+++
date = "2018-09-01T16:17:00+08:00"
draft = false
tags = ['lfs']
title = "LFS/BLFS Multilib/Multiarch Note"
summary = """
Linux From Scratch is a great project that provides you with step-by-step instructions for building your own custom Linux system, entirely from source code. To Keep It Simple and Stupid as an educational project, LFS does not contain instruction to build multilibs. I would discuss the changes of LFS/BLFS procedure to make a GNU/Linux system with multilib support in this article.
"""
authors = ["xry111"]
+++

Introduction

Linux From Scratch is a great project that provides you with step-by-step instructions for building your own custom Linux system, entirely from source code. To Keep It Simple and Stupid as an educational project, LFS does not include instructions for building multilib.

However, for those who use LFS as their main system, multilib support may be essential. For example, some non-free software is only provided as 32-bit binaries, and some users may want to use the x32 ABI to improve system performance.

There are various modifications that add multilib support to the LFS procedure. Basically, they just add instructions for building multilib to the LFS book. But since LFS is an educational project, we should focus on why these instructions are needed. And for a complete multilib environment, we would also have to build multilib for the BLFS book. A multilib BLFS book would be a massive amount of work, and I don't think anyone could maintain it.

So I will explain how a multilib system works and show the basic idea of how to add multilib to an LFS system.

Multilib/Multiarch Basics

How Does Multilib Work?

The basic idea of multilib is simple. The kernel parses the header of an executable to find out whether its code uses the LP64 ABI (x86-64), the ILP32 ABI of traditional x86, or x32 (which is also ILP32 but uses the x86-64 instruction set). The kernel then arranges the address space and provides the syscall interface for that ABI, so code for all three ABIs can run. For example, suppose we have a statically linked x32 program compiled on a multilib system with

gcc -mx32 foo.c -o foo -static

To run the program foo on a 64-bit LFS system, just enable X86_X32 in the kernel config of the LFS system and recompile the kernel. Then you can copy foo into the LFS system and run it. It will work. To run traditional x86 code on a 64-bit LFS system, enable another kernel config option, IA32_EMULATION.
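Before rebuilding the kernel, it can help to check which of the two options a given config file already enables. The following is only a sketch: check_multilib_kconfig is a hypothetical helper name, and it assumes you have a kernel config file at hand (for example the .config in your kernel source tree). As an assumption beyond the text above, it also accepts the spelling CONFIG_X86_X32_ABI, which newer kernels use for the x32 option.

```shell
# Hypothetical helper: report whether a kernel config file enables
# support for 32-bit (IA32_EMULATION) and x32 (X86_X32) binaries.
check_multilib_kconfig() {
    config=$1
    if grep -q '^CONFIG_IA32_EMULATION=y$' "$config"; then
        echo "32-bit: enabled"
    else
        echo "32-bit: disabled"
    fi
    # Assumption: newer kernels call the option CONFIG_X86_X32_ABI,
    # so both spellings are accepted here.
    if grep -Eq '^CONFIG_X86_X32(_ABI)?=y$' "$config"; then
        echo "x32: enabled"
    else
        echo "x32: disabled"
    fi
}
```

Call it with the path to the config file, for example check_multilib_kconfig .config inside the kernel source tree.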

Unfortunately, many programs are dynamically linked. When we try to run a dynamically linked 32-bit program bar on a 64-bit LFS system, the kernel parses the header of bar. The path to the dynamic linker is hard coded in the ELF file bar:

$ readelf -l bar | grep INTERP -A1
  INTERP         0x000154 0x08048154 0x08048154 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]

But we don't have /lib/ld-linux.so.2 on the 64-bit LFS system, so the execve syscall that tries to execute bar just returns ENOENT.
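The check above can be scripted. This is a minimal sketch, not part of the LFS book: parse_interp and check_interp are hypothetical helper names, and check_interp assumes readelf from Binutils is installed.

```shell
# Extract the requested interpreter path from `readelf -l` output on stdin.
parse_interp() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)]/\1/p'
}

# Hypothetical helper: report whether the dynamic linker requested by an
# ELF file exists on this system.
check_interp() {
    interp=$(readelf -l "$1" 2>/dev/null | parse_interp)
    if [ -z "$interp" ]; then
        echo "$1: no interpreter found (static binary, not ELF, or readelf unavailable)"
    elif [ -e "$interp" ]; then
        echo "$1: $interp is present"
    else
        echo "$1: $interp is MISSING, so execve would fail with ENOENT"
    fi
}
```

Running check_interp ./bar before trying to start a foreign binary shows which dynamic linker has to be provided first.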

Now forget LFS for a minute. On a typical binary distribution, we solve the problem by installing ld-linux.so.2 to /lib with a package manager, or manually. Then we hit another problem. The shared object name libc.so.6 is hard coded in the ELF file bar:

$ readelf -d bar | grep NEEDED
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]

So ld-linux.so.2 immediately knows that bar needs a shared library named libc.so.6. But the path to it is not hard coded, so ld-linux.so.2 has to find it. If bar has no rpath and LD_LIBRARY_PATH is not set, ld-linux.so.2 just searches the library paths hard coded in itself and those specified by /etc/ld.so.conf. If we just copy ld-linux.so.2 into a brand new 64-bit LFS system and try to execute bar, ld-linux.so.2 won't find a 32-bit libc.so.6. This time execve itself succeeds, because the kernel was able to load the interpreter, but the dynamic linker then reports the missing shared library and exits with an error. We have to copy a 32-bit libc.so.6 into a place that ld-linux.so.2 searches, and do the same for all the other shared libraries bar needs. Finally we can execute bar successfully :).

So a multilib that is sufficient to run a foreign 32-bit binary should contain a working 32-bit dynamic linker at /lib/ld-linux.so.2 and the 32-bit shared libraries that this binary needs. Likewise, a multilib that is sufficient to run a foreign x32 binary should contain a working x32 dynamic linker at /libx32/ld-linux-x32.so.2 and the essential x32 shared libraries. Now our task is to build them from source to get a Multilib Linux From Scratch.

Where to put the multilib?

The dynamic linker path is specified by the ABI and hard coded in ELF files. The three ABIs supported by a 64-bit Linux kernel on x86-64 processors and their dynamic linker paths are:

| ABI | Dynamic linker path |
|-----|---------------------|
| 32-bit | /lib/ld-linux.so.2 |
| 64-bit | /lib64/ld-linux-x86-64.so.2 |
| x32 | /libx32/ld-linux-x32.so.2 |

We can choose to install the dynamic linkers to another location, but then we must create a symlink at the specified location. For example, the original LFS book installs ld-linux-x86-64.so.2 to /lib and then symlinks it into /lib64.
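As a sketch of that symlink step: ldso_path encodes the three mandated paths listed above, and link_ldso is a hypothetical helper that creates the symlink when the real dynamic linker was installed elsewhere. The ROOT argument is an assumption made for illustration so that the helper can be tried on a staging directory. Pass an empty string to work on the live system.

```shell
# Print the ABI-mandated dynamic linker path for an ABI name (32, 64, x32).
ldso_path() {
    case "$1" in
        32)  echo /lib/ld-linux.so.2 ;;
        64)  echo /lib64/ld-linux-x86-64.so.2 ;;
        x32) echo /libx32/ld-linux-x32.so.2 ;;
        *)   echo "unknown ABI: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: link ROOT/<mandated path> to the linker that was
# really installed in REAL_DIR (an absolute path on the final system).
link_ldso() {
    root=$1 abi=$2 real_dir=$3
    path=$(ldso_path "$abi") || return 1
    # Nothing to do if the linker already lives at the mandated location.
    [ "$real_dir" = "$(dirname "$path")" ] && return 0
    mkdir -p "$root$(dirname "$path")"
    ln -sfn "$real_dir/$(basename "$path")" "$root$path"
}
```

For a Debian-like layout, link_ldso "" 32 /lib/i386-linux-gnu would make /lib/ld-linux.so.2 point at /lib/i386-linux-gnu/ld-linux.so.2.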

However, the location of the other libraries is arbitrary. You just can't put 32-bit and 64-bit libraries in the same directory. After all, the main Glibc library is named libc.so.6, so the 32-bit and 64-bit Glibc would certainly collide in the same directory. Just choose three different directories to hold the libraries of the three ABIs. For example, you can make the library locations mirror the dynamic linker locations:

| ABI | Library path in / | Library path in /usr |
|-----|-------------------|----------------------|
| 32-bit | /lib | /usr/lib |
| 64-bit | /lib64 | /usr/lib64 |
| x32 | /libx32 | /usr/libx32 |

For another possibility, you may have a mostly 64-bit system with a few 32-bit applications and want the "main" library path to be 64-bit:

| ABI | Library path in / | Library path in /usr |
|-----|-------------------|----------------------|
| 32-bit | /lib32 | /usr/lib32 |
| 64-bit | /lib | /usr/lib |
| x32 | /libx32 | /usr/libx32 |

But then there will be a 32-bit ld-linux.so.2 in /lib, along with many 64-bit libraries. Seems strange :(.

Other users (for example me) prefer Debian-like multiarch directories:

| ABI | Library path in / | Library path in /usr |
|-----|-------------------|----------------------|
| 32-bit | /lib/i386-linux-gnu | /usr/lib/i386-linux-gnu |
| 64-bit | /lib/x86_64-linux-gnu | /usr/lib/x86_64-linux-gnu |
| x32 | /lib/x86_64-linux-gnux32 | /usr/lib/x86_64-linux-gnux32 |

And even the following "strange" totally non-FHS layout is possible:

| ABI | Library path |
|-----|--------------|
| 32-bit | /system32/lib |
| 64-bit | /system64/lib |
| x32 | /systemx32/lib |

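Then how do we let the dynamic linkers find them? The intuitive way is to list all the directories in the dynamic linker configuration, either in /etc/ld.so.conf itself or, as here, in a file under /etc/ld.so.conf.d that /etc/ld.so.conf includes: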

# Begin /etc/ld.so.conf.d/01-multilib.conf

/usr/lib64
/usr/lib
/usr/libx32

# End

Then all three dynamic linkers will search all three directories. But that's not a problem since they'll ignore ABI incompatible libraries.

A better way is hard coding the correct location into the dynamic linkers. LFS builds them from source (in Glibc package) so it is possible. Then each dynamic linker will only search the directory containing the compatible libraries specified at build time. We'll show how to do that later.

Changes in LFS

Do We Need Temporary Multilib?

At first you may wonder whether you can skip multilib in Chapter 5 and only build it in Chapter 6. It's possible in theory but somewhat tricky. To build the 32-bit multilib of Glibc in Chapter 6, we need a 32-bit libgcc, which is part of GCC Pass 2 in Chapter 5. So we have to use --enable-multilib for GCC Pass 2. But then the multilib of libstdc++ and the other GCC runtime libraries is enabled as well. They rely on the multilib of Glibc in Chapter 5, so we have to build 32-bit and x32 Glibc in Chapter 5 to provide it. In addition, some libraries from Chapter 5 are temporarily linked against in Chapter 6. So we should build all multilib in Chapter 5 unless we know that the multilib of a package is absolutely unnecessary (for example, libmagic in the File package).

DJ's multilib LFS book builds multilib in an additional Chapter 10, so Chapter 5 does not need to be modified. But a temporary toolchain for multilib still has to be built in Chapter 10.

If you are tough enough, you can try to hack the GCC build procedure to build the 32-bit and x32 libgcc manually after GCC Pass 2. I have not done this and I don't recommend it. I just chose to build all multilib of the temporary packages in Chapter 5.

Toolchain packages

Binutils

The symlink from lib64 to lib should be skipped.

If the library directories are the same as the dynamic linker directories (32-bit in lib, 64-bit in lib64, and x32 in libx32), no other changes are needed. Otherwise, we should edit ld/genscripts.sh in the Binutils source code to ensure the linker scripts use the correct library directories. Manually fixing the generated linker scripts is also possible.

GCC

The option --disable-multilib should be replaced with --enable-multilib --with-multilib-list=m64,m32,mx32. Then GCC will build the runtime libraries (libgcc, libstdc++, etc.) for all three ABIs automatically. The change from --disable-multilib to --enable-multilib should also be applied to Libstdc++ in Chapter 5.

If the library directories are the same as the dynamic linker directories, no other changes are needed. Otherwise, we should edit gcc/config/i386/t-linux64 in the source code to tell GCC where they are.

Even if the multilib directory layout is not multiarch, we should use --enable-multiarch. Then the Python 3 package can use gcc -print-multiarch to distinguish shared objects and other platform-specific files by giving them different suffixes in their names.

Glibc

Glibc needs to be built three times, once for each ABI. The complete configure line in Chapter 5 should be changed to something like:

CC="$LFS_TGT-gcc $BUILD_MULTI" \
CXX="$LFS_TGT-g++ $BUILD_MULTI" \
../configure \
		--prefix=/tools \
		--host=$LFS_TGT_MULTI \
		--libdir=/tools/$LIBDIR_MULTI \
		--build=$(../scripts/config.guess) \
		--enable-kernel=3.2 \
		--with-headers=/tools/include \
		libc_cv_forced_unwind=yes \
		libc_cv_c_cleanup=yes

The variables with the suffix _MULTI vary with the ABI:

| ABI | BUILD_MULTI | LFS_TGT_MULTI | LIBDIR_MULTI |
|-----|-------------|---------------|--------------|
| 32-bit | -m32 | i686-lfs-linux-gnu | 32-bit library directory |
| 64-bit | -m64 | x86_64-lfs-linux-gnu | 64-bit library directory |
| x32 | -mx32 | x86_64-lfs-linux-gnu | x32 library directory |

Explanation of variables and options:

  • --host=$LFS_TGT_MULTI: The LFS book has already explained that this is necessary for cross compiling. For 32-bit multilib we have to use the value i686-lfs-linux-gnu to tell the build system to use i686 (32-bit) assembly code instead of 64-bit assembly, which is incompatible with the 32-bit ABI.
  • CC=$LFS_TGT-gcc $BUILD_MULTI and CXX=$LFS_TGT-g++ $BUILD_MULTI: By default, --host=$LFS_TGT_MULTI makes the build system use $LFS_TGT_MULTI-gcc as the C compiler. That's enough for the original LFS book without multilib, but we don't have an i686-lfs-linux-gnu-gcc now, and x86_64-lfs-linux-gnu-gcc normally generates code for the 64-bit ABI. So we have to override the C/C++ compiler for 32-bit and x32. The -m64 for 64-bit could be omitted.
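The table can be turned into a small helper that sets the three variables before each of the three builds. This is only a sketch: set_multi_vars is a hypothetical name, and the LIBDIR_MULTI values assume the lib / lib64 / libx32 layout. Substitute your own directories if you chose a different layout.

```shell
# Hypothetical helper: set the per-ABI variables used by the Chapter 5
# Glibc configure line. LIBDIR_MULTI assumes the lib/lib64/libx32 layout.
set_multi_vars() {
    case "$1" in
        32)  BUILD_MULTI=-m32  LFS_TGT_MULTI=i686-lfs-linux-gnu   LIBDIR_MULTI=lib ;;
        64)  BUILD_MULTI=-m64  LFS_TGT_MULTI=x86_64-lfs-linux-gnu LIBDIR_MULTI=lib64 ;;
        x32) BUILD_MULTI=-mx32 LFS_TGT_MULTI=x86_64-lfs-linux-gnu LIBDIR_MULTI=libx32 ;;
        *)   echo "unknown ABI: $1" >&2; return 1 ;;
    esac
}

# Typical use: one build directory per ABI.
# for abi in 32 x32 64; do
#     set_multi_vars "$abi"
#     mkdir -v "build-$abi" && cd "build-$abi"
#     ... run the configure line shown above, then make and make install ...
#     cd ..
# done
```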

And, for the final Glibc in Chapter 6:

CC="gcc $BUILD_MULTI -isystem $GCC_INCDIR -isystem /usr/include" \
CXX="g++ $BUILD_MULTI" \
../configure --prefix=/usr \
	--host=$HOST_MULTI \
	--libdir=/usr/$LIBDIR_MULTI \
	--disable-werror \
	--enable-kernel=3.2 \
	--enable-stack-protector=strong \
	libc_cv_slibdir=/$LIBDIR_MULTI \
	libc_cv_complocaledir=/usr/lib/locale

Explanation of new options and variables:

  • --host=$HOST_MULTI: Should be i686-pc-linux-gnu for 32-bit, and x86_64-pc-linux-gnu for x32 and 64-bit. It tells Glibc to use i686 assembly for the 32-bit version. It's necessary for 32-bit and can be omitted for 64-bit and x32 (since config.guess returns x86_64-pc-linux-gnu or x86_64-unknown-linux-gnu when the kernel is 64-bit).
  • libc_cv_slibdir=/$LIBDIR_MULTI: Tell Glibc to install shared libraries to correct (customized) location.
  • libc_cv_complocaledir=/usr/lib/locale: Tell Glibc to use standard /usr/lib/locale for locale archives, instead of /usr/$LIBDIR_MULTI/locale. This would save disk space and make the locale archive consistent for 32-bit, 64-bit and x32 applications.

We should install the 64-bit Glibc after the 32-bit and x32 versions. Then the 64-bit executables overwrite the 32-bit and x32 ones. After that we need to edit the ldd script. For security reasons it only handles executables whose dynamic linker path is listed in the RTLDLIST variable. Since the 64-bit version was installed last, RTLDLIST only contains the path to the 64-bit dynamic linker. Open /usr/bin/ldd with an editor and change RTLDLIST to contain all three dynamic linker paths, separated by spaces.
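The same edit can be made with sed instead of an editor. A minimal sketch, assuming the script contains a single line starting with RTLDLIST= and that sed supports -i (GNU sed does). patch_rtldlist is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the RTLDLIST assignment in an ldd script
# so that it lists the 32-bit, 64-bit and x32 dynamic linkers.
# Assumes exactly one line in the script starts with "RTLDLIST=".
patch_rtldlist() {
    sed -i 's|^RTLDLIST=.*|RTLDLIST="/lib/ld-linux.so.2 /lib64/ld-linux-x86-64.so.2 /libx32/ld-linux-x32.so.2"|' "$1"
}
```

Run patch_rtldlist /usr/bin/ldd once, after the last Glibc installation. It may be worth keeping a copy of the original script in case the assumption about its layout does not hold.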

If we are not using the standard library directories, the dynamic linkers end up in the wrong place (/$LIBDIR_MULTI/ld-*.so). We should symlink the dynamic linkers to the correct locations.

Other packages

Most packages could be configured for multilib with:

CC="gcc $BUILD_MULTI" CXX="g++ $BUILD_MULTI" \
${original_configure_line_in_book} \
	--libdir=/usr/$LIBDIR_MULTI \
	--host=$HOST_MULTI

One may think CC="gcc -m32" is enough for 32-bit, but several packages have platform-specific code, so it's better to use --host=i686-pc-linux-gnu as well. Again, --host can be omitted for 64-bit and x32.

Here ${original_configure_line_in_book} is the original configure line from the LFS book. Actually, we can simplify the line by creating some pseudo-cross compiler wrappers like:

#!/bin/sh

exec gcc -m32 "$@"

If you install this shell script as /usr/bin/i686-pc-linux-gnu-gcc and do the same for i686-pc-linux-gnu-g++, you can skip setting CC and CXX, since configure will automatically find the wrapper scripts for the i686-pc-linux-gnu host. For the x32 ABI the canonical host triplet is also x86_64-pc-linux-gnu, but we can use a customized triplet like x86_64-x32-linux-gnu or x86_64-pc-linux-gnux32.
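Writing the wrappers by hand for every tool and triplet gets repetitive, so they can be generated. A minimal sketch: make_wrapper is a hypothetical helper name, and x86_64-x32-linux-gnu is just one of the customized x32 triplets mentioned above.

```shell
# Hypothetical helper: write a pseudo-cross wrapper named TRIPLET-TOOL
# into DIR that runs the native TOOL with the given ABI flag.
make_wrapper() {
    dir=$1 triplet=$2 flag=$3 tool=$4
    wrapper=$dir/$triplet-$tool
    printf '#!/bin/sh\n\nexec %s %s "$@"\n' "$tool" "$flag" > "$wrapper"
    chmod 755 "$wrapper"
}

# Example: wrappers for the 32-bit and x32 triplets.
# for tool in gcc g++; do
#     make_wrapper /usr/bin i686-pc-linux-gnu    -m32  "$tool"
#     make_wrapper /usr/bin x86_64-x32-linux-gnu -mx32 "$tool"
# done
```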

Multilib of most LFS packages can be built and installed this way. But some packages need special handling.

GMP's configure script recognizes "customized" host triplets in order to optimize for the hardware. For example, when I run config.guess from the GMP package on my laptop, I get the triplet ivybridge-pc-linux-gnu. So if we use --host=x86_64-pc-linux-gnu or --host=i686-pc-linux-gnu, the script will configure for a generic x86-64 or x86 CPU, which makes GMP slower. So, instead of using --host, we should set the environment variable ABI for GMP. The GMP configure script recognizes ABI=32, ABI=64, and ABI=x32. For example, ABI=32 tells GMP to use 32-bit x86 assembly code and to add -m32 to the compiler flags.

And gmp.h is platform specific. We must rename the three versions to gmp-64.h, gmp-32.h, and gmp-x32.h, and create a gmp.h that wraps them:

#if defined(__x86_64__) && defined(__LP64__)
#include "gmp-64.h"
#elif defined(__x86_64__)
#include "gmp-x32.h"
#else
#include "gmp-32.h"
#endif
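The renaming has to happen right after each ABI's make install, before the next installation overwrites the header. The sketch below shows one way to do it. stash_abi_header and write_abi_wrapper are hypothetical helper names, and the include directory is passed in so the same helpers can be reused for other ABI-specific headers.

```shell
# Hypothetical helper: after installing one ABI's build, move the
# ABI-specific header out of the way, e.g. gmp.h -> gmp-32.h.
stash_abi_header() {
    incdir=$1 header=$2 abi=$3
    mv "$incdir/$header" "$incdir/${header%.h}-$abi.h"
}

# Hypothetical helper: once all three ABIs are stashed, write the
# wrapper header that selects the right one at compile time.
write_abi_wrapper() {
    incdir=$1 header=$2 base=${2%.h}
    cat > "$incdir/$header" <<EOF
#if defined(__x86_64__) && defined(__LP64__)
#include "$base-64.h"
#elif defined(__x86_64__)
#include "$base-x32.h"
#else
#include "$base-32.h"
#endif
EOF
}
```

For GMP this would be stash_abi_header /usr/include gmp.h 32 after the 32-bit install (and likewise for x32 and 64), followed by write_abi_wrapper /usr/include gmp.h at the end.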

Bzip2 has no configure script. Most annoyingly, the library path (relative to the install prefix) is hard coded as lib in the Makefile. We have to use sed to edit it, or install the library manually.

By default, pkg-config only knows one .pc file directory among the library paths, which is /usr/lib/pkgconfig. So the .pc files in /usr/lib64/pkgconfig would be useless. Unfortunately, most modern systems need to be mainly 64-bit. They have many libraries with only a 64-bit version, so those libraries only have .pc files in /usr/lib64/pkgconfig. We can override the default with the configure option --with-pc-path=.... Multiple directories can be given, separated by :.

And, if pkg-config finds a .pc file for 64-bit, it will output -L/usr/lib64 -lfoo for libfoo. The -L/usr/lib64 is unnecessary and may cause problems. We can use --with-system-library-path=... to tell pkg-config which -L flags should be skipped.

I recommend compiling i686-pc-linux-gnu-pkg-config and x86_64-x32-linux-gnu-pkg-config for 32-bit and x32. Some BLFS packages have additional ABI-specific information in their .pc files (for example glib and gobject-introspection). i686-pc-linux-gnu-pkg-config should be configured to search only the 32-bit and shared (/usr/share/pkgconfig) .pc files, and x86_64-x32-linux-gnu-pkg-config should be configured to search only the x32 and shared .pc files. --host=i686-pc-linux-gnu tells the configure scripts of other packages to use i686-pc-linux-gnu-pkg-config instead of pkg-config, if it exists.

How do we add the prefix i686-pc-linux-gnu-? Use the configure option --program-prefix when building pkg-config.
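Putting the three configure options together, the arguments for an ABI-specific pkg-config could be assembled as in the sketch below. pkgconfig_configure_args is a hypothetical helper name, and the directories assume that the ABI's libraries live in /usr/LIBDIR. Any other options from the book's pkg-config instructions still have to be added.

```shell
# Hypothetical helper: print configure arguments for an ABI-specific
# pkg-config. TRIPLET becomes the program prefix; LIBDIR is that ABI's
# library directory name under /usr (for example lib or libx32).
pkgconfig_configure_args() {
    triplet=$1 libdir=$2
    printf '%s\n' \
        "--prefix=/usr" \
        "--program-prefix=$triplet-" \
        "--with-pc-path=/usr/$libdir/pkgconfig:/usr/share/pkgconfig" \
        "--with-system-library-path=/usr/$libdir"
}

# Example for 32-bit libraries in /usr/lib:
# ./configure $(pkgconfig_configure_args i686-pc-linux-gnu lib)
```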

Ncurses installs ncursesw6-config. Since we install the 64-bit version last, /usr/bin/ncursesw6-config would be the version from 64-bit. Its output would contain -L/usr/lib64. It's annoying for 32-bit and x32. We have to use sed to edit it and remove the -L output.

Some other packages also have *-config scripts. They need to be modified too.

For Libffi, do not modify the header install path, since the headers are ABI specific. If you don't have i686-pc-linux-gnu-pkg-config, you may need PKG_CONFIG_PATH=/usr/lib/pkgconfig for the packages that need libffi (for example, Python 3).

OpenSSL has a customized configure system. Fortunately it's easy to use. Its Configure script (not config) accepts linux-x86_64 for 64-bit, linux-x86 for 32-bit, and linux-x32 for x32. We still have to remember to change --libdir.

Python tends to have problems when cross compiling, so do not use --host for it.

Python 3 only uses /usr/lib/python3.x as its package directory. If we use --libdir=/usr/lib64 for Python, we get a broken installation. To distinguish shared objects of one package built for different ABIs, Python 3 hard codes the multiarch name from gcc -print-multiarch at build time and adds it as a suffix to the shared objects. For example, there is _ssl.cpython-36m-x86_64-linux-gnu.so for 64-bit and _ssl.cpython-36m-i386-linux-gnu.so for 32-bit.
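To see which ABIs a Python installation actually has extension modules for, the multiarch tuple can be read back from the file names. A minimal sketch, assuming the naming pattern shown above (module name, then .cpython-, a version tag, the tuple, and .so). module_multiarch is a hypothetical helper name.

```shell
# Hypothetical helper: print the multiarch tuple embedded in a CPython
# extension module file name such as _ssl.cpython-36m-i386-linux-gnu.so.
module_multiarch() {
    name=${1##*/}            # strip any directory part
    name=${name%.so}         # strip the .so suffix
    name=${name#*.cpython-}  # drop "<module>.cpython-"
    echo "${name#*-}"        # drop the version tag (e.g. "36m")
}

# Example: count the modules per ABI, assuming Python 3.6.
# for f in /usr/lib/python3.6/lib-dynload/*.so; do
#     module_multiarch "$f"
# done | sort | uniq -c
```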

The header pyconfig.h is ABI specific and needs to be renamed and wrapped. And, since we can't use --libdir, we have to manually move libpython3.6m.so to the correct library path.

Meson itself is pure Python and has no binary libraries. But to use the Meson build system to build multilib, we need to give Meson some information about the ABI. After installing Meson, create /usr/share/meson/native/x86 for 32-bit:

[binaries]
c = '/usr/bin/i686-pc-linux-gnu-gcc'
cpp = '/usr/bin/i686-pc-linux-gnu-g++'
pkgconfig = '/usr/bin/i686-pc-linux-gnu-pkg-config'
ar = '/usr/bin/ar'
strip = '/usr/bin/strip'
exe_wrapper = ''

[properties]
sizeof_void* = 4
sizeof_long = 4

[host_machine]
system = 'linux'
cpu_family = 'x86'
cpu = 'i686'
endian = 'little'

Then we can use meson --native-file x86 for 32-bit. The exe_wrapper = '' setting tells Meson that we can run the generated executables natively. Without it, some BLFS packages refuse to build.

{{% callout note %}} I used to use /usr/share/meson/cross and --cross-file, just like a pseudo-cross build with an autoconf configure script. But it turned out that some packages refuse to build certain parts (for example gir files) when they are cross compiled. Also, Meson now uses the host pkg-config to locate g-ir-scanner and g-ir-compiler. So we have to stop pretending to cross build for 32-bit. {{% /callout %}}

Changes in BLFS

Most BLFS packages can be built for multilib like normal LFS packages. Still, some packages need special handling.

I have a multiarch patch for Python 2. With it we can build Python 2 just like building Python 3 in LFS.

I suggest editing the default library paths in /usr/share/cmake-${version}/Modules/GNUInstallDirs.cmake so we don't need to set them manually each time. But it only supports 32-bit and 64-bit (no x32 support) now. So we still need -DCMAKE_INSTALL_LIBDIR=/usr/libx32 for x32. I don't know how to hack cmake to support installing x32 libraries to /usr/libx32.

Some packages using cmake do not use GNUInstallDirs.cmake but a config variable LIB_SUFFIX instead, which can be used to specify the library path, like -DLIB_SUFFIX=64 (resulting in /usr/lib64). But there are still packages that hard code lib in CMakeLists.txt. They are quite annoying and need some sed.

We have to set libexecdir of Gstreamer same as libdir because it has some ABI specific helper programs.

Gobject-introspection is very tricky. My approach is to install i686-pc-linux-gnu-g-ir-scanner etc. alongside the normal g-ir-scanner. The Python code of gobject-introspection is installed in the library path, so we can keep all three versions. Then hack the code of the 32-bit and x32 versions so they find the correct compiler and pkg-config. Also, edit gobject-introspection-1.0.pc so other packages can find the correct gobject-introspection tools with the (prefixed) pkg-config.

However, the Meson build system always searches for g-ir-scanner etc. in $PATH instead of calling pkg-config. I hacked the Meson code to force it to look up the gobject-introspection tools using pkg-config. It seems to work well.

Rustc is a compiler, so it doesn't need multilib itself. But we have to build multilib for its runtime libraries (just like libstdc++ from GCC). Simply modifying the config.toml from the BLFS book will do the job:

# see config.toml.example for more possible options
[llvm]
targets = "X86"

# When using system llvm prefer shared libraries
link-shared = true

[build]
# install cargo as well as rust
extended = true
target = ["x86_64-unknown-linux-gnu", "i686-unknown-linux-gnu"]

[install]
prefix = "/usr"
docdir = "share/doc/rustc-1.25.0"

[rust]
channel = "stable"
rpath = false

# get reasonably clean output from the test harness
quiet-tests = true

# BLFS does not install the FileCheck executable from llvm,
# so disable codegen tests
codegen-tests = false

[target.x86_64-unknown-linux-gnu]
# delete this *section* if you are not using system llvm.
# NB the output of llvm-config (i.e. help options) may be
# dumped to the screen when config.toml is parsed.
llvm-config = "/usr/bin/llvm-config"

[target.i686-unknown-linux-gnu]
llvm-config = "/usr/bin/llvm-config"
linker = "i386-linux-gnu-gcc"

But rustc installs many 64-bit libraries in /usr/lib. I don't like this behavior, so I remove all of them and add /usr/lib/rustlib/${arch}/lib to /etc/ld.so.conf. Rustc also seems to support x32 now, but I have not tested it.
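The ld.so.conf part of this can be scripted as in the sketch below. It differs slightly from editing /etc/ld.so.conf directly: it writes a separate file under /etc/ld.so.conf.d, which only works if /etc/ld.so.conf includes that directory. write_rustlib_conf and the file name rustlib.conf are hypothetical, and the ROOT argument exists only so the helper can be tried on a staging directory.

```shell
# Hypothetical helper: list the rustlib library directory of each given
# target triple in ROOT/etc/ld.so.conf.d/rustlib.conf.
# Assumes /etc/ld.so.conf includes /etc/ld.so.conf.d/*.conf.
write_rustlib_conf() {
    root=$1; shift
    mkdir -p "$root/etc/ld.so.conf.d"
    for arch in "$@"; do
        echo "/usr/lib/rustlib/$arch/lib"
    done > "$root/etc/ld.so.conf.d/rustlib.conf"
}

# Example on the live system, followed by a cache refresh:
# write_rustlib_conf "" x86_64-unknown-linux-gnu i686-unknown-linux-gnu
# ldconfig
```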

After a rustc with multilib has been installed, cross compiling multilib for librsvg is simple.

Other Packages

Google Go Compiler

It needs no multilib, since the compiler compiles the 32-bit runtime as needed. Set GOARCH=386 and it will produce 32-bit code. But it doesn't support x32 now.