Goliath's random stuff

A couple of years ago, when I was still in school, I was once bored enough to attempt to build a cross compilation toolchain myself and bootstrap a simple, working GNU/Linux system running inside a VM.

I basically ended up copying the LFS instructions and it turned out to be very tedious. It took me multiple attempts over several days, but in the end, most of it, most of the time, mostly worked.

Fast forward to 2016: At my workplace, I was asked to rewrite the mtd-utils build system from a custom, broken Makefile to autotools, so that we had something that actually worked and could easily be integrated into existing cross build toolchains like buildroot, or our own in-house crosstool-NG based system.

Anyway, after some in-depth learning about the internals of the autotools, I thought, "hmm... GCC and binutils, like all GNU packages, use an autotools build system. I know how to get autotools stuff running, so it can't be that hard to bootstrap a cross toolchain" and decided to give it a try over the weekend.

Turns out, it isn't. GCC 6.2.0 requires amazingly few crutches and I managed to get a GCC+musl cross toolchain running by around two in the morning on Saturday.

Nevertheless, some friends asked me to write about it and I figured that might be a good idea, since most instructions I found on the internet were useless. Many didn't work at all, some only worked for a specific target, but most of them were full of magic (i.e. "just run this, I have no idea why it works").

The LFS and CLFS books were very helpful and mostly worked if you already knew exactly what you were doing. At the time I read them, they were also full of "just apply this patch, don't ask" magic and generally lacked explanations, expecting you to just mindlessly copy shell commands.

Maybe there will be a follow-up on how to bootstrap a small Linux+Busybox system.

NOTE: I'm still working on the writeup for the ARM target as well as how to use your existing libc. I just didn't get around to it yet and decided to put the unfinished article up as is, because it's been sitting around for so long I was afraid I might otherwise forget about it completely.

Overview

I'm building the toolchain on an AMD64 (aka x86_64) system. I previously tried it on Arch Linux, but for writing this post, I retraced my steps on a fresh CentOS 7.

I'm going to discuss two different target architectures: 32 bit ARM and 32 bit x86. The reason for this is that I have easy access to actual hardware for testing and the two require different crutches to build the toolchain.

We'll build a simple, straightforward cross toolchain. No Canadian cross or similar complex stuff.

The entire process itself consists of the following steps:

Installing the kernel headers to the output toolchain directory.
Compiling cross binutils.
Compiling a minimal GCC cross compiler with minimal libgcc.
Cross compiling the C standard library (in our case musl) with the minimal GCC.
Compiling a full version of the GCC cross compiler with complete libgcc.

The main reason for compiling GCC twice is the inter-dependency between the compiler and the standard library.

First of all, the GCC build system needs to know what kind of C standard library we are using and where to find it. Not only does the compiler need to know what to link programs against, it also links executable programs against bootstrap object code provided by the libc that does stack setup, calls the main() function and calls exit(3) when main() returns. The libc also provides the dynamic linker that the compiler writes into the ELF interpreter field of dynamically linked programs.

Second, there is libgcc. It contains low level, platform specific helpers (like exception handling, soft float code, etc.) and is automatically linked to programs built with GCC. The libgcc source code comes with GCC and is compiled by the GCC build system specifically for our cross compiler & libc combination.

However, some functions in libgcc need functions from the C standard library. Some larger libc implementations (like glibc) directly use utility functions from libgcc for e.g. stack unwinding (libgcc_s).

After building a GCC cross compiler, we need to cross compile libgcc, so we can then cross compile other stuff that needs libgcc, like the libc. But we need an already cross compiled libc in the first place for compiling libgcc.

The solution is to build a minimalist GCC. With that we compile a minimal libgcc that has lots of features disabled and uses internal stubs for standard C functions instead of linking against libc.

We can then cross compile the libc and let the compiler link it against the minimal libgcc.

With that, we can compile the full GCC, pointing it at the C standard library for the target system, and build a dynamically linked, fully featured libgcc along with it. We can simply install it over the existing GCC and libgcc in the toolchain directory.
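The build order described above can be sketched as a small driver script. This is only an illustration: the stage names, the run_stage helper and the stamp file convention are made up for this example, and each stage merely prints a message where the real commands from the following sections would go.

```shell
#!/bin/sh
# Hypothetical driver: runs each stage once and records completion in a
# stamp file, so an interrupted build can resume where it stopped.
STAMPDIR="${STAMPDIR:-$(mktemp -d)}"

run_stage() {
    stage="$1"
    if [ -e "$STAMPDIR/$stage.done" ]; then
        echo "skipping $stage (already done)"
        return 0
    fi
    echo "running $stage"
    # The real build commands for this stage would go here.
    touch "$STAMPDIR/$stage.done"
}

# The order matters: each stage depends on the ones before it.
for stage in kernel-headers binutils gcc-pass1 musl gcc-pass2; do
    run_stage "$stage" || exit 1
done
```

Running the loop a second time with the same STAMPDIR would skip every stage, which is handy when only the last (and longest) stage failed.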

If you already have an existing distro running on the target hardware, you already have a libc and libgcc for your target that you want to copy over and link against. In that case, you can skip the first pass (more on that later), but that would be kind of boring :-)

Prerequisites

The following source packages are required for building the toolchain. The links below point to the exact versions that I used.

Linux. This is a very popular OS kernel that runs on our target system. We need at least the headers to build the C standard library.
Musl. A tiny C standard library implementation.
Binutils. This contains the GNU assembler, linker and various tools for working with executable files.
GCC, the GNU compiler collection. Contains compilers for C and other languages.

The following things are required for compiling GCC. We only need to point the GCC build system to the location of the sources and it takes care of compiling them along.

MPFR. A multiple precision floating point library.
GMP. A multiple precision arithmetic library.
MPC. Multiple precision library for complex numbers.
ISL, the integer set library. More math stuff needed for optimizing loops.
CLOOG. A code generator library.

For compiling all of this you will need:

bash
gcc
g++
make
flex
bison
gperf
makeinfo
ncurses (with headers)
awk
automake
help2man

In case you wonder: you need the C++ compiler to build GCC. The GCC code base mainly uses C99, but with some additional C++ features. makeinfo is used by the GNU utilities that generate info pages from texinfo. ncurses is mainly needed by the kernel build system.

I'm not entirely sure about the list, as I normally work on systems with tons of development tools and libraries already installed, so I just used common sense, and also took a look at the configure output of the packages we are going to build. I also pulled some things from the README file of our in-house distro build system.

I put bash on the list simply because I'm too lazy to rid my shell one-liners of bash-isms, so we are going to work in bash.
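A quick way to check the list on your own machine could look like the sketch below. The function name missing_tools is made up for this example. It only looks for the commands in PATH, so it says nothing about versions, and it does not check for the ncurses headers since those are not a command.

```shell
# Print the name of every given command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Command names taken from the list above.
missing=$(missing_tools bash gcc g++ make flex bison gperf makeinfo awk automake help2man)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all required tools found"
fi
```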

Getting started

To keep things clean, we start out in an empty, fresh working directory in which we want to piece our toolchain together.

At first, we set a few handy shell variables that will store the configuration of our toolchain:

TARGET="i686-linux-musl"
ARCH="x86"
CPU="i686"

The TARGET variable holds the target triplet of our system. It describes the target platform by pasting together CPU architecture, kernel and user land. This string is not arbitrary! The GNU build system parses this string to figure out what it is building for! If you compile a musl toolchain, the last part has to be musl! Otherwise, the GCC build system will assume a different libc provider and the second pass GCC will blow up in your face!
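To make the structure of the triplet a bit more tangible, here is a small sketch that takes it apart with plain shell parameter expansion. The variable names target_cpu and target_libc are made up for this example, and the real parsing done by the GNU build system is far more involved than this.

```shell
TARGET="i686-linux-musl"

# Strip everything from the first dash on to get the CPU part,
# and everything up to the last dash to get the user land (libc) part.
target_cpu="${TARGET%%-*}"
target_libc="${TARGET##*-}"

echo "cpu: $target_cpu, libc: $target_libc"

# Complain early if the last part does not start with musl.
case "$target_libc" in
    musl*) echo "triplet selects musl" ;;
    *)     echo "warning: triplet does not end in musl" >&2 ;;
esac
```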

We also need the triplet for the local machine that we are going to build things on. We are going to use this later on when building GCC:

$ HOST=$(uname -m)-$OSTYPE
$ echo $HOST
x86_64-linux-gnu

OSTYPE is a variable that bash sets automatically. Some guides suggest using another bash variable, MACHTYPE, instead of the line above, however this delivered inconsistent results. On CentOS 7 I got this:

$ echo "$MACHTYPE"
x86_64-redhat-linux-gnu

On Arch Linux, however, it returned a different result, namely the same result that uname -m returns on both systems:

$ echo "$MACHTYPE"
x86_64

To get a similar result on Arch I had to piece the string together like this:

$ echo "$MACHTYPE-$OSTYPE"
x86_64-linux-gnu

This, however, produces garbage on the CentOS machine, so I used HOST as defined above.
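An alternative that is not used in the rest of this article is to ask the host compiler itself: gcc -dumpmachine prints the triplet the installed GCC was configured for. The sketch below tries that first and falls back to the uname and OSTYPE combination. Note that the exact string differs between distros (it may contain a vendor field), and I'm only assuming here that the configure scripts are happy with whatever it reports.

```shell
# Ask the host compiler for its triplet; fall back to uname and OSTYPE.
# OSTYPE is only set by bash, so default to linux-gnu in other shells.
if command -v gcc >/dev/null 2>&1; then
    HOST=$(gcc -dumpmachine)
else
    HOST="$(uname -m)-${OSTYPE:-linux-gnu}"
fi
echo "$HOST"
```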

The CPU and ARCH variables both hold the target CPU architecture. The latter is used for the kernel build system, the former for the GNU build system, as the two can't decide on a common scheme for naming things.

We will store the absolute path to the working directory inside a shell variable called BUILDROOT and create a few directories to organize our stuff in:

BUILDROOT=$(pwd)
mkdir -p "build" "src" "download" "toolchain/bin" "toolchain/$TARGET"

I stored the downloaded packages in the download directory and extracted them to a directory called src.

We will later build packages outside the source tree (GCC even requires that nowadays), inside a sub directory of build.

Our final toolchain will end up in a directory called toolchain. We already create the sub directories bin and $TARGET in advance for the kernel and binutils build systems. The former directory will hold binaries of our toolchain with target prefix, the latter will hold headers, libraries and binaries without prefix.

We store the toolchain location inside another shell variable that I called TCDIR and prepend the executable path of our toolchain to the PATH variable:

TCDIR="$BUILDROOT/toolchain"
export PATH="$TCDIR/bin:$PATH"

Right now, you should have a directory tree that looks something like this:

build/
toolchain/
- bin/
- i686-linux-musl/
src/
- binutils-2.27/
- cloog-0.18.1/
- gcc-6.2.0/
- gmp-6.1.1/
- isl-0.16.1/
- linux-4.8.5/
- mpc-1.0.3/
- mpfr-3.1.4/
- musl-1.1.15/
download/
- binutils-2.27.tar.bz2
- cloog-0.18.1.tar.gz
- gcc-6.2.0.tar.bz2
- gmp-6.1.1.tar.bz2
- isl-0.16.1.tar.bz2
- linux-4.8.5.tar.xz
- mpc-1.0.3.tar.gz
- mpfr-3.1.4.tar.bz2
- musl-1.1.15.tar.gz
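Getting from the download directory to the src directory in the tree above can be done with a short loop. This is a sketch with a made-up helper name (extract_all). It relies on tar detecting the compression format on its own when reading, which GNU tar does.

```shell
# Unpack every tar archive found in one directory into another.
extract_all() {
    from="$1"
    to="$2"
    mkdir -p "$to"
    for archive in "$from"/*.tar*; do
        [ -e "$archive" ] || continue   # no archives: the glob stayed unexpanded
        tar -xf "$archive" -C "$to" || return 1
    done
}

# Only run when the directory layout from above exists.
if [ -d download ]; then
    extract_all download src
fi
```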

I previously mentioned that we only need to "point" the GCC build system to the locations of its dependency libraries. To simplify things, I created a bunch of symlinks inside the GCC source dir for the dependencies:

cd "$BUILDROOT/src/gcc-6.2.0/"
ln -s "$BUILDROOT/src/cloog-0.18.1" "cloog"
ln -s "$BUILDROOT/src/gmp-6.1.1" "gmp"
ln -s "$BUILDROOT/src/isl-0.16.1" "isl"
ln -s "$BUILDROOT/src/mpc-1.0.3" "mpc"
ln -s "$BUILDROOT/src/mpfr-3.1.4" "mpfr"
cd "$BUILDROOT"

You could also install the libraries through a package management system and let the GCC build system use them instead. However, some are closely tied to GCC, and the GCC build system tends to be quite fragile, so I prefer building them along for the local GCC build.

Theoretically you could also build the libraries separately beforehand and then just point the GCC configure script to their location. But if you inspect the configure output from the GCC build system, you can see that it sets quite a number of specific options depending on the target, so it's probably easiest to just create the symlinks and let the GCC build system do its thing.

Extracting the kernel headers

export KBUILD_OUTPUT="$BUILDROOT/build/linux"
mkdir -p "$KBUILD_OUTPUT"
cd "$BUILDROOT/src/linux-4.8.5"
make O="$KBUILD_OUTPUT" ARCH="$ARCH" headers_check
make O="$KBUILD_OUTPUT" ARCH="$ARCH" INSTALL_HDR_PATH="$TCDIR/$TARGET" headers_install
cd "$BUILDROOT"

We create a build directory at $BUILDROOT/build/linux. Building the kernel outside its source tree works a bit differently compared to autotools based stuff.

According to the Makefile in the Linux source, you can either specify an environment variable called KBUILD_OUTPUT, or set a Makefile variable called O, where the latter overrides the environment variable. The snippet above shows both ways.

The headers_check target runs a few trivial sanity checks on the headers we are going to install. It checks if a header includes something nonexistent, if the declarations inside the headers are sane and if kernel internals are leaked into user space. For stock kernel tarballs, this shouldn't be necessary, but could come in handy when working with kernel git trees, potentially with local modifications.

Lastly (before switching back to the root directory), we actually install the kernel headers into e.g. "toolchain/i686-linux-musl/include" where the libc later expects them to be.

Since I've seen the question in a few forums: it doesn't matter if the version of the kernel headers exactly matches the kernel running on your target system. The kernel system call ABI is stable, so you can use headers from an older kernel. Only if you use headers from a much newer kernel might the libc end up exposing or using features that the kernel on your target does not yet support.
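If you want to see which kernel version a set of installed headers belongs to, the headers record it as LINUX_VERSION_CODE in linux/version.h, packed as (major << 16) + (minor << 8) + patch. The following sketch decodes that number. The helper name decode_kver is made up, and the header location is an assumption based on the install path used above.

```shell
# Decode a LINUX_VERSION_CODE value, which packs the version as
# (major << 16) + (minor << 8) + patch.
decode_kver() {
    code="$1"
    echo "$(( code >> 16 )).$(( (code >> 8) & 255 )).$(( code & 255 ))"
}

# Read the value from the installed headers, if they are there.
hdr="${TCDIR:-toolchain}/${TARGET:-i686-linux-musl}/include/linux/version.h"
if [ -f "$hdr" ]; then
    decode_kver "$(awk '/LINUX_VERSION_CODE/ { print $3 }' "$hdr")"
fi
```

For example, decode_kver 264197 prints 4.8.5, since 264197 is 4 * 65536 + 8 * 256 + 5.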

If you have some embedded board with a heavily modified vendor kernel and no upstream support, you are pretty much on your own. If, in addition to that, the vendor breaks the ABI, take the board and burn it (preferably outside; don't inhale the fumes).

Compiling cross binutils

We will compile binutils outside the source tree, inside the directory build/binutils. So first, we create the build directory and switch into it. To keep things clean, we use a shell variable srcdir to remember where we kept the binutils source. A pattern that we will repeat later:

mkdir -p "$BUILDROOT/build/binutils"
cd "$BUILDROOT/build/binutils"
srcdir="$BUILDROOT/src/binutils-2.27"

From the binutils build directory we run the configure script:

$srcdir/configure --prefix="$TCDIR" --target="$TARGET" \
                  --with-sysroot="$TCDIR/$TARGET" \
                  --disable-nls --disable-multilib

In an autotools build system, there are three different system triplets at work:

The --build option specifies what system we are building the package on.
The --host option specifies what system the binaries will run on.
The --target option is specific to compilation tools and specifies what system to generate output for.

We only set the --target option to tell the build system what target the assembler, linker and other tools should generate output for. We don't explicitly set the other options because the binutils build system is somewhat more robust than the GCC one and can figure out that it is being built for the local machine.

If we were doing a Canadian cross, we would set the --host option to the triplet of the existing cross toolchain in order to build binutils that run on a machine different from ours and generate output for yet another one.

The --prefix option specifies where to install files to, together with the make variable DESTDIR. When you run make DESTDIR=xy install on an automake generated makefile, it will install binaries to xy/prefix/bin, libraries to xy/prefix/lib, headers to xy/prefix/include and so on. The file type specific suffix can of course also be configured, but that is not really of interest right now.

The default prefix is /usr/local/. We set it to the top level directory of our toolchain (remember, TCDIR=$BUILDROOT/toolchain).
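To illustrate how DESTDIR and the prefix combine, here is a tiny sketch that just computes the resulting path. The function name install_path and the example directories are made up. The only thing it models is that DESTDIR gets prepended to the configured prefix.

```shell
# Show where an automake style "make DESTDIR=... install" puts a file:
# DESTDIR is simply prepended to the configured prefix.
install_path() {
    destdir="$1"
    prefix="$2"
    subdir="$3"
    echo "${destdir}${prefix}/${subdir}"
}

install_path "" "/usr/local" "bin"                 # default prefix, no DESTDIR
install_path "/tmp/stage" "/usr/local" "bin"       # staged install
install_path "" "/home/user/work/toolchain" "bin"  # hypothetical TCDIR as prefix
```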

The --with-sysroot option tells the build system that our system's root directory is not '/' but actually '$TCDIR/$TARGET' (e.g. "toolchain/i686-linux-musl") and that it should look for libraries and headers over there.

We disable the nls feature (native language support, i.e. i18n) mainly because we don't need it.

Some architectures support executing code for other, related architectures (e.g. an x86_64 CPU can run x86 code). On GNU/Linux distributions that support that, you typically have different versions of the same libraries (e.g. in lib/ and lib32/ directories) with programs for different architectures being linked to the appropriate libraries. We are only interested in a single architecture and don't need that, so we set --disable-multilib.

Now we can compile and install binutils:

make configure-host
make
make install
cd "$BUILDROOT"

The first make target, configure-host, is binutils specific and just tells it to check out the system it is being built on, i.e. your local machine, and make sure it has all the tools it needs for compiling. If it reports a problem, go fix it before continuing.

We then go on to build the binutils. You may want to speed up compilation by running a parallel build with make -j NUMBER-OF-PROCESSES.

Lastly, we run make install to install the binutils in the configured toolchain directory and go back to our root directory.

First pass GCC

Similar to above, we create a directory for building the compiler, change into it and store the source location in a variable:

mkdir -p "$BUILDROOT/build/gcc-1"
cd "$BUILDROOT/build/gcc-1"
srcdir="$BUILDROOT/src/gcc-6.2.0"

Notice how the build directory is called gcc-1. For the second pass, we will later create a different build directory. Not only does this out of tree build allow us to cleanly start afresh (because the source is left untouched), but current versions of GCC will flat out refuse to build inside the source tree.

$srcdir/configure --prefix="$TCDIR" --target="$TARGET" \
                  --build="$HOST" --host="$HOST" \
                  --with-sysroot="$TCDIR/$TARGET" \
                  --disable-nls --disable-shared --without-headers \
                  --disable-multilib --disable-decimal-float \
                  --disable-libgomp --disable-libmudflap \
                  --disable-libssp --disable-libatomic \
                  --disable-libquadmath --disable-threads \
                  --enable-languages=c --with-newlib --with-arch="$CPU"

The --prefix, --target and --with-sysroot options work just like above for binutils.

This time we explicitly specify --build (i.e. the system that we are going to compile GCC on) and --host (i.e. the system that the GCC will run on). In our case those are the same. We use the machine triplet that we pieced together earlier. It might be generally wise to always set those, but here I only set them for GCC, because of my experience with the fragile GCC build system. And yes, I have seen older versions of GCC throw a fit or assume complete nonsense if you don't explicitly specify those.

The option --with-arch gives the build system slightly more specific information about the target processor architecture.

We also disable a bunch of stuff we don't need. I already explained nls and multilib above. We also disable a bunch of optimization stuff and helper libraries. Among other things, we also disable support for dynamic linking and threads.

The option --without-headers tells the build system that we don't have the headers for the libc yet and that it should use minimal stubs instead where it needs them. The --with-newlib option is more of a hack. It tells the build system that we are going to use newlib as the C standard library. This isn't actually true, but it forces the build system to disable some libgcc features that depend on the libc.

The option --enable-languages accepts a comma separated list of languages that we want to build compilers for. For now, we only need a C compiler for compiling the libc.

If you are interested: the GCC installation manual has a detailed list of all GCC configure options.

make all-gcc all-target-libgcc
make install-gcc install-target-libgcc
cd "$BUILDROOT"

We explicitly specify the make targets for GCC and the cross-compiled libgcc for our target. We are not interested in anything else.

For the first make, you really want to specify a -j NUM-PROCESSES option here. Even the first pass GCC we are building here will take a while to compile on an ordinary desktop machine.
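If you don't want to hard code the number of processes, something like the following sketch can pick one for you. nproc is part of GNU coreutils and getconf serves as a fallback. The JOBS variable name is my own choice, and the sketch only prints the make command it would run rather than starting a build.

```shell
# Pick a job count for make: try nproc, then getconf, and fall back to 1
# if neither works.
JOBS=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "would run: make -j$JOBS all-gcc all-target-libgcc"
```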

C standard library

We create our build directory and change there:

mkdir -p "$BUILDROOT/build/musl"
cd "$BUILDROOT/build/musl"
srcdir="$BUILDROOT/src/musl-1.1.15"

Musl is quite easy to build but requires some special handling, because it doesn't use autotools. The configure script is actually a hand written shell script that tries to emulate some of the typical autotools handling:

CC="${TARGET}-gcc" $srcdir/configure --prefix=/ --target="$TARGET"

We set the variable CC to point to the cross compiler that we just built. Remember, we added the bin directory of the toolchain to our PATH.

We do the same thing for actually compiling musl and we explicitly set the DESTDIR variable for installing:

CC="${TARGET}-gcc" make
make DESTDIR="$TCDIR/$TARGET" install
cd "$BUILDROOT"
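Before starting the long second pass, it can't hurt to check that the libc actually landed in the sysroot. The sketch below looks for a header, the startup object code mentioned in the overview and the static libc. The helper name check_sysroot is made up, and the list of file names is what I would expect a musl install to produce, not an exhaustive list.

```shell
# Report which of the expected files are missing below a sysroot directory.
# Returns non-zero if at least one file is missing.
check_sysroot() {
    root="$1"
    rc=0
    for f in include/stdio.h lib/crt1.o lib/libc.a; do
        if [ ! -e "$root/$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return $rc
}

check_sysroot "${TCDIR:-toolchain}/${TARGET:-i686-linux-musl}" || echo "sysroot incomplete"
```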

Second pass GCC

mkdir -p "$BUILDROOT/build/gcc-2"
cd "$BUILDROOT/build/gcc-2"
srcdir="$BUILDROOT/src/gcc-6.2.0"

As you can see, we are using a different build directory for the second pass GCC.

$srcdir/configure --prefix="$TCDIR" --target="$TARGET" \
                  --build="$HOST" --host="$HOST" \
                  --with-sysroot="$TCDIR/$TARGET" \
                  --disable-nls --enable-languages=c,c++ \
                  --enable-c99 --enable-long-long \
                  --disable-libmudflap --disable-multilib \
                  --disable-libmpx --disable-libssp --disable-libsanitizer \
                  --with-arch="$CPU" \
                  --with-native-system-header-dir="/include"

Most of the options should be familiar already.

For the second pass, we also build a C++ compiler. The options --enable-c99 and --enable-long-long are C++ specific. When our final compiler runs in C++98 mode, we allow it to expose C99 functions from the libc through a GNU extension. We also allow it to support the long long data type standardized in C99.

You may wonder why we didn't have to build a libstdc++ between the first and second pass, like the libc. The source code for the libstdc++ comes with the G++ compiler and is built automatically like libgcc. On the one hand, it is really just a library that adds C++ stuff on top of libc and the compiler doesn't depend on it. On the other hand, C++ does not have a standard ABI and it is all compiler and OS specific. So compiler vendors will typically ship their own libstdc++ implementation with the compiler.

The options --disable-libmpx and --disable-libssp are special hacks that we need for building an x86 cross compiler on AMD64. Those two libraries are used in code generation for utilizing some 64 bit instruction set features. The GCC build system is smart enough not to compile those libraries for the x86 target (because it simply does not have those CPU features), but for some reason tries to link the final compiler against the libraries, generating a linking error. Disabling those libraries altogether will stop that from happening.

We --disable-libsanitizer because it simply won't build for musl. I tried fixing it, but it simply assumes too much about the nonstandard internals of the libc. A quick Google search reveals that it has lots of similar issues with all kinds of libc & kernel combinations, so even if I fix it on my system, you may run into other problems on your system or with different versions of packages. It even has different problems with different versions of glibc. Projects like buildroot simply disable it when using musl. It "only" provides the runtime libraries for the -fsanitize instrumentation options (AddressSanitizer and friends).

The option --with-native-system-header-dir is of special interest for our cross compiler. Since we pointed the root directory to $TCDIR/$TARGET, the compiler will look for headers in $TCDIR/$TARGET/usr/include. However, we didn't install them to /usr/include, we installed them to $TCDIR/$TARGET/include, so we have to tell the build system that it should look in /include (relative to the root directory) instead.

make
make install
cd "$BUILDROOT"

This time, we are going to build and install everything. You really want to do a parallel build here. On an ordinary desktop machine, this is going to take some time. You might want to go for a walk, watch an episode of Columbo or do whatever while this builds. If you are using a laptop or similar machine with thermal issues, you might want to open a window (assuming it is cold outside).

Testing the Toolchain

We quickly write our average hello world program into a file called test.c:

#include <stdio.h>

int main(void)
{
    puts("Hello, world");
    return 0;
}

We can now use our cross compiler to compile this C file:

$ ${TARGET}-gcc test.c

Running the file program on the resulting a.out will tell us that it has been properly compiled and linked for our target machine:

$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-i386.so.1, not stripped

Of course, you won't be able to run the program on your build system, except maybe for the x86 version, which will run on x86_64 if you have a 32 bit musl installed or if you compile it completely statically linked:

$ ${TARGET}-gcc -static -static-libgcc test.c
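If the file program is not at hand, the word size of the result can also be read straight from the ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32 bit and 2 for 64 bit objects. A minimal sketch, with a made-up function name elf_class, that assumes the file really is an ELF file and does not check the magic number:

```shell
# Print 32 or 64 depending on the EI_CLASS byte (offset 4) of an ELF file.
# No check of the ELF magic is done here.
elf_class() {
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

if [ -f a.out ]; then
    echo "a.out is a $(elf_class a.out) bit ELF file"
fi
```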

Cross compiling mtd-utils

Conclusion
