Rust: Adventure Into System Software For ARM
modified: 2015-03-22 04:22:15

1. Rust: Adventure Into System Software For ARM
1.1. Minimal Program
1.2. To ARM Or Not To ARM?
1.3. Tools Needed
1.4. QEMU Saves The Day
1.5. More Structure And.. A Kernel?
1. Rust: Adventure Into System Software For ARM

To preface this and explain my reasoning: I had decided to try using Rust for some system-level programming on bare metal. I have done a lot of this on X86 and ARM using C, but never with Rust. All I had at the time was rustc installed from the pre-built Windows installer, and a virtual machine with 300MB of RAM running an older version of Debian Linux with a GCC cross-compiler for ARM and binutils.

1.1. Minimal Program

First, I wanted to see how much I could strip away and still have a minimal Rust program in the output binary. After a little research I came up with:

    #![feature(lang_items)]
    #![no_std]

    #[lang="sized"]
    trait Sized {}
    #[lang="sync"]
    trait Sync {}

    #[start]
    fn main(argc: int, argv: *const *const u8) -> int {
        0
    }

I checked the output with --emit=asm and, as expected, it contained only the main function! It did, however, include a prologue and epilogue for the function. After some research I found that there is currently no way to tell Rust to emit a naked function. This means I will either have to hack around it or link to an external assembly file. However, I put this on hold to look at actually targeting my desired architecture, which is different from my development platform.

1.2. To ARM Or Not To ARM?

In my case I wanted to target ARM, because for a hobby project I had no real desire to deal with 16-bit mode, GRUB, or all the intricate and complicated X86 and X86-64 structures. I really like the ARM architecture and I feel like I have burned out playing with X86, so something new and different is always great. So my next goal was to see how to get rustc to emit an object file targeting a specific architecture.

If you do not already know, Rust uses LLVM. Rust produces IR code that LLVM reads, and LLVM then translates that into machine code for the target architecture.

It seems that when rustc is compiled, LLVM is compiled along with it, with support for either a default or a specified set of architectures. I am not sure whether it compiles for all architectures by default, although I doubt it does. Nevertheless, the LLVM you end up with either supports your target architecture or it does not. In my case it did. The easiest way to find out is to continue along as I describe and see whether it works.

So, I found the --target option for rustc. It accepts an identifier that names the target. The identifier selects either an internal description built into rustc under the librustc_back module, or an external JSON file which describes the target. The description includes things such as the target endianness, word size, architecture, OS, and various other details. It allows LLVM to correctly generate machine instructions and system calls, and also helps Rust select libraries. These libraries include core, std, and friends. Each of these libraries that is written in Rust must be compiled for the specific target. However, if you are doing system-level programming like me you can get away with using none of these libraries, or only some of them!

In my case I wanted bare metal, which means I would use neither std nor even core. I was advised against this, but since I love a challenge and a learning experience I decided to ignore the advice. Since I started out on my Windows box, I checked the source code for rustc to determine how it selects the target with the --target option. I found out that rustc includes an internal description for --target=arm-unknown-linux-gnueabi and by default expects a directory named arm-unknown-linux-gnueabi to contain all the libraries such as std, core, and friends. So I decided to try an experiment: simply copy my x86_64-pc-windows-gnu directory and rename the copy to arm-unknown-linux-gnueabi.

I then took my bare metal source code shown near the beginning, ran rustc test.rs --emit=asm --target=arm-unknown-linux-gnueabi, and it worked! It produced test.s, which contained ARM instructions. I then decided to experiment and removed the #![no_std] in the source. As expected it failed with:

    error: couldn't find crate `std` with expected target triple arm-unknown-linux-gnueabi

It likely found a std crate, but that crate was compiled for x86_64-pc-windows-gnu, so rustc rejected it and either errored out or continued to search unsuccessfully. So this is actually a success, as it means I can to some extent target ARM with my Windows build without doing a custom compile.
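To make the target identifier a little more concrete, here is a minimal sketch, written in current Rust rather than the 2015 nightly used in this article, that splits an identifier like arm-unknown-linux-gnueabi into its parts and mimics the kind of comparison behind the error above. The names TargetTriple, parse_triple, and crate_matches_target are my own inventions for illustration and are not rustc's real API. Real identifiers are also less regular than the arch-vendor-os-env form assumed here.

```rust
/// Parts of a target identifier such as `arm-unknown-linux-gnueabi`.
/// Hypothetical type for illustration; rustc's real target
/// descriptions carry far more (endianness, word size, and so on).
#[derive(Debug, PartialEq)]
struct TargetTriple<'a> {
    arch: &'a str,
    vendor: &'a str,
    os: &'a str,
    env: Option<&'a str>,
}

/// Split `arch-vendor-os[-env]` into its parts.
/// Returns `None` when fewer than three parts are present.
fn parse_triple(s: &str) -> Option<TargetTriple<'_>> {
    let mut parts = s.splitn(4, '-');
    let arch = parts.next()?;
    let vendor = parts.next()?;
    let os = parts.next()?;
    let env = parts.next();
    if arch.is_empty() || vendor.is_empty() || os.is_empty() {
        return None;
    }
    Some(TargetTriple { arch, vendor, os, env })
}

/// A crate built for one target is only usable when its identifier
/// matches the requested one exactly. This is an assumption about the
/// spirit of the check, not a copy of what rustc does internally.
fn crate_matches_target(crate_triple: &str, wanted: &str) -> bool {
    crate_triple == wanted
}
```

Under these assumptions, crate_matches_target("x86_64-pc-windows-gnu", "arm-unknown-linux-gnueabi") is false, which is the situation my renamed directory created: the files were there, but they were built for the wrong target.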

1.3. Tools Needed

Well, I decided I would really like to use objcopy and objdump and have a pure Linux build setup. Also, since I run Linux in a VM with a directory shared between the two, I can write code in Windows, which is nice, and then do the actual build on Linux. The problem is that I am using Debian, which has no official Rust package, so I decided to give compiling it a shot. My VM only had 300MB of RAM, and I had read a few times that you need at least 2GB, but I tried anyway. Eventually the compile failed with an out-of-memory error, so I increased my VM to 2GB and the compile completed fine.

I tried running rustc directly without running the installation script, but got errors loading shared libraries. So I ran make install. At first I hit an illegal instruction when running my VM at 300MB, and discovered this was caused by an out-of-memory problem while rustdoc was running during make install. Once I increased memory to 512MB, make install completed successfully.

At this point I have a working rustc, which is awesome first of all. The next goal was to get it to target ARM and then transform the output into a flat binary that can be executed by the target processor. On my Windows machine I had copied the target directory containing all the precompiled Rust libraries before trying to compile for the target. This time, since I did not immediately know where the library folder resided on my Linux machine, I went ahead and tried to target ARM directly:

    rustc test.rs --emit=asm --target=arm-unknown-linux-gnueabi

To my amazement it did not complain about a missing directory! However, it did try to load the crate std and complained that it could not find it for the target. So I went back into my code and realized I had commented out #![no_std]. Once I uncommented the directive, rustc happily emitted 32-bit ARM instructions in ELF32 little-endian format! Excellent! Of course I will have to compile any crate I want from the standard library for the target myself, where that is possible (I have heard that crate core can be used this way and that doing so is encouraged), but I can live without them for the time being, and maybe for a long time as a challenge.

As I said before, my next challenge was to transform the output into a flat binary, which I have done before when working with C. The process is, or should be, almost exactly the same with Rust, since I have reached the object file stage. One final hurdle remains, and it should be easy to solve: getting a jump instruction at the top of my flat binary, with either a naked function or a series of instructions to set up a stack. Once a stack is set up I can branch or call into a Rust function like normal and continue setting up the system from that point.

I finally got objcopy and ld up and running nicely. Also, rustc targeting 32-bit ARM is producing what is essentially a naked function, and I got that function at the top of the object file, which I assume is due to ordering by function name. Here is my current source:

    #![feature(lang_items)]
    #![no_std]
    #![allow(unused_variables)]
    #![allow(dead_code)]
    #![feature(asm)]

    #[lang="sized"]
    trait Sized {}
    #[lang="sync"]
    trait Sync {}

    #[lang = "exchange_heap"]
    #[experimental = "may be renamed; uncertain about custom allocator design"]
    pub static HEAP: () = ();

    /// A type that represents a uniquely-owned value.
    #[lang = "owned_box"]
    #[unstable = "custom allocators will add an additional type parameter (with default)"]
    pub struct Box<T>(*mut T);

    struct KernelImageGlobal {
        heapoffset:     uint,
        curheapndx:     uint
    }

    static mut KERNELIMAGEGLOBAL: KernelImageGlobal = KernelImageGlobal {
        heapoffset:     0,
        curheapndx:     0
    };

    unsafe fn __topofimage() {
        asm!("mov sp, $0" : : "r"(0x1000u));
        /*
            These are things that I really do not want to implement
            at the moment. Also my `as` implementation has broken
            and I do not want to use `gas`, also my goal was to get
            everything in Rust - does this not count!!
        */
        asm!("__morestack:");
        asm!("__aeabi_unwind_cpp_pr0:")
    }

    #[lang="exchange_malloc"]
    #[inline]
    unsafe fn exchange_malloc(size: uint, align: uint) -> *mut u8 {
        /*
            The most simple heap possible!
        */
        let ptr: uint;
        ptr = KERNELIMAGEGLOBAL.heapoffset + KERNELIMAGEGLOBAL.curheapndx;
        KERNELIMAGEGLOBAL.curheapndx += size;

        ptr as *mut u8
    }

    #[lang="exchange_free"]
    #[inline]
    unsafe fn exchange_free(ptr: *mut u8, old_size: uint, align: uint) {
        /*
            The most simple heap possible. It does not support
            deallocation!
        */
    }

    #[start]
    fn main(argc: int, argv: *const *const u8) -> int {
        let x: Box<uint> = box 3u;

        0
    }

I essentially leave __aeabi_unwind_cpp_pr0 and __morestack as empty labels. The build commands are:

    rustc main.rs --emit=obj --target=arm-unknown-linux-gnueabi
    arm-ld main.o -o rustk.elf
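The exchange_malloc in the listing above is about the simplest bump allocator possible: it ignores align, never checks for running out of space, and with heapoffset at 0 its first allocation is address 0. As a rough sketch of what a slightly more careful version could look like in current Rust (not the 2015 nightly syntax above), the following keeps the same two pieces of state but honors alignment and bounds. BumpHeap and its field names are hypothetical, and it hands back plain addresses so it can be tried on a host machine.

```rust
/// A bump allocator over a fixed address range. Hypothetical sketch:
/// `base` plays the role of `heapoffset` and `next` the role of
/// `curheapndx` from the listing above.
struct BumpHeap {
    base: usize, // start address of the heap region
    next: usize, // bytes consumed so far, relative to `base`
    size: usize, // total bytes available in the region
}

impl BumpHeap {
    const fn new(base: usize, size: usize) -> Self {
        BumpHeap { base, next: 0, size }
    }

    /// Reserve `size` bytes aligned to `align` (a power of two).
    /// Returns the start address, or `None` if the request cannot be
    /// satisfied. Like the original, nothing is ever freed.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        if !align.is_power_of_two() {
            return None;
        }
        let start = self.base.checked_add(self.next)?;
        // Round up to the next multiple of `align`.
        let aligned = start.checked_add(align - 1)? & !(align - 1);
        let end = aligned.checked_add(size)?;
        if end > self.base.checked_add(self.size)? {
            return None; // out of heap space
        }
        self.next = end - self.base;
        Some(aligned)
    }
}
```

With a heap created by BumpHeap::new(0x1000, 16), a 1-byte allocation followed by a 4-byte allocation with 4-byte alignment lands at 0x1000 and 0x1004, the three padding bytes in between simply being wasted.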

After fiddling with the flat binary output some more I found a few problems and made some changes to the source, mostly to do with dead code elimination by the compiler. It seems that when rustc runs with --opt-level 3 it does not treat symbols referenced only from assembly as live code, so it eliminates that code and causes linker errors. So I had to do a little hacking. At the moment my entry function can be written to work with either optimization level zero or optimization level three, but not both, because each needs a different method to keep the compiler from removing code that it thinks is dead.

1.4. QEMU Saves The Day

I just realized I need qemu-system-arm! So I grabbed a copy of the latest stable source and compiled it to produce the binary. Next, I compiled the final form of my source:

    #![feature(lang_items)]
    #![no_std]
    #![allow(unused_variables)]
    #![allow(dead_code)]
    #![feature(asm)]

    #[lang="sized"]
    trait Sized {}
    #[lang="sync"]
    trait Sync {}

    static GDT: [u32, ..5] = [0, 1, 2, 3, 4];

    #[lang = "exchange_heap"]
    #[experimental = "may be renamed; uncertain about custom allocator design"]
    pub static HEAP: () = ();

    /// A type that represents a uniquely-owned value.
    #[lang = "owned_box"]
    #[unstable = "custom allocators will add an additional type parameter (with default)"]
    pub struct Box<T>(*mut T);

    struct KernelImageGlobal {
        heapoffset:     uint,
        curheapndx:     uint
    }

    static mut KERNELIMAGEGLOBAL: KernelImageGlobal = KernelImageGlobal {
        heapoffset:     0,
        curheapndx:     0
    };

    #[start]
    fn main(argc: int, argv: *const *const u8) -> int {
    //unsafe fn __topofimage() {
        unsafe {
            asm!("mov sp, $0" : : "i"(0x2000u));
        }

        kstart();

        unsafe {
            asm!("b kstart");
            /*
                These are things that I really do not want to implement
                at the moment. Also my `as` implementation has broken
                and I do not want to use `gas`, also my goal was to get
                everything in Rust - does this not count!!
            */
            asm!("__morestack:");
            asm!("__aeabi_unwind_cpp_pr0:");
        }

        0
    }

    #[lang="exchange_malloc"]
    #[inline]
    unsafe fn exchange_malloc(size: uint, align: uint) -> *mut u8 {
        /*
            The most simple heap possible!
        */
        let ptr: uint;
        ptr = KERNELIMAGEGLOBAL.heapoffset + KERNELIMAGEGLOBAL.curheapndx;
        KERNELIMAGEGLOBAL.curheapndx += size;

        ptr as *mut u8
    }

    #[lang="exchange_free"]
    #[inline]
    unsafe fn exchange_free(ptr: *mut u8, old_size: uint, align: uint) {
        /*
            The most simple heap possible. It does not support
            deallocation!
        */
    }

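    const SERIAL_BASE: u32 = 0x10009000;
    const SERIAL_FLAG_REGISTER: u32 = 0x18;
    // Transmit FIFO full (TXFF) is bit 5 of the flag register.
    const SERIAL_BUFFER_FULL: u32 = 1 << 5;

    fn kserdbg_putc(c: u8) {
        unsafe {
            let mem: *mut u32 = (SERIAL_BASE + SERIAL_FLAG_REGISTER) as *mut u32;

            // Wait while the transmit buffer is full. Note this is a
            // plain (non-volatile) read, so the optimizer may hoist it.
            while (*mem & SERIAL_BUFFER_FULL) != 0 {}

            let mem: *mut u32 = SERIAL_BASE as *mut u32;

            *mem = c as u32;
        }
    }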

    #[no_mangle]
    extern fn kstart() {
        /*
            Print A then B then C to the serial h/w port.
        */
        kserdbg_putc(65);
        kserdbg_putc(66);
        kserdbg_putc(67);
        loop { }
    }

My build script looked like this:

    rustc main.rs --opt-level 3 --emit=obj --target=arm-unknown-linux-gnueabi
    arm-ld main.o -o rustk.elf
    arm-objcopy -j .text -O binary rustk.elf rustk.bin

And, my launch of QEMU looked like this:

    qemu-system-arm -m 8 -kernel rustk.bin -serial stdio -machine realview-eb-mpcore

The -machine realview-eb-mpcore option is required. The integratorcp machine sports the same serial hardware, but it is located at a different MMIO address.
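Because the only thing that changes between such boards is the base address, the serial routine can take the base as a parameter. Below is a minimal sketch in current Rust, assuming the UART is a PL011-style device (which is what I believe QEMU models for this board): data register at offset 0x00, flag register at offset 0x18, and bit 5 of the flag register meaning the transmit FIFO is full. The function and constant names are hypothetical, and the accesses are volatile so the compiler cannot drop or reorder them.

```rust
use core::ptr::{read_volatile, write_volatile};

// Register positions as 32-bit word indices from the UART base.
// Assumed PL011-style layout; check the board manual before relying on it.
const UART_DATA_WORD: usize = 0x00 / 4;
const UART_FLAG_WORD: usize = 0x18 / 4;
const UART_TX_FULL: u32 = 1 << 5; // transmit FIFO full (assumed TXFF)

/// Write one byte to the UART whose registers start at `base`.
///
/// Safety: `base` must point to at least seven readable and writable
/// 32-bit words laid out as described above.
unsafe fn uart_putc(base: *mut u32, c: u8) {
    unsafe {
        // Spin while the transmit FIFO reports full.
        while read_volatile(base.add(UART_FLAG_WORD)) & UART_TX_FULL != 0 {}
        write_volatile(base.add(UART_DATA_WORD), c as u32);
    }
}

/// Write every byte of `s` through `uart_putc`.
///
/// Safety: same requirements as `uart_putc`.
unsafe fn uart_puts(base: *mut u32, s: &str) {
    for b in s.bytes() {
        unsafe { uart_putc(base, b) };
    }
}
```

On the realview-eb-mpcore machine the base would be 0x10009000 cast to *mut u32. On a host machine the same functions can be exercised against an ordinary array of u32 standing in for the register block, which is a handy way to check the offsets before booting anything.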

1.5. More Structure And.. A Kernel?

Well, when I started I wanted to write some system software. I did not say outright that it would be a kernel, because at first it was not one, and at this time it truly is not one yet. But to get any more advanced it will have to take the form of one. A simple kernel at that.

I knew that I needed more structure, and from my previous experience, especially with ARM, I knew that I needed a flexible build system that could handle multiple boards. In C I had built a fancy builder, written in Python, that took care of piecing the kernel together from a bunch of object files. In previous projects I had the structure of a board driver, a kernel, and other modules. The other modules included drivers, programs, or data. These modules were attached to the kernel and allowed it to boot the system to the point where it could, if needed, access primary storage and finish loading the system.

Rust is a little different compared to C. Maybe I should say Rust is a lot different, especially in how code is compiled with its system of modules and crates. What I needed was to learn a bit more and experiment with Rust's crate, module, and library system, to come up with a scheme similar to the one in C where one could substitute certain libraries, modules, and crates. For example, I might have the kernel use a crate called board, where the board crate is compiled from a selected board and each board resides in its own directory. This would allow compiling the kernel with different board support, add structure, and help in dividing up the design of our little system software.

I have decided on a three-crate approach for the main kernel image: a crate each for the board, the kernel, and core. I had to make a core crate because some code is shared between kernel and board, and it needs to be present to build board, which in turn is needed to build the kernel. So the core crate is compiled first, then the board, and finally the kernel. The board crate is chosen and built depending on the target board, so it changes with the board the kernel is targeting. I think the actual modules containing, for example, a driver will be built and loaded separately. However, this may enlarge the kernel, and I may try to find a way to support more of a built-in driver method, though I fear it will still involve compiling each driver to a crate and pulling it in at compile time instead of at run time. Either way I will certainly always support a run-time method, as this will be required to load drivers from primary storage after the kernel has booted and initialized.
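To picture the three-crate split, here is a rough single-file sketch in current Rust where modules stand in for the crates. It is only an illustration of the dependency order (shared code first, then a board, then the kernel) and not the layout I actually ended up with. Every name in it is hypothetical, the second board and its address are placeholders, and it uses format! from std so that it can be tried on a host machine rather than on bare metal.

```rust
/// Stand-in for the shared `core` crate: definitions that both the
/// board and the kernel need, so it has to be built first.
mod shared {
    pub trait Board {
        const NAME: &'static str;
        const SERIAL_BASE: usize;
    }
}

/// One module per board. In the real layout each board would live in
/// its own directory and be compiled as the crate named `board`.
mod board_realview {
    use super::shared::Board;

    pub struct RealviewEb;

    impl Board for RealviewEb {
        const NAME: &'static str = "realview-eb-mpcore";
        const SERIAL_BASE: usize = 0x1000_9000;
    }
}

/// A made-up second board, only here to show the substitution.
mod board_example {
    use super::shared::Board;

    pub struct ExampleBoard;

    impl Board for ExampleBoard {
        const NAME: &'static str = "example-board";
        const SERIAL_BASE: usize = 0x2000_0000; // placeholder address
    }
}

/// Stand-in for the `kernel` crate: it only knows the `Board` trait,
/// never a concrete board, so swapping boards needs no kernel changes.
mod kernel {
    use super::shared::Board;

    pub fn banner<B: Board>() -> String {
        format!("booting on {} (serial at {:#x})", B::NAME, B::SERIAL_BASE)
    }
}
```

Calling kernel::banner::<board_realview::RealviewEb>() and kernel::banner::<board_example::ExampleBoard>() runs the same kernel code against two different boards, which is the substitution I am after, just done with generics inside one file instead of by choosing which crate gets built.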