Log entry 2024-03-26

vanten-s 2024-03-26 20:47:51 +01:00
parent 0d2071e68d
commit d5b0db8e88
Signed by: vanten-s
GPG key ID: DE3060396884D3F2


@ -1,6 +1,365 @@
# Writing an OS in rust
# Introduction
I have tried multiple times to write an OS, but have always failed. Time to fail again :). I will be writing down my progress here, and maybe share it some day. I have previously used [osdev](https://osdev.org/) to learn and now I also use [OS in Rust](https://os.phil-opp.com/).
# Cross-compilation
The first step to making an OS is of course installing the correct compilers.
## binutils
Binutils is needed for linking and assembling. To install binutils I began by downloading the latest version from [sourceware](https://sourceware.org/pub/binutils/snapshots/) into the repository root directory.
After that I ran these commands in the repository:
```bash
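tar -xf binutils-x.y.z.tar.xz # Extract the source code (tar detects the xz compression itself)
mv binutils-x.y.z binutils # Move source into a binutils directory
cd binutils
mkdir build-binutils # Create build directory
cd build-binutils # Build from inside it, so that ../configure resolves
# configure expects an absolute path for --prefix
../configure --target=i686-elf --prefix="$(realpath ../../opt)" --with-sysroot --disable-nls --disable-werror
make # Compile
make install # Install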
```
## gcc
Gcc is needed for eventual C code and as a frontend for the linker. To install gcc I began by downloading the source code from [mirrorservice](https://mirrorservice.org/sites/sourceware.org/pub/gcc/snapshots/) into the repository directory. After I downloaded gcc I ran these commands:
```bash
tar -xf gcc-x.y.z.tar.xz # Extract the source code (tar detects the xz compression itself)
mv gcc-x.y.z gcc # Move source into a gcc directory
cd gcc
mkdir build-gcc # Create build directory
cd build-gcc # Build from inside it, so that ../configure resolves
# configure expects an absolute path for --prefix
../configure --target=i686-elf --prefix="$(realpath ../../opt)" --disable-nls --enable-languages=c,c++ --without-headers
make all-gcc # Compile gcc
make all-target-libgcc # Compile libgcc
make install-gcc # Install gcc
make install-target-libgcc # Install libgcc
```
In order to add gcc and binutils to `$PATH` I added this line to `shellHook` in `shell.nix`:
```bash
export PATH="$PATH:$PWD/opt/bin"
```
## rustup
[Rustup](https://rustup.rs/) is needed to install the correct target. So that anyone can replicate my toolchain, I used [nix](https://nixos.org). Following [this](https://nixos.wiki/wiki/Rust) tutorial, I got rustup installed. I also added
```bash
rustup component add rust-analyzer
rustup component add rust-src
```
to `shellHook` to add the necessary components for IDEs and for compiling to custom targets.
In the end, `shell.nix` looked like this:
```nix
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell rec {
  buildInputs = with pkgs; [
    clang
    llvmPackages_17.bintools
    rustup
    qemu
    grub2
    libisoburn
    bison
    flex
    gmp
    libmpc
    mpfr
    texinfo
    isl
  ];
  RUSTC_VERSION = "nightly"; # Required for some experimental cargo features
  hardeningDisable = [ "all" ]; # Required to compile gcc
  LIBCLANG_PATH = pkgs.lib.makeLibraryPath [ pkgs.llvmPackages_latest.libclang.lib ];
  shellHook = ''
    rustup component add rust-analyzer
    rustup component add rust-src # Needed for compiling to a custom target
    export PATH=$PATH:''${CARGO_HOME:-~/.cargo}/bin
    export PATH=$PATH:''${RUSTUP_HOME:-~/.rustup}/toolchains/nightly-x86_64-unknown-linux-gnu/bin/
    export PATH="$PATH:$PWD/opt/bin"
  '';
  # Add glibc, clang, glib, and other headers to bindgen search path
  BINDGEN_EXTRA_CLANG_ARGS =
    # Includes normal include path
    (builtins.map (a: ''-I"${a}/include"'') [
      # add dev libraries here (e.g. pkgs.libvmi.dev)
      pkgs.glibc.dev
    ])
    # Includes with special directory paths
    ++ [
      ''-I"${pkgs.llvmPackages_latest.libclang.lib}/lib/clang/${pkgs.llvmPackages_latest.libclang.version}/include"''
      ''-I"${pkgs.glib.dev}/include/glib-2.0"''
      ''-I${pkgs.glib.out}/lib/glib-2.0/include/''
    ];
}
```
# Bootstrap assembly
To boot the operating system you need a bootloader. For this project I will just be using grub. For grub to know how to boot the OS, the kernel binary has to contain a multiboot header. To get one I made an assembly file named `boot.s` containing this:
```asm
/* Declare constants for the multiboot header. */
.set ALIGN, 1<<0 /* align loaded modules on page boundaries */
.set MEMINFO, 1<<1 /* provide memory map */
.set FLAGS, ALIGN | MEMINFO /* this is the Multiboot 'flag' field */
.set MAGIC, 0x1BADB002 /* 'magic number' lets bootloader find the header */
.set CHECKSUM, -(MAGIC + FLAGS) /* checksum of above, to prove we are multiboot */
/*
Declare a multiboot header that marks the program as a kernel. These are magic
values that are documented in the multiboot standard. The bootloader will
search for this signature in the first 8 KiB of the kernel file, aligned at a
32-bit boundary. The signature is in its own section so the header can be
forced to be within the first 8 KiB of the kernel file.
*/
.section .multiboot
.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM
/*
The multiboot standard does not define the value of the stack pointer register
(esp) and it is up to the kernel to provide a stack. This allocates room for a
small stack by creating a symbol at the bottom of it, then allocating 16384
bytes for it, and finally creating a symbol at the top. The stack grows
downwards on x86. The stack is in its own section so it can be marked nobits,
which means the kernel file is smaller because it does not contain an
uninitialized stack. The stack on x86 must be 16-byte aligned according to the
System V ABI standard and de-facto extensions. The compiler will assume the
stack is properly aligned and failure to align the stack will result in
undefined behavior.
*/
.section .bss
.align 16
stack_bottom:
.skip 16384 # 16 KiB
stack_top:
/*
The linker script specifies _start as the entry point to the kernel and the
bootloader will jump to this position once the kernel has been loaded. It
doesn't make sense to return from this function as the bootloader is gone.
*/
.section .text
.global _start
.type _start, @function
_start:
/*
The bootloader has loaded us into 32-bit protected mode on a x86
machine. Interrupts are disabled. Paging is disabled. The processor
state is as defined in the multiboot standard. The kernel has full
control of the CPU. The kernel can only make use of hardware features
and any code it provides as part of itself. There's no printf
function, unless the kernel provides its own <stdio.h> header and a
printf implementation. There are no security restrictions, no
safeguards, no debugging mechanisms, only what the kernel provides
itself. It has absolute and complete power over the
machine.
*/
/*
To set up a stack, we set the esp register to point to the top of the
stack (as it grows downwards on x86 systems). This is necessarily done
in assembly as languages such as C cannot function without a stack.
*/
mov $stack_top, %esp
/*
This is a good place to initialize crucial processor state before the
high-level kernel is entered. It's best to minimize the early
environment where crucial features are offline. Note that the
processor is not fully initialized yet: Features such as floating
point instructions and instruction set extensions are not initialized
yet. The GDT should be loaded here. Paging should be enabled here.
C++ features such as global constructors and exceptions will require
runtime support to work as well.
*/
/*
Enter the high-level kernel. The ABI requires the stack is 16-byte
aligned at the time of the call instruction (which afterwards pushes
the return pointer of size 4 bytes). The stack was originally 16-byte
aligned above and we've pushed a multiple of 16 bytes to the
stack since (pushed 0 bytes so far), so the alignment has thus been
preserved and the call is well defined.
*/
call kernel_main
/*
If the system has nothing more to do, put the computer into an
infinite loop. To do that:
1) Disable interrupts with cli (clear interrupt enable in eflags).
They are already disabled by the bootloader, so this is not needed.
Mind that you might later enable interrupts and return from
kernel_main (which is sort of nonsensical to do).
2) Wait for the next interrupt to arrive with hlt (halt instruction).
Since they are disabled, this will lock up the computer.
3) Jump to the hlt instruction if it ever wakes up due to a
non-maskable interrupt occurring or due to system management mode.
*/
cli
1: hlt
jmp 1b
/*
Set the size of the _start symbol to the current location '.' minus its start.
This is useful when debugging or when you implement call tracing.
*/
.size _start, . - _start
```
This file creates a multiboot header and calls `kernel_main`.
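To assemble this I used the GNU assembler that I compiled earlier as part of binutils:
`i686-elf-as -o boot.o boot.s`
Note that the target file further down links `../build/boot.o` (relative to the `kernel` directory), so `boot.o` has to end up in a `build` directory in the repository root.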
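The checksum in the header works because the three fields have to add up to zero modulo 2^32. Here is a minimal sketch of that arithmetic in ordinary hosted Rust. It is not part of the kernel, and the function names are made up for illustration:

```rust
// Values copied from boot.s.
const MAGIC: u32 = 0x1BADB002;
const ALIGN: u32 = 1 << 0;
const MEMINFO: u32 = 1 << 1;
const FLAGS: u32 = ALIGN | MEMINFO;

/// Same computation as `-(MAGIC + FLAGS)` in the assembler, done in wrapping
/// 32-bit arithmetic. (Hypothetical helper, not from the original code.)
fn multiboot_checksum(magic: u32, flags: u32) -> u32 {
    0u32.wrapping_sub(magic.wrapping_add(flags))
}

/// The header is consistent when magic + flags + checksum wraps around to zero.
/// (Hypothetical helper, not from the original code.)
fn header_is_valid(magic: u32, flags: u32, checksum: u32) -> bool {
    magic.wrapping_add(flags).wrapping_add(checksum) == 0
}
```

With the values from `boot.s`, `multiboot_checksum(MAGIC, FLAGS)` comes out as `0xE4524FFB`, and `header_is_valid` accepts exactly that value.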
# Linking
## What is it?
In order to combine `boot.s` and the kernel we need to do something called "linking". This combines all the object files into one executable (an ELF file in this case). An object file is like a half-done executable. It contains references by name to symbols that it doesn't define itself, for example `printf`. It can't be run directly, because it doesn't know where `printf` is. When you link the object file, the linker combines it with the C standard library, which contains `printf`, and fills in the address.
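As a toy model of what resolving those references means, here is a rough sketch in hosted Rust. The `Object` struct and `link` function are invented for illustration and leave out almost everything a real linker does (relocations, sections, alignment):

```rust
use std::collections::HashMap;

/// A toy stand-in for an object file: the symbols it defines (with their
/// offset inside the object) and the symbols it only references by name.
struct Object {
    name: &'static str,
    size: usize,
    defined: Vec<(&'static str, usize)>,
    undefined: Vec<&'static str>,
}

/// Place the objects one after another starting at `base`, then check that
/// every referenced symbol is defined somewhere. Returns the final address of
/// each symbol, or an error naming the first unresolved reference.
fn link(objects: &[Object], base: usize) -> Result<HashMap<&'static str, usize>, String> {
    let mut addresses = HashMap::new();
    let mut cursor = base;
    for object in objects {
        for &(symbol, offset) in &object.defined {
            addresses.insert(symbol, cursor + offset);
        }
        cursor += object.size;
    }
    for object in objects {
        for &symbol in &object.undefined {
            if !addresses.contains_key(symbol) {
                return Err(format!("undefined reference to `{}` in {}", symbol, object.name));
            }
        }
    }
    Ok(addresses)
}
```

Linking a `boot.o` that references `kernel_main` together with an object that defines it succeeds. Linking `boot.o` alone gives an "undefined reference" error, which is the same kind of error a real linker reports.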
## For an operating system
I have to combine `boot.s` and the kernel in a specific way to preserve the multiboot header. To do this I created a *linker script*, which tells the linker how to lay out the object files. It is a file called `linker.ld` in the repository root, which contains this:
```linker
/* The bootloader will look at this image and start execution at the symbol
   designated as the entry point. */
ENTRY(_start)

/* Tell where the various sections of the object files will be put in the final
   kernel image. */
SECTIONS
{
	/* Start offset */
	. = 2M;

	/* First put the multiboot header, as it is required to be put very early
	   in the image or the bootloader won't recognize the file format.
	   Next we'll put the .text section. */
	.text BLOCK(4K) : ALIGN(4K)
	{
		*(.multiboot)
		*(.text)
	}

	/* Read-only data. */
	.rodata BLOCK(4K) : ALIGN(4K)
	{
		*(.rodata)
	}

	/* Read-write data (initialized) */
	.data BLOCK(4K) : ALIGN(4K)
	{
		*(.data)
	}

	/* Read-write data (uninitialized) and stack */
	.bss BLOCK(4K) : ALIGN(4K)
	{
		*(COMMON)
		*(.bss)
	}

	/* The compiler may produce other sections, by default it will put them in
	   a segment with the same name. Simply add stuff here as needed. */
}
```
I specified that I want to use this script in `i686-bare-metal.json`.
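To get a feel for what `. = 2M` and the `ALIGN(4K)` on every section do to the addresses, here is a rough model in hosted Rust. The function names are made up and the section sizes are whatever you pass in. It only models the start addresses and ignores everything else a real linker does:

```rust
/// Start address from the linker script (`. = 2M`).
const KERNEL_BASE: usize = 2 * 1024 * 1024;
/// Alignment used for every section (`ALIGN(4K)`).
const PAGE: usize = 4 * 1024;

/// Round `addr` up to the next multiple of `align` (`align` must be a power of two).
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Given (name, size) pairs in script order, return the (name, start address)
/// that each section would get. (Hypothetical helper for illustration.)
fn layout(sections: &[(&'static str, usize)]) -> Vec<(&'static str, usize)> {
    let mut cursor = KERNEL_BASE;
    let mut placed = Vec::new();
    for &(name, size) in sections {
        cursor = align_up(cursor, PAGE);
        placed.push((name, cursor));
        cursor += size;
    }
    placed
}
```

For example, with an invented `.text` of `0x1234` bytes, `.text` starts at `0x200000` and the following `.rodata` starts at `0x202000`, the next 4 KiB boundary after the end of `.text`.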
# Kernel
The last thing that `boot.s` does is call `kernel_main`. This is defined in a rust file. To set up the rust project I used `cargo new kernel`. This creates a (not entirely) empty rust project called `kernel`.
## Target
By default rust compiles for the host, which for me is x86_64-unknown-linux-gnu, but to make it compile to 32-bit x86 without an OS and with our own linker script I needed to define a custom target. In `kernel/i686-bare-metal.json` I put this:
```json
{
  "llvm-target": "i686-unknown-none",
  "data-layout": "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128",
  "arch": "x86",
  "target-endian": "little",
  "target-pointer-width": "32",
  "target-c-int-width": "32",
  "os": "none",
  "executables": true,
  "linker-flavor": "gcc",
  "linker": "i686-elf-gcc",
  "panic-strategy": "abort",
  "disable-redzone": true,
  "features": "-sse,+soft-float",
  "dynamic-linking": false,
  "relocation-model": "pic",
  "code-model": "kernel",
  "exe-suffix": ".elf",
  "has-rpath": false,
  "no-default-libraries": true,
  "position-independent-executables": false,
  "pre-link-args": {
    "gcc": ["-T", "../linker.ld", "-ffreestanding", "-nostdlib", "-lgcc", "../build/boot.o"]
  }
}
```
These fields tell the compiler to compile to 32-bit x86:
```json
"llvm-target": "i686-unknown-none",
"arch": "x86",
"target-pointer-width": "32",
"target-c-int-width": "32",
```
These tell it to use gcc for linking. `linker-flavor` says that the linker is driven like gcc, and `linker` names the gcc executable I want to use:
```json
"linker-flavor": "gcc",
"linker": "i686-elf-gcc",
```
And `pre-link-args` tells rust to link the kernel with `../build/boot.o` (the assembled `boot.s`), freestanding and without a standard library, using my linker script `../linker.ld`.
To use this custom target I needed a cargo configuration, so I wrote this in `kernel/.cargo/config.toml`:
```toml
[build]
target = "i686-bare-metal.json" # Use the custom target

[unstable]
# A custom target has no prebuilt standard library, so cargo has to compile core itself.
build-std = ["core", "compiler_builtins"]
build-std-features = ["compiler-builtins-mem"] # Also build memcpy, memset and friends
```
## Code
Just to test the kernel I put this in `kernel/src/main.rs`
```rust
#![no_std] // Don't compile with std
#![no_main] // Don't compile with a main function

#[no_mangle] // Keep the symbol name, so that boot.s can call it
pub extern "C" fn kernel_main() -> ! { // C ABI, because it is called from assembly
    let vga_buffer = 0xb8000 as *mut u8; // Make a pointer to address 0xB8000
    let string: &[u8] = b":3"; // Define a string to print

    for (i, &byte) in string.iter().enumerate() { // For every character in the string
        unsafe {
            *vga_buffer.offset(i as isize * 2) = byte; // Print the character
            *vga_buffer.offset(i as isize * 2 + 1) = 0xf; // Set the color to white
        }
    }

    loop {} // End of program
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```
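The `* 2` and `+ 1` in the loop come from the layout of the VGA text buffer: every character cell is two bytes, first the character and then an attribute byte, with the foreground color in the low 4 bits and the background in the high 4 bits. Here is a sketch of the same idea as plain functions that can be tried out on a normal array in hosted Rust. The names `attribute` and `write_text` are made up for illustration and are not from the kernel:

```rust
/// Size of the standard VGA text mode in character cells.
const VGA_WIDTH: usize = 80;
const VGA_HEIGHT: usize = 25;

/// Pack a foreground and a background color (4 bits each) into one attribute byte.
fn attribute(foreground: u8, background: u8) -> u8 {
    ((background & 0x0f) << 4) | (foreground & 0x0f)
}

/// Write `text` into `buffer` starting at (`column`, `row`), stopping at the
/// end of the buffer. Returns how many characters were written.
fn write_text(buffer: &mut [u8], column: usize, row: usize, text: &[u8], attr: u8) -> usize {
    let start = (row * VGA_WIDTH + column) * 2; // Two bytes per cell
    let mut written = 0;
    for (i, &byte) in text.iter().enumerate() {
        let cell = start + i * 2;
        if cell + 1 >= buffer.len() {
            break; // Don't write past the end of the screen
        }
        buffer[cell] = byte; // Character
        buffer[cell + 1] = attr; // Color
        written += 1;
    }
    written
}
```

On a hosted system the buffer can be a `vec![0u8; VGA_WIDTH * VGA_HEIGHT * 2]`. In the kernel it would instead have to be a slice over the memory at `0xb8000`, for example one built with `core::slice::from_raw_parts_mut` inside `unsafe`. `attribute(0xf, 0)` gives the same `0xf` (white on black) that the code above uses.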
# Building the kernel
To build the kernel I just run `cargo build` in the `kernel` directory. This will produce a binary called `kernel/target/i686-bare-metal/debug/kernel.elf`. I now have a multiboot enabled kernel :)
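To check that the header really survived linking, the search that the comments in `boot.s` describe (first 8 KiB of the file, 32-bit aligned) can be imitated in a few lines of hosted Rust. This is only a sketch of that search, not how grub actually implements it, and `find_multiboot_header` is a made-up name:

```rust
const MULTIBOOT_MAGIC: u32 = 0x1BADB002;
/// The header has to sit completely within the first 8 KiB of the file.
const SEARCH_LIMIT: usize = 8192;
/// magic + flags + checksum, 4 bytes each.
const HEADER_LEN: usize = 12;

/// Read a little-endian u32 at `offset`. The caller checks the bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]])
}

/// Return the file offset of the first valid multiboot header, if there is one.
fn find_multiboot_header(image: &[u8]) -> Option<usize> {
    let limit = image.len().min(SEARCH_LIMIT);
    let mut offset = 0;
    while offset + HEADER_LEN <= limit {
        let magic = read_u32_le(image, offset);
        let flags = read_u32_le(image, offset + 4);
        let checksum = read_u32_le(image, offset + 8);
        // The three fields have to add up to zero modulo 2^32.
        if magic == MULTIBOOT_MAGIC && magic.wrapping_add(flags).wrapping_add(checksum) == 0 {
            return Some(offset);
        }
        offset += 4; // The header is aligned to 4 bytes
    }
    None
}
```

Reading `kernel/target/i686-bare-metal/debug/kernel.elf` with `std::fs::read` and passing the bytes to this function should give `Some(offset)` if the linker script put `.multiboot` early enough in the file.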
# File structure
```
geos/
| opt/
| | gcc and binutils binaries and lib.
| kernel/
| | rust project
| gcc/
| | gcc source code
| binutils/
| | binutils source code
```
# Assembler