ARM Bare-metal Programming
ARM Cortex processors and microcontrollers are ubiquitous because the architecture has been so successful, so it’s useful to understand how they work and how to use them as tools. When getting started with embedded development on ARM Cortex, it might seem like a complex and difficult platform, since a lot of the work is usually done for you: you download tools that set up projects for you, configure and initialize the devices, and pull in dependencies with some API for the device peripherals. It’s not obvious what is going on “under the hood” during those steps or what some of the resulting code is doing. Here I’ll document the steps to get an ARM Cortex microcontroller up and running with only simple tools and eventually make it perform more advanced tasks.
Source code is available at my GitHub. In this post an LED is toggled. In the repository there is code that does more advanced tasks, such as configuring the microcontroller as a USB PC mouse, sending I2C data or running code asynchronously using interrupt service routines.
Article summary: I program an ARM Cortex-M3 microcontroller, an STM32, using nothing more than basic tools such as the GCC-ARM toolchain.
The tools needed are the GCC-ARM toolchain, a text editor and a way to program the microcontroller. On Ubuntu the toolchain is installed by executing
apt install gcc-arm-none-eabi binutils-arm-none-eabi
To get a basic hello world program running, one that blinks an LED on and off, the steps are the following:
Define the microcontroller memory layout for the compiler using a linker-script.
Create the startup-routine. It’s a sort of bootloader.
Create the main application which is hello world.
Compile the programs and flash them to the microcontroller.
The microcontroller I’ll be using is STM32F103C8. The datasheet is available here.
The linker script is used to define the memory layout of the output binary, that is the program flashed onto the microcontroller and what it will execute on boot.
The image below is an excerpt from the datasheet and shows at which addresses flash memory and SRAM are located. Because the two memories are mapped at different addresses, the linker script is needed: data must be placed at addresses where there is actual hardware memory, and certain types of data belong in different types of memory. For example, data in SRAM is lost when power is removed, but SRAM is faster to access than flash memory, which makes it the place for variables that change at run time. Flash memory keeps its contents without power, which makes it perfect for storing read-only data such as the actual program instructions.
Data will be segmented depending on the type. Here data is segmented using “text”, “rodata”, “data” and “bss”. This wikipedia article summarizes the difference between some of these. A custom segment called “vectors” will also be used. It is an architecture specific data-region that contains addresses for the stack-top and addresses for exception handlers. Read more about it here.
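As a rough illustration of how C code typically maps onto these segments, here is a small sketch. The variable and function names are made up for this example, and the exact placement depends on the compiler and its flags (older GCC defaults may put uninitialized globals in COMMON instead of .bss, which the linker script also collects).

```c
#include <stdint.h>

/* Read-only constant: typically placed in .rodata, so it can stay in flash. */
const uint32_t blink_period_ms = 500;

/* Initialized global: typically placed in .data. Its initial value is stored
 * in flash and has to be copied to SRAM before main runs. */
uint32_t blink_count = 3;

/* Uninitialized global: typically placed in .bss (or COMMON). C requires it
 * to read as zero, so the startup code has to clear it. */
uint32_t tick_counter;

/* Function body: the machine code is placed in .text. */
uint32_t total_blink_time_ms(void) {
    return blink_period_ms * blink_count + tick_counter;
}
```

On the microcontroller, `blink_count` only holds 3 and `tick_counter` only holds 0 if the startup routine has done its job before main is called.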
Here is the actual link script:
link_script.ld
/* Linkfile for a STM32F103C8*
* Robin Isaksson 2016
*/
MEMORY {
    SRAM  : ORIGIN = 0x20000000, LENGTH = 20K
    FLASH : ORIGIN = 0x08000000, LENGTH = 64K
}

SECTIONS {
    _STACK_TOP = 0x20000000 + 20K; /* Set the stack-top symbol to the end of SRAM */

    . = ORIGIN(FLASH); /* Set current address to the origin of the flash memory */
    .text : {
        KEEP ( * (vectors) ); /* Custom segment, ARM Cortex vector table */
        * (.text);
    } > FLASH
    .rodata : {
        * (.rodata);
    } > FLASH
    FLASH_DATA_START = .; /* Where we will load from flash when writing to SRAM */

    /* Now we'll configure where we will keep our data outside of flash */
    . = ORIGIN(SRAM); /* Set current address to the origin of the SRAM */
    SRAM_DATA_START = .;
    .data : AT (FLASH_DATA_START) {
        * (.data);
    } > SRAM
    SRAM_DATA_END = .;
    SRAM_DATA_SIZE = SRAM_DATA_END - SRAM_DATA_START;

    BSS_START = .;
    .bss : {
        * (.bss);
        * (COMMON);
    } > SRAM
    BSS_END = .;
    BSS_SIZE = BSS_END - BSS_START;
}

ENTRY(main)
The memory regions are defined in the MEMORY block. The starting addresses and lengths of the flash and SRAM regions are taken from the datasheet: the flash memory is 64K and the SRAM 20K. Next, the data segments are defined in the SECTIONS block. Some useful address symbols are defined, such as the top of the stack, the address in flash right after the text and rodata segments (FLASH_DATA_START) and similar symbols for SRAM. Those symbols will be used by the startup program to initialize and load data.
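The address arithmetic behind the regions and the stack-top symbol can be checked with a few lines of C. This is only a sketch of the arithmetic: the macro names and the two helper functions are made up for the example, and the values are copied from the MEMORY block above.

```c
#include <stdint.h>

/* Values copied from the MEMORY block of the linker script. */
#define SRAM_ORIGIN   0x20000000u
#define SRAM_LENGTH   (20u * 1024u)
#define FLASH_ORIGIN  0x08000000u
#define FLASH_LENGTH  (64u * 1024u)

/* First address past the end of a region (hypothetical helper). */
static uint32_t region_end(uint32_t origin, uint32_t length) {
    return origin + length;
}

/* Returns 1 if addr lies inside [origin, origin + length), else 0. */
static int in_region(uint32_t addr, uint32_t origin, uint32_t length) {
    return addr >= origin && addr < region_end(origin, length);
}
```

With these numbers the stack top is 0x20000000 + 20K = 0x20005000, which is one past the last SRAM byte. That works because the Cortex-M stack grows downwards and the stack pointer is decremented before each push, so the first word pushed lands at 0x20004FFC.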
With the “AT” keyword a load address can be specified, that is, data can be stored at one address in the binary and used from another address at run time. This way all data is written to flash memory, but the parts of it that must be writable are relocated to SRAM. That relocation is something the startup routine needs to do.
The startup routine is needed for a few tasks. It defines the custom memory region with the vector table mentioned in the previous section, and it defines the symbols used to fill that vector table’s entries with the addresses of exception handler functions. It also copies initialized data from flash memory to SRAM and initializes some variables to zero (for example, uninitialized global variables in C must start out as zero).
The image below shows the format of the vector table. The beginning of the startup routine defines this data segment.
The startup routine begins with some assembler directives specifying syntax, architecture and to use the thumb-instruction set. Then the vector data segment is defined by allocating long data types contiguously. I’ve only included the generic vectors so far. More device specific vectors need to be added later to implement useful hardware interrupts, such as creating an interrupt handler that executes code when a timer reaches a certain value.
startup.s
/* Startup code for a STM32F103C8
* Robin Isaksson 2017
*/
    .syntax unified
    .arch armv7-m
    .thumb

/* Initial vector table */
    .section "vectors"
    .long _STACK_TOP
    .long _reset_Handler
    .long _NMI_Handler
    .long _HardFault_Handler
    .long _MemManage_Handler
    .long _BusFault_Handler
    .long _UsageFault_Handler
    .long 0 /* Reserved */
    .long 0 /* Reserved */
    .long 0 /* Reserved */
    .long 0 /* Reserved */
    .long _SVCall_Handler
    .long _DebugMonitor_Handler
    .long 0 /* Reserved */
    .long _PendSV_Handler
    .long _SysTick_Handler

/* The macro below defines each entry as a weak symbol which can be overridden by
 * including the corresponding function-name in a C-file (for example) */
    .macro def_rewritable_handler handler
    .thumb_func
    .weak \handler
    .type \handler, %function
\handler: b . @@ Branch forever in default state
    .endm

    def_rewritable_handler _NMI_Handler
    def_rewritable_handler _HardFault_Handler
    def_rewritable_handler _MemManage_Handler
    def_rewritable_handler _BusFault_Handler
    def_rewritable_handler _UsageFault_Handler
    def_rewritable_handler _SVCall_Handler
    def_rewritable_handler _DebugMonitor_Handler
    def_rewritable_handler _PendSV_Handler
    def_rewritable_handler _SysTick_Handler
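To make the layout of these sixteen words easier to see, the table can be mirrored as a C struct. This is only a sketch for reasoning about offsets, not something the startup code uses; the type and field names are made up, and every entry is modelled as a 32-bit word just like the .long directives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct mirroring the 16 generic vector table entries.
 * Each entry is one 32-bit word, like the .long directives in startup.s. */
typedef struct {
    uint32_t initial_sp;     /* _STACK_TOP */
    uint32_t reset;          /* _reset_Handler */
    uint32_t nmi;            /* _NMI_Handler */
    uint32_t hard_fault;     /* _HardFault_Handler */
    uint32_t mem_manage;     /* _MemManage_Handler */
    uint32_t bus_fault;      /* _BusFault_Handler */
    uint32_t usage_fault;    /* _UsageFault_Handler */
    uint32_t reserved0[4];   /* four reserved words */
    uint32_t sv_call;        /* _SVCall_Handler */
    uint32_t debug_monitor;  /* _DebugMonitor_Handler */
    uint32_t reserved1;      /* one reserved word */
    uint32_t pend_sv;        /* _PendSV_Handler */
    uint32_t sys_tick;       /* _SysTick_Handler */
} vector_table_t;

/* Byte offset of the n-th entry from the start of the table. */
static size_t vector_offset(size_t index) {
    return index * sizeof(uint32_t);
}
```

The reserved words matter: leaving one out would shift every later handler to the wrong offset, so the processor would fetch the wrong address when that exception fires.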
Next follows the actual startup program logic. I won’t go into the details of the assembler code. The program copies data to SRAM using the symbols from the linker script, initializes the variables in the bss segment to zero and then calls the main function. The main function is not defined yet; it will be defined in a C program. The “_reset_Handler” symbol from the vector table is defined here, which means that whenever the reset handler is called the startup routine runs again. This is useful because the device will be in a predictable state after a reset.
startup.s continued
/* Startup code */
    .text
    .thumb_func
_reset_Handler:
/* Copy data from flash to RAM */
_startup:
    @@ Copy data to RAM
    ldr r0, =FLASH_DATA_START
    ldr r1, =SRAM_DATA_START
    ldr r2, =SRAM_DATA_SIZE

    @@ If the data size is zero then do not copy
    cmp r2, #0
    beq _BSS_init

    /* Initialize data by loading it to SRAM */
    ldr r3, =0              @@ i = 0
_RAM_copy:
    ldr r4, [r0, r3]        @@ r4 = FLASH_DATA_START[i]
    str r4, [r1, r3]        @@ SRAM_DATA_START[i] = r4
    adds r3, #4             @@ i += 4 (next word)
    cmp r3, r2              @@ if i < SRAM_DATA_SIZE
    blt _RAM_copy           @@ then loop again

/* Initialize uninitialized variables to zero (required for C) */
_BSS_init:
    ldr r0, =BSS_START
    ldr r1, =BSS_END
    ldr r2, =BSS_SIZE

    /* If BSS size is zero then do not initialize */
    cmp r2, #0
    beq _end_of_init

    /* Initialize BSS variables to zero */
    ldr r3, =0x0            @@ i = 0
    ldr r4, =0x0            @@ r4 = 0
_BSS_zero:
    str r4, [r0, r3]        @@ BSS_START[i] = 0
    adds r3, #4             @@ i += 4 (next word)
    cmp r3, r2              @@ if i < BSS_SIZE
    blt _BSS_zero           @@ then loop again

/* Here control is given to the C-main function. Take note that
 * no heap is initialized. */
_end_of_init: bl main
_halt: b _halt
    .end
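For readers who prefer C to assembler, the two loops above can be sketched roughly as follows. This is a host-side illustration, not a replacement for startup.s: the function names and the fake flash and SRAM arrays are made up, and the sketch assumes the sizes are multiples of four bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy size_bytes of initialized data one 32-bit word at a time,
 * like the _RAM_copy loop. Assumes size_bytes is a multiple of 4. */
static void copy_data(const uint32_t *load_addr, uint32_t *run_addr,
                      size_t size_bytes) {
    for (size_t i = 0; i < size_bytes / 4; i++) {
        run_addr[i] = load_addr[i];
    }
}

/* Zero size_bytes of memory one 32-bit word at a time,
 * like the _BSS_zero loop. Assumes size_bytes is a multiple of 4. */
static void zero_bss(uint32_t *start, size_t size_bytes) {
    for (size_t i = 0; i < size_bytes / 4; i++) {
        start[i] = 0;
    }
}

/* Host-side stand-ins for the flash image and for SRAM (hypothetical).
 * SRAM starts out with junk, as real SRAM does after power-up. */
static const uint32_t fake_flash_data[3] = {
    0x11111111u, 0x22222222u, 0x33333333u
};
static uint32_t fake_sram[5] = {
    0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu
};

/* Runs both steps in the same order as the reset handler. */
static void fake_reset(void) {
    copy_data(fake_flash_data, fake_sram, sizeof fake_flash_data); /* .data */
    zero_bss(&fake_sram[3], 2 * sizeof(uint32_t));                 /* .bss */
}
```

After `fake_reset()` the first three words of the fake SRAM hold the values from the fake flash and the last two are zero, which is the state main expects for its .data and .bss variables.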
The hello world program is written in C and simply cycles an output pin between low and high states. I’ll connect a small LED to that pin so it’s possible to see it blink while the program is running. Turning an LED on and off is the equivalent of hello world in embedded development.
I’m including the library header file “stm32f10x.h” since it contains preprocessor macros for register addresses and defines some helpful structs for working with registers. This makes the code far more readable and removes the tedious work of copying register addresses from the datasheet.
main.c
/* LED blinker for STM32F103C8
* Robin Isaksson 2017
*/
#include <stm32f10x.h>
#include <stdint.h>
void delay(void);
int main(void) {
    /* Enable clock to IO port C */
    RCC->APB2ENR |= RCC_APB2ENR_IOPCEN;

    /* Set pin C13 as an output */
    GPIOC->CRH &= ~GPIO_CRH_MODE13; //Clear bits
    GPIOC->CRH |= GPIO_CRH_MODE13;  //Output mode, max 50 MHz
    GPIOC->CRH &= ~GPIO_CRH_CNF13;  //GPIO output push-pull

    while(1) {
        GPIOC->BSRR = (1U << 13U); //Set pin HIGH
        delay();
        GPIOC->BRR = (1U << 13U);  //Reset pin to LOW
        delay();
    }
}

/* Primitive delay function. The counters are volatile so that the
 * compiler does not remove the empty loops when optimizing. */
void delay(void) {
    volatile uint8_t i;
    volatile uint8_t j;
    for (i = 0; i < 0xFF; i++) {
        for (j = 0; j < 0xFF; j++) {
        }
    }
}
The code is simple. The general purpose input/output port C is first enabled and then pin 13 of the port is configured to be an output. In an infinite while loop pin 13 is repeatedly set high and then reset back to low between delays. The delay function is a simple loop that counts to a specified number.
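What the three writes to CRH do to the register value can be traced on a plain integer. The sketch below assumes the bit positions and the reset value as I understand them from the reference manual (MODE13 in bits 21:20, CNF13 in bits 23:22, reset value 0x44444444); the real definitions come from stm32f10x.h, and the SIM_ names and the helper function are made up for this example.

```c
#include <stdint.h>

/* Assumed bit masks; the real ones are defined in stm32f10x.h. */
#define SIM_CRH_MODE13  (0x3u << 20)  /* MODE13[1:0], bits 21:20 */
#define SIM_CRH_CNF13   (0x3u << 22)  /* CNF13[1:0], bits 23:22 */
#define SIM_CRH_RESET   0x44444444u   /* assumed reset value: floating inputs */

/* Apply the same three read-modify-write steps as main.c to a CRH value. */
static uint32_t configure_pin13_output(uint32_t crh) {
    crh &= ~SIM_CRH_MODE13;  /* clear the mode bits */
    crh |= SIM_CRH_MODE13;   /* MODE = 0b11: output, max 50 MHz */
    crh &= ~SIM_CRH_CNF13;   /* CNF = 0b00: general purpose push-pull */
    return crh;
}
```

Starting from the assumed reset value, only the four bits belonging to pin 13 change and the configuration of the other pins in the register is left alone, which is the point of using read-modify-write operations here.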
To compile the project the ARM-GCC toolchain is used. I’ve written a makefile that compiles all source files separately and then links them into an ELF file, which is then translated to the Intel HEX format that is ready to be flashed to the microcontroller. I’m using the “ST-Link” programmer, which has a command-line program called “st-flash” that is used to flash the microcontroller.
# The output- and intermediary files will be named $(TARGET)
TARGET = blinky
DEBUG = -g
TOOLCHAIN = arm-none-eabi
CC = $(TOOLCHAIN)-gcc
AS = $(TOOLCHAIN)-as
LD = $(TOOLCHAIN)-ld
OCP = $(TOOLCHAIN)-objcopy
GDB = $(TOOLCHAIN)-gdb
LDSCRIPT = $(wildcard *.ld)
SRCS = $(wildcard *.s)
SRCC = $(wildcard *.c)
OBJS = $(SRCS:.s=.o)
OBJS +=$(SRCC:.c=.o)
CFLAGS = -Wall -Wextra -mcpu=cortex-m3 -mthumb -c -Ilib $(DEBUG)
AFLAGS = -g -mcpu=cortex-m3 -mthumb
LDFLAGS = -g -T $(LDSCRIPT)
OCPFLAGS_HEX = -O ihex
default: $(TARGET).hex

%.o: %.s
	$(AS) $(AFLAGS) $< -o $@

%.o: %.c
	$(CC) $(CFLAGS) $< -o $@

$(TARGET).elf: $(OBJS) $(LDSCRIPT)
	$(LD) $(LDFLAGS) $(OBJS) -o $@

$(TARGET).hex: $(TARGET).elf
	$(OCP) $(OCPFLAGS_HEX) $< $@

clean:
	rm -f ./*.o ./*.elf ./*.bin ./*.syms ./*.hex

symbols: $(TARGET).elf
	$(TOOLCHAIN)-nm -n $<

flash: $(TARGET).hex
	st-flash --format ihex write $(TARGET).hex

debug: $(TARGET).elf
	$(GDB) --eval-command="target extended-remote :4242" $(TARGET).elf
make builds the project and produces the output .hex file that is ready to be flashed.
make flash tries to flash the program to the microcontroller using the ST-Link utilities.
make symbols outputs the symbols in the .elf file, which is useful for debugging.
make debug tries to run the GNU debugger with the program.
The next step is creating a useful program. Take a look at my GitHub repository for code and libraries. I have continued this work and written:
USART implementation
I2C implementation
USB HID implementation
Use of hardware timers
Asynchronous programs using interrupt handlers
Setup with a 72 MHz system clock using PLL with a crystal oscillator
A more advanced project using all of these components and libraries is my USB inertial headtracker.