I have been spending time with Jorge Aparicio's RTFM for Cortex M3 framework for writing Rust to target Cortex-M3 devices from Arm (and particularly the STM32F103 from ST Microelectronics). Jorge's work in this area has been of interest to me ever since I discovered him working on this stuff a while ago. I am very tempted by the idea of being able to implement code for the STM32 with the guarantees of Rust and the language features which I have come to love such as the trait system.

I have been thinking to myself that, while I admire and appreciate the work done on the GNUK, I would like to, personally, have a go at implementing some kind of security token on an STM32 as a USB device. And with the advent of the RTFM for M3 work, and Jorge's magical tooling to make it easier to access and control the registers on an M3 microcontroller, I figured it'd be super-nice to do this in Rust, with all the advantages that entails in terms of isolating unsafe behaviour and generally having the potential to be more easily verified as not misbehaving.

To do this though, means that I need a USB device stack which will work in the RTFM framework. Sadly it seems that, thus-far, only Jorge has been working on drivers for any of the M3 devices his framework supports. And one person can only do so much. So, in my infinite madness, I decided I should investigate the complexity of writing a USB device stack in Rust for the RTFM/M3 framework. (Why I thought this was a good idea is lost to the mists of late night Googling, but hey, it might make a good talk at the next conference I go to). As such, this blog post, and further ones along these lines, will serve as a partial tour of what I'm up to, and a partial aide-memoir for me about learning USB. If I get something horribly wrong, please DO contact me to correct me, otherwise I'll just continue to be wrong. If I've simplified something but it's still strictly correct, just let me know if it's an oversimplification since in a lot of cases there's no point in me putting the full details into a blog posting. I will mostly be considering USB2.0 protocol details but only really for low and full speed devices. (The hardware I'm targetting does low-speed and full-speed, but not high-speed. Though some similar HW does high-speed too, I don't have any to hand right now)

A brief introduction to USB

In order to go much further, I needed a grounding in USB. It's a multi-layer protocol as you might expect, though we can probably ignore the actual electrical layer since any device we might hope to support will have to have a hardware block to deal with that. We will however need to consider the packet layer (since that will inform how the hardware block is implemented and thus its interface) and then the higher level protocols on top.

USB is a deliberately asymmetric protocol. Devices are meant to be significantly easier to implement, both in terms of hardware and software, as compared with hosts. As such, despite some STM32s having OTG ports, I have no intention of supporting host mode at this time.

USB is arranged into a set of busses which are, at least in the USB1.1 case, broadcast domains. As such, each device has an address assigned to it by the host during an early phase called 'configuration'. Once the address is assigned, the device is expected to only ever respond to messages addressed to it. Note that since everything is asymmetric in USB, the device can't send messages on its own, but has to be asked for them by the host, and as such the addressing is always from host toward device.

USB devices then expose a number of endpoints through which communication can flow IN to the host or OUT to the device. Endpoints are not bidirectional, but the in and out endpoints do overlap in numbering. There is a special pair of endpoints, IN0 and OUT0 which, between them, form what I will call the device control endpoints. The device control endpoints are important since every USB device MUST implement them, and there are a number of well defined messages which pass over them to control the USB device. In theory a bare minimum USB device would implement only the device control endpoints.

Configurations, and Classes, and Interfaces, Oh My!

In order for the host to understand what the USB device is, and what it is capable of, part of the device control endpoints' responsibility is to provide a set of descriptors which describe the device. These descriptors form a heirarchy and are then glommed together into a big lump of data which the host can download from the device in order to decide what it is and how to use it. Because of various historical reasons, where a multi-byte value is used, they are defined to be little-endian, though there are some BCD fields. Descriptors always start with a length byte and a type byte because that way the host can parse/skip as necessary, with ease.

The first descriptor is the device descriptor, is a big one, and looks like this:

Device Descriptor
Field Name Byte start Byte length Encoding Meaning
bLength 0 1 Number Size of the descriptor in bytes (18)
bDescriptorType 1 1 Constant Device Descriptor (0x01)
bcdUSB 2 2 BCD USB spec version compiled with
bDeviceClass 4 1 Class Code, assigned by USB org (0 means "Look at interface descriptors", common value is 2 for CDC)
bDeviceSubClass 5 1 SubClass Code, assigned by USB org (usually 0)
bDeviceProtocol 6 1 Protocol Code, assigned by USB org (usually 0)
bMaxPacketSize 7 1 Number Max packet size for IN0/OUT0 (Valid are 8, 16, 32, 64)
idVendor 8 2 ID 16bit Vendor ID (Assigned by USB org)
idProduct 10 2 ID 16bit Product ID (Assigned by manufacturer)
bcdDevice 12 2 BCD Device version number (same encoding as bcdUSB)
iManufacturer 14 1 Index String index of manufacturer name (0 if unavailable)
iProduct 15 1 Index String index of product name (0 if unavailable)
iSerialNumber 16 1 Index String index of device serial number (0 if unavailable)
bNumConfigurations 17 1 Number Count of configurations the device has.

This looks quite complex, but breaks down into a relatively simple two halves. The first eight bytes carries everything necessary for the host to be able to configure itself and the device control endpoints properly in order to communicate effectively. Since eight bytes is the bare minimum a device must be able to transmit in one go, the host can guarantee to get those, and they tell it what kind of device it is, what USB protocol it supports, and what the maximum transfer size is for its device control endpoints.

The encoding of the bcdUSB and bcdDevice fields is interesting too. It is of the form 0xMMmm where MM is the major number, mm the minor. So USB2.0 is encoded as 0x0200, USB1.1 as 0x0110 etc. If the device version is 17.36 then that'd be 0x1736.

Other fields of note are bDeviceClass which can be 0 meaning that interfaces will specify their classes, and idVendor/idProduct which between them form the primary way for the specific USB device to be identified. The Index fields are indices into a string table which we'll look at later. For now it's enough to know that wherever a string index is needed, 0 can be provided to mean "no string here".

The last field is bNumConfigurations and this indicates the number of ways in which this device might function. A USB device can provide any number of these configurations, though typically only one is provided. If the host wishes to switch between configurations then it will have to effectively entirely quiesce and reset the device.

The next kind of descriptor is the configuration descriptor. This one is much shorter, but starts with the same two fields:

Configuration Descriptor
Field Name Byte start Byte length Encoding Meaning
bLength 0 1 Number Size of the descriptor in bytes (9)
bDescriptorType 1 1 Constant Configuration Descriptor (0x02)
wTotalLength 2 2 Number Size of the configuration in bytes, in total
bNumInterfaces 4 1 Number The number of interfaces in this configuration
bConfigurationValue 5 1 Number The value to use to select this configuration
iConfiguration 6 1 Index The name of this configuration (0 for unavailable)
bmAttributes 7 1 Bitmap Attributes field (see below)
bMaxPower 8 1 Number Maximum bus power this configuration will draw (in 2mA increments)

An important field to consider here is the bmAttributes field which tells the host some useful information. Bit 7 must be set, bit 6 is set if the device would be self-powered in this configuration, bit 5 indicates that the device would like to be able to wake the host from sleep mode, and bits 4 to 0 must be unset.

The bMaxPower field is interesting because it encodes the power draw of the device (when set to this configuration). USB allows for up to 100mA of draw per device when it isn't yet configured, and up to 500mA when configured. The value may be used to decide if it's sensible to configure a device if the host is in a low power situation. Typically this field will be set to 50 to indicate the nominal 100mA is fine, or 250 to request the full 500mA.

Finally, the wTotalLength field is interesting because it tells the host the total length of this configuration, including all the interface and endpoint descriptors which make it up. With this field, the host can allocate enough RAM to fetch the entire configuration descriptor block at once, simplifying matters dramatically for it.

Each configuration has one or more interfaces. The interfaces group some endpoints together into a logical function. For example a configuration for a multifunction scanner/fax/printer might have an interface for the scanner function, one for the fax, and one for the printer. Endpoints are not shared among interfaces, so when building this table, be careful.

Next, logically, come the interface descriptors:

Interface Descriptor
Field Name Byte start Byte length Encoding Meaning
bLength 0 1 Number Size of the descriptor in bytes (9)
bDescriptorType 1 1 Constant Interface Descriptor (0x04)
bInterfaceNumber 2 1 Number The number of the interface
bAlternateSetting 3 1 Number The interface alternate index
bNumEndpoints 4 1 Number The number of endpoints in this interface
bInterfaceClass 5 1 Class The interface class (USB Org defined)
bInterfaceSubClass 6 1 SubClass The interface subclass (USB Org defined)
bInterfaceProtocol 7 1 Protocol The interface protocol (USB Org defined)
iInterface 8 1 Index The name of the interface (or 0 if not provided)

The important values here are the class/subclass/protocol fields which provide a lot of information to the host about what the interface is. If the class is a USB Org defined one (e.g. 0x02 for Communications Device Class) then the host may already have drivers designed to work with the interface meaning that the device manufacturer doesn't have to provide host drivers.

The bInterfaceNumber is used by the host to indicate this interface when sending messages, and the bAlternateSetting is a way to vary interfaces. Two interfaces with the came bInterfaceNumber but different bAlternateSettings can be switched between (like configurations, but) without resetting the device.

Hopefully the rest of this descriptor is self-evident by now.

The next descriptor kind is endpoint descriptors:

Endpoint Descriptor
Field Name Byte start Byte length Encoding Meaning
bLength 0 1 Number Size of the descriptor in bytes (7)
bDescriptorType 1 1 Constant Endpoint Descriptor (0x05)
bEndpointAddress 2 1 Endpoint Endpoint address (see below)
bmAttributes 3 1 Bitmap Endpoint attributes (see below)
wMaxPacketSize 4 2 Number Maximum packet size this endpoint can send/receive
bInterval 6 1 Number Interval for polling endpoint (in frames)

The bEndpointAddress is a 4 bit endpoint number (so there're 16 endpoint indices) and a bit to indicate IN vs. OUT. Bit 7 is the direction marker and bits 3 to 0 are the endpoint number. This means there are 32 endpoints in total, 16 in each direction, 2 of which are reserved (IN0 and OUT0) giving 30 endpoints available for interfaces to use in any given configuration. The bmAttributes bitmap covers the transfer type of the endpoint (more below), and the bInterval is an interval measured in frames (1ms for low or full speed, 125µs in high speed). bInterval is only valid for some endpoint types.

The final descriptor kind is for the strings which we've seen indices for throughout the above. String descriptors have two forms:

String Descriptor (index zero)
Field Name Byte start Byte length Encoding Meaning
bLength 0 1 Number Size of the descriptor in bytes (variable)
bDescriptorType 1 1 Constant String Descriptor (0x03)
wLangID[0] 2 2 Number Language code zero (e.g. 0x0409 for en_US)
wLangID[n] 4.. 2 Number Language code n ...

This form (for descriptor 0) is that of a series of language IDs supported by the device. The device may support any number of languages. When the host requests a string descriptor, it will supply both the index of the string and also the language id it desires (from the list available in string descriptor zero). The host can tell how many language IDs are available simply by dividing bLength by 2 and subtracting 1 for the two header bytes.

And for string descriptors of an index greater than zero:

String Descriptor (index greater than zero)
Field Name Byte start Byte length Encoding Meaning
bLength 0 1 Number Size of the descriptor in bytes (variable)
bDescriptorType 1 1 Constant String Descriptor (0x03)
bString 2.. .. Unicode The string, in "unicode" format

This second form of the string descriptor is simply the the string is in what the USB spec calls 'Unicode' format which is, as of 2005, defined to be UTF16-LE without a BOM or terminator.

Since string descriptors are of a variable length, the host must request strings in two transactions. First a request for 2 bytes is sent, retrieving the bLength and bDescriptorType fields which can be checked and memory allocated. Then a request for bLength bytes can be sent to retrieve the entire string descriptor.

Putting that all together

Phew, this is getting to be quite a long posting, so I'm going to leave this here and in my next post I'll talk about how the host and device pass packets to get all that information to the host, and how it gets used.