Previously we talked about all the different kinds of descriptors which USB devices use to communicate their capability. This is important stuff because to write any useful USB device firmware we need to be able to determine how to populate our descriptors. However, having that data on the device is entirely worthless without an understanding of how it gets from the device to the host so that it can be acted upon. To understand that, let's look at the USB wire protocol.

Note, I'll again be talking mostly about USB2.0 low- and full-speed. I believe that high speed is approximately the same but with faster wires, except not quite that simple.

Down to the wire

I don't intend to talk about the actual electrical signalling, though it's not un-reasonable for you to know that USB is a pair of wires forming a differentially signalled bidirectional serial communications link. The host is responsible for managing all the framing and timing on the link, and for formatting the communications into packets.

There are a number of packet types which can appear on the USB link:

Packet type Purpose
Token Packet When the host wishes to send a message to the Control endpoint to configure the device, read data IN, or write data OUT, it uses this to start the transaction.
Data(0/1) Packet Following a Setup, In, or Out token, a Data packet is a transfer of data (in either direction). The 0 and 1 alternate to provide a measure of confidence against lost packets.
Handshake Packet Following a data packet of some kind, the other end may ACK the packet (all was well), NAK the packet (report that the device cannot, temporarily, send/receive data, or that an interrupt endpoint isn't triggered), or STALL the bus in which case the host needs to intervene.
Start of Frame Every 1ms (full-speed) the host will send a SOF packet which carries a frame number. This can be used to help keep time on very simple devices. It also divides the bus into frames within which bandwidth is allocated.

As an example, when the host wishes to perform a control transfer, the following packets are transacted in turn:

  1. Setup Token - The host addresses the device and endpoint (OUT0)
  2. Data0 Packet - The host transmits a GET_DESCRIPTOR for the device descriptor
  3. Ack Packet - The device acknowledges receipt of the request

This marks the end of the first transaction. The device decodes the GET_DESCRIPTOR request and prepares the device descriptor for transmission. The transmission occurs as the next transaction on the bus. In this example, we're assuming 8 byte maximum transmission sizes, for illustrative purposes.

  1. In Token - The host addresses the device and endpoint (IN0)
  2. Data1 Packet - The device transmits the first 8 bytes of the descriptor
  3. Ack Packet - The host acknowledges the data packet
  4. In Token - The host addresses the device and endpoint (IN0)
  5. Data0 Packet - The device transmits the remaining 4 bytes of the descriptor (padded)
  6. Ack Packet - The host acknowledges the data packet

The second transaction is now complete, and the host has all the data it needs to proceed. Finally a status transaction occurs in which:

  1. Out Token - The host addresses the device and endpoint (OUT0)
  2. Data1 Packet - The host transmits a 0 byte data packet to indicate successful completion
  3. Ack Packet - The device acknowledges the completion, indicating its own satisfaction

And thus ends the full control transaction in which the host retrieves the device descriptor.

From a high level, we need only consider the activity which occurs at the point of the acknowledgement packets. In the above example:

  1. On the first ACK the device prepares IN0 to transmit the descriptor, readying whatever low level device stack there is with a pointer to the descriptor and its length in bytes.
  2. On the second ACK the low levels are still thinking.
  3. On the third ACK the transmission from IN0 is complete and the endpoint no longer expects to transfer data.
  4. On the fourth ACK the control transaction is entirely complete.

Thinking at the low levels of the control interface

Before we can build a high level USB stack, we need to consider the activity which might occur at the lower levels. At the low levels, particularly of the device control interface, work has to be done at each and every packet. The hardware likely deals with the token packet for us, leaving the data packets for us to process, and the resultant handshake packets will be likely handled by the hardware in response to our processing the data packets.

Since every control transaction is initiated by a setup token, let's look at the setup requests which can come our way...

Setup Packet (Data) Format
Field Name Byte start Byte length Encoding Meaning
bmRequestType 0 1 Bitmap Describes the kind of request, and the target of it. See below.
bRequest 1 1 Code The request code itself, meanings of the rest of the fields vary by bRequest
wValue 2 2 Number A 16 bit value whose meaning varies by request type
wIndex 4 2 Number A 16 bit value whose meaning varies by request type but typically encodes an interface number or endpoint.
wLength 6 2 Number A 16 bit value indicating the length of the transfer to come.

Since bRequest is essentially a switch against which multiple kinds of setup packet are selected between, here's the meanings of a few...

GET_DESCRIPTOR (Device) setup packet
Field Name Value Meaning
bmRequestType 0x08 Data direction is IN (from device to host), recipient is the device
bRequest 0x06 GET_DESCRIPTOR (in this instance, the device descriptor is requested)
wValue 0x0001 This means the device descriptor
wIndex 0x0000 Irrelevant, there's only 1 device descriptor anyway
wLength 12 This is the length of a device descriptor (12 bytes)
SET_ADDRESS to set a device's USB address
Field Name Value Meaning
bmRequestType 0x00 Data direction is OUT (from host to device), recipient is the device
bRequest 0x05 SET_ADDRESS (Set the device's USB address)
wValue 0x00nn The address for the device to adopt (max 127)
wIndex 0x0000 Irrelevant for address setting
wLength 0 There's no data transfer expected for this setup operation

Most hardware blocks will implement an interrupt at the point that the Data packet following the Setup packet has been receive. This is typically called receiving a 'Setup' packet and then it's up to the device stack low levels to determine what to do and dispatch a handler. Otherwise an interrupt will fire for the IN or OUT tokens and if the endpoint is zero, the low level stack will handle it once more.

One final thing worth noting about SET_ADDRESS is that it doesn't take effect until the completion of the zero-length "status" transaction following the setup transaction. As such, the status request from the host will still be sent to address zero (the default for new devices).

A very basic early "packet trace"

This is an example, and is not guaranteed to be the packet sequence in all cases. It's a good indication of the relative complexity involved in getting a fresh USB device onto the bus though...

When a device first attaches to the bus, the bus is in RESET state and so the first event a device sees is a RESET which causes it to set its address to zero, clear any endpoints, clear the configuration, and become ready for control transfers. Shortly after this, the device will become suspended.

Next, the host kicks in and sends a port reset of around 30ms. After this, the host is ready to interrogate the device.

The host sends a GET_DESCRIPTOR to the device, whose address at this point is zero. Using the information it receives from this, it can set up the host-side memory buffers since the device descriptor contains the maximum transfer size which the device supports.

The host is now ready to actually 'address' the device, and so it sends another reset to the device, again around 30ms in length.

The host sends a SET_ADDRESS control request to the device, telling it that its new address is nn. Once the acknowledgement has been sent from the host for the zero-data status update from the device, the device sets its internal address to the value supplied in the request. From now on, the device shall respond only to requests to nn rather than to zero.

At this point, the host will begin interrogating further descriptors, looking at the configuration descriptors and the strings, to build its host-side representation of the device. These will be GET_DESCRIPTOR and GET_STRING_DESCRIPTOR requests and may continue for some time.

Once the host has satisfied itself that it knows everything it needs to about the device, it will issue a SET_CONFIGURATION request which basically starts everything up in the device. Once the configuration is set, interrupt endpoints will be polled, bulk traffic will be transferred, Isochronous streams begin to run, etc.

Okay, but how do we make this concrete?

So far, everything we've spoken about has been fairly abstract, or at least "soft". But to transfer data over USB does require some hardware. (Okay, okay, we could do it all virtualised, but there's no fun in that). The hardware I'm going to be using for the duration of this series is the STM32 on the blue-pill development board. This is a very simple development board which does (in theory at least) support USB device mode.

If we view the schematic for the blue-pill, we can see a very "lightweight" USB interface which has a pullup resistor for D+. This is the way that a device signals to the host that it is present, and that it wants to speak at full-speed. If the pullup were on D- then it would be a low-speed device. High speed devices need a little more complexity which I'm not going to go into for today.

The USB lines connect to pins PA11 and PA12 which are the USB pins on the STM32 on the board. Since USB is quite finicky, the STM32 doesn't let you remap that function elsewhere, so this is all looking quite good for us so far.

The specific STM32 on the blue-pill is the STM32F103C8T6. By viewing its product page on ST's website we can find the reference manual for the part. Jumping to section 23 we learn that this STM32 supports full-speed USB2.0 which is convenient given the past article and a half. We also learn it supports up to eight endpoints active at any one time, and offers double-buffering for our bulk and isochronous transfers. It has some internal memory for packet buffering, so it won't use our RAM bandwidth while performing transfers, which is lovely.

I'm not going to distill the rest of that section here, because there's a large amount of data which explains how the USB macrocell operates. However useful things to note are:

  • How IN OUT and SETUP transfers work.
  • How the endpoint buffer memory is configured.
  • That all bus-powered devices MUST respond to suspend/resume properly
  • That the hardware will prioritise endpoint interrupts for us so that we only need deal with the most pressing item at any given time.
  • There is an 'Enable Function' bit in the address register which must be set or we won't see any transactions at all.
  • How the endpoint registers signal events to the device firmware.

Next time, we're going to begin the process of writing a very hacky setup routine to try and initialise the USB device macrocell so that we can see incoming transactions through the ITM. It should be quite exciting, but given how complex this will be for me to learn, it might be a little while before it comes through.