Humble's Blog: August 2015

14 August 2015

Mutex and Semaphore

A thread is a lightweight process that shares memory with the other threads of the same process.
- Multiple threads allow the programmer to run a particular job independently of all the others. ex:- spell check in a word processor.
- Multiple threads can run on multiple CPUs, providing a performance improvement.

Multithreaded applications require synchronization.
- mutex - Only the thread that locks a mutex can unlock it.
- semaphore - A binary semaphore behaves like a mutex, except that it can be released by a thread other than the one that acquired it. ex:- with a semaphore, one thread can wait for other threads to signal it.

The piece of code protected by a mutex or semaphore is called a critical section.
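
To make the difference concrete, here is a minimal sketch using POSIX threads. It assumes a POSIX system; the names `worker`, `run_workers`, `NUM_WORKERS` and `INCREMENTS` are made up for this example. Each worker locks and unlocks the mutex itself, while the semaphore is posted by the workers and waited on by a different thread.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define NUM_WORKERS 4        /* hypothetical values for this sketch */
#define INCREMENTS  10000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t work_done;      /* posted by workers, waited on by the caller */
static long shared_counter;

/* Each worker locks and unlocks the mutex itself, then posts the semaphore. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;                    /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    sem_post(&work_done);                    /* released by a thread that never waited on it */
    return NULL;
}

/* Starts the workers, waits for each of them on the semaphore, returns the total. */
long run_workers(void)
{
    pthread_t threads[NUM_WORKERS];
    int rc;

    shared_counter = 0;
    rc = sem_init(&work_done, 0, 0);         /* count starts at 0: nothing finished yet */
    assert(rc == 0);

    for (int i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        sem_wait(&work_done);                /* one wait per worker's post */
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&work_done);
    return shared_counter;
}
```

Without the mutex around `shared_counter++`, the increments of different threads could interleave and some updates would be lost.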

Implementing Semaphores on ARM Processors
- Semaphores are used to manage access to a shared resource. Unfortunately, semaphores themselves are shared resources. Who will protect the semaphore? ha ha ha...
- In a single-core system, an easy way to avoid the issue is to prevent any interrupt from being served while we access (read-modify-write) the semaphore.

I_bit EQU 0x80         ; IRQ disable bit in the CPSR

MRS   r12, CPSR        ; read CPSR
ORR   r12, r12, #I_bit ; set I bit
MSR   CPSR_c, r12      ; write back CPSR

CPSID i                ; ARMv6 and later: disable IRQ in a single instruction

- In a multi-core system, we need a mechanism to prevent the other core from accessing the system bus while a task on one core carries out the read-modify-write sequence. SWP does this: the swap cannot be interrupted and it locks the system bus for both the read and the write, which causes a critical performance bottleneck. (SWP is deprecated from ARMv6 onwards.)

LOCKED EQU 0         ; value indicating that the semaphore is locked

 LDR   r1, <addr>    ; load semaphore address
 LDR   r0, =LOCKED   ; preload "locked" value

spin_lock
 SWP   r0, r0, [r1]  ; swap register value with semaphore
 CMP   r0, #LOCKED   ; if semaphore was locked already
 BEQ   spin_lock     ;     retry
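
The same swap-and-test idea can be sketched in portable C11 with `atomic_exchange`. This is an illustration, not the code a compiler emits for SWP; the type `swap_lock_t` and the function names are made up here, and the sketch keeps the convention of the listing above that 0 means locked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { LOCKED = 0, UNLOCKED = 1 };   /* same convention as the listing: 0 means locked */

typedef struct {
    atomic_int value;
} swap_lock_t;

void swap_lock_init(swap_lock_t *l)
{
    atomic_init(&l->value, UNLOCKED);
}

/* One swap attempt: write LOCKED and look at what was there before. */
bool swap_lock_try(swap_lock_t *l)
{
    int previous = atomic_exchange_explicit(&l->value, LOCKED, memory_order_acquire);
    return previous != LOCKED;       /* we own the lock only if it was free */
}

/* Spin until the swap returns something other than LOCKED. */
void swap_lock_acquire(swap_lock_t *l)
{
    while (!swap_lock_try(l)) {
        /* retry, like BEQ spin_lock */
    }
}

void swap_lock_release(swap_lock_t *l)
{
    assert(atomic_load(&l->value) == LOCKED);
    atomic_store_explicit(&l->value, UNLOCKED, memory_order_release);
}
```

Writing LOCKED over an already locked semaphore is harmless, which is why the swap can be done unconditionally and checked afterwards.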

- A newer, non-blocking method uses Exclusive load (LDREX), which reads the memory and tags the address, and Exclusive store (STREX), which stores data to memory only if the tag is still valid. With this mechanism, other bus masters are never locked out of memory; only an access to the same tagged memory has an effect, namely that the pending exclusive store fails and has to be retried.

LOCKED EQU 0           ; value indicating that the semaphore is locked

 LDR     r12, <addr>   ; preload semaphore address
 LDR     r1, =LOCKED   ; preload "locked" value

spin_lock
 LDREX   r0, [r12]     ; load semaphore value
 CMP     r0, #LOCKED   ; if semaphore was locked already
 STREXNE r0, r1, [r12] ;    try to claim
 CMPNE   r0, #1        ;    and check success
 BEQ     spin_lock     ; retry if claiming semaphore failed.
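
In C, the load / compare / conditional-store pattern is usually written with C11 atomics. On ARM cores that have the exclusive instructions, compilers typically turn this into an LDREX/STREX loop, but that is the compiler's choice. A minimal sketch follows; the names `excl_try_claim`, `excl_claim` and `excl_release` are hypothetical. Note that compare-and-swap checks the value, while the hardware tag tracks the address, so this is an analogy and not an exact equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { EXCL_LOCKED = 0, EXCL_FREE = 1 };   /* 0 means locked, as in the listing */

/* One pass through the loop body: load, compare, conditional store.
 * Returns true if the semaphore was claimed by this call. */
bool excl_try_claim(atomic_int *sem)
{
    int observed = atomic_load_explicit(sem, memory_order_relaxed);  /* like LDREX */
    if (observed == EXCL_LOCKED)
        return false;                                                /* already locked */

    /* Like STREX: the store happens only if *sem still holds `observed`.
     * The weak form is allowed to fail spuriously, so callers must retry. */
    return atomic_compare_exchange_weak_explicit(
        sem, &observed, EXCL_LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the claim succeeds. */
void excl_claim(atomic_int *sem)
{
    while (!excl_try_claim(sem)) {
        /* retry, like BEQ spin_lock */
    }
}

void excl_release(atomic_int *sem)
{
    assert(atomic_load(sem) == EXCL_LOCKED);
    atomic_store_explicit(sem, EXCL_FREE, memory_order_release);
}
```

A failed attempt costs only a retry by the thread that lost; nobody else is held up while it spins.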

Reference:
http://softpixel.com/~cwright/programming/threads/threads.c.php
http://koti.mbnet.fi/niclasw/MutexSemaphore.html
https://www.doulos.com/knowhow/arm/Hints_and_Tips/Implementing_Semaphores/

13 August 2015

PCIE

PCIE consists of 3 layers:
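1. The Transaction Layer -
 a. Its packets are called Transaction Layer Packets (TLPs).
2. The Data Link Layer -
 a. This layer adds a 2-byte header (sequence number) in front of each TLP and a CRC (LCRC) at the end.
 b. It also exchanges its own packets with the link partner, called Data Link Layer Packets (DLLPs), e.g. for acknowledgments and flow control.
 c. With the CRC, the TLP's integrity is assured.
 d. An ack-retransmit mechanism makes sure no TLPs are lost, i.e. reliability is assured and no TLP delivery fails.
 e. A flow control mechanism makes sure a packet is sent only when the receiver is ready to accept it.
 f. Packet ordering: sequence numbers keep TLPs in order across the link.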
3. The Physical Layer -
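
The ack-retransmit mechanism of the Data Link Layer can be pictured as a replay buffer: the sender keeps a copy of every TLP until the link partner acknowledges its sequence number. The sketch below is a simplified model, not the state machine from the specification. The 12-bit sequence number is how PCIE numbers TLPs; `REPLAY_SLOTS`, the one-word payload and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MODULO   4096u   /* sequence numbers are 12 bits wide */
#define REPLAY_SLOTS 8u      /* hypothetical buffer depth for this sketch */

typedef struct {
    uint16_t next_seq;               /* sequence number for the next new TLP */
    uint16_t oldest_unacked;         /* oldest TLP still held for replay */
    uint32_t payload[REPLAY_SLOTS];  /* stand-in for the stored TLPs */
} replay_buffer_t;

/* Number of TLPs sent but not yet acknowledged. */
unsigned replay_pending(const replay_buffer_t *rb)
{
    return (unsigned)(rb->next_seq - rb->oldest_unacked) & (SEQ_MODULO - 1u);
}

/* Send a TLP: keep a copy until it is acknowledged. Returns false when full. */
bool replay_send(replay_buffer_t *rb, uint32_t payload, uint16_t *seq_out)
{
    if (replay_pending(rb) == REPLAY_SLOTS)
        return false;
    rb->payload[rb->next_seq % REPLAY_SLOTS] = payload;
    *seq_out = rb->next_seq;
    rb->next_seq = (uint16_t)((rb->next_seq + 1u) % SEQ_MODULO);
    return true;
}

/* Ack(seq): every TLP up to and including seq arrived intact; drop the copies. */
void replay_ack(replay_buffer_t *rb, uint16_t seq)
{
    unsigned acked = ((unsigned)(seq - rb->oldest_unacked) & (SEQ_MODULO - 1u)) + 1u;
    assert(acked <= replay_pending(rb));
    rb->oldest_unacked = (uint16_t)((rb->oldest_unacked + acked) % SEQ_MODULO);
}

/* Retransmit: copy every unacknowledged TLP, oldest first, into out[]. */
unsigned replay_resend(const replay_buffer_t *rb, uint32_t out[REPLAY_SLOTS])
{
    unsigned n = replay_pending(rb);
    for (unsigned i = 0; i < n; i++) {
        unsigned seq = (rb->oldest_unacked + i) % SEQ_MODULO;
        out[i] = rb->payload[seq % REPLAY_SLOTS];
    }
    return n;
}
```

One acknowledgment covers all earlier TLPs as well, so the receiver does not have to acknowledge every packet individually.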

- TLPs are routed either by address (memory and I/O requests) or by ID (configuration requests and completions). The ID is a combination of Bus number, Device number and Function number.
- Bus mastering allows a peripheral to initiate TLPs itself, e.g. to access host memory or to exchange TLPs with peer peripherals.
- A PCIE interrupt is itself a TLP on the bus, i.e. a Write Request with a special address, which the host has written into the peripheral's configuration space during initialization.
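
The ID used for routing packs the three numbers into 16 bits: 8 bits of bus, 5 bits of device and 3 bits of function. A small sketch of the packing follows; the type `pci_bdf_t` and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t bus;       /* 0..255 */
    uint8_t device;    /* 0..31  */
    uint8_t function;  /* 0..7   */
} pci_bdf_t;

/* Pack bus/device/function into the 16-bit ID: bus[15:8] device[7:3] function[2:0]. */
uint16_t bdf_pack(pci_bdf_t id)
{
    assert(id.device < 32 && id.function < 8);
    return (uint16_t)((id.bus << 8) | (id.device << 3) | id.function);
}

/* Split a 16-bit ID back into its three fields. */
pci_bdf_t bdf_unpack(uint16_t raw)
{
    pci_bdf_t id = {
        .bus      = (uint8_t)(raw >> 8),
        .device   = (uint8_t)((raw >> 3) & 0x1F),
        .function = (uint8_t)(raw & 0x07),
    };
    return id;
}
```

For example, bus 2, device 20, function 1 packs to 0x02A1.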

- Vendors of FPGA devices provide a Transaction Layer front-end IP core to use with application logic.
- A PCIE switch allows more devices to connect to a single Root Port.
- A PCIE bridge provides an interface to other buses.

- PCIE BUS ENUMERATION
a. In conventional PCI, a device is selected for configuration access with the IDSEL (Initialization Device Select) signal. In PCIE, the OS reaches the same configuration space through the PCIE controller, which sends configuration requests routed by ID.
b. Bus enumeration is performed by attempting to read the vendor ID and device ID registers for each combination of bus number and device number, at the device's function #0.
c. When a read of the vendor ID register for a given B/D/F combination succeeds, the OS knows that the device exists (a missing device reads back as all ones). The OS then writes all ones to the device's BARs and reads back the requested memory size in an encoded form.
d. The OS then programs the memory-mapped and I/O port addresses into the device's BAR configuration registers.
e. If a PCI-to-PCI bridge is found, enumeration continues on its secondary bus.
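
The probing and BAR sizing steps can be sketched in C. This is only an outline under stated assumptions: `cfg_read16_fn` is a hypothetical accessor type (the real configuration read depends on the platform), `fake_read16` is a stand-in for hardware with a made-up vendor ID, and `mem_bar_size` handles only 32-bit memory BARs.

```c
#include <assert.h>
#include <stdint.h>

#define VENDOR_ID_OFFSET 0x00u
#define NO_DEVICE        0xFFFFu   /* an absent function reads back as all ones */

/* Hypothetical accessor: the platform supplies the real configuration read. */
typedef uint16_t (*cfg_read16_fn)(uint8_t bus, uint8_t device, uint8_t function,
                                  uint16_t offset);

/* Stand-in for hardware: devices 0 and 3 exist on bus 0, nothing else does. */
uint16_t fake_read16(uint8_t bus, uint8_t device, uint8_t function, uint16_t offset)
{
    if (bus == 0 && function == 0 && offset == VENDOR_ID_OFFSET &&
        (device == 0 || device == 3))
        return 0x1234;             /* made-up vendor ID */
    return NO_DEVICE;
}

/* Probe function #0 of every device number on one bus.
 * Returns a bitmap with bit N set when device N answered. */
uint32_t scan_bus(cfg_read16_fn read16, uint8_t bus)
{
    uint32_t present = 0;
    for (uint8_t device = 0; device < 32; device++) {
        if (read16(bus, device, 0, VENDOR_ID_OFFSET) != NO_DEVICE)
            present |= (uint32_t)1 << device;
    }
    return present;
}

/* Decode the value read back from a 32-bit memory BAR after writing all ones.
 * The low four bits are type/prefetch flags, not address bits. */
uint32_t mem_bar_size(uint32_t readback)
{
    uint32_t mask = readback & ~(uint32_t)0xF;
    assert(mask != 0);             /* zero would mean the BAR is not implemented */
    return ~mask + 1u;             /* lowest writable address bit gives the size */
}
```

A device that needs 1 MiB leaves the low 20 address bits hardwired to zero, so the read-back is 0xFFF00000 and the decoded size is 0x00100000.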

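- PCI BUS ARBITRATION
Arbitration signals (REQ# and GNT#) are used on the conventional (parallel) PCI bus to obtain permission for a transaction.
A PCI device requests the bus with REQ# and waits for GNT# from an arbiter located on the motherboard. PCIE links are point-to-point, so PCIE has no REQ#/GNT# signals.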

Reference:
http://www.xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1
http://www.xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-2
http://www.xillybus.com/tutorials/pci-express-dma-requests-completions
http://rts.lab.asu.edu/web_438/CSE438_598_slides_yhlee/438_5_PCI_Architecture.pdf
https://en.wikipedia.org/wiki/Conventional_PCI

09 August 2015

AMBA

Different AMBA buses are:
1. AMBA 1 Advanced System Bus (ASB) [1996]
2. AMBA 1 Advanced Peripheral Bus (APB) [1996]
3. AMBA 2 Advanced High-performance Bus (AHB) - widely used on ARM Cortex-M based designs
4. AMBA 3 Advanced eXtensible Interface (AXI) [2003]
5. AMBA 4 Advanced eXtensible Interface 4 (AXI4) [2010]
6. AMBA 4 AXI Coherency Extensions (ACE) [2011]
7. AMBA 5 Coherent Hub Interface (CHI) [2013]

Reference:
https://en.wikipedia.org/wiki/Advanced_Microcontroller_Bus_Architecture
https://www.doulos.com/knowhow/