Monday, December 15, 2008

Hyper-threading

Hyper-threading, officially called Hyper-Threading Technology (HTT), is Intel's trademark for its implementation of simultaneous multithreading on the Pentium 4 microarchitecture. It is a more advanced form of super-threading; it debuted on Intel Xeon processors and was later added to Pentium 4 processors.
The technology improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle, for example during a cache miss. A Pentium 4 with Hyper-Threading enabled is treated by the operating system as two processors instead of one.
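Because the operating system sees each hardware thread as a processor, software can observe this from user space. The snippet below is a minimal sketch using Python's standard library; the helper name `logical_cpu_count` is made up for illustration. It reports logical processors only, so on a single-core Pentium 4 with Hyper-Threading enabled one would expect it to report 2, although the standard library by itself does not say how many of those are physical cores.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical processors the OS reports.

    os.cpu_count() can return None when the count cannot be determined,
    so fall back to 1 in that case.
    """
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("Logical processors visible to the OS:", logical_cpu_count())
```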
Older Pentium 4-based CPUs use Hyper-Threading, but the current-generation cores (Merom, Conroe and Woodcrest) do not. Hyper-Threading is a specialized form of simultaneous multithreading (SMT), which is reportedly in Intel's plans for the generation after Merom, Conroe and Woodcrest.
Conventional multithreaded operating systems let multiple processes and threads use the processor one at a time, giving a particular thread exclusive ownership for a time slice on the order of milliseconds; this is called temporal multithreading. Quite often, a thread will stall for hundreds of cycles while waiting for some external resource (for example, a load from RAM), and the processor does no useful work during that time.
A further improvement is super-threading, where the processor can execute instructions from a different thread each cycle. Cycles left unused by one thread can thus be used by another that is ready to run.
Still, a given thread is almost certainly not using all of the execution units of a modern processor at the same time. Simultaneous multithreading allows multiple threads to execute different instructions in the same clock cycle, using the execution units that the first thread left spare. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads can be decided by the chip designers, but practical restrictions on chip complexity have limited the number to two for most SMT implementations.
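The difference between the three approaches can be made concrete with a toy model. The sketch below is an illustration only, not a description of how any real processor schedules work: it assumes a made-up machine with `width` execution slots per cycle and two threads whose per-cycle demand (how many slots each could fill, 0 meaning stalled) is given up front. The function name, the policy names and the demand figures are all invented for the example.

```python
from typing import List


def slots_used(demand: List[List[int]], width: int, policy: str,
               slice_len: int = 4) -> int:
    """Count execution slots filled over the whole trace under one policy.

    demand[t][c] is the number of slots thread t could fill in cycle c
    (0 means the thread is stalled, e.g. waiting on memory).
    """
    n_threads = len(demand)
    n_cycles = len(demand[0])
    used = 0
    for c in range(n_cycles):
        if policy == "temporal":
            # One thread owns the processor for a whole time slice,
            # whether or not it is stalled.
            owner = (c // slice_len) % n_threads
            used += min(width, demand[owner][c])
        elif policy == "super":
            # Each cycle goes to a single thread; starting from a
            # round-robin position, pick the first one that is ready.
            for k in range(n_threads):
                t = (c + k) % n_threads
                if demand[t][c] > 0:
                    used += min(width, demand[t][c])
                    break
        elif policy == "smt":
            # All threads may issue in the same cycle, sharing the slots.
            used += min(width, sum(thread[c] for thread in demand))
        else:
            raise ValueError(f"unknown policy: {policy}")
    return used


# Hypothetical demand traces for two threads over eight cycles.
THREAD_A = [2, 0, 0, 3, 1, 0, 2, 2]
THREAD_B = [1, 2, 0, 0, 2, 3, 0, 1]

if __name__ == "__main__":
    for name in ("temporal", "super", "smt"):
        filled = slots_used([THREAD_A, THREAD_B], width=4, policy=name)
        print(f"{name}: {filled} of {4 * len(THREAD_A)} slots filled")
```

With these invented traces the three policies fill 11, 14 and 19 of the 32 available slots respectively. The model deliberately ignores the contention for shared resources such as caches, so it shows only the best case for the idea, not a realistic speedup.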
Since the technique is really an efficiency solution, and there is inevitable increased conflict on shared resources, measuring or agreeing on the "goodness" of the solution can be difficult. Some researchers have shown that the extra threads can be used to proactively seed a shared resource like a cache, to improve the performance of another single thread, and claim this shows that SMT is not just an efficiency solution. Others use SMT to provide redundant computation, for some level of error detection and recovery.
In most current cases, however, SMT is about hiding memory latency, improving efficiency, and increasing computational throughput per amount of hardware used.