Most manufacturers would simply attempt to use SMP to distribute the TMOS process across multiple processors, with shared memory, network card, and special-purpose processors. Others might attempt to run multiple instances of the TMM on different processors, still with the requisite shared memory, network card, and special-purpose processors. Instead, CMP (clustered multiprocessing) enables load balancing across multiple processing cores, each with its own dedicated memory, network interface, and special-purpose processors. Each core runs its own completely independent TMM process. By removing the dependencies between the instances, CMP allows more of the traffic management process, virtually the entire process, to be parallelized. This provides a substantial benefit to the overall performance of the system.
The hardware that enables CMP comprises two important, proprietary F5 technologies: the Disaggregator and the High Speed Bridge (HSB).
The Disaggregator acts as a hardware-based load balancer, distributing traffic flows among the independent TMM instances and managing flow affinity when necessary. Not only does this facilitate near 1:1 linear performance growth (doubling the number of processing cores nearly doubles the computing power, without diminishing returns), but it also completely virtualizes the processing cores, isolating each one from the system and from the other cores. This provides high availability and reliability in the event that any core becomes non-functional.
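The text does not say how the Disaggregator assigns flows, and the real mechanism is proprietary hardware. Purely to illustrate what "flow affinity" means, here is a minimal software sketch that hashes a connection's 5-tuple so every packet of the same flow lands on the same instance. The names `FlowKey` and `pick_instance`, and the choice of SHA-256, are assumptions made for this example and are not taken from F5.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying one traffic flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def pick_instance(flow: FlowKey, instance_count: int) -> int:
    """Map a flow to an instance index in [0, instance_count).

    Illustrative only: a stable hash of the flow key keeps all packets
    of one flow on the same instance (flow affinity) while spreading
    different flows across the available instances.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    digest = hashlib.sha256(repr(tuple(flow)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % instance_count


if __name__ == "__main__":
    flow = FlowKey("10.0.0.5", 40123, "192.0.2.10", 443, "tcp")
    # The same flow always maps to the same instance.
    print(pick_instance(flow, 8) == pick_instance(flow, 8))
```

A plain modulo mapping like this reshuffles most flows when the instance count changes. A real system that has to survive a failed core would need something more careful, which the text above does not describe.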
The HSB delivers direct, non-blocking communication between the TMM instances and the outside world without the loss normally associated with Ethernet interconnects. It also provides the streamlined message-passing interface that enables TMM instances to share information. The result is unsurpassed throughput and interconnectivity for each processor’s dedicated network interfaces. The HSB also mitigates the performance impact of inter-process communication in the few remaining cases where it takes place.

The rules have been changed by CMP
The performance increase that can be expected from parallelizing a process is a function of the proportion of the process that can truly be parallelized. If a process requiring 10 units of time can only be 50 percent parallelized, the process will never run in less than five units, even if the parallelized portion is processed instantly. As a result, the entire process can never be more than twice as fast.
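The limit described above is commonly known as Amdahl's law. The following sketch reproduces the 10-unit example and shows how the ceiling moves as the parallelizable proportion grows. The function names are chosen for this illustration. The model assumes the parallel portion divides perfectly across workers with no coordination overhead.

```python
def parallel_runtime(total_time: float, parallel_fraction: float, workers: float) -> float:
    """Runtime when only `parallel_fraction` of the work is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_part = total_time * (1.0 - parallel_fraction)      # cannot be sped up
    parallel_part = total_time * parallel_fraction / workers  # ideal division
    return serial_part + parallel_part


def speedup(parallel_fraction: float, workers: float) -> float:
    """Amdahl's law: overall speedup on `workers` processing units."""
    return 1.0 / parallel_runtime(1.0, parallel_fraction, workers)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # The example from the text: 10 time units, 50 percent parallelizable.
    print(parallel_runtime(10.0, 0.5, 2))   # 7.5 units on two workers
    print(speedup_limit(0.5))               # 2.0, i.e. never below 5 units
    # Raising the parallel proportion moves the ceiling itself.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, round(speedup(fraction, 8), 2))
```

With eight workers, a 50 percent parallel process gains less than 2x, while a 99 percent parallel process gains roughly 7.5x. That gap shows why changing the proportion matters more than adding cores.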
Up until now, the game has been pretty simple and widely understood. First, it was to optimize your code to run on a single processor as best you could and ride the “Intel power-curve.” Then, it was to optimize your code for SMP or AMP and build your platforms with as many processing cores as possible. All the while, performance improvements have slowly dwindled to minuscule amounts.
CMP changes the rules of the game. CMP does not work to continually improve the performance of a never-changing proportion of parallelized processing; its most basic tenet is to change that proportion. Continuing improvements in performance can only be realized by increasing the amount of the application delivery process that can be parallelized. Only parallelizing nearly all of that process can enable near 1:1 linear scaling that fully utilizes all the processing cores.


