The limitation has been the von Neumann architecture: clock signals, interrupts, and memory channels all act as bottlenecks for computation. The problem shows up when you try to simulate neurons on this architecture. Much like the early efforts using Nvidia Tesla and AMD Radeon GPUs, what you get is a chipset that requires nearly 1000 watts of power and generates tons of heat. Why? Because it's all running at full throttle.
Instead, IBM has broken away and is using an event-driven model. That's not to say it won't still produce a lot of heat in full use, but it is much more efficient because it doesn't tie up the memory bus. Still, with just 4000 cores it's not quite a brain yet, and the one million neuron figure may be the number of neurons driven with this architecture, which is still simulation. We are still waiting for a true million-core neural chip.
"Given that our digital hardware is equivalent to a software model, one can ask: why not take the software model itself and translate it into hardware directly? This would correspond to an ASIC implementation of the software simulator. Unfortunately this leads to a highly inefficient implementation, because the software has been written assuming a von Neumann model of computation. Specifically, the von Neumann architecture separates memory and computation, and therefore requires high-bandwidth to communicate spikes to off-chip routing tables, leading to high power consumption. Furthermore, the parallel and event-driven computation of the brain does not map well to the sequential processing model of conventional computers. In sharp contrast, we implement fanout by integrating crossbar memory with neurons to keep data movement local, and use an asynchronous event-driven design where each circuit evaluates in parallel and without any clock, dissipating power only when absolutely necessary. These architectural choices lead to dense integrated synapses while delivering ultra-low active power and guaranteeing real-time performance"
- A Digital Neurosynaptic Core Using Embedded Crossbar Memory with 45pJ per Spike in 45nm
Paul Merolla, ...
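
To make the crossbar and event-driven idea from the quote a bit more concrete, here is a rough Python sketch. It is only a toy software model, not IBM's design: the class name `CrossbarCore`, the helper `run_events`, the unit synaptic weight, and the simple threshold-and-reset rule are all assumptions made for illustration. The point it shows is that work happens only when a spike event arrives, and that the fanout of that spike is read locally from one crossbar row instead of from an off-chip routing table.

```python
class CrossbarCore:
    """Toy neurosynaptic core (hypothetical): axons are rows, neurons are columns."""

    def __init__(self, crossbar, threshold=2):
        # crossbar[axon][neuron] is 1 if a synapse connects them, else 0.
        self.crossbar = crossbar
        self.threshold = threshold          # assumed firing threshold
        self.potential = [0] * len(crossbar[0])
        self.synaptic_ops = 0               # counts updates actually performed

    def deliver(self, axon):
        """Process one spike event on `axon` and return the neurons that fire."""
        fired = []
        # In hardware the whole row would be one local memory read;
        # here we simply walk it and skip unconnected neurons.
        for neuron, connected in enumerate(self.crossbar[axon]):
            if not connected:
                continue
            self.synaptic_ops += 1
            self.potential[neuron] += 1     # assumed unit synaptic weight
            if self.potential[neuron] >= self.threshold:
                self.potential[neuron] = 0  # reset after firing
                fired.append(neuron)
        return fired


def run_events(core, axon_events):
    """Feed spike events to the core one at a time; nothing runs between events."""
    return [core.deliver(axon) for axon in axon_events]


if __name__ == "__main__":
    core = CrossbarCore([[1, 1, 0],
                         [0, 1, 1],
                         [1, 0, 1]], threshold=2)
    print(run_events(core, [0, 1, 2]))  # [[], [1], [0, 2]]
    print(core.synaptic_ops)            # 6
```

There is no global tick in this sketch: a neuron that receives no spikes is never touched, which is the software analogue of "dissipating power only when absolutely necessary". A clock-driven simulator would instead update every neuron on every step whether or not anything happened.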