The author of my previous quote is very clearly implying that a 96 kHz sampling rate is audibly better and a 192 kHz sampling rate is even better than that. And the implication is that the time distance between the samples can affect the imaging vis-a-vis a time delay.
I was not addressing the author; I was addressing the specific question/issue you presented about the interchannel time delays that he mentioned.

I agree that a phenomenon called "interaural time delay" can be detected by humans and is used in conjunction with intensity differences to determine a direction for sound. But like I said, I do not see how time delay is related in any way to bit sampling rate.
I don't know what it has to do with bit sampling rate, either. But as for sampling frequency, how else would you propose to record and play back with 5us or 1.5us accuracy at 44.1kHz? While the audible spectral information will be recorded in the Redbook format, the specific positions in time will be shifted to what can be stored at a 44.1kHz rate.
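For reference, the spacing between consecutive samples at each rate under discussion is simple arithmetic. Here is a minimal Python sketch of it; the function name is only illustrative, and it shows nothing beyond the sample spacing itself:

```python
def sample_period_us(rate_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / rate_hz


# The three sampling rates mentioned in this thread.
RATES_HZ = (44_100, 96_000, 192_000)

for rate in RATES_HZ:
    print(f"{rate / 1000:g} kHz -> {sample_period_us(rate):.2f} us between samples")

# One cycle of a 20 kHz sine wave, for comparison (50 us).
print(f"20 kHz cycle -> {1e6 / 20_000:.0f} us")
```

This comes out to roughly 22.68us at 44.1kHz, 10.42us at 96kHz and 5.21us at 192kHz.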

Here is an over-simplified illustration of my understanding of this phenomenon and how it relates to sampling frequency:

H=2mm(approx. 6us)(0dB)
UUUUUUUU=17mm(50us)(1 cycle 20khz sine wave)
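
The mm-to-us figures in the key above can be checked with a quick sketch. It assumes a speed of sound of about 343 m/s (dry air near 20 C), which is my assumption and not a measured value; the function name is illustrative only:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 C


def path_to_delay_us(distance_mm: float) -> float:
    """Time for sound to travel distance_mm, in microseconds."""
    return distance_mm / 1000 / SPEED_OF_SOUND_M_S * 1e6


print(f"2 mm  -> {path_to_delay_us(2):.2f} us")   # one 'H' in the key
print(f"17 mm -> {path_to_delay_us(17):.2f} us")  # eight 'U's, one 20 kHz cycle
```

Under that assumption, 2mm works out to about 5.8us and 17mm to about 49.6us, in line with the approximate 6us and 50us used in the key.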

Potential difference example:

170kHz bandwidth limited
L:
HHHHHUUUUUUUUHHHUUUUUUUUHHHHHHHHHUUUUUUUUHH
R:
HHHHUUUUUUUUHHHHUUUUUUUUHHHHHHHHHHHUUUUUUUU

20kHz bandwidth limited
L:
HHHHHHHHUUUUUUUU UUUUUUUUHHHHHHHHUUUUUUUU
R:
HHHHHHHHUUUUUUUU UUUUUUUUHHHHHHHHUUUUUUUU

The higher bandwidth allows the interchannel time difference to exist at a finer resolution, as illustrated in the crude graphic above. While a 20kHz cycle is 50us in duration, the time at which that cycle can actually begin is not limited. The 20kHz cycle can begin at 200us or 204us or 201.3486us, etc. A lower sampling rate reduces this possible difference to what the sampling rate's limits allow.

If my understanding is wrong, please explain.

How all of this relates to audibility is a different issue.

-Chris