So how does hyperthreading actually work at the hardware level? (I know the general premise.) Some people, presumably parroting information about the first implementation, claim that each core remains entirely single-threaded and that it's solely a trick played on OS schedulers (and the pipelining/prefetching circuitry), while others claim that the majority of each core is duplicated, with only the largest circuitry blocks shared between the logical cores. Which is it?