@KeyJ it sits right at the intersection of everything that software remains stubbornly bad at:
- bunch of compute in the setup/address calc part that either needs atypical primitive operations, wants/prefers odd widths, or both
- big memory access latency hiding path in the middle (with giant FIFOs or even some out-of-order completion)
- needs tight integration with memory subsystem it feeds from
- more oddball caches, texture decompression blocks, and odd ops/sizes in the texel data return/actual filter path
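
To make the first and last bullets a bit more concrete, here's a rough sketch of what the address calc and filter ends of a bilinear fetch can look like when written out in software. Everything specific in it is an assumption for illustration, not a description of any real GPU: power-of-two texture sizes, wrap addressing, 8 bits of subtexel fraction (`FRAC_BITS`), a Morton-order texel layout, and the made-up names `morton_index`, `bilinear_setup` and `bilinear_filter`. It skips the latency hiding, caches and decompression in the middle entirely.

```python
import math

FRAC_BITS = 8  # assumed subtexel precision; real hardware picks its own width


def morton_index(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd ones)."""
    idx = 0
    for i in range(bits):
        idx |= ((x >> i) & 1) << (2 * i)
        idx |= ((y >> i) & 1) << (2 * i + 1)
    return idx


def bilinear_setup(u, v, width, height):
    """Turn normalized (u, v) into four texel indices plus fractional weights.

    Assumes width and height are powers of two and wrap addressing.
    """
    one = 1 << FRAC_BITS
    half = one >> 1
    # Fixed-point texel-space coordinates, shifted so texel centers sit at .5
    fx = math.floor(u * width * one) - half
    fy = math.floor(v * height * one) - half
    # Integer part selects the texel (wrapped), fraction is the blend weight
    x0 = (fx >> FRAC_BITS) & (width - 1)
    y0 = (fy >> FRAC_BITS) & (height - 1)
    x1 = (x0 + 1) & (width - 1)
    y1 = (y0 + 1) & (height - 1)
    wx = fx & (one - 1)
    wy = fy & (one - 1)
    addrs = [morton_index(x, y) for x, y in ((x0, y0), (x1, y0), (x0, y1), (x1, y1))]
    return addrs, wx, wy


def bilinear_filter(texels, wx, wy):
    """Blend four integer texel values with (FRAC_BITS + 1)-bit weights."""
    one = 1 << FRAC_BITS
    top = texels[0] * (one - wx) + texels[1] * wx
    bot = texels[2] * (one - wx) + texels[3] * wx
    # Round to nearest when dropping the 2 * FRAC_BITS fractional bits
    return (top * (one - wy) + bot * wy + (1 << (2 * FRAC_BITS - 1))) >> (2 * FRAC_BITS)
```

Even in this toy version the weights want 9 bits (0 through 256) and the index math is mostly bit shuffling, which is the kind of thing that maps poorly onto general-purpose ALU instructions and byte-sized registers.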