Yeah, but that doesn't mean anything, does it? I don't think they just tokenize the raw audio. That wouldn't make sense, right?