torch.Size([1, 8192, 1024]) torch.Size([1, 1024]) this is the output for an 8-frame input; the last hidden state has 8192 tokens, which is extremely large.
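To put a rough number on "extremely large", here is a quick back-of-the-envelope sketch based only on the printed shape. It assumes the 8192 tokens are split evenly across the 8 frames with no extra special tokens, and that the tensor is stored as float32; neither is confirmed by the output above, so treat the figures as estimates.

```python
# Shape copied from the printed last hidden state above.
hidden_shape = (1, 8192, 1024)  # (batch, tokens, hidden_dim)
num_frames = 8

batch, num_tokens, hidden_dim = hidden_shape

# Assumption: tokens are divided evenly across frames, no special tokens.
tokens_per_frame = num_tokens // num_frames

# Assumption: float32 storage, 4 bytes per element.
bytes_fp32 = batch * num_tokens * hidden_dim * 4
mib_fp32 = bytes_fp32 / 2**20

# Token pairs a full self-attention layer would have to score over this sequence.
attention_pairs = num_tokens ** 2

print(tokens_per_frame, mib_fp32, attention_pairs)
```

Under those assumptions each frame contributes 1024 tokens, so the sequence length grows linearly with the number of frames while any full attention over it grows quadratically.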