News
Currently, video question answering (VideoQA) algorithms relying on video-text pretraining models employ intricate unimodal encoders and multimodal fusion Transformers, which often lead to decreased ...
As such, a natural question arises: how can the receptive fields of convolutional neural networks be expanded to achieve performance comparable to that of transformers? Large kernel ...