Abstract: Neural networks assume training and test data share the same distribution and label space, but real-world violations degrade performance when domain and category shifts occur simultaneously.
Model implementations with various configurations (native ViT, ResNet+ViT hybrid, different patch/heads/blocks setups, Stochastic Depth/DropPath, etc.) Training and ...