Higher-resolution version of siglip2_decoder
#3
by szlgallen - opened
Hi! Thank you for your great work!
I noticed that the current open-sourced version only supports 224×224 image generation. May I ask if there are any plans to release a higher-resolution SigLIP2 decoder, similar to the DINOv2-B_512 configuration in RAE?
Additionally, is there any way for the current model to decode higher-resolution images directly (e.g., 512×512) without retraining? One advantage of VAEs is that, being largely convolutional, they typically do not require a separate model for each input resolution, so I was wondering whether similar flexibility could be achieved here.
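To make the question concrete, here is a rough sketch of the mismatch I have in mind. It assumes the decoder is ViT-style and works on a fixed grid of patch tokens. The patch size of 16 and the helper `patch_grid` are my own placeholders for illustration, not values taken from the released config.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image cut into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side * side


# Assumption for illustration only: the real patch size may differ.
PATCH_SIZE = 16

for resolution in (224, 512):
    side, total = patch_grid(resolution, PATCH_SIZE)
    print(f"{resolution}x{resolution}: {side}x{side} grid, {total} tokens")
```

Under that assumption the token count changes with resolution, so any fixed-length positional embedding in the decoder would need to be resized or retrained, whereas a convolutional decoder just slides over a larger feature map.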