Daily Papers

by AK and the research community

Feb 25

Can MLLMs Understand the Deep Implication Behind Chinese Images?

As the capabilities of Multimodal Large Language Models (MLLMs) continue to improve, the need to evaluate their higher-order capabilities is increasing. However, little work evaluates MLLMs on higher-order perception and understanding of Chinese visual content. To fill this gap, we introduce the **C**hinese **I**mage **I**mplication understanding **Bench**mark, **CII-Bench**, which assesses the higher-order perception and understanding capabilities of MLLMs for Chinese images. CII-Bench stands out from existing benchmarks in several ways. First, to ensure the authenticity of the Chinese context, images in CII-Bench are sourced from the Chinese Internet and manually reviewed, with corresponding answers also manually crafted. Additionally, CII-Bench incorporates images that represent Chinese traditional culture, such as famous Chinese traditional paintings, which can deeply reflect a model's understanding of Chinese traditional culture. Through extensive experiments on CII-Bench across multiple MLLMs, we report three main findings. First, a substantial gap exists between the performance of MLLMs and humans on CII-Bench: the highest MLLM accuracy is 64.4%, whereas human accuracy averages 78.2% and peaks at 81.0%. Second, MLLMs perform worse on Chinese traditional culture images, suggesting limitations in their ability to understand high-level semantics and a lack of deep knowledge of Chinese traditional culture. Finally, most models show higher accuracy when image emotion hints are incorporated into the prompts. We believe that CII-Bench will enable MLLMs to gain a better understanding of Chinese semantics and Chinese-specific images, advancing the journey towards expert artificial general intelligence (AGI). Our project is publicly available at https://cii-bench.github.io/.

  • 21 authors · Oct 17, 2024

Deep and Sparse Denoising Benchmarks for Spectral Data Cubes of High-z Galaxies: From Simulations to ALMA observations

Beyond cosmic noon, galaxies appear as faint whispers amid noise, yet this epoch is key to understanding massive galaxy assembly. ALMA's sensitivity to cold dust and [C II] emission allows us to probe their interstellar medium, but faint signals make robust denoising essential. We evaluate and benchmark denoising strategies, including Principal Component Analysis (PCA), Independent Component Analysis (ICA), a sparse unsupervised representation (iterative soft thresholding, IST, with 2D-1D wavelets), and supervised deep learning with a 3D U-Net, to identify techniques that suppress noise while preserving flux and morphology across peak SNRs of 2.5-8. The methods are applied to (i) synthetic spectral cubes of rotating toy disk galaxies, (ii) synthetic [C II] IFU cubes from FIRE simulations, and (iii) ALMA [C II] observations of CRISTAL galaxies and W2246-0526. Performance is assessed via RMSE, conservation of flux and spectra, noise reduction, and SNR improvement of the central galaxy. For synthetic cubes, PCA and ICA provide marginal improvement; IST reduces noise effectively at moderate SNRs but can suppress emission at low SNRs; and the U-Net outperforms IST, though it can produce quantifiable hallucinations at lower SNRs. For moderate-SNR observations (ALMA-CRISTAL), U-Net and IST achieve comparable performance, conserving >91% flux and increasing SNR by >6. However, for observations with complex morphologies absent from the training set (W2246), the U-Net underperforms relative to IST, recovering ~80% flux, while IST robustly conserves flux and improves SNR by ~3, highlighting generalisation challenges and the need for physically motivated training priors. We conclude that IST is a robust unsupervised denoiser for moderate-SNR data, and that a synthetically trained U-Net generalises effectively to real data, with performance dependent on its training priors. This framework offers a pathway for transferable denoising for ALMA, VLT/MUSE, and JWST.

  • 7 authors · Feb 11
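
The iterative soft thresholding named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract uses 2D-1D wavelets on spectral cubes, whereas the sketch below assumes a generic dictionary matrix `A` (the identity in the demo) and a 1D toy signal. The function names `soft_threshold`, `ista`, `rmse`, and `flux_fraction`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Shrink each value toward zero by `thresh`; values within +/-thresh become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def ista(y, A, lam, n_iter=200):
    """Iterative soft thresholding for min_x 0.5*||y - A x||^2 + lam*||x||_1.

    `A` is an assumed generic dictionary (not the 2D-1D wavelet transform
    mentioned in the abstract).
    """
    # Step size 1/L, where L is the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                  # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x

def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def flux_fraction(denoised, truth):
    """Ratio of summed denoised signal to summed true signal."""
    return float(np.sum(denoised) / np.sum(truth))

# Toy demo (hypothetical values): a sparse 1D "spectrum" with Gaussian noise.
rng = np.random.default_rng(0)
truth = np.zeros(64)
truth[[10, 30, 31]] = [5.0, 8.0, 6.0]
noisy = truth + rng.normal(0.0, 1.0, truth.size)
denoised = ista(noisy, np.eye(truth.size), lam=2.0)
print("RMSE noisy:", rmse(noisy, truth))
print("RMSE denoised:", rmse(denoised, truth))
print("Flux fraction:", flux_fraction(denoised, truth))
```

Soft thresholding also shrinks every surviving coefficient by the threshold. In this toy setting, that is one plausible way a thresholding denoiser can lose faint emission when the signal is close to the noise level.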