How Dataset Imbalances Shape Medical Image Retrieval Accuracy

This article explores challenges and innovations in medical image retrieval, focusing on dataset imbalance, organ size and shape biases, and the interpretation of recall values. It highlights a novel application of ColBERT-inspired re-ranking, demonstrating its feasibility for refining content-based image retrieval (CBIR) results and arguing that such re-ranking could incorporate context such as user behavior and medical relevance. While no apparent correlation was found between anatomical region size and retrieval recall, the study opens new pathways for improving image retrieval systems, balancing computational costs, and enhancing real-world usability.


This content originally appeared on HackerNoon and was authored by Image Recognition

Abstract and 1. Introduction

  2. Materials and Methods

    2.1 Vector Database and Indexing

    2.2 Feature Extractors

    2.3 Dataset and Pre-processing

    2.4 Search and Retrieval

    2.5 Re-ranking Retrieval and Evaluation

  3. Evaluation and 3.1 Search and Retrieval

    3.2 Re-ranking

  4. Discussion

    4.1 Dataset and 4.2 Re-ranking

    4.3 Embeddings

    4.4 Volume-based, Region-based and Localized Retrieval and 4.5 Localization-ratio

  5. Conclusion, Acknowledgement, and References

4 Discussion

4.1 Dataset

As depicted in Figure 6, the labels in the database and query subsets (derived from the TS train and test sets, respectively) are not balanced. This imbalance is expected to resemble the label distribution encountered in future real-world image retrieval scenarios. It should nevertheless be kept in mind when reading and interpreting the recall values in the result tables.
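To make the effect of imbalance on recall concrete, the following minimal Python sketch contrasts micro-averaged recall (pooled over all queries) with macro-averaged recall (the mean of per-class recalls). It assumes one ground-truth label per query and one boolean hit flag per query, which is a simplification; the function name `recall_summary` and the class names are hypothetical illustrations, not the paper's evaluation code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recall_summary(
    true_labels: List[str], hit_flags: List[bool]
) -> Tuple[float, float, Dict[str, float]]:
    """Return (micro recall, macro recall, per-class recall).

    Hypothetical helper. Assumes one ground-truth label per query;
    hit_flags[i] is True when the retrieval for query i returned its label.
    """
    if len(true_labels) != len(hit_flags) or not true_labels:
        raise ValueError("need equally long, non-empty inputs")
    totals = Counter(true_labels)
    hits = Counter(label for label, hit in zip(true_labels, hit_flags) if hit)
    # Counter returns 0 for classes that were never hit.
    per_class = {label: hits[label] / n for label, n in totals.items()}
    micro = sum(hits.values()) / len(true_labels)  # pooled over all queries
    macro = sum(per_class.values()) / len(per_class)  # every class weighs equally
    return micro, macro, per_class


# Hypothetical imbalanced query set: 8 queries of a frequent class, 2 of a rare one.
labels = ["liver"] * 8 + ["adrenal_gland"] * 2
flags = [True] * 7 + [False] + [False, False]
micro, macro, per_class = recall_summary(labels, flags)
print(f"micro={micro:.3f}  macro={macro:.4f}  per_class={per_class}")
```

In this toy example the frequent class lifts the pooled recall to 0.7 while the rare class is never retrieved, so the macro average falls to 0.4375. Looking at per-class values alongside any aggregate helps avoid reading a dominant class's performance as overall performance.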

Additionally, the size and shape of organs can affect the probability of predicting a given label correctly by chance. For example, smaller organs are less likely than larger organs to collect "by-chance" true positive predictions. Similarly, organs with an elongated shape aligned with the slice-wise sampling direction appear in more slices, which increases the likelihood of "by-chance" hits. We did not consider a volume- and shape-adjusted representation of recall values reasonable, so none was computed in this work. However, organ volume, as shown in Figure 7 and Figure 8, should be considered when interpreting the result tables.
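The by-chance effect can be approximated with a simple model. Assume a slice-wise database in which a fraction p of the slices carries a given organ label, and a hypothetical retriever that returns k slices uniformly at random with replacement. The probability of at least one by-chance hit is then 1 - (1 - p)^k. The sketch below (function name and numbers are hypothetical, not from the paper) shows how an organ that spans more slices, because it is large or elongated along the sampling axis, gains a much higher chance-level hit rate.

```python
def chance_hit_probability(slices_with_label: int, total_slices: int, k: int) -> float:
    """P(at least one of k random slices carries the label).

    Simplifying assumption: slices are drawn uniformly at random with
    replacement, so each draw is an independent hit with probability p.
    """
    if total_slices <= 0 or not 0 <= slices_with_label <= total_slices or k < 0:
        raise ValueError("invalid counts")
    p = slices_with_label / total_slices
    return 1.0 - (1.0 - p) ** k


# Hypothetical database of 10,000 slices, retrieving the top k = 10.
large_organ = chance_hit_probability(2_000, 10_000, k=10)  # spans many slices
small_organ = chance_hit_probability(100, 10_000, k=10)  # spans few slices
print(f"large: {large_organ:.3f}  small: {small_organ:.3f}")
```

With these made-up numbers the large organ reaches roughly 0.89 chance-level probability against roughly 0.10 for the small one, which illustrates why organ extent matters when reading recall tables.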

Figure 9 and Figure 10 show the mean recall of each retrieval method (across all models) versus the mean anatomical region size for 29 and 104 classes, respectively. The plots show no pattern suggesting a correlation between the size of an anatomical region and its average retrieval recall.
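The discussion draws its conclusion from visual inspection of the plots. One way to quantify it, sketched here as an assumption rather than the paper's analysis, is a rank (Spearman) correlation between per-class mean region size and per-class mean recall; a coefficient near zero would be consistent with the absence of a pattern. The helper names are hypothetical, and the implementation uses only the standard library, giving tied values their average rank.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i : j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-class values for illustration, not taken from the paper.
mean_region_size = [1200.0, 35.0, 410.0, 90.0, 2600.0]
mean_recall = [0.71, 0.69, 0.66, 0.58, 0.62]
print(f"Spearman rho = {spearman(mean_region_size, mean_recall):.3f}")
```

A rank correlation captures any monotonic relationship and is less sensitive than Pearson's coefficient to a few very large regions. With only 29 or 104 classes, a significance test would also be advisable before drawing conclusions from the coefficient.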

Figure 6: Distribution of the classes in database (a) and query (b) volumes.


4.2 Re-ranking

For the first time, we successfully adopted ColBERT-inspired re-ranking for an image retrieval task and demonstrated its feasibility. In principle, this shows that CBIR results can be subjected to context-aware re-ranking. This is important because it provides a conceptual entry point for using the information available to a future retrieval solution in the real world. Concretely, signals such as user behavior in a graphical user interface, as well as temporal or medical relevance, could be "factored in" to adjust the search results. Further research will study the advantages and disadvantages of ColBERT-inspired re-ranking, and future work will share further insights into balancing computational costs in the context of latency-accuracy trade-offs.
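ColBERT scores a query against a document by late interaction: each query embedding is matched to its most similar document embedding (MaxSim), and these maxima are summed. The minimal Python sketch below applies that idea to re-ranking a retrieval shortlist. It assumes that each query and candidate is represented by a set of embeddings (for example, one per slice), which this section does not specify. The optional `context_boost` dictionary and `weight` parameter are hypothetical stand-ins for the context signals mentioned above, such as user behavior; they are not part of the described method.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """ColBERT-style late interaction: sum over query embeddings of the
    best cosine similarity against any candidate embedding."""
    q = [_unit(v) for v in query_vecs]
    d = [_unit(v) for v in doc_vecs]
    return sum(max(sum(a * b for a, b in zip(qv, dv)) for dv in d) for qv in q)


def rerank(
    query_vecs: Sequence[Vector],
    candidates: Dict[str, Sequence[Vector]],
    context_boost: Optional[Dict[str, float]] = None,  # hypothetical context signal
    weight: float = 0.0,  # hypothetical mixing weight for the context signal
) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs sorted by descending score."""
    boost = context_boost or {}
    scored = [
        (cid, maxsim(query_vecs, vecs) + weight * boost.get(cid, 0.0))
        for cid, vecs in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-D embeddings for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
shortlist = {
    "A": [[1.0, 0.0], [0.0, 1.0]],
    "B": [[1.0, 0.0]],
    "C": [[1.0, 1.0]],
}
print(rerank(query, shortlist))
print(rerank(query, shortlist, context_boost={"B": 1.0}, weight=0.5))
```

Because MaxSim compares every query embedding with every candidate embedding, its cost grows with the product of the two set sizes, which is why this style of scoring is typically applied only to a short candidate list returned by a fast first-stage search. That is one concrete source of the latency-accuracy trade-off mentioned above.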


:::info Authors:

(1) Farnaz Khun Jush, Bayer AG, Berlin, Germany (farnaz.khunjush@bayer.com);

(2) Steffen Vogler, Bayer AG, Berlin, Germany (steffen.vogler@bayer.com);

(3) Tuan Truong, Bayer AG, Berlin, Germany (tuan.truong@bayer.com);

(4) Matthias Lenga, Bayer AG, Berlin, Germany (matthias.lenga@bayer.com).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

APA

Image Recognition | Sciencx (2025-08-29T05:00:09+00:00) How Dataset Imbalances Shape Medical Image Retrieval Accuracy. Retrieved from https://www.scien.cx/2025/08/29/how-dataset-imbalances-shape-medical-image-retrieval-accuracy/
