We recently received some insights and feedback from a renowned professor specializing in image security. His comments are valuable, and we would like to share them with you. Let's first quote his questions below:
Q1: In the earlier experiments, CAT-Net and SPAN were used. Are these models included in the BenCo framework, or are they independent of the codebase?
Q2: In the later part, various backbones were evaluated, but there seems to be no specific mention of comparing different losses (from various SOTA methods).
Q3: My understanding might be incorrect, but is it right to conclude that the highest F1 score in the upper part of Table 3 is 0.53, while Table 6 shows that Swin-B alone achieves 0.586? Does this imply that many modules have a negative effect and that many methods proposed in papers are insignificant?
A1:
The IMDL-BenCo codebase includes these models in its model_zoo, either through their official implementations or our manual reproductions. They are integrated following the required IMDL-BenCo paradigm and can be used directly via `benco init model_zoo`.
A2:
Currently, the choice of loss function often depends on the specific structure of the model, so simply transplanting a particular loss onto another backbone can lead to poorer performance. Due to both engineering difficulties and space limitations, we have not yet conducted experiments in this area. However, given our existing framework, such tests would be straightforward to conduct. We plan to refine and document these experiments on GitHub and in our documentation in the future.
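To illustrate the kind of decoupling that makes such experiments easy, here is a minimal, hypothetical PyTorch sketch (the class and argument names below are ours for illustration only, not the actual IMDL-BenCo API): once the backbone and the loss sit behind a common interface, pairing a paper-specific loss with a different backbone becomes a one-line change.

```python
import torch.nn as nn

# Hypothetical sketch, not the real IMDL-BenCo API: a wrapper that pairs an
# arbitrary segmentation backbone with an arbitrary pixel-level loss, so that
# "SOTA loss X on backbone Y" experiments differ only in the constructor call.
class ForgeryDetector(nn.Module):
    def __init__(self, backbone: nn.Module, loss_fn: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. Swin-B, SegFormer, ConvNeXt, ...
        self.loss_fn = loss_fn    # e.g. BCE, focal loss, or a paper-specific loss

    def forward(self, image, mask):
        pred = self.backbone(image)      # predicted manipulation-mask logits
        loss = self.loss_fn(pred, mask)  # loss is computed independently of the backbone
        return pred, loss

# Swapping the loss then requires no change to the backbone:
# detector = ForgeryDetector(backbone=swin_b, loss_fn=nn.BCEWithLogitsLoss())
```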
A3:
In our paper, Table 3 averages results over five datasets, while Table 6 averages results over only four, excluding IMD-2020. We acknowledge this omission and plan to address it in future versions to minimize ambiguity. On IMD-2020, Swin-B achieves an F1 score of 0.3459, which yields an average F1 of 0.53794 across all five datasets, still slightly surpassing TruFor's 0.530. This indicates that a vanilla Swin-B performs exceptionally well, outperforming all models specifically designed for IMDL tasks.
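As a quick sanity check on these numbers (assuming the reported averages are simple unweighted means over the per-dataset F1 scores), combining the four-dataset average of 0.586 with the IMD-2020 score of 0.3459 reproduces the five-dataset figure:

```python
# Sanity check: unweighted mean of per-dataset F1 scores.
# Assumption: Table 6's 0.586 is the simple mean over the four datasets
# excluding IMD-2020, and the five-dataset figure is likewise a simple mean.
f1_four_dataset_mean = 0.586   # Swin-B, average over 4 datasets (Table 6)
f1_imd2020 = 0.3459            # Swin-B, F1 on IMD-2020

f1_five_dataset_mean = (f1_four_dataset_mean * 4 + f1_imd2020) / 5
print(round(f1_five_dataset_mean, 4))  # ~0.538, consistent with the reported 0.53794, above TruFor's 0.530
```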
In conclusion, many low-level feature extraction methods appear to contribute little and may even exacerbate overfitting. More thought-provokingly, a plain segmentation backbone can already achieve strong performance on IMDL tasks. Future discussions in the IMDL domain may pivot toward how the artifacts in this task should be extracted and what distinguishes them from traditional segmentation tasks. Nevertheless, we believe that leveraging superior backbones and enhancing them with accumulated domain expertise will inevitably lead to better models for IMDL. Standing on the shoulders of giants, we can continue to propel the IMDL field toward a brighter future.
As authors, we warmly welcome discussions on similarly valuable topics. Please feel free to reach out to our team via email or open an issue directly; your input can drive advancements in the field.