We recently received some insights and feedback from a renowned professor specializing in image security. His comments are valuable, and we would like to share them with you. Let's first quote his questions below:
Q1: In the earlier experiments, CAT-Net and SPAN were used. Are these models included in the BenCo framework, or are they independent of the codebase?
Q2: In the later part, various backbones were evaluated, but there seems to be no specific mention of comparing different losses (from various SOTA methods).
Q3: My understanding might be incorrect, but is it right to conclude that the highest F1 score in the upper part of Table 3 is 0.53, while Table 6 shows that Swin-B alone achieves 0.586? Does this imply that many modules have a negative effect and that many methods proposed in papers are insignificant?
A1:
The IMDL-BenCo codebase includes these models in its model_zoo, either through their official implementations or our manual reproductions. They are integrated following the required IMDL-BenCo paradigm and can be used directly via `benco init model_zoo`.
A2:
Currently, the choice of loss function often depends on the specific structure of the model, so simply transplanting a particular loss onto another backbone can lead to poorer performance. Due to both engineering difficulties and space limitations, we have not yet conducted experiments in this area. However, given our existing framework, such tests would be straightforward to conduct. We plan to refine and document these experiments on GitHub and in our documentation in the future.
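To illustrate the kind of decoupling that makes such experiments easy, here is a minimal, hypothetical PyTorch sketch (the class and argument names below are ours for illustration only, not the actual IMDL-BenCo API): once the backbone and the loss sit behind a common interface, pairing a paper-specific loss with a different backbone becomes a one-line change.

```python
import torch.nn as nn

# Hypothetical sketch, not the real IMDL-BenCo API: a wrapper that pairs an
# arbitrary segmentation backbone with an arbitrary pixel-level loss, so that
# "SOTA loss X on backbone Y" experiments differ only in the constructor call.
class ForgeryDetector(nn.Module):
    def __init__(self, backbone: nn.Module, loss_fn: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. Swin-B, SegFormer, ConvNeXt, ...
        self.loss_fn = loss_fn    # e.g. BCE, focal loss, or a paper-specific loss

    def forward(self, image, mask):
        pred = self.backbone(image)      # predicted manipulation-mask logits
        loss = self.loss_fn(pred, mask)  # loss is computed independently of the backbone
        return pred, loss

# Swapping the loss then requires no change to the backbone:
# detector = ForgeryDetector(backbone=swin_b, loss_fn=nn.BCEWithLogitsLoss())
```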
A3:
In our paper, Table 3 averages results over five datasets, while Table 6 averages results over only four, excluding IMD-2020. We acknowledge this omission and plan to address it in future versions to minimize ambiguity. On IMD-2020, Swin-B achieves an F1 score of 0.3459, which yields an average F1 of 0.53794 across all five datasets, still slightly surpassing TruFor's 0.530. This indicates that a vanilla Swin-B performs exceptionally well, outperforming all models specifically designed for IMDL tasks.
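As a quick sanity check on these numbers (assuming the reported averages are simple unweighted means over the per-dataset F1 scores), combining the four-dataset average of 0.586 with the IMD-2020 score of 0.3459 reproduces the five-dataset figure:

```python
# Sanity check: unweighted mean of per-dataset F1 scores.
# Assumption: Table 6's 0.586 is the simple mean over the four datasets
# excluding IMD-2020, and the five-dataset figure is likewise a simple mean.
f1_four_dataset_mean = 0.586   # Swin-B, average over 4 datasets (Table 6)
f1_imd2020 = 0.3459            # Swin-B, F1 on IMD-2020

f1_five_dataset_mean = (f1_four_dataset_mean * 4 + f1_imd2020) / 5
print(round(f1_five_dataset_mean, 4))  # ~0.538, consistent with the reported 0.53794, above TruFor's 0.530
```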
In conclusion, many low-level feature extraction methods appear to contribute little and may even exacerbate overfitting. More thought-provokingly, a plain segmentation backbone can already achieve strong performance on IMDL tasks. Future discussions in the IMDL domain may pivot toward how the artifacts in this task should be extracted and what distinguishes them from traditional segmentation tasks. Nevertheless, we believe that leveraging superior backbones and enhancing them with accumulated domain expertise will inevitably lead to better models for IMDL. Standing on the shoulders of giants, we can continue to propel the IMDL field toward a brighter future.
As authors, we warmly welcome discussions on similarly valuable topics. Please feel free to reach out to our team via email or open an issue directly; your input can drive advancements in the field.