-
Hi @schnpwr, thanks for bringing this to our attention; it's a really interesting issue.

In the ECtHR A task, I don't think this is an issue. The labels cover only the violated articles, which are named verbatim in the court's decision (the last part of the court's opinion), and that part is not included in the input (the facts). References in the facts to the allegedly violated articles (a superset of the violated ones) therefore do not leak the labels: the involved parties, their legal representatives, and the judges all know which articles are at stake.

In the ECtHR B task, it is a potential issue, since the labels are the allegedly violated articles, i.e., those that the court considers in order to decide whether they have been violated. Here, potential label leakage should be assessed case by case. In many cases the input is a re-trial or an appeal of some kind: the court has already considered the applicant's case in the past and there were grounds for reconsideration, so the originally alleged articles are part of the facts.

In any case, I agree that masking this information would improve the experimental setup, since it's hard to tell whether these references cause label leakage or not. It would be interesting to manually review these cases to identify the role these references play. Could you provide the IDs or some snippets? Have you already considered experiments with and without masking?
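A minimal sketch of what such masking could look like, in case it helps to make the with/without comparison concrete. The regular expression, the `mask_articles` name, and the `[ARTICLE]` placeholder are all illustrative assumptions, not part of the benchmark code. The pattern only covers forms like "Article 3", "Articles 6 and 8", or "Art. 10", and does not treat protocol articles specially.

```python
import re

# Assumed pattern (illustrative only): "Article 3", "Articles 6 and 8", "Art. 10".
ARTICLE_RE = re.compile(
    r"\bArt(?:icles?|\.)\s*\d+(?:\s*(?:,|and|or)\s*\d+)*",
    re.IGNORECASE,
)


def mask_articles(text, placeholder="[ARTICLE]"):
    """Replace each explicit article mention with a fixed placeholder."""
    return ARTICLE_RE.sub(placeholder, text)


masked = mask_articles("He relied on Articles 6 and 8 of the Convention.")
print(masked)
```

Running the same classifier on the original and the masked facts would then give a rough estimate of how much these mentions contribute to the scores.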
-
I observed that actual ECHR article numbers appear in the text of the case descriptions. For example, in the test split of the ECtHR B dataset, I found 178 instances where at least one gold-standard label is explicitly mentioned in the case description. Can anyone confirm whether such article mentions were masked when computing the reported accuracy numbers? If so, how were they masked? I feel that if the data is used as is, it gives the classifiers an unfair advantage.
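A count of this kind could be reproduced roughly as sketched below. This is an illustration under assumptions, not necessarily the exact procedure behind the 178 figure: it assumes each example is a pair of facts text and gold labels, that labels are article-number strings such as "3" or "8", and that mentions look like "Article 3" or "Articles 6 and 8". Protocol articles are not handled, and the function names are hypothetical.

```python
import re

# Assumed pattern (illustrative only); captures the number list after "Article(s)" / "Art.".
ARTICLE_RE = re.compile(
    r"\bArt(?:icles?|\.)\s*(\d+(?:\s*(?:,|and|or)\s*\d+)*)",
    re.IGNORECASE,
)


def mentioned_articles(text):
    """Return the set of article numbers (as strings) explicitly mentioned in text."""
    found = set()
    for match in ARTICLE_RE.finditer(text):
        found.update(re.findall(r"\d+", match.group(1)))
    return found


def count_leaky(examples):
    """Count examples whose text mentions at least one of their gold labels."""
    return sum(
        1 for text, gold in examples if mentioned_articles(text) & set(gold)
    )


# Toy examples, made up for illustration.
examples = [
    ("The applicant complained under Articles 6 and 8 of the Convention.", ["8"]),
    ("The applicant was arrested in 2004.", ["3"]),
    ("He relied on Article 10.", ["6"]),
]
print(count_leaky(examples))
```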