
Rules to be added #117

Open
kensei-te opened this issue Oct 20, 2022 · 14 comments
Labels: curation (Discussions related to the curation)


@kensei-te
Collaborator

kensei-te commented Oct 20, 2022

  1. In the guidelines, we need to state what to do when there is an unresolved formula such as
    "(Ca, Pr)FeAs2" or "Ca1-xPrxFeAs2" where x is not given explicitly.
    I think these should be removed.

  2. For formulas, we should state that abbreviations are not accepted, such as
    "(Ca,Pr)112".
    If the curator can determine the complete formula from the context (nearby sentences), it has to be filled in. Otherwise, the entry has to be removed.

  3. The same rule can be applied to pressure values as well. Sometimes we see expressions such as
    "Tc^{max} of 10 K has been achieved under high pressure", where the exact pressure value is not given.
    We should remove such entities.

  4. We should state that we do not store rates, such as
    "Tc decreases by pressure with dTc/dP ~ 3 K/GPa".
    We also do not try to calculate Tc under pressure from such an expression.
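The four rules above lend themselves to a rough automatic pre-check before manual curation. The following is a hypothetical sketch, not part of the existing system: the function names, return strings and heuristics are assumptions. It flags formulas with a free variable (x, δ) or alternative sites without fractions (rule 1), abbreviations detected as a run of three or more digits such as "112" or "1223" (rule 2), pressures with no number (rule 3), and Tc values that are really rates (rule 4). It cannot read the surrounding sentences, so a flagged formula may still be completed by the curator from context.

```python
import re
from typing import List, Optional

# Hypothetical pre-filter for the curation rules; it only flags candidates.
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni "
    "Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe "
    "Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg "
    "Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg "
    "Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)


def non_element_letters(formula: str) -> List[str]:
    """Letters in `formula` that are not part of any element symbol."""
    letters = []
    i = 0
    while i < len(formula):
        if formula[i:i + 2] in ELEMENTS:  # two-letter symbols first ("Co" before "C")
            i += 2
        elif formula[i] in ELEMENTS:
            i += 1
        else:
            if formula[i].isalpha():  # e.g. "x" in "Ca1-xPrxFeAs2", or "δ"
                letters.append(formula[i])
            i += 1
    return letters


def formula_issue(formula: str) -> Optional[str]:
    """Why a formula cannot be kept as-is (rules 1 and 2), or None."""
    # Rule 2: abbreviations such as "(Ca,Pr)112" or "Hg-1223": three or more
    # digits in a row that are not the decimal part of a number like "6.935".
    if re.search(r"(?<![\d.])\d{3,}", formula):
        return "abbreviation"
    # Rule 1: alternative sites without fractions, e.g. "(Ca, Pr)FeAs2".
    if re.search(r"\([^()]*,[^()]*\)", formula):
        return "unresolved: alternative sites"
    # Rule 1: a free variable, e.g. "Ca1-xPrxFeAs2" with no value for x.
    if non_element_letters(formula):
        return "unresolved: variable"
    return None


def pressure_issue(pressure: str) -> Optional[str]:
    """Rule 3: a pressure with no numeric value, e.g. "high pressure"."""
    return None if re.search(r"\d", pressure) else "pressure without value"


def tc_value_issue(tc_value: str) -> Optional[str]:
    """Rule 4: a rate such as "dTc/dP ~ 3 K/GPa" is not a Tc value."""
    if re.search(r"K\s*/\s*GPa|dTc\s*/\s*dP", tc_value):
        return "rate, not a Tc"
    return None


if __name__ == "__main__":
    for f in ["Ca1-xPrxFeAs2", "(Ca, Pr)FeAs2", "(Ca,Pr)112", "Ca0.9Pr0.1FeAs2"]:
        print(f, "->", formula_issue(f))
    # Ca1-xPrxFeAs2 -> unresolved: variable
    # (Ca, Pr)FeAs2 -> unresolved: alternative sites
    # (Ca,Pr)112 -> abbreviation
    # Ca0.9Pr0.1FeAs2 -> None
```

Tokenizing against the full list of element symbols, rather than matching lowercase letters, avoids false positives such as the "y" in "Dy2O3".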

Any opinion?

@lfoppiano
Owner

OK for all.

1 and 2 should be removed with error type "Composition resolution"
3: not sure which error type we should use
4: error type should be "Tc classification"

lfoppiano added the documentation label (Improvements or additions to documentation) on Oct 20, 2022
@lfoppiano
Owner

By the way, @kensei-te, these are all going into the guidelines, aren't they?

@kensei-te
Collaborator Author

By the way, @kensei-te, these are all going into the guidelines, aren't they?

Yes, if other people agree with these rules.

lfoppiano assigned kensei-te and unassigned lfoppiano on Oct 25, 2022
@kensei-te
Collaborator Author

OK for all.

1 and 2 should be removed with error type "Composition resolution"
3: not sure which error type we should use
4: error type should be "Tc classification"

My opinion is that, for 3, this should in theory be a linking error (if we could link "high pressure" to the Tc value). And since the pressure value is not extracted, the Tc value is effectively extracted as if it were measured at ambient pressure. In this sense, it is "wrong".

Another possible view is that, if such unspecified expressions should not be extracted at all, the record could be marked "Invalid", but I have no idea which error type would correspond to this.

Is extracting "high pressure" as a pressure possible in the current system?

@lfoppiano
Owner

My opinion is that, for 3, this should in theory be a linking error (if we could link "high pressure" to the Tc value). And since the pressure value is not extracted, the Tc value is effectively extracted as if it were measured at ambient pressure. In this sense, it is "wrong".
Another possible view is that, if such unspecified expressions should not be extracted at all, the record could be marked "Invalid", but I have no idea which error type would correspond to this.
Is extracting "high pressure" as a pressure possible in the current system?

The expression "high pressure" is in the training data, so I'm assuming it has to be extracted.
What do we want to do?

  1. If this is the case, then when it is missing, we should add it and mark it with error type "Extraction".
  2. If this is not the case, then:
    a. we should re-train the model to avoid such expressions;
    b. when it is extracted, we should remove it (only the pressure or the whole record? You tell me) and mark it as "Extraction" as well.

That's my opinion

@kensei-te
Collaborator Author

1 and 2 should be removed with error type "Composition resolution"

In the current system, how do we do this?
(i) First, we press the "edit" button,
(ii) choose "Composition resolution" as the error type,
(iii) press the "remove" button.
Is this correct?

@lfoppiano
Owner

lfoppiano commented Nov 1, 2022

There is a remove button (the bin icon next to "edit"):

[Screenshot of the remove (bin) button]

which now opens the same dialog box as the "edit" button. Indeed, we need to gray out the whole form (just opened #124).

@kensei-te
Collaborator Author

which now opens the same dialog box as the "edit" button
I see. I remember that when I pressed "remove" last time, it simply removed without asking further, but maybe my memory is wrong. Thanks!

@kensei-te
Collaborator Author

One more proposal. "High pressure" (no exact pressure given in the paper) and "La1-xFexAs2" (no exact x value given in the paper) are not useful for machine learning, but there is nothing wrong with the extraction process.

I think we can add another status and error type called "Values resolution". What do you think?
[Screenshot 2022-11-01 10:22:49]

@kensei-te
Collaborator Author

I mean, those should be removed by the curator with error type "Values resolution".

Strictly speaking, it is not an error (the system is not wrong); it is a matter of convention.

@lfoppiano
Owner

Yes, I agree with you. We can add "Values resolution" within the error types

@kensei-te
Collaborator Author

Yes, I agree with you. We can add "Values resolution" within the error types

OK. Until this error type is added, what should we do when curating this kind of data?
How about removing them with "Composition resolution" for the time being?

@lfoppiano
Owner

Sure!

@lfoppiano lfoppiano added curation Discussions related to the curation and removed documentation Improvements or additions to documentation labels Nov 2, 2022
@kensei-te
Collaborator Author

Regarding "Values resolution", I want to add one more case.

Since data in tables and figures are out of scope, the curator should not try hard to complete the variables in a formula if their values are available only in tables or figures and not in the sentence itself.

Therefore, such incomplete formulas (material names) should also be assigned the "Values resolution" error type.
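The error types agreed on in this thread can be summarized as a small lookup table. This is a hypothetical sketch: the `Case` names, the `error_type` function and the `values_resolution_available` flag are illustrative and not part of the curation interface. "(Ca, Pr)FeAs2" is mapped to "Composition resolution" following the first reply, since the thread does not revisit that case, and the interim fallback is assumed to apply to every "Values resolution" case.

```python
from enum import Enum


class Case(Enum):
    """Hypothetical names for the situations discussed in this issue."""
    ABBREVIATION = "abbreviated formula, e.g. (Ca,Pr)112"
    ALTERNATIVE_SITES = "sites without fractions, e.g. (Ca, Pr)FeAs2"
    UNRESOLVED_VARIABLE = "variable without value, e.g. La1-xFexAs2"
    VALUES_ONLY_IN_TABLE_OR_FIGURE = "variable values only in tables/figures"
    UNSPECIFIED_PRESSURE = "pressure without value, e.g. 'high pressure'"
    TC_RATE = "rate, e.g. dTc/dP ~ 3 K/GPa"


# Error type to select when removing the record. Abbreviations are removed
# only if the complete formula cannot be recovered from nearby sentences.
_ERROR_TYPE = {
    Case.ABBREVIATION: "Composition resolution",
    Case.ALTERNATIVE_SITES: "Composition resolution",
    Case.UNRESOLVED_VARIABLE: "Values resolution",
    Case.VALUES_ONLY_IN_TABLE_OR_FIGURE: "Values resolution",
    Case.UNSPECIFIED_PRESSURE: "Values resolution",
    Case.TC_RATE: "Tc classification",
}


def error_type(case: Case, values_resolution_available: bool = True) -> str:
    """Return the error type for `case`, applying the interim convention."""
    chosen = _ERROR_TYPE[case]
    # Until "Values resolution" exists in the interface, such records are
    # removed with "Composition resolution" instead.
    if chosen == "Values resolution" and not values_resolution_available:
        return "Composition resolution"
    return chosen
```

Passing `values_resolution_available=False` reproduces the temporary convention of removing these records with "Composition resolution".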
