Regulates_o_occurs_in (ref: gomodel:59a9023b00000107) #44

BarbaraCzub opened this issue Sep 7, 2017 · 18 comments

@BarbaraCzub

Re: Figures 5 and 6 in PMID:22302796.

The figures show that glutamate re-uptake by Slc1a6 transporter in Purkinje cells regulates AMPA receptor activity in Bergmann glial cells.

In P2GO I would capture this as follows, based on the fact that occurs_in refers to the location of the regulatory process, whereas regulates_o_occurs_in describes the location of the process being regulated:

Slc1a6 - GO:2000311 regulation of AMPA receptor activity - AEs: occurs_in Purkinje cell, regulates_o_occurs_in Bergmann glial cell
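To make the distinction concrete, here is a minimal sketch that treats regulates_o_occurs_in as the composition regulates followed by occurs_in over a toy set of triples. It assumes a simplified triple representation (not the Protein2GO or Noctua data model), and labels stand in for the GO and CL identifiers.

```python
# Toy triple store; labels stand in for GO/CL identifiers (assumption).
def compose(triples, r1, r2):
    """Return (x, z) pairs such that x r1 y and y r2 z for some y."""
    return {(x, z)
            for (x, p, y) in triples if p == r1
            for (y2, q, z) in triples if q == r2 and y2 == y}

REG = "regulation of AMPA receptor activity"
model = {
    (REG, "occurs_in", "Purkinje cell"),                             # where the regulation happens
    (REG, "regulates", "AMPA receptor activity"),                    # what is regulated
    ("AMPA receptor activity", "occurs_in", "Bergmann glial cell"),  # where the target happens
}

direct = sorted(f"occurs_in({z})" for (x, p, z) in model if x == REG and p == "occurs_in")
chained = sorted(f"regulates_o_occurs_in({z})"
                 for (x, z) in compose(model, "regulates", "occurs_in") if x == REG)
ext = ",".join(direct + chained)
print(ext)  # occurs_in(Purkinje cell),regulates_o_occurs_in(Bergmann glial cell)
```

The chained relation lets a single flat annotation record where the regulated process happens without naming the intermediate process instance.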

The relation regulates_o_occurs_in is not available in Noctua, so I tried two ways to capture the information, but neither of them seems to be translated correctly into the GPAD format.

Firstly, I tried using the part_of relationship between the MF and the BP and used occurs_in to indicate the location of each one of them:

[Screenshot 2017-09-07 at 18:18:49]

This results in the following incorrect annotations:

[Screenshot 2017-09-07 at 18:28:58]

Bergmann cells should not be included in the MF annotation at all.

[Screenshot 2017-09-07 at 18:37:14]

Here the regulates_o_occurs_in relation would have been correct, but occurs_in indicates that the regulatory process occurs in these cells (whereas the entity is active in Purkinje cells in this example).

Since this did not work, I then tested the 'regulates' relation instead:

[Screenshot 2017-09-07 at 18:18:44]

This resulted in loss of the Bergmann glial cell from the AE for both the MF and the BP.

So in this case the MF annotation is correct:

[Screenshot 2017-09-07 at 18:31:08]

But the BP annotation lacks information about the location of the process being regulated. Instead, it contains an additional, redundant 'regulates' relation.

[Screenshot 2017-09-07 at 18:33:26]

How do I fix this? There should be at least one way to capture this in Noctua, which will result in the correct occurs_in and regulates_o_occurs_in relations in GPAD.

I'll be very grateful for some advice on this!

Thanks,
Barbara

cc: @RLovering

@cmungall
Member

cmungall commented Sep 7, 2017

The second approach is correct. As you point out, the GPAD annotations are valid but could be improved in two ways, which I address below. But just to be clear, you can go ahead and model things with regulates.

Redundancy

Yes, explicitly adding the regulates relationship in the extension field is redundant as the inferred GO class already states this. But this redundancy only manifests in the extension. How high a priority is it to tackle this?
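If the redundancy were tackled at export time, one possible approach is to drop a regulates(X) extension whenever X is exactly the target already named in the class's logical definition. This is a minimal sketch, not minerva's behavior; the CLASS_REGULATES lookup is hypothetical and labels stand in for GO IDs.

```python
# Hypothetical lookup: the "regulates" target built into each regulation
# class's logical definition. Labels stand in for GO IDs (assumption).
CLASS_REGULATES = {
    "regulation of AMPA receptor activity": "AMPA receptor activity",
}

def drop_redundant_regulates(go_class, extensions):
    """Remove regulates(X) only when X equals the target already named by go_class."""
    target = CLASS_REGULATES.get(go_class)
    return [(rel, filler) for (rel, filler) in extensions
            if not (rel == "regulates" and filler == target)]

exts = [("occurs_in", "Purkinje cell"), ("regulates", "AMPA receptor activity")]
cleaned = drop_redundant_regulates("regulation of AMPA receptor activity", exts)
```

Only an exact match is dropped; a regulates filler that is more specific than the class's target still adds information and is kept.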

Missing extension relationship

You would like to see regulates_o_occurs_in(Bergmann cell) in the extensions field. This is indeed valid.

This should be easily addressable by adding the regulates_o_occurs_in relation (GOREL:0001004) to the whitelist here:

https://github.com/geneontology/minerva/blob/master/minerva-converter/src/main/resources/org/geneontology/minerva/legacy/sparql/gpad-extensions.rq
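Conceptually, the whitelist acts as a filter on inferred extension relations. The sketch below models that idea in Python; the EXTENSION_WHITELIST contents are a hypothetical stand-in, and the real list lives in the SPARQL query linked above. BFO:0000066 is occurs_in, RO:0002211 is regulates, and GOREL:0001004 is regulates_o_occurs_in.

```python
# Hypothetical whitelist; the real one lives in minerva's gpad-extensions.rq
# SPARQL query and its actual contents are not reproduced here.
EXTENSION_WHITELIST = {
    "BFO:0000066",  # occurs_in
    "RO:0002211",   # regulates
}

def gpad_extensions(inferred, whitelist):
    """Keep only inferred (relation, filler) pairs whose relation is whitelisted."""
    return [(r, f) for (r, f) in inferred if r in whitelist]

inferred = [("BFO:0000066", "Purkinje cell"), ("GOREL:0001004", "Bergmann glial cell")]
before = gpad_extensions(inferred, EXTENSION_WHITELIST)
after = gpad_extensions(inferred, EXTENSION_WHITELIST | {"GOREL:0001004"})
```

Because the filter applies to every model, adding a relation changes the output for all groups at once, which is the concern raised below.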

However before we do this, let's consider:

  • is this a special case, or do we want to do this for all the other named property chains?
  • even if we do for just this one case, this will result in everyone getting these in their extensions field, for relevant models. It seems not all groups are consistent here as to what to include and what not to include. I want to avoid a situation where we add it to the whitelist and then this disrupts another group.

I would not be sad to see named property chains (e.g. the ones with two relations linked by the composition operator o) disappear from extension fields. They were designed for the situation where we could not have an explicit model. They are inherently lossy: the semantics explicitly state there is a chain of relationships, but we don't say what the thing linking in the chain is. When I introduced these originally I thought it would be a simple concept, but unfortunately even within GO they have proven highly confusing. I very much doubt users understand or make use of these abstracted extensions. In contrast, I think the model is clear, and with a little extra work we are close to making nice visualizations that can be easily embedded on gene pages.

@cmungall
Member

cmungall commented Sep 7, 2017

Thanks for the really clear example. I like how you did the screenshots with the reasoner on so you can see the inferences directly.

TANGENT AHEAD

For the ontology geeks: why didn't the reasoner flag the first example as being wrong? Recall that part_of o occurs_in -> occurs_in (we discussed this at some length at the OWL training in Berkeley; a few people had some initial difficulties with the concepts, but we worked through these). So in the first example, the glutamate reuptake would be inferred to happen in the Bergmann cell as well as in the Purkinje cell. This would clearly be wrong!
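A minimal sketch of that chain applied to the first model, using a toy triple set and a naive fixpoint loop (an assumption for illustration, not how the OWL reasoner works internally; labels stand in for IDs):

```python
def apply_chain(triples, r1, r2, result):
    """Saturate: whenever x r1 y and y r2 z, add x result z; repeat until nothing new."""
    triples = set(triples)
    while True:
        new = {(x, result, z)
               for (x, p, y) in triples if p == r1
               for (y2, q, z) in triples if q == r2 and y2 == y}
        if new <= triples:
            return triples
        triples |= new

# Toy reconstruction of the first model (labels instead of IDs).
first_model = {
    ("glutamate reuptake", "occurs_in", "Purkinje cell"),
    ("glutamate reuptake", "part_of", "regulation of AMPA receptor activity"),
    ("regulation of AMPA receptor activity", "occurs_in", "Bergmann glial cell"),
}
closed = apply_chain(first_model, "part_of", "occurs_in", "occurs_in")
locations = sorted(z for (x, p, z) in closed
                   if x == "glutamate reuptake" and p == "occurs_in")
```

The inferred locations for glutamate reuptake now include both cell types, and nothing in this rule set says that is a problem.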

However, the machine doesn't know that this is invalid. It would have to be equipped with the knowledge that something can't be part of two cells, and that Bergmann and Purkinje cells are disjoint. In fact, that first piece of knowledge may be false: we have to think carefully about parasites and possibly normal cases of engulfment or cell overlap. A safer approach would be to have general high-level rules about what kinds of cells never overlap. But this would be a lot of work and possibly not worth it here.
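The "safer approach" of declaring which cell types never overlap could look roughly like the check below. The DISJOINT set is a hypothetical axiom for illustration, the inferred triples are a toy example with labels in place of IDs, and a real implementation would use OWL disjointness axioms rather than a Python set.

```python
from itertools import combinations

# Hypothetical axiom: pairs of cell types declared never to overlap.
DISJOINT = {frozenset({"Purkinje cell", "Bergmann glial cell"})}

def disjointness_violations(triples, disjoint):
    """Return (entity, cell_a, cell_b) where an entity occurs_in two disjoint cell types."""
    locs = {}
    for (x, p, z) in triples:
        if p == "occurs_in":
            locs.setdefault(x, set()).add(z)
    out = []
    for x, cells in sorted(locs.items()):
        for a, b in combinations(sorted(cells), 2):
            if frozenset({a, b}) in disjoint:
                out.append((x, a, b))
    return out

inferred = {
    ("glutamate reuptake", "occurs_in", "Purkinje cell"),
    ("glutamate reuptake", "occurs_in", "Bergmann glial cell"),
    ("regulation of AMPA receptor activity", "occurs_in", "Bergmann glial cell"),
}
violations = disjointness_violations(inferred, DISJOINT)
```

This only encodes pairwise disjointness between cell types, so it does not assume that nothing can ever be part of two cells.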

END OF TANGENT

@BarbaraCzub
Author

BarbaraCzub commented Sep 8, 2017

I am glad you find the screenshots helpful. They certainly helped me explain the issue.

Re: Redundancy

I do not think this needs to be tackled very urgently. However, it is confusing. When I saw this extension I assumed I must have done something wrong, because it did not make sense to me that this extension appeared. Now I will know not to worry about this.

Re: Missing extension relationship

It was difficult for me at first to remember the difference between the occurs_in and regulates_o_occurs_in relations, but now I find the distinction very helpful. I suppose I rely on this relation quite a lot. I have just had a look in Protein2GO reports, and during the last 6 months I used regulates_o_occurs_in 285 times. So this is not a special case (in my case!). But it took me a while to learn to use this relationship, and I appreciate that many users are likely to use it incorrectly, or to use occurs_in instead. (Before I was made aware of the regulates_o_occurs_in relation, I had wrongly used occurs_in myself when I actually should have used regulates_o_occurs_in.) It can be confusing, so I still have a piece of paper over my desk reminding me which one means what, which is useful in moments of doubt.

@vanaukenk
Contributor

Hi @BarbaraCzub
One thought I had when looking over your model is that there aren't actually (at least in the screenshots) MFs annotated to the gene products. I think that adding in the MFs may help clarify where the different activities and processes occur.
The fact that you are able to assert "GP enables 'glutamate reuptake'" (a BP) suggests that we need domain and range restrictions that limit enables/enabled_by to a continuant (meaning a gene product or complex) as subject and an MF as object.
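A minimal sketch of the proposed domain/range restriction, written as a validator an editor could run before accepting an edge. The ASPECT table and CONTINUANT_TYPES set are hypothetical stand-ins; a real check would look up the term's GO aspect and the entity's type in the ontology.

```python
# Hypothetical aspect table; a real check would query the GO hierarchy.
ASPECT = {
    "L-glutamate transmembrane transporter activity": "molecular_function",
    "glutamate reuptake": "biological_process",
    "regulation of AMPA receptor activity": "biological_process",
}
# Hypothetical entity types allowed as the subject of enables.
CONTINUANT_TYPES = {"gene_product", "complex"}

def check_enables(subject_type, target):
    """Return an error message if an enables edge violates the domain/range, else None."""
    if subject_type not in CONTINUANT_TYPES:
        return f"enables: subject must be a gene product or complex, got {subject_type}"
    if ASPECT.get(target) != "molecular_function":
        return f"enables: target must be an MF, got {target!r} ({ASPECT.get(target)})"
    return None
```

With such a check, "GP enables glutamate reuptake" would be rejected at edit time instead of silently producing an MF-like annotation.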

@BarbaraCzub
Copy link
Author

Hi @vanaukenk
Thank you for pointing this out. I will add the transmembrane transporter activity (which I have on the left-hand side of the current version of the model, screenshot below), and then I'll see how this affects the output file.

And yes, because I was able to annotate "GP enables 'glutamate reuptake'", I assumed this was an MF. I should have double-checked; thanks again for bringing it to my attention. I think it would be helpful if a rule were introduced to restrict this.

[Screenshot 2017-09-08 at 16:38:20]

@cmungall
Copy link
Member

cmungall commented Sep 8, 2017

I made a PR geneontology/minerva#138

On the call we also decided that, once this is implemented, we would grep the generated GPADs for this relation, make a list of the models that use it, and send it to the appropriate people for checking.
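The "grep the GPADs and make a model list" step could be sketched as follows. It assumes the GPAD 1.1 layout (12 tab-separated columns, annotation extension in column 11, annotation properties in column 12) and that the source model is recorded as a `noctua-model-id=...` property; both are assumptions for this sketch. The sample rows are hypothetical, with labels in place of IDs.

```python
def models_using(gpad_lines, relation="regulates_o_occurs_in"):
    """Collect model IDs of GPAD rows whose extension column uses `relation`."""
    models = set()
    for line in gpad_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12 or relation + "(" not in cols[10]:
            continue
        for prop in cols[11].split("|"):  # annotation properties are key=value pairs
            key, _, value = prop.partition("=")
            if key == "noctua-model-id":
                models.add(value)
    return sorted(models)

# Hypothetical sample rows (labels in place of IDs).
sample = [
    "!gpa-version: 1.1",
    "\t".join(["EXAMPLE", "EXAMPLE:0001", "enables", "GO:0005313", "PMID:22302796",
               "ECO:0000314", "", "", "20170911", "EXAMPLE",
               "occurs_in(Purkinje cell)", "noctua-model-id=gomodel:example"]),
    "\t".join(["EXAMPLE", "EXAMPLE:0001", "involved_in", "GO:2000311", "PMID:22302796",
               "ECO:0000314", "", "", "20170911", "EXAMPLE",
               "occurs_in(Purkinje cell),regulates_o_occurs_in(Bergmann glial cell)",
               "noctua-model-id=gomodel:59b38c1100000000"]),
]
print(models_using(sample))
```

Matching on `relation + "("` avoids counting a relation name that only appears inside another value; note that searching for `occurs_in` would still match inside `regulates_o_occurs_in(`.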

@kltm
Member

kltm commented Sep 9, 2017

If you ping me once accepted, I'll rebuild and push to production.

@cmungall
Copy link
Member

cmungall commented Sep 9, 2017

OK, the PR is merged.

@kltm
Copy link
Member

kltm commented Sep 9, 2017

Rebuilt and restarted.

@cmungall
Member

OK, I made a model just for testing this:

http://noctua.berkeleybop.org/editor/graph/gomodel:59b38c1100000000?barista_token=0fftg0rdaeef0rcyvbql#

We still don't see the regulates_o_occurs_in extension - but I know why.

This relation is in the gorel extension to RO. Historically, we have never used gorel with Noctua: it contains relations that are only used in the context of extensions, so we've never needed it. But of course, it is needed if we are to infer complete extensions, as in this case.

The fix should be simple: add gorel.owl into the import chain. However, I want to proceed very carefully, as we need to check that this doesn't introduce any problems such as stray relations. @dosumis has been the thankless caretaker of gorel; we'll need to figure out a plan for taking care of it in the future if we want to keep using these relations. It may make more sense to move the ones we think a user may want to see into RO.

@BarbaraCzub
Copy link
Author

Hi @vanaukenk and @ukemi

I have now updated the model as discussed during the editors' call last Friday:
[Screenshot 2017-09-11 at 11:08:59]

The current output of this fragment of the model is:
[Screenshot 2017-09-11 at 11:01:29]

So all appears to be correct with the exception that we are still lacking the regulates_o_occurs_in relation, and have the redundant 'regulates' relation in the 'regulation of AMPA receptor activity' annotation, as discussed in this ticket above and on the call.

I'll check the preview and/or GPAD output again after the gorel.owl has been added, as @kltm mentioned above.

Thanks!
Barbara

@ukemi

ukemi commented Sep 11, 2017

This model looks much better. I think there is one redundant edge between the function and the cell type. If the function is part of the process, and the process occurs in a cell type, then the function must occur there too. There is still the redundant regulation term/regulation extension, but I'm not sure that is a huge deal.
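The redundancy argument can be checked mechanically: an asserted edge is redundant if the remaining asserted edges already entail it. A minimal sketch, assuming the updated model reduces to the three edges below (labels in place of IDs) and modeling only the part_of o occurs_in -> occurs_in rule; a real reasoner would also use subclass axioms and the rest of the ontology.

```python
def saturate(triples):
    """Close triples under part_of o occurs_in -> occurs_in (the only rule modeled)."""
    closed = set(triples)
    while True:
        new = {(x, "occurs_in", z)
               for (x, p, y) in closed if p == "part_of"
               for (y2, q, z) in closed if q == "occurs_in" and y2 == y}
        if new <= closed:
            return closed
        closed |= new

def redundant_edges(asserted):
    """Return asserted edges that the other asserted edges already entail."""
    asserted = set(asserted)
    return sorted(e for e in asserted if e in saturate(asserted - {e}))

MF = "transmembrane transporter activity"
BP = "glutamate reuptake"
model = {
    (MF, "part_of", BP),
    (BP, "occurs_in", "Purkinje cell"),
    (MF, "occurs_in", "Purkinje cell"),  # the edge suspected of being redundant
}
redundant = redundant_edges(model)
```

Removing the flagged edge leaves the inferred location of the MF unchanged, which matches what the preview showed once the edge was deleted.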

Here is what I expect will be the controversial bit. I have been thinking a lot about the chain relations over the weekend and am now wondering if we really want them in the GPAD/GAF files. Originally we put them there because we wanted to be able to express information about the regulatory target. But now with GO-CAM, is this necessary? Perhaps chaining relations in extensions is 'overloading' the GAF/GPAD format. It might be worth discussing this in Cambridge.

@BarbaraCzub
Copy link
Author

Re: redundant edge. Thanks @ukemi you are right. I have removed this edge and the preview still displays the same annotations, so the location of the MF has indeed been inferred correctly.

Re: chain relations. If you have concerns about 'overloading' the GPAD, I agree that it is probably best to discuss this more widely in Cambridge before any decisions are made and implemented. @cmungall also mentioned above that these chain relationships were required before models could be made, but are not really so necessary now, when information can be captured in Noctua.

@dosumis
Copy link
Contributor

dosumis commented Sep 11, 2017 via email

@ValWood
Copy link

ValWood commented Sep 11, 2017

Maybe somebody could use this model to demonstrate the answers to a few of the questions which curators repeatedly ask, but which are still unclear.

  1. What comes out in the GAF
  2. How this would differ if you made this annotation in "traditional GO"
  3. Which additional relationships are required to make the model but not exported.

It's important to think about what will be displayed on gene pages. I know you want to show models, but a GO summary is still required when a gene product participates in many, many models. Also, MOD users largely consume information on genes via gene pages, and this will continue even when models are available. Models are a nice addition.

Users are interested in GO data on a gene-by-gene basis. This is our user spike (labelled "cdc2 paper 10/7") when I announced the availability of a new cdc substrates dataset using GO. Note this is not even our live site; it's a "preview" site.

[Image: PomBase usage]

Obviously we need to make connections between genes, but the main consumers, and often forgotten users of GO are individual bench scientists who consume this info daily via Model Organism Database gene pages.

@ValWood
Copy link

ValWood commented Sep 11, 2017

Shouldn't Slc1a6 be explicitly annotated as a glutamate transporter?
(The model doesn't say that, as far as I can tell.)

There is an annotation to "glutamate reuptake" but you don't know that this is connected to the MF here do you?

@BarbaraCzub
Copy link
Author

Hi @ValWood yes, I could have used a more specific GO term, as you suggest. I have updated this now to GO:0005313 L-glutamate transmembrane transporter activity, thanks for pointing this out.

I am not sure what you mean when you are asking whether "glutamate reuptake" is connected to the MF. The MF here is a part of "glutamate reuptake", so, yes, they are connected.

@ValWood
Copy link

ValWood commented Sep 11, 2017

I am not sure what you mean

Ignore that; I was looking at the earlier model.
