Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sam record pair identification depends on required SAM fields #994

Conversation

DenisVerkhoturov
Copy link

@DenisVerkhoturov DenisVerkhoturov commented Sep 21, 2017

Description

Associated pull-request, please, follow the link for details.

Checklist

  • Code compiles correctly
  • New tests covering changes and new functionality
  • All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

Copy link
Contributor

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks 👍 to me.

import org.testng.Assert;
import org.testng.annotations.Test;

public class SAMRecordTest {
Copy link
Contributor

@pshapiro4broad pshapiro4broad Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it looks like SAMRecordUnitTest is the class where tests for SAMRecord are stored. If so, you should either move those tests here or these tests there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, done here.

@codecov-io
Copy link

codecov-io commented Sep 21, 2017

Codecov Report

Merging #994 into master will increase coverage by 0.024%.
The diff coverage is 37.5%.

@@               Coverage Diff               @@
##              master      #994       +/-   ##
===============================================
+ Coverage     66.039%   66.063%   +0.024%     
- Complexity      7477      7485        +8     
===============================================
  Files            529       529               
  Lines          31963     31971        +8     
  Branches        5457      5464        +7     
===============================================
+ Hits           21108     21121       +13     
+ Misses          8713      8706        -7     
- Partials        2142      2144        +2
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/SAMRecord.java 66.353% <37.5%> (-0.274%) 303 <4> (+4)
src/main/java/htsjdk/samtools/SAMFileHeader.java 65.318% <0%> (+1.734%) 43% <0%> (+1%) ⬆️
...a/htsjdk/samtools/cram/encoding/rans/Decoding.java 85.714% <0%> (+3.571%) 12% <0%> (+1%) ⬆️
...samtools/util/AsyncBlockCompressedInputStream.java 76% <0%> (+4%) 13% <0%> (+1%) ⬆️
...sjdk/samtools/util/Md5CalculatingOutputStream.java 76.923% <0%> (+7.692%) 9% <0%> (+1%) ⬆️

@nh13
Copy link
Member

nh13 commented Sep 22, 2017

I am a bit time constrained at the moment. I strongly object to this PR on my first read. Can you wait until early next week for me or a maintainer to take a look?

@@ -334,6 +334,19 @@ private boolean hasReferenceName() {
}

/**
* @return {@code true} if records belong to the same pairwise alignment
*/
public boolean isPair(final SAMRecord record) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about this function, a mate is another record with the same query name...why are you checking everything else, and not the QNAME? I'm sure there's a reason, but it should be explained in the javadoc as it is not obvious at all!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the remark, I think it does matter in SamRecord and I fixed it, but if we implement it in HitsForInsert it doesn't matter because it deals records with the same QNAME only. Here is the logic that produces HitsForInsert.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to move this to SamPairUtil, because there are already a lot of methods in SAMRecord that may be better to be used within an utility class. Eventually, reads would be accessed by an interface (e.g., #985) and thus this methods may live in an utility class that is implementation-independent.

@yfarjoun
Copy link
Contributor

could you explain in the description what problem this is solving? what is the purpose of this new function?

@yfarjoun
Copy link
Contributor

yes, but a public function with the name isPair is going to be found by users who will then make the wrong assumption. either keep is inside MergeSamFiles or make it clear from the name what is the purpose of this function.

Copy link
Member

@tfenne tfenne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't this PR is likely to be merged ever. It seems to me that this method is intended for a very specific purpose and is unsuitable for inclusion in SAMRecord. It appears to be asking "is this other record related to, and meeting some arbitrary definition of properly paired with, this record".

@DenisVerkhoturov
Copy link
Author

@yfarjoun If we leave it in HitsForInsert it is a private function. There is no need to make it public it will be used only in HitsForInsert, and it can be a method-local function because it is used only in HitsForInsert#coordinateMyMate method scope.

@DenisVerkhoturov DenisVerkhoturov force-pushed the epam-ls_SamRecordPairIdentification branch from 20ad2c9 to 3c01a88 Compare September 22, 2017 13:20
Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't really understand this method and it's purpose. Two reads with the same name are by definition a pair, unless they're supplementary or secondary reads. I assume this is intended to be use for some sort of validation? It should probably take into account supplementary / secondary reads if it's going to be part of the public api. It also doesn't account for many segment reads, which are admittedly poorly supported in htsjdk, but we should try to make an effort to deal with in new code.

I agree with the other commenters that this doesn't belong in SAMRecord, it could maybe go into one of the utils classes if it's named more clearly and the javadoc explains what's for better.

I hate to reject your effort, especially since you did the work of including tests (thank you for that!), but it seems like it's probably something for a very specific use case though and maybe it's best to just leave it as private method wherever it's used.

@DenisVerkhoturov
Copy link
Author

Thank everybody for comments!=)
So, to recap, should I just close this pull-request and leave arePair (SAMRecord, SAMRecord) -> boolean function in this request?

@lbergelson
Copy link
Member

@DenisVerkhoturov I think that's the right way to proceed. Thank you for your PR and interest in the project! We welcome future contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants