Sunday, 25 June 2017

Analysing my DNA: Crossovers Part 2

This is a continuation of my part 1 post at http://notjusttheparrys.blogspot.co.uk/2016/12/analysing-my-dna-crossovers-part-1.html. My initial intention for this post was simply to look at the shared matches between the siblings, to see how those results correlate with the phasing of chromosome 21 represented in part 1. That sounds easy enough, but one of the reasons it has taken me so long to post is that things very rapidly became complicated!

So, in this post, I will look at the shared matching between the siblings and their three closest relatives - a niece, a first cousin and a third cousin once removed - and how adding these relatives caused me to alter my interpretation of how the niece matched.

This was my starting point from part 1, the four siblings A, B, C, and D, with their chromosomes represented by the four colours:

The parents' chromosomes can then be represented as follows:

And which parts of the parents' chromosomes each of the siblings received like this:

One of the closest matches to the siblings is their niece, daughter of a deceased brother. Since the brother was never tested, I don't know which crossover points he inherited from his parents. The niece will only have one chromosome (of each chromosome pair) from her father - but the differences in matching between the niece and the siblings could be the result of crossovers within each of her father's chromosomes, or between the father's two chromosomes.

So these are the comparisons between the niece and each of the siblings from Gedmatch, along with a potential crossover point identified at 43:

So immediately there is an issue - the niece matches sibling B up until 43, and, correctly, does not match any of the other siblings until that point. However, beyond 43, the niece appears to match none of the siblings (based on the grey "match" bar). But we know that, since siblings A and B do not match each other at all on this chromosome, the two of them, between them, cover all four of the siblings' parents' chromosome 21s. So, if the niece doesn't match sibling B, then she has to match sibling A at least. And, looking at the Gedmatch image, it seems quite clear that this is a threshold issue - the niece does actually match the three siblings A, C and D beyond 43. The match just isn't being picked up as a match by Gedmatch at the default threshold. Reducing the threshold indicates the niece matches all three siblings A, C and D on a 6.9 cM segment, containing between 1041 and 1045 SNPs.
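The "A and B cover all four chromosomes" argument can be checked with a small Python sketch. The copy labels (P1a, P1b, P2a, P2b) and the function names are made up purely for illustration; the sketch only looks at a single position on the chromosome and is not how Gedmatch works internally.

```python
from itertools import product

# Hypothetical labels for the two chromosome 21 copies carried by each parent.
PARENT_1 = ("P1a", "P1b")
PARENT_2 = ("P2a", "P2b")
ALL_FOUR = set(PARENT_1) | set(PARENT_2)

# At a single position, a child carries one copy from each parent.
POSSIBLE_CHILDREN = list(product(PARENT_1, PARENT_2))


def non_matching_pairs_cover_all():
    """Check every pair of siblings that share nothing at a position."""
    for sib_x, sib_y in product(POSSIBLE_CHILDREN, repeat=2):
        if set(sib_x) & set(sib_y):
            continue  # they share a copy here, so not the A/B situation
        if set(sib_x) | set(sib_y) != ALL_FOUR:
            return False
    return True


def niece_matches_a_or_b(niece_copy, sib_a, sib_b):
    """True if the niece's single paternal copy is carried by A or by B."""
    return niece_copy in sib_a or niece_copy in sib_b
```

Whichever of the four copies the niece received from her father at a given position, one of two fully non-matching siblings has to carry it, which is why a gap in the niece's matching beyond 43 pointed to a threshold problem rather than a real absence of shared DNA.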

The initial interpretation of the DNA received by the niece therefore became:

Next, I looked at how the siblings and the niece matched the siblings' paternal first cousin. The Gedmatch image below was produced using the default threshold, but again, reducing the threshold slightly indicated a potential matching segment just below the 7 cM threshold:

Chr    Start Location    End Location    Centimorgans (cM)
21     14,677,076        22,936,413      18.2
21     22,950,552        33,423,011      15.7
21     34,132,054        37,056,381      6.7
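To show why the last segment only appears once the threshold is lowered, here is a minimal Python sketch that splits the segments above by a centimorgan cut-off. The 7 cM default comes from the text; the "at or above the threshold" rule and all the names are my assumptions, and the sketch ignores the separate SNP-count threshold.

```python
# Segments copied from the table: (chromosome, start, end, cM).
SEGMENTS = [
    (21, 14_677_076, 22_936_413, 18.2),
    (21, 22_950_552, 33_423_011, 15.7),
    (21, 34_132_054, 37_056_381, 6.7),
]

DEFAULT_THRESHOLD_CM = 7.0  # default threshold mentioned in the post


def split_by_threshold(segments, threshold_cm=DEFAULT_THRESHOLD_CM):
    """Return (reported, hidden) segments for a given cM threshold.

    Assumption: a segment is reported when its cM value is at or above
    the threshold, and hidden otherwise.
    """
    reported = [seg for seg in segments if seg[3] >= threshold_cm]
    hidden = [seg for seg in segments if seg[3] < threshold_cm]
    return reported, hidden


if __name__ == "__main__":
    for threshold in (7.0, 6.0):
        reported, hidden = split_by_threshold(SEGMENTS, threshold)
        print(threshold, len(reported), "reported,", len(hidden), "hidden")
```

At the default setting the 6.7 cM segment falls into the hidden list; dropping the threshold to a hypothetical 6 cM moves it into the reported list alongside the two larger segments.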

The paternal first cousin can only match the siblings through their father's chromosomes.  But, as their father will not have received exactly the same DNA as the first cousin's parent did, there will be some areas where the first cousin does not match any of the siblings.

By comparison to the phasing of the siblings and niece, the first cousin's matching segments were therefore mapped as follows:

(This process also indicated that the "Parent 2" phasing represents the siblings' father's chromosomes.)

So far, so good.

When I downloaded the matching segments for the siblings, in order to start investigating the shared matches, I realised a known relative shared DNA with sibling B on chromosome 21. The relative is a third cousin once removed (3c1r) and shares a segment from about 17 to 28. The shared ancestry is on the siblings' paternal side of the family, as it is for the first cousin (1c):

But now there's a problem. This 3c1r does not match any of the other siblings, or the niece, on chromosome 21. But, at the point where the 3c1r matches B, we have already "used" both of the paternal chromosomes, one for the matching between the 1c and siblings A, C and D, the other for the matching between the niece and sibling B. It's okay that the 1c doesn't match the 3c1r - that actually indicates that the chromosome A, C and D share with the 1c must be the one the siblings' father received from his mother, the siblings' grandmother, as she is also a common ancestor with the 1c.

But, clearly the chromosome the niece shares with sibling B cannot be the other paternal chromosome.  As far as I am aware, there's no other shared ancestry with the 3c1r.  So, let's go back to the matching between the niece and the siblings - where did I go wrong?

Siblings A, C, and D all show a very small area of potentially matching SNPs between 24 and 26 - but it is only 1.5 cM and 365 SNPs. I don't believe that has any significance, especially as there's no change in matching with sibling B. (The niece only has one relevant chromosome in this comparison - and the kit being used is a "paternal" one that's been phased using her mother's data, so should be fairly accurate.)

So what about the potentially matching segment with sibling C, between 37 and 39? This is a 4.2 cM segment, containing 743 SNPs - so it is a small segment that, under normal circumstances, when matching to unknown and more distant relatives, should be ignored.

From the sibling phasing, B and C are matching from 37, after C had a crossover, and their matching segment is a "Parent 1" segment.  So, is it possible that the niece's matching should actually be as follows:

The niece is matching B on a Parent 1 chromosome (now known to be maternal). Sibling C then starts to match both B and the niece at 37, but the niece stops matching C at 39, as the niece has a crossover between the two chromosomes her father had. If she switches from her father's maternal chromosome to his paternal chromosome, and those are also the two chromosomes sibling B has, that would account for why the niece continues to match B until 43. At 43 there is then a crossover between the two chromosomes of Parent 2 - which would indicate a crossover in the niece's father, passed on to the niece within the segment from his paternal chromosome. This interpretation would account for the niece's match to the 1c, between 40 and 43, and explain why she does not match the paternal 3c1r earlier on the chromosome, between 17 and 28.
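The conflict with the 3c1r, and the way the revised interpretation removes it, can be sketched as a simple consistency check in Python. The rule is: if two relatives are placed on the same chromosome of the same sibling over overlapping positions, they ought to match each other too. The class and function names are hypothetical, the positions use the same rough units as the text, and the start position of 14 for the niece's match to B is an assumed placeholder, as the post only says she matches B "up until 43".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    relative: str    # who matches the sibling
    sibling: str     # which sibling is matched
    start: int       # approximate start position
    end: int         # approximate end position
    haplotype: str   # which of the sibling's chromosomes is assumed


def overlap(a, b):
    """Return the shared (start, end) range of two matches, or None."""
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return (lo, hi) if lo < hi else None


def find_conflicts(matches, non_matching_pairs):
    """List pairs placed on the same chromosome who do not match each other."""
    conflicts = []
    for i, m in enumerate(matches):
        for n in matches[i + 1:]:
            if m.sibling != n.sibling or m.haplotype != n.haplotype:
                continue
            region = overlap(m, n)
            pair = frozenset((m.relative, n.relative))
            if region and pair in non_matching_pairs:
                conflicts.append((m.relative, n.relative, region))
    return conflicts


# The 3c1r does not match the niece anywhere on chromosome 21.
NON_MATCHING = {frozenset(("niece", "3c1r"))}

# Initial interpretation: the niece matches B on a paternal chromosome.
initial = [
    Match("niece", "B", 14, 43, "paternal"),  # start of 14 is assumed
    Match("3c1r", "B", 17, 28, "paternal"),
]

# Revised interpretation: maternal up to 39, then paternal from 39 to 43.
revised = [
    Match("niece", "B", 14, 39, "maternal"),  # start of 14 is assumed
    Match("niece", "B", 39, 43, "paternal"),
    Match("3c1r", "B", 17, 28, "paternal"),
]
```

Calling `find_conflicts(initial, NON_MATCHING)` flags the niece and the 3c1r over 17 to 28, while `find_conflicts(revised, NON_MATCHING)` comes back empty, which is the same conclusion reached in the reasoning above.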

If that is the situation, then the diagram of the siblings' parents' chromosomes can now be extended to also show the DNA received by their grandchild, the siblings' niece, as well as the potential source for the paternal chromosomes:

Please let me know if you can spot any mistakes in my reasoning.