Saturday, 3 December 2016

Analysing my DNA: Crossovers Part 1

Recently I have been working on identifying the "crossover points" in the DNA of a group of four siblings. They are related to me, so the results of their DNA tests will not only help me in my ancestral searches, but also in discovering more about my own DNA and mapping it to specific ancestors. Crossover points are important for anyone who is trying to identify where segments of DNA came from, ie through which ancestors, as they indicate a change in the DNA, from that of one grandparent to that of the other grandparent in that couple.

There are detailed explanations of the processes involved available elsewhere online but these are the basics, for anyone who is new to this. We all have 23 pairs of chromosomes, one of each pair coming from our father and one from our mother. Just as we received them from our parents, so our parents received one of each pair of their chromosomes from each of their parents, any future child's grandparents:

Things aren't usually as simple as that shown above. Meiosis, the process of cell division that produces the egg or sperm and ensures the correct amount of DNA is passed on to the offspring, more commonly involves the two chromosomes in a pair splitting and then recombining in a different way, so that the resulting chromosome that's passed on is a mixture of the parent's two chromosomes:

 As the process of recombination is random, children of the same parents will each receive a different combination of the DNA that came to their parents from their grandparents:

Unfortunately, the DNA tests do not phase our results, so we cannot even identify our two separate chromosomes, unless we have other relatives tested. All we have to work with is the raw data and information about where we match other people.

In the case of a parent and child who have both tested, comparison with their matches may sometimes indicate a possible crossover:

Not only does the match share much less DNA with me than with my mother, they also match a group of other people who match only my mother, not me, over the latter part of the segment, from about 122,000,000 to 134,000,000. So it appears that I may have a crossover in my maternal chromosome and did not receive the rest of the segment from the ancestor shared with this match. If I knew which side of my mother's family this match is connected to then, if my mother and I have any shared matches starting after 125,000,000, I would know to concentrate my search for the shared ancestry on the other side of my mother's family.
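As a rough illustration of that reasoning, the comparison can be written as a small Python sketch. The function name, the one-million-base-pair tolerance and the segment start position are all assumptions made up for the example; only the approximate end positions (122,000,000 and 134,000,000) come from the comparison described above. A position it returns is a hint of a possible crossover, nothing more.

```python
def possible_crossover(parent_seg, child_seg, tolerance=1_000_000):
    """Return positions where the child's segment stops short of the parent's.

    Each segment is a (start, end) pair of base-pair positions for the same
    match on the same chromosome. The tolerance is an arbitrary allowance
    for fuzzy segment ends, chosen only for this example.
    """
    p_start, p_end = parent_seg
    c_start, c_end = child_seg
    hints = []
    if c_start - p_start > tolerance:
        hints.append(c_start)  # child's match begins later than the parent's
    if p_end - c_end > tolerance:
        hints.append(c_end)    # child's match ends earlier than the parent's
    return hints


# The start position (110,000,000) is hypothetical; the end positions are
# the approximate figures quoted in the text.
mother_vs_match = (110_000_000, 134_000_000)
me_vs_match = (110_000_000, 122_000_000)
print(possible_crossover(mother_vs_match, me_vs_match))  # [122000000]
```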

But comparing a parent and child against other matches is only likely to reveal a few of the potential crossovers. A better method, available to anyone with a group of three or more siblings tested, is to use the sibling comparisons to identify crossovers. As already indicated, a group of siblings will have received different segments of the grandparents' DNA, but the results are not phased by any of the testing companies' chromosome browsers:

I have represented in orange the (approximate!) overall matching segments of the children - but, using Gedmatch, it is possible to also identify where two people fully match, ie match on both of their chromosomes, rather than just "half match", ie match on one chromosome. It is this, more complete, pattern of matching - changing between the states of having full, half, or no, matching DNA - which is used in order to identify the points where the DNA "crossed over" from one grandparent's DNA to the other.
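To make the three states concrete, here is a minimal Python sketch of how a single position might be classified when comparing two people's unphased raw data. It is only an illustration of the idea, not a description of how Gedmatch does it, and it assumes each genotype is a two-letter string with no missing calls.

```python
def snp_match_state(genotype_a, genotype_b):
    """Classify one position as 'full', 'half' or 'none'.

    Genotypes are unphased, so the order of the two letters is ignored:
    'AG' and 'GA' are treated as the same result.
    """
    a = sorted(genotype_a)
    b = sorted(genotype_b)
    if a == b:
        return "full"   # both letters agree: match on both chromosomes
    if set(a) & set(b):
        return "half"   # at least one letter in common: match on one chromosome
    return "none"       # no letter in common: no match at this position


print(snp_match_state("AG", "GA"))  # full
print(snp_match_state("AA", "AG"))  # half
print(snp_match_state("AA", "GG"))  # none
```

A single position on its own says very little, as any two people will agree at many positions by chance. It is the pattern over long runs of positions that matters.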

There are probably several methods for doing this but I think most credit goes to Kathy Johnston for her "Visual Phasing" method. There are some very good blog posts about using Kathy's method, which I shall include links to below - it is worth reading several, as we all have different ways of describing what we do. I have now developed a slightly different method of working, which suits me better.

But I am going to start with the smallest chromosome, chromosome 21, and follow Kathy's instructions, to illustrate the basic method.

These are the comparisons between the four siblings at Gedmatch:

This is the key to the Gedmatch colours:

 And these are the figures for the comparisons:

I have kept the figures separate from the individual chromosome images, as I found the crossover lines end up obscuring the figures on the chromosomes with more crossovers.

Despite having looked at comparisons between the siblings in various other formats (eg the FTDNA downloads), it was only when I did these comparisons that I realised Siblings A and B do not match each other on this chromosome.

Which goes to show how we often notice just what is present - not what is missing! 🙂

But, as you can see, even where two siblings do not match each other (ie they have a grey bar along the lower section, not a blue bar) there are still some base pairs showing a half, or even a full, match. There just aren't enough of these matching base pairs in a consecutive sequence for the segment to be regarded as genealogically relevant.
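The idea of needing "enough in a row" can be sketched in Python as follows. The list of states and the threshold of four are invented for the example; real comparisons use far larger thresholds, and this is not a description of the rules Gedmatch actually applies.

```python
from itertools import groupby


def matching_runs(states, min_snps):
    """Return (first_index, last_index) for each long-enough run of matches.

    `states` is a list of 'full', 'half' or 'none' values, one per position,
    in chromosome order. Both 'full' and 'half' count as matching.
    """
    runs = []
    index = 0
    for is_match, group in groupby(states, key=lambda s: s != "none"):
        length = len(list(group))
        if is_match and length >= min_snps:
            runs.append((index, index + length - 1))
        index += length
    return runs


# Hypothetical per-position states: three matching stretches, only one of
# which is long enough to be reported with a threshold of 4.
states = ["half", "full", "none", "half", "half", "half", "full", "half", "none", "full"]
print(matching_runs(states, min_snps=4))  # [(3, 7)]
```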

The next step is for the crossover points to be identified. These are the points where there is a change between fully matching and half matching, or half matching and non-matching, ie where the bottom bar changes between blue and grey, or where the top bars change between an area that is consistently green and one which is predominantly yellow, with intermittent green. The former changes are also demonstrated by the figures. Unfortunately, the changes between fully matching and half matching are not specifically identified in any figures at Gedmatch, although Sue Griffith has explained how to obtain a very good estimate of them*. They can also be identified using one of David Pike's tools*.
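In code terms, finding the candidate crossover points for one pair of siblings amounts to listing the places where the matching state changes. The Python sketch below assumes the comparison has already been summarised as an ordered list of (start, end, state) stretches. The boundary labels 37 and 40 are those used for chromosome 21 later in this post, while the outer limits 15 and 48 are hypothetical.

```python
def state_changes(segments):
    """List the points where the matching state changes.

    `segments` is an ordered list of (start, end, state) tuples covering
    the chromosome; state is 'full', 'half' or 'none'.
    Returns (position, state_before, state_after) tuples.
    """
    changes = []
    for (_, _, prev_state), (start, _, state) in zip(segments, segments[1:]):
        if state != prev_state:
            changes.append((start, prev_state, state))
    return changes


# Siblings C and D: fully matching, then half matching, then fully matching.
c_vs_d = [(15, 37, "full"), (37, 40, "half"), (40, 48, "full")]
print(state_changes(c_vs_d))  # [(37, 'full', 'half'), (40, 'half', 'full')]
```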

Once the crossover points have been identified, they are allocated to particular siblings - a crossover "belongs" to the sibling who shows that change in all of their comparisons. (This isn't always obvious, especially if only using three siblings - sometimes, what looks like a single crossover for one sibling can actually be a double crossover for the two others. Having results available from more than three siblings is an advantage for me.)
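The allocation rule can be expressed as a short Python sketch: a crossover is given to a sibling only if the comparisons that change at that point are exactly the comparisons involving that sibling. The function name is made up for the example, and returning None for anything ambiguous is a simplification, since those are the cases that need a closer look by eye.

```python
def crossover_owner(siblings, changed_pairs):
    """Return the sibling who 'owns' a crossover, or None if it is unclear.

    `changed_pairs` lists the pairs of siblings whose comparison changes
    state at the crossover point being examined.
    """
    changed = {frozenset(pair) for pair in changed_pairs}
    for sib in siblings:
        expected = {frozenset((sib, other)) for other in siblings if other != sib}
        if changed == expected:
            return sib
    return None


siblings = ["A", "B", "C", "D"]
# A change seen in A-C, B-C and C-D, but in no other comparison, belongs to C.
print(crossover_owner(siblings, [("A", "C"), ("B", "C"), ("C", "D")]))  # C
# A change seen in only one comparison cannot be allocated by this rule.
print(crossover_owner(siblings, [("A", "C")]))  # None
```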

In the comparison between B and D, the matching segment does seem to start before the crossover point indicated in the comparisons between D and A and between D and C. I suspect this segment is being artificially extended through some base pairs that just happen to match in both B and D. Issues like this are things to note for future investigation, as they may be a hint that something is wrong with the identification.

Next, working with just the identified crossover lines in an image, but referring to the comparison diagram and the figures, the phased segments of the grandparents' DNA are constructed, usually starting with a segment where two siblings are fully identical. In order to do this, four colours are chosen to represent the DNA received by the children from the grandparents. Two colours are used for the top grandparent couple and two for the bottom grandparent couple. [Note: if you follow a colour coded genealogy filing system, I would suggest choosing different colours for the chromosome mapping, at least until you are absolutely positive you have identified the correct grandparents' segments. At that point you could change the colours to match your genealogy system, which would then also be a visual clue that the chromosome is "confirmed". But, if you use those colours prior to such confirmation, you might find yourself becoming confused, as we do not yet know which grandparent couple is represented by which phased chromosome.]

At the start of this chromosome 21, B does not match any of their siblings, whereas the other three are all fully identical to each other, so the colours can be allocated as follows:

Since neither A nor B has any crossovers, their coloured bars can be extended for the full length of the chromosome. D's can also be extended as far as D's crossover line at "40":

 Between "37" and "40", C becomes half identical to all three of the other siblings. We don't know whether the crossover is on the maternal or the paternal chromosomes (and we haven't identified the colours as being for specific grandparents anyway), so we just have to pick one of the colours to change. I have chosen the top chromosome, purple changing to blue.  As this is the only crossover C has, the two bars can then be extended to the end:

At "40", D becomes half matching to A and B, but fully matching to C. The same chromosome that we changed for C therefore needs to change for D, in order to produce the correct pattern of matching, and the other colour can be extended, unchanged, to the end:

At this stage, we don't know which colours represent which grandparents - that can only be identified by comparison to other known relatives. But we can still look at the shared matches between the siblings, to see how those results correlate with the phasing represented here. For example, I would expect there to be no shared matches between A and B at any point on this chromosome, whereas A, C and D should have exactly the same matches prior to the point "37". B, C and D will share some matches after point "40", but not all of them. The ones C and D don't share with B after "40", should be people that match A, as well.
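One way to double check the colouring is to model it and confirm that it reproduces the matching pattern seen in the comparisons. The Python sketch below does this for chromosome 21 as phased above. Purple and blue are the top colours mentioned in the text; "green" and "orange" are placeholder names assumed here for the bottom pair, and the positions are simply the labels "37" and "40", with 0 standing for the start of the chromosome.

```python
# Each bar is a list of (start_label, colour) pieces, in order.
SIBLINGS = {
    "A": {"top": [(0, "purple")], "bottom": [(0, "green")]},
    "B": {"top": [(0, "blue")], "bottom": [(0, "orange")]},
    "C": {"top": [(0, "purple"), (37, "blue")], "bottom": [(0, "green")]},
    "D": {"top": [(0, "purple"), (40, "blue")], "bottom": [(0, "green")]},
}


def colour_at(pieces, position):
    """Return the colour of a bar at the given position label."""
    colour = pieces[0][1]
    for start, piece_colour in pieces:
        if position >= start:
            colour = piece_colour
    return colour


def match_state(sib_x, sib_y, position):
    """Return 'none', 'half' or 'full' for two siblings at a position."""
    shared = sum(
        colour_at(SIBLINGS[sib_x][row], position)
        == colour_at(SIBLINGS[sib_y][row], position)
        for row in ("top", "bottom")
    )
    return ("none", "half", "full")[shared]


for position in (30, 38, 45):
    print(position, {pair: match_state(pair[0], pair[1], position)
                     for pair in ("AB", "AC", "AD", "BC", "BD", "CD")})
```

If the printed states disagree with what the comparisons show for any pair, either a crossover has been allocated to the wrong sibling or the wrong bar has been changed.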

So, in my next post, I will explore that. I'll also describe some of the issues I have come across in this process so far, as well as explain the way I have adapted Kathy's method to my own way of working.

But, if you thought this chromosome was easy to phase, then perhaps you'd like to consider the following set of comparisons:

[PS Having begun to look at the matches the siblings have with their niece and their 1st cousin, as well as the more distant matches, I have found an "anomaly". So, perhaps phasing chromosome 21 isn't so straightforward, after all!]

* Sources and references I have found helpful:

Kathy Johnston - step by step instructions for her method: http://forums.familytreedna.com/showthread.php?t=36812 (make sure you download both the slides and the instructions)
Jason Lee - a blog post detailing Kathy's method: http://dnagenealogy.tumblr.com/post/137722603308/the-use-of-crossover-lines-among-siblings-to
Blaine Bettinger's pdf combining his five posts about the phasing process - http://thegeneticgenealogist.com/wp-content/uploads/2016/11/Visual-Phasing-Bettinger.pdf

Two other bloggers with helpful posts about phasing, including issues such as the way what looks like a single crossover for one sibling can actually be a double crossover for two others:
Ann Raymont - https://dnasleuth.wordpress.com/2016/06/01/chromosome-mapping-with-siblings-part-2/ (and part 1)
Joel Hartley: http://www.jmhartley.com/HBlog/?p=2239

Sue Griffith's post on how to obtain the values for crossovers from FIR to HIR & vice versa: http://www.genealogyjunkie.net/blog/obtaining-fir-boundaries-on-gedmatch-using-the-little-tick-marks

David Pike has a number of free DNA tools, including the "Search for Shared DNA Segments in Two Raw Data Files" which reports single and double matching segments (ie half identical and fully identical): http://www.math.mun.ca/~dapike/FF23utils/pair-comp.php