Friday, 25 November 2016

DNA Update

It has been an "interesting" year on my DNA journey. Ever since I first took an autosomal DNA test with 23andMe in 2010, I have been looking for what are known as "triangulating groups" (TGs) in the data. These are groups of people who all match me over the same segment of DNA and who also all match each other over that same segment. The theory is that shared DNA indicates shared ancestry and, therefore, if a group of people all share the same segment of DNA, it must have come from the same ancestor (at some level - some of the people in the group may share a closer ancestor along the line back to the overall shared ancestor). The theory sounds "right" and logical, and it appears to fit the patterns I can see in the data:

I liked using 23andMe for this process. It is the only testing company where it is possible to compare the people you match (and are sharing with) to each other and therefore confirm for yourself whether or not they form a TG. This is not possible at the other companies I have tested with. At Family Tree DNA (FTDNA), it is only possible to see where someone matches you, and whether they are "in common with" (ie also share some DNA with) any of your other matches. You then need to ask them where they match the other people, in order to confirm whether they actually match those people over the same segment on which they match you. If it is a different segment, so the TG theory went, then you may all be related to each other through different ancestors, since many of us probably have multiple ancestors in common as we move further back in time. It was said that you could only be sure the DNA was from the same ancestor if you matched on the same segment.
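
The triangulation check described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any of the companies do it: the names (Ann, Bob, Cat), the segment positions and the 7cM minimum are all made up, and I am pretending that segment start and end points are given in cM along the chromosome, which is a simplification. It also only checks that each comparison overlaps the region of interest, not that all the overlaps line up exactly.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    chromosome: int
    start: float  # hypothetical position in cM
    end: float    # hypothetical position in cM


def overlap(a, b):
    """Length of the stretch two segments have in common (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0.0
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def shares_region(segments, region, min_overlap):
    """True if any of the segments covers enough of the region."""
    return any(overlap(s, region) >= min_overlap for s in segments)


def forms_tg(people, shared_with_me, shared_between, region, min_overlap=7.0):
    """Everyone must match me AND each other over the same region."""
    if not all(shares_region(shared_with_me.get(p, []), region, min_overlap)
               for p in people):
        return False
    return all(
        shares_region(shared_between.get(frozenset(pair), []), region, min_overlap)
        for pair in combinations(people, 2)
    )


# Made-up example data: all three match me on chromosome 4.
region = Segment(4, 40.0, 55.0)
shared_with_me = {
    "Ann": [Segment(4, 40.0, 55.0)],
    "Bob": [Segment(4, 41.0, 56.0)],
    "Cat": [Segment(4, 42.0, 54.0)],
}
shared_between = {
    frozenset({"Ann", "Bob"}): [Segment(4, 41.0, 55.0)],
    frozenset({"Bob", "Cat"}): [Segment(4, 42.0, 54.0)],
    # Ann and Cat are "in common with" each other, but on a different segment.
    frozenset({"Ann", "Cat"}): [Segment(9, 10.0, 22.0)],
}

print(forms_tg(["Ann", "Bob"], shared_with_me, shared_between, region))         # True
print(forms_tg(["Ann", "Bob", "Cat"], shared_with_me, shared_between, region))  # False
```

The second result is the FTDNA problem in miniature: Ann and Cat show as "in common with" each other, but without knowing where they match each other, there is no way to tell that the three do not form a TG.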

Part of the difficulty in identifying the TGs at FTDNA is that these DNA tests do not phase the data, ie they do not split it into the two sides we received from our parents. This is also why you cannot assume that people who match you over what looks to be the same segment, and who are "in common with" each other, actually do match each other in the same place and therefore form a TG. We all have 23 pairs of chromosomes, one of each pair from our father and one from our mother - but the tests just report the two base pairs (bits of DNA!) we have at particular points along the chromosome. So, whilst it might look as if two people match you over the same segment of DNA, one could be matching you on your maternal side and one could be matching you on your paternal side. In that case, the DNA each shares with you would be from different ancestors, one on each side of your family. If the two people also happened to share another ancestor between them, they would show as "in common with" each other - but you would not all be a TG.
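
To show why unphased data can hide which side a match is on, here is a toy sketch. It assumes a very simplified matching rule, that two people "match" over a stretch if they share at least one letter at every tested position; the real company algorithms are more involved, and the five-letter sequences and cousins below are invented purely for illustration.

```python
def unphased(copy_1, copy_2):
    """Combine two chromosome copies into unordered letter pairs per position,
    which is roughly what an unphased test result looks like."""
    return ["".join(sorted(pair)) for pair in zip(copy_1, copy_2)]


def half_identical(person_a, person_b):
    """True if the two people share at least one letter at every position."""
    return all(set(a) & set(b) for a, b in zip(person_a, person_b))


# Invented sequences for one short stretch of a chromosome.
my_maternal = "AAGCT"
my_paternal = "GCGTT"
me = unphased(my_maternal, my_paternal)  # ['AG', 'AC', 'GG', 'CT', 'TT']

# Each cousin carries one of my copies; their other copy is made up.
maternal_cousin = unphased(my_maternal, "AAGCC")
paternal_cousin = unphased(my_paternal, "GCGTC")

print(half_identical(me, maternal_cousin))               # True
print(half_identical(me, paternal_cousin))               # True
print(half_identical(maternal_cousin, paternal_cousin))  # False
```

Both cousins appear to match me over exactly the same stretch, yet they do not match each other there, because one is on my mother's side and the other on my father's.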

 [The lack of phasing also creates the possibility of "false positives" - people who show as a match but who aren't really, because the computers doing the matching have effectively criss-crossed between the base pairs of each chromosome. This is potentially an issue at both FTDNA and 23andMe, in particular. It isn't thought to be so much of an issue at Ancestry, as Ancestry does a form of phasing of the data. However, I didn't think such false matches were likely to be much of a problem, because I thought that, if a group of people were all triangulating, then the chances of all the comparisons being "computer creations" must be quite slim. I do have some groups of matches where no-one matches each other, despite all apparently matching me over the same segment - so those were the matches I took to be "false positives", as theoretically there can only be a maximum of two non-matching results over any particular segment. A third person must match one of the other two, if the matches are genuine.]
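
The "maximum of two non-matching results" reasoning can be turned into a small check. This is just a sketch of my own rule of thumb: the function name and the people are hypothetical, and it assumes I already know which pairs of my matches match each other over the segment in question.

```python
from itertools import combinations


def mutually_unmatched_trios(people, matching_pairs):
    """List every group of three people where no two match each other.

    If all three genuinely matched me over the same segment, at least two of
    them would have to be on the same side (maternal or paternal) and so
    should match each other. A trio with no matches at all therefore
    suggests that at least one of the three is a false positive.
    """
    pairs = {frozenset(p) for p in matching_pairs}
    return [
        trio
        for trio in combinations(sorted(people), 3)
        if not any(frozenset(pair) in pairs for pair in combinations(trio, 2))
    ]


# Hypothetical matches over one segment: only Ann and Bob match each other.
people = ["Ann", "Bob", "Cat", "Dan"]
matching_pairs = [("Ann", "Bob")]

print(mutually_unmatched_trios(people, matching_pairs))
# [('Ann', 'Cat', 'Dan'), ('Bob', 'Cat', 'Dan')]
```

The check cannot say which person in a flagged trio is the false match, only that the three cannot all be genuine.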

Although I have more of my relatives tested at FTDNA, the reliance on having to contact your matches in order to obtain the details of how they match others was why FTDNA did not seem to be so useful to me, especially as many people do not respond to contact. And Ancestry does not give us any tools to analyse where the actual shared DNA is, so the process of finding TGs is impossible there. Therefore, whilst the other companies do have their own advantages, 23andMe was where I did most of my "work". Most of the triangulating groups at 23andMe shared relatively small segments with me (ie between 7cM and 15cM), but I had identified the potential shared ancestry with one of my matches, a 4th Cousin 1x removed, who shared 14cM with me, and I just assumed the relationships for the other matches were likely to be further back in time.

So I was happy with my 23andMe process. I'd even agreed to do a talk for the Guild of One-Name Studies on using autosomal DNA, as I felt confident I knew what I was doing.

But a couple of months later, everything changed. A different theory had developed, partly as a result of statistics produced by Ancestry but also through the work of other scientists. These statistics demonstrated that it was very unlikely, if not impossible, for several cousins actually to share the same matching segment. Instead of "triangles", we now had "circles" - and suddenly that brought into question exactly what all these "triangulating groups" really are.

The "circle" theory is still based on the fact that shared DNA means shared ancestry - but now the claim was that the shared DNA would be on different segments of the chromosomes, because of the way DNA is transmitted. A parent passes half their DNA to each child, but each child receives a different half, as there is a recombination process between each parent's two chromosomes before one chromosome is passed on to the child. After several generations, there would be quite a variety of smaller segments carried by cousins descended from the same ancestor. So, rather than looking for the TGs, we should be looking for "genetic networks", clusters of people who share DNA with each other in the cluster but not necessarily over the same segments. The existence of the TGs was explained partly by features in the testing process, such as the lack of phasing, but also by the existence of what are called "population segments" - sequences of base pairs that are just very common in particular populations, so everyone has them, even though there are no close ancestors in common.
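
A "genetic network" in this sense can be sketched very simply: ignore where people match and just group together everyone who is linked, directly or through others, by sharing some DNA. The sketch below treats a cluster as a connected group in that web of matches, which is my own loose simplification (real network analyses usually look for more tightly connected clusters), and the names are invented.

```python
from collections import defaultdict


def genetic_networks(shared_pairs):
    """Group people into clusters linked by any shared DNA, wherever it is."""
    links = defaultdict(set)
    for a, b in shared_pairs:
        links[a].add(b)
        links[b].add(a)

    seen = set()
    clusters = []
    for person in sorted(links):
        if person in seen:
            continue
        # Walk outwards from this person, collecting everyone reachable.
        cluster = set()
        to_visit = [person]
        while to_visit:
            current = to_visit.pop()
            if current in cluster:
                continue
            cluster.add(current)
            to_visit.extend(links[current] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical pairs of people who share DNA somewhere, not necessarily
# on the same segment.
shared_pairs = [("Ann", "Bob"), ("Bob", "Cat"), ("Dan", "Eve")]

print(genetic_networks(shared_pairs))
# [['Ann', 'Bob', 'Cat'], ['Dan', 'Eve']]
```

Note that Ann and Cat end up in the same cluster even though they do not match each other directly, which is exactly the difference from a TG.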

How does one know the difference between a genealogically significant triangulating segment and a population segment? Or between a group of matches who have received different segments of DNA from a single ancestor and a group of matches who match on different segments that have come to them from a variety of shared ancestors? Surely the companies are taking these factors into account when they predict the matches? Were the results from the companies even reliable?

So many questions - I felt like I was floundering.

My confidence in what I was doing certainly took a dive at that time. It didn't help that I had also uploaded the raw data for my mother and me to another organisation, DNA Land, who claim to be able to impute "missing" (by which I assume they mean "untested") areas of DNA, in order to produce a more complete sequence - and yet the number of matches they suggested as a result of this process was not only far fewer than I have at the other companies, it also included people who don't appear to match me at any of the other companies. That seems strange, given that I have tested at all three of the main companies. I know only a small number of my matches elsewhere will have uploaded to DNA Land, but the differences still seemed quite significant [ie only three matches, including Mum, for me at DNA Land - compared to the 1888 I currently have at 23andMe, 1146 at FTDNA, and almost 6000 at Ancestry!]

Was this DNA testing all a waste of time (and money!)?

When in doubt, I go back to what I know. Just as I work from the known to the unknown in my normal genealogy, I realised I needed to do that more with my DNA research, as well. A "stab in the dark" may occasionally hit a target but it's just as likely to leave me floundering around in the darkness, following blind alleys. And that's what looking for shared ancestry just from the TGs felt like.

The statistics from all of the companies indicate that autosomal test relationships can only be predicted reliably for about the first five generations. That is not to say we won't show a match to more distant relatives - it's just that, the more distant the relationship, the more difficult it becomes to predict the level of that relationship, as the range of possibilities increases. A single segment of DNA may be passed on unchanged for many generations. But, in all the test results, I knew my known relatives always showed up as they should do. My mother was definitely my mother (not that I doubted that!) And my father's known relatives all show up as matches at the right levels.

So DNA testing works!

Beating the temptation to run and hide, I gave the talk in August, describing the two theories and commenting that "most of us don't understand enough about the statistics to make definitive claims either way so a combination of the methods seems to be the best approach. Both methods are valid but have caveats, eg small segments often appear to triangulate, but may not be genuine, clusters of people sharing different DNA may be due to having multiple ancestors in common."

Some bloggers do seem to be finding segments that are shared by groups of distant cousins. The problem for many of us in the UK, though, is that often we don't have sufficient "middle-distance" relatives identified (both in our genealogy and in our DNA) to produce the sort of success stories that many in the US seem to be experiencing. For example, I only have 29 fourth cousins in the Ancestry "4th cousins & closer" section, whereas some of the American results I have seen have between 400 and 750 relatives at that level!

But I have had some success in identifying relationships with my matches - I now have the potential shared ancestry identified for 10 of them (and if the 10th is actually correct, it's a big clue as to which of my ancestral lines three other shared matches fit into). So that's a start.

As well as confirming my genealogy & finding new relatives, one of my goals with DNA testing is mapping where my DNA came from. Identifying shared ancestry with my matches is one part of this process and, so far, my chromosome map, mapping DNA received to the relevant "most recent common ancestor" (MRCA), looks like this:

Chromosome 4 shows where a known Parry segment contains within it a Saunders segment:

And this shows how that Saunders segment of DNA appears to have passed down to my Parry grandfather:

Any other matches over the identified segments on the chromosome map should (if the identification is correct) be either a descendant of the same couple, or a descendant of one of their ancestors. 

I think there needs to be a continual checking process, using both DNA and genealogy - for example, having found a genealogical connection to one of my DNA matches at Ancestry, we were then able to confirm, using FTDNA, that the person also matched my mother over the same segment, and that neither my mother nor I matched the person's father (both of which are necessary for the genealogy to be correct).

Since I have several close relatives tested, I have the opportunity to work from the DNA data backwards, rather than just concentrating on those potential triangulating groups of distant relatives. My DNA consists of segments of the DNA of my grandparents, passed to me by each of my parents. The "crossover points", where a segment from one grandparent switches over to a segment from the other grandparent, can (sometimes) be identified in our DNA, using the details of how we match close relatives. This is a process I began looking at some years ago, using tools written by David Pike. But now that more of my relatives are on GEDmatch, I can use the "Visual phasing" method as explained by Kathy Johnston, which should be a lot easier.

I have been working on this recently and will post about the process soon (now there's a challenge to myself!)