-->

Wednesday 15 January 2020

Genetic Networks (simple ones!)


Isn’t it great when a new ‘4th cousin & closer’ appears, and you can see this amongst the shared matches:




Four other matches, who all seem to share one suggested line. I may not have identified the 'most recent common ancestor' yet (the suggested "HARLAND" line comes from a surname & location in common with one of the matches), but this ‘clustering’ gives me something to work on (ie the potential connections between the matches), as well as helping me to focus on the part of my own tree that seems most likely to contain the shared ancestors with all of these matches.

In my view, this use of shared matches to produce networks, or clusters, is one of the most effective methods for making sense of our matches at the DNA companies.  Last year, I led a workshop at the Family Tree Live show in Alexandra Palace about ‘Genetic Networks and Triangulation’, and have decided it might be helpful for others if I post some of the  information here.

I wrote about my initial foray into networking back in 2017, with the post at http://notjusttheparrys.blogspot.com/2017/08/ancestry-shared-matches-and-new.html

Since then, other methods for using the shared matches have been developed, for example, the Dana Leeds Method of clustering, MyHeritage’s Autoclustering and the auto clustering of the Collins-Leeds Method in the DNAGedcom Client App - all of which are useful tools.*

But I still like the ‘genetic network’ method, which can take account of every shared match, as opposed to many of the clustering methods, which have restrictions on who is in the cluster, eg all the matches in a cluster must match a certain percentage of the other matches in the cluster.  Whilst that might be useful for those people with thousands of close matches, I think, for those of us with only a few hundred, it can cause important clues to be left out.

But what is the theory behind clustering?

We all (hopefully) know that having known relatives tested can help us to narrow down which part of our family tree other matches connect to.  For example, with parents tested, your matches can be divided into paternal and maternal matches, depending on which parent they also match. (There will probably be a few "false positives" on your match list, as well, who match neither parent, but we won't worry about them for now). 
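For anyone who keeps their match lists in a spreadsheet or file, the sorting described above can be sketched in a few lines of Python. This is only an illustration: the function name `sort_by_parent` and the match IDs are invented for the example, and the "both" group (someone who matches both parents) is my own addition for completeness.

```python
def sort_by_parent(my_matches, father_matches, mother_matches):
    """Split my match list by which tested parent each match also matches."""
    mine = set(my_matches)
    father = set(father_matches)
    mother = set(mother_matches)
    return {
        "paternal": sorted((mine & father) - mother),
        "maternal": sorted((mine & mother) - father),
        # Assumption: someone matching both parents is kept in a group of their own.
        "both": sorted(mine & father & mother),
        # Matching neither parent suggests a possible "false positive".
        "neither": sorted(mine - father - mother),
    }


# Invented example data - not real matches.
groups = sort_by_parent(
    my_matches=["M1", "M2", "M3", "M4", "M5"],
    father_matches=["M1", "M2", "M5", "M9"],
    mother_matches=["M3", "M5"],
)
print(groups)
```

Here M4 ends up in the "neither" group, which is where the possible false positives would show.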

The principle can be extended with other known relatives. For example, if a cousin of my father has tested, it narrows down the potential link with a shared match to one of my great grandparent couples



ie a match to both my Dad's cousin and me will descend either from the shared great grandparent couple or from one of their ancestors. So I can discount 3/4 of my tree when I look for the shared ancestry.

Even if we don’t have any known relatives tested, the principle can still be applied - and also extended further back up the branches of our family tree




It does get a bit complicated when trying to describe shared matching in terms of descendants of ancestral lines!
But it should be possible to see how people in particular positions in our family tree will only match certain other relatives, depending on which ancestral lines they share.

I think a key question to ask to understand this is "Where has the DNA come from?"  We received our autosomal DNA through all of our ancestral lines.  Our matches will have received their autosomal DNA from all of their ancestral lines.  The only way we can be genuine genealogical matches is if we share an ancestor somewhere so that we both received the 'same' segment(s) of that ancestor's DNA.  Each segment will only have travelled down to us through one of our ancestral lines.  

Some other people who descend from that ancestor will also have received the same segment.  For example, many people are testing siblings, parents and close cousins.  Such a group of close relatives will all match each other and several of them would be likely to share any particular segment.   So (assuming there's enough DNA to be picked up as matches to me), the descendants of a particular shared ancestor will show up as a group of "shared matches" to each other.

This is the principle behind all the "clustering" and "networking" methods of working with our matches. 

With close relatives, matches fall into only a small number of clusters, but the further back we go, the more potential clusters there are. If we could just look at matches who all share an ancestor with us at one particular generation, we would get neat clusters:

| Relationship | Shared Ancestors | ‘Ideal’ number of Clusters/Groups |
|---|---|---|
| Full siblings | All | 1 |
| Half Siblings | One side, paternal or maternal | 2 |
| 1st cousins | 2 grandparents | 2 |
| 2nd cousins | 2 great grandparents | 4 |
| 3rd cousins | 2 great great grandparents | 8 |
| 4th cousins | 2 great great great grandparents | 16 |
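The cousin rows follow a simple doubling pattern: at each generation further back there are twice as many ancestral couples, so nth cousins fall into 2 to the power n 'ideal' clusters. A minimal sketch of that arithmetic is below. The function names are my own, and it only covers the cousin rows, not the sibling ones.

```python
from fractions import Fraction


def ideal_clusters(cousin_degree):
    """Ancestral couples at the generation shared with nth cousins: 2, 4, 8, 16..."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or more")
    return 2 ** cousin_degree


def fraction_of_tree_discounted(cousin_degree):
    """Share of the tree ruled out once a match is tied to one of those couples."""
    return 1 - Fraction(1, ideal_clusters(cousin_degree))


for degree in range(1, 5):
    print(degree, ideal_clusters(degree), fraction_of_tree_discounted(degree))
```

At the 2nd cousin level this gives 4 clusters and 3/4 of the tree discounted, the same figure as in the example of my Dad's cousin.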


But the reality is, there will be overlap between groups when there are relatives from different generations included.  For example, matches at second cousin level will match two separate clusters from third cousin level, whereas people in one of the third cousin clusters will not match those in the other:


There are issues to be aware of - beyond third cousins, it's possible there will not be sufficient DNA in common for cousins to show up as matching each other.  But there is also the added complication that, at some stage, we reach a point where we share multiple common ancestors with some matches.  This means clusters can show as being linked to each other when they don't actually share the same common ancestor.

But the main point is that, if we can identify how a group of our matches all match each other, then that can sometimes help us in identifying how we match them as well.

And this was what I demonstrated at Family Tree Live, using my top 25 matches at Ancestry.

First, I allocated a letter to each match, for privacy.  Then I produced a table showing who amongst those 25 matches matched each other:

| Predicted Relationship level | Match | Shared matches |
|---|---|---|
| 1st Cousin | Match A | B, E, G, L, V, W |
| 3rd Cousins | Match B | A, E, G, L, N, S, W |
|  | Match C | Q, T |
|  | Match D | F, H, I, P, R, Y |
|  | Match E | A, B, G, L, N, W |
|  | Match F | D, H, O, P, R, Y |
|  | Match G | A, B, E, L, N, W |
| 4th Cousins | Match H | D, F, K, M, O, P, X |
|  | Match I | D, Y |
|  | Match J | (only shared matches beyond the first 25) |
|  | Match K | H, M, O |
|  | Match L | A, B, E, G, N, W |
|  | Match M | H, K, O, P, X |
|  | Match N | B, E, G, L, W |
|  | Match O | F, H, K, M, R, X |
|  | Match P | D, F, H, M, R, X |
|  | Match Q | C |
|  | Match R | D, F, O, P |
|  | Match S | B |
|  | Match T | C |
|  | Match U | (only shared matches beyond the first 25) |
|  | Match V | A |
|  | Match W | A, B, E, G, L, N |
|  | Match X | H, M, O, P |
|  | Match Y | D, F, I |

For the workshop, I produced an image showing lettered dots, with no lines joining them up, so that people could have a go at manually producing the network diagram. (This is a relatively simple task when the number of matches is limited - I suggest not trying it manually with hundreds of matches!)

But here is the diagram with lines drawn to show who matches who:



As you can see, the matches fell into three groups of shared matches.  Not everyone in each group matches everyone else, but the groups are separate from each other.
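For anyone who would rather not join the dots by hand, the grouping can also be worked out with a short script. The sketch below is not the method used at the workshop, just one way of doing the same thing: it types in the shared matches from the table above, treats each shared match as a line between two dots, and then collects every set of dots that are joined up (a "connected component", in graph terms). The names `shared` and `find_networks` are my own.

```python
from collections import deque

# Shared matches among the top 25, copied from the table above.
# J and U only have shared matches beyond the first 25, so their lists are empty.
shared = {
    "A": "BEGLVW", "B": "AEGLNSW", "C": "QT", "D": "FHIPRY", "E": "ABGLNW",
    "F": "DHOPRY", "G": "ABELNW", "H": "DFKMOPX", "I": "DY", "J": "",
    "K": "HMO", "L": "ABEGNW", "M": "HKOPX", "N": "BEGLW", "O": "FHKMRX",
    "P": "DFHMRX", "Q": "C", "R": "DFOP", "S": "B", "T": "C", "U": "",
    "V": "A", "W": "ABEGLN", "X": "HMOP", "Y": "DFI",
}


def find_networks(shared):
    """Group matches into networks: anyone reachable via shared matches."""
    # Build an undirected graph, so a link listed in either direction counts.
    graph = {match: set() for match in shared}
    for match, others in shared.items():
        for other in others:
            graph[match].add(other)
            graph.setdefault(other, set()).add(match)

    seen, networks = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first walk from this match to everyone it links to.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        networks.append(sorted(group))
    return networks


networks = find_networks(shared)
for group in networks:
    if len(group) > 1:
        print(len(group), "matches:", " ".join(group))
```

Run on the table as given, this finds three groups, of 9, 3 and 11 matches, with J and U left on their own because none of their shared matches are in the top 25.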

I don't know exactly how I relate to all of the matches - some have not responded to messages, others have no information about their families.  However, by placing those I do know onto my tree, it is possible to get a good idea as to why the matches in each network form the groups they do:



Network 2 appears to all be on my father's side of my family, as all the known matches would have received DNA from ancestors of my paternal grandmother.

Network 1 is on my mother's father's side of my family. There are six matches who potentially trace back to a NAYLOR family in London in the early 1800s. The NAYLOR line marries into the ALLEN line at my great grandparents' level and, of the other five matches in this network, two descend from the ALLEN line prior to the two lines joining, two descend from after the lines join and one is unknown.

If I removed from the network the two who descend from after the lines join, as well as the unknown match, who also appears to descend from both lines, this network would fall into two separate groups that do not share DNA with each other - matches who descend from the NAYLORs in one and matches who descend from the ALLENs in the other.  Since these matches do not have any ancestry in common, it is not surprising that they do not share any DNA either:


The smaller Network 3 is on my mother's mother's side of my family and can be seen as a similar pattern to Network 2, with a close match who descends from two of my ancestral lines (DOWDING and HARRISON), matching two others who each descend from one of those lines and who therefore do not match each other.

It is not possible to rely solely on the information from DNA networks or clusters. As I have mentioned, multiple common ancestors and the variable nature of DNA transmission (as well as company policies regarding the thresholds being used for showing shared matches) can mean that some people show as matches who 'shouldn't' (because they share through a different ancestor), or don't show as matches when they 'should' (because the DNA has dropped out, or the quantity of shared DNA is below the company threshold).

But hopefully, it is clear that shared matches can provide vital information to help us trace our common ancestry with our matches.  

[And, just in case you're wondering about the lack of matches from my father's father's side, I suspect this is due to a mixture of family structures and the fact DNA testing is still not as popular in the UK as in some other countries.  In the three ancestral lines represented above, my grandparents all had multiple siblings, and recent generations have embraced DNA testing.  Whereas my paternal grandfather only had one sibling and, as far as I am aware, only one of the descendants has tested and that was at a different company.  I do have matches from further back on this, my PARRY line, at several of the other companies - so no worries so far about anybody's parents not being the expected ones. But that is always something one must bear in mind! :-) ].

*
The Dana Leeds Method of clustering - https://www.danaleeds.com/the-leeds-method/
The DNAGedcom Client App (subscription based) - available from https://dnagedcom.com/
