Isn’t it great when a new ‘4th cousin & closer’ match appears, and
you can see this amongst the shared matches:
Four other matches, who all seem to share one suggested line. I may
not yet have identified the 'most recent common ancestor' (the suggested "HARLAND" line comes from a surname and location in common with one of the
matches), but this ‘clustering’ gives me something to work on (ie the potential connections between the matches). It also helps me to focus on the part
of my own tree that seems most likely to contain the ancestors I share with all of these matches.
In my view, this use of shared matches to produce networks, or
clusters, is one of the most effective methods for making sense of our matches
at the DNA companies. Last year, I led a workshop at the Family Tree Live
show at Alexandra Palace about ‘Genetic Networks and Triangulation’, and I have decided it might be helpful for others if I post some of the information
here.
Since then, other methods for using the shared matches have been
developed, for example, the Dana Leeds Method of clustering, MyHeritage’s
Autoclustering and the auto clustering of the Collins-Leeds Method in the
DNAGedcom Client App - all of which are useful tools.*
But I still like the ‘genetic network’ method, which can take
account of every shared match, as opposed to many of the clustering methods,
which have restrictions on who is in the cluster, eg all the matches in a
cluster must match a certain percentage of the other matches in the
cluster. Whilst that might be useful for those people with thousands of
close matches, I think, for those of us with only a few hundred, it can cause
important clues to be left out.
But what is the theory behind clustering?
We all (hopefully) know that having known relatives tested can
help us to narrow down which part of our family tree other matches connect
to. For example, with parents tested,
your matches can be divided into paternal and maternal matches, depending on
which parent they also match. (There will probably be a few "false
positives" on your match list, as well, who match neither parent, but we
won't worry about them for now).
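For anyone who likes to see this as code, here is a minimal sketch in Python of the paternal/maternal split. The function name and the match names are made up for illustration; they do not come from any real match list or company export.

```python
def split_by_parent(my_matches, father_matches, mother_matches):
    """Group my matches by which tested parent they also match."""
    mine = set(my_matches)
    father = set(father_matches)
    mother = set(mother_matches)
    return {
        # Matches shared with my father only
        "paternal": sorted((mine & father) - mother),
        # Matches shared with my mother only
        "maternal": sorted((mine & mother) - father),
        # Matches shared with both parents (related on both sides)
        "both": sorted(mine & father & mother),
        # Matches shared with neither parent (possible false positives)
        "neither": sorted(mine - father - mother),
    }


# Hypothetical example data
groups = split_by_parent(
    my_matches=["Ann", "Bob", "Cat", "Dev", "Eve"],
    father_matches=["Ann", "Bob", "Eve", "Zed"],
    mother_matches=["Cat", "Eve"],
)
for side, names in groups.items():
    print(side, names)
```

The "neither" group is where the few "false positives" mentioned above would end up.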
The principle can be extended with other known relatives. For
example, if a cousin of my father has tested, it narrows down the potential link
with a shared match to one of my great grandparent couples,
ie a match to both my Dad's cousin and me will either descend from
the shared great grandparent couple or from one of their ancestors. So I can discount 3/4 of my tree when I look
for the shared ancestry.
Even if we don’t have any known relatives tested, the principle
can still be applied - and also extended further back up the branches of our
family tree.
It does get a bit complicated when trying to describe shared
matching in terms of descendants of ancestral lines!
But it should be possible to see how people in particular positions in our family tree will only match certain other relatives, depending on which ancestral lines they share.
I think a key question to ask to understand this is "Where has the DNA come from?" We received our autosomal DNA through all of our ancestral lines. Our matches will have received their autosomal DNA from all of their ancestral lines. The only way we can be genuine genealogical matches is if we share an ancestor somewhere so that we both received the 'same' segment(s) of that ancestor's DNA. Each segment will only have travelled down to us through one of our ancestral lines.
Some other people who descend from that ancestor will also have received the same segment. For example, many people are testing siblings, parents and close cousins. Such a group of close relatives will all match each other and several of them would be likely to share any particular segment. So (assuming
there's enough DNA to be picked up as matches to me), the descendants of a particular shared ancestor will show up as a group of "shared
matches" to each other.
This is the principle behind all the "clustering" and
"networking" methods of working with our matches.
With close relatives, matches fall into only a small number of
clusters, but the further back we go, the greater the potential number of
clusters. If we could just look at matches who all share an ancestor with us
at one particular generation, we would get neat clusters:
| Relationship | Shared Ancestors | ‘Ideal’ number of Clusters/Groups |
|---|---|---|
| Full siblings | All | 1 |
| Half siblings | One side, paternal or maternal | 2 |
| 1st cousins | 2 grandparents | 2 |
| 2nd cousins | 2 great grandparents | 4 |
| 3rd cousins | 2 great great grandparents | 8 |
| 4th cousins | 2 great great great grandparents | 16 |
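The 'ideal' figures for the cousin rows in the table double with each generation, because each step back doubles the number of ancestral couples a match could descend from. A small sketch of that arithmetic, with a function name of my own choosing and covering the cousin rows only (not the sibling rows):

```python
def ideal_cluster_count(cousin_degree):
    """Number of ancestral couples at the generation shared with cousins of this degree.

    1st cousins share a grandparent couple (2 couples to choose from),
    2nd cousins share a great grandparent couple (4 couples), and so on.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or more")
    return 2 ** cousin_degree


for degree in range(1, 5):
    print(f"Cousin degree {degree}: {ideal_cluster_count(degree)} ideal clusters")
```

This is only the 'ideal' count. As explained below, real match lists mix generations, so the groups overlap.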
But the reality is, there will be overlap between groups when
there are relatives from different generations included. For example, matches at second cousin level will match two separate clusters
from third cousin level, whereas people in one of the third cousin clusters
will not match those in the other:
There are issues to be aware of - beyond third cousins, it's possible there will not be sufficient
DNA in common for cousins to show up as matching each other. But there is also the added complication
that, at some stage, we reach a point where we share multiple common ancestors
with some matches. This means clusters
can show as being linked to each other when they don't actually share the same
common ancestor.
But the main point is that, if we can identify
how a group of our matches all match each other, then that can sometimes help
us in identifying how we match them as well.
And this was what I demonstrated at Family Tree Live, using my top
25 matches at Ancestry.
First, I allocated a letter to each match, for privacy. Then I produced a table showing who amongst
those 25 matches matched each other:
| Predicted Relationship level | Match | Shared matches |
|---|---|---|
| 1st Cousin | Match A | B, E, G, L, V, W |
| 3rd Cousins | Match B | A, E, G, L, N, S, W |
| | Match C | Q, T |
| | Match D | F, H, I, P, R, Y |
| | Match E | A, B, G, L, N, W |
| | Match F | D, H, O, P, R, Y |
| | Match G | A, B, E, L, N, W |
| 4th Cousins | Match H | D, F, K, M, O, P, X |
| | Match I | D, Y |
| | Match J | (only shared matches beyond the first 25) |
| | Match K | H, M, O |
| | Match L | A, B, E, G, N, W |
| | Match M | H, K, O, P, X |
| | Match N | B, E, G, L, W |
| | Match O | F, H, K, M, R, X |
| | Match P | D, F, H, M, R, X |
| | Match Q | C |
| | Match R | D, F, O, P |
| | Match S | B |
| | Match T | C |
| | Match U | (only shared matches beyond the first 25) |
| | Match V | A |
| | Match W | A, B, E, G, L, N |
| | Match X | H, M, O, P |
| | Match Y | D, F, I |
For the workshop, I produced an image showing lettered dots, with
no lines joining them up, so that people could have a go at manually producing
the network diagram. (This is a relatively simple task when the number of matches is
limited - I suggest not trying it manually with hundreds of matches!)
But here is the diagram with lines drawn to show who matches who:
As you can see, the matches fell into three groups of shared
matches. Not everyone in each group
matches everyone else, but the groups are separate from each other.
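With more than a handful of matches, the same grouping can be done by computer rather than by drawing lines. The sketch below is one possible way of doing it in Python, not the method I used for the workshop: it treats each shared match as a link between two letters and collects everyone who can be reached through a chain of links (what programmers call "connected components"). The data is transcribed from the table above; the function and variable names are my own.

```python
from collections import deque

# Shared matches among my top 25, transcribed from the table above.
# Each string holds the letters of the other matches that a match shares with me.
# J and U only had shared matches beyond the first 25, so they are left empty.
SHARED_MATCHES = {
    "A": "BEGLVW", "B": "AEGLNSW", "C": "QT", "D": "FHIPRY",
    "E": "ABGLNW", "F": "DHOPRY", "G": "ABELNW", "H": "DFKMOPX",
    "I": "DY", "J": "", "K": "HMO", "L": "ABEGNW", "M": "HKOPX",
    "N": "BEGLW", "O": "FHKMRX", "P": "DFHMRX", "Q": "C", "R": "DFOP",
    "S": "B", "T": "C", "U": "", "V": "A", "W": "ABEGLN", "X": "HMOP",
    "Y": "DFI",
}


def find_networks(shared):
    """Return groups of matches linked by chains of shared matches, largest first."""
    # Record every link in both directions, in case one side was missed.
    neighbours = {match: set() for match in shared}
    for match, others in shared.items():
        for other in others:
            neighbours[match].add(other)
            neighbours.setdefault(other, set()).add(match)

    seen = set()
    networks = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first walk outwards from this match.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            current = queue.popleft()
            group.append(current)
            for nxt in sorted(neighbours[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        networks.append(sorted(group))
    return sorted(networks, key=len, reverse=True)


for network in find_networks(SHARED_MATCHES):
    if len(network) > 1:  # skip matches with no shared matches in the top 25
        print(len(network), "matches:", ", ".join(network))
```

The code does not number the groups, so they still need to be compared with the diagram to see which is which. Note also that it only shows who is connected to whom, directly or indirectly. It does not require everyone in a group to match everyone else, which is the feature of the 'genetic network' approach I described earlier.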
I don't know exactly how I relate to all of the matches - some
have not responded to messages, others have no information about their
families. However, by placing those I do
know onto my tree, it is possible to get a good idea as to why the matches in each network form
the groups they do:
Network 2 appears to all be on my father's side of my family, as all the
known matches would have received DNA from ancestors of my paternal
grandmother.
Network 1 is on my mother's father's side of my family. There are six matches who potentially trace
back to a NAYLOR family in London in the early 1800s. The NAYLOR line marries
into the ALLEN line at my great grandparents level and, of the other five
matches in this network, two descend from the ALLEN line prior to the two lines
joining, two descend from after the lines join and one is unknown.
If I removed from the network the two who descend from after the
lines join, as well as the unknown match, who also appears to descend from both
lines, this network would fall into two separate groups that do not share DNA
with each other - matches who descend from the NAYLORs in one and matches who
descend from the ALLENs in the other.
Since these matches do not have any ancestry in common, it is not
surprising that they do not share any DNA either:
The smaller Network 3 is on my mother's mother's side of my family
and can be seen as a similar pattern to Network 2, with a close match who
descends from two of my ancestral lines (DOWDING and HARRISON), matching two
others who each descend from one of those lines and who therefore do not match
each other.
It is not possible to rely solely on the information from DNA
networks or clusters - as I have mentioned, multiple common ancestors, and the
variable nature of DNA transmission (as well as company policies regarding the
thresholds being used for showing shared matches) can mean that some people
show as matches who 'shouldn't' (because they share through a different
ancestor), or don't show as matches when they 'should' (because the DNA has
dropped out, or the quantity of shared DNA is below the company threshold.)
But hopefully, it is clear that shared matches can provide vital
information to help us trace our common ancestry with our matches.
[And, just in case you're wondering about the lack of matches from my father's father's side, I suspect this is due to a mixture of family structures and the fact DNA testing is still not as popular in the UK as in some other countries. In the three ancestral lines represented above, my grandparents all had multiple siblings, and recent generations have embraced DNA testing. Whereas my paternal grandfather only had one sibling and, as far as I am aware, only one of the descendants has tested and that was at a different company. I do have matches from further back on this, my PARRY line, at several of the other companies - so no worries so far about anybody's parents not being the expected ones. But that is always something one must bear in mind! :-) ].
*