I sat down at my computer recently, ready to tackle one of my "ToDos", which was to read through my blogs from the last few months and create a list of all the things I have said I will do, in order to (try to!) keep myself 'on track' with those tasks.
Funny how plans can change!
I'd previously seen a post by Debbie Cruwys Kennett, in the "DNA help for Genealogy (UK)" group on Facebook, which indicated that Ancestry was releasing a "clustering tool" for DNA matches. The tool is only available to those with Pro-Tools, which I currently happen to have. But the official Ancestry blog does state that "Some members will not be able to access this feature until December 2025."
So I wasn't expecting to have access to it yet, although I was pleasantly surprised to see a few UK members reporting that they did. I was then even more surprised to discover that I also now do.
I'm not going to spend time describing what the tool does, or how important 'clustering', or grouping one's matches on the basis of other shared matches, is - you can read about that on the official blog, at https://www.ancestry.co.uk/c/ancestry-blog/dna/dna-matches-by-cluster or on the Support page at https://support.ancestry.co.uk/s/article/Matches-by-Cluster.
I've also written a couple of posts over the years, regarding genetic networks and clustering of shared matches, which you can find at https://notjusttheparrys.blogspot.com/2017/08/ancestry-shared-matches-and-new.html and https://notjusttheparrys.blogspot.com/2020/01/genetic-networks-simple-ones.html
The genealogical research is still going to be necessary, in order to make sense of the clusters.
At the moment, the clustering only involves matches who share between 65cM and 1300cM of DNA. Perhaps, unlike many in the UK, I am fortunate in that I have almost thirty matches who meet those criteria, through an assortment of first and second cousins, at varying levels of 'remove'. This means I am seeing some clusters - but not the overwhelming numbers which, perhaps, many people in the USA are having to make sense of.
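For anyone who keeps their own list of matches, the cM window above can be sketched in a few lines of Python. This is only an illustration: the labels and cM values are invented, and treating the 65cM and 1300cM limits as inclusive is my assumption, not something Ancestry has stated.

```python
# Hypothetical match list: (label, shared centimorgans). All values are invented.
matches = [
    ("A", 1450.0),
    ("B", 870.5),
    ("C", 212.0),
    ("D", 65.0),
    ("E", 48.3),
]

MIN_CM = 65    # lower limit mentioned in the post
MAX_CM = 1300  # upper limit mentioned in the post


def in_cluster_range(shared_cm, low=MIN_CM, high=MAX_CM):
    """Return True if a match falls inside the cM window (assumed inclusive)."""
    return low <= shared_cm <= high


# Keep only the matches that would be candidates for clustering.
eligible = [label for label, cm in matches if in_cluster_range(cm)]
print(eligible)
```

Running the same filter over an exported match list is a quick way to see how many matches could appear in the clusters at all.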
As with any new tool, it is also important to remember that the initial release by Ancestry is in a "Beta" mode, which means the tool is still 'being tested and actively developed'. So the tool, and its results, are likely to change.
And they have certainly been doing that!
This was my 'first look' at the clusters, over the course of the initial days:
Even without knowing who any of the squares represent, those of us who have seen other clustering tools might find the first image a bit 'strange', since it looks as if almost all of the matches should fit into one of two main groups, each group containing some subgroups, rather than the two main groups being split up as they appear there.
When I next looked, the results had changed to the second image. "Great", I thought. "The system just wasn't properly taking into account the separation between my paternal and my maternal matches."
On the third day (not shown), the results had reverted to the first image, with split clusters.
Oh well - I did say the tool was in beta! :-)
Currently, the results still appear as per the third image and I imagine they will stay like that until I receive some more close matches.
But how close does a match need to be to affect the clustering results? [ps I don't know the answer!]
You'll note that I have stated the numbers of matches included at the top of each image. The text might be a bit small so:
1st image - 24 matches
2nd image - 22 matches
3rd image - 23 matches
This variation in the number is something I noticed because I was trying to write this post, and was therefore privatizing the names.
Since the clustering seems to be redone each time the page is visited, perhaps differences should be expected.
But nothing had changed with regard to those close matches - there were no 'new' close matches received. Whether more distant new matches were affecting the results, or whether the changes were due to Ancestry modifying their clustering calculations, I don't know.
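Spotting which matches came and went between visits is easier if the labels from each visit are written down and compared as sets. The following is a minimal sketch of that idea; the function name `compare_runs` and the labels are hypothetical, and the lists would have to be typed in by hand, since I am not assuming any export facility in the Ancestry tool.

```python
def compare_runs(before, after):
    """Return (dropped, added) match labels between two clustering runs."""
    before, after = set(before), set(after)
    return sorted(before - after), sorted(after - before)


# Invented labels standing in for the matches seen on two different days.
day1 = ["m01", "m02", "m03", "m04"]
day2 = ["m01", "m03", "m04", "m05"]

dropped, added = compare_runs(day1, day2)
print("dropped:", dropped)
print("added:", added)
```

A match that appears in `dropped` on one day and in `added` on a later day would correspond to one that "disappears" and then returns.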
The loss of one of the matches, in particular, is intriguing, and helps to demonstrate the ongoing need for genealogical research, and not 'assumptions' about what the clusters are showing.
I'm going to deal with each side of my family separately.
Maternal Side
These are the maternal clusters - on the left, an image from the first day, when there were 24 matches included in total, 8 of them being maternal; on the right, the current results, which only include 7 of the maternal matches:
The red cross shows the match that is no longer included - as you can see, it is someone who matches everyone in both clusters. The clusters have also switched around (so the blue in the left hand image is now the purple in the right hand, and the pink in the left hand is now the blue in the right hand image, less the one match.)
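To illustrate how much influence a single match who matches everyone can have, here is a deliberately naive sketch that groups matches purely by following shared-match links (connected components). This is not Ancestry's method, which is not reproduced here; the function name `group_by_shared_matches`, the labels, and the shared-match pairs are all invented for the example.

```python
from collections import defaultdict


def group_by_shared_matches(shared_pairs, members):
    """Group members into connected components of the shared-match graph."""
    members = set(members)
    graph = defaultdict(set)
    for a, b in shared_pairs:
        # Ignore links that involve anyone left out of this run.
        if a in members and b in members:
            graph[a].add(b)
            graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        # Walk outwards from this match, collecting everyone reachable.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Invented data: two tight groups (A, B, C and D, E, F) plus "X", who matches everyone.
pairs = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("X", "A"), ("X", "B"), ("X", "C"),
    ("X", "D"), ("X", "E"), ("X", "F"),
]
everyone = ["A", "B", "C", "D", "E", "F", "X"]

with_x = group_by_shared_matches(pairs, everyone)
without_x = group_by_shared_matches(pairs, [m for m in everyone if m != "X"])
print(with_x)
print(without_x)
```

With this simple grouping, "X" pulls both groups into one, and leaving "X" out separates them again. A real clustering tool has to decide how to treat such a match, which may be one reason the results shift from visit to visit, but that is only my guess.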
Paternal Side
The clusters for my paternal side - just one image this time since, although one match did "disappear" after the first day, they reappeared, and have remained included in the clusters since.
Two of the matches, "6" and "11", are so far unidentified, but the following image shows where the other fourteen fit in my paternal side:
Once again, you can see that, if it wasn't for two of my first cousins, who will have received DNA from my paternal grandfather, all of the rest of the matches are on my paternal grandmother's side of my family.
So, as with my maternal side, not all branches of my family are represented in the clusters.
And, just as close relatives (the two siblings) appeared in different clusters on my maternal side, this time, a parent is in a different cluster from their three children, and grandchild.
Although Ancestry does give details about how they do the clustering on their support page, "Science of: Matches by Cluster", and indicates that the tool takes into account how matches match each other, using "sets of matches who are more related to each other than to your other matches," in order to end up with "Clear, organized clusters", you can see that the results don't always look like that!
Considering that the majority of people on my paternal side are all matching each other, it would be interesting to know more about what the different clusters are representing. Matches 4, 5 & 7, as well as 1, 2 and 3 (the 'gold' group), will contain DNA from branches of my family that the 'pink' and 'purple' groups don't. So, perhaps, it might become clearer when I finally do a more detailed analysis, one that looks not just at how much DNA is shared between all of us who are second and third cousins, but also at who shares which ancestral lines.
Comparing to the DNAGedcom Client App clustering
In the meantime, I thought I would also do a comparison to the clustering produced by the DNAGedcom Client app, using the same cM criteria that Ancestry does. I have a total of 29 matches who fit into that range and since, with the Client app, it is possible to include "unclustered" matches (ie matches who don't have sufficient shared matches to create a cluster), they are all shown in the following diagram:
"1" - I don't know exactly where this match fits, because identifying the shared ancestry in this case is complicated because of immigration into the USA (possibly via Australia). But my current thought, based on pedigrees and research in the USA, is that they descend from a sibling of WN in the first generation shown on the tree. That would explain why, although they appear in the cluster for my mother's paternal side, they do not match numbers 25-27, but do match 21-23 & X, who will all carry DNA from the same line (ie parents of WN, down through CN, and then JWFA.)
That would make them a 4c to me - however, Ancestry predicts that they are a "3rd cousin or half 2nd cousin 1x removed". I find it unusual for Ancestry's predictions to be closer than the genealogy suggests. Several of my second cousins are predicted to be more distant than the genealogy indicates, to the extent that I have wondered whether the Ancestry predictions are slightly 'biased' by the multiple common ancestors many American testers seem to share with each other. That creates a higher level of shared DNA for particular relationships than we see in the UK, which would cause predictions for some of our relationships to be more distant than the relationship actually is. (I hope that makes sense! :-) )
The cousin matching is something I still need to look into more specifically among my matches - as is the possibility that there is a missing 'Frederick Nayl(o/e)r' at a closer level than the one I know of, who descends from a sibling of WN.
"2" and "3" are identified matches, who descend from my Mother's maternal lines, a branch not represented in any of the Ancestry Clusters.
"4" is currently unknown but, given the shared matching to "2", is also likely to be on my Mother's maternal side.
And, finally, "5", the solitary match, is a 2c from my Father's paternal lines.
I have added the three known additional matches, to the following tree image, which combines my maternal and paternal lines. I have also included a representation of where I currently think "1" descends from, as doing so should help to clarify their matching/not matching within that maternal cluster (the match themselves should be at either my mother's, or my generation):
As Ancestry develops the tool further, we know there will be changes to the cM limits, which will help to make the tool more useful. Whether they also start to include 'unclustered' matches remains to be seen.
But, in conclusion, I hope this goes some way to demonstrating how necessary the genealogical research still is, in order to make sense of the clustering.