
Monday, 23 March 2026

MyHeritage WGS DNA test results - initial comparison to an uploaded kit

 In January, I mentioned that I had now tested with MyHeritage, rather than just uploading kits from other DNA companies to that site, since MyHeritage have now brought in "Whole Genome Sequencing" (WGS).  At the time, my kit was still in the processing stage, and I was looking forward to receiving the results, and the possibility of comparing them to my other kits there.

So this post is the beginning of my comparisons, looking at the new test compared with the upload I did, in November 2016, of the data from my FTDNA test.  

[Please note, this is just my personal exploration of the results I have - I don't keep up with the "bigger picture" of what's happening regarding genetic genealogy so, if you're looking for more detailed analysis and comments regarding the WGS test, I suggest reading the comparisons carried out by someone such as Roberta Estes, on her blog, "DNAeXplained – Genetic Genealogy". I also don't have a subscription to MyHeritage, which might affect the level of detail available for my kits.]  

One of my expectations of the new test was that there would be fewer matches than I have with the FTDNA kit. This was because, in order to compare kits from different companies, who might not all test exactly the same points in the DNA, MyHeritage uses a process of 'imputation'.  This process 'fills in' gaps in the sequences. Although imputation seems to be a common process used by all of the DNA companies, and is carried out in accordance with specific principles, it can potentially lead to cases where people are incorrectly identified as matches, when they shouldn't be (and possibly vice versa).  Looking at my "new match" notifications from MyHeritage in the past, I've often thought that could be the case, with many of the matches showing low levels of shared DNA.  Hence my expectation that the better coverage of the new test would discount these lower level matches.

But I was wrong!

When I first received my results, the new kit showed a total number of matches of 17019, whereas my FTDNA uploaded kit showed 16531. The totals as at the time of writing this (23/3/26) are 17328, and 16820 respectively.

It's currently not possible to download a list of all one's matches at MyHeritage, although that used to be possible. So I opted for a 'cut & paste' collection of the closest 2000 matches to each kit, and put those into two spreadsheets (and, yes, that did take a while.)  For both kits, this resulted in the lowest matches having a total shared DNA of around 19cM/20cM.

I then did a fairly simple comparison between the two spreadsheets, and discovered that almost half of the names in each spreadsheet did not appear in the other sheet:   

It is possible (and, I imagine, quite likely) that many of those who only appeared on one sheet do match the other kit but are beyond the first 2000 matches. I haven't specifically looked at many of the matches to check that yet, given the numbers of "No" matches involved. But, with my 'close' and 'extended' family only accounting for 17 of the matches, and the rest all being 'distant', I can imagine small changes in the levels of shared DNA could make quite a change to the order in which they appear on my match list.  

It was also obvious from those figures that there was something a bit 'odd' with the comparisons, since one would expect the same number of 'yes' matches in each sheet.  

The difference was caused by the fact that I had only compared names. I had expected there might be differences in the levels of shared DNA, so hadn't included that information in the comparison criteria, but I didn't think to include other items, such as age, where the matches were from, or who manages the DNA, etc.  
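To illustrate why comparing on names alone can give unequal 'yes' counts, here is a minimal Python sketch. It assumes each spreadsheet has been reduced to a plain list of match names; the names are invented placeholders and `flag_by_name` is a hypothetical helper, not a spreadsheet function. A name listed twice on one side is flagged 'yes' twice there, but its partner on the other side is only counted once:

```python
# Hypothetical sketch: flag each match as "yes"/"no" using the name only.
# The names are invented placeholders, not real matches.

def flag_by_name(kit_names, other_names):
    """Mark each row of kit_names 'yes' if that name appears anywhere in other_names."""
    other = set(other_names)
    return [(name, "yes" if name in other else "no") for name in kit_names]


def count_yes(flags):
    """Count the rows flagged 'yes'."""
    return sum(1 for _, flag in flags if flag == "yes")


myh_names = ["A. Smith", "A. Smith", "B. Jones", "C. Brown"]  # "A. Smith" listed twice
ftdna_names = ["A. Smith", "B. Jones", "D. White"]

myh_flags = flag_by_name(myh_names, ftdna_names)
ftdna_flags = flag_by_name(ftdna_names, myh_names)
print(count_yes(myh_flags), count_yes(ftdna_flags))  # 3 2
```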

The reasons I identified for the difference included:
- five names appeared twice in the MyH sheet, but only once in the FTDNA sheet.
- three matches appeared twice in the FTDNA sheet, but only once in the MyH sheet.
- ten matches in the MyH sheet were only identified as "DNA kit", an increase of three on the number of such matches in the FTDNA sheet.  
- three 'private' matches in the FTDNA sheet did not appear in the MyH sheet.

At this point, I copied all the "no" entries into one spreadsheet, and the "yes" entries into another, and physically aligned all the "yes" entries for the two kits, so that I could investigate how the shared DNA levels might have changed.  I took out all of the 'anomalous' entries identified above, leaving 1009 entries which appeared in both kits. 

[Note, I have also now re-run the comparison between the sheets, having concatenated "name", "age", "from", "managed by", "contact", and the "tree or not", items. Doing so identified just three entries that didn't match up correctly. Two of them were where there was one entry in the MyH sheet but two in the FTDNA sheet, and I had picked the wrong one to include. One of these would make no difference to the figures, the other would increase the number of kits that have gained one segment. The third entry was a mismatch between two kits labelled as "unknown" that I hadn't spotted. I don't like that sort of error so, in the following, I have removed that kit (leaving 1008 in both 'yes' lists), and also updated the other two entries with the correct matching details.  The new comparison also showed that I could have included some of the entries just identified as "DNA kit" in the following comparisons, since they can be matched up across the two kits. However, I haven't added them in, since none of them are particularly close matches, or make a noticeable difference to the figures/charts.]
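The concatenated-key comparison described in that note can be sketched in Python as follows. This is a minimal illustration, assuming each spreadsheet row has been read into a dictionary; the field names in `KEY_FIELDS` are hypothetical stand-ins for the columns that were joined, and the two "Unknown" rows are invented. Using a `Counter` means a key listed twice on one side needs two partners on the other, so duplicates show up rather than being silently matched:

```python
from collections import Counter

# Hypothetical field names, standing in for the columns that were concatenated.
KEY_FIELDS = ["name", "age", "from", "managed_by", "contact", "tree"]


def composite_key(row):
    """Join the chosen fields into one string; missing fields become empty."""
    return "|".join(str(row.get(field, "")) for field in KEY_FIELDS)


def unmatched_keys(kit_a, kit_b):
    """Return {key: count} for rows in kit_a with no partner in kit_b.
    Counter keeps duplicates, so a key listed twice needs two partners."""
    leftover = Counter(composite_key(row) for row in kit_a)
    leftover.subtract(Counter(composite_key(row) for row in kit_b))
    return {key: n for key, n in leftover.items() if n > 0}


# Two invented "Unknown" matches that a name-only comparison would treat as equal.
myh = [{"name": "Unknown", "age": "60s", "from": "UK"},
       {"name": "Unknown", "age": "30s", "from": "USA"}]
ftdna = [{"name": "Unknown", "age": "60s", "from": "UK"},
         {"name": "Unknown", "age": "40s", "from": "USA"}]
print(unmatched_keys(myh, ftdna))  # {'Unknown|30s|USA|||': 1}
```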

To start with, I looked at the "no" spreadsheet and, for each kit, plotted the total DNA shared against the longest segment, just to give me an idea of the levels of sharing that didn't make it into the other kit list:  

  


As you can see, the majority of the kits seem to be where the total shared is between 20-30cM, and the longest segment is less than 20cM.  As I mentioned above, I suspect that many of these kits might be matching my other test, but the variations mean the matches appear in different orders and these just didn't appear in the first 2000 entries. 

However, there are some where the longest segment is over 30cM, or the total shared DNA is over 40cM, as well as the longest segment being over 20cM.  So I decided to check each of those, to see if they were matching the other kit, but beyond the first 2000 matches - only four of them were:

Of the four kits easily identified as also matching the other (FTDNA upload), the first in the table above had gained two new segments, on different chromosomes, the next had lost one segment, but then gained three new segments on other chromosomes, the third showed increases on the two 'existing' segments, plus the addition of a new segment on a different chromosome, and the final one showed an increase (of over 20cM) on the 'existing' segment.
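The selection rule for which 'no' matches to check individually can be written as a one-line test. This Python sketch reads the rule as "longest segment over 30cM, or total over 40cM together with a longest segment over 20cM", which is my interpretation of the wording; the (total, longest) pairs are invented:

```python
def worth_checking(total_cm, longest_cm):
    """Flag a 'no' match for manual checking if its longest segment is over 30cM,
    or if its total is over 40cM and its longest segment is over 20cM."""
    return longest_cm > 30 or (total_cm > 40 and longest_cm > 20)


# Invented (total cM, longest segment cM) pairs, for illustration only.
no_matches = [(24.0, 12.5), (35.0, 31.0), (45.0, 22.0), (45.0, 15.0)]
to_check = [m for m in no_matches if worth_checking(*m)]
print(to_check)  # [(35.0, 31.0), (45.0, 22.0)]
```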

I might come back to these details, as and when I take another look at chromosome mapping. But, for now, I moved on to look at the 'yes' sheet, ie those matches that appeared in the first 2000 entries of both my FTDNA upload, and also the new MyHeritage WGS test.

The 'yes' kits   

I began by looking at a scattergram of the change in 'Total cM shared' against the change in 'Longest segment' but it's perhaps more helpful to look at the following two charts first.

This shows the numbers of matches whose 'Total cM shared' changed, within particular ranges of values (calculated by 'Total shared cM with MyH kit - Total shared cM with FTDNA upload'):

 


From this, you can see that the Total cM shared, for the majority of matches, did not change by very much.  
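For anyone wanting to produce this kind of chart from their own aligned lists, here is a minimal Python sketch of the binning step. The 5cM bin width is an assumption (the ranges used in the chart aren't stated), and the four pairs of values are invented. Each key is the lower edge of a bin, so `-10` counts changes from -10cM up to, but not including, -5cM:

```python
import math
from collections import Counter


def change_bins(ftdna_values, myh_values, width=5):
    """Count per-match changes (MyH value - FTDNA value) in ranges `width` cM wide.
    Key b covers changes from b up to (but not including) b + width.
    The two lists must be aligned: index i is the same match in both kits."""
    if len(ftdna_values) != len(myh_values):
        raise ValueError("the two lists must contain the same matches")
    counts = Counter()
    for old, new in zip(ftdna_values, myh_values):
        counts[math.floor((new - old) / width) * width] += 1
    return dict(sorted(counts.items()))


# Invented 'Total cM shared' values for four aligned matches.
print(change_bins([20.0, 25.0, 30.0, 22.0], [21.0, 18.0, 48.7, 22.0]))
# {-10: 1, 0: 2, 15: 1}
```

The same function would work unchanged for the 'Longest segment' values, since only the column being subtracted differs.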

I think that's important to note, given that the specific examples I'm exploring in more detail are all 'outliers', ie the matches where the changes are more extreme.  I'm looking at them because I find the situations intriguing, not because I'm saying there is anything 'wrong'.  

One can see the same thing, when looking at the changes in values of the 'Longest segment' (calculated by Longest segment matching MyH kit - Longest segment matching FTDNA upload) - the majority of matches showed very little change in the longest segment value:   


The following image shows the changes in Total cM shared against the changes in Longest segment for each individual match:


I think there are two different things showing up here. There are the points falling along a diagonal, indicating that there's been a change in both the Longest segment length and the Total cM shared. But there is also a horizontal line of points along the x-axis, where there's been a change in Total cM shared, but without corresponding changes to the Longest segment length, potentially indicating the loss, or gain, of other, smaller, segments.  
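The two patterns on the scatter chart can be separated by checking which of the two values changed for each match. This is a minimal Python sketch: the tolerance `tol` is an assumed value, intended to ignore differences smaller than the rounding in the match lists. The first pair of values is the 18.7cM/23.5cM case discussed later in the post; the others are invented:

```python
from collections import Counter


def classify_change(d_total, d_longest, tol=0.05):
    """Sort one match's point on the scatter chart. `tol` (an assumed value)
    treats differences smaller than rounding as no change."""
    total_changed = abs(d_total) > tol
    longest_changed = abs(d_longest) > tol
    if total_changed and longest_changed:
        return "both changed"
    if total_changed:
        return "total only"      # the 'horizontal' line along the x-axis
    if longest_changed:
        return "longest only"
    return "no change"


# (change in total cM, change in longest segment cM); only the first pair is real.
points = [(18.7, 23.5), (-6.2, 0.0), (7.4, 0.0), (0.0, 0.0)]
print(Counter(classify_change(dt, dl) for dt, dl in points))
```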

From the following figures, it can be seen that, although again, the majority of kits showed no changes in the numbers of segments, almost 200 matches did show either a loss, or a gain:


Of the 81 who lost one segment, the Total cM shared decreased for 80 of them, but the Longest segment showed no change for 70 of those. And, for the 101 that gained one segment, 99 showed an increase in the Total cM shared, but 78 of those showed no change to the Longest segment.  

So those figures would seem to support the possible explanation for the 'horizontal' line of points, that the segments being lost, or gained, are smaller segments, rather than the longest. [and whether any of it is 'significant' would be a totally different issue, given that many of the changes are only in the range of 5-10cM.]  
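Figures like these come from cross-tabulating three columns: the change in the number of segments, and the direction of change in Total cM shared and in Longest segment. A minimal Python sketch follows; the rows are invented, and the tolerance is an assumption used to treat rounding-sized differences as "no change":

```python
from collections import Counter


def direction(x, tol=0.05):
    """-1, 0 or +1, treating changes within `tol` cM (an assumed value) as zero."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1


def tabulate(matches):
    """matches: (change in segments, change in total cM, change in longest cM).
    Count how often each combination of directions occurs."""
    return Counter((d_seg, direction(d_tot), direction(d_long))
                   for d_seg, d_tot, d_long in matches)


# Invented rows, for illustration only.
rows = [(-1, -6.0, 0.0), (-1, -5.5, 0.0), (-1, 12.0, 15.0), (1, 7.2, 0.0), (0, 0.0, 0.0)]
table = tabulate(rows)
lost_one = sum(n for (seg, _, _), n in table.items() if seg == -1)
lost_one_longest_same = table[(-1, -1, 0)]
print(lost_one, lost_one_longest_same)  # 3 2
```

With the real spreadsheet, these two numbers would correspond to the 81 matches who lost one segment and the 70 whose total decreased with no change to the longest segment.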

I was intrigued by one match, who had lost one segment, and yet both their Longest segment and the Total cM shared had increased (by 23.5cM and 18.7cM respectively). This was a case where a 'gap' between two small 6cM segments on chromosome 18 is now shown as matching, creating one segment of 31cM, while the other three segments remained identical:

Another match I followed up was one where the number of matching segments increased by 3 yet the Total cM decreased by 1.8cM and the Longest segment decreased by 20.5cM:

 


In some ways, I don't know what to make of this - the total loss of a segment on one chromosome, but gaining four small segments on different chromosomes.  

The companies give us many such small matches so, according to their science etc, it must indicate at least a 'potential' relationship. But I certainly wouldn't be spending time looking for a genealogical connection to such a match!

The other match that gained three segments had increased both their Total cM shared, and their longest segment (by 28.0cM and 4.9cM respectively):


That seems a bit more 'reasonable' than the previous case, with an increase to the existing segment, and the 'discovery' of three other segments. 

But how relevant some of these segments are remains to be seen.

Closer matches

Finally, I looked for any changes to the matching with my closest relatives. 

In comparisons with my mother's kit, the MyHeritage WGS kit showed different totals on seven chromosomes, from those shown with the FTDNA upload. Two chromosomes showed decreases, the other five were increases, but all individual changes were less than 7cM, producing an increase in Total cM shared of 16.7cM. I'd need to research the particular start and end RSID points of the tests, to see if differences in those explain these changes (since I should match my mother along the full length of every chromosome.)

When comparing my kits to my uncle's, with whom I share 43 segments on each kit, six of the segments had changed slightly (one increased, five reduced), all by less than 3cM. Three of the changed segments are on chromosome one, and at least the first of them is potentially due to the wider coverage of the newer test, since its starting location has changed to exactly the same RSID point as it did for my mother's kit.  

The next closest eight matches, taking me down to a Total shared DNA level of 100cM, include seven identified second and third cousins.  Of these, only one shows a change in the Total shared DNA, with an increase of 18.4cM on chromosome one.  In this case, the increase doesn't seem to be connected to a change in the starting location (which is actually quite interesting, since the starting location for this match on my FTDNA uploaded kit was already showing the earlier location - so why wasn't that kit showing as matching to my mother, and my uncle, from that point?) 

Since this 2c should be matching my uncle over the same range, I shall investigate this further.

But that can wait until another day!



Saturday, 14 February 2026

Bits and pieces: Match numbers, Second cousins, DNA clusters, and Ancestor Score

Match Numbers
As anticipated, the numbers of new DNA matches at Ancestry have been increasing over the last few weeks - I now have a total of 20,066 matches, up from 20,003 on the 31st January. The numbers of close matches have also increased - following the two new ones during January, there was a loss of one in the first few days of February, so my current total of 379 actually represents four new matches in the "close" category so far this year.

Second Cousins
What I wasn't anticipating was that, just a few days after posting the comparisons between my first and second cousins, I'd gain a new, and relevant, second cousin!  

I have updated the previous table with the shared DNA and Ancestry predictions:


I can confirm that matches 6 and 7 are siblings, thanks to the pro-tools. So, once again, this is a second cousin relationship that shows much less DNA than would be expected, based on the predictions.

DNA clusters
I was interested to see that one of the shared matches with this new 2c connects to what I call a 'splurge' cluster - a large group of matches who all seem to match each other.  

You can see an example of this with the "Group 29" on my post here. 

That's an old post, from when the number of matches was much less than it is now (I only had fifty nine 'close' matches then!) But it illustrates the point of how the shared matches cluster together, and how some of those clusters are much larger than others. 

In this case, there are 172 other shared matches between us, as opposed to just the sixteen matches I share with my second cousin.

Although there has been debate over the years as to what these large clusters represent, I've often wondered whether they could be caused by moderately recent ancestors, whose descendants emigrated to America as part of the Mormon migrations, and who now have a large number of descendants over there.  

So I was very interested to see that this match connects back to ancestors in Utah.  

And, although they aren't showing any connections back to the UK in their tree, I recognise the family they connect to as one that I looked at briefly many years ago, when Ancestry was producing "Circles" and "New Ancestor Discoveries":


  

The particular match does only share 14cM with me - which I know is low and, without any other clues, I would not normally research such a match (or the associated cluster). 

But this does make me think I should be taking another look at the Herefordshire ancestors of those people in the cluster, to see if I can identify my connection back to them - which, potentially, might only be in the early 1800s.

Ancestor Score
One of the things I had hoped to do, in a post today, was to revisit something called an "Ancestor Score".  I first posted about this on Valentine's Day back in 2015 here.  I'd seen the idea on another blog and thought it would be a great way of keeping track of progress, not just on my family history and identified ancestors, but also, by including that extra column, on monitoring my identified DNA matches, as well.
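The post doesn't spell out how the score is calculated, so the following is a minimal Python sketch assuming the commonly used layout, in which generation 1 is yourself and each generation n has 2^(n-1) possible ancestors; the score for a generation is the percentage of those identified. The counts in the example are invented:

```python
def ancestor_score(identified):
    """identified: {generation: ancestors identified}, with generation 1 = yourself.
    Returns rows of (generation, possible, identified, percent)."""
    rows = []
    for gen in sorted(identified):
        possible = 2 ** (gen - 1)
        found = identified[gen]
        if found > possible:
            raise ValueError(f"generation {gen}: {found} > {possible} possible")
        rows.append((gen, possible, found, round(100 * found / possible, 1)))
    return rows


def overall_percent(rows):
    """Percentage identified across all the generations listed."""
    possible = sum(row[1] for row in rows)
    found = sum(row[2] for row in rows)
    return round(100 * found / possible, 1)


# Invented counts, for illustration only.
rows = ancestor_score({1: 1, 2: 2, 3: 4, 4: 8, 5: 14, 6: 20})
for row in rows:
    print(row)
print(overall_percent(rows))  # 77.8
```

An extra column for ancestors confirmed through identified DNA matches could be produced by calling the same function with those counts instead.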

At that time I was expecting to make progress with my genealogy as I took part in the "Genealogy Do-Over".  But life got in the way, as it has a habit of doing! 

I've decided against attempting to produce a quickly updated chart - although there has been some progress over the intervening years, much of it hasn't been as fully documented, and evidenced, as I'd like it to be, so I don't feel it counts. 

Now that I have dropped some of my other activities, and really do plan to make progress this year, I am going to repeat that 2015 table here instead, to lay it down as a 'baseline'.


The proof of progress will be in next year's post on Valentine's Day!



Friday, 6 February 2026

1st and 2nd cousins - shared DNA variability

This post is a bit of "thinking aloud" - I have some data, but not a full answer for why the data shows what it does.

We know that the DNA passed on by the same two parents to their children will vary, such that, although every child will receive half their DNA from each parent, the level of shared DNA between the siblings will vary, depending on which 'bits' of the parents' DNA they each received. And that, as relationships become more distant, the quantity of DNA shared becomes even more variable for particular levels of relationship.  

This is why, for a specific quantity of shared DNA, several possible relationships are often predicted by the DNA testing companies.

When I first took a DNA test at Ancestry, my closest match was a predicted 3rd cousin, who shared 92cM with me.

Based on that quantity of DNA, Ancestry gives the following alternative relationships:




 And the "Shared cM Project" tool1 gives the following probabilities for the various possible relationships:




My match had tested more for ethnicity and 'general' information, and didn't know much about their family history so, based on the image the Shared cM Project produces, and the level of shared DNA, I drew out a possible "family tree", showing where my match might fit into my family, along with what I knew about the family at the time:

 




[The only reason for not including the half relationships side of the diagram was to keep things fairly simple.]  

I then set to work on the genealogy - from which we discovered that the match actually seemed to be a second cousin, not a third, despite us sharing a relatively low level of DNA for that relationship. 

A question was asked, by one of the DNA experts, as to whether the match might be a half 2c - and that is a possibility I still bear in mind.

However, I have been interested to see the other quantities of DNA shared, as more of the family have tested over the years. 

I do have quite a few second cousin matches now, thanks to my grandmother being one of ten, but I'm concentrating here on just four of them - a single second cousin from my grandfather's side, and three second cousins from my grandmother's side, who are siblings to each other - and comparing them to myself and two of my first cousins.  This is because the closer relationships, of the siblings to each other, and of the first cousins to each other, are confirmed through the shared DNA, as well as the known family history.

So this is how we all relate to each other:


And these are the levels of shared DNA:


Below is a table of the averages, and ranges, of shared cM for particular relationships, taken from the DNAPainter diagram:


So, with the exception of the 39cM shared between match 5 and me, and of the 23cM shared between matches 1 and 6, all of the values do actually fall within the range for possible second cousins. 
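Checking a whole table of shared cM values against a relationship's range is easy to automate. In this minimal Python sketch the bounds are placeholders: I believe 41-592cM is the Shared cM Project v4 range for second cousins, but they should be checked against the DNAPainter table. The 39cM and 23cM values are the two quoted above; the third value is invented:

```python
# Placeholder bounds: believed to be the Shared cM Project v4 second-cousin
# range, but check them against the DNAPainter table before relying on them.
LOW_2C, HIGH_2C = 41, 592


def outside_range(shared_cm, low=LOW_2C, high=HIGH_2C):
    """Return the comparisons whose shared cM falls outside [low, high]."""
    return {pair: cm for pair, cm in shared_cm.items() if not low <= cm <= high}


shared = {
    ("me", "match 5"): 39,        # value quoted above
    ("match 1", "match 6"): 23,   # value quoted above
    ("me", "match X"): 120,       # invented, for illustration
}
print(outside_range(shared))
# {('me', 'match 5'): 39, ('match 1', 'match 6'): 23}
```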

However, the probability of the relationships being second cousins (or even half second cousins) seems to be classed as fairly low for many of the values:   


I have included the Ancestry predictions for the relationships in the following table:


As you can see, only two of the relationships (highlighted in yellow) are predicted to be possible second cousins.  If there is a "half relationship" situation, another two of the predictions (highlighted in pink) would be okay.

But Ancestry's predictions for all the other comparisons are for more distant relationships.

When I received that very first match, one of the first things I did was to put the shared DNA figure into a predictor and, if it hadn't been for the match then being able to give me a couple of names that I recognised, I would probably have been looking at the wrong generation of my tree, at least initially, to try to find our shared ancestry.

As I mentioned above, the question was asked as to whether my first match (and now that would mean their siblings, as well) might be half second cousins to me (and also now to my two first cousins). Since the respective grandparents were the second and fourth children out of ten in the family, with fairly regular "two year intervals" between them all, there would have to be a "story" behind that, if it was true.  

It's obviously not impossible, though, so I'm not discounting it and will continue to explore the possibility, through the clustering of other shared matches.

But, even if a half relationship between my grandparents and their siblings does become evident, it wouldn't explain the fact that the shared DNA, for the majority of the relationships, is still less than would be expected - and therefore, if I needed to search for how I connected to these matches, I might be looking in the wrong parts of my tree! 

So, one point I am trying to make is the importance of "doing the genealogy" and not just relying on such predictions.  Does the predicted relationship fit with the known family history, with ages, and with locations, etc?  If not, don't just assume the "most probable" prediction is the correct one.

Another possibility I have wondered about, is whether the predictions from companies such as Ancestry, and the Shared cM Project, might have a tendency to predict more distant relationships for those of us in the UK.  This could be due to much of the data coming from people with ancestry in the US.  It seems those in the US often have many more matches than those of us in the UK, and potentially, a higher level of "overlapping ancestors", which might create a higher level of shared DNA for particular relationships. And thus 'bias' the predictions.

I don't know enough about the wider field of DNA statistics to know whether that is possible, or whether other people in the UK have found similarly lower levels of shared DNA.  

But I shall certainly be checking the predictions for all my other identified DNA matches more closely in future, to see if those show the same tendency. 



Notes and Sources
1. Shared cM Project 4.0 tool v4





Saturday, 31 January 2026

A possible route into family stories, and the religious leanings of one of my ancestors - more news from January

A possible route into family stories - a "life index"
 I've written several times about family history being more than just 'names and dates', that it's about the lives of our ancestors, and their other relatives, who they were, what they did, and the circumstances that impacted them. Even their hopes and fears, where possible. 

I've also mentioned that we should be recording our own stories because, one day, most of us will, more than likely, have become an 'ancestor'.

But I hadn't resolved the issue of how to make any of that easily accessible to future generations (or even to 'future me'!) or how to provide a 'way in' that might catch their attention and spark their interest.

For some years, I've tried to keep a "Master Timeline" record, in an excel spreadsheet, of what we, as a family, did on particular days.  It began as just a way of keeping track of some of the mundane things, like dentist and optician's appointments, but also included holidays, days out, and other key dates as the children grew up. 

It was designed to answer those inevitable questions of "When did we....?"

Maintaining the spreadsheet has been a bit erratic at times, particularly in the years when I've also kept a journal, and especially once the children left home. And I'd never seriously even considered how easy (or otherwise) the information would be for anyone else to access, since it was mainly just for me, and I knew what was in it.  

But, of course, one can ask a question the other way around - "What did we do on...?"

 For example, looking up all the 31st January entries, I can see that, in 1986, my parents' dog, Sadie, had to be put to sleep, that in 1998, one of my sons ran in the local primary schools' cross country championship, and, in 1999, he had a rehearsal for a school show. (and yes, on this date in 2012, there was a dentist appointment! ☺)

I've realised this as a result of a post by Taneya Koonce1, another member of the Guild of One-Name Studies, whose blog I follow.  She posted a video about her "life index" journals, and it struck me what a brilliant idea this is, to have an index for every day of the year.  

It was one of those 'lightbulb moments', when an idea that I 'knew about' in some form, eg from the old 'birthday books' that some of us might have kept, or the 'on this day' notices on 'history' websites, suddenly became something I could actually use in a way that will help me to achieve what I want to.  

Taneya is in the US so, although I love the tree design on the front of her journals, those specific ones would take a while to arrive in the UK. There are similar ones available here but I have decided instead to go for a plain covered, larger journal (A4) which contains 400 pages (200 sheets).  That means I can include two pages for dates which I know I'll probably have lots of stories for (eg Christmas, or close family birthdays).  There'll also be some 'spare' pages that I can use to list stories where the specific date is unknown (eg my mother, as a child, using the bedsheets to climb out of her bedroom window, in order to try to avoid having a bath!)

As you can possibly tell, from the things I've listed above for the 31st January, just noting a key event on a day can act as a prompt for a family story - in those cases, what Sadie, the dog, was like and the things she got up to, as well as the children's sporting and 'theatrical' activities. 

As Taneya says, "writing things down doesn’t just preserve them. It activates them."
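For a timeline kept in a spreadsheet, the "What did we do on...?" question is just a filter on day and month, ignoring the year. Here is a minimal Python sketch, assuming the "Master Timeline" rows have been exported as (date, description) pairs; the four 31st January entries are the ones listed above, and the last entry is invented:

```python
from datetime import date


def on_this_day(entries, month, day):
    """Return the (date, description) entries that fall on the given day and
    month, in any year, sorted by year."""
    return sorted((d, text) for d, text in entries if (d.month, d.day) == (month, day))


timeline = [
    (date(1998, 1, 31), "cross country championship"),
    (date(1986, 1, 31), "Sadie put to sleep"),
    (date(2012, 1, 31), "dentist appointment"),
    (date(1999, 1, 31), "school show rehearsal"),
    (date(2003, 7, 14), "an unrelated entry (invented)"),
]
for d, text in on_this_day(timeline, 1, 31):
    print(d.year, text)
```

Running the same lookup for each day of the year would give a first draft of the entries for a "life index" journal.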

Another January discovery, the religious leanings of an ancestor
I had a lovely surprise early in the month, when I was contacted by a descendant of one of the step-daughters of my 2xgreat grandfather.  We had exchanged information some years ago but, recently, she had discovered that the burial register for Rowlestone, Herefordshire, is available on the Ewyas Lacy Study Group site.2
 
Now, I knew that my Thomas PARRY's second wife, Ann, had been buried on 19 September 1908, "without the burial service according to the rites of the Church of England", because that was information supplied to my dad, by a local vicar, back in the 1990s. 

But what the vicar had not passed on was that the register also gives the name of the person who performed the service, "William JAMES, Abergavenny", and that it was noted as a Christadelphian funeral. 

Having discovered this, my contact had then spent time working through the Christadelphian Magazine archives, from which she was able to discover that:
    - my Thomas PARRY and his wife, Ann, were both formerly Methodists
    - they were baptised by immersion and received into fellowship with the Christadelphians in May 1900
    - their home in Walterstone was used for meetings
    - their names occurred several times among those from Walterstone who met with the brethren in Abergavenny
    - and, when Thomas PARRY died, in February 1918, the Christadelphians noted that "we were not allowed to take any part in laying our brother to rest." 

Did other members of the family not approve? 

Or was it the minister of the church where Thomas was buried who objected? 

Or could it have been simply because he was being buried in the same grave as other family members, and a non-conformist burial would have required a separate grave? 

We'll possibly never know for sure. But Thomas PARRY was buried on the 26 Feb 1918, in Christchurch, Govilon, which comes under Llanwenarth Ultra, Monmouthshire, in the same grave as his first wife, Sarah, and their infant son, Lewis.

I'm very grateful to the researcher who, many years ago, uncovered the gravestone and supplied my dad with this photograph:

 


[Especially since, when I visited the graveyard some years later, in 2002, the area where this stone stood was totally overgrown. I found several of the graveyards around there were being allowed to 'return to nature', so the stone would have been almost impossible to find.] 


Notes and Sources

1. Taneya's post on substack: https://taneyakoonce.substack.com/p/helping-dates-tell-stories-with-an

It's also on a Facebook reel: https://www.facebook.com/reel/950876704124479


DNA match numbers

 Like many people, I imagine, I've spent some of January doing a bit of 'sorting and planning' to help me achieve what I'd like to during the year.  

So now I just need to actually do the things I've planned!

One of the first tasks was to update the graph of how many close matches I have at Ancestry.  At the time of my last post, the review of 2025, the number had increased to 376 close matches.  I now have 378 close matches - and I also noticed yesterday that I had exactly 20,000 matches, in total, there. 

(But that total had already increased to 20,003 by this morning.)


Since I was interested in the rate of increase, I also looked at the change in the totals over the years:


The Ancestry test was launched in the US in 2012 and then in the UK, in January 2015.1 One can see that, after an initial slow start, for me, the three years from 2017 to 2019 saw the most new close matches, with an average of 50 per year across those three years.  Numbers have since reduced, averaging 30-35 per year, but are quite variable.
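Yearly increases and averages like these can be calculated from the year-end totals. A minimal Python sketch follows; the totals are placeholder numbers, not my actual figures:

```python
def yearly_increase(year_end_totals):
    """Given {year: close matches at year end}, return {year: increase on the previous year}."""
    years = sorted(year_end_totals)
    return {y: year_end_totals[y] - year_end_totals[prev] for prev, y in zip(years, years[1:])}


totals = {2016: 100, 2017: 148, 2018: 200, 2019: 250}  # placeholder numbers
increases = yearly_increase(totals)
print(increases)  # {2017: 48, 2018: 52, 2019: 50}
print(sum(increases.values()) / len(increases))  # 50.0
```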

 From the graph, many of the years seem to show a higher rate of increase in the early months of the year - probably due to the sales in December, and 'Christmas gifting', which results in more kits being processed during those early months.

It will be interesting to see if the early part of this year shows the same sort of curve. Although kit prices at Ancestry were reduced, those of one of the other companies, MyHeritage, were even cheaper.  

And, with the news that MyHeritage was moving on to "Whole Genome Sequencing" (WGS)2, perhaps more people will have opted to purchase kits from there instead?

Either way, I'm sure, with this change, there will be a surge in the numbers at MyHeritage - if only because of all those who have already taken DNA tests elsewhere now deciding to try the new test, as well. 

I admit it - I did too.

My kit is currently in the "WGS in progress" stage, and I am looking forward to receiving the results.  It will be interesting to see how they compare to those received from the other companies I have tested with, and especially with those kits I transferred to MyHeritage.

Unfortunately, I've not been tracking numbers there in the same way, with those transferred kits - but perhaps it will be worth starting to do so, once these new results are in.


Notes and Sources

1. Launch dates of the autosomal DNA test at Ancestry: https://isogg.org/wiki/AncestryDNA


Wednesday, 31 December 2025

2025 NJTPs review

I thought I'd end this year with a quick review of how things have progressed with my family history research, and this blog, over the past twelve months - which basically consists of me looking at all my blog posts and seeing how many things I said I'd do, and whether I have actually achieved them, or not!  

The previous year had ended with a family get-together, which reminded me of "the need to focus on my own family history again - the unwritten stories, the wider research possible now that record availability is so much better than it was when my parents began their research in the 1980s, the opportunities that DNA provides in tracing more distant or 'lost' relatives...."  (January)

There's been very little visible progress with this, since my focus has still been on other things for most of the year.  However, I did make a start on sorting out my various family trees in FamilyTreeMaker (private, working, & public, etc), and hope to get that process finished, and the relevant trees synced online, in the early part of 2026, so that I can then build on them, when I (finally!) deal with my parents' research paperwork.  

In January, I noted that I had 345 DNA matches on Ancestry in the "4th cousins & closer" category.  While I haven't yet added this year's entries to my file, in order to produce a comparable graph, I do now have 376 matches in that category. So the numbers of closer matches are still steadily increasing.

Also in January, I wrote about needing to catch up with the best current tools for working with DNA matches, since I hadn't kept up with such things over the previous few years - especially as I had recently taken out the Ancestry "pro-tools" subscription.  Again, I haven't made much visible progress with that.  However, in October, I did attend the DNA seminar organised by the Guild of One-Name Studies, at Oadby, in Leicestershire.  That was an informative, and useful, day, and I shall be attempting to apply some of what I learnt over the next year.

In February, I mentioned DNA and pro-tools again, after receiving a couple of DNA matches who are likely to connect to me through my NAYLOR ancestry.  I still haven't written the post I originally planned to write, using data from my first and second cousins, to illustrate how useful it can be to see the quantity of DNA shared between shared matches - so that post has become a 'priority' for next year. 

But I was able to confirm the information about the NAYLOR monuments that I mentioned in February, adding photographs of them in March. The NAYLORs are a family that I will certainly be posting more about over this coming year - their line was one of the larger branches that I needed to add to my FTM files, and, in October, I was also able to photograph a couple of the churches, in and near Hull, where the earliest known events in the family occurred.  I'll obviously be including the photographs, when I write about the events.

As well as the post about the NAYLORs in March, I had a little 'rant' about incorrect information that had been supplied to my Dad, some years before, relating to our JONES & SAUNDERS family in Breconshire, and also shared a bit about a different JONES family who, together with the HENGLERs, were involved in firework making and displays.  Major failure here, in that I still haven't finished writing the article about them that I mentioned, for the local Family History Society. (Despite undertaking Janet FEW's course, "Are You Sitting Comfortably? Writing and Telling Your Family History" with Pharos Tutors, in the meantime!  This is definitely no reflection on Janet, or the course, which was very good, and gave me several ideas on aspects I hadn't previously considered. But I deliberately chose not to do the assessed version of the course, and this year has reinforced what several episodes in the past have demonstrated: I am not very good at meeting deadlines, even when I set them myself!  I think I prefer my experience after the Guild Conference this year, when I wrote a post about the Conference for my one-name study blog, and was pleasantly surprised to then be asked if it could be included in the Guild's Journal - definitely a less stressful process for getting something published.)

In July, I took a look at Ancestry's 'clustering', and demonstrated how necessary genealogical research still is - identifying as many of your matches as possible through that research, rather than relying on the clustering.  Clustering is not a 'magic bullet', and can be misleading if considered on its own.  Using pro-tools to look at how shared matches are related to each other, and using that information to build out their trees, is a much more important tool, in my opinion.

Although one could be forgiven for thinking I had dropped off the face of the earth after July, considering the total absence of posts here, a cursory glance at my two other blogs would demonstrate I was still alive and well.  

But I do want to bring about a better 'balance' between the three blogs next year, since they each relate to areas that are important to me, and which I want to see progress in.  So, just as I have done with my one-name study blog, I am setting a 'pact' for myself - that I will try to make at least one post each month here, in my 'family history and DNA' blog.

And so I am including some 'intentions' here - hopefully I will be able to look back at the end of 2026, and say they are all completed!
  
Activities to do:
Family trees to bring up to date, and shared/synced as appropriate
Work through, scan, and clear, parents' paperwork
Improve family history 'administrative' documentation
Finish the 'firework makers' article

Write posts on:
1c and 2c DNA sharing
Graphing numbers of close matches
The NAYLOR family events
 (and making at least one post per month - current ideas for posts include: my current 'brickwalls', whether I have identified my first case of bigamy, an annual "% of my tree completed", and continuing with 'in depth' posts on particular ancestors, or couples.)

Also some "educational" intentions:
Use of AI tools
WATO and BANYAN, tools for analysing DNA relationships and complex genealogies
Check out Zotero, a tool I have seen recommended to help manage references

Finally, if I compare what I have written above, which includes several 'failures', to what I think other people achieve during a year, it would be easy to feel a bit down. 

But Ancestry recently told me that I am a "Hint Hero!" 

So I thought I'd record their figures and, if they produce another set at the end of 2026, it will be interesting to see how they compare!  

I had 1,935 new hints, 188 of which came from new collections. 
I have viewed 2,679 records during the year.
Clues from my tree supposedly helped 30 other people this year.
Their Regions update uncovered new detail in 6 new regions.
They rate my tree as 'good' (at 7.7) (which I know isn't really that good!)
And I had 1,240 new DNA matches!
 
Happy New Year Everyone!

Thursday, 17 July 2025

Clustering on Ancestry - a first look at the new Pro-tool.

 I sat down at my computer recently, ready to tackle one of my "ToDos", which was to read through my blogs from the last few months and create a list of all the things I have said I will do, in order to (try to!) keep myself 'on track' with those tasks.

Funny how plans can change! 

I'd previously seen a post by Debbie Cruwys Kennett, in the "DNA help for Genealogy (UK)" group on Facebook, which indicated that Ancestry was releasing a "clustering tool" for DNA matches. The tool is only available to those with Pro-Tools, which I currently happen to have. But the official Ancestry blog does state that "Some members will not be able to access this feature until December 2025." 

So I wasn't expecting to have access to it yet, although I was pleasantly surprised to see a few UK members reporting that they did.  I was then even more surprised to discover that I also now do.

I'm not going to spend time describing what the tool does, or how important 'clustering', or grouping one's matches on the basis of other shared matches, is - you can read about that on the official blog, at https://www.ancestry.co.uk/c/ancestry-blog/dna/dna-matches-by-cluster or on the Support page at https://support.ancestry.co.uk/s/article/Matches-by-Cluster.

I've also written a couple of posts over the years, regarding genetic networks and clustering of shared matches, which you can find at https://notjusttheparrys.blogspot.com/2017/08/ancestry-shared-matches-and-new.html and https://notjusttheparrys.blogspot.com/2020/01/genetic-networks-simple-ones.html

But the aim of this post is to show that, even though analysis of shared matches using clusters is extremely useful, it is not some "magic bullet" that is going to instantly solve all your genealogical DNA problems.

The genealogical research is still going to be necessary, in order to make sense of the clusters.

At the moment, the clustering only involves matches who share between 65cM and 1300cM of DNA. Perhaps, unlike many in the UK, I am fortunate in that I have almost thirty matches who meet those criteria, through an assortment of first and second cousins, at varying levels of 'remove'.  This means I am seeing some clusters - but not the overwhelming numbers which, perhaps, many people in the USA are having to make sense of.
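For anyone who wants to check which of their own matches fall into that window (for example, from a downloaded match list), a minimal Python sketch of the filter is below. It assumes the 65cM and 1300cM limits are inclusive, which Ancestry doesn't actually state, and the match names and cM values are made-up placeholders, not my own matches.

```python
from typing import Dict

# Limits taken from the post; treating them as inclusive is an assumption.
MIN_CM = 65.0
MAX_CM = 1300.0


def in_cluster_window(
    matches: Dict[str, float], low: float = MIN_CM, high: float = MAX_CM
) -> Dict[str, float]:
    """Return only the matches whose shared cM falls within [low, high]."""
    return {name: cm for name, cm in matches.items() if low <= cm <= high}


# Hypothetical matches and cM values, for illustration only.
example = {"Match A": 1450.0, "Match B": 820.5, "Match C": 65.0, "Match D": 40.2}
print(in_cluster_window(example))  # {'Match B': 820.5, 'Match C': 65.0}
```

A very close relative (such as "Match A" here) drops out at the top of the window, just as very distant matches drop out at the bottom.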

As with any new tool, it is also important to remember that the initial release by Ancestry is in a "Beta" mode, which means the tool is still 'being tested and actively developed'.  So it (or its results) is likely to change.

And they have certainly been doing that!

These were my 'first looks' at the clusters, over the course of the initial days:

Even without knowing who any of the squares represent, those of us who have seen other clustering tools might find the first image a bit 'strange', since it looks as if almost all of the matches should fit into one of two main groups, each group containing some subgroups, rather than the two main groups being split up as they appear there.

When I next looked, the results had changed to the second image. "Great", I thought. "The system just wasn't properly taking into account the separation between my paternal and my maternal matches." 

On the third day (not shown), the results had reverted to the first image, with split clusters.  

Oh well - I did say the tool was in beta! :-) 

Currently, the results still appear as per the third image and I imagine they will stay like that until I receive some more close matches.

But how close does a match need to be to affect the clustering results?   [ps I don't know the answer!]

You'll note that I have stated the numbers of matches included at the top of each image.  The text might be a bit small so:

1st image - 24 matches

2nd image - 22 matches

3rd image - 23 matches

This variation in the number is something I noticed because I was trying to write this post, and was therefore privatizing the names.  

Since the clustering seems to be redone each time the page is visited, perhaps differences should be expected. 

But nothing had changed with regard to those close matches - there were no 'new' close matches received.  Whether more distant new matches were affecting the results, or whether the changes were due to Ancestry modifying their clustering calculations, I don't know.   

The loss of one of the matches, in particular, is intriguing, and helps to demonstrate the ongoing need for genealogical research, and not 'assumptions' about what the clusters are showing.

I'm going to deal with each side of my family separately.

Maternal Side

These are the maternal clusters - on the left, an image from the first day, when 24 matches were included in total, 8 of them maternal; on the right, the current results, which include only 7 of the maternal matches:



The red cross shows the match that is no longer included - as you can see, it is someone who matches everyone in both clusters.  The clusters have also switched around (so the blue in the left-hand image is now the purple in the right-hand one, and the pink in the left-hand image is now the blue in the right-hand image, less the one match.)

I haven't yet identified one of the eight matches, but the following image indicates the relationships between the other seven matches and me:




It can be tempting to treat the various clusters as if they represent separate branches of our ancestry, with any overlap likely to be from a closer generation (who therefore combines DNA from both of the branches).  

But here you can see that: 
- all of the matches in the two clusters are from one side of my Mum's family. They do not represent both of my maternal grandparents.
- the 'now missing' match ("X") had been placed in a different cluster from their full sibling.

I already knew the relationship between the two siblings, "23" and  "X", through correspondence with them. And, since the clustering is only available to subscribers to the Pro-Tools, it was obviously possible for me to check Ancestry's predicted relationships for them, when they appeared in different clusters, which confirmed that they are full siblings. 

But, if they had been unknown matches, it might have been easy to miss, especially if the clusters had involved more people.  

Paternal Side

The clusters for my paternal side - just one image this time since, although one match did "disappear" after the first day, they reappeared, and have remained included in the clusters since.


 

Two of the matches, "6" and "11", are so far unidentified, but the following image shows where the other fourteen fit in my paternal side:


Once again, you can see that, apart from two of my first cousins, who will also have received DNA from my paternal grandfather, all of the rest of the matches are on my paternal grandmother's side of my family.

So, as with my maternal side, not all branches of my family are represented in the clusters.

And, just as close relatives (the two siblings) appeared in different clusters on my maternal side, this time, a parent is in a different cluster from their three children, and grandchild. 

Although Ancestry does give details about how they do the clustering on their support page, "Science of: Matches by Cluster", and indicates that the tool takes into account how matches match each other, using "sets of matches who are more related to each other than to your other matches," in order to end up with "Clear, organized clusters", you can see that the results don't always look like that!

Considering that the majority of people on my paternal side all match each other, it would be interesting to know more about what the different clusters represent.  Matches 4, 5, & 7, as well as 1, 2 and 3 (the 'gold' group), will contain DNA from branches of my family that the 'pink' and 'purple' groups don't.  So, perhaps, when I finally do a more detailed analysis, not just of how much DNA is shared between all of us who are second and third cousins, but also combining that with who shares which ancestral lines, it might become clearer.

Comparing to the DNAGedcom Client App clustering

In the meantime, I thought I would also do a comparison to the clustering produced by the DNAGedcom Client app, using the same cM criteria that Ancestry does.  I have a total of 29 matches who fit into that range and since, with the Client app, it is possible to include "unclustered" matches (ie matches who don't have sufficient shared matches to create a cluster), they are all shown in the following diagram:
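To illustrate what "unclustered" means, here is a deliberately simplified Python sketch of grouping matches by their shared matches. It is NOT how the DNAGedcom Client app or Ancestry actually do their clustering (both use their own, more refined, methods); it just links any two matches who appear in each other's shared match lists, treats each connected group as a cluster, and reports anyone left on their own as unclustered. The input format (a list of each match's shared matches) and the match labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cluster_shared_matches(
    shared: Dict[str, Set[str]], min_size: int = 2
) -> Tuple[List[Set[str]], Set[str]]:
    """Group matches into connected groups of the shared-match graph.

    Two matches are linked if either one lists the other as a shared match.
    Groups smaller than min_size are returned as 'unclustered'.
    """
    # Build an undirected graph, ignoring anyone not in the input set
    # (e.g. shared matches who fall outside the cM window).
    graph: Dict[str, Set[str]] = {name: set() for name in shared}
    for name, others in shared.items():
        for other in others:
            if other in graph and other != name:
                graph[name].add(other)
                graph[other].add(name)

    visited: Set[str] = set()
    clusters: List[Set[str]] = []
    unclustered: Set[str] = set()
    for start in sorted(graph):
        if start in visited:
            continue
        # Depth-first search collects everyone reachable from 'start'.
        group: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        visited |= group
        if len(group) >= min_size:
            clusters.append(group)
        else:
            unclustered |= group
    return clusters, unclustered


# Hypothetical example: "P" = paternal, "M" = maternal, "S" = solitary match.
example = {
    "P1": {"P2", "P3"},
    "P2": {"P1"},
    "P3": {"P1"},
    "M1": {"M2"},
    "M2": {"M1"},
    "S1": set(),
}
clusters, unclustered = cluster_shared_matches(example)
print([sorted(group) for group in clusters])  # [['M1', 'M2'], ['P1', 'P2', 'P3']]
print(sorted(unclustered))  # ['S1']
```

Because this simple approach only asks whether matches are connected at all, it would probably lump most of a heavily inter-matching side of the family into one big group, and it can't produce sub-clusters like the ones Ancestry shows - which is one reason the real tools use more refined methods.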



All sixteen of the paternal matches shown by Ancestry are included in the initial purple cluster, although not in the same order as in the Ancestry Clusters.  Again, I'm going to need to take a closer look at this - but I'm thinking that those Ancestry 'sub-clusters' might turn out to be more informative in the end.

The orange and blue clusters, which show some shared matching between them, are the eight original maternal matches included by Ancestry, along with one match that Ancestry didn't include ("1").

The final four are all matches who were not included by Ancestry, no doubt because of limited shared matches.

Out of the additional matches: 

"1" - I don't know exactly where this match fits, because identifying the shared ancestry in this case is complicated because of immigration into the USA (possibly via Australia). But my current thought, based on pedigrees and research in the USA, is that they descend from a sibling of WN in the first generation shown on the tree.  That would explain why, although they appear in the cluster for my mother's paternal side, they do not match numbers 25-27, but do match 21-23 & X, who will all carry DNA from the same line (ie parents of WN, down through CN, and then JWFA.)

That would make them a 4c to me - however, Ancestry predicts that they are a "3rd cousin or half 2nd cousin 1x removed" (and I find it unusual for Ancestry's predictions to be closer than the genealogy suggests.  Several of my second cousins are predicted to be more distant than the genealogy indicates, to the extent that I have wondered whether the Ancestry predictions are slightly 'biased' by the multiple common ancestors many American testers seem to share with each other, since that creates a higher level of shared DNA for particular relationships than we see in the UK.  That would cause predictions for some of our relationships to be more distant than they actually are.) (I hope that makes sense! :-) )

The cousin matching is something I still need to look into more specifically among my matches - as is the possibility that there is a missing 'Frederick Nayl(o/e)r' at a closer level than the one I know of, who descends from a sibling of WN.

"2" and "3" are identified matches, who descend from my Mother's maternal lines, a branch not represented in any of the Ancestry Clusters.

"4" is currently unknown but, given the shared matching to "2", is also likely to be on my Mother's maternal side.

And, finally, "5", the solitary match, is a 2c from my Father's paternal lines.

I have added the three known additional matches, to the following tree image, which combines my maternal and paternal lines.  I have also included a representation of where I currently think "1" descends from, as doing so should help to clarify their matching/not matching within that maternal cluster (the match themselves should be at either my mother's, or my generation):


As Ancestry develops the tool further, we know there will be changes to the cM limits, which will help to make the tool more useful.  Whether they also start to include 'unclustered' matches remains to be seen.

But, in conclusion, I hope this goes some way to demonstrating how necessary the genealogical research still is, in order to make sense of the clustering.