-->

Monday, 23 March 2026

MyHeritage WGS DNA test results - initial comparison to an uploaded kit

 In January, I mentioned that I had now tested with MyHeritage, rather than just uploading kits from other DNA companies to that site, since MyHeritage have now brought in "Whole Genome Sequencing" (WGS).  At the time, my kit was still in the processing stage, and I was looking forward to receiving the results, and the possibility of comparing them to my other kits there.

So this post is the beginning of any comparisons, looking at the new test compared to the upload I did, in November 2016, of the data from my FTDNA test.  

[Please note, this is just my personal exploration of the results I have - I don't keep up with the "bigger picture" of what's happening regarding genetic genealogy so, if you're looking for more detailed analysis and comments regarding the WGS test, I suggest reading the comparisons carried out by someone such as Roberta Estes, on her blog, "DNAeXplained – Genetic Genealogy". I also don't have a subscription to MyHeritage, which might affect the level of detail available for my kits.]  

One of my expectations of the new test was that there would be fewer matches than I have with the FTDNA kit. This was because, in order to compare kits from different companies, who might not all test exactly the same points in the DNA, MyHeritage uses a process of 'imputation'.  This process 'fills in' gaps in the sequences. Although imputation seems to be a common process used by all of the DNA companies, and is carried out in accordance with specific principles, it can potentially lead to cases where people are incorrectly identified as matches, when they shouldn't be (and possibly vice versa).  Looking at my "new match" notifications from MyHeritage in the past, I've often thought that could be the case, with many of the matches showing low levels of shared DNA.  Hence my expectation that the better coverage of the new test would discount these lower-level matches.

But I was wrong!

When I first received my results, the new kit showed a total number of matches of 17019, whereas my FTDNA uploaded kit showed 16531. The totals as at the time of writing this (23/3/26) are 17328, and 16820 respectively.

It's currently not possible to download a list of all one's matches at MyHeritage, although that used to be possible. So I opted for a 'cut & paste' collection of the closest 2000 matches to each kit, and put those into two spreadsheets (and, yes, that did take a while.)  For both kits, this resulted in the lowest matches having a total shared DNA of around 19cM/20cM.

I then did a fairly simple comparison between the two spreadsheets, and discovered that almost half of the names in each spreadsheet did not appear in the other sheet:   

It is possible (and, I imagine, quite likely) that many of those who only appeared on one sheet do match the other kit but are beyond the first 2000 matches. I haven't specifically looked at many of the matches to check that yet, given the numbers of "No" matches involved. But, with my 'close' and 'extended' family only accounting for 17 of the matches, and the rest all being 'distant', I can imagine small changes in the levels of shared DNA could make quite a change to the order in which they appear on my match list.  

It was also obvious from those figures that there was something a bit 'odd' with the comparisons, since one would expect the same number of 'yes' matches in each sheet.  

The difference was caused by the fact that I had only compared names. I had expected there might be differences in the levels of shared DNA, so hadn't included that information in the comparison criteria, but I also didn't think to include other items, such as age, where the matches were from, or who manages the DNA, etc.  

The reasons I identified for the difference included:
- five names that appeared twice in the MyH sheet, but only once in the FTDNA.
- three matches appeared twice in the FTDNA sheet, but only once in the MyH sheet.
- ten matches in the MyH sheet were only identified as "DNA kit", an increase of three from the number of such matches in the FTDNA sheet.  
- three 'private' matches in the FTDNA sheet did not appear in the MyH sheet.
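[For anyone who prefers code to spreadsheets, the effect of repeated names on a name-only comparison can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet method used above: the names, the sheet contents and the `count_yes_by_name` helper are all invented for the example.]

```python
from collections import Counter


def count_yes_by_name(own_names, other_names):
    """Count the rows in own_names whose name appears anywhere in other_names."""
    other = set(other_names)
    return sum(1 for name in own_names if name in other)


# Invented names, purely for illustration (not real matches).
myh_sheet = ["A Smith", "A Smith", "B Jones", "DNA kit", "C Brown"]
ftdna_sheet = ["A Smith", "B Jones", "DNA kit", "D Green"]

yes_myh = count_yes_by_name(myh_sheet, ftdna_sheet)
yes_ftdna = count_yes_by_name(ftdna_sheet, myh_sheet)

# Names listed more than once in a sheet are the ones that skew the counts.
repeated_in_myh = [name for name, n in Counter(myh_sheet).items() if n > 1]

print(yes_myh, yes_ftdna, repeated_in_myh)
```

Because "A Smith" is listed twice in the first sheet but only once in the second, the two 'yes' counts come out different, which is the same kind of 'oddness' described above.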

At this point, I copied all the "no" entries into one spreadsheet, and the "yes" entries into another, and physically aligned all the "yes" entries for the two kits, so that I could investigate how the shared DNA levels might have changed.  I took out all of the 'anomalous' entries identified above, leaving 1009 entries which appeared in both kits. 

[Note, I have also now re-run the comparison between the sheets, having concatenated "name", "age", "from", "managed by", "contact", and the "tree or not", items. Doing so identified just three entries that didn't match up correctly. Two of them were where there was one entry in the MyH sheet but two in the FTDNA sheet, and I had picked the wrong one to include. One of these would make no difference to the figures, the other would increase the number of kits that have gained one segment. The third entry was a mismatch between two kits labelled as "unknown" that I hadn't spotted. I don't like that sort of error so, in the following, I have removed that kit (leaving 1008 in both 'yes' lists), and also updated the other two entries with the correct matching details.  The new comparison also showed that I could have included some of the entries just identified as "DNA kit" in the following comparisons, since they can be matched up across the two kits. However, I haven't added them in, since none of them are particularly close matches, or make a noticeable difference to the figures/charts.]
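[The idea of comparing on several concatenated items, rather than the name alone, could be sketched as follows. This is a minimal Python sketch under stated assumptions: the field names in `KEY_FIELDS`, the `make_key` and `split_yes_no` helpers, and the sample rows are all hypothetical, and pairing repeated rows one-for-one is my own choice for the example rather than anything MyHeritage or a spreadsheet does.]

```python
from collections import Counter

# Hypothetical column names standing in for the items listed in the note above.
KEY_FIELDS = ("name", "age", "from", "managed_by", "contact", "has_tree")


def make_key(row):
    """Concatenate the chosen fields into one comparison key."""
    return "|".join(str(row.get(field, "")).strip().lower() for field in KEY_FIELDS)


def split_yes_no(own_rows, other_rows):
    """Pair rows one-for-one on the combined key; leftovers go into 'no'."""
    available = Counter(make_key(row) for row in other_rows)
    yes, no = [], []
    for row in own_rows:
        key = make_key(row)
        if available[key] > 0:
            available[key] -= 1  # each row in the other sheet is used once only
            yes.append(row)
        else:
            no.append(row)
    return yes, no


def person(name, age, origin):
    """Build an invented row; the remaining fields are left blank."""
    return {"name": name, "age": age, "from": origin,
            "managed_by": "", "contact": name, "has_tree": "yes"}


myh_rows = [person("A Smith", "60s", "UK"), person("A Smith", "30s", "USA"),
            person("B Jones", "50s", "UK")]
ftdna_rows = [person("A Smith", "60s", "UK"), person("B Jones", "50s", "UK"),
              person("D Green", "40s", "Canada")]

yes_myh, no_myh = split_yes_no(myh_rows, ftdna_rows)
yes_ftdna, no_ftdna = split_yes_no(ftdna_rows, myh_rows)
print(len(yes_myh), len(yes_ftdna))
```

With the extra items in the key, the two people called "A Smith" are no longer confused, and the number of 'yes' rows is the same in both directions.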

To start with, I looked at the "no" spreadsheet and, for each kit, plotted the total DNA shared against the longest segment, just to give me an idea of the levels of sharing that didn't make it into the other kit list:  

  


As you can see, the majority of the kits seem to be where the total shared is between 20-30cM, and the longest segment is less than 20cM.  As I mentioned above, I suspect that many of these kits might be matching my other test, but the variations mean the matches appear in different orders and these just didn't appear in the first 2000 entries. 

However, there are some where the longest segment is over 30cM, or the total shared DNA is over 40cM, as well as the longest segment being over 20cM.  So I decided to check each of those, to see if they were matching the other kit, but beyond the first 2000 matches - only four of them were:

Of the four kits easily identified as also matching the other (FTDNA upload), the first in the table above had gained two new segments, on different chromosomes, the next had lost one segment, but then gained three new segments on other chromosomes, the third showed increases on the two 'existing' segments, plus the addition of a new segment on a different chromosome, and the final one showed an increase (of over 20cM) on the 'existing' segment.

I might come back to these details, as and when I take another look at chromosome mapping. But, for now, I moved on to look at the 'yes' sheet, ie those matches that appeared in the first 2000 entries of both my FTDNA upload, and also the new MyHeritage WGS test.

The 'yes' kits   

I began by looking at a scattergram of the change in 'Total cM shared' against the change in 'Longest segment' but it's perhaps more helpful to look at the following two charts first.

This shows the numbers of matches whose 'Total cM shared' changed, within particular ranges of values (calculated by 'Total shared cM with MyH kit - Total shared cM with FTDNA upload'):

 


From this, you can see that the Total cM shared, for the majority of matches, did not change by very much.  
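[As a rough illustration of how such a chart could be built, the calculation 'Total shared cM with MyH kit - Total shared cM with FTDNA upload' can be grouped into ranges like this. It is only a sketch: the 5cM width of each range, the `bin_changes` helper and the sample figures are assumptions for the example, not the values behind the chart.]

```python
import math
from collections import Counter


def bin_changes(pairs, width=5.0):
    """Count changes (MyH minus FTDNA) per range [low, low + width)."""
    counts = Counter()
    for myh_cm, ftdna_cm in pairs:
        change = round(myh_cm - ftdna_cm, 1)  # cM figures are given to one decimal place
        low = math.floor(change / width) * width
        counts[(low, low + width)] += 1
    return counts


# Invented (MyH total, FTDNA total) pairs, purely for illustration.
sample_pairs = [(25.3, 24.9), (31.0, 30.2), (40.0, 47.5), (52.1, 33.4)]

counts = bin_changes(sample_pairs)
for (low, high), n in sorted(counts.items()):
    print(f"{low:+.0f} to {high:+.0f} cM: {n}")
```

A change that falls exactly on a boundary (say +5.0cM) is counted in the range that starts there, so each match is only counted once.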

I think that's important to note, given that the specific examples I'm exploring in more detail are all 'outliers', ie the matches where the changes are more extreme.  I'm looking at them because I find the situations intriguing, not because I'm saying there is anything 'wrong'.  

One can see the same thing, when looking at the changes in values of the 'Longest segment' (calculated by Longest segment matching MyH kit - Longest segment matching FTDNA upload) - the majority of matches showed very little change in the longest segment value:   


The following image shows the changes in Total cM shared against the changes in Longest segment for each individual match:


I think there are two different things showing up here - there are the points falling along a diagonal, indicating that there's been a change in both the Longest segment length and the Total cM shared. But there is also a horizontal line of points along the x-axis, where there's been a change in Total cM shared, but without corresponding changes to the Longest segment length - potentially indicating the loss, or gain, of other, smaller, segments.  

From the following figures, it can be seen that, although again, the majority of kits showed no changes in the numbers of segments, almost 200 matches did show either a loss, or a gain:


Of the 81 who lost one segment, the Total cM shared decreased for 80 of them, but the Longest segment showed no change for 70 of those. And, for the 101 that gained one segment, 99 showed an increase in the Total cM shared, but 78 of those showed no change to the Longest segment.  

So those figures would seem to support the possible explanation for the 'horizontal' line of points, that the segments being lost, or gained, are smaller segments, rather than the longest. [and whether any of it is 'significant' would be a totally different issue, given that many of the changes are only in the range of 5-10cM.]  
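[The sort of tally quoted above (how many matches lost or gained a segment, and what happened to their Total cM and Longest segment) could be produced along these lines. Again, this is a hedged Python sketch: the dictionary keys, the `summarise_segment_changes` helper and the three sample matches are invented for illustration.]

```python
def summarise_segment_changes(matches):
    """Group matches by change in segment count and tally what else changed."""
    summary = {}
    for m in matches:
        seg_change = m["myh_segments"] - m["ftdna_segments"]
        total_change = round(m["myh_total"] - m["ftdna_total"], 1)
        longest_change = round(m["myh_longest"] - m["ftdna_longest"], 1)
        row = summary.setdefault(
            seg_change,
            {"matches": 0, "total_up": 0, "total_down": 0, "longest_same": 0},
        )
        row["matches"] += 1
        row["total_up"] += total_change > 0    # True counts as 1
        row["total_down"] += total_change < 0
        row["longest_same"] += longest_change == 0
    return summary


# Invented figures: one match losing a small segment, one gaining one, one unchanged in count.
sample_matches = [
    {"ftdna_segments": 2, "myh_segments": 1, "ftdna_total": 32.0, "myh_total": 25.5,
     "ftdna_longest": 25.5, "myh_longest": 25.5},
    {"ftdna_segments": 1, "myh_segments": 2, "ftdna_total": 21.0, "myh_total": 28.2,
     "ftdna_longest": 21.0, "myh_longest": 21.0},
    {"ftdna_segments": 1, "myh_segments": 1, "ftdna_total": 24.0, "myh_total": 24.6,
     "ftdna_longest": 24.0, "myh_longest": 24.6},
]

summary = summarise_segment_changes(sample_matches)
for seg_change in sorted(summary):
    print(seg_change, summary[seg_change])
```

Rows where the segment count changes but "longest_same" is counted correspond to the 'horizontal' line of points; rows where both values move together correspond to the diagonal.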

I was intrigued by one match, who had lost one segment, and yet both their Longest segment and the Total cM shared had increased (by 23.5cM and 18.7cM respectively.) This was a case where a 'gap' between two small 6cM segments on chromosome 18, is now shown as matching, creating one segment of 31cM, another three segments remaining identical:

Another match I followed up was one where the number of matching segments increased by 3 yet the Total cM decreased by 1.8cM and the Longest segment decreased by 20.5cM:

 


In some ways, I don't know what to make of this - the total loss of a segment on one chromosome, but gaining four small segments on different chromosomes.  

The companies give us many such small matches so, according to their science etc, it must indicate at least a 'potential' relationship. But I certainly wouldn't be spending time looking for a genealogical connection to such a match!

The other match that gained three segments had increased both their Total cM shared, and their longest segment (by 28.0cM and 4.9cM respectively):


That seems a bit more 'reasonable' than the previous case, with an increase to the existing segment, and the 'discovery' of three other segments. 

But how relevant some of these segments are remains to be seen.

Closer matches

Finally, I looked for any changes to the matching with my closest relatives. 

In comparisons with my mother's kit, the MyHeritage WGS kit showed different totals on seven chromosomes, from those shown with the FTDNA upload. Two chromosomes showed decreases, the other five were increases, but all individual changes were less than 7cM, producing an increase in Total cM shared of 16.7cM. I'd need to research the particular start and end RSID points of the tests, to see if differences in those explain these changes (since I should match my mother along the full length of every chromosome.)

Comparing my kits to my uncle's, with whom I share 43 segments on each kit, six of the segments had changed slightly (one increased, five reduced), all changes less than 3cM. Three of the segments are all on chromosome one and at least the first segment is potentially due to wider coverage of the newer test, since the starting location has changed to exactly the same RSID point that my mother's kit did.  

The next closest eight matches, taking me down to a Total shared DNA level of 100cM, include seven identified second and third cousins.  Of these, only one shows a change in the Total shared DNA, with an increase of 18.4cM on chromosome one.  In this case, the increase doesn't seem to be connected to a change in the starting location (which is actually quite interesting, since the starting location for this match on my FTDNA uploaded kit was already showing the earlier location - so why wasn't that kit showing as matching to my mother, and my uncle, from that point?) 

Since this 2c should be matching my uncle over the same range, I shall investigate this further.

But that can wait until another day!



Saturday, 14 February 2026

Bits and pieces: Match numbers, Second cousins, DNA clusters, and Ancestor Score

Match Numbers
As anticipated, the numbers of new DNA matches at Ancestry have been increasing over the last few weeks - I now have a total of 20,066 matches, up from 20,003 on the 31st January. The numbers of close matches have also increased - following the two new ones during January, there was a loss of one in the first few days of February, so my current total of 379 actually represents four new matches in the "close" category so far this year.

Second Cousins
What I wasn't anticipating was that, just a few days after posting the comparisons between my first and second cousins, I'd gain a new, and relevant, second cousin!  

I have updated the previous table with the shared DNA and Ancestry predictions:


I can confirm that matches 6 and 7 are siblings, thanks to Ancestry's Pro Tools. So, once again, this is a second cousin relationship that shows much less DNA than would be expected, based on the predictions.

DNA clusters
I was interested to see that one of the shared matches with this new 2c connects to what I call a 'splurge' cluster - a large group of matches who all seem to match each other.  

You can see an example of this with the "Group 29" on my post here. 

That's an old post, from when the number of matches was much lower than it is now (I only had fifty-nine 'close' matches then!) But it illustrates the point of how the shared matches cluster together, and how some of those clusters are much larger than others. 

In this case, there are 172 other shared matches between us, as opposed to just the sixteen matches I share with my second cousin.

Although there has been debate over the years as to what these large clusters represent, I've often wondered whether they could be caused by moderately recent ancestors, whose descendants emigrated to America as part of the Mormon migrations, and who now have a large number of descendants over there.  

So I was very interested to see that this match connects back to ancestors in Utah.  

And, although they aren't showing any connections back to the UK in their tree, I recognise the family they connect to as one that I looked at briefly many years ago, when Ancestry was producing "Circles" and "New Ancestor Discoveries":


  

The particular match does only share 14cM with me - which I know is low and, without any other clues, I would not normally research such a match (or the associated cluster). 

But this does make me think I should be taking another look at the Herefordshire ancestors of those people in the cluster, to see if I can identify my connection back to them - which, potentially, might only be in the early 1800s.

Ancestor Score
One of the things I had hoped to do, in a post today, was to revisit something called an "Ancestor Score".  I first posted about this on Valentine's Day back in 2015 here.  I'd seen the idea on another blog and thought it would be a great way of keeping track of progress, not just on my family history and identified ancestors, but also, by including that extra column, on monitoring my identified DNA matches, as well.

At that time I was expecting to make progress with my genealogy as I took part in the "Genealogy Do-Over".  But life got in the way, as it has a habit of doing! 

I've decided against attempting to produce a quickly updated chart - although there has been some progress over the intervening years, much of it hasn't been as fully documented, and evidenced, as I'd like it to be, so I don't feel it counts. 

Now that I have dropped some of my other activities, and really do plan to make progress this year, I am going to repeat that 2015 table here instead, to lay it down as a 'baseline'.


The proof of progress will be in next year's post on Valentine's Day!



Friday, 6 February 2026

1st and 2nd cousins - shared DNA variability

This post is a bit of "thinking aloud" - I have some data, but not a full answer for why the data shows what it does.

We know that the DNA passed on by the same two parents to their children will vary, such that, although every child will receive half their DNA from each parent, the level of shared DNA between the siblings will vary, depending on which 'bits' of the parents' DNA they each received. And that, as relationships become more distant, the quantity of DNA shared becomes even more variable for particular levels of relationship.  

This is why, for a specific quantity of shared DNA, several possible relationships are often predicted by the DNA testing companies.

When I first took a DNA test at Ancestry, my closest match was a predicted 3rd cousin, who shared 92cM with me.

Based on that quantity of DNA, Ancestry gives the following alternative relationships:




And the "Shared cM Project" tool [1] gives the following probabilities for the various possible relationships:




My match had tested more for ethnicity and 'general' information, and didn't know much about their family history so, based on the image the Shared cM Project produces, and the level of shared DNA, I drew out a possible "family tree", showing where my match might fit into my family, along with what I knew about the family at the time:

 




[The only reason for not including the half relationships side of the diagram was to keep things fairly simple.]  

I then set to work on the genealogy - from which we discovered that the match actually seemed to be a second cousin, not a third, despite us sharing a relatively low level of DNA for that relationship. 

A question was asked, by one of the DNA experts, as to whether the match might be a half 2c - and that is a possibility I still bear in mind.

However, I have been interested to see the other quantities of DNA shared, as more of the family have tested over the years. 

I do have quite a few second cousin matches now, thanks to my grandmother being one of ten, but I'm concentrating here on just four of them - a single second cousin from my grandfather's side, and three second cousins from my grandmother's side, who are siblings to each other - and comparing them to myself and two of my first cousins.  This is because the closer relationships, of the siblings to each other, and of the first cousins to each other, are confirmed through the shared DNA, as well as the known family history.

So this is how we all relate to each other:


And these are the levels of shared DNA:


Below is a table of the averages, and ranges, of shared cM for particular relationships, taken from the DNAPainter diagram:


So, with the exception of the 39cM shared between match 5 and me, and of the 23cM shared between matches 1 and 6, all of the values do actually fall within the range for possible second cousins. 

However, the probability of the relationships being second cousins (or even half second cousins) seems to be classed as fairly low for many of the values:   


I have included the Ancestry predictions for the relationships in the following table:


As you can see, only two of the relationships (highlighted in yellow) are predicted to be possible second cousins.  If there is a "half relationship" situation, another two of the predictions (highlighted in pink) would be okay.

But Ancestry's predictions for all the other comparisons are for more distant relationships.

When I received that very first match, one of the first things I did was to put the shared DNA figure into a predictor and, if it hadn't been for the match then being able to give me a couple of names that I recognised, I would probably have been looking at the wrong generation of my tree, at least initially, to try to find our shared ancestry.

As I mentioned above, the question was asked as to whether my first match (and now that would mean their siblings, as well) might be half second cousins to me (and also now to my two first cousins). Since the respective grandparents were the second and fourth children out of ten in the family, with fairly regular "two year intervals" between them all, there would have to be a "story" behind that, if it was true.  

It's obviously not impossible, though, so I'm not discounting it and will continue to explore the possibility, through the clustering of other shared matches.

But, even if a half relationship between my grandparents and their siblings does become evident, it wouldn't explain the fact that the shared DNA, for the majority of the relationships, is still less than would be expected - and therefore, if I needed to search for how I connected to these matches, I might be looking in the wrong parts of my tree! 

So, one point I am trying to make is the importance of "doing the genealogy" and not just relying on such predictions.  Does the predicted relationship fit with the known family history, with ages, and with locations, etc?  If not, don't just assume the "most probable" prediction is the correct one.

Another possibility I have wondered about, is whether the predictions from companies such as Ancestry, and the Shared cM Project, might have a tendency to predict more distant relationships for those of us in the UK.  This could be due to much of the data coming from people with ancestry in the US.  It seems those in the US often have many more matches than those of us in the UK, and potentially, a higher level of "overlapping ancestors", which might create a higher level of shared DNA for particular relationships. And thus 'bias' the predictions.

I don't know enough about the wider field of DNA statistics to know whether that is possible, or whether other people in the UK have found similarly lower levels of shared DNA.  

But I shall certainly be checking the predictions for all my other identified DNA matches more closely in future, to see if those show the same tendency. 



Notes and Sources
1. Shared cM Project 4.0 tool v4





Saturday, 31 January 2026

A possible route into family stories, and the religious leanings of one of my ancestors - more news from January

A possible route into family stories - a "life index"
 I've written several times about family history being more than just 'names and dates', that it's about the lives of our ancestors, and their other relatives, who they were, what they did, and the circumstances that impacted them. Even their hopes and fears, where possible. 

I've also mentioned that we should be recording our own stories because, one day, most of us will, more than likely, have become an 'ancestor'.

But I hadn't resolved the issue of how to make any of that easily accessible to future generations (or even to 'future me'!) or how to provide a 'way in' that might catch their attention and spark their interest.

For some years, I've tried to keep a "Master Timeline" record, in an Excel spreadsheet, of what we, as a family, did on particular days.  It began as just a way of keeping track of some of the mundane things, like dentist and optician's appointments, but also included holidays, days out, and other key dates as the children grew up. 

It was designed to answer those inevitable questions of "When did we....?"

Maintaining the spreadsheet has been a bit erratic at times, particularly in the years when I've also kept a journal, and especially once the children left home. And I'd never seriously even considered how easy (or otherwise) the information would be for anyone else to access, since it was mainly just for me, and I knew what was in it.  

But, of course, one can ask a question the other way around - "What did we do on...?"

 For example, looking up all the 31st January entries, I can see that, in 1986, my parents' dog, Sadie, had to be put to sleep, that in 1998, one of my sons ran in the local primary schools' cross country championship, and, in 1999, he had a rehearsal for a school show. (and yes, on this date in 2012, there was a dentist appointment! ☺)

I've realised this as a result of a post by Taneya Koonce [1], another member of the Guild of One-Name Studies, whose blog I follow.  She posted a video about her "life index" journals, and it struck me what a brilliant idea this is, to have an index for every day of the year.  

It was one of those 'lightbulb moments', when an idea that I 'knew about' in some form, eg from the old 'birthday books' that some of us might have kept, or the 'on this day' notices on 'history' websites, suddenly became something I could actually use in a way that will help me to achieve what I want to.  
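[For anyone whose timeline is already in a spreadsheet, the "What did we do on...?" question can also be answered with a small index keyed on the day and month. This is just a Python sketch of the idea: the `build_life_index` helper is hypothetical, the 31st January entries are the ones mentioned above, and the July entry is invented to show that other dates stay separate.]

```python
from collections import defaultdict
from datetime import date


def build_life_index(entries):
    """Index (date, note) entries by (month, day), oldest year first."""
    index = defaultdict(list)
    for when, note in entries:
        index[(when.month, when.day)].append((when.year, note))
    for notes in index.values():
        notes.sort()
    return dict(index)


timeline = [
    (date(1998, 1, 31), "Primary schools' cross country championship"),
    (date(1986, 1, 31), "Sadie, my parents' dog, had to be put to sleep"),
    (date(2012, 1, 31), "Dentist appointment"),
    (date(1999, 1, 31), "Rehearsal for a school show"),
    (date(2001, 7, 14), "Day out (invented entry for illustration)"),
]

life_index = build_life_index(timeline)
on_this_day = life_index.get((1, 31), [])
for year, note in on_this_day:
    print(year, note)
```

Keying on the month and day, rather than the full date, is what turns a timeline into an 'on this day' index: every year's entries for the same day end up on the same 'page'.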

Taneya is in the US so, although I love the tree design on the front of her journals, those specific ones would take a while to arrive in the UK. There are similar ones available here but I have decided instead to go for a plain covered, larger journal (A4) which contains 400 pages (200 sheets).  That means I can include two pages for dates which I know I'll probably have lots of stories for (eg Christmas, or close family birthdays).  There'll also be some 'spare' pages that I can use to list stories where the specific date is unknown (eg my mother, as a child, using the bedsheets to climb out of her bedroom window, in order to try to avoid having a bath!)

As you can possibly tell, from the things I've listed above for the 31st January, just noting a key event on a day can act as a prompt for a family story - in those cases, what Sadie, the dog, was like and the things she got up to, as well as the children's sporting and 'theatrical' activities. 

As Taneya says, "writing things down doesn’t just preserve them. It activates them."

Another January discovery, the religious leanings of an ancestor
I had a lovely surprise early in the month, when I was contacted by a descendant of one of the step-daughters of my 2xgreat grandfather.  We had exchanged information some years ago but, recently, she had discovered that the burial register for Rowlestone, Herefordshire, is available on the Ewyas Lacy Study Group site [2].
 
Now, I knew that my Thomas PARRY's second wife, Ann, had been buried on 19 September 1908, "without the burial service according to the rites of the Church of England", because that was information supplied to my dad, by a local vicar, back in the 1990s. 

But what the vicar had not passed on was that the register also gives the name of who performed the service, "William JAMES, Abergavenny", or that it was noted as a Christadelphian funeral. 

Having discovered this, my contact had then spent time working through the Christadelphian Magazine archives, from which she was able to discover that:
    - my Thomas PARRY and his wife, Ann, were both formerly Methodists
    - they were baptised by immersion and received into fellowship with the Christadelphians in May 1900
    - their home in Walterstone was used for meetings
    - their names occurred several times among those from Walterstone who met with the brethren in Abergavenny
    - when Thomas PARRY died, in February 1918, the Christadelphians noted that "we were not allowed to take any part in laying our brother to rest." 

Did other members of the family not approve? 

Or was it the minister of the church where Thomas was buried who objected? 

Or could it have been simply because he was being buried in the same grave as other family members, and a non-conformist burial would have required a separate grave? 

We'll possibly never know for sure. But Thomas PARRY was buried on 26 February 1918, in Christchurch, Govilon, which comes under Llanwenarth Ultra, Monmouthshire, in the same grave as his first wife, Sarah, and their infant son, Lewis.

I'm very grateful to the researcher who, many years ago, uncovered the gravestone and supplied my dad with this photograph:

 


[Especially since, when I visited the graveyard some years later, in 2002, the particular area where this stone was, was totally overgrown. I found several of the graveyards around there were being allowed to 'return to nature', so the stone would have been almost impossible to find. ] 


Notes and Sources

1. Taneya's post on substack: https://taneyakoonce.substack.com/p/helping-dates-tell-stories-with-an

It's also on a Facebook reel: https://www.facebook.com/reel/950876704124479


DNA match numbers

 Like many people, I imagine, I've spent some of January doing a bit of 'sorting and planning' to help me achieve what I'd like to during the year.  

So now I just need to actually do the things I've planned!

One of the first tasks was to update the graph of how many close matches I have at Ancestry.  At the time of my last post, the review of 2025, the number had increased to 376 close matches.  I now have 378 close matches - and I also noticed yesterday that I had exactly 20,000 matches, in total, there. 

(But that total had already increased to 20,003 by this morning.)


Since I was interested in the rate of increase, I also looked at the change in the totals over the years:


The Ancestry test was launched in the US in 2012 and then in the UK in January 2015 [1]. One can see that, after an initial slow start for me, the three years from 2017 to 2019 saw the most new close matches, averaging 50 per year.  Numbers have since reduced, averaging 30-35 per year, but are quite variable.

 From the graph, many of the years seem to show a higher rate of increase in the early months of the year - probably due to the sales in December, and 'Christmas gifting', which results in more kits being processed during those early months.

It will be interesting to see if the early part of this year shows the same sort of curve. Although kit prices at Ancestry were reduced, those of one of the other companies, MyHeritage, were even cheaper.  

And, with the news that MyHeritage was moving on to "Whole Genome Sequencing" (WGS) [2], perhaps more people will have opted to purchase kits from there instead?

Either way, I'm sure, with this change, there will be a surge in the numbers at MyHeritage - if only because of all those who have already taken DNA tests elsewhere now deciding to try the new test, as well. 

I admit it - I did too.

My kit is currently in the "WGS in progress" stage, and I am looking forward to receiving the results.  It will be interesting to see how they compare to those received from the other companies I have tested with, and especially with those kits I transferred to MyHeritage.

Unfortunately, I've not been tracking numbers there in the same way, with those transferred kits - but perhaps it will be worth starting to do so, once these new results are in.


Notes and Sources

1. Launch dates of the autosomal DNA test at Ancestry: https://isogg.org/wiki/AncestryDNA