Sunday, 25 June 2017

Analysing my DNA: Crossovers Part 2

This is a continuation of my part 1 post at http://notjusttheparrys.blogspot.co.uk/2016/12/analysing-my-dna-crossovers-part-1.html.  My initial intention for this post was simply to look at the shared matches between the siblings, to see how those results correlate with the phasing of chromosome 21 represented in part 1.  That sounds easy enough, but one of the reasons it has taken me so long to post is that things very rapidly become complicated!

So, in this post, I will look at the shared matching between the siblings and their three closest relatives - a niece, a first cousin and a third cousin once removed - and how adding the other relatives caused me to alter my interpretation of how the niece matched.

This was my starting point from part 1, the four siblings A, B, C, and D, with their chromosomes represented by the four colours:

The parents' chromosomes can then be represented as follows:

And which parts of the parents' chromosomes each of the siblings received like this:

One of the closest matches to the siblings is their niece, daughter of a deceased brother.  Since the brother was never tested, I don't know what crossover points he received from his parents. The niece will only have one chromosome (of each chromosome pair) from her father - but the differences in matching between the niece and the siblings could be as a result of crossovers within each of her father's chromosomes, or between the father's two chromosomes. 

So these are the comparisons between the niece and each of the siblings from Gedmatch, along with a potential crossover point identified at 43:

So immediately there is an issue - the niece matches sibling B up until 43, and, correctly, does not match any of the other siblings until that point.  However, beyond 43, the niece appears to match none of the siblings (based on the grey "match" bar).  But we know that, since siblings A and B do not match each other at all on this chromosome, the two of them, between them, cover all four of the siblings' parents' chromosome 21s.  So, if the niece doesn't match sibling B, then she has to match sibling A at least.  And, looking at the Gedmatch image, it seems quite clear that this is a threshold issue - the niece does actually match the three siblings A, C and D beyond 43.  The match just isn't being picked up as a match by Gedmatch at the default threshold.  Reducing the threshold shows that the niece matches all three siblings A, C and D on a 6.9 cM segment, containing between 1041 and 1045 SNPs.
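The coverage argument above can be sketched with sets. This is only an illustration for a single position on the chromosome, and the labels M1, M2, P1 and P2 (the two maternal and two paternal copies held by the siblings' parents) are hypothetical names, not anything Gedmatch reports:

```python
from itertools import product

# Hypothetical labels for the four parental copies of chromosome 21 at one
# position: two held by the siblings' mother, two held by their father.
MATERNAL = {"M1", "M2"}
PATERNAL = {"P1", "P2"}

def pairs_sharing_nothing_with(sibling):
    """All (maternal, paternal) pairs that have no copy in common with `sibling`."""
    return [
        {m, p}
        for m, p in product(sorted(MATERNAL), sorted(PATERNAL))
        if not ({m, p} & sibling)
    ]

# Suppose sibling A holds one maternal and one paternal copy.
sibling_a = {"M1", "P1"}

# B does not match A at all, so B must hold the other copy from each parent.
(sibling_b,) = pairs_sharing_nothing_with(sibling_a)

# Between them, A and B carry all four parental copies.
covered = sibling_a | sibling_b

# The niece's copy from her father is one of those four, so nothing is left
# that she could carry without matching A or B.
uncovered = (MATERNAL | PATERNAL) - covered
```

Whichever pair of copies is chosen for sibling A, `uncovered` comes out empty, which is why a stretch where the niece matches neither A nor B has to be a reporting issue rather than a real absence of shared DNA.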

The initial interpretation of the DNA received by the niece therefore became:

Next, I looked at how the siblings and the niece matched the siblings' paternal first cousin.  The Gedmatch image below was produced using the default threshold, but again, reducing the thresholds slightly indicated a potential matching segment just below the 7cM threshold:

Chr        Start Location        End Location        Centimorgans (cM)
21        14,677,076        22,936,413        18.2
21        22,950,552        33,423,011        15.7
21        34,132,054        37,056,381        6.7
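As a rough sketch of the threshold effect, the three segments in the table can be split by a minimum centimorgan value. The function name and the data layout here are my own illustration, not how Gedmatch itself works internally; the 7 cM default is the threshold mentioned above:

```python
# Segments from the table above: (start, end, centimorgans) on chromosome 21.
SEGMENTS = [
    (14_677_076, 22_936_413, 18.2),
    (22_950_552, 33_423_011, 15.7),
    (34_132_054, 37_056_381, 6.7),
]

def split_by_threshold(segments, min_cm=7.0):
    """Return (reported, hidden) segments for a minimum cM threshold."""
    reported = [seg for seg in segments if seg[2] >= min_cm]
    hidden = [seg for seg in segments if seg[2] < min_cm]
    return reported, hidden

# At 7 cM, the third segment (6.7 cM) falls just under the line.
reported, hidden = split_by_threshold(SEGMENTS)

# Lowering the threshold slightly brings it into view.
reported_low, hidden_low = split_by_threshold(SEGMENTS, min_cm=6.0)
```

At 7 cM only the first two segments are reported; at 6 cM all three are.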

The paternal first cousin can only match the siblings through their father's chromosomes.  But, as their father will not have received exactly the same DNA as the first cousin's parent did, there will be some areas where the first cousin does not match any of the siblings.

By comparison to the phasing of the siblings and niece, the first cousin's matching segments were therefore mapped as follows:

(this process also indicated that the "Parent 2" phasing represents the siblings' father's chromosomes.)

So far, so good.

When I downloaded the matching segments for the siblings, in order to start investigating the shared matches, I realised a known relative shared DNA with sibling B on chromosome 21.  The relative is a 3rd cousin once removed (3c1r) and shares from about 17 to 28.  The shared ancestry is on the siblings' paternal side of the family, the same as the 1c:

But now there's a problem.  This 3c1r does not match any of the other siblings, or the niece, on chromosome 21.  But, at the point where the 3c1r matches B, we have already "used" both of the paternal chromosomes, one for the matching between the first cousin and siblings ACD, the other for the matching between the niece and sibling B.  It's okay that the 1c doesn't match the 3c1r - that actually indicates that the chromosome ACD share with the 1c must be the one the siblings' father received from his mother, the siblings' grandmother, as she is also a common ancestor with the 1c.

But, clearly the chromosome the niece shares with sibling B cannot be the other paternal chromosome.  As far as I am aware, there's no other shared ancestry with the 3c1r.  So, let's go back to the matching between the niece and the siblings - where did I go wrong?
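The conflict in the 17 to 28 region can be checked mechanically. This is a minimal sketch with hypothetical labels for the copies involved, and it assumes that, within this region, sibling B is the only tested sibling carrying that particular maternal copy (which is what the niece's lack of matching to A, C and D suggests):

```python
# Hypothetical labels for the copies present in the 17 to 28 region.
#   PAT_X: the paternal copy shared by siblings A, C, D and the 1c.
#   PAT_Y: the other paternal copy, held by sibling B and shared with the 3c1r.
#   MAT_B: sibling B's maternal copy (assumed here to be carried by B alone).
carriers = {
    "PAT_X": {"A", "C", "D", "1c"},
    "PAT_Y": {"B", "3c1r"},
    "MAT_B": {"B"},
}

# What is observed for the niece in this region.
observed_matches = {"B"}
observed_non_matches = {"A", "C", "D", "3c1r"}

def consistent(copy):
    """Would the niece carrying `copy` reproduce the observed pattern?"""
    others = carriers[copy]
    return observed_matches <= others and not (observed_non_matches & others)

# Keep only the copies that fit everything observed.
candidates = [copy for copy in carriers if consistent(copy)]
```

Under those assumptions, the only copy left in `candidates` is B's maternal one: a paternal copy would force the niece to match either the 3c1r or siblings A, C and D as well.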

Siblings A, C, and D all show a very small area of potentially matching SNPs between 24 and 26 - but it is only 1.5 cM and 365 SNPs.  I don't believe that has any significance, especially as there's no change in matching with sibling B. (The niece only has one relevant chromosome in this comparison - and the kit being used is a "paternal" one that's been phased using her mother's data, so should be fairly accurate.)

So what about the potentially matching segment with sibling C, between 37 and 39?  This is a 4.2 cM segment, containing 743 SNPs - so it is a small segment that, under normal circumstances, when matching to unknown and more distant relatives, should be ignored.

From the sibling phasing, B and C are matching from 37, after C had a crossover, and their matching segment is a "Parent 1" segment.  So, is it possible that the niece's matching should actually be as follows:

The niece is matching B on a Parent 1 chromosome (now known to be maternal).  Sibling C then starts to match both B and the niece at 37, but the niece stops matching C at 39, as the niece has a crossover between the two chromosomes her father had.  If she switches from her father's maternal chromosome to his paternal chromosome, and those are also the two chromosomes sibling B has, that would account for why the niece continues to match B until 43.  At 43 there is then a crossover between the two chromosomes of Parent 2 - which would indicate a crossover in the niece's father, passed on to the niece within the segment from his paternal chromosome.  This interpretation would account for the niece's match to the 1c, between 40 and 43, and explain why she does not match the paternal 3c1r earlier on the chromosome, between 17 and 28.

If that is the situation, then the diagram of the siblings' parents' chromosomes can now be extended to also show the DNA received by their grandchild, the siblings' niece, as well as the potential source for the paternal chromosomes:

Please let me know if you can spot any mistakes in my reasoning.  

Sunday, 19 February 2017

A slight sidetrack - my LivingDNA results

I received my LivingDNA results earlier this month and have been doing some research as a result. I'm therefore taking a little side-track, to blog about that, rather than continuing with the post about mapping crossover points (which will be posted eventually, I promise!)

Details about the LivingDNA test can be found on their website (at https://www.livingdna.com ) and various other bloggers have already described their own results, in particular Debbie Kennett, who probably has the most detailed review of all areas of the results.* Here I am only concentrating on the "Family Ancestry" area, also known as the autosomal DNA.

Basically, the LivingDNA test differs from the other autosomal tests I have taken (at 23andMe, Family Tree DNA, and Ancestry DNA). With those, I regard the ethnicity predictions 'with a pinch of salt', and my main aim has been to obtain matches through which I can confirm and further my family history. The LivingDNA test, by contrast, is currently purely about ethnicity, about where we come from (although other features will be added later). The value of their test is that it has been developed in partnership with a range of scientific teams, such as those involved with the People of the British Isles project, enabling more precise predictions of origins, for people with British ancestry in particular, than the other tests currently available provide. Ethnicity, or 'Origins', predictions are dependent on the reference populations you are being compared to and, in the case of LivingDNA, this currently includes 80 world regions, with 21 regions in Britain and Ireland.

So what makes me, me?

The above image shows my DNA mix in the last 10 generations, at three levels of detail, through means of a family ancestry avatar, which is a bit of fun.  At the moment, only the "standard" mode is available, but "cautious" and "complete" views will be provided in the future.

The results are also shown in a map format, again at the three levels - global, regional and sub-regional. At the global level my Family Ancestry Overview indicates that I am 98.4% Europe and 1.6% World (unassigned).

At the regional level, the Europe 98.4% is broken down into
Great Britain and Ireland 91.3%
Europe (North and West) 4.9%
Europe (unassigned) 2.2%

The map for this looks very similar, but just in shades of green:

It's at the Sub-Regional level that the picture becomes much more interesting:

The following image shows the level of detail within the UK area, which indicates I have ancestry from at least 13 specific UK regions, with some DNA still unassigned:

As I had to reduce the size of the screenshots, to get all the figures in, here is the percentage breakdown:

Europe 98.4%

Great Britain and Ireland 91.3%
  • South Wales Border 41% 
  • Southeast England 10.1% 
  • East Anglia 8.1% 
  • Cumbria 4.5% 
  • Cornwall 3.4% 
  • Northwest England 3.3% 
  • South Wales 3.2% 
  • Devon 2.9% 
  • South Central England 2.8% 
  • South Yorkshire 2.6% 
  • Lincolnshire 1.2% 
  • Northumbria 1.2% 
  • Orkney 1.1% 
  • Great Britain and Ireland (unassigned) 5.7% 
Europe (North and West) 4.9%
  • Scandinavia 2.8% 
  • France 2.1% 
Europe (unassigned) 2.2%

World (unassigned) 1.6%
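As a quick arithmetic check on the breakdown above, the listed figures can be added up level by level. This is just a sketch using the percentages exactly as displayed; any explanation for differences (such as rounding in the displayed values, with South Wales Border shown only as a whole number) is my assumption rather than something LivingDNA states:

```python
from decimal import Decimal

def total(values):
    """Sum percentage strings exactly, avoiding floating-point drift."""
    return sum((Decimal(v) for v in values), Decimal("0"))

# Sub-regional figures listed under Great Britain and Ireland.
gb_ireland = ["41", "10.1", "8.1", "4.5", "3.4", "3.3", "3.2", "2.9",
              "2.8", "2.6", "1.2", "1.2", "1.1", "5.7"]
# Sub-regional figures listed under Europe (North and West).
north_west = ["2.8", "2.1"]
# Regional figures listed under Europe.
europe = ["91.3", "4.9", "2.2"]
# Global figures: Europe plus World (unassigned).
world = ["98.4", "1.6"]

gb_sum = total(gb_ireland)
gb_gap = Decimal("91.3") - gb_sum  # difference from the stated regional figure
nw_sum = total(north_west)
europe_sum = total(europe)
world_sum = total(world)
```

The regional and global levels add up exactly (4.9, 98.4 and 100), while the Great Britain and Ireland sub-regions as displayed come to 91.1, which is 0.2 short of the stated 91.3.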

I like the distribution maps but I really love the 'Do-nut' chart, as I think that gives a better indication of how much of me is thought to come from each of the regions, ie my percentage make-up:

You can see how much more detailed these results are, compared to those currently provided by the other companies:
Family Tree DNA (comparing to 18 population clusters) - 99% European (made up of 70% British Isles, 29% Scandinavian) and 1% Middle Eastern (North Africa),
23andMe (standard) (comparing to 31 populations worldwide) - 99.7% European (18.9% British & Irish, 0.4% French & German, 68% broadly Northwestern European, 0.4% Iberian, 0.3% broadly Southern European, 11.8% Broadly European), <0.1% Sub-Saharan African (Central & South African) and 0.2% unassigned
AncestryDNA (comparing to 26 global regions) - 99% European (63% Great Britain, 24% Ireland, 10% Europe West, 1% Finland/Northwest Russia, <1% Europe East ) and <1% Africa (Africa North)

Of course, what's important to me is how the genetics works with my genealogy, ie how well do these results match to where my family history indicates my ancestors came from?

Using the regional descriptions on the page at https://www.livingdna.com/en-gb/uk-regional-breakdown , I have coloured a pedigree chart with my ancestors' birthplaces (an idea copied from Debbie Kennett, who attributes the original idea to J. Paul Hawthorne with his #Mycolorfulancestry meme).

There are some problems trying to match the colours like this. One of my ancestors comes from Hampshire, which is in the South England region - a region which does not show up in my results, so I don't have a matching colour to use. I also have three unknown 3xgreat grandparents. Another issue is that the regional descriptions given on the above page differ from those given on my results pages.

On my results pages, it states "The areas of Shropshire, Herefordshire, Monmouthshire, Worcestershire, Powys and Gwent are collectively called the South Wales border" and, for South Wales, it states "unique southern signature is found in the modern counties of Pembrokeshire, Ceredigion, Carmarthenshire and West Glamorgan."

Whereas the regional breakdown page above describes the South Wales border as "approximately Herefordshire/Worcestershire/Shropshire/W Midlands and surrounding areas" and South Wales is then described as "approximately Pembrokeshire/Carmarthenshire/South Powys/Swansea/Glamorgan/Monmouthshire areas"

Thus my Monmouthshire, Breconshire and Radnorshire ancestors are in different regions in these two descriptions. (Breconshire and Radnorshire are now part of Powys.) So the three grey/blue "South Wales (or SW border)" entries in the pedigree above possibly should be orange, to match the rest of the "South Wales border" entries.

This seems more probable when I plot the known birthplaces of my 3xgreat grandparents using Genmap*:

As you can see, my paternal 3xgreat grandparents cluster around the South Wales border area, and those within Monmouthshire, Breconshire and Radnorshire are only just over the border, so are perhaps more likely to be genetically similar to the South Wales Border region than to the South Wales region.

The colours on my pedigree give the impression that a higher percentage of my DNA from my maternal ancestry should be in the Southeast England region. However, from the map, it is clear that my maternal ancestry generally is more spread out than my paternal ancestry and that those in the Southeast region are predominantly in London:

The ancestor with the red square around them (one of three plotted at that point in Lambeth) is known to have a German grandfather. Since the DNA results are said to relate to my DNA mix in the last 10 generations, and my pedigree is only showing 5 generations, then clearly there is a lot of potential for my other London based maternal ancestors to have arrived there from somewhere else in the country.

So, can the DNA results actually help me with tracing my ancestry, particularly with regard to my London ancestors? Should I be looking for connections, for example, to the north of England, or down in Cornwall?

Please note, I am just exploring ideas here.

In the Guild of One-Name Studies, we consider the frequencies and distributions of the surnames we study, as this can often shed light on the origins of the surnames - and potentially suggest locations that ancestors who suddenly "appear" somewhere might have come from.

So, using Steve Archer's Surname Atlas*, which maps the distributions and frequencies of surnames from the 1881 census, I've produced maps for each of the surnames of my known 3xgreat grandparents.

These are the distributions for the paternal surnames (15/16 known):

The majority of the surnames do show concentrations in Wales, or the South Wales border area, although there are some interesting "non-Welsh" distributions for Robinson, Taylor and Mitchell in particular. The surname Robinson does seem very concentrated across the north of England. Harris has both a south Wales and a Cornish concentration. Although the surname Parry shows a concentration across North Wales and predominantly in Anglesey, I know that this surname is a Welsh patronymic, and therefore has multiple origins across Wales. The dates of origin for this surname can be anywhere between about 1400 and 1800. So, for my family, I suspect the origin is more likely to be in the South Wales border area, where the known family were, and perhaps related to the concentration in Breconshire.

These are the distributions for the maternal surnames (only 12/16 known):

Interestingly, two of the surnames on my maternal side also show potential Welsh origins. More of these maternal surnames show a countrywide distribution, which makes it difficult to identify a specific "potential origin". But there are some concentrations in the North and also one surname, Rice, showing a concentration in Devon. The two concentrations of the Harland surname, both in coastal regions, do make me wonder if the one on the south coast could have been created by migration of some families from Yorkshire.

Obviously this sort of idea needs confirming properly through thorough research. But perhaps there are some hints from this surname mapping process, as to which of my ancestral lines might be the sources of the DNA from regions such as Cumbria.

I've already mentioned the German ancestry of one of my London 3xgreat grandfathers, whose grandfather was called John Michael Hengler. The other DNA companies provide "match lists" showing who else in their databases I relate to. Quite a few of these matches do have ancestry in the north of England and I have often wondered if this was due to other descendants of my ancestors having moved into those areas later. For example, whilst I descend from the daughter of John Michael Hengler, his son was married in Ireland and it appears that some descendants of that line ended up in Lancashire.

But I don't think that explanation would fit with the different regional results as discovered by the scientific research in the POBI project and LivingDNA. So maybe I will need to be looking further back to find the shared ancestry after all.

As a follow up to plotting the distributions, I did have a look at my Naylor ancestry. This is a surname that has cropped up in the pedigrees of some of my matches at Ancestry, so I had recently identified the potential link between my 2xgreat grandfather, William Naylor, born about 1838 in Islington, Middlesex, to his father George Richard Naylor, who was born in Gloucestershire. But I was intrigued by the Naylor surname distribution showing a concentration in Rutland and across Yorkshire. Proper confirmation is still required but initial research suggests that George Richard Naylor (or "Nayler") was the son of a Richard Nayler, a surgeon in Gloucestershire, and that this Richard was the son of another surgeon, a George Nayler of Stroud, Gloucestershire. But this George Nayler's wife was a Sarah, daughter of John Fark of Clitheroe, Lancashire.

Okay, so that's not the Naylers themselves coming from the north of England (so far) - but potentially it indicates there could be an ancestral line from that area.

As I said, I've just been exploring an idea with this - but clearly our family history and our DNA must tie in with each other. It's just a matter of us discovering how!

I gather that AncestryDNA will soon be releasing a "Genetic Communities" feature, as part of their DNA results*. Some people in the UK can already see a beta version of this, but it doesn't appear on my account. I am looking forward to its eventual release, as it will be interesting to see how it compares to my LivingDNA results (and whether another "theory" of mine could be true - that some of my DNA matches in the US potentially stem from Mormon converts from the Herefordshire border area - as I understand, from a comment by Debbie Kennett, that there's a "Mormon Pioneers" community listed in her results.*)

These are certainly exciting times to be involved in Genetic Genealogy!

* Notes and sources:
Bloggers' posts about their results:
Debbie Kennett:
(Debbie includes links to other bloggers' posts at the end of the part 1 above)

Genmap and Surname Atlas - programs by Steve Archer. See Archer Software, at http://www.archersoftware.co.uk/

Debbie's Mormon pioneers comment - on Ania Waterman's blog at https://ancestraladventures.wordpress.com/2017/02/09/new-ancestry-dna-feature