
Tuesday, 11 July 2017

AncestryDNA - Genetic Communities

Back in February, when I wrote about my LivingDNA results, I commented on the upcoming release of AncestryDNA's "Genetic Communities" feature, which I'd heard about through others who could see their communities as part of the beta testing.  Unfortunately, general "busy-ness" got in the way of my posting about my own Genetic Communities when I received them soon afterwards.  So this is a 'catch up' post.  I'm not going to cover all the details of how the Genetic Communities work - information about that is already available on the blogs of other genetic genealogists, such as Blaine Bettinger* or Debbie Kennett*, or on the Ancestry site itself. In this post I'm just going to focus on my own results and explore how useful (or otherwise) the information might be.

This is from my AncestryDNA Home Page, showing my general ethnicity and also that I am in three of the genetic communities.



Clicking through to view my "genetic ancestry" gives me the details of which communities I am in, and a map showing both the communities and the estimated general ethnicity areas. (I only have traces of 'ancestry' from the "three more regions" so they aren't shown in detail.)


There are over 300 Genetic Communities currently available (Blaine Bettinger has provided a pdf of the full list, from a link on his blog), and it is possible to click down from a continental level, to explore what communities have been identified in different regions of the world, by clicking the "view all" button.  However, I find this a bit inconsistent, and potentially "buggy", when trying to explore the regions where I am in a community.

For example, if I look at the "Scots", which I am not part of, all of the communities show separately in white:


But, when I view a region where I am part of a community, I can only see my own community. For example "The Welsh and English West Midlanders" contains three communities:


But I only seem to get shown the one that I am in, when I try to view these:


This is virtually the same view I get when viewing my own Genetic Community, "English in the West Midlands". 





Based on the list provided by Blaine Bettinger, the "Welsh and English West Midlanders" region also contains the "North Walians" and the "South Walians", but I don't seem able to access the view similar to the one I see for the Scots region, showing all three of the communities in the region - although I can (sometimes) see the whole region, if I access it from the drop down on my own genetic communities view above:




For the other two community regions that I am in, the "English Midlanders and Northerners" and the "Southern English", I seem to be in the overall region but not allocated  to a more specific community within that, but again, the only view I can obtain is the same as my personal view, so I cannot see what the three more refined communities in each of these regions are.


I would be interested in seeing what the three regions my Genetic Communities are in look like to someone who is not in them.

Comparison to LivingDNA
Since LivingDNA is the only other company that provides ethnicity estimates in fine detail within the UK, I thought it might be interesting to compare the results from them to my Ancestry Genetic Community regions.  My LivingDNA results have been updated since I wrote about them at http://notjusttheparrys.blogspot.co.uk/2017/02/a-slight-sidetrack-my-livingdna-results.html so, for now, I am including an image from both versions of LivingDNA to compare to AncestryDNA's Genetic Communities. (I will do a more detailed post about the updated LivingDNA results later.)




The three Genetic Communities I am in on Ancestry cover a large area of England, but do not include any of Scotland and only cover the border area of Wales.  In some ways, the earlier version of the LivingDNA results was a better match to the Genetic Communities, as it extended down into Devon and Cornwall, and did not include much of Scotland, whereas the updated results no longer show any Devon or Cornish DNA, and now include Aberdeenshire.  However, we are talking about fairly low percentages for these counties.  Both Ancestry and LivingDNA place my main 'ancestry' as being from the West Midlands/Welsh Border areas - which does tie in with my known family history.

So I do feel that both companies are identifying connections to similar areas within the UK and, as the details continue to be refined, potentially the results will be very useful in furthering my family history.

Debbie Kennett has pointed out that, given the current predominance of Americans in the database, the Genetic Communities can help those of us in the UK to filter our match lists so as to focus on the more relevant matches, ie those who do have an identifiable connection to the same UK areas that we have.  However, although the Genetic Communities are created initially from the DNA analysis, with pedigrees then being used to supply historical information that helps to 'identify' the community, it isn't necessary to have a pedigree in order to be in a community, so finding the connections to matches who are in communities will usually involve further research (and, ultimately, might still be impossible in some cases). 

But the very fact that a pedigree isn't required, in order to appear in a community, does make the Genetic Communities a useful feature for anyone who does not know their family history, as it can help to identify some "times and places" for them to explore potential connections to their matches.

So, as confirming my family history and discovering new relatives are my main aims in using DNA, how useful are the communities for finding the connections between my matches and my own family history, beyond the general benefit of narrowing down my match lists? 

 The story views on the Genetic Communities help to provide more detail about the places where my matches' ancestors were from.




And also where they went to:



And the connection page indicates some of the surnames that are more prominent in the particular community, as well as indicating my own strength of connection to the Community:


(I love the background photo, by the way - definitely a place with relevance to my family history!)

As you can see, there is overlap between the three communities that I am in.



Just as I am in several communities, so are many of my matches.  The following diagram illustrates the numbers of my matches in each of the overlapping Community groupings:




(For anyone who does the maths, yes, there is an inconsistency between the images, with 23 matches being listed as in the "English in the West Midlands" community, and only 22 shown in my diagram - that's because another person was added in the four days between extracting the community match lists to produce the diagram and then copying the "Your Connection" image above.  Keeping data up to date is not easy!)

Since the "English in the West Midlands" is a subset of the "Welsh and English in the West Midlands", it does seem strange that two of the matches are in the subset but not in the higher level community (but that's just a minor anomaly that I've noticed, rather than something I'm looking into).
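For anyone wanting to run the same subset check on their own extracted match lists, here is a minimal Python sketch. It assumes each community's match list has been reduced to a set of identifiers; the identifiers (`m01` and so on) and the contents of the sets are invented for illustration and are not my actual matches.

```python
# Hypothetical match IDs per community (illustrative only, not real data).
welsh_english_wm = {"m01", "m02", "m03", "m04"}   # higher-level community
english_in_wm = {"m02", "m03", "m05", "m06"}      # expected to be a subset

# Matches in the sub-community that are missing from the higher-level one.
anomalies = sorted(english_in_wm - welsh_english_wm)

# Matches that appear in both communities.
in_both = sorted(english_in_wm & welsh_english_wm)

print("In both communities:", in_both)
print("In the subset only:", anomalies)
print("True subset?", english_in_wm <= welsh_english_wm)
```

With real lists, any names appearing under "In the subset only" would be the anomalous matches of the kind described above.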

It seems clear that, at the moment, whilst it is helpful to know these matches have a UK connection, the Communities don't necessarily narrow that down to a particular branch of my family.  This is partly because my genetic matches and I might both be in the same multiple communities but also because, as Blaine points out in his post, just because a match shares a particular community with me, it doesn't mean that the community is definitely where the shared ancestry is from.  But the Genetic Communities certainly could be helpful 'pointers' to potential connections and I imagine they will also improve over time, so may eventually even hint at specific family lines, especially when combined with other information from known family history and shared matches.

What about those DNA matches that I have already identified some shared ancestry with - how do the Genetic Communities match up to our shared ancestry? 

Unfortunately, only two of those 'identified matches' appear in the same communities that I am in.  In one case, the match is in three of the communities I am in - the 'Welsh & English West Midlanders', 'English in the West Midlands' and 'English Midlanders and Northerners'.  There is quite an overlap between these three communities anyway, but it is reassuring that our shared ancestry is from around the Bromyard area, in north eastern Herefordshire.  The other match is in both the 'Southern English' and the 'English Midlanders and Northerners'.  In this case, our shared ancestry is in London in the later 1800s and then traces back to Wiltshire by the beginning of that century, so it looks as if the 'Southern English' community may be relevant to this - but, if I didn’t already know the connection, the shared 'English Midlanders and Northerners' could send us looking in the wrong place.

There is one other match who, whilst I don't know exactly how we relate, is known to be related to me on my mother's side, thanks to comparisons at Gedmatch.  They are in both the 'Southern English' and the 'English Midlanders and Northerners', either of which could be relevant to my mother's side of my family.  However, I have noticed that a third match, who is shared between the two of us, is showing as just in the 'Southern English' community, so that may possibly hint at where the shared ancestry is (although that community does take in everything under a line from South Wales to the Wash, so that's hardly narrowing things down :-) )

In another example, I do have a match who is in all four communities that I can see, but is a shared match to someone who is only in one of the four.  So the combination of the Genetic Communities with shared matches may be another topic to explore, to see if it can help indicate the potentially more relevant areas of the country to be researching in. 

However,  this may not be without its problems and may still be misleading to me.  For example, I have a match who shows up in just the 'Southern English' community, but both his profile and a shared match indicate there's likely to be a high level of Welsh ancestry.  Since I assume that I am not seeing any communities that my matches are in, but which I am not in, it's possible that they both share in a Welsh community,  and it's probably more likely that one of my West Midlands ancestors headed into Wales and connects into their trees that way, than the connection being in the south of England.

Shared matches are something I will write about in a separate post soon, so I shall perhaps consider the combined use of these two tools further in that.  I'm certainly grateful to AncestryDNA for the various tools they provide and look forward to future developments.  

I just know that I still have a lot to learn, to be able to work with the tools effectively!


Sources





Tuesday, 4 July 2017

A Day Out at UCL

It was with some slight trepidation that I set out last Tuesday morning for the Workshop on “Personal Genetic Testing: Challenges and Benefits in and Beyond the Clinic” at University College London (UCL). PGT covers more than just the ‘direct-to-consumer’ DNA tests that we genetic genealogists use and this was clearly going to be an “academic” day. Was it all going to be “over my head”?

But, considering my interest in DNA testing, and with people such as Debbie Kennett involved in the event, I had decided it was worth taking the risk. In the end, whilst I imagine the day might not have appealed to the ‘average’ genetic genealogist, I did find it interesting and useful, even if some of the topics were not directly relevant to me.

Following an uneventful train journey to Euston (would you believe that first class on London Midland was actually the cheapest ticket!), I arrived with plenty of time to spare, so took a few minutes to sit in the garden at the Friends Meeting House, which was between Euston and UCL, to enjoy the experience of being in London.

The first talk of the day was by Adam Rutherford.  Although he is well known as an author and presenter, it was actually the first time I’d heard him speak. I may not fully grasp the issues of “identity politics” but his talk was interesting and informative, and gave me a better understanding of how genetics-related topics I’d previously learnt about, such as Mendelian inheritance from biology lessons at school, and the “Nature vs Nurture” debate from when I was studying psychology, fit into the wider picture of genetics.  I also learnt a new term (‘Genetic isopoint’ – “the time at which everyone alive is the ancestor of everyone alive today, or no-one”*, which is said to be approximately the tenth century.)  It was enlightening (and slightly shocking) to hear how some topics, which Dr Rutherford described as “non-controversial” to geneticists, can be extremely controversial amongst some members of the general public.  Unfortunately, it would seem that the simplistic understanding many of us possibly have about genetics can lead to undesirable consequences, such as when a concept like the “warrior gene” becomes an accepted excuse for criminal behaviour.

Coffee break was followed by a focus group on the Science of Ancestry Testing, dealing with the tests we take for genealogy.  Rather than the panel members doing presentations, as in the later sessions, this was initiated by the moderator, Mark Thomas, posing some questions about what we actually mean when we use terms such as “ethnicity” and “ancestry”, and whether genetic testing is helping to debunk myths, or whether it is reinforcing them.  The representatives of two testing companies (Dave Nicholson, from LivingDNA, and Mike Mulligan, from Ancestry) made good points about their companies’ activities in education, and about the need to be trusted by their customers (and therefore having a solid scientific basis to their claims).  But there are clearly some differences of opinion with the scientists as to how scientific the simple ‘one-liners’ that often appear in adverts actually are.  And of course, it is often similar, simple one-liners that make the news headlines about DNA testing. There was a good comment from someone to the effect that phrases such as ‘the seven daughters of Eve’ may provide a “clear narrative” but are “scientifically problematic”.

So one “take home point” for me, from this session, was that I should try to be more critical and analytical about the things I read (and write) about genetic genealogy – people place their own interpretations on what they read, based on their own understanding and biases, and even terms such as “ancient ancestry” and “recent ancestry” often have different meanings, for example, when used by a genetic genealogist, as opposed to a population geneticist.  There is a need for clarity about how terms are being used in any particular context, as well as more awareness of the details underlying the headlines.

After the lunch break (when I joined most of the other genetic genealogists for an enjoyable lunch in the nearby Wellcome Institute Cafe), the first afternoon session concerned ethical issues in PGT.  This involved three presentations which were all thought-provoking, for different reasons.  Concepts such as “genomic sovereignty”, and the “forensic microbiome”, have certainly given me a few things to look up since I returned home*.  Whilst I cannot even imagine what it is like to live in a country such as Mexico, where thousands have been killed, or have disappeared, the second presentation, involving the question of the “personal or social” nature of genetic testing was one I could relate to more easily, having considered some of the issues myself when deciding to test at 23andMe (and in asking relatives to also test).  To know, or not to know, that is the question.  I was glad that one conclusion of the study was that people can make ethical decisions, if they have the relevant information.  The third presentation, concerning the ethical issues that arise in the use of DNA when dealing with disaster settings, is one I hope I never need to consider from a personal viewpoint.  Sadly very timely in the light of recent events, this was an insight into the very real challenges, and difficult decisions, faced by those who work in this field and raised many questions about the “Pandora’s box” that the ability to carry out DNA testing has opened.

The next session was a panel presenting social scientific perspectives on PGT and Identity. Unfortunately, I’ve always struggled with the “wordiness” of the social sciences so, for me, this was the least interesting session and reminded me of why I didn’t go into research following my psychology degree.

After another coffee break, there was a useful tutorial on the challenges of security and privacy in genomics.  It’s an important point to remember that, unlike passwords or bank details, there’s no “reset button” for our genomic data, which is why, even at the level of data we genealogists deal with, we should consider carefully what we share about it, and about those we connect to.

The final panel concerned medical and research aspects of PGT.  Again, these were interesting, even though not directly relevant to me.  The first, concerning personalised medicine and whole genome sequencing for genetic diagnosis, again illustrated some of the difficult decisions organisations such as the NHS face, when considering issues such as population screening, where the benefit of potentially discovering a curable disease at an early stage, needs to be weighed against the possibility of discovering other, untreatable, diseases at the same time. The second talk in this panel, and the final one of the day, was an enthusiastic presentation about open-access medical genomics, with particular concentration on the Personal Genome Project UK (PGP-UK). This introduced me to a few more “omics” terms (epigenomics and transcriptomics) to go with ‘genomics’, as well as describing how different types of data access affected ease of research. The PGP has a very intense application procedure, including an exam that even someone with a genetics PhD can fail, if they don’t read the information properly. So participants are very clear about what the project involves, and what open access of their data will mean, before they take part in the project.  I doubt there’ll be any concerns regarding a lack of informed consent in that project!

The day ended with an informal reception, which was another opportunity to catch up with the other genetic genealogists, and to hear their views of the day.

So, to sum up, I enjoyed the day and it opened my eyes to some of the wider issues concerning PGT and it wasn’t (entirely) over my head. I do feel that there is a gap between what the academics are focusing on and the priorities for many genetic genealogists. I imagine that the time some scientists have had to spend ‘debunking’ the more ridiculous claims that have been made regarding genetic identities of groups (such as of the Vikings), has influenced this. There clearly is scope for research into the relationship between DNA testing and identity, or ‘belonging’ – but I suspect that the majority of those testing initially do so from a sense of curiosity, rather than as a way of finding their place in the world or, as one participant put it, an “identity grab”, finding “distinctiveness in a complex world, with fractured identities”.  However, at the moment, there seems to be an overemphasis on the ancestry/ethnicity side of the tests and the claims relating to that aspect, rather than on the other aspects, such as “cousin matching” (in order to confirm researched family history, or to discover unknown parentage), which is a very important aspect for many genealogists who test.  Although I admit to having had a variety of reasons for the specific tests I have taken, or arranged to be taken, over the years, including curiosity and health issues, as well as using it as a tool for my one-name study, it is confirming my family history and finding new relatives that are the priority for me.  

Why does any of this matter?

There are probably several reasons, but here are a couple: I have visited societies that have had a talk by a scientist about DNA testing, and whose members have been left with the impression that direct-to-consumer DNA tests are overly expensive and not worth doing.  This concerns me, given that I am trying to encourage the use of such tests for genealogy.  I don’t mind people deciding against testing - I have several relatives who have done that and it is entirely their choice – but I’d like people to be making the decision based on accurate information.   Also, during the day I spoke to at least one person who supported the idea of regulation of the direct-to-consumer DNA tests.  This wasn’t the first time that I’d found myself involved in such a conversation, having previously experienced it at WDYTYA. It’s no surprise that the topic of regulation comes up, not just because of the concerns about unscientific “ancestry” claims, but when one considers that there are now companies claiming they can use your DNA to help you with your diet, your exercise, even your wine choice*, it can seem as if the general public might need protecting.

So I think it is important that we continue to engage with the scientists and academic community to ensure that how we are using DNA testing is based on sound scientific principles, and that the way we are using it is then properly understood and represented by those who may, one day, be involved in any potential regulation.  I am very grateful to the other genetic genealogists who attended last week, as I know most of them have a better understanding of these issues than I do.  I’m also grateful to the scientists and staff at UCL, who are enabling ongoing debate about the issues surrounding PGT.  Long may it continue.

And, hopefully, we will all end up better for it.


* Sources, references or other relevant links
Personal Genetic Testing: Challenges and Benefits in and Beyond the Clinic

Genetic Sovereignty - 
Genomic Sovereignty and "The Mexican Genome" - https://ore.exeter.ac.uk/repository/handle/10036/3500
Genomic sovereignty and the African promise: mining the African genome for the benefit of Africa

The increasing use of DNA in other aspects of life:
Diet and fitness – examples of scientific literature I found:
http://www.bmj.com/content/324/7351/1438 (Summary free, main article behind a paywall)
(And a search on google for “DNA diet” will give results from companies aiming to sell you such a test.  Caveat emptor!)
Wine choice

Sunday, 25 June 2017

Analysing my DNA: Crossovers Part 2

This is a continuation from my part 1 post at http://notjusttheparrys.blogspot.co.uk/2016/12/analysing-my-dna-crossovers-part-1.html.  My initial intention for this post was simply to look at the shared matches between the siblings, to see how those results correlate with the phasing of chromosome 21 represented in part 1.  That sounds easy enough but one of the reasons it has taken me so long to post, is that things very rapidly become complicated! 

So, in this post, I will look at the shared matching between the siblings and their three closest relatives - a niece, a first cousin and a third cousin once removed - and how adding the additional relatives caused me to alter my interpretation of how the niece matched. 


This was my starting point from part 1, the four siblings A, B, C, and D, with their chromosomes represented by the four colours:

The parents' chromosomes can then be represented as follows:


And which parts of the parents' chromosomes each of the siblings received like this:


One of the closest matches to the siblings is their niece, daughter of a deceased brother.  Since the brother was never tested, I don't know what crossover points he received from his parents. The niece will only have one chromosome (of each chromosome pair) from her father - but the differences in matching between the niece and the siblings could be as a result of crossovers within each of her father's chromosomes, or between the father's two chromosomes. 


So these are the comparisons between the niece and each of the siblings from Gedmatch, along with a potential crossover point identified at 43:

So immediately there is an issue - the niece matches sibling B up until 43, and, correctly, does not match any of the other siblings until that point.  However, beyond 43, the niece appears to match none of the siblings (based on the grey "match" bar).  But we know that, since siblings A and B do not match each other at all on this chromosome, the two of them, between them, cover all four of the siblings' parents' chromosome 21s.  So, if the niece doesn't match sibling B, then she has to match sibling A at least.  And, looking at the Gedmatch image, it seems quite clear that this is a threshold issue - the niece does actually match the three siblings A, C and D beyond 43.  The match just isn't being picked up as a match by Gedmatch at the default threshold.  Reducing the threshold indicates the niece matches all three siblings A, C and D on a segment of 6.9cM, containing between 1041 and 1045 SNPs.
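The argument that the niece must match either A or B at every point can be checked with a small Python sketch. It assumes each person carries one copy from each parent at any given position, and the labels `P1a`, `P1b`, `P2a` and `P2b` for the four parental copies are made up purely for the example.

```python
# The four parental copies of chromosome 21 at a single position
# (hypothetical labels: two copies per parent).
parent1 = {"P1a", "P1b"}
parent2 = {"P2a", "P2b"}

# A and B do not match each other, so they must hold different copies
# from each parent at this position.
siblings = {
    "A": {"P1a", "P2a"},
    "B": {"P1b", "P2b"},
}
assert not (siblings["A"] & siblings["B"])

# The niece carries just one of the four copies here (whichever her father
# inherited and passed on), so try each possibility in turn.
matched_by = {}
for niece_copy in sorted(parent1 | parent2):
    matched_by[niece_copy] = [
        name for name, copies in siblings.items() if niece_copy in copies
    ]

for niece_copy, names in matched_by.items():
    print(niece_copy, "-> matches sibling", names)
```

Whichever copy the niece has, exactly one of A or B shares it, which is why an apparent "no match" to both has to be a threshold effect rather than a real gap.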

The initial interpretation of the DNA received by the niece therefore became:


Next, I looked at how the siblings and the niece matched the siblings' paternal first cousin.  The Gedmatch image below was produced using the default threshold, but again, reducing the thresholds slightly indicated a potential matching segment just below the 7cM threshold:

Chr        Start Location        End Location        Centimorgans (cM)
21        14,677,076        22,936,413        18.2
21        22,950,552        33,423,011        15.7
21        34,132,054        37,056,381        6.7
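To show the effect of the threshold on the segments in the table above, here is a rough Python sketch. It only models a simple minimum-cM cut-off (treating "at or above the threshold" as a match is my assumption, and it ignores any SNP-count setting), and the lowered value of 6.0 is an arbitrary choice for illustration.

```python
# (start, end, cM) for the chromosome 21 segments listed in the table.
segments = [
    (14_677_076, 22_936_413, 18.2),
    (22_950_552, 33_423_011, 15.7),
    (34_132_054, 37_056_381, 6.7),
]

def reported(segments, min_cm):
    """Return the segments whose size in cM meets the minimum threshold."""
    return [seg for seg in segments if seg[2] >= min_cm]

default = reported(segments, min_cm=7.0)   # the 7cM threshold mentioned above
lowered = reported(segments, min_cm=6.0)   # arbitrary lower value for illustration

print(len(default), "segments at 7 cM;", len(lowered), "segments at 6 cM")
```

At 7cM the third segment drops out of the results, and it only reappears once the threshold is lowered.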




The paternal first cousin can only match the siblings through their father's chromosomes.  But, as their father will not have received exactly the same DNA as the first cousin's parent did, there will be some areas where the first cousin does not match any of the siblings.

By comparison to the phasing of the siblings and niece, the first cousin's matching segments were therefore mapped as follows:


(This process also indicated that the "Parent 2" phasing represents the siblings' father's chromosomes.)

So far, so good.

When I downloaded the matching segments for the siblings, in order to start investigating the shared matches, I realised a known relative shared DNA with sibling B on chromosome 21.  The relative is a 3rd cousin once removed (3c1r) and shares from about 17 to 28.  The shared ancestry is on the siblings' paternal side of the family, as it is for the 1c:

But now there's a problem.  This 3c1r does not match any of the other siblings, or the niece, on chromosome 21.  But, at the point where the  3c1r matches B, we have already "used" both of the paternal chromosomes, one for the matching between the first cousin and siblings ACD, the other for the matching between the niece and sibling B.   It's okay that the 1c doesn't match the 3c1r - that actually indicates that the chromosome ACD share with the 1c must be the one the siblings' father received from his mother, the siblings' grandmother, as she is also a common ancestor with the 1c. 

But, clearly the chromosome the niece shares with sibling B cannot be the other paternal chromosome.  As far as I am aware, there's no other shared ancestry with the 3c1r.  So, let's go back to the matching between the niece and the siblings - where did I go wrong?


Siblings A, C, and D all show a very small area of potentially matching SNPs between 24 and 26 - but it is only 1.5 cM and 365 SNPs.  I don't believe that has any significance, especially as there's no change in matching with sibling B. (The niece only has one relevant chromosome in this comparison - and the kit being used is a "paternal" one that's been phased using her mother's data, so should be fairly accurate.)

So what about the potentially matching segment with sibling C, between 37 - 39?  This is a 4.2 cM segment, containing 743 SNPs - so it is a small segment that, under normal circumstances, when matching to unknown and more distant relatives, should be ignored. 

From the sibling phasing, B and C are matching from 37, after C had a crossover, and their matching segment is a "Parent 1" segment.  So, is it possible that the niece's matching should actually be as follows:


The niece is matching B on a Parent 1 chromosome (now known to be maternal).  Sibling C then starts to match both B & the niece at 37, but the niece stops matching C at 39, as the niece has a crossover between the two chromosomes her father had.  If she switches from her father's maternal chromosome to his paternal chromosome, and those are also the two chromosomes sibling B has, that would account for why the niece continues to match B until 43.  At 43 there is then a crossover between the two chromosomes of Parent 2 - which would indicate a crossover in the niece's father, passed on to the niece within the segment from his paternal chromosome.  This interpretation would account for the niece's match to the 1c, between 40 - 43, and explain why she does not match the paternal 3c1r earlier on the chromosome, between 17 - 28.

If that is the situation, then the diagram of the siblings' parents' chromosomes can now be extended to also show the DNA received by their grandchild, the siblings' niece, as well as the potential source for the paternal chromosomes:


Please let me know if you can spot any mistakes in my reasoning.  

Sunday, 19 February 2017

A slight sidetrack - my LivingDNA results

I received my LivingDNA results earlier this month and have been doing some research as a result. I'm therefore taking a little side-track, to blog about that, rather than continuing with the post about mapping crossover points (which will be posted eventually, I promise!)

Details about the LivingDNA test can be found on their website (at https://www.livingdna.com ) and various other bloggers have already described their own results, in particular Debbie Kennett, who probably has the most detailed review of all areas of the results.* Here I am only concentrating on the "Family Ancestry" area, also known as the autosomal DNA.

Basically, unlike the other autosomal tests I have taken (at 23andMe, Family Tree DNA, and Ancestry DNA), where I regard their ethnicity predictions 'with a pinch of salt', and my main aim has been to obtain matches through which I can confirm and further my family history, the LivingDNA test is currently purely about ethnicity, about where we come from (although other features will be added later). The value of their test is that it has been developed in partnership with a range of scientific teams, such as those involved with the People of the British Isles project, enabling more precise predictions of origins for those people with British ancestry in particular, than the other tests currently available provide. Ethnicity, or 'Origins', predictions are dependent on the reference populations you are being compared to and, in the case of LivingDNA, this currently includes 80 world regions, with 21 regions in Britain and Ireland.

So what makes me, me?



The above image shows my DNA mix in the last 10 generations, at three levels of detail, by means of a family ancestry avatar, which is a bit of fun.  At the moment, only the "standard" mode is available, but "cautious" and "complete" views will be provided in the future.

The results are also shown in a map format, again at the three levels - global, regional and sub-regional. At the global level my Family Ancestry Overview indicates that I am 98.4% Europe and 1.6% World (unassigned).




At the regional level, the Europe 98.4% is broken down into
Great Britain and Ireland 91.3%
Europe (North and West) 4.9%
Europe (unassigned) 2.2%

The map for this looks very similar, but just in shades of green:


It's at the Sub-Regional level that the picture becomes much more interesting:


The following image shows the level of detail within the UK area, which indicates I have ancestry from at least 13 specific UK regions, with some DNA still unassigned:




As I had to reduce the size of the screenshots, to get all the figures in, here is the percentage breakdown:

Europe 98.4%

Great Britain and Ireland 91.3%
  • South Wales Border 41% 
  • Southeast England 10.1% 
  • East Anglia 8.1% 
  • Cumbria 4.5% 
  • Cornwall 3.4% 
  • Northwest England 3.3% 
  • South Wales 3.2% 
  • Devon 2.9% 
  • South Central England 2.8% 
  • South Yorkshire 2.6% 
  • Lincolnshire 1.2% 
  • Northumbria 1.2% 
  • Orkney 1.1% 
  • Great Britain and Ireland (unassigned) 5.7% 
Europe (North and West) 4.9%
  • Scandinavia 2.8% 
  • France 2.1% 
Europe (unassigned) 2.2%

World (unassigned) 1.6%
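As a quick arithmetic check on the breakdown above, the sub-regional figures can be added up and compared with the stated totals. This is just a minimal sketch in Python: the variable and function names are my own, and the only data used are the percentages listed above. Small differences from the stated totals would be expected if the displayed figures have been rounded.

```python
# Percentages copied from the breakdown above; all names here are illustrative.
gb_ireland = {
    "South Wales Border": 41.0, "Southeast England": 10.1, "East Anglia": 8.1,
    "Cumbria": 4.5, "Cornwall": 3.4, "Northwest England": 3.3,
    "South Wales": 3.2, "Devon": 2.9, "South Central England": 2.8,
    "South Yorkshire": 2.6, "Lincolnshire": 1.2, "Northumbria": 1.2,
    "Orkney": 1.1, "Great Britain and Ireland (unassigned)": 5.7,
}
north_west_europe = {"Scandinavia": 2.8, "France": 2.1}


def subtotal(parts):
    """Sum a group of percentages, rounded to one decimal place."""
    return round(sum(parts.values()), 1)


gb_total = subtotal(gb_ireland)          # compare with the stated 91.3
nw_total = subtotal(north_west_europe)   # compare with the stated 4.9
europe_total = round(91.3 + 4.9 + 2.2, 1)   # stated regional figures
world_total = round(98.4 + 1.6, 1)          # stated global figures

print("Great Britain and Ireland:", gb_total)
print("Europe (North and West):", nw_total)
print("Europe:", europe_total)
print("Overall:", world_total)
```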

I like the distribution maps but I really love the 'Do-nut' chart, as I think that gives a better indication of how much of me is thought to come from each of the regions, ie my percentage make-up:




You can see how much more detailed these results are, compared to those currently provided by the other companies:
Family Tree DNA (comparing to 18 population clusters) - 99% European (made up of 70% British Isles, 29% Scandinavian) and 1% Middle Eastern (North Africa),
23andMe (standard) (comparing to 31 populations worldwide) - 99.7% European (18.9% British & Irish, 0.4% French & German, 68% broadly Northwestern European, 0.4% Iberian, 0.3% broadly Southern European, 11.8% Broadly European), <0.1% Sub-Saharan African (Central & South African) and 0.2% unassigned
AncestryDNA (comparing to 26 global regions) - 99% European (63% Great Britain, 24% Ireland, 10% Europe West, 1% Finland/Northwest Russia, <1% Europe East ) and <1% Africa (Africa North)


Of course, what's important to me is how the genetics works with my genealogy, ie how well do these results match to where my family history indicates my ancestors came from?

Using the regional descriptions on the page at https://www.livingdna.com/en-gb/uk-regional-breakdown , I have coloured a pedigree chart with my ancestors' birthplaces (an idea copied from Debbie Kennett, who attributes the original idea to J. Paul Hawthorne with his #Mycolorfulancestry meme).


There are some problems trying to match the colours like this. One of my ancestors comes from Hampshire, which is in the South England region - a region which does not show up in my results, so I don't have a matching colour to use. I also have three unknown 3xgreat grandparents. Another issue is that the regional descriptions given on the above page differ from those given on my results pages.

On my results pages, it states "The areas of Shropshire, Herefordshire, Monmouthshire, Worcestershire, Powys and Gwent are collectively called the South Wales border" and, for South Wales, it states "unique southern signature is found in the modern counties of Pembrokeshire, Ceredigion, Carmarthenshire and West Glamorgan."

Whereas the regional breakdown page above describes the South Wales border as "approximately Herefordshire/Worcestershire/Shropshire/W Midlands and surrounding areas" and South Wales is then described as "approximately Pembrokeshire/Carmarthenshire/South Powys/Swansea/Glamorgan/Monmouthshire areas"

Thus my Monmouthshire, Breconshire and Radnorshire ancestors are in different regions in these two descriptions. (Breconshire and Radnorshire are now part of Powys.) So the three grey/blue "South Wales (or SW border)" entries in the pedigree above possibly should be orange, to match the rest of the "South Wales border" entries.

This seems more probable when I plot the known birthplaces of my 3xgreat grandparents using Genmap*:



As you can see, my paternal 3xgreat grandparents cluster around the South Wales border area and those within Monmouthshire, Breconshire and Radnorshire, are only just over the border so perhaps more likely to be genetically similar to the South Wales Border region than to the South Wales region.

The colours on my pedigree give the impression that a higher percentage of my DNA from my maternal ancestry should be in the Southeast England region. However, from the map, it is clear that my maternal ancestry generally is more spread out than my paternal ancestry and that those in the Southeast region are predominantly in London:




The ancestor with the red square around them (one of three plotted at that point in Lambeth) is known to have a German grandfather. Since the DNA results are said to relate to my DNA mix in the last 10 generations, and my pedigree is only showing 5 generations, then clearly there is a lot of potential for my other London based maternal ancestors to have arrived there from somewhere else in the country.

So, can the DNA results actually help me with tracing my ancestry, particularly with regard to my London ancestors? Should I be looking for connections, for example, to the north of England, or down in Cornwall?

Please note, I am just exploring ideas here.

In the Guild of One-Name Studies, we consider the frequencies and distributions of the surnames we study, as this can often shed light on the origins of the surnames - and potentially suggest locations that ancestors who suddenly "appear" somewhere might have come from.

So, using Steve Archer's Surname Atlas*, which maps the distributions and frequencies of surnames from the 1881 census, I've produced maps for each of the surnames of my known 3xgreat grandparents.

These are the distributions for the paternal surnames (15/16 known):


The majority of the surnames do show concentrations in Wales, or the South Wales border area, although there are some interesting "non-Welsh" distributions for Robinson, Taylor and Mitchell in particular. The surname Robinson does seem very concentrated across the north of England. Harris has both a south Wales and a Cornish concentration. Although the surname Parry shows a concentration across North Wales and predominantly in Anglesey, I know that this surname is a Welsh patronymic, and therefore has multiple origins across Wales. The dates of origin for this surname can be anywhere between about 1400 and 1800. So, for my family, I suspect the origin is more likely to be in the South Wales border area, where the known family were, and perhaps related to the concentration in Breconshire.

These are the distributions for the maternal surnames (only 12/16 known):


Interestingly, two of the surnames on my maternal side also show potential Welsh origins. More of these maternal surnames show a countrywide distribution, which makes it difficult to identify a specific "potential origin". But there are some concentrations in the North and also one surname, Rice, showing a concentration in Devon. The two concentrations of the Harland surname, both in coastal regions, do make me wonder if the one on the south coast could have been created by migration of some families from Yorkshire.

Obviously this sort of idea needs confirming properly through thorough research. But perhaps there are some hints from this surname mapping process, as to which of my ancestral lines might be the sources of the DNA from regions such as Cumbria.

I've already mentioned the German ancestry of one of my London 3xgreat grandfathers, whose grandfather was called John Michael Hengler. The other DNA companies provide "match lists" showing who else in their databases I relate to. Quite a few of these matches do have ancestry in the north of England and I have often wondered if this was due to other descendants of my ancestors having moved into those areas later. For example, whilst I descend from the daughter of John Michael Hengler, his son was married in Ireland and it appears that some descendants of that line ended up in Lancashire.

But I don't think that explanation would fit with the different regional results as discovered by the scientific research in the POBI project and LivingDNA. So maybe I will need to be looking further back to find the shared ancestry after all.

As a follow up to plotting the distributions, I did have a look at my Naylor ancestry. This is a surname that has cropped up in the pedigrees of some of my matches at Ancestry, so I had recently identified the potential link between my 2xgreat grandfather, William Naylor, born about 1838 in Islington, Middlesex, to his father George Richard Naylor, who was born in Gloucestershire. But I was intrigued by the Naylor surname distribution showing a concentration in Rutland and across Yorkshire. Proper confirmation is still required but initial research suggests that George Richard Naylor (or "Nayler") was the son of a Richard Nayler, a surgeon in Gloucestershire, and that this Richard was the son of another surgeon, a George Nayler of Stroud, Gloucestershire. But this George Nayler's wife was a Sarah, daughter of John Fark of Clitheroe, Lancashire.

Okay, so that's not the Naylers themselves coming from the north of England (so far) - but potentially it indicates there could be an ancestral line from that area.

As I said, I've just been exploring an idea with this - but clearly our family history and our DNA must tie in with each other. It's just a matter of us discovering how!

I gather that AncestryDNA will soon be releasing a "Genetic Communities" feature, as part of their DNA results*. Some people in the UK can already see a beta version of this, but it doesn't appear on my account. I am looking forward to its eventual release, as it will be interesting to see how it compares to my LivingDNA results (and whether another "theory" of mine could be true - that some of my DNA matches in the US potentially stem from Mormon converts from the Herefordshire border area - as I understand, from a comment by Debbie Kennett, that there's a "Mormon Pioneers" community listed in her results.*)

These are certainly exciting times to be involved in Genetic Genealogy!

* Notes and sources:
Bloggers' posts about their results:
Debbie Kennett:
https://cruwys.blogspot.co.uk/2017/01/my-living-dna-results-part-1-family.html
https://cruwys.blogspot.co.uk/2017/01/my-living-dna-results-part-2-mtdna-and.html
(Debbie includes links to other bloggers' posts at the end of part 1 above)

Genmap and Surname Atlas - programs by Steve Archer. See Archer Software, at http://www.archersoftware.co.uk/

Debbie's Mormon pioneers comment - on Ania Waterman's blog at https://ancestraladventures.wordpress.com/2017/02/09/new-ancestry-dna-feature

Saturday, 3 December 2016

Analysing my DNA: Crossovers Part 1

Recently I have been working on identifying the "crossover points" in the DNA of a group of four siblings. They are related to me, so the results of their DNA tests will not only help me in my ancestral searches, but also in discovering more about my own DNA and mapping it to specific ancestors. Crossover points are important for anyone who is trying to identify where segments of DNA came from, ie through which ancestors, as they indicate a change in the DNA, from that of one grandparent to that of the other grandparent in that couple.

There are detailed explanations of the processes involved available elsewhere online but these are the basics, for anyone who is new to this. We all have 23 pairs of chromosomes, one of each pair coming from our father and one from our mother. Just as we received them from our parents, so our parents received one of each pair of their chromosomes from each of their parents, any future child's grandparents:



Things aren't usually as simple as that shown above. Meiosis, the process of cell division that produces the egg or sperm and ensures the correct amount of DNA is passed on to the offspring, more commonly involves the two chromosomes in a pair splitting and then recombining in a different way, so that the resulting chromosome that's passed on is a mixture of the parent's two chromosomes:



 As the process of recombination is random, children of the same parents will each receive a different combination of the DNA that came to their parents from their grandparents:
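To illustrate the idea, here is a minimal sketch in Python of how one passed-on chromosome might be built up from a parent's two chromosomes. It is a deliberately simplified model, not a description of real biology: the function names, the "grandfather"/"grandmother" labels, the chromosome length, and the choice of at most two randomly placed crossovers are all my own assumptions for illustration.

```python
import random


def recombine(length, crossovers, start_with):
    """Build one passed-on chromosome as (start, end, grandparent) segments.

    The chromosome starts as a copy of `start_with` and switches to the
    other grandparent's DNA at every crossover position.
    """
    sources = ["grandfather", "grandmother"]
    current = sources.index(start_with)
    segments = []
    begin = 0
    for point in sorted(crossovers):
        segments.append((begin, point, sources[current]))
        current = 1 - current  # swap to the other grandparent
        begin = point
    segments.append((begin, length, sources[current]))
    return segments


def random_child(length, rng, max_crossovers=2):
    """Pick random crossover points (assumed uniform, which is a simplification)."""
    count = rng.randint(0, max_crossovers)
    points = rng.sample(range(1, length), count)
    start = rng.choice(["grandfather", "grandmother"])
    return recombine(length, points, start)


# Four siblings, each receiving their own random mixture from the same parent.
rng = random.Random(2016)
for name in "ABCD":
    print(name, random_child(100, rng))
```

Running this for several "children" gives each one a different pattern of segments, which is the situation the sibling comparisons later in this post try to unpick.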



Unfortunately, the DNA tests do not phase our results, so we cannot even identify our two separate chromosomes, unless we have other relatives tested. All we have to work with is the raw data and information about where we match other people.

In the case of a parent and child who have both tested, comparison with their matches may sometimes indicate a possible crossover:



Not only does this match share much less DNA with me than with my mother, but they also match a group of other people who match only my mother, not me, over the latter part of the segment, from about 122,000,000 to 134,000,000. So it appears that I may have a crossover in my maternal chromosome and did not receive the rest of the segment from the ancestor shared with this match. If I knew which side of my mother's family this match has a connection to then, if my mother and I have any shared matches starting after 125,000,000, I would know to concentrate my search for the shared ancestry on the other side of my mother's family.

But comparisons with a parent against other matches are only likely to reveal a few of the potential crossovers. A better method, available to anyone with a group of three or more siblings tested, is to use the sibling comparisons to identify crossovers. As already indicated, a group of siblings will have received different segments of the grandparents' DNA, but the results are not phased by any of the testing companies' chromosome browsers:



I have represented in orange the (approximate!) overall matching segments of the children - but, using Gedmatch, it is possible to also identify where two people fully match, ie match on both of their chromosomes, rather than just "half match", ie match on one chromosome. It is this, more complete, pattern of matching - changing between the states of having full, half, or no, matching DNA - which is used in order to identify the points where the DNA "crossed over" from one grandparent's DNA to the other.

There are probably several methods for doing this but I think most credit goes to Kathy Johnston for her "Visual Phasing" method. There are some very good blog posts about using Kathy's method, which I shall include links to below - it is worth reading several, as we all have different ways of describing what we do. I have now developed a slightly different method of working, which suits me better.

But I am going to start with the smallest chromosome, chromosome 21, and follow Kathy's instructions, to illustrate the basic method to start with.

These are the comparisons between the four siblings at Gedmatch:



This is the key to the Gedmatch colours:



 And these are the figures for the comparisons:



I have kept the figures separate from the individual chromosome images, as I found the crossover lines end up obscuring the figures on the chromosomes with more crossovers.

Despite having looked at comparisons between the siblings in various other formats (eg the FTDNA downloads), it was only when I did these comparisons that I realised Siblings A and B do not match each other on this chromosome.

Which goes to show how we often notice just what is present - not what is missing! 🙂

But, as you can see, even where two siblings do not match each other (ie they have a grey bar along the lower section, not a blue bar) there are still some base pairs showing a half, or even a full, match. There just aren't enough of these matching base pairs in a consecutive sequence for it to be regarded as genealogically relevant.
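The idea of full, half and non-matching positions can be sketched in a few lines of Python. This is only my own illustration of the general principle for unphased results, not Gedmatch's actual code: the function names and the sample genotypes are made up, and a real tool would also apply thresholds for segment length and number of SNPs.

```python
def snp_match_state(genotype_a, genotype_b):
    """Classify one position from two unphased genotypes such as "AG" and "GA"."""
    a = sorted(genotype_a)
    b = sorted(genotype_b)
    if a == b:
        return "full"   # both values agree, in either order
    if set(a) & set(b):
        return "half"   # at least one value is shared
    return "none"       # opposite values, eg "AA" against "GG"


def longest_matching_run(states):
    """Length of the longest consecutive run of half or full matches."""
    longest = current = 0
    for state in states:
        current = current + 1 if state != "none" else 0
        longest = max(longest, current)
    return longest


# Made-up genotypes for two people at four consecutive positions.
person_1 = ["AG", "AA", "CT", "GG"]
person_2 = ["GA", "AG", "CC", "AA"]
states = [snp_match_state(x, y) for x, y in zip(person_1, person_2)]
print(states, longest_matching_run(states))
```

Short runs like this can turn up by chance between any two people, which is why isolated half or full matches appear even where the siblings do not share a segment.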

The next step is for the crossover points to be identified. These are the points where there is a change between fully matching and half matching, or half matching and non-matching, ie where the bottom bar changes between blue and grey, or where the top bars change between an area that is consistently green and one which is predominantly yellow, with intermittent green. The former changes are also demonstrated by the figures. Unfortunately, the changes between fully matching and half matching are not specifically identified in any figures at Gedmatch, although Sue Griffith has explained how to obtain a very good estimate of them*. They can also be identified using one of David Pike's tools*.

Once the crossover points have been identified, they are allocated to particular siblings - a crossover "belongs" to the sibling who shows that change in all of their comparisons. (This isn't always obvious, especially if only using three siblings - sometimes, what looks like a single crossover for one sibling can actually be a double crossover for the two others. Having results available from more than three siblings is an advantage for me.)
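The allocation rule can be sketched as follows. This is my own minimal illustration in Python, with a hypothetical function name: for a given crossover position, list the comparisons that show a change there, then look for the sibling whose comparisons are exactly that set.

```python
def crossover_owner(siblings, changed_pairs):
    """Return the sibling(s) whose comparisons are exactly the ones that changed.

    `changed_pairs` lists the sibling pairs whose comparison shows a change
    at the crossover position being examined.
    """
    changed = {frozenset(pair) for pair in changed_pairs}
    owners = []
    for sibling in siblings:
        own_pairs = {frozenset((sibling, other))
                     for other in siblings if other != sibling}
        if own_pairs == changed:
            owners.append(sibling)
    return owners


siblings = ["A", "B", "C", "D"]
# A change seen in every comparison that involves C, and in no others.
print(crossover_owner(siblings, [("A", "C"), ("B", "C"), ("C", "D")]))
# A change seen in only one comparison cannot be allocated by this rule.
print(crossover_owner(siblings, [("A", "C")]))
```

With only three siblings the rule is less reliable: a change in the two comparisons involving one sibling is what a single crossover for that sibling looks like, but it is also what a double crossover for the other two would look like, so the answer from a sketch like this would still need checking by eye.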



In the comparison between B to D, the matching segment does seem to start before the crossover point indicated in the comparisons between D to A and D to C. I suspect this segment could be being artificially extended through some base pairs that just happen to match on both B and D. Issues like this are things to note for future investigation, as they may be a hint that something is wrong with the identification.

Next, working with just the identified crossover lines in an image, but referring to the comparison diagram and the figures, the phased segments of the grandparents' DNA are constructed, usually starting with a segment where two siblings are fully identical. In order to do this, four colours are chosen to represent the DNA received by the children from the grandparents. Two colours are used for the top grandparent couple and two for the bottom grandparent couple. [Note: if you follow a colour coded genealogy filing system, I would suggest choosing different colours for the chromosome mapping, at least until you are absolutely positive you have identified the correct grandparents' segments. At that point you could change the colours to match your genealogy system, which would then also be a visual clue that the chromosome is "confirmed". But, if you use those colours prior to such confirmation, you might find yourself becoming confused, as we do not yet know which grandparent couple is represented by which phased chromosome.]

At the start of this chromosome 21, B does not match any of their siblings, whereas the other three are all fully identical to each other, so the colours can be allocated as follows:



Since neither A nor B has any crossovers, their coloured bars can be extended for the full length of the chromosome. D's can also be extended as far as D's crossover line at "40":



 Between "37" and "40", C becomes half identical to all three of the other siblings. We don't know whether the crossover is on the maternal or the paternal chromosomes (and we haven't identified the colours as being for specific grandparents anyway), so we just have to pick one of the colours to change. I have chosen the top chromosome, purple changing to blue.  As this is the only crossover C has, the two bars can then be extended to the end:



At "40", D becomes half matching to A and B, but fully matching to C. The same chromosome that we changed for C therefore needs to change for D, in order to produce the correct pattern of matching, and the other colour can be extended, unchanged, to the end:



At this stage, we don't know which colours represent which grandparents - that can only be identified by comparison to other known relatives. But we can still look at the shared matches between the siblings, to see how those results correlate with the phasing represented here. For example, I would expect there to be no shared matches between A and B at any point on this chromosome, whereas A, C and D should have exactly the same matches prior to the point "37". B, C and D will share some matches after point "40", but not all of them. The ones C and D don't share with B after "40", should be people that match A, as well.
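The phasing described above can also be checked mechanically. In this minimal Python sketch each sibling has a top and a bottom colour for each of the three stretches of the chromosome (start to "37", "37" to "40", and "40" to the end), and two siblings are full, half or non-matching depending on how many of the two colours they share. Purple and blue are the top colours mentioned above; "bottom1" and "bottom2" are placeholder names I have made up for the other two colours, and the function name is also my own.

```python
SEGMENTS = ["start-37", "37-40", "40-end"]

# (top colour, bottom colour) for each sibling in each segment.
PHASING = {
    "A": [("purple", "bottom1")] * 3,
    "B": [("blue", "bottom2")] * 3,
    "C": [("purple", "bottom1"), ("blue", "bottom1"), ("blue", "bottom1")],
    "D": [("purple", "bottom1"), ("purple", "bottom1"), ("blue", "bottom1")],
}


def pair_states(first, second):
    """Expected match state for two siblings in each segment."""
    labels = {2: "full", 1: "half", 0: "none"}
    states = []
    for (top_1, bottom_1), (top_2, bottom_2) in zip(PHASING[first], PHASING[second]):
        shared = (top_1 == top_2) + (bottom_1 == bottom_2)
        states.append(labels[shared])
    return states


for first, second in [("A", "B"), ("A", "C"), ("A", "D"),
                      ("B", "C"), ("B", "D"), ("C", "D")]:
    print(first, second, dict(zip(SEGMENTS, pair_states(first, second))))
```

If the pattern produced for any pair disagreed with the Gedmatch comparison for that pair, it would suggest that a colour had been changed on the wrong bar or that a crossover had been allocated to the wrong sibling.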

So, in my next post, I will explore that. I'll also describe some of the issues I have come across in this process so far, as well as explain the way I have adapted Kathy's method to my own way of working.

But, if you thought this chromosome was easy to phase, then perhaps you'd like to consider the following set of comparisons:



[PS Having begun to look at the matches the siblings have with their niece and their 1st cousin, as well as the more distant matches, I have found an "anomaly". So, perhaps phasing chromosome 21 isn't so straightforward, after all!]


* Sources and references I have found helpful:

Kathy Johnston - step by step instructions for her method: http://forums.familytreedna.com/showthread.php?t=36812 (make sure you download both the slides and the instructions)
Jason Lee - a blog post detailing Kathy's method: http://dnagenealogy.tumblr.com/post/137722603308/the-use-of-crossover-lines-among-siblings-to
Blaine Bettinger's pdf combining his five posts about the phasing process - http://thegeneticgenealogist.com/wp-content/uploads/2016/11/Visual-Phasing-Bettinger.pdf

Two other bloggers with helpful posts about phasing, including issues such as the way what looks like a single crossover for one sibling can actually be a double crossover for two others:
Ann Raymont - https://dnasleuth.wordpress.com/2016/06/01/chromosome-mapping-with-siblings-part-2/ (and part 1)
Joel Hartley: http://www.jmhartley.com/HBlog/?p=2239

Sue Griffith's post on how to obtain the values for crossovers from FIR to HIR & vice versa: http://www.genealogyjunkie.net/blog/obtaining-fir-boundaries-on-gedmatch-using-the-little-tick-marks

David Pike has a number of free DNA tools, including the "Search for Shared DNA Segments in Two Raw Data Files" which reports single and double matching segments (ie half identical and fully identical): http://www.math.mun.ca/~dapike/FF23utils/pair-comp.php