Sunday, 25 June 2017

Analysing my DNA: Crossovers Part 2

This is a continuation from my part 1 post at http://notjusttheparrys.blogspot.co.uk/2016/12/analysing-my-dna-crossovers-part-1.html.  My initial intention for this post was simply to look at the shared matches between the siblings, to see how those results correlate with the phasing of chromosome 21 represented in part 1.  That sounds easy enough, but one of the reasons it has taken me so long to post is that things very rapidly become complicated!

So, in this post, I will look at the shared matching between the siblings and their three closest relatives - a niece, a first cousin and a third cousin once removed - and how adding the additional relatives caused me to alter my interpretation of how the niece matched. 

This was my starting point from part 1, the four siblings A, B, C, and D, with their chromosomes represented by the four colours:

The parents' chromosomes can then be represented as follows:

And which parts of the parents' chromosomes each of the siblings received like this:

One of the closest matches to the siblings is their niece, daughter of a deceased brother.  Since the brother was never tested, I don't know what crossover points he received from his parents. The niece will only have one chromosome (of each chromosome pair) from her father - but the differences in matching between the niece and the siblings could be as a result of crossovers within each of her father's chromosomes, or between the father's two chromosomes. 

So these are the comparisons between the niece and each of the siblings from Gedmatch, along with a potential crossover point identified at 43:

So immediately there is an issue - the niece matches sibling B up until 43, and, correctly, does not match any of the other siblings until that point.  However, beyond 43, the niece appears to match none of the siblings (based on the grey "match" bar).  But we know that, since sibling A and B do not match each other at all on this chromosome, the two siblings A and B, between them, cover all four of the siblings' parents' chromosome 21s.  So, if the niece doesn't match sibling B, then she has to match sibling A at least.  And, looking at the Gedmatch image, it seems quite clear that this is a threshold issue - the niece does actually match the three siblings A, C and D beyond 43.  The match just isn't being picked up as a match by Gedmatch at the default threshold.  Reducing the threshold indicates the niece matches all three siblings A, C and D by 6.9cM, containing between 1041 - 1045 SNPs.

The initial interpretation of the DNA received by the niece therefore became:

Next, I looked at how the siblings and the niece matched the siblings' paternal first cousin.  The Gedmatch image below was produced using the default threshold, but again, reducing the thresholds slightly indicated a potential matching segment just below the 7cM threshold:

Chr        Start Location        End Location        Centimorgans (cM)
21        14,677,076        22,936,413        18.2
21        22,950,552        33,423,011        15.7
21        34,132,054        37,056,381        6.7
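To illustrate the threshold effect, here is a minimal Python sketch using the three segments from the table above. It only applies a centimorgan cut-off (comparison tools generally also apply a minimum SNP count, which is ignored here), and the lower value of 5.0 is an arbitrary figure chosen for illustration, not a setting I am recommending.

```python
# Segments from the table above: (chromosome, start, end, centimorgans).
segments = [
    (21, 14_677_076, 22_936_413, 18.2),
    (21, 22_950_552, 33_423_011, 15.7),
    (21, 34_132_054, 37_056_381, 6.7),
]

def segments_reported(segments, min_cm):
    """Return only the segments at least `min_cm` long (a simple length cut-off)."""
    return [seg for seg in segments if seg[3] >= min_cm]

default_view = segments_reported(segments, min_cm=7.0)  # the 7 cM threshold mentioned above
lowered_view = segments_reported(segments, min_cm=5.0)  # 5.0 is an illustrative lower value

print(len(default_view), "segments reported at 7 cM")
print(len(lowered_view), "segments reported at 5 cM")
```

With the lower cut-off the 6.7 cM segment appears, which is the same kind of "hidden" match described for the niece.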

The paternal first cousin can only match the siblings through their father's chromosomes.  But, as their father will not have received exactly the same DNA as the first cousin's parent did, there will be some areas where the first cousin does not match any of the siblings.

By comparison to the phasing of the siblings and niece, the first cousin's matching segments were therefore mapped as follows:

(this process also indicated that the "Parent 2" phasing represents the siblings' father's chromosomes.)

So far, so good.

When I downloaded the matching segments for the siblings, in order to start investigating the shared matches, I realised a known relative shared DNA with sibling B on chromosome 21.  The relative is a 3rd cousin once removed (3c1r) and shares from about 17 to 28.  The shared ancestry is on the siblings' paternal side of the family, the same as for the 1c:

But now there's a problem.  This 3c1r does not match any of the other siblings, or the niece, on chromosome 21.  But, at the point where the  3c1r matches B, we have already "used" both of the paternal chromosomes, one for the matching between the first cousin and siblings ACD, the other for the matching between the niece and sibling B.   It's okay that the 1c doesn't match the 3c1r - that actually indicates that the chromosome ACD share with the 1c must be the one the siblings' father received from his mother, the siblings' grandmother, as she is also a common ancestor with the 1c. 

But, clearly the chromosome the niece shares with sibling B cannot be the other paternal chromosome.  As far as I am aware, there's no other shared ancestry with the 3c1r.  So, let's go back to the matching between the niece and the siblings - where did I go wrong?

Siblings A, C, and D all show a very small area of potentially matching SNPs between 24 and 26 - but it is only 1.5 cM and 365 SNP.  I don't believe that has any significance, especially as there's no change in matching with sibling B. (The niece only has one relevant chromosome in this comparison - and the kit being used is a "paternal" one that's been phased using her mother's data, so should be fairly accurate.)

So what about the potentially matching segment with sibling C, between 37 - 39?  This is a 4.2 cM segment, containing 743 SNPs - so it is a small segment that, under normal circumstances, when matching to unknown and more distant relatives, should be ignored. 

From the sibling phasing, B and C are matching from 37, after C had a crossover, and their matching segment is a "Parent 1" segment.  So, is it possible that the niece's matching should actually be as follows:

The niece is matching B on a Parent 1 chromosome (now known to be maternal).  Sibling C then starts to match both B & the niece at 37, but the niece stops matching C at 39, as the niece has a crossover between the two chromosomes her father had.  If she switches from her father's maternal chromosome to his paternal chromosome, and those are also the two chromosomes sibling B has, that would account for why the niece continues to match B until 43.  At 43 there is then a crossover between the two chromosomes of Parent 2 - which would indicate a crossover in the niece's father, passed on to the niece within the segment from his paternal chromosome.  This interpretation would account for the niece's match to the 1c, between 40 - 43, and explain why she does not match the paternal 3c1r earlier on the chromosome, between 17 - 28.

If that is the situation, then the diagram of the siblings' parents' chromosomes can now be extended to also show the DNA received by their grandchild, the siblings' niece, as well as the potential source for the paternal chromosomes:

Please let me know if you can spot any mistakes in my reasoning.  

Sunday, 19 February 2017

A slight sidetrack - my LivingDNA results

I received my LivingDNA results earlier this month and have been doing some research as a result. I'm therefore taking a little side-track, to blog about that, rather than continuing with the post about mapping crossover points (which will be posted eventually, I promise!)

Details about the LivingDNA test can be found on their website (at https://www.livingdna.com ) and various other bloggers have already described their own results, in particular Debbie Kennett, who probably has the most detailed review of all areas of the results.* Here I am only concentrating on the "Family Ancestry" area, also known as the autosomal DNA.

Basically, unlike the other autosomal tests I have taken (at 23andMe, Family Tree DNA, and Ancestry DNA), where I regard their ethnicity predictions 'with a pinch of salt', and my main aim has been to obtain matches through which I can confirm and further my family history, the LivingDNA test is currently purely about ethnicity, about where we come from (although other features will be added later). The value of their test is that it has been developed in partnership with a range of scientific teams, such as those involved with the People of the British Isles project, enabling more precise predictions of origins for those people with British ancestry in particular, than the other tests currently available provide. Ethnicity, or 'Origins', predictions are dependent on the reference populations you are being compared to and, in the case of LivingDNA, this currently includes 80 world regions, with 21 regions in Britain and Ireland.

So what makes me, me?

The above image shows my DNA mix in the last 10 generations, at three levels of detail, through means of a family ancestry avatar, which is a bit of fun.  At the moment, only the "standard" mode is available, but "cautious" and "complete" views will be provided in the future.

The results are also shown in a map format, again at the three levels - global, regional and sub-regional. At the global level my Family Ancestry Overview indicates that I am 98.4% Europe and 1.6% World (unassigned).

At the regional level, the Europe 98.4% is broken down into
Great Britain and Ireland 91.3%
Europe (North and West) 4.9%
Europe (unassigned) 2.2%

The map for this looks very similar, but just in shades of green:

It's at the Sub-Regional level that the picture becomes much more interesting:

The following image shows the level of detail within the UK area, which indicates I have ancestry from at least 13 specific UK regions, with some DNA still unassigned:

As I had to reduce the size of the screenshots, to get all the figures in, here is the percentage breakdown:

Europe 98.4%

Great Britain and Ireland 91.3%
  • South Wales Border 41% 
  • Southeast England 10.1% 
  • East Anglia 8.1% 
  • Cumbria 4.5% 
  • Cornwall 3.4% 
  • Northwest England 3.3% 
  • South Wales 3.2% 
  • Devon 2.9% 
  • South Central England 2.8% 
  • South Yorkshire 2.6% 
  • Lincolnshire 1.2% 
  • Northumbria 1.2% 
  • Orkney 1.1% 
  • Great Britain and Ireland (unassigned) 5.7% 
Europe (North and West) 4.9%
  • Scandinavia 2.8% 
  • France 2.1% 
Europe (unassigned) 2.2%

World (unassigned) 1.6%
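As a quick arithmetic check on the breakdown above, this small Python sketch adds up each level and compares it with the stated parent figure. The numbers are copied from the list; the use of `Decimal` is just my choice to avoid floating-point rounding noise.

```python
from decimal import Decimal

def total(values):
    """Add percentage strings exactly, avoiding floating-point rounding noise."""
    return sum((Decimal(v) for v in values), Decimal("0"))

gb_and_ireland_parts = ["41", "10.1", "8.1", "4.5", "3.4", "3.3", "3.2",
                        "2.9", "2.8", "2.6", "1.2", "1.2", "1.1", "5.7"]
north_west_parts = ["2.8", "2.1"]
europe_parts = ["91.3", "4.9", "2.2"]
world_parts = ["98.4", "1.6"]

print("GB & Ireland sub-regions:", total(gb_and_ireland_parts), "(stated 91.3)")
print("Europe (North and West):", total(north_west_parts), "(stated 4.9)")
print("Europe:", total(europe_parts), "(stated 98.4)")
print("World:", total(world_parts), "(should be 100)")
```

The sub-regional figures for Great Britain and Ireland come to 91.1 rather than 91.3. My guess is that this is rounding in the displayed figures (the 41% is shown without a decimal place), but that is an assumption, not something stated in the results.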

I like the distribution maps but I really love the 'Do-nut' chart, as I think that gives a better indication of how much of me is thought to come from each of the regions, ie my percentage make-up:

You can see how much more detailed these results are, compared to those currently provided by the other companies:
Family Tree DNA (comparing to 18 population clusters) - 99% European (made up of 70% British Isles, 29% Scandinavian) and 1% Middle Eastern (North Africa),
23andMe (standard) (comparing to 31 populations worldwide) - 99.7% European (18.9% British & Irish, 0.4% French & German, 68% broadly Northwestern European, 0.4% Iberian, 0.3% broadly Southern European, 11.8% Broadly European), <0.1% Sub-Saharan African (Central & South African) and 0.2% unassigned
AncestryDNA (comparing to 26 global regions) - 99% European (63% Great Britain, 24% Ireland, 10% Europe West, 1% Finland/Northwest Russia, <1% Europe East ) and <1% Africa (Africa North)

Of course, what's important to me is how the genetics works with my genealogy, ie how well do these results match to where my family history indicates my ancestors came from?

Using the regional descriptions on the page at https://www.livingdna.com/en-gb/uk-regional-breakdown , I have coloured a pedigree chart with my ancestors' birthplaces (an idea copied from Debbie Kennett, who attributes the original idea to J. Paul Hawthorne with his #Mycolorfulancestry meme).

There are some problems trying to match the colours like this. One of my ancestors comes from Hampshire, which is in the South England region - a region which does not show up in my results, so I don't have a matching colour to use. I also have three unknown 3xgreat grandparents. Another issue is that the regional descriptions given on the above page differ from those given on my results pages.

On my results pages, it states "The areas of Shropshire, Herefordshire, Monmouthshire, Worcestershire, Powys and Gwent are collectively called the South Wales border" and, for South Wales, it states "unique southern signature is found in the modern counties of Pembrokeshire, Ceredigion, Carmarthenshire and West Glamorgan."

Whereas the regional breakdown page above describes the South Wales border as "approximately Herefordshire/Worcestershire/Shropshire/W Midlands and surrounding areas" and South Wales is then described as "approximately Pembrokeshire/Carmarthenshire/South Powys/Swansea/Glamorgan/Monmouthshire areas"

Thus my Monmouthshire, Breconshire and Radnorshire ancestors are in different regions in these two descriptions. (Breconshire and Radnorshire are now part of Powys.) So the three grey/blue "South Wales (or SW border)" entries in the pedigree above possibly should be orange, to match the rest of the "South Wales border" entries.

This seems more probable when I plot the known birthplaces of my 3xgreat grandparents using Genmap*:

As you can see, my paternal 3xgreat grandparents cluster around the South Wales border area and those within Monmouthshire, Breconshire and Radnorshire, are only just over the border so perhaps more likely to be genetically similar to the South Wales Border region than to the South Wales region.

The colours on my pedigree give the impression that a higher percentage of my DNA from my maternal ancestry should be in the Southeast England region. However, from the map, it is clear that my maternal ancestry generally is more spread out than my paternal ancestry and that those in the Southeast region are predominantly in London:

The ancestor with the red square around them (one of three plotted at that point in Lambeth) is known to have a German grandfather. Since the DNA results are said to relate to my DNA mix in the last 10 generations, and my pedigree is only showing 5 generations, then clearly there is a lot of potential for my other London based maternal ancestors to have arrived there from somewhere else in the country.

So, can the DNA results actually help me with tracing my ancestry, particularly with regard to my London ancestors? Should I be looking for connections, for example, to the north of England, or down in Cornwall?

Please note, I am just exploring ideas here.

In the Guild of One-Name Studies, we consider the frequencies and distributions of the surnames we study, as this can often shed light on the origins of the surnames - and potentially suggest locations that ancestors who suddenly "appear" somewhere might have come from.

So, using Steve Archer's Surname Atlas*, which maps the distributions and frequencies of surnames from the 1881 census, I've produced maps for each of the surnames of my known 3xgreat grandparents.

These are the distributions for the paternal surnames (15/16 known):

The majority of the surnames do show concentrations in Wales, or the South Wales border area, although there are some interesting "non-Welsh" distributions for Robinson, Taylor and Mitchell in particular. The surname Robinson does seem very concentrated across the north of England. Harris has both a south Wales and a Cornish concentration. Although the surname Parry shows a concentration across North Wales and predominantly in Anglesey, I know that this surname is a Welsh patronymic, and therefore has multiple origins across Wales. The dates of origin for this surname can be anywhere between about 1400 - 1800. So, for my family, I suspect the origin is more likely to be in the South Wales border area, where the known family were, and perhaps related to the concentration in Breconshire.

These are the distributions for the maternal surnames (only 12/16 known):

Interestingly, two of the surnames on my maternal side also show potential Welsh origins. More of these maternal surnames show a countrywide distribution, which makes it difficult to identify a specific "potential origin". But there are some concentrations in the North and also one surname, Rice, showing a concentration in Devon. The two concentrations of the Harland surname, both in coastal regions, do make me wonder if the one on the south coast could have been created by migration of some families from Yorkshire.

Obviously this sort of idea needs confirming properly through thorough research. But perhaps there are some hints from this surname mapping process, as to which of my ancestral lines might be the sources of the DNA from regions such as Cumbria.

I've already mentioned the German ancestry of one of my London 3xgreat grandfathers, whose grandfather was called John Michael Hengler. The other DNA companies provide "match lists" showing who else in their databases I relate to. Quite a few of these matches do have ancestry in the north of England and I have often wondered if this was due to other descendants of my ancestors having moved into those areas later. For example, whilst I descend from the daughter of John Michael Hengler, his son was married in Ireland and it appears that some descendants of that line ended up in Lancashire.

But I don't think that explanation would fit with the different regional results as discovered by the scientific research in the POBI project and LivingDNA. So maybe I will need to be looking further back to find the shared ancestry after all.

As a follow up to plotting the distributions, I did have a look at my Naylor ancestry. This is a surname that has cropped up in the pedigrees of some of my matches at Ancestry, so I had recently identified the potential link between my 2xgreat grandfather, William Naylor, born about 1838 in Islington, Middlesex, to his father George Richard Naylor, who was born in Gloucestershire. But I was intrigued by the Naylor surname distribution showing a concentration in Rutland and across Yorkshire. Proper confirmation is still required but initial research suggests that George Richard Naylor (or "Nayler") was the son of a Richard Nayler, a surgeon in Gloucestershire, and that this Richard was the son of another surgeon, a George Nayler of Stroud, Gloucestershire. But this George Nayler's wife was a Sarah, daughter of John Fark of Clitheroe, Lancashire.

Okay, so that's not the Naylers themselves coming from the north of England (so far) - but potentially it indicates there could be an ancestral line from that area.

As I said, I've just been exploring an idea with this - but clearly our family history and our DNA must tie in with each other. It's just a matter of us discovering how!

I gather that AncestryDNA will soon be releasing a "Genetic Communities" feature, as part of their DNA results*. Some people in the UK can already see a beta version of this, but it doesn't appear on my account. I am looking forward to its eventual release, as it will be interesting to see how it compares to my LivingDNA results (and whether another "theory" of mine could be true - that some of my DNA matches in the US potentially stem from Mormon converts from the Herefordshire border area - as I understand, from a comment by Debbie Kennett, that there's a "Mormon Pioneers" community listed in her results.*)

These are certainly exciting times to be involved in Genetic Genealogy!

* Notes and sources:
Bloggers' posts about their results:
Debbie Kennett:
(Debbie includes links to other bloggers' posts at the end of the part 1 above)

Genmap and Surname Atlas - programs by Steve Archer. See Archer Software, at http://www.archersoftware.co.uk/

Debbie's Mormon pioneers comment - on Ania Waterman's blog at https://ancestraladventures.wordpress.com/2017/02/09/new-ancestry-dna-feature

Saturday, 3 December 2016

Analysing my DNA: Crossovers Part 1

Recently I have been working on identifying the "crossover points" in the DNA of a group of four siblings. They are related to me, so the results of their DNA tests will not only help me in my ancestral searches, but also in discovering more about my own DNA and mapping it to specific ancestors. Crossover points are important for anyone who is trying to identify where segments of DNA came from, ie through which ancestors, as they indicate a change in the DNA, from that of one grandparent to that of the other grandparent in that couple.

There are detailed explanations of the processes involved available elsewhere online but these are the basics, for anyone who is new to this. We all have 23 pairs of chromosomes, one of each pair coming from our father and one from our mother. Just as we received them from our parents, so our parents received one of each pair of their chromosomes from each of their parents, any future child's grandparents:

Things aren't usually as simple as that shown above. Meiosis, the process of cell division that produces the egg or sperm and ensures the correct amount of DNA is passed on to the offspring, more commonly involves the two chromosomes in a pair splitting and then recombining in a different way, so that the resulting chromosome that's passed on is a mixture of the two of the parent:

 As the process of recombination is random, children of the same parents will each receive a different combination of the DNA that came to their parents from their grandparents:

Unfortunately, the DNA tests do not phase our results, so we cannot even identify our two separate chromosomes, unless we have other relatives tested. All we have to work with is the raw data and information about where we match other people.

In the case of a parent and child who have both tested, comparison with their matches may sometimes indicate a possible crossover:

Not only does the match share much less DNA with me than with my mother, they also match a group of other people who only match my mother, not me, over the latter part of the segment, from about 122,000,000 to 134,000,000. So it appears that I may have a crossover in my maternal chromosome and did not receive the rest of the segment from the ancestor shared with this match. If I knew which side of my mother's family this match has a connection to then, if my mother and I have any shared matches starting after 125,000,000, I would know to concentrate my search for the shared ancestry on the other side of my mother's family.
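The parent and child comparison can be sketched in a few lines of Python. This is only an illustration of the reasoning: the segment start position and the `tolerance` value are hypothetical (the post only gives the approximate 122,000,000 and 134,000,000 end points), and a real comparison would also have to allow for fuzzy segment boundaries.

```python
def possible_crossovers(parent_seg, child_seg, tolerance=1_000_000):
    """Compare the same match's segment against a parent and against the child.

    Each segment is (start, end). Where the child's segment stops well short
    of the parent's, the child may have a crossover near that point.
    """
    points = []
    if child_seg[0] - parent_seg[0] > tolerance:
        points.append(child_seg[0])
    if parent_seg[1] - child_seg[1] > tolerance:
        points.append(child_seg[1])
    return points

# Start position 110,000,000 is hypothetical; the end points are the
# approximate figures quoted above.
match_vs_mother = (110_000_000, 134_000_000)
match_vs_me = (110_000_000, 122_000_000)

print(possible_crossovers(match_vs_mother, match_vs_me))
```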

But comparisons of a parent and child against other matches are only likely to reveal a few of the potential crossovers. A better method, available to anyone with a group of three or more siblings tested, is to use the sibling comparisons to identify crossovers. As already indicated, a group of siblings will have received different segments of the grandparents' DNA, but the results are not phased by any of the testing companies' chromosome browsers:

I have represented in orange the (approximate!) overall matching segments of the children - but, using Gedmatch, it is possible to also identify where two people fully match, ie match on both of their chromosomes, rather than just "half match", ie match on one chromosome. It is this, more complete, pattern of matching - changing between the states of having full, half, or no, matching DNA - which is used in order to identify the points where the DNA "crossed over" from one grandparent's DNA to the other.

There are probably several methods for doing this but I think most credit goes to Kathy Johnston for her "Visual Phasing" method. There are some very good blog posts about using Kathy's method, which I shall include links to below - it is worth reading several, as we all have different ways of describing what we do. I have now developed a slightly different method of working, which suits me better.

But I am going to begin with the smallest chromosome, chromosome 21, and follow Kathy's instructions, to illustrate the basic method.

These are the comparisons between the four siblings at Gedmatch:

This is the key to the Gedmatch colours:

 And these are the figures for the comparisons:

I have kept the figures separate from the individual chromosome images, as I found the crossover lines end up obscuring the figures on the chromosomes with more crossovers.

Despite having looked at comparisons between the siblings in various other formats (eg the FTDNA downloads), it was only when I did these comparisons that I realised Siblings A and B do not match each other on this chromosome.

Which goes to show how we often notice just what is present - not what is missing! 🙂

But, as you can see, even where two siblings do not match each other (ie they have a grey bar along the lower section, not a blue bar) there are still some base pairs showing a half, or even a full, match. There just aren't enough of these matching base pairs in a consecutive sequence for it to be regarded as genealogically relevant.

The next step is for the crossover points to be identified. These are the points where there is a change between fully matching and half matching, or half matching and non-matching, ie where the bottom bar changes between blue and grey, or where the top bars change between an area that is consistently green and one which is predominantly yellow, with intermittent green. The former changes are also demonstrated by the figures. Unfortunately, the changes between fully matching and half matching are not specifically identified in any figures at Gedmatch, although Sue Griffith has explained how to obtain a very good estimate of them*. They can also be identified using one of David Pike's tools*.
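In code terms, finding the crossover points for one pair of siblings amounts to walking along the chromosome and noting every place where the matching state changes. The sketch below assumes the comparison has already been reduced to ordered intervals labelled "full", "half" or "none". The positions use the same rough units as the "37" and "40" in this post, and the start and end values (14 and 48) are illustrative.

```python
def crossover_points(intervals):
    """Return positions where the matching state changes.

    `intervals` is an ordered list of (start, end, state), with state being
    "full", "half" or "none" for one pair of siblings.
    """
    points = []
    for (_, end, state), (_, _, next_state) in zip(intervals, intervals[1:]):
        if state != next_state:
            points.append(end)
    return points

# Idealised comparisons (start and end positions are illustrative).
a_vs_c = [(14, 37, "full"), (37, 48, "half")]
a_vs_b = [(14, 48, "none")]
c_vs_d = [(14, 37, "full"), (37, 40, "half"), (40, 48, "full")]

print(crossover_points(a_vs_c))
print(crossover_points(a_vs_b))
print(crossover_points(c_vs_d))
```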

Once the crossover points have been identified, they are allocated to particular siblings - a crossover "belongs" to the sibling who shows that change in all of their comparisons. (This isn't always obvious, especially if only using three siblings - sometimes, what looks like a single crossover for one sibling can actually be a double crossover for the two others. Having results available from more than three siblings is an advantage for me.)
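The allocation rule can be written as a small check: a crossover belongs to the sibling for whom every one of their comparisons changes at that point. This is a simplified sketch, using idealised change points for the four siblings on chromosome 21. Real data needs some tolerance on the positions, and this does not handle the "double crossover" situation mentioned above.

```python
siblings = ["A", "B", "C", "D"]

# For each pair, the positions where the matching state changes (idealised).
changes = {
    frozenset("AB"): [],
    frozenset("AC"): [37],
    frozenset("AD"): [40],
    frozenset("BC"): [37],
    frozenset("BD"): [40],
    frozenset("CD"): [37, 40],
}

def crossover_owners(position, changes, siblings, tolerance=0):
    """Return the siblings whose comparisons ALL show a change at `position`."""
    owners = []
    for sibling in siblings:
        pairs = [pair for pair in changes if sibling in pair]
        if all(any(abs(p - position) <= tolerance for p in changes[pair])
               for pair in pairs):
            owners.append(sibling)
    return owners

print(37, crossover_owners(37, changes, siblings))
print(40, crossover_owners(40, changes, siblings))
```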

In the comparison between B to D, the matching segment does seem to start before the crossover point indicated in the comparisons between D to A and D to C. I suspect this segment could be being artificially extended through some base pairs that just happen to match on both B and D. Issues like this are things to note for future investigation, as they may be a hint that something is wrong with the identification.

Next, working with just the identified crossover lines in an image, but referring to the comparison diagram and the figures, the phased segments of the grandparents' DNA are constructed, usually starting with a segment where two siblings are fully identical. In order to do this, four colours are chosen to represent the DNA received by the children from the grandparents.  Two colours are used for the top grandparent couple and two for the bottom grandparent couple.  [Note: if you follow a colour coded genealogy filing system, I would suggest choosing different colours for the chromosome mapping, at least until you are absolutely positive you have identified the correct grandparents' segments. At that point you could change the colours to match your genealogy system, which would then also be a visual clue that the chromosome is "confirmed". If you use those colours prior to such confirmation, you might find yourself becoming confused, as we do not yet know which grandparent couple is represented by which phased chromosome.]

At the start of this chromosome 21, B does not match any of their siblings, whereas the other three are all fully identical to each other, so the colours can be allocated as follows:

Since neither A nor B has any crossovers, their coloured bars can be extended for the full length of the chromosome. D's can also be extended as far as D's crossover line at 40:

 Between "37" and "40", C becomes half identical to all three of the other siblings. We don't know whether the crossover is on the maternal or the paternal chromosomes (and we haven't identified the colours as being for specific grandparents anyway), so we just have to pick one of the colours to change. I have chosen the top chromosome, purple changing to blue.  As this is the only crossover C has, the two bars can then be extended to the end:

At "40", D becomes half matching to A and B, but fully matching to C. The same chromosome that we changed for C therefore needs to change for D, in order to produce the correct pattern of matching, and the other colour can be extended, unchanged, to the end:

At this stage, we don't know which colours represent which grandparents - that can only be identified by comparison to other known relatives. But we can still look at the shared matches between the siblings, to see how those results correlate with the phasing represented here. For example, I would expect there to be no shared matches between A and B at any point on this chromosome, whereas A, C and D should have exactly the same matches prior to the point "37". B, C and D will share some matches after point "40", but not all of them. The ones C and D don't share with B after "40", should be people that match A, as well.
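One way to sanity-check a phasing like this is to write the coloured bars down as data and then recompute what each pair of siblings should look like at any position. The sketch below does that for the phasing described above. The top colours (purple and blue) are the ones used in this post, whereas "bottom_1" and "bottom_2" are placeholder names I have made up for the other two colours, and the start and end positions (0 and 48) are illustrative.

```python
# Each sibling: ordered list of (start, end, top_colour, bottom_colour).
phased = {
    "A": [(0, 48, "purple", "bottom_1")],
    "B": [(0, 48, "blue", "bottom_2")],
    "C": [(0, 37, "purple", "bottom_1"), (37, 48, "blue", "bottom_1")],
    "D": [(0, 40, "purple", "bottom_1"), (40, 48, "blue", "bottom_1")],
}

def colours_at(sibling, position):
    """Return the (top, bottom) colours a sibling carries at a position."""
    for start, end, top, bottom in phased[sibling]:
        if start <= position < end:
            return top, bottom
    raise ValueError(f"position {position} is outside the mapped range")

def match_state(sib1, sib2, position):
    """Return "full", "half" or "none" for two siblings at a position."""
    top1, bottom1 = colours_at(sib1, position)
    top2, bottom2 = colours_at(sib2, position)
    shared = (top1 == top2) + (bottom1 == bottom2)
    return {0: "none", 1: "half", 2: "full"}[shared]

for position in (20, 38, 45):
    print(position, {pair: match_state(pair[0], pair[1], position)
                     for pair in ("AB", "AC", "AD", "BC", "BD", "CD")})
```

If the recomputed states disagree with what the comparison images show, that is a hint that a crossover has been allocated to the wrong sibling or the wrong chromosome.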

So, in my next post, I will explore that. I'll also describe some of the issues I have come across in this process so far, as well as explain the way I have adapted Kathy's method to my own way of working.

But, if you thought this chromosome was easy to phase, then perhaps you'd like to consider the following set of comparisons:

[PS Having begun to look at the matches the siblings have with their niece and their 1st cousin, as well as the more distant matches, I have found an "anomaly". So, perhaps phasing chromosome 21 isn't so straightforward, after all!]

* Sources and references I have found helpful:

Kathy Johnston - step by step instructions for her method: http://forums.familytreedna.com/showthread.php?t=36812 (make sure you download both the slides and the instructions)
Jason Lee - a blog post detailing Kathy's method: http://dnagenealogy.tumblr.com/post/137722603308/the-use-of-crossover-lines-among-siblings-to
Blaine Bettinger's pdf combining his five posts about the phasing process - http://thegeneticgenealogist.com/wp-content/uploads/2016/11/Visual-Phasing-Bettinger.pdf

Two other bloggers with helpful posts about phasing, including issues such as the way what looks like a single crossover for one sibling can actually be a double crossover for two others:
Ann Raymont - https://dnasleuth.wordpress.com/2016/06/01/chromosome-mapping-with-siblings-part-2/ (and part 1)
Joel Hartley: http://www.jmhartley.com/HBlog/?p=2239

 Sue Griffith's post on how to obtain the values for crossovers from FIR to HIR & vice versa: http://www.genealogyjunkie.net/blog/obtaining-fir-boundaries-on-gedmatch-using-the-little-tick-marks

David Pike has a number of free DNA tools, including the "Search for Shared DNA Segments in Two Raw Data Files" which reports single and double matching segments (ie half identical and fully identical): http://www.math.mun.ca/~dapike/FF23utils/pair-comp.php

Friday, 25 November 2016

DNA Update

It has been an "interesting" year on my DNA journey. Ever since I first took an autosomal DNA test with 23andMe in 2010, I have been working on looking for what are known as "triangulating groups" (TGs) in the data. These are groups of people, who all match me over the same segment of DNA and who also all match each other over that same segment. The theory is that shared DNA indicates shared ancestry and, therefore, if a group of people all share the same segment of DNA, it must have come from the same ancestor (at some level - some of the people in the group may share a close ancestor along the line back to the overall shared ancestor.) The theory sounds "right" and logical, and it appears to fit the patterns I can see in the data:

 I liked using 23andMe for this process. It is the only testing company where it is possible to compare the people you match (and are sharing with) to each other and therefore confirm for yourself whether, or not, they form a TG. This is not possible at the other companies I have tested with. At Family Tree DNA (FTDNA), it is only possible to see where someone matches you, and whether they are "in common with" (ie also share some DNA with) any of your other matches. But you then need to ask them where they match the other people, in order to confirm if they actually match those people over the same segment that they match you on. If it is a different segment, so the TG theory went, then you may all be related to each other through different ancestors, since many of us probably have multiple ancestors in common, as we move further back in time. It was said that you could only be sure the DNA was from the same ancestor if you matched on the same segment.
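The triangulation test itself is simple to express once the segment data is available. The following Python sketch assumes everything is on one chromosome, and that you know both where each person matches you and where they match each other. The names X, Y and Z, all the positions, and the `min_overlap` value are hypothetical.

```python
from itertools import combinations

def overlap(seg1, seg2):
    """Return the overlapping (start, end) of two segments, or None."""
    start, end = max(seg1[0], seg2[0]), min(seg1[1], seg2[1])
    return (start, end) if start < end else None

def is_triangulating_group(segments_vs_me, segments_between, min_overlap):
    """True if every pair matches me AND each other over a common stretch."""
    for a, b in combinations(sorted(segments_vs_me), 2):
        shared_with_me = overlap(segments_vs_me[a], segments_vs_me[b])
        between = segments_between.get(frozenset((a, b)))
        if shared_with_me is None or between is None:
            return False
        common = overlap(shared_with_me, between)
        if common is None or common[1] - common[0] < min_overlap:
            return False
    return True

# Hypothetical matches, positions in millions of base pairs.
vs_me = {"X": (10, 30), "Y": (12, 28), "Z": (15, 32)}
same_place = {frozenset("XY"): (11, 29), frozenset("XZ"): (14, 31),
              frozenset("YZ"): (15, 27)}
# Here Z matches X somewhere else entirely: "in common with", but not a TG.
elsewhere = dict(same_place)
elsewhere[frozenset("XZ")] = (60, 80)

print(is_triangulating_group(vs_me, same_place, min_overlap=5))
print(is_triangulating_group(vs_me, elsewhere, min_overlap=5))
```

The second case is the FTDNA situation described above: the people are "in common with" each other, but without knowing where they match each other you cannot tell that they do not triangulate.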

Part of the difficulty in identifying the TGs at FTDNA, and why you cannot assume people who match you over what looks to be the same segment, and who are "in common with" each other, actually do match each other in the same place and therefore form a TG, is that these DNA tests do not phase the data, ie they do not split it into the two sides we received from our parents. We all have 23 pairs of chromosomes, one of each pair from our father and one from our mother - but the tests just report the two base pairs (bits of DNA!) we have at particular points along the chromosome. So, whilst it might look as if two people match you over the same segment of DNA, one could be matching you on your maternal side and one could be matching you on your paternal side. In that case, the DNA each shares with you would be from different ancestors, one on each side of your family. If the two people also happened to share another ancestor between them, they would show as "in common with" each other - but you would not all be a TG.

 [The lack of phasing also creates the possibility of "false positives" - people who show as a match but who aren't really, because the computers doing the matching have effectively criss-crossed between the base pairs of each chromosome. This is potentially an issue at both FTDNA and 23andMe, in particular. It isn't thought to be so much of an issue at Ancestry, as Ancestry does a form of phasing of the data. However, I didn't think such false matches were likely to be much of a problem, because I thought that, if a group of people were all triangulating, then the chances of all the comparisons being "computer creations" must be quite slim. I do have some groups of matches where no-one matches each other, despite all apparently matching me over the same segment - so those were the matches I took to be "false positives", as theoretically there can only be a maximum of two non-matching results over any particular segment. A third person must match one of the other two, if the matches are genuine.]
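The "maximum of two non-matching results" reasoning can be sketched as a simple check. This is only an illustration under stated assumptions: the function name and the input format (a list of names who all appear to match me over the same segment, plus the pairs among them known to match each other there) are hypothetical. Since I only have a maternal and a paternal copy of the segment, any three people who all match me there but none of whom match each other cannot all be genuine.

```python
from itertools import combinations

def non_matching_trios(names, matching_pairs):
    """Return every trio in which no two people match each other.

    names: people who all appear to match me over the same segment
    matching_pairs: pairs of those people who match each other over it
    """
    matched = {frozenset(pair) for pair in matching_pairs}
    return [
        trio
        for trio in combinations(sorted(names), 3)
        if not any(frozenset(pair) in matched for pair in combinations(trio, 2))
    ]

# Invented example: P and Q match each other, R and S match nobody else
suspects = non_matching_trios(["P", "Q", "R", "S"], [("P", "Q")])
print(suspects)
```

Each trio returned contains at least one likely false positive, although the check alone cannot say which person it is.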

Although I have more of my relatives tested at FTDNA, the reliance on having to contact your matches in order to obtain the details for how they match others was why FTDNA did not seem to be so useful to me, especially as many people do not respond to contact. And Ancestry does not give us any tools to analyse where the actual shared DNA is, so the process of finding TGs is impossible there. Therefore, whilst the other companies do have their own advantages, 23andMe was where I did most of my "work". Most of the triangulating groups at 23andMe shared relatively small segments with me (ie between 7cM and 15cM), but I had identified the potential shared ancestry with one of my matches, a 4th Cousin 1x removed, who shared 14cM with me, and I just assumed the relationships for the other matches were likely to be further back in time.

So I was happy with my 23andMe process. I'd even agreed to do a talk for the Guild of One-Name Studies on using autosomal DNA, as I felt confident I knew what I was doing.

But a couple of months later, everything changed. A different theory had developed, partly as a result of statistics produced by Ancestry but also through the work of other scientists. These statistics demonstrated that the probability of several cousins actually sharing the same matching segment was very low, if not effectively zero. Instead of "triangles", we now had "circles" - and suddenly that brought into question exactly what all these "triangulating groups" really are.

The "circle" theory is still based on the fact that shared DNA means shared ancestry - but now the claim was that the shared DNA would be on different segments of the chromosomes, because of the way DNA is transmitted. A parent passes half their DNA to each child, but each child receives a different half, as there is a recombination process between each parent's two chromosomes before one chromosome is passed on to the child. After several generations, there would be quite a variety of smaller segments carried by cousins descended from the same ancestor. So, rather than looking for the TGs, we should be looking for "genetic networks", clusters of people who share DNA with each other in the cluster but not necessarily over the same segments. The existence of the TGs was explained partly by features in the testing process, such as the lack of phasing, but also by the existence of what are called "population segments" - sequences of base pairs that are just very common in particular populations, so everyone has them, even though there are no close ancestors in common.

How does one know the difference between a genealogically significant triangulating segment and a population segment? Or between a group of matches who have received different segments of DNA from a single ancestor and a group of matches who match on different segments that have come to them from a variety of shared ancestors? Surely the companies are taking these factors into account when they predict the matches? Were the results from the companies even reliable?

So many questions - I felt like I was floundering.

My confidence in what I was doing certainly took a dive at that time. It didn't help that I had also uploaded the raw data for my mother and me to another organisation, DNA Land, who claim to be able to impute "missing" (by which I assume they mean, "untested") areas of DNA, in order to produce a more complete sequence - and yet the number of matches they suggested as a result of this process was not only far fewer than I have at the other companies, it included people who don't appear to match me at any of the other companies. That seems strange, given that I have tested at all three of the main companies. I know only a small number of my matches elsewhere will have uploaded to DNA Land, but the differences still seemed quite significant [ie only three matches, including Mum, for me at DNA Land - compared to the 1888 I currently have at 23andMe, 1146 at FTDNA, and almost 6000 at Ancestry!]

Was this DNA testing all a waste of time (and money!)?

When in doubt - I go back to what I know. Just as I work from the known to the unknown in my normal genealogy, I realised I needed to do that more with my DNA research, as well. A "stab in the dark" may occasionally hit a target but it's just as likely to leave me floundering around in the darkness, following blind alleys.  And that's what looking for shared ancestry just from the TGs felt like.

The statistics from all of the companies indicate that autosomal test relationships can only be predicted reliably for about the first five generations. That is not to say we won't show a match to more distant relatives - it's just that, the more distant the relationship, the more difficult it becomes to predict the level of that relationship, as the range of possibilities increases. A single segment of DNA may be passed on unchanged for many generations. But, in all the test results, I knew my known relatives always showed up as they should do. My mother was definitely my mother (not that I doubted that!) And my father's known relatives all show up as matches at the right levels.

So DNA testing works!

Beating the temptation to run and hide, I gave the talk in August, describing the two theories and commenting that "most of us don't understand enough about the statistics to make definitive claims either way so a combination of the methods seems to be the best approach. Both methods are valid but have caveats, eg small segments often appear to triangulate, but may not be genuine, clusters of people sharing different DNA may be due to having multiple ancestors in common."

Some bloggers do seem to be finding segments that are shared by groups of distant cousins. The problem for many of us in the UK, though, is that often we don't have sufficient "middle-distance" relatives identified (both in our genealogy and in our DNA) to produce the sort of success stories that many in the US seem to be experiencing. For example, I only have 29 fourth cousins in the Ancestry "4th cousins & closer" section, whereas some of the American results I have seen have between 400 and 750 relatives at that level!

But I have had some success in identifying relationships with my matches - I now have the potential shared ancestry identified for 10 of them (and if the 10th is actually correct, it's a big clue as to which of my ancestral lines three other shared matches fit into). So that's a start.

As well as confirming my genealogy & finding new relatives, one of my goals with DNA testing is mapping where my DNA came from. Identifying shared ancestry with my matches is one part of this process and, so far, my chromosome map, mapping DNA received to the relevant "most recent common ancestor" (MRCA), looks like this:

Chromosome 4 shows where a known Parry segment contains within it a Saunders segment:

And this shows how that Saunders segment of DNA appears to have passed down to my Parry grandfather:

Any other matches over the identified segments on the chromosome map should (if the identification is correct) be either a descendant of the same couple, or a descendant of one of their ancestors. 

I think there needs to be a continual checking process, using both DNA and genealogy - for example, having found a genealogical connection to one of my DNA matches at Ancestry, we were then able to confirm, using FTDNA, that the person also matched my mother over the same segment, and that neither my mother, nor I, matched the person's father (both requirements necessary for the genealogy to be correct.) 

Since I have several close relatives tested, it gives me the opportunity to work from the DNA data backwards, rather than just concentrating on those potential triangulating groups of distant relatives. My DNA consists of segments of the DNA of my grandparents, passed to me by each of my parents. The "crossover points", where a segment from one grandparent switches over to a segment from the other grandparent can (sometimes) be identified in our DNA, using the details of how we match close relatives. This is a process I began looking at some years ago, using tools written by David Pike. But now more of my relatives are on Gedmatch, I can use the "Visual phasing" method as explained by Kathy Johnston, which should be a lot easier. 
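As a rough illustration of how crossover points show up when two siblings are compared, here is a small Python sketch. It is not the Gedmatch or Visual phasing procedure itself, just the underlying idea: the input format, the state labels ("FIR" for fully identical, "HIR" for half identical, "NIR" for no match) and the positions used are assumptions for the example. Wherever the matching state between two siblings changes, one of them has a crossover at about that position.

```python
def crossover_candidates(regions):
    """Return positions where the match state between two siblings changes.

    regions: (start, end, state) tuples for one chromosome,
             where state is "FIR", "HIR" or "NIR"
    """
    ordered = sorted(regions)  # sort by start position
    points = []
    for (_, prev_end, prev_state), (next_start, _, next_state) in zip(ordered, ordered[1:]):
        if prev_state != next_state:
            # Use the midpoint in case the two regions do not meet exactly
            points.append((prev_end + next_start) / 2)
    return points

# Invented comparison of one sibling pair on one chromosome
regions = [(0, 20, "HIR"), (20, 43, "FIR"), (43, 48, "NIR")]
print(crossover_candidates(regions))
```

The sketch only says where a crossover occurred, not which sibling it belongs to. Working that out needs the comparisons with the other siblings and relatives as well.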

I have been working on this recently and will post about the process soon (now there's a challenge to myself!)

Friday, 24 June 2016

My Ancestors and their Descendants - my potential DNA Tree

Earlier this year, our ISP informed us that it would no longer support personal web spaces - a poor decision in my view (of course!)

The upside of this is that it will force me to do the web site "re-write" that I set as a goal in 2015.

The downside is that I haven't done it yet, so my Parry Surname Research (Family History and the One-Name Study) site has disappeared.

Theoretically, since the site was written in html and css, it would have been quite easy to just upload all the files elsewhere.  But then there'd be little incentive to get the rewrite done.  And, with the development of the Guild's "Members' Websites Project", it seems an ideal opportunity to separate out any personal family history from the Parry One-Name Study information, and to ensure the long term survival of the ONS data by placing it on the Guild's site.

So that's the plan. And it is in progress (slowly).

But today, frustrated at the loss of my "DNA tree", which I really need to accompany the autosomal DNA project I have set up at Family Tree DNA, I decided to try uploading that here, on Blogger.  It's taken a bit of tweaking of the coding, especially on the page width, which I hope I don't accidentally delete, but at least the information is available again:

My Ancestors and their Descendants - my potential DNA Tree

And now I've been reminded of just how many of my ancestors and their descendants I still need to trace. ☺

Tuesday, 5 May 2015

Other activities - the Genealogy Do-Over interlude

Sometimes I keep a diary.  And sometimes I don't.  And, when I don't, I often look back and wonder what I did for all those days! 

So, for my own future reference (and for any descendants who ever wonder what their "x times great" grandmother did), here are a few notes.  Firstly, I resurrected another hobby - sewing.  Prompted by the thought that the Saturday night banquet at the Guild of One-Name Studies Conference has seen me wearing the same dress for a number of years, I decided to make a skirt - which then developed into making a skirt, top, evening bag and several other items just for the fun of it.  Getting the critical items finished on time did involve stitching at 5.30 am on the morning of the banquet but, since I'd woken up early anyway, it seemed like a good use of my time.

Finishing the sewing so early at least left me free to chat to people in any spare time during that day.  And chat I did, as the Conference is a great time for catching up with "old" friends, as well as making new ones.  Some of the conference sessions were recorded and the videos are available on the Guild's YouTube channel - I am looking forward to watching some of those sessions I missed, due to there being two sessions running at the same time.  It would be hard to pick highlights from the Conference, as it was all so good, but I think Jim Benedict's interactive session on "Succession-Proofing your ONS" probably stands out as providing the most laughs, as the various groups debated why *their* method of succession-proofing was best (Debbie, have you bought that spaceship yet?).

We heard more about the Guild Members Websites project over the weekend and I took the opportunity to chat with Mike Spathaky about his Cree Study site, and the various different options for producing websites.  It was Mike who had asked me, on the Guild hangout in February, why I was thinking of moving my PARRY ONS site to WordPress.  As a result of our discussions about the benefits, and potential longevity, of html, I now have a few more reasons for not doing so.

For the first time at the Conference, on the Friday afternoon there was an informal meeting for those interested in DNA testing.  Despite me being totally disorganised, having arrived at the hotel later than planned, and then walking all the way to my hotel room, only to discover that my key didn't work, so that I was still carrying around half my belongings at the time the meeting began, things seemed to run smoothly as we all shared about our various levels of involvement with DNA testing.  No doubt we will all be building on this in the coming months and years. 

I have frequently come away from the Conference with some snippet of Parry information, whether it has been from Marriage Challenge certificates passed on to me, or references I have found in books on the bookstall, or in someone's talk, etc.  This year was no exception, as Jo Fitz-Henry very kindly supplied me with photographs of some Parry gravestones that she had come across.  I'll write more about those on the Parry ONS blog.

The Conference was held at Brigg in Lincolnshire and my route there provided an opportunity to drive past RAF Scampton, one of the bases where my mother had been stationed in her WRAF days.  When planning my conference attendance, I had originally thought of contacting the museum on the base with a view to arranging a visit en route to Brigg.  It was probably a good job I didn't do that, given how time went.  But that's now on my "To Do" list, for another occasion.

Moving on from the Conference in March, the next main event was the WDYTYA? Live Show in April which, for the first time, was being held at the NEC, Birmingham.  This provided another incentive to do some sewing!  Several years ago, Dick Eastman blogged about the Progeny Charting Companion program and its ability to produce an embroidery pattern from your family tree.  "What a wonderful idea," I thought, and soon after that, I was able to replace my 35 year old sewing machine with a new one capable of following such a pattern.  Then came the "busy-ness" of the last few years.  I still haven't tried that program but, ever since I discovered some ancestors who were "artisans in fireworks", I have had an idea in my mind - and I finally managed to execute that in time to wear to the show.

Okay, the hall was too warm to actually wear the hoody *in* the show, but I'd achieved my goal!  I'm now on the look-out for other items I can embroider with bits of my family history!

At the show, I was helping to man the ISOGG stand (ISOGG = International Society of Genetic Genealogy).  We were so busy throughout most of the time that I was amazed I hadn't lost my voice - it seemed like every time I sat down, another visitor would arrive with a query.  Hopefully, we will be seeing a rapid increase in DNA testing in the UK over the coming months, especially now all three of the main companies (FamilyTreeDNA, 23andMe and Ancestry) are marketing their products here.  Another enjoyable aspect of WDYTYA was meeting many of the ISOGG members who came across from the United States to assist with the practical aspects of testing on the FTDNA stand.  Although ISOGG itself is an independent organisation and, as far as possible, information is always presented without bias, many of us would admit to having a personal preference towards FTDNA, not least because they are the only testing company that support the YDNA and mtDNA projects.  (Having taken the autosomal test at all three companies, I think it only fair to mention that I can find pros and cons for each of them.)

There was a fair amount of catching up to do, after the three days of "doing nothing" at WDYTYA, which was followed by a deadline for some paperwork.  But, now that's been met, I find myself actually restarting my Genealogy Do-Over. 

I wonder whether I can get to week 13 without any further interruptions!

Genealogy Do-Over "restart"

It's time to restart my restart!

As I described in my last post, I needed to postpone my Genealogy Do-Over, as other activities have had to take priority recently.  However, I'm now back again - and, amazingly, back before the repeat of the scheduled Do-Over week that I had paused at.  So that gives me a bit of time to refresh my memory of what I had been doing (seems to be an increasingly necessary task these days!)

There has still been some - almost unintentional - progress on the Do-Over topics in the interim.  I have bought a new laptop, as the start up of my previous one would have been beaten by a snail doing a marathon.  Unlike previous occasions when I have changed computers, this time I do not intend to just transfer everything across in one go, thus maintaining (and perhaps being limited by) the old file structure.  Instead,  I will take the opportunity to redesign my filing system - which was one of my aims for the Do-Over.  Since I am keeping the old laptop to use whenever I run a stand for the Guild of One-Name Studies at a family history fair, the new laptop has also been a good opportunity to purchase full and/or up-to-date versions of the programs I'm going to be using from now on, such as Legacy and Evidentia.

So the next couple of weeks will be a steep learning curve, as I start to get to grips with these properly, as well as continue trying to build the use of programs such as OneNote and Evernote into my routine, in order to maintain a good system to my research files and the Parry data collection, in particular.  Thankfully, many of the programs have active User Groups, which I imagine I shall be making frequent use of!